From: Eli Zaretskii
Subject: Re: master 02bca34: Utilize new string decoding feature in GTK native input
Date: Sat, 19 Feb 2022 14:36:43 +0200

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Sat, 19 Feb 2022 18:09:38 +0800
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Is this a good idea? Consing a string when we process input increases
> > GC pressure, and what issues does this change solve as a
> > counter-weight for that disadvantage? Is g_utf8_to_ucs4 a problematic
> > API or something?
>
> No, but some input method modules don't always return valid UTF-8 like
> they're supposed to, thereby causing crashes in g_utf8_to_ucs4_fast.
>
> I should have explained that in the commit message.

You can still explain that in a comment to the code.

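For reference, GLib documents g_utf8_to_ucs4_fast as assuming valid UTF-8 and doing no error checking on its input, while g_utf8_to_ucs4 and g_utf8_validate do check it. Below is a minimal sketch of such a guard, written in plain C so it compiles without GLib. The name utf8_valid_p is hypothetical. The rules follow RFC 3629 (no truncated or overlong sequences, no surrogates, nothing above U+10FFFF). It is not claimed to match g_utf8_validate exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the LEN bytes at S are well-formed
   UTF-8 per RFC 3629, i.e. no truncated or overlong sequences, no
   UTF-16 surrogates (U+D800..U+DFFF) and nothing above U+10FFFF.  */
bool
utf8_valid_p (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
      size_t n;                           /* continuation bytes needed */

      if (c < 0x80)
        {
          i++;                  /* ASCII byte */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        n = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          n = 2;
          if (c == 0xE0)
            lo = 0xA0;          /* reject overlong 3-byte forms */
          else if (c == 0xED)
            hi = 0x9F;          /* reject surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          n = 3;
          if (c == 0xF0)
            lo = 0x90;          /* reject overlong 4-byte forms */
          else if (c == 0xF4)
            hi = 0x8F;          /* reject code points above U+10FFFF */
        }
      else
        return false;  /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */

      if (len - i <= n)
        return false;           /* sequence truncated by end of input */
      if (s[i + 1] < lo || s[i + 1] > hi)
        return false;
      for (size_t k = 2; k <= n; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;         /* not a continuation byte */
      i += n + 1;
    }
  return true;
}
```

Running a check like this on the string an input method hands back, before passing it to a non-validating converter, turns a crash into a condition the caller can handle. The code still needs a policy for what to do with input that fails the check.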
> > But in general, decoding a UTF-8 encoded C string is better done without
> > consing a string and then using the coding.c stuff. After all, if the
> > original string is 100% guaranteed to be in UTF-8, the decoding is
> > almost trivial.
>
> It's supposedly guaranteed, but some input method modules break that
> guarantee.

And what do we want to do with those invalid UTF-8 sequences?  The way
you did it will produce raw bytes for them -- is that really TRT in
this case?
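To make the choice concrete, here is a minimal sketch in plain C of a decoder that never fails and applies one of two policies to ill-formed bytes. The first policy keeps each bad byte as an Emacs-style raw-byte character: Emacs represents byte values 0x80..0xFF as the characters #x3FFF80..#x3FFFFF. The second substitutes U+FFFD. The function, type and enum names are hypothetical, and this is not how decode_string_utf_8 is implemented. The sketch writes into a caller-supplied buffer, so it allocates nothing itself. It replaces each bad byte individually, which is simpler than (and not identical to) the Unicode "maximal subpart" practice for U+FFFD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy for bytes that do not start a well-formed sequence.  */
enum bad_byte_policy { RAW_BYTE, REPLACEMENT_CHAR };

/* Length (1..4) of the well-formed UTF-8 sequence at S, given AVAIL >= 1
   readable bytes, or 0 if the bytes there are not well-formed.  */
static size_t
utf8_seq_len (const unsigned char *s, size_t avail)
{
  unsigned char c = s[0], lo = 0x80, hi = 0xBF;
  size_t n;
  if (c < 0x80)
    return 1;
  else if (c >= 0xC2 && c <= 0xDF)
    n = 2;
  else if (c >= 0xE0 && c <= 0xEF)
    {
      n = 3;
      if (c == 0xE0) lo = 0xA0;       /* no overlong forms */
      else if (c == 0xED) hi = 0x9F;  /* no surrogates */
    }
  else if (c >= 0xF0 && c <= 0xF4)
    {
      n = 4;
      if (c == 0xF0) lo = 0x90;       /* no overlong forms */
      else if (c == 0xF4) hi = 0x8F;  /* nothing above U+10FFFF */
    }
  else
    return 0;
  if (avail < n || s[1] < lo || s[1] > hi)
    return 0;
  for (size_t k = 2; k < n; k++)
    if ((s[k] & 0xC0) != 0x80)
      return 0;
  return n;
}

/* Decode LEN bytes at IN into code points at OUT, which must have room
   for LEN elements (each input byte yields at most one character).
   Return the number of code points written.  */
size_t
utf8_decode_lenient (const unsigned char *in, size_t len, uint32_t *out,
                     enum bad_byte_policy policy)
{
  assert ((in != NULL && out != NULL) || len == 0);
  size_t i = 0, nout = 0;
  while (i < len)
    {
      size_t n = utf8_seq_len (in + i, len - i);
      if (n == 0)
        {
          /* Ill-formed: consume exactly one byte (always >= 0x80 here).  */
          out[nout++] = (policy == RAW_BYTE
                         ? 0x3FFF00 + in[i]  /* Emacs-style raw-byte char */
                         : 0xFFFD);          /* U+FFFD REPLACEMENT CHARACTER */
          i++;
          continue;
        }
      uint32_t c = in[i];
      if (n > 1)
        {
          c &= 0xFF >> (n + 1);          /* payload bits of the lead byte */
          for (size_t k = 1; k < n; k++)
            c = (c << 6) | (in[i + k] & 0x3F);
        }
      out[nout++] = c;
      i += n;
    }
  return nout;
}
```

Raw bytes preserve exactly what the input method sent, at the cost of inserting characters that display as escapes. U+FFFD discards the bad bytes but is what most toolkits show for broken text.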

In any case, at the very least consider using decode_string_utf_8
instead of consing a Lisp string and then using the "usual" decoding
stuff -- decode_string_utf_8 will cons only one string.