bug-gnu-emacs

bug#70988: (read FUNCTION) uses Latin-1 [PATCH]


From: Pip Cet
Subject: bug#70988: (read FUNCTION) uses Latin-1 [PATCH]
Date: Thu, 13 Feb 2025 10:08:54 +0000

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Wed, 12 Feb 2025 20:27:58 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: stefankangas@gmail.com, mattias.engdegard@gmail.com, 
>> 70988@debbugs.gnu.org, monnier@iro.umontreal.ca
>>
>> "Eli Zaretskii" <eliz@gnu.org> writes:
>>
>> >> --- a/src/lread.c
>> >> +++ b/src/lread.c
>> >> @@ -398,9 +398,12 @@ readchar (Lisp_Object readcharfun, bool *multibyte)
>> >>
>> >>    tem = call0 (readcharfun);
>> >>
>> >> -  if (NILP (tem))
>> >> +  if (!CHARACTERP (tem))
>> >>      return -1;
>> >> -  return XFIXNUM (tem);
>> >> +  if (multibyte && !ASCII_CHAR_P (XFIXNAT (tem)))
>> >> +    *multibyte = true;
>> >> +
>> >> +  return XFIXNAT (tem);
>> >
>> > AFAIU, the proposed patch was just a bugfix, whereas the above also
>> > changes behavior in backward-incompatible ways.
>>
>> The other way around, I think: the first proposed patch changed the
>> behavior of readchar to always set the multibyte flag when a function
>> was used, resulting in the creation of symbols whose ASCII names are
>> multibyte strings.  The previous behavior was never to set the multibyte
>> flag, which was correct for ASCII strings but not multibyte ones.
>>
>> This patch retains the previous behavior for ASCII symbols, but sets the
>> multibyte flag for non-ASCII symbols, which seems the best we can do if
>> we're given a simple function.
>
> I'm talking about the CHARACTERP test (why not FIXNUMP?), and the

The function is supposed to return a character, not just any fixnum.

> addition of ASCII_CHAR_P test (why would we want an ASCII character
> to never be considered multibyte?).

It's the other way around, again: if there's a non-ASCII character, we
treat the stream as multibyte; if there are ONLY ASCII characters, we
treat it as unibyte.
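To make the rule concrete, here is a minimal standalone model of the patched readchar.  It assumes plain C integers in place of Lisp_Objects; characterp, ascii_char_p, model_readchar, array_stream and stream_is_multibyte are hypothetical stand-ins, not lread.c code.  A value that is not a character ends the stream, and the flag is set only once a non-ASCII character has been seen:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHAR 0x3FFFFF  /* largest Emacs character code */

/* Simplified stand-ins for CHARACTERP and ASCII_CHAR_P that work on
   plain integers instead of Lisp_Objects.  */
static bool characterp (long v) { return 0 <= v && v <= MAX_CHAR; }
static bool ascii_char_p (long c) { return 0 <= c && c < 0x80; }

/* Hypothetical model of the patched readchar: NEXT plays the role of
   call0 (readcharfun), returning the next value from the stream.  */
static int
model_readchar (long (*next) (void *), void *state, bool *multibyte)
{
  long tem = next (state);
  if (!characterp (tem))
    return -1;              /* end of input, or not a character */
  if (multibyte && !ascii_char_p (tem))
    *multibyte = true;      /* one non-ASCII character is enough */
  return (int) tem;
}

/* Hypothetical stream over a fixed array; -1 past the end stands in
   for the function returning nil.  */
struct array_stream { const long *vals; size_t len; size_t pos; };

static long
array_next (void *p)
{
  struct array_stream *s = p;
  return s->pos < s->len ? s->vals[s->pos++] : -1;
}

/* Read the whole stream and report the final multibyte flag.  */
static bool
stream_is_multibyte (const long *vals, size_t len)
{
  struct array_stream s = { vals, len, 0 };
  bool multibyte = false;
  int c;
  while ((c = model_readchar (array_next, &s, &multibyte)) >= 0)
    assert (ascii_char_p (c) || multibyte);  /* flag set once non-ASCII seen */
  return multibyte;
}
```

With this model, { 'f', 'o', 'o' } leaves the flag clear (unibyte), while { 'f', 0xF6, 'o' } sets it (multibyte).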

>> If we want to change symbol names to always be multibyte strings, we can
>> do that, but then we probably want to do that for all streams.
>
> I don't understand why you are talking about symbols: AFAIU this code
> is used in many other cases as well.  But even for symbols: why change
> the current behavior of making their names multibyte?

The current behavior is to make their names unibyte!  The current
behavior is *changed* by the first patch, and *retained* by my patch.
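A small sketch of the three behaviors being compared may help.  It only tracks whether the reader's multibyte flag ends up set after a function stream has produced some characters; the enum names and final_multibyte_flag are hypothetical labels for this thread, not Emacs identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three behaviors discussed in the thread.  */
enum policy
{
  NEVER_MULTIBYTE,        /* before either patch */
  ALWAYS_MULTIBYTE,       /* first proposed patch */
  MULTIBYTE_IF_NON_ASCII  /* this patch */
};

/* Return whether the reader's multibyte flag ends up set after a
   function stream has produced the N characters in CHARS.  */
static bool
final_multibyte_flag (enum policy p, const int *chars, size_t n)
{
  assert (n == 0 || chars != NULL);
  switch (p)
    {
    case NEVER_MULTIBYTE:
      return false;
    case ALWAYS_MULTIBYTE:
      return true;
    case MULTIBYTE_IF_NON_ASCII:
      for (size_t i = 0; i < n; i++)
        if (chars[i] >= 0x80)
          return true;
      return false;
    }
  return false;  /* not reached */
}
```

Under this model the patch differs from the old behavior only for non-ASCII input, and from the first patch only for ASCII-only input, which is the case that produced symbols with multibyte ASCII names.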

>> It also fixes yet another XFIXNUM crash, but those (there are more in
>> lread.c, it seems) should be fixed independently.
>
> I'm okay with adding a FIXNUMP test (which happens in the debugging
> builds anyway, so any violations probably never happen), but using
> CHARACTERP changes behavior.

If you count "avoids further crashes" as "changes behavior", yes.

readchar is supposed to return a character or -1.  Some callers
assume the return value is a valid character, and will crash otherwise.
I haven't checked all of them because there are many.
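The difference between a FIXNUMP guard and a CHARACTERP guard can be sketched with a toy model of what the user's function might return.  Everything below (struct value, guard_fixnump, guard_characterp, consume) is hypothetical and only mirrors the checks under discussion.  Both guards avoid the XFIXNUM crash on a non-fixnum such as a string; only CHARACTERP also keeps out-of-range fixnums away from callers that assume a valid character:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CHAR 0x3FFFFF  /* largest Emacs character code */

/* Toy model of what the user's function can return.  */
enum tag { T_NIL, T_FIXNUM, T_STRING };
struct value { enum tag tag; long num; };  /* NUM used only for T_FIXNUM */

static bool fixnump (struct value v) { return v.tag == T_FIXNUM; }

static bool
characterp (struct value v)
{
  return fixnump (v) && 0 <= v.num && v.num <= MAX_CHAR;
}

/* Guard suggested in review: reject only non-fixnums.  */
static int
guard_fixnump (struct value tem)
{
  return fixnump (tem) ? (int) tem.num : -1;
}

/* Guard used in the patch: reject anything that is not a character.  */
static int
guard_characterp (struct value tem)
{
  return characterp (tem) ? (int) tem.num : -1;
}

/* Hypothetical caller that, like some real readchar callers, relies on
   getting either -1 or a valid character.  */
static int
consume (int c)
{
  assert (c == -1 || (0 <= c && c <= MAX_CHAR));
  return c;
}
```

For a fixnum such as 0x400000, guard_fixnump passes the value through and consume would trip its assertion, whereas guard_characterp turns it into -1.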

>> However, it does give us the ability to extend the API so
>> readcharfun could return a single character string, unibyte or
>> multibyte, to be handled appropriately.
>
> This is also a change in behavior.

Yes, of course, which is why it's a separate proposal and not part of
the patch.

Pip
