Re: Unibyte characters, strings, and buffers
From: David Kastrup
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 09:40:03 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)
Eli Zaretskii <address@hidden> writes:
>> From: David Kastrup <address@hidden>
>> Cc: address@hidden
>> Date: Sat, 29 Mar 2014 08:23:33 +0100
>>
>> Eli Zaretskii <address@hidden> writes:
>>
>> >> From: David Kastrup <address@hidden>
>> >> Cc: address@hidden
>> >> Date: Fri, 28 Mar 2014 20:25:17 +0100
>> >>
>> >> >> > Then what do you call a buffer whose "text" is encoded?
>> >> >>
>> >> >> I can't speak for Stephen, of course, but my impression was he would
>> >> >> call it "a bad idea".
>> >> >
>> >> > Then what other ideas to use when Lisp code needs to encode or decode
>> >> > text manually?
>> >>
>> >> Redecode right to a "binary" coding system would be my guess.
>> >
>> > Sorry, I don't follow. Can you tell more what that means?
>>
>> It means a buffer where each _character_ has the same value that the
>> no-longer-available unibyte buffer would have in its bytes/characters.
>
> This doesn't seem to be a complete description of what is suggested.
> E.g., just by looking at the values of characters, it is impossible to
> distinguish between Latin characters below 256 and raw bytes. In a
> unibyte buffer, we know how to make that distinction,
Uh, what? The point of a unibyte buffer is that it does not make the
distinction.
> but if there are no unibyte buffers, something else is needed for
> doing that.
>> You can do that whether or not the conceptual array of 0..255 characters
>> is internally encoded in unibyte or multibyte encodings.
>
> What do you mean by "multibyte encodings" in this context? Are you
> suggesting to store the bytes 128..255 as Latin-1 characters,
> i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> characters?
That would make the most sense, yes.
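As a rough illustration of that option (a Python sketch, not Emacs code; the
helper name `bytes_as_chars` is made up for this example): decoding with
Latin-1 maps each byte 0..255 to the character with the same code, and the
characters 128..255 then take two bytes each when stored as UTF-8.

```python
def bytes_as_chars(data: bytes) -> str:
    """Map each byte to the character with the same code point (Latin-1)."""
    return data.decode("latin-1")


raw = bytes([0x41, 0x80, 0xE9, 0xFF])
text = bytes_as_chars(raw)

# Character values equal the original byte values.
codes = [ord(c) for c in text]

# Stored as UTF-8, each character in 128..255 occupies two bytes,
# while characters below 128 occupy one.
utf8_lengths = [len(c.encode("utf-8")) for c in text]

# The mapping is lossless: encoding back to Latin-1 recovers the bytes.
round_trip = text.encode("latin-1")
```

Note that nothing in `text` records whether 0xE9 started life as a raw byte or
as the Latin character with that code, which is the distinction being debated
above.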
> Or are you suggesting something else?
You could also use the "raw byte" character encodings we use to avoid
losing information when reading malformed UTF-8 files into a
multibyte buffer, but that seems less practical when working with the
character codes.
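Python's `surrogateescape` error handler is a loose analogue of that second
option (again only a sketch for comparison, not how Emacs represents raw
bytes): undecodable bytes become special code points distinct from the Latin-1
characters, so nothing is lost, but the character codes no longer equal the
byte values.

```python
raw = b"caf\xe9"  # a lone 0xE9 is not valid UTF-8

text = raw.decode("utf-8", errors="surrogateescape")

# The stray byte 0xE9 is kept as U+DCE9, which is distinct from U+00E9.
escaped_code = ord(text[-1])

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
```

Code that wants to treat the contents as plain byte values has to translate
`escaped_code` back first, which is the practical cost mentioned above.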
--
David Kastrup