Re: Unibyte characters, strings, and buffers
From: Eli Zaretskii
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 12:25:43 +0300

> From: David Kastrup <address@hidden>
> Cc: address@hidden
> Date: Sat, 29 Mar 2014 09:40:03 +0100
>
> >> It means a buffer where each _character_ has the same value that the
> >> no-longer-available unibyte buffer would have in its bytes/characters.
> >
> > This doesn't seem to be a complete description of what is suggested.
> > E.g., just by looking at the values of characters, it is impossible to
> > distinguish between Latin characters below 256 and raw bytes. In a
> > unibyte buffer, we know how to make that distinction,
>
> Uh, what? The point of a unibyte buffer is that it does not make the
> distinction.
Yes, it does: it treats every character as a raw byte. So the dilemma
is resolved there by definition. How to do that without unibyte
buffers remains to be defined; otherwise, plans to remove unibyte
buffers are impractical.
> > but if there are no unibyte buffers, something else is needed for
> > doing that.
>
> >> You can do that whether or not the conceptual array of 0..255 characters
> >> is internally encoded in unibyte or multibyte encodings.
> >
> > What do you mean by "multibyte encodings" in this context? Are you
> > suggesting to store the bytes 128..255 as Latin-1 characters,
> > i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
> > characters?
>
> That would make the most sense, yes.
Then the above distinction is impossible, and all kinds of subtly
incorrect behaviors creep in.
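
As a rough illustration of the lost distinction, here is a small sketch
in Python. It is only an analogy, not Emacs code, and the helper name
`store_as_latin1` is hypothetical. It maps each byte to the character
with the same code, as suggested above, and then compares the result
with the Latin letter typed as ordinary text.

```python
def store_as_latin1(raw: bytes) -> str:
    """Hypothetical helper: map each byte 0..255 to the character
    with the same code point (i.e. decode as Latin-1)."""
    return raw.decode("latin-1")

# A raw byte 0xE9, e.g. taken from binary data.
raw_byte = store_as_latin1(b"\xe9")

# The Latin letter e-acute (U+00E9), entered as text.
latin_text = "\u00e9"

# Once stored this way, the two cannot be told apart by value.
same = raw_byte == latin_text

# Internally the character occupies a 2-byte UTF-8 sequence.
utf8_len = len(raw_byte.encode("utf-8"))
```

Under this scheme `same` is true, so code that needs to treat raw bytes
differently from Latin text has nothing left to test.
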
> > Or are you suggesting something else?
>
> You could also use the "raw byte" character encodings we use for not
> losing information when reading not properly formed utf-8 files into a
> multibyte buffer, but that seems less practical when working with the
> character codes.
Why less practical?
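
For comparison, a sketch of the "raw byte" alternative, again in Python
and again only as an analogy: Python's `surrogateescape` error handler
is assumed here to play a role similar to the raw-byte characters
mentioned above. It keeps an undecodable byte distinct from the Latin
character with the same value, at the cost that the character code no
longer equals the byte value.

```python
# 0xE9 on its own is not valid UTF-8, so it cannot be decoded as text.
data = b"caf\xe9"

# Decode, escaping invalid bytes into a reserved code range.
text = data.decode("utf-8", errors="surrogateescape")
escaped = text[-1]

# The escaped byte is not the same character as Latin e-acute.
is_distinct = escaped != "\u00e9"

# The character code differs from the byte value; the byte has to be
# recovered by subtracting the offset of the reserved range.
code_equals_byte = ord(escaped) == 0xE9
recovered_byte = ord(escaped) - 0xDC00

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("utf-8", errors="surrogateescape")
```

Here `is_distinct` is true and `round_trip` equals `data`, while
`code_equals_byte` is false, which is the kind of inconvenience the
quoted text may be referring to.
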