Re: Unibyte characters, strings, and buffers

From: Eli Zaretskii
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 11:24:05 +0300

> From: David Kastrup <address@hidden>
> Cc: address@hidden
> Date: Sat, 29 Mar 2014 08:23:33 +0100
>
> Eli Zaretskii <address@hidden> writes:
> >> From: David Kastrup <address@hidden>
> >> Cc: address@hidden
> >> Date: Fri, 28 Mar 2014 20:25:17 +0100
> >> 
> >> >> > Then what do you call a buffer whose "text" is encoded?
> >> >> 
> >> >> I can't speak for Stephen, of course, but my impression was he would
> >> >> call it "a bad idea".
> >> >
> >> > Then what other ideas to use when Lisp code needs to encode or decode
> >> > text manually?
> >> 
> >> Redecode right to a "binary" coding system would be my guess.
> >
> > Sorry, I don't follow.  Can you tell more what that means?
>
> It means a buffer where each _character_ has the same value that the
> no-longer-available unibyte buffer would have in its bytes/characters.

This doesn't seem to be a complete description of what is suggested.
E.g., just by looking at the values of characters, it is impossible to
distinguish between Latin characters below 256 and raw bytes.  In a
unibyte buffer, we know how to make that distinction, but if there are
no unibyte buffers, something else is needed for doing that.
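
To make the ambiguity concrete, here is a minimal sketch in Python rather
than Emacs Lisp.  It is only an illustration: the helper code_points and
the sample data are made up for this example and say nothing about how
Emacs stores text internally.  It shows that once raw bytes are presented
as characters with the same numeric values, the values alone no longer
tell them apart from genuine Latin-1 text.

```python
def code_points(text):
    """Return the numeric value of each character in TEXT."""
    return [ord(ch) for ch in text]

# Four undecoded bytes, e.g. read from a file or a network stream
# (hypothetical sample data).
raw_bytes = bytes([0x63, 0x61, 0x66, 0xE9])

# Present each byte as the character with the same value, which is
# what a "binary" view with one character per byte amounts to.
as_characters = raw_bytes.decode("latin-1")

# Genuine text that happens to use a Latin character below 256.
real_text = "caf\u00e9"

# The two sequences of character values are identical, so some extra
# marker is needed to know which one holds raw bytes.
same = code_points(as_characters) == code_points(real_text)
print(same)
```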

> > The situation I was describing is that I need to do something with
> > undecoded bytes before decoding them, or after encoding them.
>
> You can do that whether or not the conceptual array of 0..255 characters
> is internally encoded in unibyte or multibyte encodings.

What do you mean by "multibyte encodings" in this context?  Are you
suggesting to store the bytes 128..255 as Latin-1 characters,
i.e. using the 2-byte UTF-8 sequences of the corresponding Latin
characters?  Or are you suggesting something else?
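
For reference, the first reading of the question can be sketched as
follows, again in Python and only as an illustration.  The function name
latin1_byte_as_utf8 is hypothetical, and nothing here is meant to describe
what Emacs does or what was actually proposed.  It shows that a byte value
in 128..255, treated as the Latin-1 character with the same code, takes two
bytes in UTF-8, while values below 128 take one.

```python
def latin1_byte_as_utf8(value):
    """Encode the character whose code equals VALUE (0..255) as UTF-8."""
    if not 0 <= value <= 255:
        raise ValueError("expected a byte value in 0..255")
    return chr(value).encode("utf-8")

# An ASCII byte stays a single byte.
ascii_form = latin1_byte_as_utf8(0x41)

# A byte in 128..255 becomes a 2-byte sequence.
high_form = latin1_byte_as_utf8(0xE9)

# Every value in 128..255 needs exactly two bytes.
all_two_bytes = all(len(latin1_byte_as_utf8(v)) == 2 for v in range(128, 256))
print(ascii_form, high_form, all_two_bytes)
```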
