help-gnu-emacs

Re: [Solved] RE: Differences between identical strings in Emacs lisp


From: Eli Zaretskii
Subject: Re: [Solved] RE: Differences between identical strings in Emacs lisp
Date: Thu, 09 Apr 2015 15:45:06 +0300

> From: Jürgen Hartmann <juergen_hartmann_@hotmail.com>
> Date: Thu, 9 Apr 2015 12:38:43 +0200
> 
> > If this `insert' is performed inside a unibyte buffer, then this 160 is
> > instead taken to be the code of a byte.  Again, regardless of the locale.
> 
> So this is comparable to the output of \xA0 in a unibyte string
> (e.g. in "\xA0\ A") in contrast to the same in a multibyte string (e.g. in
> "\xA0 Ä"): The former yields the raw byte \240, the latter a no-break space.

Yes, Emacs tries to treat buffers and strings alike.
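
A minimal sketch of that parallel, assuming a reasonably recent Emacs
with default settings.  The helper name `demo-insert-160' is made up
for this illustration and is not part of Emacs.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-insert-160 (multibyte)
  "Insert the integer 160 into a temporary buffer and return its text.
MULTIBYTE non-nil keeps the buffer multibyte; nil makes it unibyte."
  (with-temp-buffer
    (set-buffer-multibyte multibyte)
    (insert 160)
    (buffer-string)))

;; A literal whose only non-ASCII content is a small hex escape is
;; read as a unibyte string.
(multibyte-string-p "\xA0")                 ; => nil
;; A Unicode escape makes the literal a multibyte string.
(multibyte-string-p "\u00A0")               ; => t
;; Buffers show the same split.
(multibyte-string-p (demo-insert-160 nil))  ; => nil
(multibyte-string-p (demo-insert-160 t))    ; => t
```

In the unibyte cases the 160 is a raw byte; in the multibyte cases it
is the no-break space character.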

> I could imagine that the step from the equivalence char=byte to
> char=Unicode code point (long(er) integer) is not so difficult.

The problem with this is that an encoded character could span several
bytes, and then what do you call each byte of such a multibyte
sequence?  You cannot call it a character.

> But we have in addition the UTF-8 representation.

If you mean the internal representation, then it's a superset of
UTF-8, not UTF-8.  If you mean the external encoding of text, then
UTF-8 is not the only representation, not even the only multibyte
representation.  There are others, mostly used in the Far East, but not
only there.  Even UTF-16, used natively by MS-Windows, is technically
a multibyte representation.
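
To make the external side concrete, here is a small sketch that
encodes one character in three coding systems and lists the resulting
bytes.  The helper name `demo-encoded-bytes' is invented for this
example; the character U+3042 (HIRAGANA LETTER A) and the choice of
coding systems are arbitrary picks, not anything from the thread.

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-encoded-bytes (string coding-system)
  "Return STRING encoded in CODING-SYSTEM as a list of byte values."
  (string-to-list (encode-coding-string string coding-system)))

;; One character...
(length "\u3042")                         ; => 1
;; ...but more than one byte in each of these external encodings.
(demo-encoded-bytes "\u3042" 'utf-8)      ; => (227 129 130)
(demo-encoded-bytes "\u3042" 'utf-16le)   ; => (66 48)
(demo-encoded-bytes "\u3042" 'euc-jp)     ; => (164 162)
```

None of the individual bytes in those lists is a character in its own
right, which is the naming problem mentioned above.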

> To which of the latter two--Unicode code point (integer, several
> bytes long) or its UTF-8 representation (sequence of several bytes)
> does the term "multibyte" refer?

In the context of Emacs, it refers to the internal representation of
characters, which is a superset of UTF-8.
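
A rough way to see the "superset" part from Lisp, assuming a recent
Emacs: the character space extends past the Unicode maximum of
#x10FFFF, and raw bytes get character codes of their own at the top of
that space.  This only uses standard functions; the values in the
comments are what I would expect, so treat it as a sketch rather than
a specification.

```elisp
;; The largest character code Emacs accepts is above #x10FFFF.
(max-char)                                   ; => 4194303, i.e. #x3FFFFF
;; The raw byte #xA0 maps to a character code outside Unicode.
(unibyte-char-to-multibyte #xA0)             ; => 4194208, i.e. #x3FFFA0
;; An ordinary character is stored as its UTF-8 sequence internally.
(string-bytes "\u00C4")                      ; => 2
;; A raw byte in a multibyte string also takes two bytes internally,
;; using a sequence that plain UTF-8 does not allow.
(string-bytes (string-to-multibyte "\xA0"))  ; => 2
```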



