Re: Unibyte characters, strings, and buffers

From: David Kastrup
Subject: Re: Unibyte characters, strings, and buffers
Date: Sat, 29 Mar 2014 18:00:07 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

"Stephen J. Turnbull" <address@hidden> writes:

> David Kastrup writes:


>  > And since Unicode 128..255 happens to be the latin-1 plane, this
>  > will mean that the result will behave like the latin-1 plane.
> That's not necessarily true.

Sure.  It depends on whether you value your users' sanity.

> It just requires a slightly more complex design, which would be
> appropriate for Emacsen (as compared to Python).

If the "slightly more complex" design bites in unexpected places, it is
going to end up a liability.  Working with more than one charset when
characters themselves carry no charset specification affects a load of
stuff that can then conceivably work in more than one way.

Unicode meaningfully uses values 128..255, and bytes meaningfully use
values 128..255.  If one wants to work without surprises in both cases,
converting strings to characters will yield 128..255 either way.
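
A minimal sketch of that overlap, not taken from the original message.
Python stands in for Emacs here purely as an illustration: its "latin-1"
codec maps byte n to code point n, so a raw byte in 128..255 and the
corresponding Latin-1 character end up as the same integer.

```python
# Illustrative only: Python's latin-1 codec is used as a stand-in for
# the byte-to-character conversion discussed above.
raw = bytes([0xE9])               # a raw byte with value 233
as_chars = raw.decode("latin-1")  # latin-1 maps byte n to code point n

# The resulting character has the same integer value as the byte.
same_value = ord(as_chars) == raw[0]

# It is also indistinguishable from the character U+00E9 itself.
same_char = as_chars == "\u00e9"

print(same_value, same_char)
```

Once converted, nothing in the integer value records whether it started
life as a byte or as a character.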

Differentiating is, of course, possible.  One reasonably cute choice
would be mapping bytes (as opposed to characters) 128..255 to integers
-128..-1.  But if you are talking about case-fold-search semantics,
you'll actually need to remap 0..127 as well (they are more relevant
than 128..255).  And then things get really ugly.
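
A rough sketch of the signed mapping described above, again in Python
and not from the original message.  The helper names `byte_to_int` and
`int_to_byte` are hypothetical and exist only for this illustration.

```python
def byte_to_int(b: int) -> int:
    """Hypothetical mapping: bytes 128..255 become -128..-1; 0..127 stay as-is."""
    if not 0 <= b <= 255:
        raise ValueError("not a byte value")
    return b - 256 if b >= 128 else b


def int_to_byte(n: int) -> int:
    """Inverse of byte_to_int."""
    if not -128 <= n <= 127:
        raise ValueError("not a mapped byte value")
    return n + 256 if n < 0 else n


# High bytes no longer collide with characters 128..255 ...
high_collides = byte_to_int(0xE9) == ord("\u00e9")

# ... but low bytes still share their values with ASCII characters.
low_collides = byte_to_int(0x41) == ord("A")

print(high_collides, low_collides)
```

Under this assumed mapping the byte 0x41 and the character "A" are still
the same integer, which is why case folding would force 0..127 to be
remapped as well.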

David Kastrup
