Re: Buffer-local variables affect general-purpose functions

From: Stephen J. Turnbull
Subject: Re: Buffer-local variables affect general-purpose functions
Date: Fri, 28 Mar 2014 12:38:10 +0900

Eli Zaretskii writes:

 > Paul seemed to say something more broad: that _all_ behaviors specific
 > to unibyte buffers should go away.  Do you agree?

Yes, please.  XEmacs has never had the unibyte hack with Mule, and
never has had much trouble with that.  It also has never had an
instance of the \201 bug since Mule was declared stable -- where Emacs
has had *many* regressions.  It's arguable that there are performance
implications, but simply aliasing the binary codec to latin1-unix has
*never* caused a bug in handling binary files -- all bugs are due to
autodetection errors, not the buffer representation.  I don't recall a
case where a programmer "did something stupid" with a character
function that technically is inappropriate for true binary (eg,
upcase) -- invariably they were doing something like upcasing all the
HTML tags as they came off the wire.  Ie, the stream was a binary
protocol where all of the syntax was represented with ASCII bytes, and
therefore "readable words".
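
A minimal sketch in Python (not Emacs Lisp) of the point above, under
the assumption that the "binary codec" simply maps each byte to the
Unicode character with the same number, which is what Python's
"latin-1" codec does.  The helper name upcase_tags and the sample
stream are made up for illustration:

```python
import re

# Matches "<" or "</" followed by an ASCII tag name.  Only ASCII syntax is
# touched; bytes >= 0x80 never match and pass through unchanged.
TAG_NAME = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")


def upcase_tags(raw: bytes) -> bytes:
    """Hypothetical helper: upcase HTML tag names in a raw byte stream."""
    text = raw.decode("latin-1")  # byte N becomes character U+00NN
    text = TAG_NAME.sub(lambda m: m.group(0).upper(), text)
    return text.encode("latin-1")  # character U+00NN becomes byte N again


# Every one of the 256 byte values survives the round trip.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# A binary payload wrapped in ASCII syntax, as it might come off the wire.
wire = b"<html><body>\xff\x00\x81caf\xe9</body></html>"
print(upcase_tags(wire))
```

The sketch upcases only the ASCII tag names rather than the whole
string, because a blanket upper() can leave the 0..255 range (for
example, the uppercase of U+00FF is U+0178), after which the text could
no longer be written back as bytes.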

If the performance implications bother you, then a buffer
representation like http://www.python.org/dev/peps/pep-0393/ may be
useful.  You could do that halfway, as well (ie, buffers containing
pure Latin1 text or binary text would be represented as a flat buffer
of bytes, buffers containing scalars >= 256 would be represented as
UTF-8b, or whatever the hack for representing undecodable bytes
currently is).
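
The "halfway" scheme could be sketched roughly as follows, again in
Python.  This is only an illustration under stated assumptions: the
names pack and unpack and the "flat"/"utf8" tags are hypothetical,
plain UTF-8 stands in for the wide representation, and the handling of
undecodable bytes is left out entirely:

```python
from typing import Tuple


def pack(text: str) -> Tuple[str, bytes]:
    """Choose a storage form from the widest character in the text.

    Hypothetical scheme: if every code point is below 256, store one byte
    per character; otherwise fall back to a variable-width encoding.
    """
    if all(ord(ch) < 256 for ch in text):
        return ("flat", text.encode("latin-1"))
    # Assumption: plain UTF-8 here; undecodable bytes are not modelled.
    return ("utf8", text.encode("utf-8"))


def unpack(kind: str, payload: bytes) -> str:
    """Inverse of pack for the two hypothetical storage forms."""
    if kind == "flat":
        return payload.decode("latin-1")
    return payload.decode("utf-8")


narrow = pack("caf\xe9")            # pure Latin-1 text: 4 characters, 4 bytes
wide = pack("caf\xe9 \u65e5\u672c")  # contains characters >= 256
print(narrow[0], len(narrow[1]))
print(wide[0], len(wide[1]))
```

In the flat form, indexing by character position is a plain array
access, which is where the performance argument comes from.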

 > Anyway, what should replace those hacks?  Arbitrarily interpreting raw
 > bytes as Latin characters is not TRT, IMO.

Python has a bytes/character distinction, but they have completely
separate implementations.  Emacs doesn't need that, unless you want to
compete with the P-languages as a web framework platform.  OTOH Emacs'
unibyte buffer toggle is a design bug, pure and simple, and it should
be backed up against a wall and immersed in insecticide.

If you stick to the interpretation that bytes contain non-negative
integers less than 256, you won't have a problem in practice if you
think of them as the first 256 Unicode characters, but choose not to
use functions that make sense only with characters.  Python actually
implements many polymorphic functions (ie, they can be interpreted as
bytes->bytes or characters->characters, etc) by converting bytes to
characters as Latin-1, then using the character implementation of the
