Re: Unibyte characters, strings, and buffers

From: Stephen J. Turnbull
Subject: Re: Unibyte characters, strings, and buffers
Date: Sun, 30 Mar 2014 01:28:21 +0900

David Kastrup writes:

 > That's not what unibyte buffers are for.  They are for byte
 > streams, not characters.  You would not want to edit a unibyte
 > buffer, for example, by inserting text and stuff.

I beg to differ.  I would like to edit RFC 822 headers for HTTP, SMTP,
and other such wire protocols.  This is precisely the use case that
convinced van Rossum to restore %-formatting for bytes in Python 3.5
(to be released in about 18 months).
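
To make the use case concrete, here is a minimal sketch of assembling
a header line entirely in bytes, assuming Python 3.5 or later (where
%-formatting for bytes is available, per PEP 461).  The helper name
format_header is hypothetical and only for illustration.

```python
# Sketch only: format_header is a hypothetical helper, not part of any library.
# Requires Python 3.5+, where bytes objects support %-formatting.

def format_header(name: bytes, value: bytes) -> bytes:
    """Build one wire-format header line without ever decoding to text."""
    return b"%s: %s\r\n" % (name, value)

# %d is also supported for bytes, so numeric fields need no str round trip.
header = format_header(b"Content-Length", b"%d" % 42)
```

The point is that nothing here needs to know or guess a character
encoding; the header stays a byte string from start to finish.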

 > We have that "extra metadata", it is the unibyte flag.

Yes, I know, but my point is that it should be purely for the use of
the internal implementation, and probably restricted to the C level.

 > But I consider it a mistake to use it for anything but "character
 > codes in this buffer happen to range from 0..255 rather than
 > 0..1000000 or whatever".

I sympathize, though I think it's overkill for Emacs to have separate
bytes and text types visible at the Lisp level.  FWIW, that's a big
step toward the design approach taken by Python 3, which has both
bytes and text, but you can't mix them without an explicit encoding or
decoding step, and the internal encoding of text is not exposed to
Python functions at all.
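
For readers who have not used Python 3, a small sketch of that
separation follows.  The sample header and the choice of UTF-8 are
assumptions made for the example, not anything Emacs does.

```python
# Sketch only: the sample data and the UTF-8 choice are illustrative assumptions.
raw = b"Subject: caf\xc3\xa9\r\n"   # bytes, as they would arrive off the wire

# Getting text requires an explicit decoding step.
text = raw.decode("utf-8")

# Mixing the two types without a codec is an error, not a silent coercion.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Going back to bytes is equally explicit.
round_trip = text.encode("utf-8")
```

Here text is an ordinary str whose internal representation is never
visible to the program; only the explicit encode and decode calls
cross the boundary.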

 > And since Unicode 128..255 happens to be the latin-1 plane where the
 > latin-1 plane is defined as all, this will mean that the result will
 > behave like the latin-1 plane.

That's not necessarily true.  Avoiding it just requires a slightly
more complex design, which would be appropriate for Emacsen (as
compared to Python).
