Re: Unibyte characters, strings, and buffers

From: David Kastrup
Subject: Re: Unibyte characters, strings, and buffers
Date: Fri, 28 Mar 2014 11:58:27 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)

"Stephen J. Turnbull" <address@hidden> writes:

> I agree that having a way to represent "undecodable bytes" in a string
> or buffer is extremely convenient.  XEmacs's lack of this capability
> is surely a deficiency (Hi, David K!)

Doing this in a UTF-8 based internal coding is feasible by employing
sequences that are not valid UTF-8: either code points above the Unicode
code range (above 0x10FFFF, that is, 2^20 + something, requiring 4
bytes), or non-minimal encodings (since the minimal encodings of the
values in question take two bytes, the non-minimal ones require 3
bytes).  Either way, the size increases significantly.
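To make the size argument concrete, here is a minimal sketch of both options.  The function `encode_utf8_like`, its `min_len` parameter, and the choice of 0x110000 as the base for out-of-range code points are assumptions made up for this illustration; this is not how Emacs or XEmacs actually stores raw bytes.

```python
def encode_utf8_like(cp: int, min_len: int = 1) -> bytes:
    """Encode cp using the UTF-8 bit layout, skipping UTF-8's validity rules.

    min_len forces a longer, non-minimal ("overlong") form.
    Hypothetical helper for illustration only.
    """
    # Largest value that fits in 1, 2, 3 and 4 bytes of the UTF-8 layout.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF]
    if not 0 <= cp <= limits[-1]:
        raise ValueError("value does not fit in 4 bytes")
    if not 1 <= min_len <= 4:
        raise ValueError("min_len must be between 1 and 4")
    length = next(n for n, lim in enumerate(limits, start=1) if cp <= lim)
    length = max(length, min_len)
    if length == 1:
        return bytes([cp])
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]
    out = []
    for _ in range(length - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte, low 6 bits
        cp >>= 6
    out.append(lead_marker | cp)        # lead byte carries the remaining bits
    return bytes(reversed(out))


def is_valid_utf8(data: bytes) -> bool:
    """True if a strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw_byte = 0xE9  # an "undecodable" byte we want to keep around

# Option 1: map the byte to a value above the Unicode range (assumed base).
above_range = encode_utf8_like(0x110000 + raw_byte)

# Option 2: a non-minimal encoding of the value (minimal form is 2 bytes).
overlong = encode_utf8_like(raw_byte, min_len=3)

print(len(above_range), is_valid_utf8(above_range))  # 4 False
print(len(overlong), is_valid_utf8(overlong))        # 3 False
```

Both results are rejected by a strict UTF-8 decoder, which is what lets them be told apart from real text, and one original byte now occupies 3 or 4 bytes.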

> But this is a completely different issue from unibyte buffers.  Emacs
> doesn't need unibyte buffers to perform its work, and if they are
> desirable on the grounds of space or time efficiency, they should be
> opaque to Lisp.

Well, Emacs tends to follow the non-opaque philosophy (XEmacs, in
contrast, even has an opaque character type and several other opaque ones).
That has the advantage that you can use all sorts of available tools as
long as they don't break.

It has the disadvantage that the question "what is the right behavior
for x?" needs to be answered considerably more often, since you can't
take the "x does not apply to y anyway" route out as often.

>  > We cannot [...]
> No, I still disagree.

Sure, everything is actually "We cannot efficiently" rather than "We
cannot".  But we still changed buffer positions from byte counts (as in
early Emacs 20) to character counts.  Efficiency took a dive but the
alternatives were just too horrible API-wise.
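As a rough illustration of where that cost comes from: with a variable-width internal encoding, turning a character position into a byte offset means scanning the text.  The sketch below is a naive linear scan over UTF-8; the function name `char_to_byte_pos` is made up for this example, and it is not meant to represent what Emacs actually does internally.

```python
def char_to_byte_pos(buf: bytes, char_pos: int) -> int:
    """Return the byte offset of the char_pos-th character (0-based) in UTF-8 buf.

    Naive O(n) scan, for illustration only.
    """
    seen = 0
    for byte_pos, b in enumerate(buf):
        if (b & 0xC0) != 0x80:  # not a continuation byte: a character starts here
            if seen == char_pos:
                return byte_pos
            seen += 1
    if seen == char_pos:
        return len(buf)  # position just past the last character
    raise IndexError("character position out of range")


text = "naïve café".encode("utf-8")
print(char_to_byte_pos(text, 3))   # 4, since "ï" takes two bytes
print(char_to_byte_pos(text, 10))  # 12, the end of the buffer
```

With byte positions the lookup is a plain index; with character positions it needs a scan like this one, or some bookkeeping that avoids rescanning.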

David Kastrup
