Re: Inadequate documentation of silly characters on screen.

From: Stephen J. Turnbull
Subject: Re: Inadequate documentation of silly characters on screen.
Date: Fri, 20 Nov 2009 12:37:13 +0900

Miles Bader writes:
 > Stefan Monnier <address@hidden> writes:
 > > many strings start as unibyte even though they really should start
 > > right away as multibyte.
 > That seems the fundamental problem here.
 > It seems better to make unibyte strings something that can only be
 > created with some explicit operation.

I don't see why you *need* them at all.  Both pre-Emacs-integration
Mule and XEmacs do fine with a multibyte representation for binary.
Nobody has complained about performance of stream operations since
Kyle Jones and Hrvoje Niksic bitched and we did some measurements in
1998 or so.  It turns out that (as you'd expect) multibyte stream
operations (except Boyer-Moore, which takes no performance hit :-) are
about 50% slower because the representation is about 50% bigger.  But
this is rarely noticeable to users.  The noticeable performance problems
turned out to come from Unix interfaces, not multibyte.

The real performance problem is in array operations (random access by
character index), since without caching, finding a particular character
position is O(position).
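
To make the O(position) point concrete, here is a minimal sketch in
Python.  It uses plain UTF-8 as a stand-in for a variable-width
multibyte representation; that is an assumption for illustration, not
the actual internal encoding of Emacs or XEmacs, and the function name
char_to_byte_offset is hypothetical.

```python
def char_to_byte_offset(buf: bytes, char_pos: int) -> int:
    """Return the byte offset where character number char_pos starts.

    buf is assumed to hold valid UTF-8 (a stand-in for any
    variable-width multibyte encoding).  Without a cache, the only way
    to find the answer is to walk the bytes from the start, so the cost
    grows linearly with char_pos.
    """
    if char_pos < 0:
        raise IndexError("character position must be non-negative")
    seen = 0  # number of character starts passed so far
    for offset, byte in enumerate(buf):
        # In UTF-8, continuation bytes look like 10xxxxxx; any other
        # byte begins a new character.
        if (byte & 0xC0) != 0x80:
            if seen == char_pos:
                return offset
            seen += 1
    raise IndexError("character position out of range")


# Characters of 1, 2, 3 and 1 bytes respectively.
sample = "a\u00e9\u6f22b".encode("utf-8")
first_offset = char_to_byte_offset(sample, 0)
last_offset = char_to_byte_offset(sample, 3)
```

In a unibyte string the same lookup is just buf[char_pos], which is
why array-style access is where a multibyte representation hurts and
plain stream operations mostly do not.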

If you want to turn Emacs into an engine for general network
programming and the like, yes, it would be good to have a separate
unibyte type.  This is what Python does, but Emacs would not have to
go through the agony of switching from a unibyte representation for
human-readable text to a multibyte representation the way Python does
for Python 3.  In that case, Emacs should not create them without an
explicit operation, and there should be a separate notation such as
#b"this is a unibyte string" (although #b may already be taken?) for
