[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Release plans

From: Thomas Lord
Subject: Re: Release plans
Date: Wed, 27 Aug 2008 18:10:41 -0700
User-agent: Thunderbird (X11/20060808)

Alan Mackenzie wrote:
If you were to fix a bug, or start implementing a cool new feature - you
know, make some sort of contribution, no matter how small - it would be
really appreciated.

I'm working on a new implementation of strings and buffers,
including support for undo, markers, properties, and overlays.
That's going well.

If it finishes well, then I might make a new implementation of
interact loops and redisplay primitives, add a dynamic loader,
and call it a day.   (After which I might work on a dynamically
loadable extension language interpreter.)

There is a fork in the road coming up, though.  I started on the
strings and buffers project for a different purpose, originally:  I
want them for an extended XML parser with SAX and DOM
models.   After I get strings and buffers working I'll have to consider
what to work on next out of the set: XML foo, Emacs interact loop,
or Emacs-style redisplay.   I'm not sure which I'll pick.

The new strings and buffers implementation is inspired by
several needs including but not limited to:

1) to have such as a stand-alone C library
2) to be useful for a lisp implementation without requiring all
   the baggage of a lisp implementation
3) to fix some performance problems that gap buffers have
4) to handle unicode very cleanly
5) to make string-handling C code easier to get right

The string representation is based on splay trees: a string
is a splay tree of gap buffers.   There are various invariants
(that I won't get into here) that make these trees more than
ordinary splay trees -- they are a new data structure (afaict).

The string representation is (semantically) functional: strings
are never modified.     Instead, strings share state (i.e., share
sub-trees) and linear styles of programming are rewarded (e.g.,
sometimes new strings are constructed by cheaply modifying
existing strings *if* there are no other references left to the unmodified

Because strings are functional and share state, "undo" is pretty
trivial.   To snapshot the state of a buffer at some instant,
just make a copy of the string it contains.   That copy
will share state (so it is space efficient and cheap to create). That copy won't change (since the string representation is

The splay tree of gap buffers representation means that the
amortized cost of shifting characters from one side of the
conceptual gap to another (moving the point) is O(log n) (compared
to gap buffers which are O(n)).  At the
same time, the space inefficiency of the tree and time inefficiency
of scanning it are no worse than O(n) (in the length of the string) --
the same as gap buffers.   So the new data structure is at least
asymptotically faster.

Another interesting advantage of this string/buffer representation
is that it is very good at representing *sparse* buffers -- buffers that
are mostly just a string of a single character repeated, only in a few
places interrupted with others. As a consequence, these new strings/buffers
can be used for non-traditional purposes, such as huge sparse bitsets, or
large hash tables.   (A buffer as a hash table?    Roughly speaking: compute
an N-bit hash value, then go to that line of the buffer -- that line of the buffer contains the corresponding (possibly empty) hash bucket. This splay-of-gap-buffers thing can handle that quite efficiently -- O (log N) for the "goto step" with amortized cost much lower if access patterns display locality. It's not really *quite* that
simple for a fast hash table but it's close.)

This kind of project is very much Emacs related and, also, is slower
than just looking up a minor bug with a quick fix. I like to get some orientation
to the terrain along the way, hence the bursts of verbosity (which, really,
aren't *that* much in the overall ebb and flow).

Oh, the meat of the new string data structure looks like it will weigh in
far under 5K lines of code -- the whole buffer support perhaps doubling it.
It's shaping up as pretty tight code that is pretty and simple to follow (another
of the goals).   The main logic for editing primitives and splaying look to
come in under 1K lines of code -- maybe 2K.

Thinking of the possible follow-on projects of interact loop, redisplay, and
dynamic loader:

Interact loops are easy enough and I can certainly make a "stackless"
version (so, no more "recursive minibuffers" hair).

Redisplay is the intimidating one, still, because it just looks hopelessly
tedious.   Conceptually easy, sure, but tedious and requiring too much code.
I'm still thinking about that part.

The combination of the buffers, interact loop, redisplay and loader would
be "an Emacs", stripped of lisp, but ready to dynamically load a lisp
(or any other language) interpreter.

Since you asked so nicely,

reply via email to

[Prev in Thread] Current Thread [Next in Thread]