[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qexo-general] Re: TreeList implementation

From: Per Bothner
Subject: [Qexo-general] Re: TreeList implementation
Date: Tue, 24 Feb 2004 15:18:35 -0800
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.6) Gecko/20040113

Joseph Coffland wrote:

As far as I can tell inserting into the middle of a TreeList is not yet
implemented.  Is this correct?

Mostly.  The data structures have been designed to make that easy
and efficient, however the algorithms are all there.  Generally,
you should be able to insert data at the start of the buffer gap,
and that should mostly work.  Code to move the gap isn't really

Another issue is node ids.  Currently a node id is an index into
the buffer (subtracting the gap).  If we support insertaions and
deletions, than you have to use a mechanism like StableVector,
and you have to be more careful about when node ids are released
and reused.

If so the buffer gap is kind of useless.

Hardly.  It's an enabling technlogy.

There is usually a trade off between space, time and possibly simplicity.
TreeList seems to lean towards conserving space at the cost of time and
simplicity.  What do you think?

TreeList also saves times, depending on what you're doing.  In modern
PCs saving time is all about memory locality, and TreeList does will
there.  (Anything that does not cause a cache miss is free.)
The parsimoneous use of pointers saves GC time as well as object
allocation time.

Most PCs have more space than time and
there is a lot to be said for simplicity.

True.  But we want to be able to handle lange XML datasets.
A standard DOM will cause us to run out of memory or start paging
on much smaller XML files than TreeList can handle.

I don't want to rewrite your code, but I think I could reimplement
TreeList as a linked list with insertion, deletion, Consumer and
Comsumable quicker than I could implement insertion and deletion in the
existing TreeList.

Than we might as use a standard DOM implementation.

I could also easily add type information.  It would
be interesting to compare the performance of the two.

I'd like to be able to handle large (100MB) files.

TreeList certainly will lose for certain operations.  It can be
tweaked to add extra relative "pointers" (like element nodes
point to their parent as a relative offset), using some extra
space and complexity, but speeding up some application.  I don't
know the tradeoff.

Xalan-J has something they call the Document Table Model.
See http://xml.apache.org/xalan-j/dtm.html
It uses more links (represented as ints), so traversal should
be faster and simpler, but it uses more space.  And it doesn't
really support insertions or deletions efficiently, as far as
I can tell.

I sometimes do wonder whether TreeList is overly complicated
or has made the right tradeoffs.  But I'm fairly sure a conventional
linked DOM-style implementation is not the right choice, at
least in terms of performance.  I think the basic TreeList
idea is good, but I'm less sure of the details.
        --Per Bothner
address@hidden   http://per.bothner.com/

reply via email to

[Prev in Thread] Current Thread [Next in Thread]