bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: potential rbtree optimizations


From: Bruno Haible
Subject: Re: potential rbtree optimizations
Date: Sat, 9 May 2009 01:18:18 +0200
User-agent: KMail/1.9.9

Eric Blake wrote:
> http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00419.html
> 
> with some interesting ideas for speeding up iteration of an RB tree, either 
> at 
> the expense of two additional pointers (fastest speed, good cache usage, but 
> large memory penalty), or with the addition of two bits (crammed in the 
> padding 
> adjacent to the red-black color indicator) which control whether existing 
> pointers are tree relations vs. threaded links (no change in memory, faster 
> iteration than current implementation, but slower rebalancing).

It's a nice idea, well explained in [1], which IMO applies when iterating
through the tree as a sequence is the major operation.

The data structures in gnulib are not tied to a particular application. That
is, insertion is just as much optimized as deletion and traversal. By changing
the code, you can always make one operation faster at the expense of some
other operations.

> Would it be worth adding an additional gl_list RB-tree implementation that 
> uses 
> threading, such that it would be easy to for clients to experiment between 
> implementations and determine whether they care more about memory and 
> insertion 
> speed, vs. better iteration speed?

Anyone can add additional variants to the gnulib gl_list or gl_oset types.
I have no objection, but I won't spend time on doing endless variants of the
same thing. One could also implement Treap trees or some other variants [2].

But what I think would be more effective in order to increase locality of
memory references and reduce the balancing effort would be to store several
data entries (say, up to 4 or 8) in a single node of the balanced tree.
This means, the data structure would be a balanced tree of small arrays.
Binary search would be used inside the small arrays.

A specialization of this data structure (with more data per node) would
be a string that supports insertion and removal at any point in time O(log N).
Similar to the Hans Boehm's "cords" [3].

Bruno

[1] http://www.stanford.edu/~blp/avl/libavl.html/Threads.html
[2] http://www.upgrade-cepis.org/issues/2004/5/up5-5Mosaic.pdf
[3] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc-7.1.tar.gz




reply via email to

[Prev in Thread] Current Thread [Next in Thread]