[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-users] ANN: new eggs "llrb-syntax" and "llrb-tree"

From: Jörg F. Wittenberger
Subject: [Chicken-users] ANN: new eggs "llrb-syntax" and "llrb-tree"
Date: Sat, 24 Oct 2015 20:57:38 +0200
User-agent: Mozilla/5.0 (X11; Linux armv7l; rv:31.0) Gecko/20100101 Icedove/31.7.0

Hi all,

attached the code of two new eggs.

llrb-syntax contains a syntax-rules implementation of left-leaning
red-black trees.  It expands code maintaining LLRB trees in a data
structure defined by the user.  Depending at the use case it will either
allocate temporary space (in the pure case) or allocate no temporary
space at all and use the minimum amount of mutations to maintain the
LLRB tree in place.

llrb-tree uses llrb-syntax to implement generic alternatives for
assoc-list drop-ins (having a much better lookup behavior) and srfi-69
hash table drop-ins.  Plus versions specialized for fixnums, strings and
symbols as key.

Both have a README, which should (hopefully) be ready to be pasted into
the wiki.  (Can't test the syntax, but hey, it's not that much I can
have done wrong.)

As I don't see the point in hosting my own infrastructure to publish
them (after all I assume that once I'm past the inevitable bugs an
shortcomings those will not see frequent updates by nature), I'd prefer
if those would eventually go into the CHICKEN projects repository.
(Where I only have r/o access so far.)


While preparing I learned something I want to share.  CHICKEN is always
good for a surprise.  And I'm astonished to say the least.  Something is
strange here.

1)  Intuitively I'd assume that conserving space to reduce the load on
the garbage collector should pay off.  Even for CHCIKEN, which is
supposed to be highly effective on temporary allocations.  The numbers
(included in "numbers.txt" in the tarball, though those should probably
not make it into svn) did prove me wrong.  The version specialized to
fixnum keys is roughly 4x slower when running allocation free (thus
loading the mutation stack) than the pure version.  The same I found to
be true for mutating versions having string-keys.

2) I never had the idea to actually use LLRB trees as a *replacement*
for hash tables.  After all LLRB maintains an order relation and is
according to theories and urban legends *supposed* to be slower.  See
yourself: in the fixnum case, llrb is almost twice as fast as the
corresponding hash table.  (1M inserts in 90 seconds for Hashtable vs.
46 seconds for LLRB; References are even faster 1M refs in 19sec. vs.
10sec. for LLRB.)

I'm astonished.

Even the LLRB build using generic = and < is almost a fast and much
faster than Hashtables.

3)  Things become slightly more confusing when taking symbols as key.
For those hash tables appear about 30% faster than LLRB on inserts
and about 5% faster on referencing.  (However randomization, which is
commented out in the attached source was still active during the test.
Maybe there's something hidden.)

>From memory: before I tested with smaller numbers of keys, which
resulted in the opposite finding: LLRB was found relative faster for
smaller numbers of keys.  After all it's O(log n), thus this is expected.

4) string->symbol puts #3 upside down:
string->symbol 1000000 calls in 111935.0 ms
caching the result in a LLRB-string tree naturally adds overhead on the
first call: str2sym 1000000 refs in 185748.0ms – almost twice.
Wait, what? *Almost* twice?  How comes?  Should have been around 230%
according to #3.

But on the second call string->symbol still takes those 112403.0ms,
while the LLRB-cached version needs only 16321.0ms.

I guess something is wrong in string->symbol.  So far I've been unable
to spot it.

But who would have predicted that a balanced tree could be sorta faster
than a hash table?  Me not.

Have a nice day!


PS: there's a tblbench.scm in the tarball as well, which produced those
numbers.  After all that's kind of a benchmark and if there's something
lying to me, then it's probably benchmark.

Attachment: llrb.tar.gz
Description: application/gzip

reply via email to

[Prev in Thread] Current Thread [Next in Thread]