[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [gnugo-devel] Improvement of the hash table in GNU go

From: Gunnar Farneback
Subject: Re: [gnugo-devel] Improvement of the hash table in GNU go
Date: Wed, 05 Feb 2003 19:47:49 +0100
User-agent: EMH/1.14.1 SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/20.7 (sparc-sun-solaris2.7) (with unibyte mode)

Inge wrote:
> 1. We have to split the memory available to us into an area with
>    positions and another one with read results.  One of these will run
>    out before the other one does, which will waste memory.  We can't
>    know in advance which will actually run out first.

This is probably not a very big waste. It just looks ugly.

> 2. The pointers that we use to link the read results together waste
>    memory.

We still need some solution to handle table collisions, where linked
lists is one possibility. The new scheme allows more options though.

> 4. To remedy the lack of a replacement scheme we currently have to
>    call hashtable_partially_clear() when the table gets full.  This
>    function clears the hash table of "inexpensive" entries (tactical
>    read results) and keeps the expensive ones - owl reading and (I
>    think) semeai reading.  However, in the process we throw away a lot
>    of valuable information.  There is a also a problem with "open
>    nodes", that are nodes that represent reading that is being
>    performed at the time the cleaning is run.  These open nodes
>    complicate things.

We would still need to be able to handle open nodes.

> What I propose is a very simple but powerful change: We introduce a
> new table of random number for each reading function that we want to
> cache the result of.  Then we use these tables to create a unique hash
> value for a combination of position and read function on a certain
> target.  That way we get a single type of entries and can get rid of a
> lot of the current complications.
> Take for example attack().  If we call attack(A1), and want to find
> out if the result of this reading is in the table we do the following:
> Take the current hash value for the position and xor it with
> attack_table[A1] (pardon the pseudo code here).  The value that we get
> can be immediately used to look up the entry in the hash table and see
> if it is there.  We no longer have to split things into positions and
> read results.

Immediately used? You would still need to map it to the size of the
actual table and handle collisions with some scheme.

> The good thing about this is that we eliminate all of the weaknesses
> above.  We get better memory usage, we can use some nice replacement
> scheme, and we don't have to split the table into 2 parts.

I think the main attraction of this scheme is that it can make the
code simpler, and possibly faster. Better use of memory is certainly
nice, but generally speaking the size of the cache doesn't seem to be
very critical at all currently.

Arend wrote:
> I can see advantages in your approach. One thing you should look out for
> is that hash collisions get more likely, so maybe we should then switch
> to 96 or 128 bit hashing (of course you need to check whether we still
> save memory then).

My intuition is that this scheme wouldn't significantly increase the
risk for collisions, but I wouldn't mind being convinced by some
actual calculations.

> If you want to do a score-based replacement scheme as in the persistent
> caching, I think this is far too expensive. It seems to become a
> bottleneck for the persistent reading caches, once its size gets
> increased to 1000 (which would otherwise be a big gain).

There are probably much less expensive schemes around.

> (The replacement algorithm in persistent.c is clearly O(cache size).
> Of course the same is true for the lookup.)

We should improve those algorithms some day.

> I think we should just store owl results in a separate hash table (which
> can have size of 1% of the full hash table and still have more than
> enough space for all owl results). Then if we run out of spaces in the
> hash table, we can just completely clear the bigger hash table.

Splitting them up is reasonable, but not necessarily simpler in the

> (To avoid the problem of "open nodes" we can do this _before_ we start
> a new tactical reading tree, once the cache is 95% full.)

I don't believe in this idea. For some cache size and some reading
complexity (think high levels) the cache will overflow anyway and if
we can't clear it on the fly the performance hit will be severe. The
partial clearing scheme for the current cache was a major improvement,
practically eliminating all cache related problems we were seeing at
the time.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]