guile-devel
[Top][All Lists]

## super-bit-vectors

 From: Stefan Israelsson Tampe Subject: super-bit-vectors Date: Tue, 7 Sep 2021 21:26:05 +0200

Hi

Maybe you will not know it but bit vectors are sometimes a very good match to represent sets where the world is defined from the beginning. a world of 64 objects can represent subsets of this world as a uint64_t in other words the unit our cpu's are so darn fast at managing.  So whenever you plan to spend time on having sets of a moderate set of enums an huge optimisation is to not use a hashmap, but n stead use a bitvector. And as you know, guile ave arbitrary big integers so a world of 128 elements is just a  quite small bignum. Now this does not scale to millions where hashmaps starts to excel, but many times you do not need scale.
Now what about our favorite set operations you may ask? well we have a translation here

set complement = lognot
set union            = logior
set intersection  = logand

And the processor have ops for even some other set operations like finding the first nonzero bit
in order to loop through the elements in the set that are 1 (this would be a good addition
to guile).

In guile-log I have implemented a very featureful prolog matcher on steroids where the bulk of the work is to manage the set of matching elements, where all set operations are used. And is hence super fast for typical implemented predicates as this includes typically have less than 64 cases.  So if the number of match cases is below 1000 we typically just keep a bigint to represent the set else we move over to hashmaps. Now un-fourtunately in guile, we do not have self modifying bitwise operators so in guile-log I hack guile i C to have those ops to not waste
a lot of memory and time for larger predicates with bigger bit vectors.

This is why I hope that guile will keep having big sized immediate integers as  copying is so cheap and fast (creating new numbers on the stack).

When working with supervectors I realized that a functional approach is sort of viable as we can share bits between different read only structures and even use copy on write if we would
like to mutate.
Using  a supervector approach to bitvectors make sense if we would like to move to higher sizes as well and is much preferable than using hashmaps. we would represent the sets by trees instead which spervectors essentially are. So I realized that there is a good use case for these kinds of data structures in a supervector context and hence I am working on this atm.

Kind of cool is that I keep a bit representing if the bin is negated or not, which means that we can optimize essentially all set operations of the type sparse-bitvector Op fat-bitvector her is how it goes.

lognot  -> just iterate through the bins and change the invert bit
1 or Bitvector    => 1
0 or Bitvector    => Bitvector

1 and Bitvector => Bitvector
0 and Bitvector => 0
1 xor  Bitvector => lognot(Bitvectoe)
0 xor  Bitvector => Bitvector

cool right. Now the code in stis-supervectors can handle all cases, if the bins are not aligned and try to be smart about it (can treeify a datastructure to match bins) so it is a more complex story here, but in the simplest setup we just have a vector of bins with all bins the same size and we are good to go. A hint is to use binsizes of (ash 1 x), x a number so that we enable the easiest way to match bins else it can be really fragmented.

A final note:
Why on earth does not guile's bitvectors allow more bitwise operations, looks qiuite a bit of underpowered to me.