Re: more advanced bytevector => supervectors

hi,

I think I am happy now with the API for binary operators.

To see where I am heading, consider the bitvector implementation. Here is e.g. how logxor is implemented

s1 logxor s2 -> s2

(define (supervector-logxor! s1 s2)
  (supervector-binary

   ;; The general case
   logxor

   ;; The s1 part is a constant bin
   (lambda (x)
     (if (= x 0)
         (values nothing 0)   ;; x xor 0 = x: leaves the s2 part unchanged - fast
         (values invert 0)))  ;; inverts the s2 part in case it is a full bin - fast

   ;; The s2 part is constant
   (lambda (x)
     (if (= x 0)
         (values nothing 0)   ;; replace s2 with s1 if referable (fast)
         (values invert 0)))  ;; if s1 is ro?, refer to it with negation if referable (fast)

   ;; Both s1 and s2 are constant and s2 is a complete bin
   logxor

   s1
   s2))

We have the following controls:

(values add val)     : adds the same val to all elements in the bin (the val may e.g. be stored in the bin)

(values scale val)   : scales all elements in the bin by the same val

(values nothing _)   : does nothing with the bin

(values replace val) : sets the whole bin to val

(values invert _)    : applies lognot to the whole bin

(values negate _)    : applies not to the whole bin
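
To make the controls concrete, here is a minimal sketch of what each one would mean for a single bin. It is not the supervector code: a bin is modelled as a plain list, the controls are plain symbols, and apply-control is a hypothetical helper invented for this illustration.

```scheme
;; Hypothetical sketch, not the supervector implementation.
;; Assumptions: a bin is a plain list of elements and a control is a symbol.
(define (apply-control control val bin)
  (case control
    ((add)     (map (lambda (x) (+ x val)) bin))   ; add val to every element
    ((scale)   (map (lambda (x) (* x val)) bin))   ; multiply every element by val
    ((nothing) bin)                                ; leave the bin untouched
    ((replace) (map (lambda (x) val) bin))         ; set every element to val
    ((invert)  (map lognot bin))                   ; bitwise complement
    ((negate)  (map not bin))                      ; boolean negation
    (else (error "unknown control" control))))

(apply-control 'add 3 '(1 2 3))      ; => (4 5 6)
(apply-control 'invert 0 '(0 -1))    ; => (-1 0)
```

In a real bin the add, scale and replace cases would presumably update a value stored with the bin instead of touching every element, which is where the speed comes from.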

With a good alignment of the bins, and the ability to reference bins when s2 is the constant part,
the code will produce a fast application of the binary operators. This means that clustered sparse
bitvectors are well modeled in this system. For those vectors, and especially for advanced matching
constructs with very large numbers of clauses, it is possible to make the matching fast while
keeping the amount of memory needed to store the matcher small.
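
As a rough illustration of why constant bins make sparse bitvectors cheap, the sketch below xors two bins. The representation is an assumption made for the example: a constant bin is the integer 0 (all bits clear) or -1 (all bits set), and a full bin is a list of integer words. bin-logxor is a hypothetical name, not part of the supervector API.

```scheme
;; Hypothetical sketch, not the supervector implementation.
;; Assumption: a constant bin is the integer 0 or -1, a full bin is a list of words.
(define (bin-logxor b1 b2)
  (cond
   ;; both constant: one logxor settles the whole bin
   ((and (integer? b1) (integer? b2)) (logxor b1 b2))
   ;; b1 constant: xor with 0 keeps b2 (shared, not copied), xor with -1 inverts it
   ((integer? b1) (if (= b1 0) b2 (map lognot b2)))
   ;; b2 constant: the same, with the roles swapped
   ((integer? b2) (if (= b2 0) b1 (map lognot b1)))
   ;; general case: word-by-word logxor
   (else (map logxor b1 b2))))

(bin-logxor 0 '(5 6))          ; => (5 6)
(bin-logxor -1 '(0 5))         ; => (-1 -6)
(bin-logxor '(1 2) '(3 2))     ; => (2 0)
```

Only the last branch has to touch every word with the operator; the others are either constant time or a single pass with lognot.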

We have a similar system for algebraic manipulation and for logical operations.

In the current setup, however, logxor is not applied as a fast op; it is passed as a function into
the construct in a higher-order manner. This means that busy loops for fat vectors are a bit slow.
When things are debugged I will move over to macro versions, which allow the busy loop to be
essentially as fast as a logxor in a single loop over a bytevector, which you can get at
100-200 Mops per second. This can go much faster, like 1-2 Gops per second, if we push it to C.
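
The difference between the higher-order version and a macro version can be sketched on plain bytevectors. The names bytevector-binary!, define-bytevector-binary! and bytevector-logxor! are made up for this example, and how much the macro form actually gains depends on what the compiler does with it, so treat it as a sketch of the idea only.

```scheme
(use-modules (rnrs bytevectors))

;; Higher-order version: op is a procedure value called once per element.
(define (bytevector-binary! op b1 b2)
  (let ((n (min (bytevector-length b1) (bytevector-length b2))))
    (let loop ((i 0))
      (when (< i n)
        (bytevector-u8-set! b2 i
                            (op (bytevector-u8-ref b1 i)
                                (bytevector-u8-ref b2 i)))
        (loop (+ i 1))))))

;; Macro version: op is spliced into the loop body at expansion time,
;; so each generated procedure has its own specialised busy loop.
(define-syntax-rule (define-bytevector-binary! name op)
  (define (name b1 b2)
    (let ((n (min (bytevector-length b1) (bytevector-length b2))))
      (let loop ((i 0))
        (when (< i n)
          (bytevector-u8-set! b2 i
                              (op (bytevector-u8-ref b1 i)
                                  (bytevector-u8-ref b2 i)))
          (loop (+ i 1)))))))

(define-bytevector-binary! bytevector-logxor! logxor)

(define a (u8-list->bytevector '(1 2 3)))
(define b (u8-list->bytevector '(3 2 1)))
(bytevector-logxor! a b)
(bytevector->u8-list b)   ; => (2 0 2)
```

Both versions store the result in the second argument, following the s1 logxor s2 -> s2 convention used above.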

On Thu, Sep 2, 2021 at 5:45 PM Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote:

Hi guilers!

My next project is to explore a more advanced bytevector structure than today's bytevectors.
I think it would be good to have this infrastructure in Guile and to have core code take advantage of it, for example by putting strings on top of it and allowing our string library to use those (the difficult case is probably to get regexps working properly).

Anyhow this is my initial comment in the code:

#|
The idea of this data structure is to note that when we employ full
large data structure scans we can allow for a much richer and more featureful
data structure than a simple bytevector. The reason is that we can divide
the data into byte chunks and spend most of the time scanning, copying, and mapping
those areas with the usual methods; even optimized C code taking advantage of
advanced CPU opcodes is possible here. And by dividing it into chunks we get a lot
of new features at essentially no cost other than complexity, which we manage
mostly in Scheme. We gain many things:

1. Many new features that can vary between pages

2. Fewer memory hogs, as
   a. it can use copy-on-write semantics
   b. it does not need to find 1GB contiguous blocks

3. Potentially faster operations, as we can fast-forward over the zero-on-write
   pages, compared to pure bytevectors

4. We can allow for swapping and referential data structures to speed up copying
   even further

5. We can get better fiber features for C programs that spend seconds or minutes on
   performing an operation, because they will just spend a microsecond or so
   in C-land and then swap back to Scheme. CTRL-C will also work nicely.

6. We could potentially build a string library on top of these data structures, and
   we could also add features to pages that lead to a much richer interface.

7. Resizing is much faster and more efficient

8. Reversing is super quick

9. Queues and stacks of byte data can have very efficient implementations

Drawbacks:
1. More complex, as we need to consider boundaries
2. Slower one-off operations like bytevector-u8-ref, as Guile compiles the
   core operations to quite effective JIT CPU encodings. But maybe we can
   design some caching to make those operations much faster, and even have
   supporting JIT operations.

|#
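
A toy version of the chunked layout described in the comment above might look like the following. It is only a sketch under stated assumptions: bin-size, make-sv, sv-ref and sv-set! are hypothetical names, a constant bin is stored as a single integer, and a bin is turned into a real bytevector only on its first write.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical sketch, not the supervector implementation.
(define bin-size 4)   ; tiny on purpose, for illustration only

;; A supervector is a vector of bins. A bin is either an integer in 0..255
;; (constant bin, nothing allocated) or a bytevector (full bin).
(define (make-sv n-bins)
  (make-vector n-bins 0))   ; every bin starts as the constant 0

(define (sv-ref sv i)
  (let ((bin (vector-ref sv (quotient i bin-size))))
    (if (bytevector? bin)
        (bytevector-u8-ref bin (remainder i bin-size))
        bin)))               ; a constant bin answers without a lookup

(define (sv-set! sv i val)
  (let* ((k (quotient i bin-size))
         (bin (vector-ref sv k)))
    (unless (bytevector? bin)
      ;; materialise the constant bin on first write
      (set! bin (make-bytevector bin-size bin))
      (vector-set! sv k bin))
    (bytevector-u8-set! bin (remainder i bin-size) val)))

(define sv (make-sv 3))   ; 12 logical bytes, none allocated yet
(sv-set! sv 5 7)          ; only bin 1 becomes a real bytevector
(sv-ref sv 5)             ; => 7
(sv-ref sv 0)             ; => 0
```

The extra quotient and remainder on every access is the boundary cost listed under the drawbacks, while untouched bins cost one vector slot each.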

WDYT ?

From:	Stefan Israelsson Tampe
Subject:	Re: more advanced bytevector => supervectors
Date:	Wed, 15 Sep 2021 02:12:32 +0200