Re: more advanced bytevector => supervectors

From: lloda
Subject: Re: more advanced bytevector => supervectors
Date: Sat, 11 Sep 2021 20:21:27 +0200

A problem is that Guile doesn't really provide a good set of fast rank 1 ops. None of them support strides != 1, for example (this is ok for regular vectors, but it hurts for general arrays), and some are missing start/end, or you have to write wrappers yourself, as for the typed vectors (other than u8). So in some cases you have to do the loop in Scheme. That's fine when the body of the loop is Scheme ops, but if it's something like copy or fill it really hurts compared to C.
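To make the missing piece concrete, here is a minimal C sketch of what strided rank 1 copy and fill could look like. The names `copy_strided_u8` and `fill_strided_u8` are hypothetical and are not part of Guile's API; strides are counted in elements and may be negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `value` to n elements of dst,
   stepping dst_stride elements between writes. */
void fill_strided_u8(uint8_t *dst, ptrdiff_t dst_stride,
                     size_t n, uint8_t value)
{
    for (size_t i = 0; i < n; i++)
        dst[(ptrdiff_t)i * dst_stride] = value;
}

/* Hypothetical helper: copy n elements from src to dst, each side
   with its own stride. The two ranges are assumed not to overlap. */
void copy_strided_u8(uint8_t *dst, ptrdiff_t dst_stride,
                     const uint8_t *src, ptrdiff_t src_stride,
                     size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[(ptrdiff_t)i * dst_stride] = src[(ptrdiff_t)i * src_stride];
}
```

A start/end range needs no extra parameters in this shape: the caller passes `base + start` as the pointer and the number of elements in the range as `n`.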

On 11 Sep 2021, at 19:03, Stefan Israelsson Tampe <> wrote:

I did some tests, and with wingo's superb compiler a hand-made Scheme loop is about as fast as the automatic dispatch for getter and setter. For example, it can copy from u8 to i16 at about 100 ops / second using native byte order. However, compiling it in C led to a nasty 2 Gops / second. So for these kinds of patterns it is still better to work in C, as it probably vectorises the operation quite well. Supervectors support pushing busy loops to C very well, and I will probably enable fast C code for some simple utility ops.
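For reference, the kind of busy loop being discussed might look like the sketch below in C. The function name `widen_u8_to_i16` is made up for illustration, and this is not the code that was benchmarked. Loops of this shape, with unit stride and no aliasing (`restrict`), are ones that optimizing C compilers can often auto-vectorize, which would fit the guess above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen n unsigned bytes into 16-bit signed
   integers, stored in native byte order. Every u8 value (0..255)
   fits in an int16_t, so the conversion is lossless. */
void widen_u8_to_i16(int16_t *restrict dst,
                     const uint8_t *restrict src,
                     size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)src[i];
}
```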

On Wed, Sep 8, 2021 at 9:18 AM lloda <> wrote:

On 8 Sep 2021, at 04:04, Stefan Israelsson Tampe <> wrote:


So using get-setter typically means
((get-setter #f bin1 #f 
   (lambda (set) (set v 2 val)))

   #:is-endian 'little          ;; only consider little endian setters like I know 
   #:is-unsigned  #t         ;; only use unsigned
   #:is-integer      #t         ;; only use integer representations
   #:is-fixed          #t        ;; do not use the scm value vector versions
So this is a version where we only consider handling nonnegative integers of up to 64 bits. The gain is faster compilation, as this idiom will dispatch between 4 different versions of the loop lambda, and the compiler could inline all of them, or be able to detect the ones that are used and hot-compile that version (a feature we do not have yet in Guile). Now when you select between a ref and a set you will similarly end up with 4*4 = 16 different loops. The full version is very large: a double loop with all features consists of (2*2 + 3*2*2*2 + 4 + 1)**2 = 33*33 ~ 1000 versions of the loop, which is crazy if we should expand the loop for all cases in the compilation. Now Guile would just use a functional approach and not expand the loop everywhere. We will have parameterised versions of libraries so that one can select which versions to compile for. For example, the general functions that perform a transform from one supervector to another are a general idiom that would use the full dispatch, which is not practical,
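The growth in loop versions can be illustrated with a small C sketch. Everything here (the `kind` enum, `DEFINE_COPY`, `copy_dispatch`) is hypothetical and is not the supervector implementation. With 2 source kinds and 2 destination kinds, 2*2 = 4 specialized loops are already generated up front; with 33 accessor variants on each side, the same table would need 33*33 = 1089 entries.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { K_U8, K_U16 } kind;   /* hypothetical element kinds */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* One fully specialized loop per (source type, destination type) pair. */
#define DEFINE_COPY(name, S, D)                               \
    static void name(void *dst, const void *src, size_t n)    \
    {                                                         \
        const S *s = src;                                     \
        D *d = dst;                                           \
        for (size_t i = 0; i < n; i++)                        \
            d[i] = (D)s[i];                                   \
    }

DEFINE_COPY(copy_u8_u8,   uint8_t,  uint8_t)
DEFINE_COPY(copy_u8_u16,  uint8_t,  uint16_t)
DEFINE_COPY(copy_u16_u8,  uint16_t, uint8_t)
DEFINE_COPY(copy_u16_u16, uint16_t, uint16_t)

/* The table size is the product of the kinds on each side. */
static const copy_fn copy_table[2][2] = {
    { copy_u8_u8,  copy_u8_u16  },
    { copy_u16_u8, copy_u16_u16 },
};

/* Dispatch happens once, outside the loop; the loop body itself
   contains no per-element type checks. */
void copy_dispatch(kind src_kind, kind dst_kind,
                   void *dst, const void *src, size_t n)
{
    copy_table[src_kind][dst_kind](dst, src, n);
}
```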

I'm curious where you're going with this.

I implemented something similar (iiuc) in, specifically , where the lookup/set methods are inlined in the loop. The compilation times indeed grow exponentially so I'm forced to have a default 'generic' case. 

The idea for fixing this was to have some kind of run time compilation cache so only a fixed number of type combinations that actually get used would be compiled, instead of the tensor product of all types. But I haven't figured out, or actually tried to do that yet.
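A rough C sketch of the caching idea, to show its shape only. All names are hypothetical, and `build_specialization` is a placeholder that picks a pre-written loop where a real system would compile one at run time. The point is that the expensive step runs only for the type pairs that are actually requested, and only once for each.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { EK_U8, EK_U16, EK_COUNT } elem_kind;  /* hypothetical */
typedef void (*loop_fn)(void *dst, const void *src, size_t n);

/* Stand-ins for loops that would be generated on demand. */
#define DEFINE_LOOP(name, S, D)                               \
    static void name(void *dst, const void *src, size_t n)    \
    {                                                         \
        const S *s = src;                                     \
        D *d = dst;                                           \
        for (size_t i = 0; i < n; i++)                        \
            d[i] = (D)s[i];                                   \
    }

DEFINE_LOOP(u8_to_u8,   uint8_t,  uint8_t)
DEFINE_LOOP(u8_to_u16,  uint8_t,  uint16_t)
DEFINE_LOOP(u16_to_u8,  uint16_t, uint8_t)
DEFINE_LOOP(u16_to_u16, uint16_t, uint16_t)

static loop_fn cache[EK_COUNT][EK_COUNT];  /* NULL means "not built yet" */
static int specializations_built;          /* pairs built so far */

/* Placeholder for the expensive step (compiling one specialized loop). */
static loop_fn build_specialization(elem_kind src, elem_kind dst)
{
    static const loop_fn prebuilt[EK_COUNT][EK_COUNT] = {
        { u8_to_u8,  u8_to_u16  },
        { u16_to_u8, u16_to_u16 },
    };
    specializations_built++;
    return prebuilt[src][dst];
}

/* Return the loop for this type pair, building it on first use. */
loop_fn get_loop(elem_kind src, elem_kind dst)
{
    if (!cache[src][dst])
        cache[src][dst] = build_specialization(src, dst);
    return cache[src][dst];
}

int built_count(void)
{
    return specializations_built;
}
```

A program that only ever copies u8 to u16 would pay for one specialization here, however many kinds the table has room for.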

