axiom-developer

[Axiom-developer] Re: lisp speedups


From: Waldek Hebisch
Subject: [Axiom-developer] Re: lisp speedups
Date: Tue, 20 Feb 2007 04:51:21 +0100 (CET)

> Another technique that significantly improves the speed of lisp
> code is the use of declarations. In general, a function call in 
> lisp has to have a case statement of the possible types of each
> argument and needs to handle the possible types. However, if you
> tell the compiler what the argument types are and the return types
> are you can get significantly faster code. The best way to illustrate
> this is to use the DISASSEMBLE function which will show you the code
> that gets laid down by the lisp compiler. CMUCL and SBCL are very
> good at optimizing code. GCL also does an excellent job. That's what
> the .fn files are for in Axiom.
>
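
For readers who have not used declarations, here is a minimal sketch
of what the quoted paragraph describes.  The function count-newlines
is a hypothetical example, not taken from Axiom or from the program
discussed in this thread, and the optimization settings are only an
assumption about what one might try first.

```lisp
;; Hypothetical example: count bytes equal to 10 (newline) in a buffer.
;; The declarations tell the compiler the exact array type and that the
;; counter is a fixnum, so it need not dispatch on argument types.
(defun count-newlines (buff)
  (declare (type (simple-array (unsigned-byte 8) (*)) buff)
           (optimize (speed 3)))
  (let ((count 0))
    (declare (type fixnum count))
    (dotimes (i (length buff) count)
      (when (= (aref buff i) 10)
        (incf count)))))

;; Example call on the three bytes "a", newline, newline:
(count-newlines (make-array 3 :element-type '(unsigned-byte 8)
                              :initial-contents '(97 10 10)))   ; => 2

;; To inspect the generated code at the REPL (the listing is
;; implementation specific):
;;   (disassemble 'count-newlines)
```

Removing the declarations and disassembling again is a simple way to
see how much generic dispatch code they eliminate on a given
implementation.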

The program I posted uses declarations.  AFAICS sbcl has enough
information about types: from the declarations I gave, it can infer
the rest (in my experience gcl needs many more declarations).
When I wrote that the code could be faster, I had also examined the
output of disassemble.  The following snippet:

       normal-start
         (incf pos)
         (incf line-number)
         (if (>= pos end-buff)
             (return-from scan-for-chunks))
         (setf code (aref buff pos))
         (if (eql code start-tag-code-1)
             (go chunk-start-tag-1))
         (if (eql code newline-code)
             (go normal-start))
         (go normal)

corresponds to the following assembly code:

;     2AF4: L1:   488B4DD0         MOV RCX, [RBP-48]
;     2AF8:       4883C108         ADD RCX, 8
;     2AFC:       48894DD0         MOV [RBP-48], RCX
;     2B00:       488B55D0         MOV RDX, [RBP-48]
;     2B04:       488B45C8         MOV RAX, [RBP-56]
;     2B08:       488B48F9         MOV RCX, [RAX-7]
;     2B0C:       488B45C8         MOV RAX, [RBP-56]
;     2B10:       4839D1           CMP RCX, RDX
;     2B13:       0F86ED0B0000     JBE L48
;     2B19:       48C1FA03         SAR RDX, 3
;     2B1D:       488B45C8         MOV RAX, [RBP-56]
;     2B21:       480FB64C1001     MOVZX RCX, BYTE PTR [RAX+RDX+1]
;     2B27:       48C1E103         SHL RCX, 3
;     2B2B:       4883F950         CMP RCX, 80
;     2B2F:       0F840F090000     JEQ L34
;     2B35:       EBBD             JMP L1

Why is this code far from optimal?  First, sbcl failed to allocate
variables to registers.  FYI I use an AMD64 machine, and sbcl can in
principle use 14 general purpose registers (there are 16 registers,
but two, the stack pointer RSP and the base pointer RBP, are used by
sbcl).  The 'scan-for-chunks' function has 11 variables, so it does
not look very hard to allocate _all_ variables into registers.  But
apparently sbcl not only failed to keep variables in registers, it
also reloads a value which is still available in a register (in the
listing above [RBP-56] is loaded into RAX three times, and [RBP-48]
is read back right after being stored).  Second, there are useless
shifts: the variable code is only ever compared with (constant)
integers, and pos is mostly used in integer operations (it is also
stored in lists).
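
To make the point about shifts concrete: judging from the listing,
this sbcl build appears to keep small integers (fixnums) shifted left
by 3 bits.  Under that assumption ADD RCX, 8 adds 1 to a tagged value
and the constant 80 in CMP RCX, 80 stands for 10.  Below is a minimal
sketch of the arithmetic in plain Lisp.  The names *tag-shift*,
tag-fixnum and untag-fixnum are hypothetical helpers for illustration,
not sbcl internals, and the shift of 3 is read off the listing rather
than taken from sbcl documentation.

```lisp
;; Assumed tag width, inferred from the SAR/SHL by 3 in the listing.
(defparameter *tag-shift* 3)

(defun tag-fixnum (n)
  "Return the machine word assumed to represent the integer N."
  (ash n *tag-shift*))

(defun untag-fixnum (word)
  "Recover the integer from its assumed tagged representation WORD."
  (ash word (- *tag-shift*)))

(tag-fixnum 1)     ; => 8
(tag-fixnum 10)    ; => 80
(untag-fixnum 80)  ; => 10

;; For every possible byte value, comparing the raw byte with 10 gives
;; the same answer as tagging it first and comparing with 80, so the
;; tagging shift before the compare does no useful work.
(loop for code from 0 below 256
      always (if (= code 10)
                 (= (tag-fixnum code) 80)
                 (/= (tag-fixnum code) 80)))   ; => T
```

In other words, a byte loaded by MOVZX could be compared with 10
directly, without the SHL that precedes the CMP.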

As a compiler writer I can understand why sbcl is unable to produce
better code.  But I would not call it excellent.
 
-- 
                              Waldek Hebisch
address@hidden 



