
Re: [Chicken-hackers] "argvector" chicken (was: ABI woes)


From: Ivan Raikov
Subject: Re: [Chicken-hackers] "argvector" chicken (was: ABI woes)
Date: Tue, 21 Jul 2015 09:52:00 -0700

Hi Felix,

   If you are interested in further testing for potential performance
impact, the rb-tree and kd-tree libraries rely heavily on CPS calls
for tree traversal, and the test data size can be easily increased to
millions of elements. Unfortunately I won't have time to test this
until mid-August at the earliest.

  -Ivan


On Tue, Jul 21, 2015 at 9:28 AM,  <address@hidden> wrote:
> Hello!
>
> I have implemented an alternative approach for compiling CPS calls in CHICKEN
> to avoid the problems with our current way of doing CPS calls.
>
> To recapitulate the situation: Apple uses a modification of the ARM64 ABI that
> ruthlessly punishes C code that assumes that non-vararg function calls are
> compatible with functions declared as varargs (and vice versa, actually even
> depending on the exact numbers of vararg/non-vararg arguments.)
>
> Previously, we passed fixed arguments via normal C function parameters, in the
> hope of generating faster code, as more arguments can be passed in registers
> (on machines that have sufficient registers, that is.) This worked for several
> years but C compilers have recently been starting to cut corners by exploiting
> anything not explicitly defined in the C standard. As we need generic function
> pointers (where the call site may not know the exact type of function being
> called), something else needs to be done.
>
> The new approach passes all arguments in a stack-allocated C_word array. Since
> CPS calls never return, the array just gets popped after the next minor
> garbage collection. The advantage is that CPS calls become much simpler
> (including the code that compiles them) as every CPS function is of type
>
>   void (func)(C_word c, C_word *av) [noreturn]
>
> The disadvantage is more allocation in the nursery. This doesn't increase GC
> time as such, because only live data is traced during a reclamation, but
> may increase the number of minor collections (the nursery fills up faster.)
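
A minimal C sketch of the calling convention described above. It assumes C_word is a pointer-sized integer and models a closure as a struct holding only a code pointer; the names closure, cps_call, add2, store_result and run_add are hypothetical, as is the layout where av[0] is the callee, av[1] its continuation and c counts every slot of av. Unlike real CHICKEN code these functions return, so the C stack simply unwinds instead of being reclaimed by a minor GC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: C_word is a pointer-sized integer, as in CHICKEN. */
typedef intptr_t C_word;

/* One uniform C type for every CPS procedure, whatever its arity. */
typedef void (*cps_proc)(C_word c, C_word *av);

/* Hypothetical closure layout: only the code pointer. */
typedef struct closure { cps_proc code; } closure;

/* Generic call site: it needs neither the callee's arity nor its C type. */
static void cps_call(C_word c, C_word *av) {
    closure *f = (closure *)(void *)av[0];
    f->code(c, av);
}

static C_word result;

/* Final continuation: av[0] = itself, av[1] = the value it receives. */
static void store_result(C_word c, C_word *av) {
    (void)c;
    result = av[1];
}

/* (add2 k a b): av[0] = self, av[1] = k, av[2] = a, av[3] = b. */
static void add2(C_word c, C_word *av) {
    C_word av2[2];              /* argument vector for the next call, on the C stack */
    (void)c;
    av2[0] = av[1];             /* the callee is the continuation k */
    av2[1] = av[2] + av[3];     /* its single argument is the sum */
    cps_call(2, av2);
}

static closure add2_closure = { add2 };
static closure store_result_closure = { store_result };

/* Builds the initial argument vector and starts the CPS chain. */
static C_word run_add(C_word a, C_word b) {
    C_word av[4];
    av[0] = (C_word)(void *)&add2_closure;
    av[1] = (C_word)(void *)&store_result_closure;
    av[2] = a;
    av[3] = b;
    cps_call(4, av);
    return result;
}
```

For example, run_add(20, 22) yields 42. The point of the single function type is visible in cps_call: the call site only looks at av[0] to find the code, so no varargs/non-varargs mismatch can arise.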
>
> The system seems to work: I was able to run the complete test suite. I have
> not tested any other code so far. The performance is, surprisingly (and
> according to my experiments, which may be flawed), quite good. Actually not
> significantly slower and in some cases even faster. This is strange, and more
> real-world testing with long-running, heavily-allocating code may have
> different results. On the other hand, CPS calls are much simpler: there is no
> need to use varargs (with a few small exceptions), the implementation of
> multiple values and argument-save/-restore is vastly simpler and the code is
> smaller, as "trampolines" (C functions that take arguments saved in a
> previously triggered GC, unpack them and call the original function again)
> can be completely dropped.
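
One way to see why multiple values get simpler with argument vectors, sketched in C under the same hypothetical closure model (values, receive_values and run_values are illustrative names, not CHICKEN's actual implementation): if av[0] is the callee and av[1] its continuation, a values procedure can forward all of its arguments to the continuation without copying, just by passing av + 1.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t C_word;                 /* assumption: pointer-sized word */
typedef void (*cps_proc)(C_word c, C_word *av);
typedef struct closure { cps_proc code; } closure;   /* hypothetical layout */

/* (values k v1 ... vn): av = {self, k, v1, ..., vn}.
   The continuation expects {k, v1, ..., vn}, which is exactly av + 1. */
static void values(C_word c, C_word *av) {
    closure *k = (closure *)(void *)av[1];
    k->code(c - 1, av + 1);
}

static C_word got_count, got_sum;

/* Continuation accepting any number of values: av = {self, v1, ..., vn}. */
static void receive_values(C_word c, C_word *av) {
    got_count = c - 1;
    got_sum = 0;
    for (C_word i = 1; i < c; i++)
        got_sum += av[i];
}

static closure values_closure = { values };
static closure receive_closure = { receive_values };

/* Calls (values k 3 4 5) and records what the continuation saw. */
static void run_values(void) {
    C_word av[5] = {
        (C_word)(void *)&values_closure,
        (C_word)(void *)&receive_closure,
        3, 4, 5
    };
    values(5, av);
}
```

After run_values(), the continuation has received three values whose sum is 12; no intermediate list or restart via a trampoline is needed.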
>
> There is even some room for more optimization: "av"s (argument vectors) may be
> reused from call to call (if the following call doesn't use more arguments than
> the current one), or we could even use the same av for all calls (effectively
> using global variables for call arguments). Also, multiple value handling could in
> some cases be inlined, I think, reducing the overhead of multiple value forms
> quite a lot (and they were quite slow with the old way of compiling stuff.)
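
A sketch of the first optimization mentioned above, reusing an argument vector when the next call needs no more slots than the current one. It uses the same hypothetical closure model; sum_loop and sum_to are illustrative names. The procedure overwrites its own incoming av both for the self-call (same size) and for the call to its continuation (smaller), reading each slot before overwriting it. In CHICKEN the growing C stack is periodically cut back by a minor GC; this sketch has no GC and relies on plain C recursion, so it only suits small n.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t C_word;                 /* assumption: pointer-sized word */
typedef void (*cps_proc)(C_word c, C_word *av);
typedef struct closure { cps_proc code; } closure;   /* hypothetical layout */

static C_word result;

/* Final continuation: av = {self, value}. */
static void store_result(C_word c, C_word *av) {
    (void)c;
    result = av[1];
}

/* (sum-loop k n acc): av = {self, k, n, acc}. */
static void sum_loop(C_word c, C_word *av) {
    if (av[2] == 0) {
        /* The continuation needs only 2 slots, fewer than the 4 we got,
           so the incoming vector is reused instead of allocating a new one. */
        closure *k = (closure *)(void *)av[1];
        C_word acc = av[3];
        av[0] = av[1];
        av[1] = acc;
        k->code(2, av);
        return;
    }
    /* The self-call has the same arity: update the slots in place. */
    av[3] += av[2];
    av[2] -= 1;
    sum_loop(c, av);
}

static closure sum_loop_closure = { sum_loop };
static closure store_result_closure = { store_result };

/* Computes 1 + 2 + ... + n through the CPS loop above. */
static C_word sum_to(C_word n) {
    C_word av[4] = {
        (C_word)(void *)&sum_loop_closure,
        (C_word)(void *)&store_result_closure,
        n, 0
    };
    sum_loop(4, av);
    return result;
}
```

For instance, sum_to(10) returns 55 while only ever touching the one four-slot vector allocated in sum_to.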
>
> Some notable changes in the source code:
>
> - The "apply hack" is gone, completely.
>
> - The hackery for AMD64 is gone, as is the evil way we generate C_procXXX
>   types and the generic apply code in chicken.h/runtime.c.
>
> - The maximum number of arguments is limited by the "temporary stack". Note
>   that this limit is not fixed (it depends on temp-stack usage), and I had to
>   remove some code in "apply-test.scm", as it assumed a fixed limit. The "official"
>   arg-limit is 2000 now.
>
> - I have pushed two branches: "argvector" and "argvector-bootstrap" (containing
>   only the changes in the C compiler backend.)
>
> - To compile it, you need a modified bootstrapping compiler. The simplest way
>   is to check out "argvector-bootstrap", make a static "boot-chicken", check out
>   "argvector", touch all *.scm files and recompile with the static bootstrapping
>   compiler.
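
A sketch of why the argument limit depends on temp-stack usage: if apply lays out its arguments in a vector carved from a shared temporary stack, the number it can pass is whatever space is left at that moment. TEMP_STACK_WORDS, temp_stack, apply_words, sum_all and sum_via_apply are hypothetical names, the capacity is arbitrary, a plain C array stands in for a Scheme list, and this is not how runtime.c actually implements apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t C_word;                 /* assumption: pointer-sized word */
typedef void (*cps_proc)(C_word c, C_word *av);
typedef struct closure { cps_proc code; } closure;   /* hypothetical layout */

#define TEMP_STACK_WORDS 4096            /* hypothetical capacity */
static C_word temp_stack[TEMP_STACK_WORDS];
static size_t temp_top = 0;              /* words already in use */

/* Calls f with continuation k and the n values in args.
   Returns 0 if the vector does not fit in the remaining temp-stack space. */
static int apply_words(closure *f, closure *k, const C_word *args, size_t n) {
    size_t needed = n + 2;               /* self + continuation + arguments */
    if (needed > TEMP_STACK_WORDS - temp_top)
        return 0;
    C_word *av = &temp_stack[temp_top];
    temp_top += needed;
    av[0] = (C_word)(void *)f;
    av[1] = (C_word)(void *)k;
    for (size_t i = 0; i < n; i++)
        av[2 + i] = args[i];
    f->code((C_word)needed, av);
    temp_top -= needed;                  /* sketch only: here the callee returns */
    return 1;
}

static C_word result;

static void store_result(C_word c, C_word *av) {
    (void)c;
    result = av[1];
}

/* Variadic sum: av = {self, k, x1, ..., xn}. */
static void sum_all(C_word c, C_word *av) {
    C_word total = 0;
    for (C_word i = 2; i < c; i++)
        total += av[i];
    closure *k = (closure *)(void *)av[1];
    C_word av2[2] = { av[1], total };
    k->code(2, av2);
}

static closure sum_closure = { sum_all };
static closure store_closure = { store_result };

/* Sums 1..n via apply_words; returns -1 if the arguments do not fit. */
static C_word sum_via_apply(size_t n) {
    C_word *args = malloc((n ? n : 1) * sizeof *args);
    if (args == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        args[i] = (C_word)(i + 1);
    int ok = apply_words(&sum_closure, &store_closure, args, n);
    free(args);
    return ok ? result : -1;
}
```

With this capacity, 2000 arguments fit but 5000 do not; if something else were already occupying part of temp_stack, the effective limit would be lower, which is the sense in which the limit is "not fixed".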
>
> Feedback is welcome. As this seems to run well, is not significantly slower,
> solves current and future ABI problems, and simplifies the runtime system quite
> a lot, I strongly recommend considering changing CHICKEN's code generation
> generally to use this approach. Porting this to CHICKEN 5 will take some work,
> but is doable. The changes for hand-coded CPS functions (in runtime.c, which grew
> considerably in CHICKEN 5) are straightforward, but still need manual
> adaptation. I can help here, but would like to hear Peter's opinion about this,
> since he wrote the bignum code (the largest part of the changes in runtime.c.)
>
>
> felix


