thoughts on native code

From: Stefan Israelsson Tampe
Subject: thoughts on native code
Date: Sat, 10 Nov 2012 15:41:52 +0100

Hi all,

After talking with Mark Weaver about his view on native code, I have been pondering how to best model our needs.

I now have a framework that translates almost all of the RTL VM directly to native code, and it shows a speed increase of roughly 4x compared to running the RTL VM. I can also generate RTL code all the way from Guile Scheme right now, so it's pretty easy to generate test cases. The problem Mark points out is that we need to take care not to blow the instruction cache. This does not show up in these simple examples; we need larger code bases to test what is actually true. What we can note, though, is that I expect the native code to be around 10 times larger than the RTL instruction stream it is generated from.
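To make the cache concern concrete, here is a back-of-the-envelope sketch of the size estimate. Only the factor of 10 comes from the text above; the 32 KiB instruction cache and the 8 KiB of hot RTL code are assumptions picked for illustration, not measurements.

```python
def native_size_estimate(rtl_bytes, blowup=10):
    """Estimated native code size for a given amount of RTL code."""
    return rtl_bytes * blowup

def fits_in_icache(code_bytes, icache_bytes=32 * 1024):
    """True if the code fits in an assumed 32 KiB instruction cache."""
    return code_bytes <= icache_bytes

rtl_hot_code = 8 * 1024                      # hypothetical: 8 KiB of hot RTL code
native_hot_code = native_size_estimate(rtl_hot_code)

print(native_hot_code)                       # estimated native size in bytes
print(fits_in_icache(native_hot_code))       # does the native version still fit?
```

With these assumed numbers the native version of the hot code no longer fits, which is the situation the small test cases cannot reveal.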

One interesting fact is that SBCL does fairly well by basically using the native instruction stream as the instruction flow of its VM. For example, if it cannot deduce that a + operation works on fixnums, it simply compiles it as a function call to a general + routine, i.e. it does a long jump to the + routine, does the addition, and long-jumps back, essentially dispatching general instructions like +, *, / etc. directly. In other words, SBCL does have a virtual machine; it just doesn't do a table lookup for the dispatch, but function calls instead. If you count long jumps, the number of jumps for these instructions is double that of the original table-lookup method. But for calling and returning from functions the number of long jumps is the same, and moving local variables into place and jumping is really fast.
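The difference between the two dispatch styles can be sketched in a few lines. This is only a model: Python closures stand in for emitted native call instructions, and the instruction format (op, dst, a, b) and all names are made up for the illustration, not taken from SBCL or the RTL VM.

```python
# Shared general routines, stand-ins for the VM's generic + and * code.
def generic_add(a, b):
    return a + b

def generic_mul(a, b):
    return a * b

OPS = {"add": generic_add, "mul": generic_mul}

def run_table(program, regs):
    """Table-lookup dispatch: decode each opcode and look up its handler at run time."""
    for op, dst, a, b in program:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs

def compile_calls(program):
    """Call-style dispatch: resolve each handler once, ahead of time.

    The result is a sequence of direct calls to the general routines,
    with no opcode decoding or table lookup left in the run loop.
    """
    steps = [(OPS[op], dst, a, b) for op, dst, a, b in program]

    def run(regs):
        for fn, dst, a, b in steps:
            regs[dst] = fn(regs[a], regs[b])
        return regs

    return run

# r2 = r0 + r1; r2 = r2 * r2
program = [("add", 2, 0, 1), ("mul", 2, 2, 2)]
print(run_table(program, [3, 4, 0]))
print(compile_calls(program)([3, 4, 0]))
```

Both versions share the same general routines, so the per-call-site cost in the second style is just one call per instruction rather than a full inlined copy of the operation.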

Anyway, this method of dispatching would mean a fairly small footprint compared to emitting direct assembler. Another big chunk of code that we can speed up without too much bloat in the instruction cache is the lookup of pairs, structs and arrays. The reason is that in many cases we can deduce so much at compile time that we do not need to check the type every time, but can safely look up the needed information.
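As a rough illustration of dropping type checks, the sketch below contrasts a checked car with an unchecked one. The Pair class and the function names are hypothetical, and the "proof" that every element is a pair is simply assumed to have been done by the compiler beforehand.

```python
class Pair:
    """Minimal stand-in for a Scheme pair."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

def car_checked(x):
    """General accessor: verifies the type on every call."""
    if not isinstance(x, Pair):
        raise TypeError("car: not a pair")
    return x.car

def car_unchecked(x):
    """Accessor for call sites where the compiler already knows x is a pair."""
    return x.car

def sum_cars_generic(items):
    """Nothing is known about items, so each access pays for a type check."""
    total = 0
    for x in items:
        total += car_checked(x)
    return total

def sum_cars_proven(pairs):
    """Assumes the compiler has deduced that every element is a pair."""
    total = 0
    for p in pairs:
        total += car_unchecked(p)
    return total

pairs = [Pair(1, None), Pair(2, None), Pair(3, None)]
print(sum_cars_generic(pairs), sum_cars_proven(pairs))
```

The unchecked accessor is a single field load, so emitting it inline costs very little code, which is why this kind of lookup is a good candidate for direct native code.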

Now, is this method fast? Well, looking at the SBCL code for calculating 1 + 2 + 3 + 4 (disassembling it), I see that it does use the mechanism above. It manages to sum 150M terms in one second, which is quite a feat for a VM with no JIT. The RTL VM manages 65M terms per second on the same task.

Now, SBCL's compiler is quite mature and uses native registers well, which is one of the reasons for the speed. My point, though, is that we can efficiently model a VM with calls, using the native instructions as the instruction flow.

Regards Stefan
