Re: wip-rtl return location
Stefan Israelsson Tampe
Fri, 3 Aug 2012 13:54:02 +0200
Hi Mark, interesting thoughts!
The below refers to the (very) simplistic VM compiler I'm working on right now.
The current overhead found in function calls, function setup, and
function return means that it's hard to factor in the lowest level
of optimizations. I view the call overhead as so large that function calls are
dispatched to gosub routines and not inlined; it is very close in spirit
to a VM call. On the other hand, the compiler can be improved, and at some
point in the future function calls might become so fast (especially for functions
whose properties we can prove) that your ideas would be a real boon. I will do my
best to implement some of your ideas in a second rewrite of the compiler.
But as a first step I want to make sure to inline, for example, +, -, ash, etc. so that
they are fast on fixnums, that branching is done natively, that list processing is
fast, and that a large enough set of translations of VM operations is ready so that we
can translate most of the code in Guile. This first step can increase the speed
by a factor of maybe 3-5. We can then continue to work from there.
I would also like to say that the current RTL call overhead is less than in the old
stable-2.0 version, so the plain RTL VM will be faster in this respect.
Also note that, by the nature of the new VM, a simple compilation might yield
less of an advantage than with the stable-2.0 VM. The reason is that many
operations in the RTL VM appear to do more things per operation - a boon for its speed,
but it also means that we don't gain as much from a native compilation
of the RTL VM as we would from the stable-2.0 VM.
Could we not implement this logic in the call instructions?
On Fri, Aug 3, 2012 at 4:29 AM, Mark H Weaver <address@hidden> wrote:
Hi Andy, thanks for the update! Exciting times for Guile :)
I wonder if it might be better to avoid this branch misprediction by always returning to the same address. Upon return, a special register would contain N-1, where N is the number of return values. The first few return values would also be stored in registers (hopefully at least two), and if necessary the remaining values would be stored elsewhere, perhaps on the stack or in a list or vector pointed to by another register.
On 08/02/2012 10:29 AM, Andy Wingo wrote:
Instead I'd rather just use Dybvig's suggestion: every call instruction
is preceded by an MV return address. For e.g. (values (f)), calling `f'
So the overhead of multiple values in the normal single-value case is
one jump per call. When we do native compilation, this cost will be
negligible. OTOH for MV returns, we return to a different address than
the one on the stack, which will cause a branch misprediction (google
"return stack buffers" for more info).
In the common case where a given call site expects a small constant number of return values, the compiler could emit a statically-predicted conditional branch to verify that N-1 is the expected value (usually zero), and then generate code that expects to find the return values in the appropriate registers.
On some architectures, it might also make sense for the callee to set the processor's "zero?" condition code as if N-1 had been tested, to allow for a shorter check in the common single-value case.
Of course, the calling convention can be chosen independently for each instruction set architecture / ABI.
What do you think?