Re: Register VM WIP

From: Andy Wingo
Subject: Re: Register VM WIP
Date: Wed, 16 May 2012 16:58:24 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)


On Wed 16 May 2012 15:44, Mark H Weaver <address@hidden> writes:

>> The design of the wip-rtl VM is to allow 16M registers (24-bit
>> addressing).  However many instructions can just address 2**8 registers
>> (8-bit addressing) or 2**12 registers (12-bit addressing).  We will
>> reserve registers 253 to 255 as temporaries.  If you have so many
>> registers as to need more than that, then you have to shuffle operands
>> down into the temporaries.  That's the plan, anyway.
> I'm very concerned about this design, for the same reason that I was
> concerned about NaN-boxing on 32-bit platforms.  Efficient use of memory
> is extremely important on modern architectures, because of the vast (and
> increasing) disparity between cache speed and RAM speed.  If you can fit
> the active set into the cache, that often makes a profound difference in
> the speed of a program.
> I agree that with VMs, minimizing the number of dispatches is crucial,
> but beyond a certain point, having more registers is not going to save
> you any dispatches, because they will almost never be used anyway.
> 2^12 registers is _far_ beyond that point.

I'm probably not explaining myself clearly.  Here goes.

I willingly grant that 256 registers is usually enough.  But there are
valid reasons to use 2**12 registers: for example in the mov
instruction, if you have an 8-bit opcode, you have 24 bits left.  Using
12 for each operand makes sense.  There are other cases in which you
want to reference 24-bit values, for relative jumps; and even 32-bit
values, to reference constants using relative addressing.  (64 MB, the
reach of a 24-bit offset, is too small a limit for one compilation unit.
16 GB, the reach of a 32-bit offset, is fine.)
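
To make the bit budget concrete, here is a rough sketch in C.  The field
order, the opcode number and all of the function names are assumptions
made up for illustration, not the actual wip-rtl encoding.  The reach
figures assume offsets are counted in 32-bit units, which is what the
64 MB and 16 GB numbers above imply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a one-word mov instruction:
   bits 0-7 opcode, bits 8-19 destination, bits 20-31 source. */
enum { OP_MOV = 1 };  /* made-up opcode number */

static uint32_t pack_mov(uint32_t dst, uint32_t src)
{
    /* Each operand has to fit in 12 bits. */
    assert(dst < (1u << 12) && src < (1u << 12));
    return (uint32_t)OP_MOV | (dst << 8) | (src << 20);
}

static uint32_t opcode(uint32_t word)  { return word & 0xffu; }
static uint32_t mov_dst(uint32_t word) { return (word >> 8) & 0xfffu; }
static uint32_t mov_src(uint32_t word) { return word >> 20; }

/* Bytes spanned by an offset of the given width, assuming the offset
   counts 32-bit (4-byte) units. */
static uint64_t reach_bytes(unsigned bits)
{
    return ((uint64_t)1 << bits) * 4;
}
```

With these definitions, reach_bytes(24) comes out to 64 MB and
reach_bytes(32) to 16 GB.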

Likewise I can imagine cases in which you might end up with more than
2**12 active locals, especially in the presence of macros.  In that case
you spill.  But where do you spill?  For Guile, this means spilling to
additional registers, and having to shuffle with long-mov.  Otherwise
you would spill to a vector or something.  The WIP-RTL strategy
adequately captures the edge case while making the normal case fast.
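
As an illustration of the shuffling, here is a hedged sketch of how a
code emitter might handle an add whose operands are limited to 8 bits.
It uses registers 253 to 255 as temporaries, as in the plan quoted
above, and assumes the allocator never assigns those three to locals.
The instruction names, the struct and emit_add are all hypothetical,
not Guile's actual emitter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TMP0 = 253, TMP1 = 254, TMP2 = 255 };  /* reserved temporaries */
enum { OP_LONG_MOV, OP_ADD };                 /* made-up opcodes */

typedef struct { int op; uint32_t dst, a, b; } insn;

/* An 8-bit operand field can only name registers 0..255. */
static int out_of_reach(uint32_t reg) { return reg > 0xffu; }

/* Emit dst = a + b into out[], shuffling any far register through a
   temporary with long-mov.  Returns the number of instructions
   written, which is at most 4. */
static size_t emit_add(insn *out, uint32_t dst, uint32_t a, uint32_t b)
{
    size_t n = 0;
    uint32_t ra = a, rb = b, rd = dst;

    assert(dst < (1u << 24) && a < (1u << 24) && b < (1u << 24));

    if (out_of_reach(a)) {
        out[n++] = (insn){ OP_LONG_MOV, TMP0, a, 0 };
        ra = TMP0;
    }
    if (out_of_reach(b)) {
        out[n++] = (insn){ OP_LONG_MOV, TMP1, b, 0 };
        rb = TMP1;
    }
    if (out_of_reach(dst))
        rd = TMP2;

    out[n++] = (insn){ OP_ADD, rd, ra, rb };

    /* Move the result back out to the far destination register. */
    if (out_of_reach(dst))
        out[n++] = (insn){ OP_LONG_MOV, dst, TMP2, 0 };

    return n;
}
```

In the normal case, where all three registers are below 253, this emits
a single add and nothing else.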

> If I were designing this VM, I'd work hard to allow as many loops as
> possible to run completely in the cache.  That means that three things
> have to fit into the cache together: the VM itself, the user loop code,
> and the user data.  IMO, the sum of these three things should be made as
> small as possible.


> I certainly agree that we should have a generous number of registers,
> but I suspect that the sweet spot for a VM is 256, because it enables
> more compact dispatching code in the VM, and yet is more than enough to
> allow a decent register allocator to generate good code.
> That's my educated guess anyway.  Feel free to prove me wrong :)

I will do better: I will prove you right and prove me right at the same
time :)  The instructions in wip-rtl try to stay in one 32-bit unit.
When they do, their operands are limited, usually to 8 bits.  But when
they need to "spill", they will do so on the stack, not on the heap.
