
Re: Proposal: block-based vector allocator

From: Dmitry Antipov
Subject: Re: Proposal: block-based vector allocator
Date: Mon, 12 Dec 2011 07:07:18 +0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111115 Thunderbird/8.0

On 12/10/2011 01:04 AM, Stefan Monnier wrote:

> Let us know how it turns out.

Results for the byte-compile benchmark, an average of 16 runs:

       CPU time spent in user mode, seconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        74.07        84.87
default, GCPROs            72.90        81.37
patched, stack mark        71.35        81.57
patched, GCPROs            70.16        82.18

         Peak heap utilization, KBytes
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        41499        73651
default, GCPROs            37918        65648
patched, stack mark        38310        67169
patched, GCPROs            38052        65730

         Total time spent in GC, seconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        23.58        32.32
default, GCPROs            21.94        30.43
patched, stack mark        21.64        29.89
patched, GCPROs            21.13        29.22

        Average time per GC, milliseconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        27.62        36.03
default, GCPROs            25.57        33.93
patched, stack mark        25.22        33.34
patched, GCPROs            24.63        32.57

First of all, I was surprised by such a difference between 32- and
64-bit code. Because of it, I'm wondering whether --with-wide-int makes
sense: I expect it to be "too fat, too bloated" for a 32-bit CPU
(but would be glad to be mistaken).
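To illustrate the "too fat" concern, here is a minimal sketch (the type
and struct names are hypothetical, not the actual Emacs definitions):
with wide ints, every Lisp word becomes a 64-bit integer even where
pointers are 32-bit, so cons cells and vector slots double in size, and
64-bit tag arithmetic must be emulated in two 32-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: word widths of a default 32-bit build
   versus a --with-wide-int build.  */
typedef int32_t narrow_word;   /* default Lisp word on 32-bit */
typedef int64_t wide_word;     /* Lisp word with --with-wide-int */

struct narrow_cons { narrow_word car, cdr; };
struct wide_cons   { wide_word car, cdr; };
```

On common ABIs the wide cons is exactly twice the size of the narrow
one, which is where the extra heap pressure would come from.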

Next, block vector allocation makes GC a bit faster. I expected this,
and can explain the effect by the better locality of vector-like
objects, which helps the sweep phase in particular.
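The locality argument can be sketched as follows (illustrative only;
`vector_block`, `pseudo_vector`, and `sweep_block` are hypothetical
names, not the patch's actual code): when many small vectors live
contiguously inside one block, sweeping is a linear scan over one
cache-friendly region rather than a walk over individually malloc'ed
objects scattered across the heap.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch of block-based sweeping, under assumed names.  */

enum { VECTORS_PER_BLOCK = 64 };

struct pseudo_vector
{
  bool marked;   /* set by the mark phase */
  bool free;     /* set by the sweep phase */
};

struct vector_block
{
  struct pseudo_vector slots[VECTORS_PER_BLOCK];
  struct vector_block *next;
};

/* Sweep one block: a simple linear scan with good locality.
   Returns the number of vectors reclaimed.  */
static int
sweep_block (struct vector_block *b)
{
  int reclaimed = 0;
  for (int i = 0; i < VECTORS_PER_BLOCK; i++)
    {
      if (b->slots[i].marked)
        b->slots[i].marked = false;   /* survives; clear for next GC */
      else if (!b->slots[i].free)
        {
          b->slots[i].free = true;    /* reclaim unmarked slot */
          reclaimed++;
        }
    }
  return reclaimed;
}
```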

Finally, I'm pretty sure that block vector allocation has a
negligible effect in the GCPRO case. In terms of peak heap
utilization, it's 0.35% worse on 32-bit and just 0.12% worse
on 64-bit. In terms of CPU usage, the results are more interesting:
1% worse in the 64-bit case, but 3.8% better for 32-bit. The only
explanation I have for this effect is that the arithmetic used in
the splitting/coalescing operations creates some pressure on the CPU
in 64-bit mode, whereas the 32-bit version of the same code may be
implicitly executed in parallel by the 64-bit core. For this reason,
I don't consider my 32-bit benchmark fairly representative - it should
be run on a real 32-bit core, not in 'compatibility mode' on a 64-bit one.
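To make the splitting/coalescing arithmetic concrete, here is a hedged
sketch under assumed names (`free_region`, `split_region`,
`coalesce_regions` are illustrative, not the patch's code): an
allocation is carved from the front of a free region, and freeing
merges adjacent regions. On a 64-bit build all of this offset and size
arithmetic is 64-bit, which is the kind of extra ALU pressure
speculated about above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch of split/coalesce bookkeeping for free space
   inside an allocation block; names are hypothetical.  */

struct free_region { size_t offset, size; };

/* Split region R to satisfy a request of NBYTES: the request is
   carved from the front, the remainder (if any) stays free.
   Returns the offset handed to the caller.  */
static size_t
split_region (struct free_region *r, size_t nbytes)
{
  assert (nbytes <= r->size);
  size_t allocated = r->offset;
  r->offset += nbytes;
  r->size -= nbytes;
  return allocated;
}

/* Merge B into A if B immediately follows A; return nonzero on success.  */
static int
coalesce_regions (struct free_region *a, const struct free_region *b)
{
  if (a->offset + a->size != b->offset)
    return 0;
  a->size += b->size;
  return 1;
}
```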

