Re: Proposal: block-based vector allocator
From: Dmitry Antipov
Subject: Re: Proposal: block-based vector allocator
Date: Mon, 12 Dec 2011 07:07:18 +0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20111115 Thunderbird/8.0
On 12/10/2011 01:04 AM, Stefan Monnier wrote:
> Let us know how it turns out.
Results for the byte-compile benchmark, an average of 16 runs:
CPU time spent in user mode, seconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark         74.07        84.87
default, GCPROs             72.90        81.37
patched, stack mark         71.35        81.57
patched, GCPROs             70.16        82.18

Peak heap utilization, KBytes
----< configuration >----< 32bit >----< 64bit >----
default, stack mark         41499        73651
default, GCPROs             37918        65648
patched, stack mark         38310        67169
patched, GCPROs             38052        65730

Total time spent in GC, seconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark         23.58        32.32
default, GCPROs             21.94        30.43
patched, stack mark         21.64        29.89
patched, GCPROs             21.13        29.22

Average time per GC, milliseconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark         27.62        36.03
default, GCPROs             25.57        33.93
patched, stack mark         25.22        33.34
patched, GCPROs             24.63        32.57
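First of all, I was surprised by such a large difference between 32- and
64-bit code: in the default stack-marking configuration, the 64-bit build
spends about 15% more CPU time in user mode and has about 77% higher peak
heap utilization. Because of this, I'm wondering whether --with-wide-int
makes sense: I expect it to be "too fat, too bloated" for a 32-bit CPU
(but would be glad to be proven wrong).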
Next, block vector allocation makes GC a bit faster: total GC time drops
by roughly 4% to 8%, depending on the configuration. I expected this, and
I attribute it to the better locality of vector-like objects, which makes
sweeping cheaper.
Finally, I'm pretty sure that block vector allocation has a negligible
effect in the GCPROs case. In terms of peak heap utilization, it's 0.35%
worse on 32-bit and just 0.12% worse on 64-bit. In terms of CPU usage,
the results are more interesting: 1% worse on 64-bit, but 3.8% better on
32-bit. The only explanation I have for this is that the arithmetic used
in splitting/coalescing operations puts some pressure on the CPU in
64-bit mode, while the 32-bit version of the same code may be implicitly
executed in parallel by the 64-bit core. For this reason, I don't
consider my 32-bit benchmark fully representative: it should be rerun on
a real 32-bit CPU rather than in 'compatibility mode' on a 64-bit one.
Dmitry