[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Proposal: block-based vector allocator

From: Stefan Monnier
Subject: Re: Proposal: block-based vector allocator
Date: Mon, 12 Dec 2011 11:24:13 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)

>> Let us how it turns out.
> Results for the byte-compile benchmark, an average of 16 runs:

>        CPU time spent in user mode, seconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        74.07        84.87
> default, GCPROs            72.90        81.37
> patched, stack mark        71.35        81.57
> patched, GCPROs            70.16        82.18

>          Peak heap utilization, KBytes
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        41499        73651
> default, GCPROs            37918        65648
> patched, stack mark        38310        67169
> patched, GCPROs            38052        65730

>          Total time spent in GC, seconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        23.58        32.32
> default, GCPROs            21.94        30.43
> patched, stack mark        21.64        29.89
> patched, GCPROs            21.13        29.22

>         Average time per GC, milliseconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        27.62        36.03
> default, GCPROs            25.57        33.93
> patched, stack mark        25.22        33.34
> patched, GCPROs            24.63        32.57

Since most small differences are difficult to separate from noise (and
are not important anyway), the summary from where I stand is that
there's no substantial difference (CPU and memory wise) between your new
code (with or without GCPROs) and the current code with GCPROs, whereas
the current code with conservative stack scanning is a tiny bit slower
and uses a non-negligible amount of extra space in peak usage.
This confirms that the main issue is the mem_nodes.

It would also be interesting to see how your new code performs in terms
of fragmentation (i.e. average memory use for a long running interactive
session), but it's very difficult to measure and I doubt we'd see much
difference (other than the impact of mem_nodes of course).

> on 64-bit. In terms of CPU usage, results are more interesting: 1%
> worse for 64-bit case, but 3.8% better for 32-bit.  The only
> explanation I have for this effect is that an arithmetic used in
> splitting/coalescing operations creates some pressure on the CPU in
> 64-bit mode, but 32-bit version of the same code may be implicitly
> executed in parallel by the 64-bit core.

64bit cores don't implicitly parallelize 32bit code to take advantage of
the 64bit datapath.  So that can't be the explanation.  And register
pressure is worse in the x86 architecture than in the
amd64 architecture.

> Due to this, I don't consider my 32-bit benchmark as fairly
> representative - it should be done on a real 32-bit core and not in
> 'compatibility mode' on 64-bit one.

You misunderstand what is the "compatibility mode" of amd64 processors.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]