[BDW-GC] Precise stack scanning

From: Ludovic Courtès
Subject: [BDW-GC] Precise stack scanning
Date: Thu, 20 Aug 2009 14:48:07 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

Hello Guilers!

Andy noted on IRC that it may be useful to not scan the whole VM stack
in the BDW-GC branch, since anything beyond the current stack pointer is
garbage and scanning it may lead to excess data retention.

Precise stack scanning is now implemented, following suggestions given on
the BDW-GC list [0].

I ran the Ellis/Kovac/Boehm/Clinger `gcbench.scm' using our
`gc-benchmarks/run-benchmark.scm' script [1], both with and without
precise stack scanning [2]:

  - whole stack scanning

                         heap size (MiB) execution time (s.)
    Guile                   53.85 (1.00x)     21.091 (1.00x)
    BDW-GC, FSD=3           52.06 (0.97x)     16.950 (0.80x) !
    BDW-GC, FSD=6           45.88 (0.85x)     17.159 (0.81x) !
    BDW-GC, FSD=9           47.66 (0.88x)     16.283 (0.77x) !
    BDW-GC, FSD=3 incr.     78.60 (1.46x)     20.792 (0.99x)
    BDW-GC, FSD=3 gene.     87.40 (1.62x)     18.449 (0.87x)

  - precise stack scanning

                         heap size (MiB) execution time (s.)
    Guile                   53.85 (1.00x)     21.184 (1.00x)
    BDW-GC, FSD=3           52.37 (0.97x)     15.654 (0.74x) !
    BDW-GC, FSD=6           45.88 (0.85x)     16.499 (0.78x) !
    BDW-GC, FSD=9           43.01 (0.80x)     17.098 (0.81x) !
    BDW-GC, FSD=3 incr.     76.31 (1.42x)     20.529 (0.97x)
    BDW-GC, FSD=3 gene.     87.61 (1.63x)     19.776 (0.93x)

Here precise stack scanning slightly reduces the execution time in most
non-generational configurations (FSD=9 is the exception), with comparable
heap usage.

The `gc-benchmarks/strings.scm' test leads to similar observations:

  - whole stack scanning

                            heap size (MiB)   execution time (s.)
    Guile                   1346.51 (1.00x)      6.950 (1.00x)
    BDW-GC, FSD=3            338.40 (0.25x)      3.114 (0.45x) !
    BDW-GC, FSD=6            346.75 (0.26x)      3.155 (0.45x) !
    BDW-GC, FSD=9            352.42 (0.26x)      3.190 (0.46x) !
    BDW-GC, FSD=3 incr.      428.60 (0.32x)      3.604 (0.52x) !
    BDW-GC, FSD=3 gene.      427.88 (0.32x)      4.088 (0.59x) !

  - precise stack scanning

                            heap size (MiB)   execution time (s.)
    Guile                   1346.51 (1.00x)      6.825 (1.00x)
    BDW-GC, FSD=3            338.46 (0.25x)      3.082 (0.45x) !
    BDW-GC, FSD=6            296.35 (0.22x)      3.108 (0.46x) !
    BDW-GC, FSD=9            293.61 (0.22x)      3.126 (0.46x) !
    BDW-GC, FSD=3 incr.      507.05 (0.38x)      3.680 (0.54x) !
    BDW-GC, FSD=3 gene.      596.04 (0.44x)      3.686 (0.54x) !

Here precise stack scanning makes more of a difference in generational
mode, where it leads to increased heap usage (!) and shorter execution time.

(This particular benchmark appears to be pathological for Guile's
current GC.  It even looks worse than a few months ago [1], but this
time I'm using an x86_64 userland instead of i686.)


Surprisingly, scanning the whole VM stack seems to have only a small
impact on execution time and heap usage.  Perhaps long-running
applications would show a more significant difference between precise
and whole-stack scanning.



[1] The "Guile" line represents a 1.9.2ish Guile (with Guile's own GC),
    before commit 9591a2b016c5c11d2cd92ff0d43cd511f28bc07f ("`load'
    autocompiles").  The rest is an equivalent Guile from the BDW-GC
    branch.  See [...] for the meaning of each row in these tables.

[2] Actually, I ran a script that does `(compile-and-load "gcbench.scm")'
    to make sure the VM stack is really used.

