[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Texmacs-dev] Mandrake (and thought about typing and ref counting)
From: |
Stéphane Payrard |
Subject: |
Re: [Texmacs-dev] Mandrake (and thought about typing and ref counting) |
Date: |
Wed, 25 Sep 2002 21:19:45 -0400 |
User-agent: |
Mutt/1.4i |
A post from perlmonks by Dan Sugalski, the parrot architect.
I think it is relevant to this thread. It compares the
respective merits of real garbage collecting and reference counting
and motivates the use of a real garbage collecting scheme in parrot.
Perl5 is using reference counting.
Parrot will be the run-time of perl6 (probably among other languages).
Reference counting has a number of problems:
Circular garbage leaks
It's horribly error-prone
It requires a lot of code
It's slow
It blows your cache
To take those in more detail:
1 It's not tough to get circular structures at the user (i.e. the
application programmer) level. This means that either a program leaks,
or each application that potentially has to generate circular
structures has to kill them.
2 For refcounting to work right, every increment must have a matching
decrement. Historically (which is to say, in every release of perl
I've seen) this doesn't happen. Sometimes its only a few variables,
sometimes it's a lot of variables. Either way, there's leakage. It
also means that every extension writer must get refcounting right. For
anything other than trivial extensions, they don't.
3 To support refcounting, each and every time a variable is inserted
into a container or removed from a container the refcount must be
changed. This is a lot of code. Go browse through the perl 5 source
(And don't forget about extensions) to see how often this
happens. This is essentially makework code--it does almost nothing
useful, and is entirely bookkeeping.
4 Reference counting is the slowest form of garbage collection for all
but the most extremely degenerate code. Reference counting requires
you to spend time touching each variable at least twice. This is all
your variables, not just the ones that are alive. Tracing garbage
collectors (which include the mark & sweep, compacting, and
generational style collectors) generally only touch live data. Yes,
it's only a quick increment or decrement, but it's a lot of
them. Refcount schemes usually spend about twice as much time doing
garbage collection work than tracing collectors.
5 Refcounting makes the internal data structures and code less dense,
and that means that your program spends more time waiting on main
memory. (Fetching data from RAM can cost up to 200 cycles on some
systems) Your code is less dense, since you have refcount twiddling
code all through it. Your data is also less dense, since you need to
have space for refcounts in it. And even worse, it means more of your
data structure has to be touched when its dealt with. This screws your
L1 cache. Generally data comes into your CPU's L1 cache in small
chunks--8 or 16 bytes. You want to use as few of these chunks as you
can since, while they're fastest to access (usually a few cycles at
most) there's not much of it. And loading a cache line can cost 10 or
20 cycles if the data you're looking for is in your L2 cache. (And 200
or more if it isn't) The less memory that needs to be touched to do
something means the more efficiently the cache is used and the faster
your program runs.
Tracing collectors, which is what parrot's using, generally have
different characteristics.
No mainline code does anything with refcounts or other 'liveness'
bookkeeping data. That means the mainline code is denser, with more of
it actually dedicated to executing your program. Internal data
structures are either smaller (if the GC bookkeeping fields are kept
out of band) or at least use less cache (since while still in the
structure they aren't touched and thus fetched into L1 cache).
There's less code dedicated to GC itself and, while a tracing garbage
collector isn't exactly trivial, it's not a big deal either. More to
the point, the code required to do the GC is collected together in one
single spot, rather than sprinkled througout the entire codebase. (It
took me only a few hours to write the first cut of Parrot's GC)
Tracing collectors take less time to execute. Since they make more
efficient use of the cache (there's only a small chunk of code doing
GC, so it gets pinned in L1 cache while it runs, and it generally
restricts itself to a smallish chunk of memory when it runs so it
gains some cache locality that way) you waste less time on the memory
bus, and since tracing GCs only look at the live data, their runtime's
proportional to the amount of live data you have, generally a much
smaller number than the number of data objects you've allocated since
the last GC run.
So, tracing collectors are generally faster, less error-prone, have
less code, and put less burden on the people writing the core and
extensions in C. The one (and only) advantage that refcounting has is
deterministic destruction. (Assuming no bugs, of course) Other than
that it's a massive pain to deal with for the internals folks.
All of that is why we don't like refcounts. :)
BTW: Your LISP book's numbers may well be really, really old. Much of
GC's bad reputation for pauses come from benchmarks run 20 or more
years ago, on VAX 11/780 machines with 2M of RAM and an RA60 as a swap
disk. These days things run rather a lot faster. A quick run of one of
parrot's benchmarks shows it did 43K GC runs in 22 seconds on my
machine, in addition to the sample program's real work. That's not
bad... (Though, yes, it does argue that the sample program needs more
work to generate less garbage)
-
stef