From: Tom Lord
Subject: [Pika-dev] hashq values and weak references (was hackerlab fsplay.[ch] status)
Date: Sat, 8 May 2004 11:31:41 -0700 (PDT)
> From: Andreas Rottmann <address@hidden>
> Andreas Rottmann <address@hidden> writes:
> > Since you seem quite opposed to the memchunk idea, I took another look
> > at the object data areas, and it seems instead of having the data in a
> > separate memchunk, we might as well keep it in the hashtree items,
> > thus avoiding the memchunk issue.
> At the additional cost that the memory in the object data area must
> be copied, but I guess that's negligible unless you stick large stuff
> in there (would we?).
I'm a little confused as to what you're talking about.
I'm assuming that you're working on solving the hashq problem: that with
our copying collector an address-based HASHQ is no good -- the address
of an object can change over time.
Here is how I think we should solve that problem:
Pick some size for "GC-pages". The GC'ed heap is divided into
GC-pages, each page divided up into objects of a certain size.
Reserve one of those objects at the start of each page for a hash
bucket. So it's (initially, I'll change this further down):
GC page:
| hash-bucket | obj | obj | obj | ... | obj |
|
V
| obj . hashq | obj . hashq | .... |
The hash-bucket only needs to exist if HASHQ has been called for any
of the objects on a page.
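To make the layout concrete, here is a minimal C sketch of such a
GC-page header, assuming power-of-two page alignment so that an
object's page can be found by masking its address.  All names
(gc_page, page_of, GC_PAGE_SIZE) are illustrative, not actual Pika
identifiers:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: each GC-page holds fixed-size objects and
   reserves its first slot for a (lazily created) hash bucket.  */

#define GC_PAGE_SIZE 4096

struct hash_bucket;             /* created only once HASHQ is called
                                   for some object on the page */

struct gc_page
{
  struct hash_bucket *bucket;   /* NULL until first HASHQ on this page */
  size_t object_size;           /* size of every object on this page */
  /* the objects themselves follow the header */
};

/* Pages are GC_PAGE_SIZE-aligned, so masking an object's address
   recovers the page (and thus the bucket) in O(1).  */
static struct gc_page *
page_of (void *obj)
{
  return (struct gc_page *)
    ((uintptr_t) obj & ~(uintptr_t) (GC_PAGE_SIZE - 1));
}
```

The address mask is what makes the scheme cheap: no per-object header
word is needed to find the bucket.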
During copy collection, have _two_ to-space pages for each size of
object: one for objects with hash values, another for objects without
hash values. That way, after a copy collection, the number of hash
buckets will be minimized.
Also note that, after copy collection, the order of objects in the
hash bucket is sorted by address (because we fill up to-space in
address order). HASHQ can preserve that sorting when adding new hash
values to a bucket. If we want to make hash buckets as small as
possible then a lookup in one could be a binary search.
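A lookup in such a compact, address-sorted bucket might look like the
following sketch (bucket_entry and bucket_lookup are assumed names,
not anything from the actual tree):

```c
#include <stddef.h>
#include <stdint.h>

/* Bucket entries kept sorted by object address, as left behind by a
   copy collection that fills to-space in address order.  */
struct bucket_entry
{
  void *obj;
  unsigned long hashq;
};

/* Binary search for OBJ's hash value; returns -1 if OBJ has never
   been HASHQed.  Addresses are compared as integers.  */
static long
bucket_lookup (const struct bucket_entry *ents, size_t n, void *obj)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (ents[mid].obj == obj)
        return (long) ents[mid].hashq;
      else if ((uintptr_t) ents[mid].obj < (uintptr_t) obj)
        lo = mid + 1;
      else
        hi = mid;
    }
  return -1;
}
```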
We can do better than a binary search if we don't mind making hash
buckets larger. So, let's make it (I'll change this again further
down):
GC page:
| hash-bucket | obj1 | obj2 | obj3 | ... | objN |
|
V
| hashq1 | hashq2 | hashq3 | .... | hashqN |
where the number of bucket slots equals the number of objects on the
page. Now HASHQ is, once again, an O(1) operation.
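With one bucket slot per object, the O(1) operation is just index
arithmetic: an object's position on its page is its slot in the
bucket.  A sketch, under the same assumed page layout as above:

```c
#include <stddef.h>

/* Hypothetical page header; the hash bucket (one slot per object)
   and then the objects follow it.  */
struct gc_page
{
  size_t object_size;           /* size of every object on this page */
};

/* An object's index on its page -- also its slot in the page's hash
   bucket, since the bucket has exactly one slot per object.  */
static size_t
object_index (struct gc_page *pg, void *obj)
{
  char *first_obj = (char *) pg + sizeof (struct gc_page);
  return (size_t) (((char *) obj - first_obj) / pg->object_size);
}
```

HASHQ is then page_of, object_index, one array access.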
That space consumption is not so bad at all. Many Scheme
implementations these days add a hash-value slot to every object
(e.g., a cons-pair has storage for CAR, CDR, and HASHQ-VALUE, and
usually something else, too). We'll be using less space than those
implementations and getting very close to the same speed.
Finally, I think we can safely assume that less than 1/2 of all
objects will ever be hashq'ed. Plus, on every copy collection, we'll
be compacting the hash buckets. Therefore we can safely double the
size of our hash bucket entries and still come out ahead of other
implementations, space-wise.
So, let's make it:
GC page:
| hash-bucket | obj1 | obj2 | obj3 | ... | objN |
|
V
| next-bucket | hashq1 . weak1 | hashq2 . weak2 | ....
The WEAK slots contain either NIL or a reference to a weak-reference
object. A weak reference is a limbless vtable object:
| vtable . obj |
During copy-collection, when copying a hashed object with a non-nil
WEAK, copy the reference to the weak-reference and also schedule the
weak-reference for copying. Set the from-space weak-reference link
in the hash bucket to nil.
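That copy step for a bucket entry can be sketched as follows
(bucket_entry, weak_ref, and copy_hashed_entry are assumed names; the
scheduling of the weak-reference object itself for copying is elided):

```c
#include <stddef.h>

/* A limbless vtable object holding the weakly-referenced object.  */
struct weak_ref
{
  void *vtable;
  void *obj;
};

/* One slot of a page's hash bucket: the object's hash value plus an
   optional link to its weak reference.  */
struct bucket_entry
{
  unsigned long hashq;
  struct weak_ref *weak;        /* NULL if no weak reference */
};

/* When copying a hashed object: carry the hashq and the weak link
   into the to-space bucket, and clear the from-space link so the
   later bucket scan knows this object survived.  */
void
copy_hashed_entry (struct bucket_entry *from, struct bucket_entry *to)
{
  to->hashq = from->hashq;
  to->weak = from->weak;
  from->weak = NULL;
}
```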
To illustrate, suppose we start with six objects just before a
copy-collection:
             will survive    has been    has a weak
             collection?     HASHQed?    reference to it?

    OBJ1     no              no          no
    OBJ2     no              yes         no
    OBJ3     no              yes         yes
    OBJ4     yes             no          no
    OBJ5     yes             yes         no
    OBJ6     yes             yes         yes
(Note that there will never need to be any such thing as an object
that has a weak reference but that has not been HASHQed.)
Prior to collection, the heap looks like this (writing pages
vertically instead of horizontally here):
BEFORE COPYING
GC-page hash bucket weak references
(hashq . weak)
-----------------------------------------------
OBJ1 ( 0 . nil )
OBJ2 ( HASH2 . nil )
OBJ3 ( HASH3 . o--)---> ( vtable . OBJ3 )
OBJ4 ( 0 . nil )
OBJ5 ( HASH5 . nil )
OBJ6 ( HASH6 . o--)---> ( vtable . OBJ6 )
After copying, the live objects will be spread over (at least)
two pages, like this:
POST-COPYING, UN-HASHED OBJECTS PAGE
OBJ4' n/a
POST-COPYING, HASHED OBJECTS PAGE
OBJ5' ( HASH5 . nil )
OBJ6' ( HASH6 . o--)---> ( vtable . OBJ6' )
and from space is left looking like this:
POST-COPYING, FROM-SPACE PAGE
OBJ1 ( 0 . nil )
OBJ2 ( HASH2 . nil )
OBJ3 ( HASH3 . o--)---> ( vtable . OBJ3 )
BH4 ( 0 . nil )
BH5 ( HASH5 . nil )
BH6 ( HASH6 . nil )
;; "BH<n>" is "broken heart for OBJ<n>"
So there's one last step added to copying -- a scan of all from-space
hash buckets. After that scan, we get:
POST-COPYING, POST-HASH-BUCKET-SCAN
FROM-SPACE PAGE
OBJ1 ( 0 . nil )
OBJ2 ( HASH2 . nil )
OBJ3 ( HASH3 . o--)---> ( vtable . #dead )
BH4 ( 0 . nil )
BH5 ( HASH5 . nil )
BH6 ( HASH6 . nil )
;; "BH<n>" is "broken heart for OBJ<n>"
And, voila, we have pretty inexpensive weak references and
space-efficient O(1) hashq.
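That last pass over the from-space buckets can be sketched like so: any
weak link still non-nil after copying belongs to an object that did not
survive, so the weak reference's referent slot is broken.  (The names
and the DEAD stand-in for the #dead marker are assumptions; following
broken hearts when the weak-reference object itself was copied is
elided.)

```c
#include <stddef.h>

struct weak_ref
{
  void *vtable;
  void *obj;
};

struct bucket_entry
{
  unsigned long hashq;
  struct weak_ref *weak;        /* NULL if no weak reference, or if
                                   the object survived (the copy step
                                   cleared it) */
};

#define DEAD ((void *) -1)      /* stand-in for the #dead marker */

/* Final step of collection: scan every from-space bucket and break
   the weak references of objects that were not copied.  */
static void
scan_from_space_bucket (struct bucket_entry *ents, size_t n)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (ents[i].weak)
      {
        ents[i].weak->obj = DEAD;   /* referent did not survive */
        ents[i].weak = NULL;
      }
}
```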
-t