From: Steve Fink
Subject: Re: [EXPERIMENT] Emacs with the SpiderMonkey garbage collector
Date: Sat, 2 Dec 2017 21:24:50 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Thunderbird/57.0
On 12/2/17 4:37 PM, Pip Cet wrote:
> On Fri, Dec 1, 2017 at 5:57 PM, Steve Fink <address@hidden> wrote:
>> The one big thing that the analysis doesn't handle particularly well is internal pointers. For now, it sounds like you're mostly safe from these moving because all your data is hanging off of the JS_GetPrivate indirection, so you wouldn't have any pointers internal to GC things. But you can still run into issues keeping things alive. We mostly have this problem with strings, since small strings are stored inline in the GC things, so it's very easy to end up with a char* that will move. We tend to work around this by (1) passing around the GC pointer instead of the char*; (2) making functions that accept such a char* also declare that they won't GC, by requiring a reference to an 'AutoRequireCannotGC&' token that is validated by the static analysis; or (3) forcing the contents to be stored in the malloc heap if they aren't already, and being careful to keep the owning JSString* in a Rooted<JSString*> somewhere higher on the stack. Collections are also an issue, if you want to index them by GC pointer value.
>
> It seems I have two options: rewrite all hashed collections whenever something moves, or make up a hash value and store it in a private slot for each object upon creation. My understanding is that SpiderMonkey does the former for WeakMaps, and those seem to perform okay, so that might be the better option long-term, but I haven't given much thought to this and the made-up hash value seems easier to implement...
We've done it two ways, gradually shifting almost everything over to the second.
The first was what we called "rekeying", which is just removing and re-inserting anything whose pointer changed (or, more generally, anything where a pointer that formed part of the key changed). We had to do some careful dancing in our hashtable implementation to be certain rekeying can't cause the table to grow (we have tombstones, so the naive approach accumulates tombstones).

Most things are now switching over to the second approach, which is to key tables off of unique ids. We have a separate table mapping anything that needs one to a unique id. But that's still just a space optimization; if all of your objects are going to end up needing a unique id, then you're paying the extra memory cost for everything anyway, and you may as well store a hash (or unique id), as you say.
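The rekeying idea above can be sketched in a few lines. This is an illustrative stand-in, not SpiderMonkey's actual table: a std::unordered_map keyed by raw cell pointers, with a hypothetical `rekey` helper that erase-and-reinserts an entry when the collector relocates its key. (With std::unordered_map the erase/insert pair is harmless; SpiderMonkey's own double-hashed table needs the careful dancing described above because the removals leave tombstones.)

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical GC cell; a moving collector may relocate it.
struct Cell { int payload; };

// Table keyed by raw GC pointer. After a moving GC, every entry whose
// key cell moved must be rekeyed under the cell's new address.
using PtrTable = std::unordered_map<Cell*, std::string>;

// Illustrative helper: move one entry from the cell's old address to
// its new one. A real collector would drive this from its move hook.
void rekey(PtrTable& table, Cell* oldLoc, Cell* newLoc) {
    auto it = table.find(oldLoc);
    if (it == table.end())
        return;                              // not in the table; nothing to do
    std::string value = std::move(it->second);
    table.erase(it);                         // drop the stale key...
    table.emplace(newLoc, std::move(value)); // ...and re-insert under the new one
}
```

After a simulated move, lookups under the old address fail and lookups under the new address find the preserved value.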
If these tables aren't holding strong references (if being a key in the table shouldn't keep the key alive), then you need to sweep them too, to throw out all of the dead stuff. Even if nothing ever looks up those keys again, hash collisions will have you comparing their dead memory with lookup keys. And if you have code to sweep the table, I guess you can always reuse it during the moving part of the collection to rekey everything that is moving. (And if they *are* holding strong references, they should be traced.)
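A minimal sketch of the sweeping step described above, again with made-up types rather than SpiderMonkey's: after the mark phase, walk the weakly-keyed table and drop every entry whose key did not get marked, so no later lookup ever hashes into a comparison against dead memory.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical GC cell with a mark bit set during the mark phase.
struct Cell { bool marked = false; };

// A table whose keys are weak: being a key must not keep the cell alive.
using WeakTable = std::unordered_map<Cell*, int>;

// Sweep the table after marking: discard every entry whose key is dead,
// so hash collisions never compare lookup keys against freed cells.
void sweep(WeakTable& table) {
    for (auto it = table.begin(); it != table.end(); ) {
        if (!it->first->marked)
            it = table.erase(it);  // key died this cycle; drop the entry
        else
            ++it;                  // key survived; keep the entry
    }
}
```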
The embedder API to hook into this can be seen at https://searchfox.org/mozilla-central/source/js/public/GCAPI.h#926
We use GC-aware data structures within spidermonkey (GCHashMap, GCHashSet, GCVector) to automate most of this. See eg https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#28 though even those still require something to call their sweep() methods. The WeakCache at https://searchfox.org/mozilla-central/source/js/public/SweepingAPI.h#54 sets itself up to be swept automatically at the right time.
And like I said, those generally use unique IDs. https://searchfox.org/mozilla-central/source/js/public/GCHashTable.h#105 is the hashtable that rekeys instead. (Note that our hashtable implementation isn't great. It uses double hashing, and probably ought to be replaced with one of the Robin Hood variants, which would change rekeying.)
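The "store a made-up hash in a private slot" option Pip describes can also be sketched generically (all names here are hypothetical, not SpiderMonkey API): stamp the private data with an arbitrary but stable value at creation time and hash on that instead of on any movable address, so the table's hashing never depends on where the GC put the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Stand-in for the malloc'd private data a JS_SetPrivate-style slot
// points at; unlike the GC thing itself, it does not move.
struct Wrapper {
    std::uint64_t stableHash;  // assigned once at creation; never changes
    explicit Wrapper(std::uint64_t h) : stableHash(h) {}
};

// Any unique value works as the "made-up hash"; a counter is simplest.
std::uint64_t nextStableHash() {
    static std::uint64_t counter = 0;
    return ++counter;
}

// Hash functor that reads the stored value and ignores addresses, so
// a moving GC cannot invalidate the table's bucket placement.
struct WrapperHash {
    std::size_t operator()(const Wrapper* w) const {
        return static_cast<std::size_t>(w->stableHash);
    }
};

using WrapperMap = std::unordered_map<const Wrapper*, int, WrapperHash>;
```

Two wrappers always get distinct hashes, and entries remain findable without any rekeying or sweeping driven by object movement.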