Good evening 🙂 A quick note, tonight: I’ve long thought that
ephemerons
are primitive and can’t be implemented with mark functions and/or
finalizers, but today I think I have a counterexample.
For context, one of the goals of the GC implementation I have been
working on on is to replace
Guile‘s current use of the Boehm-Demers-Weiser (BDW) conservative
collector. Of course, changing a
garbage collector for a production language runtime is risky, and for Guile one of the mitigation
strategies for this work is that the new collector is behind an abstract API
whose implementation can be chosen at compile-time, without requiring
changes to user code. That way we can first switch to
BDW-implementing-the-new-GC-API, then switch the implementation behind
that API to something else.
Abstracting GC is a tricky problem to get right, and I thank the MMTk
project for showing that this is possible — you
have user-facing APIs that need to be implemented by concrete collectors,
but also extension points so that the user can provide some compile-time
configuration too, for example to provide field-tracing visitors that take into account how a user wants to lay out objects.
Anyway. As we discussed last time, ephemerons are usually have explicit
support from the GC, so we need an ephemeron abstraction as part of the
abstract GC API. The question is, can BDW-GC provide an implementation of
this API?
I think the answer is “yes, but it’s very gnarly and will kill
performance so bad that you won’t want to do it.”
the contenders
Consider that the primitives that you get with BDW-GC are custom mark
functions, run on objects when they are found to be live by the mark
workers; disappearing links, a kind of weak reference; and finalizers, which receive the object being finalized, can allocate, and indeed can resurrect the object.
BDW-GC’s finalizers are a powerful
primitive, but not one that is useful for implementing the “conjunction”
aspect of ephemerons, as they cannot constrain the marker’s idea of
graph connectivity: a finalizer can only prolong the life of an object subgraph,
not cut it short. So let’s put finalizers aside.
Weak references have a tantalizingly close kind of conjunction property:
if the weak reference itself is alive, and the referent is also
otherwise reachable, then the weak reference can be dereferenced.
However this primitive only involves the two objects E and K; there’s no
way to then condition traceability of a third object V to E and K.
We are left with mark functions. These are an extraordinarily powerful
interface in BDW-GC, but somewhat expensive also: not inlined, and going
against the grain of what BDW-GC is really about (heaps in which the
majority of all references are conservative). But, OK. They way they
work is, your program allocates a number of GC “kinds”, and associates
mark functions with those kinds. Then when you allocate objects, you
use those kinds. BDW-GC will call your mark functions when tracing an object of those kinds.
Let’s assume firstly that you have a kind for ephemerons; then when you
go to mark an ephemeron E, you mark the value V only if the key K has
been marked. Problem solved, right? Only halfway: you also have to
handle the case in which E is marked first, then K. So you publish E to
a global hash table, and… well. You would mark V when you mark a K for
which there is a published E. But, for that you need a hook into
marking V, and V can be any object…
So now we assume additionally that all objects are allocated with
user-provided custom mark functions, and that all mark functions check
if the marked object is in the published table of pending ephemerons,
and if so marks values. This is essentially what a proper ephemeron
implementation would do, though there are some optimizations one can do
to avoid checking the table for each object before the mark stack runs
empty for the first time. In this case, yes you can do it!
Additionally if you register disappearing links for the K field in each
E, you can know if an ephemeron E was marked dead in a previous
collection. Add a pre-mark hook (something BDW-GC provides) to clear
the pending ephemeron table, and you are in business.
yes, but no
So, it is possible to implement ephemerons with just custom mark
functions. I wouldn’t want to do it, though: missing the
mostly-avoid-pending-ephemeron-check optimization would be devastating,
and really what you want is support in the GC implementation. I think
that for the BDW-GC implementation in
whippet I’ll just implement
weak-key associations, in which the value is always marked strongly
unless the key was dead on a previous collection, using disappearing
links on the key field. That way a (possibly indirect) reference from a
value V to a key K can indeed keep K alive, but oh well: it’s a
conservative approximation of what should happen, and not worse than
what Guile has currently.
Good night and happy hacking!