Author:
Source
Good evening, dear hackfriends. Tonight’s missive is an apology: not
quite in the sense of expiation, though not quite
not that, either; rather, apology in the sense of explanation, of
exegesis: apologia. See, I accidentally made a Scheme. I know I have
enough Scheme implementations already, but I went and made another one.
It’s for a maybe good reason, though!
one does not simply a scheme
I feel like we should make this the decade of leaning into your problems,
and I have a Scheme problem, so here we are. See, I co-maintain
Guile, and have been noodling on a new
garbage collector (GC) for Guile,
Whippet. Whippet is designed to be
embedded in the project that uses it, so one day I hope it will be just
copied into Guile’s source tree, replacing the venerable
BDW-GC that we currently use.
The thing is, though, that GC implementations are complicated. A bug in
a GC usually manifests itself far away in time and space from the code
that caused the bug. Language implementations are also complicated, for
similar reasons. Swapping one GC for another is something to be done
very carefully. This is even more the case when the switching cost is
high, which is the case with BDW-GC: as a collector written as a library
to link into “uncooperative”
programs, there is more cost to moving to a conventional collector than
in the case where the embedding program is already aware that (for
example) garbage collection may relocate objects.
So, you need to start small. First, we need to prove that the new GC
implementation is promising in some way, that it might improve on
BDW-GC. Then… embed it directly into Guile? That sounds like a bug
farm. Is there not any intermediate step that one might take?
But also, how do you actually test that a GC algorithm or
implementation is interesting? You need a workload, and you need the
ability to compare the new collector to the old, for that workload. In
Whippet I had been writing some benchmarks in C
(example),
but this approach wasn’t scaling: besides not sparking joy, I was
starting to wonder if what I was testing would actually reflect usage in
Guile.
I had an early approach to rewrite a simple language implementation like
the other Scheme implementation I made to demonstrate JIT code
generation in
WebAssembly,
but that soon foundered against what seemed to me an unlikely rock: the
compiler itself. In my wasm-jit
work, the “compiler” itself was in C++, using the C++ allocator for
compile-time allocations, and the result was a tree of AST nodes that were
interpreted at run-time. But to embed the benchmarks in Whippet itself
I needed something C, which is less amenable to abstraction of any
kind… Here I think I could have made a different choice: to somehow
allow C++ or something as a dependency to write tests, or to do more
mallocation in the “compiler”…
But that wasn’t fun. A lesson I learned long ago is that if something
isn’t fun, I need to turn it into a compiler. So I started writing a
compiler to a little bytecode VM, initially in C, then in Scheme because
C is a drag and why not? Why not just generate the bytecode C file from
Scheme? Same dependency set, once the C file is generated. And then,
as long as you’re generating C, why go through bytecode at all? Why not
just, you know, generate C?
after all, why not? why shouldn’t i keep it?
And that’s how I accidentally made a
Scheme, Whiffle. Tomorrow I’ll write a little
more on what Whiffle is and isn’t, and what it’s doing for me. Until
then, happy hacking!