We are delighted and somewhat relieved to announce that the third
reduction of the Guix bootstrap binaries has now been merged in the
main branch of Guix! If you run guix pull today, you get a package
graph of more than 22,000 nodes rooted in a 357-byte program—something
that had never been achieved, to our knowledge, since the birth of Unix.
We refer to this as the Full-Source Bootstrap. In this post, we
explain what this means concretely. This is a major milestone—if not the
major milestone—in our quest for building everything from source, all
the way down.
How did we get there, and why? In two previous blog posts, we
elaborated on why this reduction and bootstrappability in general
are so important.
One reason is to properly address supply chain security concerns. The
Bitcoin community was one of the first to recognize its importance
well enough to put the idea into practice. At the Breaking Bitcoin
conference 2020, Carl Dong gave a fun and remarkably gentle
introduction.
At the end of the talk, Carl states:
“The holy grail for bootstrappability will be connecting hex0 to mes.”
Two years ago, at FOSDEM 2021, I (Janneke) gave a short talk about how
we were planning to continue this quest.
If you think one should always be able to build software from source,
then it follows that the “trusting trust” attack is only a symptom of
an incomplete or missing bootstrap story.
The Road to Full-Source Bootstrap
Three years ago, the bootstrap binaries were reduced to just GNU Mes
and MesCC-Tools (and the driver to build Guix packages: a static build
of GNU Guile 2.0.9).
The new Full-Source Bootstrap, merged in Guix master yesterday, removes
the binaries for Mes and MesCC-Tools and replaces them with
bootstrap-seeds. For x86-linux (which is also used by the x86_64-linux
build), this means this program: hex0-seed, with ASCII-equivalent
hex0_x86.hex0. Hex0 is self-hosting and its source looks like this:
; Where the ELF Header is going to hit
; Simply jump to _start
; Our main function
# :_start ; (0x8048054)
58 # POP_EAX ; Get the number of arguments
You can spot two types of line comment: hex0 (;) and assembly (#).
The only program code in this snippet is 58: two hexadecimal digits
that are taken as two nibbles and compiled into the corresponding byte
with hexadecimal value 58.
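To make the nibble-pairing concrete, here is a minimal Python sketch of a hex0-like translator. It is not the real hex0 (which is itself written in hex0); the function name hex0_assemble is made up for illustration, and the sketch assumes that every character other than a hex digit or a comment marker is simply skipped.

```python
def hex0_assemble(source: str) -> bytes:
    """Translate hex0-style text to bytes (illustrative sketch, not the real hex0)."""
    out = bytearray()
    high = None  # pending high nibble, or None when waiting for a new byte
    for line in source.splitlines():
        for ch in line:
            if ch in ";#":
                break  # both comment styles run to the end of the line
            if ch in "0123456789abcdefABCDEF":
                nibble = int(ch, 16)
                if high is None:
                    high = nibble  # first digit: remember the high nibble
                else:
                    out.append((high << 4) | nibble)  # second digit: emit byte
                    high = None
            # Assumption: anything else (spaces, tabs, ...) is ignored.
    return bytes(out)


snippet = """\
; Our main function
# :_start ; (0x8048054)
58 # POP_EAX ; Get the number of arguments
"""

# The three lines above contain exactly one byte of program code.
assert hex0_assemble(snippet) == bytes([0x58])
```

Everything after a ; or # is dropped, so the only output for the snippet is the single byte 0x58.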
Starting from this 357-byte hex0-seed binary provided by the
bootstrap-seeds, the stage0-posix package created by Jeremiah Orians
first builds hex0 and then all the way up: hex1, catm, hex2, M0,
cc_x86, M1, M2, get_machine (that’s all of MesCC-Tools), and finally
M2-Planet.
The new GNU Mes v0.24 release can be built with M2-Planet. This time,
with only a remarkably small change, the bottom of the package graph
now looks like this (woohoo!):
gcc-mesboot (4.9.4)
^
|
(...)
^
|
binutils-mesboot (2.20.1a), glibc-mesboot (2.2.5),
gcc-core-mesboot (2.95.3)
^
|
patch-mesboot (2.5.9)
^
|
bootstrappable-tcc (0.9.26+31 patches)
^
|
gnu-make-mesboot0 (3.80)
^
|
gzip-mesboot (1.2.4)
^
|
tcc-boot (0.9.27)
^
|
mes-boot (0.24.2)
^
|
stage0-posix (hex0..M2-Planet)
^
|
gash-boot, gash-utils-boot
^
|
*
bootstrap-seeds (357-bytes for x86)
~~~
[bootstrap-guile-2.0.9 driver (~25 MiB)]
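As a rough illustration, the lower part of this graph can be modelled as a dependency map and sorted into a build order. The sketch below is deliberately simplified: it treats the diagram as a strictly linear chain, stops before the elided "(...)" steps, and ignores the guile-bootstrap driver. Real Guix package definitions have more inputs than this, and the variable names are only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Bottom of the graph, read from the seed upwards (simplified to a chain).
chain = [
    "bootstrap-seeds",
    "gash-boot, gash-utils-boot",
    "stage0-posix",
    "mes-boot",
    "tcc-boot",
    "gzip-mesboot",
    "gnu-make-mesboot0",
    "bootstrappable-tcc",
    "patch-mesboot",
    "binutils-mesboot, glibc-mesboot, gcc-core-mesboot",
]

# Each node depends on the one directly below it; the seed depends on nothing.
deps = {pkg: {below} for below, pkg in zip(chain, chain[1:])}
deps[chain[0]] = set()

# A topological sort yields an order in which every node is built
# only after the node it depends on.
order = list(TopologicalSorter(deps).static_order())
assert order[0] == "bootstrap-seeds"
assert order == chain
```

The point of the exercise is the first assertion: every node in the order ultimately traces back to the 357-byte seed.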
We are excited that the NLnet Foundation has been
sponsoring this work!
However, we aren’t done yet; far from it.
Lost Paths
The idea of reproducible builds and bootstrappable software is not
very new.
Much of that was implemented for the GNU tools in the early 1990s.
Working to recreate it in present time shows us much of that practice
was forgotten.
Most bootstrap problems or loops are not so easy to solve and
sometimes there are no obvious answers, for example:
- In 2013, the year that Reproducible Builds started to gain some
  traction, the GNU Compiler Collection released version 4.8.0,
  making C++ a build requirement, and
- Even more recently (2018), the GNU C Library glibc-2.28 added Python
  as a build requirement.
While these examples make for a delightful puzzle from a
bootstrappability perspective, we would love to see the maintainers of
GNU packages consider bootstrappability and start taking more
responsibility for the bootstrap story of their packages.
Next Steps
Despite this major achievement, there is still work ahead.
First, while the package graph is rooted in a 357-byte program, the set
of binaries from which packages are built includes a 25 MiB
statically-linked Guile, guile-bootstrap, that Guix uses as its driver
to build the initial packages. 25 MiB is a tenth of what the initial
bootstrap binaries used to weigh, but it is a lot compared to those 357
bytes. Can we get rid of this driver, and how?
A development effort with Timothy Sample addresses the dependency on
guile-bootstrap of Gash and Gash-Utils, the pure-Scheme POSIX shell
implementation central to our second milestone.
On the one hand, Mes is gaining a higher level of Guile compatibility:
hash table interface, record interface, variables and variable-lookup,
and Guile (source) module loading support. On the other hand, Gash
and Gash-Utils are getting Mes compatibility for features that Mes is
lacking (notably syntax-case macros). If we pull this off,
guile-bootstrap will only be used as a dependency of bootar and as
the driver for Guix.
Second, the full-source bootstrap that just landed in Guix master is
limited to x86_64-linux and i686-linux, but ARM and RISC-V will be
joining soon. We are most grateful and excited that the NLnet
Foundation has decided to continue sponsoring this
work!
Some time ago, Wladimir van der Laan contributed initial RISC-V
support for Mes but a major obstacle for the RISC-V bootstrap is that
the “vintage” GCC-2.95.3 that was such a helpful stepping stone does
not support RISC-V. Worse, the RISC-V port of GCC was introduced only
in GCC 7.5.0—a version that requires C++ and cannot be
bootstrapped! To this end, we have been improving MesCC, the C
compiler that comes with Mes, so it is able to
build GCC 4.6.5; meanwhile, Ekaitz Zarraga backported RISC-V support
to GCC 4.6.5, and backported RISC-V support from the latest tcc to our
bootstrappable-tcc.
Outlook
The full-source bootstrap was once deemed impossible. Yet, here we are,
building the foundations of a GNU/Linux distro entirely from source, a
long way towards the ideal that the Guix project has been aiming for
from the start.
There are still some daunting tasks ahead. For example, what about the
Linux kernel? The good news is that the bootstrappable community has
grown a lot: from two people six years ago, there are now around 100
people in the #bootstrappable IRC channel. Interesting times ahead!
About Bootstrappable Builds and GNU Mes
Software is bootstrappable when it does not depend on a binary seed
that cannot be built from source. Software that is not
bootstrappable—even if it is free software—is a serious security
risk (supply chain security) for a variety of reasons.
The Bootstrappable Builds project aims
to reduce the number and size of binary seeds to a bare minimum.
GNU Mes is closely related to the
Bootstrappable Builds project. Mes is used in the full-source
bootstrap path for the Guix System.
Currently, Mes consists of a mutually self-hosting Scheme interpreter
and C compiler. It also implements a C library. Mes, the Scheme
interpreter, is written in about 5,000 lines of simple C and can be
built with M2-Planet. MesCC, the C compiler, is written in Scheme.
Together, Mes and MesCC can compile a bootstrappable TinyCC that is
self-hosting. Using this TinyCC and the Mes C library, the entire Guix
System for i686-linux and x86_64-linux is bootstrapped.
About GNU Guix
GNU Guix is a transactional package manager and
an advanced distribution of the GNU system that respects user
freedom.
Guix can be used on top of any system running the Hurd or the Linux
kernel, or it can be used as a standalone operating system distribution
for i686, x86_64, ARMv7, AArch64 and POWER9 machines.
In addition to standard package management features, Guix supports
transactional upgrades and roll-backs, unprivileged package management,
per-user profiles, and garbage collection. When used as a standalone
GNU/Linux distribution, Guix offers a declarative, stateless approach to
operating system configuration management. Guix is highly customizable
and hackable through Guile programming interfaces and extensions to
the Scheme language.