Are you developing or contributing to software that is affected by the bootstrapping problem? The following sections list best practices and practical examples that can help you pull yourself up by your own bootstraps, whether you are a compiler writer, a build system developer, or a system distribution developer.
If you're working on a compiler that is written in a language other than the one it's compiling, you're all set!
If your compiler is written in the language that it's compiling (“self-hosted”), it probably falls into one of the following categories.
If other implementations of this programming language exist, please make sure your compiler can be built with one of these. Examples include:
If your compiler targets a language for which no other implementation exists, then please consider maintaining a (minimal) implementation of the language written in a different language. Most likely such an implementation exists, or existed when the programming language was created. Maintaining this alternate implementation has a cost; however, this cost should be minimal if the alternate implementation is used routinely to build the compiler and is kept simple: it does not need to be optimized.
Please let us know if you’d like to add your compiler to this list!
Build systems sometimes have chicken-and-egg problems: they may need a version of themselves to be built. If you are developing a build system, this can be avoided. We recommend that you provide an alternative way to build your build system.
make implementation. It can be built using a shell script.
Build systems are generally easier to bootstrap safely than self-hosted compilers, which may need a full compiler for their own language. A slow, inefficient build written in shell scripts or in a different, older build system (such as Ant or GNU Make) may suffice to produce a minimal version of the build system, which can then bootstrap the complete version.
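As a rough illustration, such a bootstrap path can be a plain script that compiles every source file in a fixed order, with no dependency tracking or incremental rebuilds: slow, but easy to audit. The sketch below is written in Python for readability (in practice this is often a POSIX shell script), and it assumes a C-based build tool; the compiler name `cc`, the source file names, and the output name `mybuild-stage0` are all hypothetical.

```python
import os
import subprocess

# Hypothetical sources of a minimal "stage 0" build tool, compiled in a fixed
# order. There is no dependency graph and no timestamp checking: everything is
# rebuilt on every run, which is slow but easy to audit.
SOURCES = ["src/main.c", "src/parser.c", "src/runner.c"]
CC = "cc"                  # assumed C compiler on the PATH
OUTPUT = "mybuild-stage0"  # hypothetical name of the minimal build tool


def stage0_commands(sources=SOURCES, cc=CC, output=OUTPUT):
    """Return, in order, the compiler invocations that build the minimal tool."""
    commands = []
    objects = []
    for src in sources:
        obj = os.path.splitext(src)[0] + ".o"
        commands.append([cc, "-c", src, "-o", obj])
        objects.append(obj)
    # Final link step producing the stage-0 build tool.
    commands.append([cc, *objects, "-o", output])
    return commands


def bootstrap(dry_run=True):
    """Print each command and, unless dry_run is set, execute it,
    stopping at the first failure."""
    commands = stage0_commands()
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    bootstrap(dry_run=True)
```

The resulting minimal tool would then be used to build the complete version of the build system, so no prebuilt copy of the build system itself is required.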
It is unavoidable that distributions use some binaries as part of their bootstrap chain. However, distributions should endeavour to provide traceability and automated reproducibility for such binaries. This means that:
For example, a distribution might use a binary package of GCC to build GCC from source. This bootstrap binary is in most cases built from a previous revision of the distribution's GCC package. Thus, the distribution can label the binary with something like "this package was built by running <command> on revision <hash> of the distribution's package repository." A user can then easily reproduce the binary by fetching the specified sources and running the specified command. This build will in most cases depend on a previous generation of bootstrap binaries. Thus, we get a chain of verifiable bootstrap binaries stretching back in time.
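One way to make such labels machine-checkable is to keep a small provenance record for each bootstrap binary: the command, the repository revision, a checksum of the resulting binary, and the checksum of the previous-generation binary used to build it. The sketch below assumes SHA-256 checksums; the `Provenance` record, its field names, and the helper functions are hypothetical, and stand-in bytes replace real compiler binaries. It checks that each record links to the one before it, and that a local rebuild matches the recorded checksum.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance label for one bootstrap binary."""
    command: str               # command that was run to produce the binary
    revision: str              # revision of the distribution's package repository
    sha256: str                # checksum of the resulting binary
    built_with: Optional[str]  # sha256 of the previous-generation bootstrap
    # binary, or None at the root of the chain (e.g. an upstream binary)


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a binary's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_chain(chain: list[Provenance]) -> bool:
    """Check a chain ordered oldest first: each binary must have been built
    with the binary described by the record immediately before it."""
    if not chain:
        return False
    return all(
        current.built_with == previous.sha256
        for previous, current in zip(chain, chain[1:])
    )


def matches_rebuild(record: Provenance, rebuilt_binary: bytes) -> bool:
    """True if a local rebuild reproduces the recorded binary bit for bit."""
    return sha256_of(rebuilt_binary) == record.sha256


if __name__ == "__main__":
    # Stand-in bytes instead of real compiler binaries.
    gen0 = b"bootstrap compiler, generation 0"
    gen1 = b"bootstrap compiler, generation 1"
    chain = [
        Provenance("<command>", "<hash 0>", sha256_of(gen0), None),
        Provenance("<command>", "<hash 1>", sha256_of(gen1), sha256_of(gen0)),
    ]
    print(verify_chain(chain))              # True
    print(matches_rebuild(chain[1], gen1))  # True
```

Comparing checksums only works if the build is bit-for-bit reproducible; that is what lets a user confirm that rerunning the labelled command yields the same binary.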
Bootstrap binaries may also come from upstream. This is typically the case when a language is first added to a distribution. In this case, it may not be obvious how the binary can be reproduced, but the distribution should at least clearly label the provenance of the binary, e.g. "this binary was downloaded from https://upstream-compiler.example.org/upstream-compiler-20161211-x86_64-linux.tar.xz."
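For an upstream binary, a minimal label might record the download URL together with a checksum of the file as received, so that anyone can later confirm they are looking at the same binary even before it can be rebuilt from source. The sketch below assumes SHA-256 and a JSON label stored next to the binary; the label's field names, the `.provenance.json` suffix, and the helper functions are hypothetical, and the URL is the example from the text.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hex-encoded SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def label_upstream_binary(path: Path, url: str) -> dict:
    """Build a provenance label for a binary downloaded from upstream."""
    return {
        "origin": "upstream",
        "url": url,
        "file": path.name,
        "sha256": sha256_file(path),
        # How to rebuild this binary from source is not (yet) known.
        "reproduce_with": None,
    }


def write_label(path: Path, label: dict) -> Path:
    """Store the label next to the binary as <file>.provenance.json."""
    label_path = path.with_name(path.name + ".provenance.json")
    label_path.write_text(json.dumps(label, indent=2) + "\n")
    return label_path


def check_label(path: Path, label: dict) -> bool:
    """True if the file on disk is still the binary that was labelled."""
    return sha256_file(path) == label["sha256"]


if __name__ == "__main__":
    url = "https://upstream-compiler.example.org/upstream-compiler-20161211-x86_64-linux.tar.xz"
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in file instead of a real download.
        binary = Path(tmp) / "upstream-compiler-20161211-x86_64-linux.tar.xz"
        binary.write_bytes(b"placeholder contents")
        label = label_upstream_binary(binary, url)
        print(write_label(binary, label).read_text())
        print(check_label(binary, label))  # True
```

Once the distribution learns how to rebuild the binary, the `reproduce_with` field could be filled in, turning this root label into the first link of a verifiable chain.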