More details about the toolchain
For those unfamiliar with the tools described in the other blog post, “Running a D game in the browser”, here are short descriptions of what they are and how they work.
LLVM is a compiler project that aims for a modular architecture and a hacker-friendly codebase. The goal is to let people develop custom tools around the compilation process.
The LLVM project revolves around its intermediate representation (IR), whose binary encoding is known as “LLVM bitcode”. Most compilers use some kind of IR internally, but they don’t generally let the user access or manipulate that IR.
Without access to the compiler’s intermediate representation, lots of interesting things become impossible. For example, the IR could be used as the input of a compilation cache, instead of the preprocessed source files that ccache uses, which would make the cache language-agnostic.
It also makes it much harder to write custom frontends or backends. If the backends aren’t available as a library, and the intermediate representation can’t be written to a file, you have to integrate your custom backend into the main compiler executable, which isn’t always pretty if the original compiler isn’t modular enough.
LLVM made a strategic design choice here: LLVM frontends can be built separately from LLVM, and they let the user generate LLVM bitcode from her source files instead of machine code for a specific target.
This bitcode format is mostly target-architecture agnostic. This doesn’t mean compiled programs are architecture agnostic too: for example, C programs can explicitly introduce architecture-dependent things through directives like #ifdef _X86_ or the sizeof operator, which the frontend resolves before emitting any bitcode. In fact, this bitcode format can be seen as a kind of flattened C, i.e. C without structured control flow: it has structs, function pointers, integer types, etc.
Once a program has been compiled to LLVM bitcode:
- it’s free from the specifics of the language it was written in
- it has not yet been polluted with the specifics of any target architecture
At this point, you can compile it to native code for a real architecture (with the LLVM static compiler, ‘llc’), or execute it directly (with ‘lli’, which runs it through a just-in-time compiler or an interpreter).
Technically, a program in LLVM bitcode is stored either in a “.bc” file (binary LLVM bitcode) or in a “.ll” file (the human-readable form, which looks like assembly language). LLVM provides tools (llvm-as and llvm-dis) to convert between the two representations.
Some LLVM frontends: clang and ldc
The C and C++ frontend targeting LLVM is called “clang” (pronounced “Klang!”). clang outputs LLVM bitcode when “-emit-llvm” is added to the command line (together with “-c” for a binary .bc file, or “-S” for a textual .ll file).
The LLVM frontend for the D programming language is called “ldc”. It’s a clever combination of the official D frontend from dmd (the reference D compiler) and code generation logic that emits LLVM bitcode. ldc can output LLVM bitcode when “-output-ll” or “-output-bc” is added to the command line.
At this point, I should mention that, although lots of good things can be said about LLVM and its frontends, there is also a lot of clumsiness that can sometimes get painful to work with. The command lines of nearly all LLVM tools, including the frontends, are messy and heterogeneous: the tools I used can’t seem to agree on a command-line syntax. For example, to specify the output file, sometimes it’s “-o <file>”, sometimes it’s “-of<file>”, and sometimes it’s “-o=<file>”. The way of specifying target architectures on the command line is horribly counterintuitive.
The LLVM frontends are pluggable: they can be built outside the LLVM source tree, and LLVM doesn’t need to know about them. For example, if you’re on Debian, you can just “apt-get install llvm-3.8-dev” and start writing a custom frontend for your own toy language.
The LLVM backends, which implement the targets, are not pluggable, however. The frontends need to know about them, as each frontend executable somehow embeds the list of available backends.
In a perfect world, words like “x86”, “arm”, “linux” or “windows” would never appear in frontend code. They would be provided dynamically by the backend, so that the frontend could, for example, #define the appropriate symbols (_WIN32, __linux__, etc.) for the source code to depend on.
In a perfect world, backends would be compiled as plugins, and writing your own backend would be as easy as writing your own frontend. In other words, adding a new backend shouldn’t require modifying the LLVM code itself, or even recompiling it, nor recompiling any frontend.
This point is a serious design weakness, and it is the root of nearly all the difficulties I had to deal with in this project.
If you haven’t seen the Emscripten-powered demo, “BananaBread”, go see it now: https://kripken.github.io/BananaBread/wasm-demo/index.html
Then you will understand why this technique is a game changer: we can now play first-person shooters in our browsers!
asm.js is not exactly the future, though. Technically, it’s a hack: a strict subset of JavaScript that engines can recognize and optimize, in an attempt to make browsers a decent target platform for non-web static languages like C++, Rust … or D.
A non-hacky solution, WebAssembly, is currently being developed, and support for it is making its way into our browsers.
However, I don’t consider WebAssembly mature enough at the moment, so I’m going to stick to asm.js here, and to its main compiler: Emscripten.
In theory, once a program has been converted to LLVM IR, it doesn’t matter what language it was written in: it could be C, C++, Rust, or anything else that compiles to LLVM bitcode (I’m disregarding the runtime environment here). So Emscripten should work with any language that has an LLVM frontend (and whose semantics don’t differ too much from C++), including … the D programming language.
To complicate matters, since LLVM backends aren’t pluggable, Emscripten comes with its own forked versions of LLVM and clang (namely “fastcomp” and “fastcomp-clang”), against which ldc and other LLVM tools generally cannot be compiled, because of incompatible API versions.
Emscripten also provides a great runtime environment, in which many I/O APIs (SDL, SDL_mixer, OpenGL…) have been re-implemented.