Fancy tools can’t tame over-complicated designs

Bugs are only the tip of the iceberg … and over-complicated designs eat fancy tools for breakfast.

The other day, I was playing with reversing challenges (crackmes), and I had a funny thought.

The principle of crackmes is simple: you’re given a small executable file, for which you don’t have the source code. When launched, it asks for some secret password. Your goal is to find the password, generally by analyzing the given executable binary.

Binary reversing requires many tools, amongst them a debugger (generally gdb/OllyDbg/x64dbg), with every conceivable type of conditional breakpoint, time travel, highly precise execution traces, etc.

And I realized a funny thing: to understand the workings of a closed-source program that was obfuscated on purpose, I was using nearly the same tools and techniques that people use to understand their own code.

“The amount of energy necessary to refute bullshit is an order of magnitude bigger than to produce it.” (Alberto Brandolini)

“Everyone knows that debugging is twice as hard as writing a program in the first place.” (Brian Kernighan)

Suppose we play the following game: I must write a program, following some known spec, and you must find all bugs in it. However, I’m an evil coder, and my goal is to sneak in bugs that you won’t catch.

You can’t win at this game. Sneaking in a bug will take me “one order of magnitude” less energy than what you will need to detect it. To beat me at this game, you would need, at the very least, a team of 10 skilled QA engineers, probably with access to the source code.

I think there’s a deep idea behind this, probably something related to entropy: the idea that it’s always easier to make a mess than to clean it up afterwards. And, probably, easier “by one order of magnitude”.

Tools work exclusively on the “mess cleaning” side. I’m not talking only about debuggers; I’m also talking about sanitizers, static analyzers, and, to some extent, fuzzers and integration tests.

These tools are thus fundamentally condemned to require horribly sophisticated techniques/heuristics, in order to “untangle” what the programmer wrote.

Relying on this “mess cleaning” side to understand the program you’re working on is an extremely slow approach. Sometimes, it’s the only possible one (e.g. malware analysis).

But more often, it’s simply the default approach most developers think of (on this specific point, one could argue that Visual Studio’s great debugger harmed the software industry).

Finding bugs is all well and good, but it’s “one order of magnitude” less efficient than having a codebase that’s not full of traps – a codebase that minimizes your probability of writing code that would introduce new mistakes.

Let’s suppose that one day you get a magical compiler, able to tell you, instantly and with certainty, whether you’ve just introduced a bug or not.

Bad software designers would still end up in a dead end: at some point, the design would have become so complicated that the vast majority of attempted modifications would introduce a new mistake. The magical compiler would catch them early, and the bad software designer would have to undo them and try again.

They would become trapped in their own over-complicated design. Fragile code would have been transformed into rigid code; and the goal would have been missed.

(You don’t need a magical compiler to experience this situation; a badly written suite of unit tests is enough – and it’s possible that many of our codebases already are in such dead ends, it’s just that we can’t directly see it.)

Thus, even the best imaginable tool doesn’t solve the issue. This is because catching all my mistakes early is only the tip of the iceberg: I also want to be able to add features in a reasonable time.

In short, I want my code to be correct, and flexible.

And flexibility isn’t a refinement of correctness. Quite the opposite, in fact.

Non-intrusive test methods are exactly the tools that act on the “mess cleaning” side I was describing earlier: they can help make the code correct, but they do nothing to make it flexible.

Let’s suppose you put a 9-year-old in front of a word processor, and you ask him to write a 300-page novel.

Now, you read the result. The first thing you notice is that it’s full of spelling mistakes, and ambiguous wordings.

Now, you say to yourself: “ok, so now, I just have to fix all the mistakes”, and you launch your favorite spellchecker.

Interestingly, after the spellchecking + corrections, the text is still very poor, and you don’t understand why. Yet, you’ve fixed all the mistakes!

You read the text again, and this time, you notice that some very subtle spelling mistakes have made their way through the spellchecker. So you think: I need a better spellchecker!

If you can find a spellchecker able to tell you what corrections you need to make to transform your initial text into Shakespeare, good for you; your tool is very powerful (and can probably replace you).

But you don’t necessarily want Shakespeare; Dan Brown would be okay!

So you decide to ignore the remaining spelling mistakes and start digging into semantics. And then, you realize that the text is full of inconsistencies, like characters using information they’re not supposed to know. Or things happening simultaneously in several places. Or causality violations.

And then, you say to yourself: I know what I need: a tool to check the consistency of a story! And by chance, it’s exactly the subject of a research paper published three months ago, describing a heuristic to detect inconsistencies in written stories. Great!

Except that it doesn’t detect all inconsistencies, and worse: it doesn’t tell you how to fix the ones it detects.

So you start patching one by one all the errors reported by your new tool. Two weeks of tedious rewording later, the tool doesn’t report anything anymore.

You end up with a patched text. Each paragraph is now nearly consistent, but the full text still isn’t. By focusing on the mistakes detected by the tool, you have, without knowing it, applied fixes to very local issues, and ignored the global ones.

And this way of fixing has simply moved the errors (inconsistencies) out of the detection zone of your new fancy tool. Even worse: it’s now even more difficult to modify your story without creating new inconsistencies.

In short, you just made your novel more difficult to validate, and more rigid. And in the process, you have blinded a state-of-the-art error-detection tool. And your novel still doesn’t even come near Dan Brown.

I think we all agree that this clearly isn’t a good way to write a novel. A better way would probably be to teach your writer how to write better.

Don’t get me wrong: I’m not saying we should trash our error-checking tools, absolutely not! I’m saying that we should see them more as educational tools, rather than validation tools.

This means it is your 9-year-old, not you, who must analyze and understand each reported mistake – not in order to fix his text, but in order to fix his writing technique.

It’s intrusive; it’s feedback to the “mess making” side from the “mess cleaning” side, which allows the writer to understand how not to make a mess (and, by the way, that’s in my opinion the biggest benefit of TDD).

With software development, it’s roughly the same story.

There’s some “writer” process, which corresponds to the modification of your source code, and a “reviewer” process, which corresponds to the analysis and understanding of what you have written.

Of course, in practice, both processes are intertwined (except for adepts of “Debug-Later Programming”, as described in http://blog.wingman-sw.com/archives/16 ).

There are lots of ways to “analyze” what you’ve written. Anything that gives you feedback about your code counts. This includes: unit test failures, poor code coverage, static analysis failures, bug reports from customers, build times, high cyclomatic complexity, integration test failures, high defect rates, etc.

This is how one learns to keep the code simple.

If your testing methodology doesn’t feed back into the way you write code, you’re doing it in a suboptimal way.

I postulate that having this feedback is the only viable way to write software.

Bugs are only the tip of the iceberg. You don’t want to waste your time attacking them one by one. Instead, you have to attack the underwater factory that’s producing those bugs at a faster-than-you-can-handle rate.

And it has a name: it’s called software engineering.

LLVM, LDC, and Emscripten: an introduction

More details about the toolchain

For those unfamiliar with the tools described in the other blog post, “Running a D game in the browser”, here are some short descriptions of what they are, and how they work.

LLVM

LLVM is a compiler project, whose aim is to have a modular architecture and hacker-friendly codebase. The goal is to allow people to develop custom tools around the compilation process.

The LLVM project revolves around its intermediate representation format, also known as “LLVM bitcode”. Most compilers use some kind of IR internally, but they don’t generally make it possible for the user to mess with that IR.

Not having this flexibility with the compiler intermediate representation prevents lots of cool things. As a quick example, this IR could be used for caching, instead of the preprocessed files that ccache uses, thus allowing a language-agnostic compilation cache.

And it also makes it a lot harder to write custom frontends or custom backends. If the set of backends isn’t available as a library, and the intermediate representation can’t be put into a file, then it means you have to integrate your custom backend in the main compiler executable, which isn’t always pretty if the original compiler isn’t modular enough.

LLVM has made a strategic design choice here. LLVM frontends can be built separately from LLVM, and they allow the user to generate LLVM bitcode from her source file, instead of generating machine code for a specific target.

This bitcode format is mostly target architecture agnostic. This doesn’t mean compiled programs are architecture agnostic too: for example, C programs can explicitly introduce architecture-dependent things by using directives like #ifdef _X86_ or the sizeof operator. Actually, this bitcode format could be seen as some kind of flattened C, i.e. without control-flow structures. There are structs, there are function pointers, there are integer types, etc.
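
As a quick illustration (a made-up C snippet, not from any real project), here is how both mechanisms bake target-specific details into the program before the bitcode is ever emitted:

#include <stdio.h>

int main(void)
{
#ifdef _X86_
  // Resolved by the preprocessor: depending on the build target, this
  // line is either present in the emitted bitcode, or gone entirely.
  printf("compiled for x86\n");
#endif

  // sizeof is evaluated at compile time: a constant 4 or 8 gets baked
  // into the bitcode, making it architecture-dependent.
  printf("pointer size: %d bytes\n", (int)sizeof(void*));
  return 0;
}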

Once a program has been compiled to LLVM bitcode:

  • it’s freed from the specificities of the language it was written in
  • it has not yet been polluted with the specificities of any target architecture

At this point, you can compile it to executable code for a real architecture (with the LLVM compiler, ‘llc’), or you can interpret it (with the LLVM interpreter, ‘lli’).

Technically, a program in LLVM bitcode is a file whose extension is “.bc” (binary LLVM bitcode), or “.ll” (the human-readable version of LLVM bitcode, which looks like assembly language). LLVM provides tools allowing you to switch between both representations.
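
For example (assuming a standard LLVM installation; tool names sometimes carry a version suffix, like llc-3.8):

$ llvm-dis prog.bc -o prog.ll   # bitcode to human-readable IR
$ llvm-as prog.ll -o prog.bc    # human-readable IR back to bitcode
$ llc prog.bc -o prog.s         # compile bitcode to native assembly
$ lli prog.bc                   # interpret the bitcode directly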


Some LLVM frontends: clang and ldc

The C++ frontend targeting LLVM is called “clang” (pronounced “Klang!”). clang can output LLVM bitcode by adding “-emit-llvm” to the command line.

The LLVM frontend for the D programming language is called “ldc”. It’s a clever assemblage of the official D frontend from dmd (the reference D compiler) and code generation logic to emit LLVM bitcode. ldc can output LLVM bitcode by adding “-output-ll” or “-output-bc” to the command line.

At this point, I should mention that, although lots of good things can be said about LLVM and its frontends, there is also a lot of clumsiness that can get painful to work with. The command lines of nearly all LLVM tools, including the frontends, are messy and heterogeneous; no two of the LLVM tools I used seem to agree on the command-line syntax. For example, to specify the output file, sometimes it’s “-o <file>”, sometimes it’s “-of<file>”, sometimes it’s “-o=<file>”. And the way of specifying target architectures on the command line is horribly counterintuitive.

The LLVM frontends are pluggable, which means they can be built out-of-LLVM-tree, LLVM doesn’t need to know about them. For example, if you’re on Debian, you can just “apt-get install llvm-3.8-dev” and start writing your custom frontend for your custom toy language.

The LLVM backends, implementing the targets, on the other hand, aren’t pluggable. Indeed, the frontends need to know about them, as each frontend executable somehow embeds the list of available backends.

In a perfect world, words like “x86”, “arm”, “linux”, “windows” would never appear in frontend code. They would be dynamically provided by the backend, so the frontend could, for example, #define the appropriate symbols (_WIN32, __linux__, etc.) for the source code to depend on.

In a perfect world, backends would be compiled as plugins, and writing your own backend would be as easy as writing your own frontend. In short, to add a new backend, you shouldn’t have to modify the LLVM code itself, nor recompile LLVM, nor recompile any frontend.

This point is a serious design weakness, and it’s the root of nearly all the difficulties I had to deal with in this project.

asm.js

asm.js is a highly-optimizable subset of Javascript. It mostly relies on enforcing static typing through standard Javascript constructs. So, asm.js code is technically valid Javascript code, but most modern browsers will recognize asm.js code as such, and apply special optimizations before feeding it to their Javascript VM.
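
For instance, here is the kind of code asm.js is made of (a hand-written sketch of the idiom, not actual compiler output); the |0 coercions are how it encodes “this value is an int”:

function add(x, y) {
  x = x|0;           // parameter type annotation: x is an int
  y = y|0;           // same for y
  return (x + y)|0;  // the result is an int too
}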

If you haven’t seen the Emscripten-powered demo, “BananaBread”, go see it now:  https://kripken.github.io/BananaBread/wasm-demo/index.html

Then you will understand why this technique is a game changer: now we can play first-person shooters in our browsers!

asm.js is not exactly the future, though. Technically, it’s a hack, an attempt to change the way browsers work, to make them a decent target platform for non-web static languages, like C++, Rust … or D.

A non-hacky solution, WebAssembly, is currently being developed, and its support is making its way into our browsers.

However, at the moment, I consider WebAssembly not mature enough: I’m going to stick to asm.js here, and to its main compiler: Emscripten.

Emscripten

Emscripten is a hell of a cool idea. In short, it’s a translator from LLVM bitcode to asm.js (the highly-optimizable subset of Javascript described above).

In theory, once a program is converted to LLVM IR, it doesn’t matter what language it was written in: it could be C, C++, Rust, or anything that compiles to LLVM bitcode (I’m disregarding the runtime environment here). So Emscripten should work with any language that has an LLVM frontend (and whose semantics don’t differ too much from C++’s), especially … the D programming language.

To complicate matters, since LLVM backends aren’t pluggable, Emscripten actually comes with its own forked versions of LLVM and clang (namely “fastcomp” and “fastcomp-clang”), against which ldc and other LLVM tools generally cannot be compiled, because of incompatible API versions.

Emscripten also comes with a great runtime environment, where a lot of I/O APIs (SDL, SDL_mixer, OpenGL…) have been re-implemented.

This is actually where most of the magic occurs. It doesn’t matter how a function call like “SDL_SetVideoMode(640, 480, 32, 0)” gets translated to its Javascript equivalent.

What really matters is how the callee is implemented (hint: in Javascript). Compiling SDL itself to Javascript using the same method would just push the problem further: the SDL code in turn calls other APIs (winmm, DirectDraw, X11 …) that, of course, you do not have in a browser.


Running a D game in the browser

TL;DR: After two years of experimentation, I finally managed to run D code inside my browser. The conversion chain is D-to-LLVM-to-C-to-ASMJS, and uses ldc, Emscripten, and an unexpectedly useful tool from the Julia community.

Demo here: http://code.alaiwan.org/dscripten/full.html

First attempt with LDC

My initial idea was to compile ldc against emscripten-fastcomp (the Emscripten fork of LLVM), so I would get a D compiler that would target Javascript (spoiler: it doesn’t work).

First, I had to get it to build. LLVM APIs are incompatible between major releases, and the ldc build system relies on the version number reported by llvm-config. However, as the emscripten-fastcomp API is halfway between two LLVM major releases, this version number makes almost no sense, and I had to modify ldc to get it to build.

At this point, however, the only thing I could get when trying to compile D code to Javascript was assertion errors. It turns out the generated bitcode doesn’t pass the “PNaCl legalization passes”. I have no clue what this means; however, I do understand that the bitcode generated by ldc differs in nature from the one generated by clang (most probably, it uses parts of the bitcode format that clang doesn’t use).

After months of trial and error, I finally gave up and, with a heavy heart, went back to C++. I made two toy projects using the SDL and vanilla Emscripten.


Writing these two projects was an incredibly instructive experience. It allowed me to fully realize the inherent limitations of running inside a browser. This was also the occasion to learn the OpenGL core profile, which was, at the time, the only way to do 3D graphics from Emscripten.

The source code for these two projects is available at the following Bazaar repositories:

$ bzr checkout http://code.alaiwan.org/bzr/games/shooter

$ bzr checkout http://code.alaiwan.org/bzr/games/deeep

A new hope: the “resurrected” LLVM C backend

So, compiling C++ to Javascript works. If I could translate my D programs to C or C++, then I could feed them to Emscripten without having to mess too much with the bitcode files, and without having to mess at all with fastcomp internals.

It turns out LLVM currently has a “C++ backend”: however, it produces C++ code making LLVM API calls, whose execution results in the construction of LLVM data structures representing your program. Obviously, this isn’t what we want here.

LLVM used to have a real C backend, translating LLVM bitcode to standalone C code. This is exactly what we need. Alas, the project was unmaintained, and it was discontinued by the LLVM developers in mid-2012, with the release of LLVM 3.1.

The LLVM C backend seems to have been resurrected by Julia developers in the summer of 2014. The project is named llvm-cbe and is available on github (https://github.com/JuliaComputing). And that’s great news, because it means I can now translate a D program back to C code … and then feed it to Emscripten!

Well, in theory. In practice, things didn’t turn out so smoothly.

The C backend strikes back

Firstly, the LLVM C backend in its current state requires exactly LLVM 3.7. And of course, you can’t compile it against fastcomp (the Emscripten LLVM fork). So we’re going to have to deal with two LLVM toolchains here; let’s hope everything will be compatible.

Secondly, the LLVM C backend sometimes generates invalid C code, i.e. code which doesn’t compile. Once again, it seems that its authors restricted themselves to the bitcode subset emitted by clang (although, with a lot of effort, it’s possible to craft a C++ file whose compilation with clang will produce evil bitcode, and this evil bitcode will lead llvm-cbe into producing invalid C code: https://github.com/JuliaComputing/llvm-cbe/issues/2#issuecomment-236424508).

I had to fork llvm-cbe and fix some issues myself, hoping that my pull requests would be accepted.

The return of Emscripten

At this point, I’m able to translate some D programs to standalone, gcc-compilable C code. Which means the only thing left is to feed it to Emscripten.

So, to summarize the whole working compilation pipeline:

  • I generate LLVM bitcode for all my D source files with LDC (compiled against LLVM 3.7)
  • for practical reasons, I use llvm-link to merge all my bitcode files into one single bitcode file.
  • I compile this single bitcode file back to C source code, using llvm-cbe (also compiled against LLVM 3.7)
  • I feed the resulting C file to Emscripten (which internally uses its own fork of LLVM 3.9)
  • (Optionally, I feed the resulting C file to gcc, to generate a native app. It’s fun to be able to do that, and it’s a lot easier to debug)
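
Spelled out as shell commands, the pipeline above looks roughly like this (a sketch: the flags are from memory, the ldc binary is typically named ldc2, and llvm-cbe’s exact output naming may differ between versions):

$ ldc2 -output-bc main.d game.d         # D modules to LLVM bitcode
$ llvm-link main.bc game.bc -o all.bc   # merge into one bitcode file
$ llvm-cbe all.bc                       # bitcode back to C (all.cbe.c)
$ emcc all.cbe.c -o game.html           # C to asm.js, via Emscripten
$ gcc all.cbe.c -o game                 # optional native build, for debugging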

Now comes the fun part: writing some D code which uses this toolchain. What I ended up with is a minimalistic real-time game using the SDL.


You can play the demo at: http://code.alaiwan.org/dscripten/full.html

The source code for the demo, the build scripts, and the toolchain deployment scripts are available at:

https://github.com/Ace17/dscripten

Limitations, and the future

The D subset

At the moment, I’m only targeting a tiny subset of the D programming language. The full language comes with a runtime, featuring a garbage collector. This is my first journey to the land of D-asm.js, and there’s no way I’m bringing the D runtime with me – especially if I need to port it to Javascript first.

Although most D programs make heavy use of the runtime, with some effort, programs that don’t require it can be written. Just tell your linker to omit any default libraries, and your code is guaranteed not to make any calls to the runtime.

This is what I’m aiming for at this point: having the runtime feature-level of C, which means no GC, no dynamic arrays, no runtime type information, no “new” operator, and no exceptions.

Due to a bug in llvm-cbe (related to global variable declaration order), I don’t have support for classes yet.

On the other hand, I still benefit from some great parts of D that happen at compile time: mixins, templates, traits, and of course CTFE.

I also don’t have Phobos, the D standard library. Most of it relies on the D runtime, or makes API calls to the underlying OS. Technically, those parts would have to be rewritten in Javascript (and maybe one day integrated into Emscripten).

WebAssembly

As you may have seen if you played with the deployment scripts, the toolchain is not very pretty. It requires adding 3 directories to your PATH, amongst them two different versions of an LLVM toolchain.

An obvious enhancement would be to use the WebAssembly backend of upstream LLVM, then Emscripten+Binaryen to convert the generated WebAssembly to asm.js. This would allow me to ditch llvm-cbe, emscripten-fastcomp and emscripten-fastcomp-clang, resulting in a lighter and more consistent toolchain.

At the moment, I have only given this idea a quick (one weekend) try, but the incompatible version requirements of the different tools are driving me crazy. llvm-cbe requires exactly LLVM 3.7 and doesn’t provide a CMakeLists.txt; however, the autoconf-based build system of LLVM is deprecated in favour of cmake, and recent versions of LLVM (3.9 at the time of writing) don’t provide a ./configure script anymore.

However, I kind of like the idea of being able to convert anything back to C code. C compilers are going to be around for a long time; C as a target platform is a far safer bet than any other format, including LLVM bitcode.

Your ideas

My goal is to continue writing my programs in D, in the most portable way, without fear of potential future platform restrictions. However, the technique I just described is too fragile at this point: it works around many incompatibilities and limitations. Still, I believe this proof-of-concept paves the way for non-hacky enhancements.

If you have any idea on how to improve this technique, please let me know. Don’t hesitate to fork the project on github, make suggestions, ask me questions, or even suggest enhancements to the current flappy-bird-frustrating gameplay of the demo!


B.S. is for build system

I have recently stumbled upon this insightful (TM) article:
The Convergence Of Build Systems And Package Managers

It’s a nice synthesis of the design issues we currently have with software build systems and package managers; especially their current tendency to go monolithic – and thus to tighten the frontiers between programming languages.

At this point in time (2016), each new programming language seems to require its own build system and/or package manager.

Perl has cpan, Python has easy_install, Javascript has npm, D has dub, Rust has cargo …

Clearly, something is wrong here. The programming language shouldn’t be the central abstraction around which my whole project revolves. I want to be able to mix any two languages in the same project, and have only ONE build system to invoke to build everything. And I mean in parallel, of course.

We can’t seem to get the build system architecture right. There really are lots of them, which suggests we haven’t found the solution yet.

GNU Make, which is language agnostic, is definitely one popular standard, but it has lots of limitations. Some of them are superficial (unusual cabalistic syntax, text-based parser); some others are a lot more worrying: automatic dependency generation, the ‘remake’ pass, no clear difference between file targets and phony ones, the sole existence of ‘make clean’, the need to manage temporary files yourself, etc.

A lot of build systems try to lure us into thinking that all the needed information can be deduced from the source files. It’s a fallacy!
Additional information, supplied at the build-system level, will always be needed: think about OS-dependent source files, generated source files, additional link libraries, DSL source files, pluggable factories (link-time registration), preprocessor defines, optimization flags, etc.

Some of those issues can be mitigated with compiler hacks like “#pragma comment(lib, "winmm.lib")” (good luck with portability, or with library search directories) … or text mixins (see http://dlang.org/mixin.html ) (good luck with automatic dependency generation).
My take on the issue is that we’re heading the wrong way. We’re trying to fit into source files information which really doesn’t belong in source files.

Our source files most probably already contain too much information.

A big part of the problem: build systems have to dig into source files to compute a part of the build configuration (especially the dependency graph). This is a violation of layered architecture. It’s a lot harder to be programming-language agnostic if you need to dig into source files. You can mitigate the language-agnosticism issue with more hacks like “gcc -MM” (which only outputs GNU make syntax) – but what about generated files? Especially if they’re going to be generated … in a separate “output” directory.
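
For reference, here is what “gcc -MM” prints for a hypothetical main.cpp including lib.h and client.h – a ready-made GNU make rule, and nothing else:

$ gcc -MM main.cpp
main.o: main.cpp lib.h client.h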

My point is that the current compiler interface we all use might be wrong:

ObjectFile compile(SourceFile src)

What we might need, at the compiler command-line level, is something more like:
ObjectFile compile(SourceFile src, SourceFile[] imports)

The files of the “imports” array would be dynamically translated by the compiler into “import”/“#include”/“use” directives.

In plain old command-line English, it goes like this:

$ gcc main.cpp --import lib.h --import client.h

You already know this technique. It’s called dependency injection!

This is a significant change in our programming habits. It requires additional discipline in our usage of ‘import’ directives – a discipline we already respect most of the time, by grouping our ‘import’ directives at the top of our source files. C programmers might feel a lot more restricted (they would no longer be able to include the same file several times with different symbol #defines).

It greatly reduces friction when it comes to testing classes; no need for complicated dependency injection frameworks, just change your compiler command line to ‘import’ the mock versions of the dependencies.
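
Reusing the hypothetical --import flag from above, swapping a dependency for its mock becomes a pure command-line matter (http_api.h and http_mock.h are made-up names for this illustration):

$ gcc -c client.cpp --import http_api.h    # production build
$ gcc -c client.cpp --import http_mock.h   # test build: same source, mocked dependency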

This technique also brings you, by construction, the whole DAG of your source file dependencies. No need for a first build just to print a dependency graph, or for funny things like “dynamic dependency discovery”. After all, this graph IS static.


Generating mazes, part 2: adding keys and structure

In a previous post, I described how to generate a ‘logic’ maze, i.e. a graph representing color-connected rooms. These mazes had no walls, nor corridors. In order to have something displayable, I had to work around spatial layout issues using a trick (starting from a planar graph whose nodes already had a position – in short, triangulating a bunch of random dots).

I also left alone the problem of key placement. It turns out having locked doors in your maze takes on a whole new dimension, once you have the possibility to unlock them!

At this point, our mazes have no walls, nor keys. These two issues are completely unrelated; however, I’m going to describe how I solved them both.

First, the walls. The trick is to start with what I call a “grid-graph”. It’s nothing more than the graph of connections between adjacent cells in a grid; here’s an example:

[image: the initial grid-graph]

Like we did before, we dig our maze by extracting a random spanning tree from this graph. However, there are two twists here. First, by convention, the root node is always the top-left one. Second, in order to have corridors, I decided to favor keeping “horizontal” edges in my spanning tree. This way, the connections between cells are mostly horizontal, causing horizontal corridors to be generated. This kind of maze could be used, for example, in a 2D side-scroller.
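
Here is one way this bias could be implemented (a C++ sketch with assumed types – the actual generator is written in D, and may proceed differently): run Kruskal’s algorithm on random edge weights, with a penalty on vertical edges.

#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

struct Edge { int a, b; bool horizontal; };

// union-find 'find', with path compression
static int find(std::vector<int>& parent, int n)
{
  while(parent[n] != n)
    n = parent[n] = parent[parent[n]];
  return n;
}

std::vector<Edge> biasedSpanningTree(int nodeCount, const std::vector<Edge>& edges, std::mt19937& rng)
{
  // Give each edge a random weight in [0;1), plus a penalty of 1 for
  // vertical edges: Kruskal then keeps a horizontal edge whenever it
  // has the choice.
  std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
  std::vector<std::pair<float, Edge>> weighted;
  for(auto& e : edges)
    weighted.push_back({uniform(rng) + (e.horizontal ? 0.0f : 1.0f), e});
  std::sort(weighted.begin(), weighted.end(),
            [](const auto& x, const auto& y) { return x.first < y.first; });

  std::vector<int> parent(nodeCount);
  std::iota(parent.begin(), parent.end(), 0);

  std::vector<Edge> tree;
  for(auto& we : weighted)
  {
    const Edge& e = we.second;
    const int ra = find(parent, e.a);
    const int rb = find(parent, e.b);
    if(ra == rb)
      continue; // this edge would create a cycle: it stays a wall
    parent[ra] = rb;
    tree.push_back(e);
  }
  return tree;
}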

[image: the extracted spanning tree, with mostly horizontal edges]

OK, but where do the walls come from? I store them in a 2D matrix of cells, which I call a CellMap. Each cell holds two booleans: connectedLeft and connectedUp. These two booleans represent the presence or absence of a wall to the left of, and above, each cell.
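
Transcribed into code, that data structure could look like this (a C++ sketch of the description above – the actual code is written in D):

#include <vector>

struct Cell
{
  bool connectedLeft; // true = no wall between this cell and its left neighbour
  bool connectedUp;   // true = no wall between this cell and the cell above
};

struct CellMap
{
  int width, height;
  std::vector<Cell> cells; // row-major, width * height entries

  Cell& at(int x, int y) { return cells[x + y * width]; }
};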

Then, I need to be able to derive the CellMap corresponding to a given grid-graph. The trick here is to have a bidirectional mapping between graph nodes and cell positions, implying a bidirectional mapping between graph edges and walls.

The position of a node is simply encoded in its node index, like this:

auto nodeX = nodeId % MAP_WIDTH;
auto nodeY = nodeId / MAP_WIDTH;
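
And conversely, going from a cell position back to its node index (the other half of the bidirectional mapping):

auto nodeId = nodeX + nodeY * MAP_WIDTH;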

Once we have a spanning tree, we can apply what we already did in a previous post: from the tree, extract one sub-tree whose size is roughly an arbitrary fixed fraction of the whole tree. Then, close the door by coloring the edge connecting this sub-tree to its parent node.

[image: the first (red) zone, delimited]

Now that we have closed a red door, it’s the best time to drop the red key: we just have to avoid dropping it behind the red door. Remember, the root of the spanning tree is the top-left node. Although all gray cells would be theoretically acceptable locations for our red key, I chose to drop the red key in the deepest accessible leaf of the tree.

You can see below the red cross, representing … the red key.

[image: zone 1, with the red key (red cross)]

So far, so good: we have isolated the “red” zone. Now, let’s repeat the process for the next zone. First, delimiting it:

[image: the second zone, delimited]

Then, dropping the corresponding key somewhere deep:

[image: zone 2, with its key]

Please keep in mind – it should be obvious at this point – that we’re locking zones in the reverse order. The player will first need to get the green key, go through the green door, get the red key, then go through the red door.

One more thing: the simplistic key-placement heuristic I described above might select a leaf-cell already holding a key. In reality, the generator computes the list of leaf-cells, and selects the deepest one which doesn’t currently hold a key.
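
In code, that selection rule could look like this (a self-contained C++ sketch with assumed data structures – again, the real generator is written in D):

#include <set>
#include <vector>

// Minimal tree representation for the sketch
struct Tree
{
  std::vector<int> parent;                // parent[n], or -1 for the root
  std::vector<std::vector<int>> children; // children[n]

  int nodeCount() const { return (int)parent.size(); }
  bool isLeaf(int n) const { return children[n].empty(); }

  int depth(int n) const // distance from the root (the top-left cell)
  {
    int d = 0;
    while(parent[n] >= 0)
    {
      n = parent[n];
      ++d;
    }
    return d;
  }
};

// Pick the deepest accessible leaf that doesn't already hold a key.
// 'tree' is assumed to contain only the still-accessible part of the
// maze, i.e. nothing behind an already-closed door.
int pickKeyLocation(const Tree& tree, const std::set<int>& occupied)
{
  int best = -1;
  int bestDepth = -1;

  for(int node = 0; node < tree.nodeCount(); ++node)
  {
    if(!tree.isLeaf(node) || occupied.count(node))
      continue;

    const int depth = tree.depth(node);
    if(depth > bestDepth)
    {
      bestDepth = depth;
      best = node;
    }
  }

  return best; // -1 if no free leaf remains
}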

Skipping to the end: here are the 6 locked zones.

[image: all 6 locked zones, with their keys]

Just for fun, here’s the same algorithm, applied to a bigger map:

[image: the same algorithm applied to a bigger map]

We now have zones, walls, and keys! What’s still missing? Cycles. I explained in my previous post how to handle them; I just didn’t implement them in this demo.

The next step will be visual: my goal is to generate mazes suitable for 2D video games, so I’m going to need to mess around with tiles and graphics, to achieve something like this:

[screenshot: the “deeep” game]

As always, the whole source code for the generator is available on Launchpad here.

Reversing dependencies with pluggable factories

What if you could extend your application without modifying existing code, but only by adding new code? In software design circles, this is called the open-closed principle:

software entities (functions, classes, modules, applications, etc.) should be open to extension, but closed to modification.

At first, adding new behaviour to an existing code base without modifying it seems impossible. I’m going to describe here a design pattern called “pluggable factories”, which allows you to do exactly that.

TL;DR : think about plugins.

Let’s suppose we’re dealing here with an SVG rendering program. This program takes an SVG vector input, renders it to a frame buffer, then writes it to disk in the BMP format. The high-level code may look like this:

// main.cpp
#include "lib_bmp.h"

int main(int argc, char** argv)
{
  auto input = argv[1];
  auto output = argv[2];

  auto svg = loadSvg(input);
  auto framebuffer = render(svg);
  writeBmp(framebuffer, output);

  return 0;
}

At first, this seems fine, until you decide that you need other output formats. Then the code starts to look like this:

// main.cpp
#include "lib_png.h"
#include "lib_gif.h"
#include "lib_bmp.h"

int main(int argc, char** argv)
{
  auto input = argv[1];
  auto output = argv[2];
  auto format = getExtension(output);

  auto svg = loadSvg(input);
  auto framebuffer = render(svg);

  if(format == "png")
    writePng(framebuffer, output);
  else if(format == "gif")
    writeGif(framebuffer, output);
  else if(format == "bmp")
    writeBmp(framebuffer, output);
  else
    fatal("Unknown format");

  return 0;
}

Clearly, something is wrong here. This code doesn’t conform to the open-closed principle, as it’s not immune to the addition of new formats: each new format requires modifying the code of the main workflow. Thus, we will never be able to “close” this workflow code. That’s a pity, because the output format is becoming an annoying detail here; what our application really does is “render SVG files to raster picture files”, regardless of the concrete output format.

One can mitigate the issue by extracting a new function, ‘writePicture’:

// main.cpp
#include "lib_png.h"
#include "lib_gif.h"
#include "lib_bmp.h"

int main(int argc, char** argv)
{
  auto input = argv[1];
  auto output = argv[2];
  auto format = getExtension(output);

  auto svg = loadSvg(input);
  auto framebuffer = render(svg);
  writePicture(framebuffer, format, output);

  return 0;
}

void writePicture(Framebuffer pic, string format, string output)
{
  if(format == "png")
    writePng(pic, output);
  else if(format == "gif")
    writeGif(pic, output);
  else if(format == "bmp")
    writeBmp(pic, output);
  else
    fatal("Unknown format");
}

At least, the ‘main’ function now has a clean workflow: read input, render it, write it. It’s becoming readable again.

However, it still depends on the availability of ‘writePicture’, which in turn depends on the availability of … all the format functions. In short, although our ‘main’ workflow seems isolated from the low-level details, it still can’t be re-used, nor tested, without them.

The “dependency inversion principle” tells us that abstractions should not depend on concretions. In other words: don’t depend on something more volatile than you.

Currently, our workflow depend on concrete picture formats. We need to reverse this dependency: the concrete picture formats should depend on the workflow, and the workflow should depend on … nothing.
It might not sound familiar, but I guarantee you it is ; just have a look at the following code, it’s basic object-oriented design (i.e polymorphism):

// main.cpp
#include "workflow.h"

unique_ptr<IPictureWriter> createPictureWriterForFormat(string format); // from output_formats.cpp

int main(int argc, char** argv)
{
  auto input = argv[1];
  auto output = argv[2];
  auto writer = createPictureWriterForFormat(getExtension(output));
  workflow(input, output, writer.get()); // workflow takes a raw pointer
  return 0;
}

// workflow.h

struct IPictureWriter
{
  virtual ~IPictureWriter() {};
  virtual void writePicture(Framebuffer pic, string outputPath) = 0;
};

void workflow(string input, string output, IPictureWriter* format);

// workflow.cpp

void workflow(string input, string output, IPictureWriter* format)
{
  auto svg = loadSvg(input);
  auto framebuffer = render(svg);
  format->writePicture(framebuffer, output); // dispatch to the concrete writer
}

// output_formats.cpp
#include "workflow.h"
#include "output_format_png.h"
#include "output_format_gif.h"
#include "output_format_bmp.h"

unique_ptr<IPictureWriter> createPictureWriterForFormat(string format)
{
  if(format == "png")
    return make_unique<PngWriter>();
  else if(format == "gif")
    return make_unique<GifWriter>();
  else if(format == "bmp")
    return make_unique<BmpWriter>();
  else
    fatal("Unknown format");
}

// output_format_png.h
#include "workflow.h"
#include "lib_png.h"

class PngWriter : public IPictureWriter
{
  void writePicture(Framebuffer pic, string path)
  {
    writePng(pic, path);
  }
};

// output_format_gif.h

#include "workflow.h"
#include "lib_gif.h"

class GifWriter : public IPictureWriter
{
  void writePicture(Framebuffer pic, string path)
  {
    writeGif(pic, path);
  }
};

// output_format_bmp.h
#include "workflow.h"
#include "lib_bmp.h"

class BmpWriter : public IPictureWriter
{
  void writePicture(Framebuffer pic, string path)
  {
    writeBmp(pic, path);
  }
};

It’s a lot better now. Don’t be scared by the number of lines. That’s certainly more text, but that’s exactly the same amount of executable code. The only thing we did was separate this executable code into properly-labelled boxes.

Our SVG-to-raster conversion workflow can now handle new formats without having to be modified.

However, one thing still isn’t quite right: the “createPictureWriterForFormat” function. It’s what is called a “dependency magnet”: during the lifecycle of the application, its dependency count is only going to grow, as more formats get added.

Don’t get me wrong: we certainly need to have, somewhere in our application, a list of output formats. However, as soon as such a list is present in one source file, we have created a dependency magnet. What if we could build such a list at link time?

Enter the pluggable factories.

The principle is very simple: each output format will use the global construction mechanism to register itself to the application. The result looks like this:

// output_formats.cpp
#include <map>
#include "workflow.h"
#include "output_formats.h"

static map<string, WriterCreationFunc> g_formats;

void registerFormat(string format, WriterCreationFunc func)
{
  g_formats[format] = func;
}

unique_ptr<IPictureWriter> createPictureWriterForFormat(string format)
{
  auto i_func = g_formats.find(format);
  if(i_func == g_formats.end())
    fatal("unknown format");

  return i_func->second();
}

// output_format_bmp.cpp
#include "workflow.h"
#include "output_formats.h"
#include "lib_bmp.h"

class BmpWriter : public IPictureWriter
{
  void writePicture(Framebuffer pic, string path)
  {
    writeBmp(pic, path);
  }
};

REGISTER(BmpWriter, "bmp");

As you can see, the growing list of #includes has now disappeared from “output_formats.cpp”. How does it work?

The macro “REGISTER” is actually defined like this:

#define REGISTER(FormatWriterClass, format) \
  static auto g_registered = registerMe<FormatWriterClass>(format)

So far, so good: the global initialization of “g_registered” will cause “registerMe” to be called. Then, how does “registerMe” work?

Let’s see the whole “output_formats.h” file:

// output_formats.h
#include <functional>
#include <string>
#include <memory>

#include "workflow.h" // for IPictureWriter

using namespace std;

typedef function<unique_ptr<IPictureWriter>()> WriterCreationFunc;

void registerFormat(string format, WriterCreationFunc func);

template<typename FormatWriterClass>
int registerMe(string format)
{
  auto creationFunc = []()
  {
    return make_unique<FormatWriterClass>();
  };
  registerFormat(format, creationFunc);
  return 0; // this value is actually unused
}

#define REGISTER(FormatWriterClass, format) \
  static auto g_registered = registerMe<FormatWriterClass>(format)

Don’t be scared by the lambda; the exact same effect can be achieved using a factory class instead (and please don’t tell me that I shouldn’t do “using namespace std” in a header file – that’s not the point here).

The registerMe function simply wraps the proper instantiation of make_unique (i.e. the one which uses a concrete format writer class) into a “WriterCreationFunc”, which we put into a map.

And that’s it; we can now add a new format by simply creating a new source file, and adding it to the linking process:

// output_format_tga.cpp
#include "workflow.h"
#include "output_formats.h"
#include "extra/libs/fast_tga.h"

class MyTgaWriter : public IPictureWriter
{
  void writePicture(Framebuffer pic, string path)
  {
    fast_tga::writeTga(pic, path);
  }
};

REGISTER(MyTgaWriter, "tga");

Our application can now write .tga files, although no other source file contains or refers to anything related to TGA; that’s open-closed at its best!

Some comments: this is only the basic technique, and it may need to be refined. Some compilers might issue a warning, as the static variable “g_registered” is unused. Also, using a simple global map isn’t going to work as-is, because of the order of construction: the map might get constructed after some of the “g_registered” instances. All these problems can be solved by technical tricks that I’m not going to describe in this post. After all, this post is about abstracting details away! 🙂
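
(For the curious, one classic fix for the construction-order problem goes like this – a sketch reusing the names above: hide the map behind a function-local static, which C++ guarantees is initialized on first use.)

// output_formats.cpp (sketch)
static map<string, WriterCreationFunc>& getFormats()
{
  static map<string, WriterCreationFunc> formats; // constructed on first call
  return formats;
}

void registerFormat(string format, WriterCreationFunc func)
{
  getFormats()[format] = func; // safe, even during global initialization
}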

Prototyping procedural maze generation algorithms using fibers

I’m fascinated by procedural generation algorithms, especially random maze generation. This morning, I was working on a graph-based dungeon generator.

Most maze generation algorithms I know of mix the “maze logic” (i.e. the graph corresponding to the maze, including doors and keys) with the “maze layout” (the spatial layout of the rooms and corridors).
Actually, most of them are “maze layout” based.

Here, the idea is to start with the “maze logic”, without worrying too much about the “maze layout”. Spatial placement of rooms and corridors, according to a given “maze logic”, will be the subject of another blog post. However, to keep things simple, I only work here with planar graphs.

Here’s how it works:

  1. create a planar graph
  2. compute a random spanning tree for this graph
  3. let’s say you want to subdivide your maze into 4 zones (= 3 doors). The number of nodes per zone (NPZ) is: total node count / 4.
  4. Repeat 3 times:
    1. for each tree node N, compute the total count of its directly or indirectly reachable children; in other words, the size, in nodes, of the subtree whose root is node N, excluding nodes behind doors.
    2. find the tree node N whose subtree size is nearest to NPZ. Put a door here (see the sketch after this list).
  5. Add some random edges (from the original planar graph) to create some cycles, but beware of not adding edges that would connect different zones.
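
Here is what one iteration of step 4 could look like (a C++ sketch over an adjacency-list spanning tree; the actual project is written in D and may differ):

#include <cstdlib>
#include <vector>

// children[n] lists the children of node n in the spanning tree; nodes
// already behind a door are assumed to have been pruned from it.
static int computeSubtreeSizes(const std::vector<std::vector<int>>& children,
                               std::vector<int>& size, int node)
{
  size[node] = 1;
  for(int child : children[node])
    size[node] += computeSubtreeSizes(children, size, child);
  return size[node];
}

int pickDoorNode(const std::vector<std::vector<int>>& children, int root, int npz)
{
  std::vector<int> size(children.size());
  computeSubtreeSizes(children, size, root);

  int best = -1;
  for(int node = 0; node < (int)children.size(); ++node)
  {
    if(node == root)
      continue; // the root has no incoming edge to put a door on
    if(best < 0 || std::abs(size[node] - npz) < std::abs(size[best] - npz))
      best = node;
  }
  return best; // the door goes on the edge between 'best' and its parent
}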

Fine-tuning a maze generator requires the ability to see what happens between the steps. This means I’m going to need to stop the generation in the middle of the process, and display the state of the generation at the end of each step.

Here’s what the result looks like:

First, create a random planar graph. I do this by Delaunay-triangulating a bunch of random positions.

[image: a random planar graph]

Then, extract a random spanning tree. The red node represents the root of the tree.

[image: a random spanning tree; the red node is the root]

Then, compute the number of grandchildren for each node.

[image: each node labelled with its number of grandchildren]

Find a node having a number of grandchildren nearly equal to 1/4 of the node count. This is the red-circled one, having exactly 20 children.

[image: the red-circled node, with exactly 20 children]

Next, repeat the operation.

[image: the second door placed]

Repeat a third – and last – time. The maze is now divided into nearly equally-sized areas.

[image: the maze divided into four nearly equally-sized areas]

In the beginning, my state-machine implementation looked like this:

class MazeGenerator
{
public:
  this()
  {
    m_state = STATE_INIT;
  }

  void oneStep()
  {
    switch(m_state)
    {
      case STATE_INIT:
        createPlanarGraph();
        m_state = STATE_PLANARGRAPH;
        break;
      case STATE_PLANARGRAPH:
        createSpanningTree();
        m_state = STATE_SPANNINGTREE;
        break;
      case STATE_SPANNINGTREE:
        createDoors();
        m_state = STATE_DONE;
        break;
      default: // D requires a default case in a non-final switch
        break;
    }
  }
private:
  enum STATE { STATE_INIT, STATE_PLANARGRAPH, STATE_SPANNINGTREE, STATE_DONE }

  STATE m_state;

  void createPlanarGraph();
  void createSpanningTree();
  void createDoors();
}

However, I got the feeling something was wrong, especially when I started wanting to visualize how each individual door was chosen.
Using an enum/switch introduces great friction when trying to add new steps to the generation algorithm: dividing a step into two substeps is painful, and having a variable number of steps is inconvenient.
It seemed like a good opportunity to play with fibers. Here’s how it goes:

import core.thread; // for Fiber

class MazeGenerator
{
public:
  this()
  {
    m_fiber = new Fiber(&generate);
  }

  void oneStep()
  {
    m_fiber.call();
  }
private:
  Fiber m_fiber;

  void generate()
  {
    createPlanarGraph();
    Fiber.yield();
    createSpanningTree();
    Fiber.yield();
    createDoors();
  }

  void createPlanarGraph();
  void createSpanningTree();
  void createDoors();
}

A lot simpler, isn’t it?

The whole maze generator sandbox project is available on Launchpad:
link

The source file corresponding to the very technique described in this article is here.