The Complexity Elephant

Thoughts about software engineering and game development.


1. Fancy Tools Can’t Tame Complexity

“The amount of energy necessary to refute bullshit is an order of magnitude bigger than to produce it.” (Alberto Brandolini)

TL;DR: Bugs are only the tip of the iceberg … and over-complicated designs eat fancy tools for breakfast.

1.1. A bit of reversing

One day, I was playing with reverse-engineering challenges (crackmes), and I had a funny thought.

The principle of crackmes is simple: you’re given a small executable file, without its source code. When launched, the executable prompts you for a secret password. Your goal is to find the password, generally by analyzing the binary.
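For readers who have never opened one, here is a minimal sketch of the kind of password check a beginner crackme might contain. It is written in Python for readability, even though real crackmes are compiled binaries. The password ("elephant"), the one-byte XOR key and the function name `check_password` are all made up for this illustration; the point is only that the secret is stored in encoded form, so it never shows up as readable text inside the program.

```python
# Toy crackme logic (hypothetical): the password is never stored in clear.
# Only its XOR-encoded bytes are embedded, so scanning the program for
# readable strings does not reveal it.

KEY = 0x5A  # made-up one-byte XOR key

# "elephant" XOR 0x5A, byte by byte (made-up password for this sketch)
ENCODED = bytes([0x3F, 0x36, 0x3F, 0x2A, 0x32, 0x3B, 0x34, 0x2E])


def check_password(candidate: str) -> bool:
    """Return True if `candidate` matches the hidden password."""
    data = candidate.encode()
    if len(data) != len(ENCODED):
        return False
    # Encode the candidate the same way and compare byte by byte.
    return all((b ^ KEY) == e for b, e in zip(data, ENCODED))


if __name__ == "__main__":
    # A real crackme would read the guess from the user; here we just try two.
    for guess in ("password", "elephant"):
        verdict = "Access granted" if check_password(guess) else "Wrong password"
        print(guess, "->", verdict)
```

Reversing this toy boils down to spotting the XOR and undoing it: `bytes(e ^ KEY for e in ENCODED)` gives back `b"elephant"`. Real challenges bury the same idea under layers of deliberate obfuscation, which is where the debugger comes in.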

Binary reversing requires many tools, among them a debugger (gdb/OllyDbg/x64dbg/WinDbg) with every conceivable kind of conditional breakpoint/tracepoint, time travel, memory dumps, etc.

And here is the funny thought: to understand the inner workings of a deliberately obfuscated, closed-source program, I was using nearly the same tools and techniques we use to understand our own code.

1.2. The QA game

“Everyone knows that debugging is twice as hard as writing a program in the first place.” (Brian Kernighan)

Let’s play a game I call “the QA game”.

The rules are simple: I write a program following a known spec, and you must find all the bugs in it. However, I’m an evil coder, and my goal is to sneak in bugs that you won’t find.

Do you want to play with me?

You can’t win at this game. Sneaking in a bug will take me “one order of magnitude” less energy than you will need to detect it. To beat me at this game, you would need at least a team of ten skilled QA engineers, and probably access to the source code.
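To make the asymmetry concrete, here is a hypothetical bug the evil coder might sneak in, sketched in Python. The spec (the Gregorian leap-year rule) and both function names are my own illustrative choice, not part of the original game. Dropping a single clause costs a few keystrokes; noticing it requires a tester who thinks of probing a year divisible by 400.

```python
def is_leap_year_evil(year: int) -> bool:
    # Looks plausible, but silently drops the "divisible by 400" rule.
    return year % 4 == 0 and year % 100 != 0


def is_leap_year(year: int) -> bool:
    # Gregorian rule: every 4 years, except centuries, except every 400 years.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A QA engineer probing "typical" years sees identical behavior...
typical = [1999, 2019, 2020, 2023, 2024]
assert all(is_leap_year_evil(y) == is_leap_year(y) for y in typical)

# ...only years divisible by 400 expose the difference.
disagreements = [y for y in range(1, 3001) if is_leap_year_evil(y) != is_leap_year(y)]
print(disagreements)  # [400, 800, 1200, 1600, 2000, 2400, 2800]
```

Only 7 years out of 3000 disagree, and none of the years a tester is likely to try first. Finding the bug takes knowledge of the spec’s corner cases, not just effort.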

I think there’s a deep idea behind this, probably something related to entropy: it’s always easier to make a mess than to clean it up afterwards. And probably easier “by one order of magnitude”.

Tools work exclusively on the “messy” side. I’m not only talking about debuggers; I’m also talking about compilers, sanitizers, static analyzers and, to some extent, fuzzers and integration tests. Whether we’re reading the source code or debugging live, we’re always dealing with artefacts in which the mess has already occurred.

These tools are therefore fundamentally condemned to rely on horribly sophisticated techniques and heuristics to “untangle” the mess the programmer wrote.

Relying on this “mess cleaning” side to understand the program you’re working on is an extremely slow approach. Sometimes it’s the only possible one (e.g. malware analysis).

But more often, it’s simply the default approach most developers think of. I have a bug? Let’s fire up the debugger! (On this specific point, one could argue that Visual Studio’s great debugger has harmed the software industry!)

Detecting programming errors is all well and good, but it’s “one order of magnitude” less efficient than not introducing them in the first place. Some codebases are full of traps. You want a codebase that minimizes your probability of introducing new mistakes.

1.3. The QA silver bullet: a magical checker

Let’s suppose that one day you get a magical checker that can tell you, instantly and with certainty, whether your code is correct or not (for whatever definition of “correct”). You run it several times per minute, so after each modification, you know whether you’ve just introduced an error.

Wouldn’t it be great?

Short answer: no.

Some software designers would end up in a bizarre dead end: they wouldn’t be able to modify their program anymore. The design would have become so complicated that the vast majority of attempted modifications would introduce a new mistake. The magical checker would catch it early, and the bad software designer would have to undo the modification and try again.

They would become trapped in their own over-complicated design. Fragile code would have been turned into rigid code, and the goal would have been missed.

You don’t need a magical checker to experience this situation: a badly written suite of unit tests is enough. It’s quite possible that many of our codebases are already trapped in such dead ends, even if we can’t always see it directly.

Thus, even the best imaginable checker doesn’t really solve the problem of writing programs.

This is because catching all my mistakes early is only the tip of the iceberg. I also want to be able to add features in a reasonable amount of time.

In a word, I want my code to be correct, but also flexible. And flexibility isn’t a refinement of correctness. Quite the opposite, in fact.

Non-intrusive test methods, the ones that check the code without changing how it gets written, are exactly the tools that act on the “mess cleaning” side I described earlier.

1.4. Writing and fixing prose

Let’s suppose you put a 9-year-old in front of a word processor and ask him to write a 300-page novel.

Now, you read the result. The first thing you notice is that it’s full of spelling mistakes and ambiguous wording.

So you say to yourself: “OK, now I just have to fix all the mistakes,” and you launch your favorite spellchecker.

Interestingly, after spellchecking and correcting, the text is still very poor, and you don’t understand why. After all, you’ve fixed all the mistakes you could find!

You read the text again, and this time, you notice that some very subtle spelling mistakes have made their way through the spellchecker. So you think: I need a better spellchecker!

If you can find a spellchecker that tells you exactly which corrections to make to turn your initial text into Shakespeare, good for you: your tool is very powerful (and can probably replace you).

But you don’t necessarily want Shakespeare; Dan Brown would be okay!

So you decide to ignore the remaining spelling mistakes and start digging into semantics. And then, you realize that the text is full of inconsistencies, like characters using information they’re not supposed to know. Or things happening simultaneously in several places. Or causality violations.

And then you say to yourself: “I know what I need: a tool to check the consistency of a story!” And, by chance, it’s exactly the subject of a research paper published three months ago, describing a heuristic to detect inconsistencies in written stories. Great!

Except that it doesn’t detect all inconsistencies, and worse: it doesn’t tell you how to fix the ones it detects.

So you start patching, one by one, all the errors reported by your new tool. Two weeks of tedious rewording later, the tool doesn’t report anything anymore.

You end up with a patched text. Each paragraph is now more or less consistent, but the text as a whole still isn’t. By focusing on the mistakes detected by the tool, you have, without knowing it, applied fixes to very local issues and ignored the global ones.

And this way of fixing has simply moved the errors (inconsistencies) out of the detection zone of your new fancy tool. Even worse: it’s now even more difficult to modify your story without creating new inconsistencies.

In a word, you have just made your novel harder to validate, and more rigid. In the process, you have also blinded a state-of-the-art error-detection tool. And your novel still doesn’t come anywhere near Dan Brown.

I think we all agree that this clearly isn’t a good way to write a novel. A better way would probably be to teach your writer how to write better.

Don’t get me wrong: I’m not saying we should trash our error-checking tools, absolutely not! I’m saying we should consider them educational tools rather than validation tools.

This means that it’s your 9-year-old, not you, who must analyze and understand each reported mistake, not in order to fix the text, but in order to fix his writing technique.

It’s intrusive: it’s feedback from the “mess cleaning” side to the “mess making” side, which lets the writer understand how not to make a mess (and, in my opinion, that’s the biggest benefit of TDD).

1.5. Back to software development

With software development, it’s roughly the same story.

There’s a “writer” process, which corresponds to modifying your source code, and a “reviewer” process, which corresponds to analyzing and understanding what you have written.

Of course, in practice, both processes are intertwined (except for practitioners of “Debug-Later Programming”, as described in http://blog.wingman-sw.com/archives/16).

There are lots of ways to “analyze” what you’ve written. Anything that gives you feedback about your code counts. This includes unit test failures, poor code coverage, static analysis failures, bug reports from customers, build times, high cyclomatic complexity, integration test failures, a high defect rate, etc.

This is how one learns to keep the code simple.

If your testing methodology doesn’t feed back into the way you write code, you’re doing it suboptimally.

I postulate that having this feedback is the only viable way to write software.

Bugs are only the tip of the iceberg. You don’t want to waste your time attacking them one by one. Instead, you have to attack the underwater factory that’s producing those bugs faster than you can handle them.

And that has a name: software engineering.
