
(De)randomize your tests


Testing your code is important. So much is obvious. But not everything can be tested with static test cases. Also, 100% test coverage is not a one-time achievement but a process: when the code changes, the tests have to adapt. Plus, if you don’t trust yourself to write bug-free code (hence the tests), why do you assume your tests are bug-free or even sufficient? Even if your tests cover every line, some code paths may still have issues.

Randomizing #

For some use cases it may be “useful” to not only have static test cases, but also randomly generated input. If you have full control over all parameters of the generated input, you know beforehand the result your code should give and can thus validate it.

Let’s make this a bit more mathematical. Let \(f : P → A\) be your code, which computes an answer \(a \in A\) for a problem instance \(p \in P\). Now you need a random variable \(R_A : \Omega → A\) that generates random answers. Given an answer \(r\) drawn from \(R_A\), it is often easy to produce a matching test case \(f^{-1}(r)\). All that is left to validate is whether \(f(f^{-1}(r)) = r\) holds for every sampled \(r\).
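As a concrete sketch of this idea, the toy example below tests a summing function: an answer \(r\) is drawn at random, an input array with exactly that sum is constructed (playing the role of \(f^{-1}(r)\)), and the test asserts that the code recovers \(r\). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* f: the code under test */
static long my_sum(const long *values, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}

/* f^{-1}: build a problem instance whose known answer is `target` */
static void make_instance(long target, long *values, size_t n)
{
    long rest = target;
    for (size_t i = 0; i + 1 < n; i++) {
        long v = rand() % 100;   /* random pieces ... */
        values[i] = v;
        rest -= v;
    }
    values[n - 1] = rest;        /* ... the last piece fixes the sum */
}

int main(void)
{
    enum { N = 16 };
    long values[N];

    for (int iter = 0; iter < 1000; iter++) {
        long r = rand() % 100000;        /* sample an answer r from R_A */
        make_instance(r, values, N);     /* build f^{-1}(r) */
        assert(my_sum(values, N) == r);  /* check f(f^{-1}(r)) == r */
    }
    puts("ok");
    return 0;
}
```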

Derandomizing #

Unfortunately, for andi the result is only approximately the input. Thus the test may fail even if we allow the results to deviate by a few percent. Adjusting this threshold lets us choose an appropriate point on the ROC curve. But when a test fails, we can never be certain whether we broke the code with the last change or whether the test case itself was an unfortunate instance.
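One possible shape for such a tolerant check is sketched below; the 2% threshold is only an illustrative choice, not a value taken from andi.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Accept `actual` if it lies within a relative `tolerance` of `expected`. */
static bool roughly_equal(double expected, double actual, double tolerance)
{
    return fabs(actual - expected) <= tolerance * fabs(expected);
}

/* e.g. assert(roughly_equal(r, result, 0.02));  // allow 2% deviation */
```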

Randomly failing tests are not a problem when the tests are executed manually. In that case we can just rerun them or inspect the test cases. But failures prove problematic for two other reasons:

  1. continuous integration
  2. reproducible builds

Of course you want to run your tests in CI. However, you have to ensure that a random test failure does not cause a red alarm. Furthermore, you want access to the test case which caused the alarm.

Reproducible builds are all about bit-identical executables. If you build your code twice and one instance fails the unit tests and the other does not, all hell breaks loose.

If you run your tests on foreign infrastructure such as TravisCI or Debian build servers, you do not have file access (for obvious reasons). So when your tests fail, you have to get the failing instance by other means. This is where derandomization comes in: instead of having your programs generate random numbers via srand(time(NULL)) or similar, seed with a specific value, srand(SEED). The SEED value should come from the environment.
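A minimal sketch of what this could look like, assuming the test binary reads a decimal SEED variable (the helper name get_seed is made up):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Use SEED from the environment if set, otherwise fall back to the clock.
 * Either way, print the seed so a failing run can be reproduced. */
static unsigned int get_seed(void)
{
    const char *str = getenv("SEED");
    if (str) {
        unsigned int seed = (unsigned int)strtoul(str, NULL, 10);
        fprintf(stderr, "using SEED=%u from environment\n", seed);
        return seed;
    }
    unsigned int seed = (unsigned int)time(NULL);
    fprintf(stderr, "using random seed %u (set SEED to reproduce)\n", seed);
    return seed;
}

int main(void)
{
    srand(get_seed());
    /* ... run the randomized tests ... */
    return 0;
}
```

Printing the seed on every run also means a failing CI log already contains everything needed to reproduce the failure locally.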

Now you can control the test via the SEED environment variable. If you pin it to a specific value, you can ensure that the tests pass (unless some severe breakage happens). Likewise, by trying different SEED values you can find failing test cases and fix these.

All in all, randomizing and then derandomizing unit tests can help find bugs and even enables reproducibility of results.