In part one we used a switch statement to find the complement of a given base. Note how the nucleotides in that code are arranged in what looks like a table. However, this table is laid out in instruction space. Let's move it to data space instead.

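char *revcomp_table(const char *begin, const char *end, char *dest)
{
    size_t length = end - begin;

    /* Lookup table indexed by byte value; every entry other than A, C, G, T stays 0. */
    char table[256] = {0};
    table['A'] = 'T';
    table['C'] = 'G';
    table['G'] = 'C';
    table['T'] = 'A';

    for (size_t i = 0; i < length; i++) {
        /* Read the input back to front; unsigned char keeps the index in 0..255. */
        unsigned char c = begin[length - i - 1];

        char d = table[c];

        dest[i] = d;
    }

    /* Return a pointer just past the last byte written. */
    return dest + length;
}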
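The function above rebuilds its 256-byte table on every call. As a possible variation (not the version benchmarked here), the table could be moved out of the function and filled in at compile time. The sketch below assumes a C99 compiler for the designated initializers; the names `complement` and `revcomp_table_static` are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical variant: the table lives in read-only data and is
 * initialized at compile time. Entries not listed are zero. */
static const char complement[256] = {
    ['A'] = 'T',
    ['C'] = 'G',
    ['G'] = 'C',
    ['T'] = 'A',
};

char *revcomp_table_static(const char *begin, const char *end, char *dest)
{
    size_t length = (size_t)(end - begin);

    for (size_t i = 0; i < length; i++) {
        /* Cast so that bytes >= 0x80 cannot turn into a negative index. */
        unsigned char c = (unsigned char)begin[length - i - 1];
        dest[i] = complement[c];
    }

    return dest + length;
}
```

Called on the five bases `AACGT`, this writes `ACGTT` into `dest` and returns `dest + 5`. As in the original, any byte other than `A`, `C`, `G`, or `T` maps to a zero byte, so the input is assumed to be clean, uppercase nucleotides.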


Except for the loop condition, this code no longer contains a branch, so there is nothing left to mispredict. Indeed, the table-based version is easily ten times faster than the switch across a range of different CPUs. For instance, here are the numbers on a laptop from 2015:

./Brevcomp --benchmark_filter='switch|table'

---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
bench/revcomp_switch    8445060 ns      8332779 ns           83
bench/revcomp_table4     740910 ns       729380 ns          936
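For readers who don't have part one at hand, the switch-based baseline might look roughly like the following. This is a reconstruction and not the exact code that was benchmarked: the names `complement_switch` and `revcomp_switch_sketch` are placeholders, and the `default` case returning 0 is an assumption chosen to match the zero-initialized table.

```c
#include <stddef.h>

/* Assumed shape of the per-base switch from part one. */
static char complement_switch(char c)
{
    switch (c) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    default:  return 0; /* assumption: mirror the table's zero entries */
    }
}

static char *revcomp_switch_sketch(const char *begin, const char *end, char *dest)
{
    size_t length = (size_t)(end - begin);

    /* Same back-to-front walk as the table version, one switch per base. */
    for (size_t i = 0; i < length; i++)
        dest[i] = complement_switch(begin[length - i - 1]);

    return dest + length;
}
```

Both versions walk the input in the same order and produce the same output for `A`, `C`, `G`, and `T`; they differ only in how the complement of a single base is found.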


To be continued