/ Pitiful Bioinformatics

RSPR: The Pitiful State of Software Engineering in Bioinformatics

RSPR is a tool to compute a topological distance between two phylogenies. It is an alternative to the Robinson-Foulds distance. The canonical implementation rspr has supports many different algorithms and options. Unfortunately, the authors decided not to use a standard way for parsing commandline arguments, such as getopt and instead roll their own method. Within that mess is the following abomination.

else if (strcmp(arg, "-pairwise") == 0)
{
  PAIRWISE = true;
  QUIET = true;
  // PREFER_RHO = true;
  if (max_args > argc) {
    char *arg2 = argv[argc + 1];
    if (arg2[0] != '-') {
      PAIRWISE_START = atoi(arg2);
      if (max_args > argc + 1) {
        char *arg2 = argv[argc + 2];
        if (arg2[0] != '-') {
          PAIRWISE_END = atoi(arg2);
          if (max_args > argc + 2) {
            char *arg2 = argv[argc + 3];
            if (arg2[0] != '-') {
              PAIRWISE_COL_START = atoi(arg2);
              if (max_args > argc + 3) {
                char *arg2 = argv[argc + 4];
                if (arg2[0] != '-') {
                  PAIRWISE_COL_END = atoi(arg2);
                }
              }
            }
          }
        }
      }
    }
  }
}

The -pairwise flag takes up to for optional arguments. Thus the following situations have to be parsed correctly.

rspr -pairwise
rspr -pairwise 1 2
rspr -pairwise 1 2 3 4
rspr -pairwise 1 2 -flag

If your code has a two-dimensional structure, you know that you are in trouble. In this case the indentation can be removed by complementing the conditions and using a goto (just for now).

else if (strcmp(arg, "-pairwise") == 0)
{
	PAIRWISE = true;
	QUIET = true;

	if (max_args <= argc) goto end_of_args;
	if (argv[argc + 1][0] == '-') goto end_of_args;

	PAIRWISE_START = atoi(argv[argc + 1]);

	if (max_args <= argc + 1) goto end_of_args;
	if (argv[argc + 2][0] == '-') goto end_of_args;

	PAIRWISE_END = atoi(argv[argc + 2]);

	if (max_args <= argc + 2) goto end_of_args;
	if (argv[argc + 3][0] == '-') goto end_of_args;

	PAIRWISE_COL_START = atoi(argv[argc + 3]);

	if (max_args <= argc + 3) goto end_of_args;
	if (argv[argc + 4][0] == '-') goto end_of_args;

	PAIRWISE_COL_END = atoi(argv[argc + 4]);

end_of_args:
	// do nothing
}

This is already much easier on the eyes. Also a pattern starts to emerge: Each optional argument is guarded by two conditions: There is a next argument and that next argument is not a flag (ie. begins with -). Unfortunately, using a goto is considered harmful so we should get rid of it here. There are many different solutions, with the following being the most readable.

else if (strcmp(arg, "-pairwise") == 0)
{
	PAIRWISE = true;
	QUIET = true;

	do {
		if (max_args <= argc)break;
		if (argv[argc + 1][0] == '-') break;

		PAIRWISE_START = atoi(argv[argc + 1]);

		if (max_args <= argc + 1)break;
		if (argv[argc + 2][0] == '-') break;

		PAIRWISE_END = atoi(argv[argc + 2]);

		if (max_args <= argc + 2)break;
		if (argv[argc + 3][0] == '-') break;

		PAIRWISE_COL_START = atoi(argv[argc + 3]);

		if (max_args <= argc + 3)break;
		if (argv[argc + 4][0] == '-') break;

		PAIRWISE_COL_END = atoi(argv[argc + 4]);
	} while (0);
}

I think that this last version is way more clear than the pyramid we started out with.

Verdict: get flattened.