/ Pitiful Bioinformatics

FASTA: The Pitiful State of Software Engineering in Bioinformatics

“A QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.”

FASTA is the simplest file format ever invented by humans. However, the lack of a formal specification leads to high variation in the various implementations out there. Here is an assorted list of what could possibly go wrong.

It should be apparent that parsing FASTA files, reliably is not a simple task. Please avoid writing your own FASTA parser and use one of the many available out there. For C and C++ I can recommend kseq and pfasta. If you use python or perl, use the modules that come with BioPython or BioPerl.