“A QA Engineer walks into a bar. Orders a beer. Orders 0 beers. Orders 999999999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.”
FASTA is the simplest file format ever invented by humans. However, because it lacks a formal specification, the implementations out there vary widely. Here is an assorted list of what could possibly go wrong.
- Trashing memory when an empty file is read
- Not differentiating between name and comment
- Treating comments as mandatory (leading to a crash if one is missing)
- Occasionally failing to read from pipes
- Overflowing the stack because of wrong assumptions
- Creating invalid example files with empty names
- Adding whitespace to the end of sequences
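To make the name and comment items above concrete, here is a minimal Python sketch of splitting a header line. It assumes a common convention, not a formal rule: the name runs from `>` to the first whitespace, and everything after it is an optional comment. The helper `split_header` is hypothetical and only illustrates the edge cases; it is not a replacement for a tested parser.

```python
from typing import Tuple


def split_header(line: str) -> Tuple[str, str]:
    """Split a FASTA header line into (name, comment).

    Assumed convention: the name ends at the first whitespace,
    and the comment is optional (returned as "" when missing).
    """
    if not line.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    # Drop the '>' and any trailing whitespace, including the line ending.
    body = line[1:].rstrip()
    # An empty name is rejected instead of being silently accepted.
    if not body or body[0].isspace():
        raise ValueError("FASTA header has an empty name")
    parts = body.split(None, 1)
    name = parts[0]
    comment = parts[1] if len(parts) == 2 else ""
    return name, comment


if __name__ == "__main__":
    # Headers in the spirit of the QA engineer: with, without, and broken.
    for header in [">seq1 a comment\n", ">seq1\n", ">\n"]:
        try:
            print(repr(header), "->", split_header(header))
        except ValueError as err:
            print(repr(header), "-> rejected:", err)
```

The point of the sketch is that a missing comment yields an empty string rather than a crash, and an empty name is reported as an error rather than passed along.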
It should be apparent that parsing FASTA files reliably is not a simple task. Please avoid writing your own FASTA parser and use one of the many that are already available. For C and C++, I can recommend kseq and pfasta. If you use Python or Perl, use the modules that come with Biopython or BioPerl.