Write a C program to compute the specified statistics on a file of
DNA sequences.
INPUT:
The program will read data from standard input using fgets.
The data will be the contents of a DNA sequence file which will be in
the FASTA format:
- A line beginning with ">" is the header line for the next sequence.
- All other lines contain sequence data.
- There may be any number of sequences per file.
- A sequence may be split over several lines.
- Sequence data may be upper or lower case.
- Sequence data may contain white space which should be ignored.
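The format rules above can be sketched as two small helpers. This is only an illustration, not a required design: the names is_header and residues_on_line are hypothetical, and the sketch assumes each call receives one line as returned by fgets (including its trailing newline).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the line is a FASTA header line,
 * that is, if its first character is '>'. */
static int is_header(const char *line)
{
    assert(line != NULL);
    return line[0] == '>';
}

/* Hypothetical helper: counts the sequence characters on one line.
 * Spaces, tabs, and the newline left in place by fgets are skipped,
 * so white space never contributes to a sequence length. */
static size_t residues_on_line(const char *line)
{
    size_t n = 0;

    assert(line != NULL);
    for (; *line != '\0'; line++) {
        /* Cast to unsigned char: passing a negative char to isspace
         * is undefined behavior. */
        if (!isspace((unsigned char)*line))
            n++;
    }
    return n;
}
```

Because a sequence may be split over several lines, the per-line counts have to be added together until the next header line (or end of input) is reached.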
OUTPUT:
- The number of sequences in the input file
- The total length of all sequences
- The average length of the sequences
- The maximum length of any sequence
- The minimum length of any sequence
- The overall percent of the letters A, T, C, and G in the sequences
(ignoring the case of the input).
- The overall percent of other letters in the sequences
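One possible way to accumulate these statistics without storing any sequence is sketched below. Everything here is an assumption made for illustration: the struct fasta_stats layout, the function names, the report wording, the choice to ignore data that appears before the first header, and the choice to count every non-white-space character other than A, T, C, and G as "other".

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical accumulator; initialize with = {0}. */
struct fasta_stats {
    long nseq;               /* sequences completed so far            */
    long total;              /* letters summed over all sequences     */
    long min_len, max_len;   /* shortest / longest completed sequence */
    long cur_len;            /* length of the sequence being read     */
    int  in_seq;             /* nonzero once a header has been seen   */
    long a, t, c, g, other;  /* per-letter tallies, case-insensitive  */
};

/* Fold the sequence currently being read into the totals.
 * Call on every header line and once more at end of input. */
static void close_sequence(struct fasta_stats *s)
{
    assert(s != NULL);
    if (!s->in_seq)
        return;
    if (s->nseq == 0 || s->cur_len < s->min_len) s->min_len = s->cur_len;
    if (s->nseq == 0 || s->cur_len > s->max_len) s->max_len = s->cur_len;
    s->nseq++;
    s->cur_len = 0;
    s->in_seq = 0;
}

/* Process one complete input line. */
static void feed_line(struct fasta_stats *s, const char *line)
{
    assert(s != NULL && line != NULL);
    if (line[0] == '>') {            /* header: previous sequence ends */
        close_sequence(s);
        s->in_seq = 1;
        return;
    }
    if (!s->in_seq)                  /* assumption: ignore data before */
        return;                      /* the first header               */
    for (; *line != '\0'; line++) {
        unsigned char ch = (unsigned char)*line;
        if (isspace(ch))
            continue;                /* white space is ignored */
        switch (toupper(ch)) {
        case 'A': s->a++; break;
        case 'T': s->t++; break;
        case 'C': s->c++; break;
        case 'G': s->g++; break;
        default:  s->other++; break;
        }
        s->cur_len++;
        s->total++;
    }
}

/* Percentage of part in total; 0 when there is no data at all. */
static double percent(long part, long total)
{
    return total > 0 ? 100.0 * (double)part / (double)total : 0.0;
}

/* Assumed report layout; the assignment does not fix the wording. */
static void print_report(const struct fasta_stats *s)
{
    assert(s != NULL);
    printf("Number of sequences: %ld\n", s->nseq);
    printf("Total length:        %ld\n", s->total);
    printf("Average length:      %.2f\n",
           s->nseq > 0 ? (double)s->total / (double)s->nseq : 0.0);
    printf("Maximum length:      %ld\n", s->max_len);
    printf("Minimum length:      %ld\n", s->min_len);
    printf("A: %.2f%%  T: %.2f%%  C: %.2f%%  G: %.2f%%  Other: %.2f%%\n",
           percent(s->a, s->total), percent(s->t, s->total),
           percent(s->c, s->total), percent(s->g, s->total),
           percent(s->other, s->total));
}
```

In this sketch the last sequence is only counted when close_sequence is called after the final line, since no header follows it. The average is computed in floating point so that integer division does not truncate it.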
HINTS:
- Don't save the lines. Read in one line at a time, process it,
and throw it away.
- To have your program read the contents of a file through
standard input, redirect the contents of the file into the program. From
the command line, either of the following should work:
./hw4 < test1.fsa
cat test1.fsa | ./hw4
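The read, process, and discard loop from the hints might look like the sketch below. It is a minimal illustration under stated assumptions: scan_fasta and BUF_SIZE are hypothetical names, and the function only counts header lines and sequence letters rather than producing the full report. One detail worth noting is that fgets returns a partial line when the line is longer than the buffer, so the sketch tracks whether each chunk starts a new line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 256   /* assumed buffer size; any reasonable size works */

/* Hypothetical helper: streams `in` one fgets chunk at a time, counting
 * header lines and non-white-space sequence characters. Only one buffer
 * is ever held in memory; each chunk is overwritten by the next read. */
static void scan_fasta(FILE *in, long *headers, long *letters)
{
    char buf[BUF_SIZE];
    int at_line_start = 1;   /* does this chunk begin a new line?     */
    int in_header = 0;       /* is the current line a header line?    */

    assert(in != NULL && headers != NULL && letters != NULL);
    *headers = 0;
    *letters = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);

        if (at_line_start) {
            in_header = (buf[0] == '>');
            if (in_header)
                (*headers)++;
        }
        if (!in_header) {
            for (size_t i = 0; i < len; i++) {
                if (!isspace((unsigned char)buf[i]))
                    (*letters)++;
            }
        }
        /* If the chunk does not end in '\n', the line was longer than
         * the buffer and the next chunk continues the same line. */
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
}
```

Calling scan_fasta(stdin, &headers, &letters) from main would work with either command above, since both forms deliver the file to the program on standard input.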
Input Data Files