Region of interest PHRED scores
About this program
This program was written to measure the quality of DNA sequences generated on an ABI sequencer.
The resulting data were then used to compare sequence quality with the rates of false positives and
false negatives of a program I was writing.
The program reads the ABI (*.phd.1) files generated by the ABI base caller and calculates the average Phred
score for the stretch of DNA in each file that is homologous to a user-supplied sequence defining the region
of interest. If the read is homologous to only part of the supplied sequence, the program calculates the
average Phred score of the homologous portion. If this partial-match value is high, it suggests that the
sequence contains a heterozygous indel or diverges from the supplied sequence.
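The averaging step described above can be sketched in Python. This is only an illustration, not the program's own code: the function names are hypothetical, and an exact substring search stands in for the alignment step, whose method is not described here. The sketch relies on the layout of a *.phd.1 file, where each line between BEGIN_DNA and END_DNA holds a base, its Phred quality and its peak position.

```python
def read_phd_qualities(text):
    """Return (bases, qualities) from the BEGIN_DNA ... END_DNA block of phd.1 text."""
    bases, quals = [], []
    in_dna = False
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            break
        elif in_dna and line:
            fields = line.split()          # base, quality, peak position
            bases.append(fields[0].upper())
            quals.append(int(fields[1]))
    return "".join(bases), quals


def region_mean_phred(bases, quals, region):
    """Mean Phred score over the part of the read matching the region of interest.

    Assumption: an exact substring match stands in for a real alignment.
    Returns None when the region is empty or cannot be found.
    """
    if not region:
        return None
    start = bases.find(region.upper())
    if start == -1:
        return None
    window = quals[start:start + len(region)]
    return sum(window) / len(window)
```

Given the text of one quality file, read_phd_qualities returns the called bases and their scores, and region_mean_phred averages the scores over the matching stretch. A real alignment would also have to cope with mismatches and partial matches, which this sketch does not attempt.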
The program asks for a cut-off value; if the average Phred value is lower than this, the sequence is said to
have failed quality control. Similarly, if the program cannot align the sequence in the Phred quality file
to the sequence of the region of interest, the trace file is deemed to have failed.
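The two failure conditions can be written as a small decision function. This is a sketch under assumptions: the function name and the wording of the returned labels are made up for illustration, and None is used here to mean that no alignment was found.

```python
def quality_control(mean_phred, cutoff):
    """Classify one trace file from its average Phred score.

    mean_phred is None when the read could not be aligned to the
    region of interest (an assumption of this sketch).
    """
    if mean_phred is None:
        return "fail: no alignment"
    if mean_phred < cutoff:
        return "fail: below cut-off"
    return "pass"
```

A score exactly equal to the cut-off passes in this sketch, since only values lower than the cut-off are described as failing.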
The program analyses all the files in a folder and creates an output file that contains the average Phred
scores for the alignment and a summary of whether each sequence passes the quality control steps.
While using the program to analyse 2000 files, it became apparent that the Phred score of the worst 10 nucleotides
in the sequence is a better test for the presence of false positives.
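One way to compute such a value is sketched below. It assumes that "the worst 10 nucleotides" means the 10 lowest-scoring positions in the aligned stretch, wherever they fall, and that their scores are averaged; the program itself may define this differently (for example, as the worst contiguous window). The function name is hypothetical.

```python
def worst_n_mean(quals, n=10):
    """Mean of the n lowest Phred scores in a list of per-base qualities.

    Assumption: the n worst positions need not be adjacent.
    Returns None for an empty list; uses all scores if fewer than n exist.
    """
    if not quals:
        return None
    worst = sorted(quals)[:n]
    return sum(worst) / len(worst)
```

Unlike the overall average, this value is not diluted by long runs of high-quality bases, so a short patch of poor base calls remains visible.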
For a short description of how the Phred scores are calculated, look here.
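In brief, a Phred score Q is related to the estimated probability P that a base call is wrong by Q = -10 x log10(P), so a score of 20 corresponds to a 1 in 100 chance of error and a score of 30 to 1 in 1000. The conversion in both directions can be sketched as follows; the function names are made up for illustration.

```python
import math


def phred_to_error(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)
```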
This program can be downloaded here.
Ten quality files and a text file containing the sequence of the region of interest can be downloaded here.
These sequences contain a number of sequence variants that are reported as 'N'.