Phred score calculation
About this page
This page explains how a Phred score is derived from four parameters, which are
calculated as described below:
Peak spacing: For each window of 7 bases, the largest and smallest spaces between two
adjacent peaks are found and used to calculate the ratio of the largest space to the smallest space
(S(l)/S(s)). If the sequence is evenly spaced, this ratio equals 1.
Figure 1
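The peak spacing ratio can be sketched in a few lines of Python. This is only an illustration, not phred's own code: the function name `peak_spacing_ratio`, the input `peak_positions` (a sorted list of trace sample positions for the called peaks), and the choice to centre the window on the current base and truncate it at the ends of the read are all assumptions made for this example.

```python
def peak_spacing_ratio(peak_positions, index, window=7):
    """Return S(l)/S(s) for a window of called peaks around `index`.

    Assumptions (not taken from the page): the window is centred on the
    current base and is simply truncated at the ends of the read.
    """
    half = window // 2
    start = max(0, index - half)
    end = min(len(peak_positions), index + half + 1)
    window_peaks = peak_positions[start:end]

    # Distances between each pair of adjacent peaks in the window.
    spacings = [b - a for a, b in zip(window_peaks, window_peaks[1:])]
    if not spacings or min(spacings) <= 0:
        raise ValueError("need at least two distinct, sorted peak positions")

    return max(spacings) / min(spacings)


# Evenly spaced peaks give a ratio of 1.
even = [0, 10, 20, 30, 40, 50, 60]
print(peak_spacing_ratio(even, 3))

# One peak shifted towards its neighbour increases the ratio.
uneven = [0, 10, 20, 35, 40, 50, 60]
print(peak_spacing_ratio(uneven, 3))
```

In the second example the spacings are 10, 10, 15, 5, 10 and 10, so the ratio is 15/5 = 3.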
Uncalled to called peak height ratio: For each window of 7 bases, the ratio of the height
of the largest uncalled peak (P(lu)) to the height of the smallest called peak (P(sc)) is found.
If no uncalled peak is present in the window, the highest background value under a peak is used instead.
Figure 2
Uncalled to called peak height ratio in a three base window: This is calculated
in the same way as the previous parameter, but uses the values from a 3 base window.
Figure 3
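Both uncalled to called peak height ratios differ only in window size, so a single function can illustrate them. The sketch below is hypothetical: it assumes the trace has already been reduced to two per-base lists, `called_heights` (height of the called peak at each base) and `uncalled_heights` (height of the largest uncalled peak at each base, or the highest background value under the called peak where no uncalled peak exists). Neither the names nor this data layout come from phred itself.

```python
def uncalled_called_ratio(called_heights, uncalled_heights, index, window=7):
    """Return P(lu)/P(sc) for a window of bases around `index`.

    Assumptions (not taken from the page): the window is centred on the
    current base, truncated at the ends of the read, and the caller has
    already filled `uncalled_heights` with the background value wherever
    no uncalled peak is present.
    """
    half = window // 2
    start = max(0, index - half)
    end = min(len(called_heights), index + half + 1)

    smallest_called = min(called_heights[start:end])
    largest_uncalled = max(uncalled_heights[start:end])
    if smallest_called <= 0:
        raise ValueError("called peak heights must be positive")

    return largest_uncalled / smallest_called


called = [100, 80, 120, 90, 110, 95, 105]
uncalled = [5, 10, 8, 40, 6, 7, 9]

print(uncalled_called_ratio(called, uncalled, 3, window=7))  # 7 base window
print(uncalled_called_ratio(called, uncalled, 3, window=3))  # 3 base window
```

With these made-up heights the 7 base window gives 40/80 and the 3 base window gives 40/90, because the smallest called peak (80) falls outside the narrower window.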
Peak resolution: The number of bases between the current base and the nearest
unresolved base (i.e. a base that is called as "N") is found and then multiplied by -1.
Figure 4
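A minimal sketch of the peak resolution parameter follows. It makes two assumptions that the description above does not settle: the distance is taken as the difference in position between the current base and the nearest "N" (so an "N" itself scores 0), and `None` is returned when the read contains no "N" at all. The function name `peak_resolution` is hypothetical.

```python
def peak_resolution(called_bases, index):
    """Return -1 times the distance from `index` to the nearest 'N' call.

    Assumptions (not taken from the page): distance is the absolute
    difference in base position, and None is returned if the read
    contains no unresolved base.
    """
    unresolved = [i for i, base in enumerate(called_bases) if base.upper() == "N"]
    if not unresolved:
        return None
    return -min(abs(i - index) for i in unresolved)


read = "ACGTNACGT"
print(peak_resolution(read, 1))  # three positions away from the N
print(peak_resolution(read, 4))  # the N itself
```

Under these assumptions the first call returns -3 and the second returns 0.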
These parameters were calculated for a large training set of tens of thousands of sequences in which the correct
base was known. By counting how often the wrong base was called for a given set of parameter values, it
is possible to estimate the probability that the current base call is incorrect (the error probability). This
information was used to create a large look-up table, which is consulted each time a sequence is analysed. The
error probability is converted into a Phred score by multiplying its base-10 logarithm by -10.
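In the Phred convention the score is Q = -10 x log10(P), where P is the probability that the base call is wrong. The short Python sketch below shows the conversion in both directions. The function names are chosen for this example only, and the look-up table step that produces P is not modelled here.

```python
import math


def phred_from_error_probability(p_error):
    """Convert an error probability (0 < p <= 1) into a Phred score."""
    if not 0 < p_error <= 1:
        raise ValueError("error probability must be in the range (0, 1]")
    return -10 * math.log10(p_error)


def error_probability_from_phred(q):
    """Convert a Phred score back into an error probability."""
    return 10 ** (-q / 10)


for p in (0.1, 0.01, 0.001):
    print(p, round(phred_from_error_probability(p), 1))
```

An error probability of 0.01 (one wrong call in 100) corresponds to a Phred score of 20, and 0.001 (one in 1000) corresponds to a score of 30.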
Citations
For a full description of the methodology read these citations:
Ewing B, Hillier L, Wendl MC, Green P.
Base-calling of automated sequencer traces using phred. I. Accuracy assessment.
Genome Res. 1998 Mar;8(3):175-85.
and
Ewing B, Green P.
Base-calling of automated sequencer traces using phred. II. Error probabilities.
Genome Res. 1998 Mar;8(3):186-94.