Insilicase

deus ex computa

Phred score calculation

About this page

This page explains how a Phred score is derived from four parameters, which are calculated as described below:

	Peak spacing: For each window of 7 bases, the largest and smallest spaces between two adjacent peaks are found, and the ratio of the largest space to the smallest space (S_l/S_s) is calculated. If the peaks are evenly spaced, this ratio equals 1.
Figure 1
	Uncalled to called peak height ratio: For each window of 7 bases, the ratio of the height of the largest uncalled peak (P_lu) to the height of the smallest called peak (P_sc) is found. If no uncalled peak is present in the window, the highest background value under a peak is used instead.
Figure 2
	Uncalled to called peak height ratio in a three base window: This is calculated in the same way as the previous parameter, but uses the values from a window of 3 bases.
Figure 3
	Peak resolution: The number of bases between the current base and the nearest unresolved base (i.e. a base that is called as ‘N’) is found and then multiplied by -1.
Figure 4
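
The four parameters can be sketched in a few lines of Python. This is only an illustration, not the code used by phred or by this page: the function names, the input lists (peak positions, called and uncalled peak heights) and the choice of a window centred on the current base and clipped at the sequence ends are all assumptions. Peak resolution is taken here as the positional distance to the nearest ‘N’, which is one possible reading of "number of bases between".

```python
def window(values, index, size):
    """Return up to `size` items centred on `index` (assumed layout),
    clipped at the ends of the sequence."""
    half = size // 2
    return values[max(0, index - half): index + half + 1]


def peak_spacing_ratio(positions):
    """Largest gap between adjacent peaks divided by the smallest gap.

    Needs at least two distinct, sorted peak positions.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return max(gaps) / min(gaps)


def uncalled_called_ratio(called_heights, uncalled_heights, background=0.0):
    """Largest uncalled peak height over smallest called peak height.

    When the window has no uncalled peak, `background` (the highest
    background value under a peak) is used instead.
    """
    largest_uncalled = max(uncalled_heights) if uncalled_heights else background
    return largest_uncalled / min(called_heights)


def peak_resolution(bases, index):
    """Distance to the nearest 'N', multiplied by -1.

    Returns None when the read has no 'N' (an assumption of this sketch).
    """
    distances = [abs(i - index) for i, base in enumerate(bases) if base == "N"]
    return -min(distances) if distances else None


# Evenly spaced peaks give a spacing ratio of 1.
positions = [10, 22, 34, 46, 58, 70, 82]
print(peak_spacing_ratio(window(positions, 3, 7)))   # 1.0

# Largest uncalled peak is 20, smallest called peak is 80.
print(uncalled_called_ratio([100, 80, 120], [20, 10]))  # 0.25

# The nearest 'N' is three bases away from index 1.
print(peak_resolution("ACGTNAC", 1))  # -3
```

The two peak height parameters use the same function; only the window size passed to `window` (7 or 3) differs.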

These parameters were measured in a large training set of tens of thousands of sequences where the correct base was known. By finding how often the correct base was called for a certain set of parameter values, it is possible to find the probability that the current base call is correct, and therefore the probability that it is wrong. This information was used to create a large look-up table, which is used each time a sequence is analysed. The error probability (one minus the probability that the call is correct) is converted into a Phred score by multiplying the log₁₀ of this value by -10. For example, an error probability of 0.001 gives a Phred score of 30.
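
The conversion from training counts to a Phred score can be sketched as follows. The function names and the example counts are hypothetical, and the real look-up table is indexed by the four parameters rather than by a single pair of counts. The score is computed from the error probability, that is, one minus the probability that the call is correct.

```python
import math


def error_probability(n_correct, n_total):
    """Fraction of training calls that were wrong for one set of parameter values."""
    return (n_total - n_correct) / n_total


def phred_score(p_error):
    """Phred score: -10 times the base-10 logarithm of the error probability."""
    return -10 * math.log10(p_error)


def phred_to_error(score):
    """Inverse conversion, from a Phred score back to an error probability."""
    return 10 ** (-score / 10)


# Hypothetical counts: 9990 correct calls out of 10000 with these parameter values.
p = error_probability(9990, 10000)
print(round(phred_score(p)))   # 30
print(phred_to_error(20))      # 0.01
```

Each increase of 10 in the score corresponds to a ten-fold drop in the error probability.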

Citations

For a full description of the methodology, see these papers:

Ewing B, Hillier L, Wendl MC, Green P.
Base-calling of automated sequencer traces using phred. I. Accuracy assessment.
Genome Res. 1998 Mar;8(3):175-85.
and
Ewing B, Green P.
Base-calling of automated sequencer traces using phred. II. Error probabilities.
Genome Res. 1998 Mar;8(3):186-94.