Consensus site finder
Purpose of this page
This page is designed to identify sequences that match, or are similar to, a consensus
sequence. The consensus sequence may contain invariant positions, which are entered as uppercase
characters. The program does not handle ambiguous nucleotide codes (such as N or R) when
scoring the similarity between the consensus site and a possible match.
Because a simple consensus sequence can produce numerous matches, only the
first 500 matches are returned.
A Windows program that duplicates the functionality of this page is also available for download.
Enter the consensus sequence, with invariant positions entered as uppercase letters.
Enter the minimum number of variable (lowercase) positions that must match the consensus site.
Enter the sequence to be searched in the text box below (maximum length 50,000 bp).
Instructions and test data
To screen a sequence for a consensus site, enter the consensus sequence in the first text box. Invariant positions
should be entered as capital letters. For example, the splice site consensus sequence is entered as GTaagt, since
the first GT is invariant but the last four bases are variable. To reduce the number of matches,
you can specify that some of the variable positions must also match the consensus site. For example, entering
the number 2 in the second text box discards matches that do not contain the invariant GT and at least 2
of the other four residues.
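The matching rule described above can be sketched in Python. This is a minimal illustration, not the page's actual implementation: the function name `is_consensus_site` is hypothetical, and the sketch assumes that the target is compared case-insensitively, that the minimum count applies only to the lowercase (variable) positions, and that ambiguity codes are compared literally rather than expanded.

```python
def is_consensus_site(window: str, consensus: str, min_variable_matches: int) -> bool:
    """Return True if `window` satisfies `consensus` (hypothetical helper).

    Uppercase consensus letters are invariant and must match exactly;
    lowercase letters are variable, and at least `min_variable_matches`
    of them must match. Ambiguity codes (e.g. 'n') are compared literally.
    """
    if len(window) != len(consensus):
        return False
    variable_matches = 0
    for target_base, site_base in zip(window.upper(), consensus):
        if site_base.isupper():
            # Invariant position: any mismatch rejects the window outright.
            if target_base != site_base:
                return False
        elif target_base == site_base.upper():
            # Variable position that happens to match the consensus.
            variable_matches += 1
    return variable_matches >= min_variable_matches


# Splice-site example from the text: invariant GT plus at least 2 of "aagt".
print(is_consensus_site("GTAAGT", "GTaagt", 2))  # True: GT plus 4 variable matches
print(is_consensus_site("GTACCC", "GTaagt", 2))  # False: only 1 variable match
print(is_consensus_site("GCAAGT", "GTaagt", 2))  # False: invariant T is missing
```

Rejecting a window as soon as an invariant position fails mirrors the description: the invariant GT is a hard requirement, while the variable positions only contribute to a count.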
Finally, copy the target sequence into the first large text box and press 'Submit'. The
locations of any consensus sites in the sequence are shown in the lower text
box.
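A sketch of the full scan follows, again hypothetical: `find_consensus_sites` and `MAX_HITS` are illustrative names, positions are assumed to be reported 1-based, and only the strand as entered is assumed to be searched. Every window of the consensus length is tested, and scanning stops once the 500-match limit is reached.

```python
MAX_HITS = 500  # the page returns only the first 500 matches


def find_consensus_sites(sequence: str, consensus: str, min_variable_matches: int) -> list[int]:
    """Return 1-based start positions of windows that match `consensus`.

    Assumes `sequence` has already had formatting removed. Uppercase
    consensus letters must match exactly; at least `min_variable_matches`
    lowercase letters must match. Only the strand as entered is searched.
    """
    target = sequence.upper()  # the test sequence is not case sensitive
    width = len(consensus)
    hits: list[int] = []
    for start in range(len(target) - width + 1):
        pairs = list(zip(target[start:start + width], consensus))
        # Every invariant (uppercase) position must match exactly.
        if not all(base == site for base, site in pairs if site.isupper()):
            continue
        # Count matches at the variable (lowercase) positions.
        variable_matches = sum(base == site.upper() for base, site in pairs if site.islower())
        if variable_matches >= min_variable_matches:
            hits.append(start + 1)  # 1-based position (assumed convention)
            if len(hits) == MAX_HITS:
                break
    return hits


print(find_consensus_sites("ccGTAAGTccGTACCC", "GTaagt", 2))  # [3, 7]
```

Overlapping windows are reported independently in this sketch, so the hit at position 7 (GTCCGT, with g and t matching) overlaps the end of the hit at position 3.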
The consensus sequence is case sensitive, but the test sequence is not. Because spaces, numbers and
end-of-line characters would otherwise be screened along with the sequence itself, all formatting is
removed from the test sequence before it is screened.
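The clean-up step can be sketched as follows. `clean_sequence` and `MAX_LENGTH` are hypothetical names, and treating "formatting" as every non-alphabetic character is an assumption; the page does not list exactly which characters it removes.

```python
MAX_LENGTH = 50_000  # maximum sequence length accepted by the page (bp)


def clean_sequence(raw: str) -> str:
    """Remove formatting (spaces, digits, line breaks, punctuation) and uppercase.

    Keeping only alphabetic characters is an assumed interpretation of
    "all formatting removed".
    """
    cleaned = "".join(ch for ch in raw if ch.isalpha()).upper()
    if len(cleaned) > MAX_LENGTH:
        raise ValueError(f"sequence is {len(cleaned)} bp; limit is {MAX_LENGTH} bp")
    return cleaned


raw = """1 ccgtaagtcc gtaccc
61 GTAAGA"""
print(clean_sequence(raw))  # CCGTAAGTCCGTACCCGTAAGA
```

Applying the length limit after clean-up means line numbers and spacing in a pasted, formatted sequence do not count toward the 50,000 bp maximum.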