Degenerate motif finder

Purpose of this page


This page finds the locations of degenerate motifs in a sequence. The sequence can be either an amino acid sequence (one-letter code) or a nucleotide sequence, and the motif can be as complex as needed. Because of this flexibility, however, the program does not attempt to match the motif to ambiguous sites in the sequence. Since numerous sites may be found, only the first 500 are returned.
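The behaviour described above can be illustrated with a minimal Python sketch. It is not the page's own code: the function name `find_sites`, the use of regular expressions, the 1-based positions and the reporting of overlapping sites are all assumptions made for illustration. Only the cap of 500 hits and the refusal to match ambiguous sites come from the description.

```python
import re
from itertools import islice

MAX_HITS = 500  # the cap stated on this page


def find_sites(pattern, sequence, max_hits=MAX_HITS):
    """Return 1-based start positions of motif matches, at most max_hits.

    `pattern` is a regular expression such as "GA[CT]" (hypothetical input
    format). Wrapping it in a zero-width lookahead lets overlapping sites be
    reported as well, which is an assumption about the desired behaviour.
    """
    regex = re.compile("(?=(" + pattern + "))", re.IGNORECASE)
    hits = (m.start() + 1 for m in regex.finditer(sequence))
    return list(islice(hits, max_hits))


# The 'N' in "GAN" is an ambiguous site. Because the motif lists only the
# letters C and T at that position, the ambiguous site is never matched.
print(find_sites("GA[CT]", "GATGANGAC"))
```

Because each motif position lists its allowed letters explicitly, an ambiguity code in the target sequence simply fails to match, which mirrors the limitation stated above.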

A Windows program that duplicates this page can be downloaded here.

A Java command-line program that duplicates this page can be downloaded here.

Add letters (nucleotide or amino acid) that may be present at this place in the consensus site.


Add a gap that may be present at this place in the consensus site


  Remove last residue in motif

Enter your target sequence below (maximum length is 50,000 bp)


Instructions and test data

To screen a sequence for a degenerate motif, a template must first be created. This is done in steps. For example, to add the motif for the enzyme Hin4I (GAYNNNNNVTC), first type G into the first text box and press 'Add letters'. Repeat for A, but for the next base, 'Y', type CT in the box and then press 'Add letters'. To include a gap in the motif, type the number of residues the gap spans, in this case '5', and then press 'Add a gap'. If the gap is variable in length, type the minimum gap length, then a '-' symbol, followed by the maximum gap length (e.g. for a gap of 3 to 5 residues you would type 3-5). V is the IUPAC one-letter code for A, C or G, so for this position type ACG and then press 'Add letters'. Finally, enter the last two residues (T and C) as before. If you make an error, press the 'Remove last residue in motif' button. When you have done this, the motif will be shown below the second text box and should look like this:


Finally, copy the sequence into the first large text box and then press 'Submit'. The locations of any consensus sites in the sequence will be shown in the lower text box.
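The template-building steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program behind this page: the helper names `add_letters`, `add_gap`, `remove_last` and `to_regex` are hypothetical, and representing the template as a list of regular-expression fragments is a design choice made here for brevity.

```python
import re


def add_letters(template, letters):
    """Append one motif position that may hold any of the given letters."""
    if not letters.isalpha():
        raise ValueError("letters must be residue codes, e.g. 'CT'")
    template.append("[" + letters.upper() + "]")


def add_gap(template, spec):
    """Append a gap: '5' for a fixed length, '3-5' for a variable length."""
    low, _, high = spec.partition("-")
    high = high or low  # a single number means minimum == maximum
    template.append(".{%d,%d}" % (int(low), int(high)))


def remove_last(template):
    """Undo the most recent step, if there is one."""
    if template:
        template.pop()


def to_regex(template):
    """Join the steps into one case-insensitive regular expression."""
    return re.compile("".join(template), re.IGNORECASE)


# Hin4I (GAYNNNNNVTC), entered step by step as in the instructions.
template = []
add_letters(template, "G")
add_letters(template, "A")
add_letters(template, "CT")   # Y = C or T
add_gap(template, "5")        # NNNNN
add_letters(template, "ACG")  # V = A, C or G
add_letters(template, "T")
add_letters(template, "C")
hin4i = to_regex(template)

match = hin4i.search("TTGACAAAAAGTCTT")
print(match.start() + 1, match.group())
```

In this sketch a gap matches any character, so it assumes the target sequence has already been stripped of formatting.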

The search is case insensitive. Spaces, numbers and end-of-line characters would otherwise be screened along with the 'normal' sequence characters, so all formatting is removed from the sequence before it is screened.
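A minimal Python sketch of this clean-up step is shown below. The function name `clean_sequence` is hypothetical, and converting to upper case is just one assumed way of making the later search case insensitive; the page does not say how its own program does this.

```python
import re


def clean_sequence(raw):
    """Remove spaces, numbers and end-of-line characters, then upper-case."""
    return re.sub(r"[\s\d]+", "", raw).upper()


# A fragment pasted with position numbers, spacing and line breaks.
pasted = "  1 gatc ctcc\n 11 at\r\n"
print(clean_sequence(pasted))
```

Only whitespace and digits are stripped here, matching the characters named above; any other symbols in the pasted text would be left in place by this sketch.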

