Gene genotyper
About this program
This program was designed to analyse the results from Illuminator,
which aligns short reads from a clonal sequencer to a reference sequence. Due to the massive
amount of data generated by a clonal sequencer, it is impossible to check by eye whether the
results are correct. Initially the program was designed to check, using a different algorithm,
that variants found by Illuminator were real; however, it became apparent that the program could
also screen a gene for known sequence variants and so genotype a whole gene or genomic region.
Similarly, it could also be used to automatically screen a disease gene for known pathological
mutations.
The program allows the user either to enter the sequence variants manually or to import them
from a text file with the format shown in the table below. Each line contains one variant: its
name, 5' flanking sequence, allele 1, allele 2 and the 3' flanking sequence, each separated
by a '-'.
rs35659787-ctcagcctcc-a-g-gagtagctgg
rs6503048-ctttatttat-c-t-ttttttgaga
rs35993958-aaggagccag-c-g-ggggagcagg
rs12951053-agtagtatgg-a-c-agaaatcggt
rs12947788-taagaggtgg-a-g-cccaggggtc
rs17880604-tccccaaggc-c-g-cactggcctc
rs1625895-gtggccctcc-a-g-ggtgagcagt
rs1800372-acacttttcg-a-g-catagtgtgg
rs2909430-aggtgcttac-a-g-catgtttgtt
rs9895829-agcgagctag-a-g-gagagttggc
rs1794287-cccagccact-c-t-aggaggctga
rs1800371-gatgctgtcc-c-t-cggacgatat
rs17883323-ggacctggtc-a-c-tctgactgct
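As an illustration of the import format described above, the following Python sketch splits one line into its five fields. It is not the program's own code: the names `Variant` and `parse_variant_line` are hypothetical, and the sketch assumes that alleles and flanking sequences never contain a '-' themselves.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One line of the variant file (hypothetical container for this sketch)."""
    name: str
    flank5: str   # 5' flanking sequence
    allele1: str
    allele2: str
    flank3: str   # 3' flanking sequence


def parse_variant_line(line: str) -> Variant:
    # Drop surrounding whitespace and any trailing table delimiter.
    text = line.strip().rstrip("|").strip()
    # Split from the right so that a '-' inside the name would not break parsing.
    parts = text.rsplit("-", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 '-'-separated fields, got: {line!r}")
    name, flank5, allele1, allele2, flank3 = (p.strip() for p in parts)
    # Sequences are lower-cased so later comparisons are case-insensitive.
    return Variant(name, flank5.lower(), allele1.lower(),
                   allele2.lower(), flank3.lower())
```

Calling `parse_variant_line("rs6503048-ctttatttat-c-t-ttttttgaga")` would give a `Variant` whose `allele1` is `c` and `allele2` is `t`.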
Once the variants have been entered, select a folder that contains the read data as
*.fasta files (one per individual). The program will then read all the reads in each file
and output the results to a text file called 'genotype.txt', formatted like the table below:
| cgatct_Q-5.fasta | | gcatgt_Q-5.fasta | |
rs35659787 | -/- ? | (a2 g2 x2939) | -/- ? | (a5 g1 x2312) |
rs6503048 | t/t | (c5 t3298 x29) | t/t | (c2 t2766 x22) |
rs35993958 | g/g | (c1 g1632 x11) | g/g | (c0 g1289 x5) |
rs12951053 | a/a | (a3737 c6 x10) | a/a | (a2879 c4 x12) |
rs12947788 | g/g | (a0 g3532 x4) | g/g | (a2 g2588 x12) |
rs17880604 | g/g | (c4 g3603 x14) | c/g | (c1359 g1426 x12) |
rs1625895 | g/g | (a5 g3224 x8) | a/g | (a1785 g1598 x10) |
rs1800372 | a/a | (a4128 g10 x34) | a/a | (a3260 g6 x38) |
rs2909430 | a/a | (a4639 g15 x58) | a/a | (a3999 g6 x65) |
rs9895829 | -/- ? | (a8 g7 x0) | a/g | (a1410 g1911 x1) |
rs1794287 | c/c | (c3885 t3 x11) | c/c | (c3082 t2 x9) |
rs1800371 | c/c ? | (c3576 t2 x1100) | c/c | (c2911 t1 x6) |
rs17883323 | a/a | (a2713 c9 x10) | a/a | (a2091 c17 x8) |
The first column identifies the variant's name, while the genotype and read depth for
each variant are shown in the following columns for each of the '*.fasta' files. The number
of times the flanking sequences are found in a read but with a different variant is
given by the 'x' value. If a second variant is present in the flanking sequence, the read
will not be counted. If the read depth is low (i.e. fewer than 100 reads were mapped) or the
'x' value is greater than 20% of the total matched reads, the genotype is flagged with a '?'.
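The counting and flagging rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's actual algorithm: the function names are hypothetical, alleles are assumed to be single bases, only the forward strand is searched, "read depth" is taken to be the sum of the two allele counts, and "total matched reads" is taken to include the 'x' reads.

```python
import re


def count_alleles(reads, flank5, allele1, allele2, flank3):
    """Return (allele 1 count, allele 2 count, 'x' count) for one variant."""
    # Both flanks must match exactly, so a read carrying a second variant
    # inside a flanking sequence is simply not counted.
    pattern = re.compile(re.escape(flank5) + "(.)" + re.escape(flank3))
    n1 = n2 = other = 0
    for read in reads:
        match = pattern.search(read.lower())
        if match is None:
            continue
        base = match.group(1)
        if base == allele1:
            n1 += 1
        elif base == allele2:
            n2 += 1
        else:
            other += 1  # flanks found, but with a different base between them
    return n1, n2, other


def is_flagged(n1, n2, other, min_depth=100, max_other_fraction=0.2):
    """True if the genotype would carry a '?' (assumed reading of the rule)."""
    depth = n1 + n2          # assumption: depth counts only the two alleles
    total = depth + other    # assumption: 'x' reads count as matched reads
    return depth < min_depth or other > max_other_fraction * total
```

Under these assumptions the counts in the table are consistent with the flags shown: `is_flagged(3576, 2, 1100)` is true for rs1800371 in the first file, while `is_flagged(2911, 1, 6)` is false for the same variant in the second.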
This program is still under development; however, it can be downloaded from
here.
Computer requirements
This program runs on Microsoft Windows, using the .NET 2.0
environment which can be obtained from
here.