Insilicase
deus ex computa

Gene genotyper

About this program

This program was designed to analyse the results from Illuminator, which aligns short reads from a clonal sequencer to a reference sequence. Because of the massive amount of data generated by a clonal sequencer, it is impossible to check by eye whether the results are correct. The program was initially designed to check, using a different algorithm, that variants found by Illuminator were real. However, it became apparent that the program could also screen a gene for known sequence variants and so genotype a whole gene or genomic region. Similarly, it could be used to automatically screen a disease gene for known pathological mutations.

The program allows the user either to enter the sequence variants manually or to import them from a text file in the format shown in the table below. Each line describes one variant: its name, 5' flanking sequence, allele 1, allele 2 and 3' flanking sequence, each separated by a '-'.

rs35659787-ctcagcctcc-a-g-gagtagctgg
rs6503048-ctttatttat-c-t-ttttttgaga
rs35993958-aaggagccag-c-g-ggggagcagg
rs12951053-agtagtatgg-a-c-agaaatcggt
rs12947788-taagaggtgg-a-g-cccaggggtc
rs17880604-tccccaaggc-c-g-cactggcctc
rs1625895-gtggccctcc-a-g-ggtgagcagt
rs1800372-acacttttcg-a-g-catagtgtgg
rs2909430-aggtgcttac-a-g-catgtttgtt
rs9895829-agcgagctag-a-g-gagagttggc
rs1794287-cccagccact-c-t-aggaggctga
rs1800371-gatgctgtcc-c-t-cggacgatat
rs17883323-ggacctggtc-a-c-tctgactgct
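
As an illustration of the import format above, the following minimal Python sketch splits one line into its five fields. It is not part of the program itself; the names `Variant` and `parse_variant_line` are hypothetical, and the sketch assumes that the variant name never contains a '-'.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One line of the variant file (hypothetical helper type)."""
    name: str
    flank5: str   # 5' flanking sequence
    allele1: str
    allele2: str
    flank3: str   # 3' flanking sequence


def parse_variant_line(line: str) -> Variant:
    """Split 'name-5'flank-allele1-allele2-3'flank' into its fields.

    Assumption: the name itself contains no '-' character.
    """
    fields = line.strip().split("-")
    if len(fields) != 5:
        raise ValueError(f"expected 5 '-' separated fields, got {len(fields)}")
    name, flank5, allele1, allele2, flank3 = fields
    # Sequences are lower-cased so later comparisons are case-insensitive.
    return Variant(name, flank5.lower(), allele1.lower(),
                   allele2.lower(), flank3.lower())


variant = parse_variant_line("rs35659787-ctcagcctcc-a-g-gagtagctgg")
```

A line with too few or too many fields raises a `ValueError` rather than being silently skipped, which seems the safer choice when a typing error could otherwise drop a variant from the screen.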

Once the variants have been entered, select a folder that contains the read data as *.fasta files (one per individual). The program will then read all the reads in each file and write the results to a text file called 'genotype.txt', formatted like the table below:

            cgatct_Q-5.fasta         gcatgt_Q-5.fasta
rs35659787  -/- ?(a2 g2 x2939)       -/- ?(a5 g1 x2312)
rs6503048   t/t(c5 t3298 x29)        t/t(c2 t2766 x22)
rs35993958  g/g(c1 g1632 x11)        g/g(c0 g1289 x5)
rs12951053  a/a(a3737 c6 x10)        a/a(a2879 c4 x12)
rs12947788  g/g(a0 g3532 x4)         g/g(a2 g2588 x12)
rs17880604  g/g(c4 g3603 x14)        c/g(c1359 g1426 x12)
rs1625895   g/g(a5 g3224 x8)         a/g(a1785 g1598 x10)
rs1800372   a/a(a4128 g10 x34)       a/a(a3260 g6 x38)
rs2909430   a/a(a4639 g15 x58)       a/a(a3999 g6 x65)
rs9895829   -/- ?(a8 g7 x0)          a/g(a1410 g1911 x1)
rs1794287   c/c(c3885 t3 x11)        c/c(c3082 t2 x9)
rs1800371   c/c ?(c3576 t2 x1100)    c/c(c2911 t1 x6)
rs17883323  a/a(a2713 c9 x10)        a/a(a2091 c17 x8)

The first column gives the variant's name; the following columns give the genotype and read depth of each variant for each of the '*.fasta' files. The 'x' value is the number of reads in which both flanking sequences were found but with a different variant between them. If a second variant lies within a flanking sequence, the read will not be counted. If the read depth is low (the variant is matched by fewer than 100 reads) or the 'x' value is greater than 20% of the total matched reads, the genotype is flagged with a '?'.
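
The counting and flagging rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's actual algorithm: the function names are hypothetical, alleles are assumed to be single bases, only the forward strand and the first occurrence of the 5' flank in each read are examined, and "total matched reads" is taken to mean allele 1 count + allele 2 count + 'x'.

```python
def count_alleles(reads, flank5, allele1, allele2, flank3):
    """Return (count1, count2, x) for one variant over a list of reads.

    Assumptions: single-base alleles, forward strand only, and only the
    first occurrence of the 5' flank in each read is examined.
    """
    count1 = count2 = x = 0
    for read in reads:
        read = read.lower()
        start = read.find(flank5)
        if start == -1:
            continue  # 5' flank absent (or altered): read not counted
        pos = start + len(flank5)
        if read[pos + 1:pos + 1 + len(flank3)] != flank3:
            continue  # 3' flank absent (or altered): read not counted
        base = read[pos]
        if base == allele1:
            count1 += 1
        elif base == allele2:
            count2 += 1
        else:
            x += 1  # both flanks found, but a different base between them
    return count1, count2, x


def needs_flag(count1, count2, x, min_depth=100, max_x_fraction=0.20):
    """Apply the '?' rule: low depth, or 'x' above 20% of matched reads.

    Assumption: both tests use total = count1 + count2 + x.
    """
    total = count1 + count2 + x
    return total < min_depth or x > max_x_fraction * total


f5, f3 = "ctcagcctcc", "gagtagctgg"
reads = [
    "tt" + f5 + "a" + f3 + "cc",   # allele 1
    f5 + "g" + f3,                 # allele 2
    f5 + "t" + f3,                 # neither allele: counted in x
    "acgtacgt",                    # no flank: ignored
    f5 + "a" + "gagtagctga",       # altered 3' flank: ignored
]
counts = count_alleles(reads, f5, "a", "g", f3)
```

Under these assumptions the rule reproduces the flags in the table above: `needs_flag(3576, 2, 1100)` and `needs_flag(8, 7, 0)` are both true, while `needs_flag(5, 3298, 29)` is false.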

This program is still under development; however, it can be downloaded from here.

Computer requirements

This program runs on Microsoft Windows and requires the .NET 2.0 environment, which can be obtained from here.



Copyright © 2011 Insilicase.

 

   