Insilicase

deus ex computa

Gene genotyper

About this program

This program was designed to analyse the results from Illuminator, which aligns short reads from a clonal sequencer to a reference sequence. Because of the massive amount of data generated by a clonal sequencer, it is impossible to check by eye whether the results are correct. Initially the program was designed to confirm, using a different algorithm, that the variants found by Illuminator were real; however, it became apparent that the program could also screen a gene for known sequence variants and so genotype a whole gene or genomic region. Similarly, it could be used to automatically screen a disease gene for known pathological mutations.

The program allows the user to either enter the sequence variants manually or import them from a text file with the format shown in the table below. Each line contains one variant: its name, 5' flanking sequence, allele 1, allele 2 and 3' flanking sequence, each separated by a '-'.

rs35659787-ctcagcctcc-a-g-gagtagctgg

rs6503048-ctttatttat-c-t-ttttttgaga

rs35993958-aaggagccag-c-g-ggggagcagg

rs12951053-agtagtatgg-a-c-agaaatcggt

rs12947788-taagaggtgg-a-g-cccaggggtc

rs17880604-tccccaaggc-c-g-cactggcctc

rs1625895-gtggccctcc-a-g-ggtgagcagt

rs1800372-acacttttcg-a-g-catagtgtgg

rs2909430-aggtgcttac-a-g-catgtttgtt

rs9895829-agcgagctag-a-g-gagagttggc

rs1794287-cccagccact-c-t-aggaggctga

rs1800371-gatgctgtcc-c-t-cggacgatat

rs17883323-ggacctggtc-a-c-tctgactgct
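As an illustration of the import format above, the following minimal Python sketch splits one line into its five fields. It is not the program's own code: the names `Variant` and `parse_variant_line` are hypothetical, and it assumes that neither the variant name nor the sequences contain a '-' themselves.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One line of the variant file (hypothetical container)."""
    name: str
    flank5: str   # 5' flanking sequence
    allele1: str
    allele2: str
    flank3: str   # 3' flanking sequence


def parse_variant_line(line: str) -> Variant:
    """Split 'name-5flank-allele1-allele2-3flank' into its five fields."""
    fields = line.strip().split("-")
    if len(fields) != 5:
        raise ValueError(f"expected 5 '-'-separated fields, got {len(fields)}: {line!r}")
    name, flank5, allele1, allele2, flank3 = fields
    # Sequences are lower-cased so later comparisons are case-insensitive.
    return Variant(name, flank5.lower(), allele1.lower(), allele2.lower(), flank3.lower())


variant = parse_variant_line("rs6503048-ctttatttat-c-t-ttttttgaga")
print(variant.name, variant.allele1, variant.allele2)
```

A line with a missing or extra field raises a `ValueError` rather than being silently misread.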

Once the variants have been entered, select a folder that contains the read data as *.fasta files (one per individual). The program will then read all the reads in each file and output the results to a text file called 'genotype.txt', formatted like the table below:

	cgatct_Q-5.fasta		gcatgt_Q-5.fasta
rs35659787	-/- ?	(a2 g2 x2939)	-/- ?	(a5 g1 x2312)
rs6503048	t/t	(c5 t3298 x29)	t/t	(c2 t2766 x22)
rs35993958	g/g	(c1 g1632 x11)	g/g	(c0 g1289 x5)
rs12951053	a/a	(a3737 c6 x10)	a/a	(a2879 c4 x12)
rs12947788	g/g	(a0 g3532 x4)	g/g	(a2 g2588 x12)
rs17880604	g/g	(c4 g3603 x14)	c/g	(c1359 g1426 x12)
rs1625895	g/g	(a5 g3224 x8)	a/g	(a1785 g1598 x10)
rs1800372	a/a	(a4128 g10 x34)	a/a	(a3260 g6 x38)
rs2909430	a/a	(a4639 g15 x58)	a/a	(a3999 g6 x65)
rs9895829	-/- ?	(a8 g7 x0)	a/g	(a1410 g1911 x1)
rs1794287	c/c	(c3885 t3 x11)	c/c	(c3082 t2 x9)
rs1800371	c/c ?	(c3576 t2 x1100)	c/c	(c2911 t1 x6)
rs17883323	a/a	(a2713 c9 x10)	a/a	(a2091 c17 x8)
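For downstream processing, a row of the table above could be read with a short Python sketch such as the one below. It assumes that 'genotype.txt' is tab-delimited, with one genotype column and one counts column per *.fasta file, as the table suggests; the exact file layout is not otherwise documented here, and the function name `parse_result_row` is hypothetical.

```python
import re

# Matches a counts cell such as "(c3576 t2 x1100)".
COUNTS = re.compile(r"\(([a-z])(\d+) ([a-z])(\d+) x(\d+)\)")


def parse_result_row(row: str):
    """Return (variant name, list of per-file calls) for one table row."""
    fields = row.rstrip("\n").split("\t")
    name = fields[0]
    calls = []
    # After the name, columns come in (genotype, counts) pairs, one pair per file.
    for genotype, counts in zip(fields[1::2], fields[2::2]):
        match = COUNTS.fullmatch(counts.strip())
        if match is None:
            raise ValueError(f"unrecognised counts cell: {counts!r}")
        allele1, n1, allele2, n2, x = match.groups()
        calls.append({
            "genotype": genotype.replace("?", "").strip(),
            "flagged": genotype.rstrip().endswith("?"),
            "counts": {allele1: int(n1), allele2: int(n2), "x": int(x)},
        })
    return name, calls


name, calls = parse_result_row("rs1800371\tc/c ?\t(c3576 t2 x1100)\tc/c\t(c2911 t1 x6)")
print(name, calls[0]["flagged"], calls[1]["counts"])
```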

The first column gives each variant's name, while the genotype and read depth for each variant are shown in the following columns for each of the '*.fasta' files. The number of times the flanking sequences are found in a read but with a different variant is given by the 'x' value. If a second variant is present in the flanking sequence of a read, the read will not be counted. If the read depth is low (fewer than 100 mapped reads) or the 'x' value is greater than 20% of the total matched reads, the genotype is flagged with a '?'.

This program is still under development; however, it can be downloaded from here.

Computer requirements

This program runs on Microsoft Windows using the .NET 2.0 environment, which can be obtained from here.