Quick user guide with the ATOH7 dataset
Introduction
This guide is a quick demonstration of the use of AgileVariantViewer with data used to identify a mutation in the ATOH7. Since this analysis
identified a number of possible deleterious sequence variants it was necessary to screen the data using AgileGeneFilterer
it prioritise the screening of the patients sequence variants in the patients. As with the PXDN data, this data is derived from a single sequencing run using a DNA sample enriched for a
number of genomic regions.
Entering the sequence variants data
Enter the the gemomic annotation file and the read depth file created by AgileAnnotator and the sequence variants file created by
AgileAnnotator and optionally filtered by AgileGeneFilterer.
File formats
A description of the file formats for the variant and read depth files used by AgileVariantViewer can be found here.
Viewing the sequence variant data
When the Data view window opens it displays a graphical view of the analysis data for chromosome 1 in the upper panel (Figure 2).
Below the graphical display panel are five other panels which allow the sequence variants to be filtered and then exported. Each of these panels is described in
detail below:
The upper panel displays the analysis data organised as five horizontal strips (Figure 3).
- Strip A: shows the location of any genes (black rectangles) in the selected region with the green and orange rectangles representing exons transcribed from
the positive and negative DNA strands respectively.
Placing the cursor over a gene in this strip causes the gene's name to appear in the windows Title bar.
- Strip B: shows the position of exons whose typical read depth is above (grey block) or below (red block) the cut off value selected from the
Exon read depth cut off list in the Region view options panel (see below).
- Strip C: shows the location of sequence variants that will affect function of a protein. These variants include those that create in frame stop codons,
insertions and deletions and splice site variants. The vertical lines at the top of the strip represent homozygous variants, while the vertical lines at the
bottom of the strip correspond to heterozygous variants.
- Strip D: shows variants that change the amino acid sequence of a protein as well as variants close to the start codon which may affect protein translation,
these variants may or may not affect the proteins function. The vertical lines at the top of the strip represent homozygous variants, while the vertical lines at
the bottom of the strip correspond to heterozygous variants.
- Strip E: shows sequence variants that are unlikely to affect a protein's function since they are either in an intron or do not change a proteins amino acid sequence.
Again, the vertical lines at the top of the strip represent homozygous variants, while the vertical lines at the bottom of the strip correspond to heterozygous variants.
Select the candidate region if known
Autozygosity mapping of a number of affected individuals using Affymetrix SNP 6.0 data had identified a region on chromosome 10 as showing linkage to the disease phenotype. A DNA
sample from an affected individual was then enriched for this region and sequenced. Chromosome 10 was selected using the
Chromosome list in the
Region view options panel (Figure 2)
To select a sub region set the Search method list in the Region view options panel to manual and place the cursor over the end of the
region and press the right hand mouse button. Then place the cursor over the start of the region and press the lefthand mouse button. This should place two vertical lines that delimit
the region of interest (Figure 3), to zoom in to the region press the Zoom button in the Region view options panel (Figure 4).
Adjust the variant cut off parameters.
The sequence variant data used in this example has been filtered using AgileGeneFilterer so those variants with a
RS number can be identified. Since sequence variants with a RS number are probably true positives use this to set the cut off values which discord false positives, but retain true positives.
Select the Only SNPs with a RS number option in the Variant status panel. Then increase the Minimum read depth
and Minor allele cut off values in the Read depth options panel until sequence variants start to disappear, but the majority of sequence
variants are still visible. If the region is autozygous, try to remove all of the heterozygous variants. However some variants that appear to be heterozygous may remain if the region is duplicated
elsewhere in the genome. If sequence reads derived from the duplicated sequences are slightly divergent, but still mapped to the region of interest the divergent positions will appear as
heterozygous variants.
Viewing sequence variants not found in the 1000 Genome Project
Select the Only Unknown variants option in the Variant status panel to display sequence variants that have not been identified in the 1000
Genome Project. If the region is autozygous, the majority of the variants should be homozygous. If a large proportion of the variants are heterozygous, increase the
Minimum read depth and Minor allele cut off values in the Read depth options panel until the majority of
variants are homozygous. Select the Variant severity option in the Variant location and type panel and chane the value in the
Variant severity list to 0 to discord all variants that do not affect a protein's amino acid sequence.
Exporting sequence variants
If the disease variant is thought to be autozygous selected the Homozygous option in the Export data options panel, while if the
disease is thought to be recessive, but the patient is not autozygous then select both the Homozygous and Exclude genes with only one heterozygous
variant options in the Export data options panel. However, if the disease is thought to be dominantly inherited do not select
either option. Choose, where to export variants from the Whole genome, Current chromosome or Selected region
(the region currently displayed) by selecting the appropriate option and then pressing the Export button. The exported sequence variants file generated while writing
this guide can be found here. A second ATOH7 sequence variant file was created that contained all the valid sequence variants from the entire genome was
created and then used in the user guide for AgileGeneFilter.