AgileVariantMapper guide

User guide

Introduction

AgileVariantMapper is designed both to visualise and to re-genotype sequence variant data from exome sequencing projects, with the particular aim of mapping recessive disease loci in consanguineous individuals. The genotype data can be exported as a tab-delimited text file that can then be analysed using AutoSNPa or IBDFinder. (If data from a number of individuals has been generated using AgileGenotyper, AutoSNPa may be more appropriate, since AgileGenotyper creates files with a consistent set of previously known sequence variants. On the other hand, when the variant data are either (a) derived ab initio, as sequence calls differing from the reference sequence, or (b) include SNP microarray data, IBDFinder should be used.)

File formats used by AgileVariantMapper.

Figure 1: The format of the tab-delimited text file used by AgileVariantMapper; each column must have the correct header title text.

AgileVariantMapper can import data in three different formats:

1) The format used by the other members of the Agile suite of sequence analysis programs, as described here.

2) The format exported by AgileGenotyper, which can also be imported by AutoSNPa and IBDFinder.

3) A tab-delimited text file (used in conjunction with the *.xls file extension) as shown in Figure 1 and described below.

The tab-delimited text file must contain columns identifying the chromosome, chromosomal position, the read depth of the two commonest nucleotides, the read depth of the alternative base and if it has one, the RS number of the variant. The order of these columns within the file is not important, but the first line of the file most contain the correct titles for each column, as shown in Figure 1. It is not necessary to specify the nucleotides corresponding to each allele.

Importing Data

Figure 2: Data import using the Select button (underlined in blue).

Data are loaded into AgileVariantMapper by pressing the Select button (underlined in blue in Figure 2), selecting the required file extension and then choosing the data file.

Interpreting and manipulating the data

Figure 3: The current chromosome is selected using the chromosome list (underlined in blue).

The variant genotypes are displayed in a similar manner to AutoSNPa, with heterozygous variants represented by yellow horizontal lines and homozygous variants in black. Consequently, autozygous regions can be visualized as an extended block of black horizontal lines, interrupted by few if any yellow lines. In Figure 3, autozygous regions can be seen around the Chromosome 1 centromere (116–163 Mb) and in the region between 220–242 Mb, along with a number of possible smaller regions scattered along the chromosome. By default, all the chromosomes are displayed scaled to the same length as Chromosome 1.

Viewing a region

Figure 4: The region of Chromosome 1 at 143–163 Mb has been selected, to show the genotype data in the autozygous region in more detail. The beginning and end of a region to be displayed are selectable using the list boxes underlined by the red and black lines. The view can be reset to the default values by pressing the Reset button (underlined in green).

The Chromosome region options panel contains three list options. The first (underlined in blue) allows the user to select the chromosome. The other two lists allow a specific region to be expanded, with the start point selected using the list underlined in red (Figure 4) and the end point using the list underlined in black. Alternatively, a region may be selected by clicking the left mouse button over the start point on the chromosome diagram, and right-clicking over the end point. In Figure 4, the autozygous region at 143–163 Mb on Chr. 1 has been selected to show the data in more detail. Pressing the Reset button returns these selections to their default values.

Data smoothing

Figure 5: To view the raw unsmoothed data, use the Show the unsmoothed data tick-box.

By default, the displayed genotype data are smoothed, to lessen the visual impact of incorrectly scored genotypes (Figure 5A). To view the raw, unsmoothed data, tick the Show the unsmoothed data box (underlined in blue in Figure 5B).

Minimum read depth cut-off

Figure 6: Increasing the Select the minimum read depth cut-off value (Figure 6B) alters the apparent extent of the autozygous regions as compared to the default cut-off value (Figure 6A).

As well as the Show the unsmoothed data tick-box, the Variant state options panel includes the Select the minimum read depth and Select heterozygous cut-off list options. Increasing the minimum read depth cut-off value causes variants with a total read depth less than the cut-off value to be ignored. In Figure 6, this manifests as a reduction in the apparent size of the autozygous regions; however the effect may vary between individual datasets. Typically, during the generation of the imported data, the variants will already have been filtered for read depth, and so a cut-off value of 0 will often be appropriate.

Heterozygous cut-off value

Figure 7: The effect on the apparent extent of autozygous regions, due to changes in the Heterozygous cut-off value, is greatest when the value is increased above its default.

Similarly, adjusting the Heterozygous cut-off value affects the apparent extent of autozygous regions, with a value above the default generally having a greater effect than a reduction (Figures 7 and 8). The proportion of sequence variants that are scored as homozygous or heterozygous is indicated in the graph at the bottom right of the window. Each sequence variant is shown as a dot, with the x-axis indicating the read depth of the reads containing the reference allele and the y-axis showing the read depth of the variant allele. Consequently, homozygous variants are plotted along the y-axis while heterozygous variants should trend along the white diagonal. To match the chromosome diagram, heterozygous variants are shown as yellow dots and homozygous variants as black. If the total read depth of a variant position falls below the Minimum read depth cut-off, it is drawn as a white dot.

Figure 8: There is a smaller effect on the apparent size of autozygous regions, when the Heterozygous cut-off value is reduced below its default.

The effect of different sources of data

AgileVariantMapper can display data generated by AgileAnnotator or AgileGenotyper, or correctly formatted data from another source. However, there may be important differences in the underlying data in each of these cases, which may result in apparent differences in the extent of the autozygous regions (Figure 9). These differences concern the number of sequence variants and the frequencies of the different genotypes. In data generated by ab initio detection of variants differing from the reference sequence, variants tend to be fewer in number; there is also an under-representation of homozygous reference sequence genotypes, when compared to data derived from AgileGenotyper, since the latter program concerns itself with known polymorphic positions, most of which will be homozygous wild-type.

To allow for these differences, when displaying data from AgileGenotyper, homozygous variants are drawn first, after which heterozygous markers are drawn over the homozygous ones. This has the effect that autozygous regions are identified by the absence of heterozygous variants. Conversely, when the data are derived from the ab initio detection of non-wildtype positions, there will be a relative excess of heterozygous variants, a high proportion of which may actually represent sequencing errors. With this type of data, heterozygous variants are therefore drawn first and then the rarer homozygous positions are overlaid on the heterozygous variant data. This leads to a situation in which autozygous regions are identified by the presence of blocks of homozygous variants.

Figure 9 shows images of Chromosome 1 in situations where the genotype data were generated by each of these different methods. In Figure 9A, the data were created by AgileAnnotator and filtered by AgileKnownSNPFilter, following which only the variants with RS numbers were exported by AgileVariantViewer. Figure 9B, in contrast, shows data generated by AgileGenotyper.

Figure 9: The exact extent of autozygous regions varies with the type of underlying data. Figure 9A shows data derived from the identification of sequence variants using AgileAnnotator, while Figure 9B shows data derived from the genotyping of ~0.5 million preselected known polymorphic regions by AgileGenotyper. A complete comparison of data from three individuals analysed in different ways can be seen here, here and here.

Exporting the genotype data

Figure 10: An example of a image exported by AgileVariantMapper showing data spanning a single chromosome.

AgileVariantMapper can export data in a number of ways: as a tab-delimited text file for further analysis by AutoSNPa or IBDFinder; as a single image displaying the data as currently displayed (Figure 10); or as a web page containing one image for each chromosome. (An example of a web page derived from data exported by AgileVariantViewer, including only variants with a RS number, is here).

Comparison of autozygous regions identified by different methods

To serve as examples of the capability of AgileVariantMapper to identify autozygous regions, some comparisons with SNP microarray data are shown in Figure 11. Genotypes derived from an Affymetrix Axiom microarray dataset (left-hand lanes) were compared to data generated by AgileVariantMapper, either using variants identified ab initio by analysis of the exome read depth data, or by scoring the genotypes of known polymorphic positions. For each of the disease-causing genes AGK, FARS2 and MFF, Figure 11 shows the autozygous regions surrounding the deleterious sequence variant; the full genome-wide comparisons can be seen here, here and here.

Figure 11: Autozygous regions containing three different disease genes (panels A, B and C). These are located at Chr.7:141 Mb (AGK), Chr.6:5.5 Mb (FARS2) and Chr.2:228 Mb (MFF) respectively. In each panel, the images are derived from: lane 1, Axiom SNP genotypes; lane 2, variants identified by ab initio analysis of the read depths at each position in the exome data; lane 3, the genotypes of known polymorphic positions as determined using the exome data.

Autozygosity mapping with AgileVariantMapper

This example illustrates the combining of data from unrelated affected individuals with the same recessive disorder. Exome data from three unrelated individuals affected by 3M syndrome and found to harbour CCDC8 mutations (Chr.19, 46.9 Mb) were used to identify autozygous regions. The results for Chromosome 19 are shown in Figure 12, while the entire analysis including all the autosomes can be seen here.

Figure 12: A comparison of the autozygous regions identified on Chr.19 among patients homozygous for CCDC8 (~46.9 Mb) mutations. Panels A to C represent respectively: Affymetrix SNP 6.0 data, variants identified ab initio in the exome data, and genotypes at known polymorphic positions.