Saturday, June 21, 2014

Genetic Differences from a Quantitative perspective

As I've been traveling and doing field collections, I have noticed notable physical differences within a species from one area to another. I suspect there are genetic differences, too. With approximately 700 miles between my current location and where I first started collecting, it is unlikely that geographically separate R. viscosum plants share recent ancestors, and likely that they have been reproductively isolated for a period of time. While we can't pinpoint exact shared ancestors, with the advent of genetic markers we can estimate the amount of variation at the DNA-sequence level.  This DNA sequence variation, detected as different allele sizes or the presence/absence of marker loci, can be divided into three parts: variation among regions (large geographic areas), variation among populations within regions (small geographic areas), and variation among individuals within populations.  The technical term for this partitioning is AMOVA, or analysis of molecular variance.  It follows the same principle as a traditional analysis of variance (ANOVA), a statistical method commonly used to test whether group means differ according to the procedure (treatment) applied.

You may recall the half-sib mating design I described in the previous post, where I hope to measure the mean performance of progeny from distinct maternal parents.  ANOVA works like this:

ANOVA for R. viscosum wild half-sib families, measured for mean rhizosphere acidification

Source of Variation                      Degrees of Freedom   Mean squares
Environment                              e-1
Repetitions per Environment              (r-1)e
R. viscosum HS families                  (n-1)                MS(HS families)
R. viscosum HS families x Environment    (n-1)(e-1)           MS(HS families x environment)
Error                                    (n-1)(r-1)e          MS(error)
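
To make the degrees-of-freedom column concrete, here is a minimal sketch that fills it in for one design. The values e=3, r=2, and n=10 are hypothetical and chosen only for illustration; they are not the numbers from my experiment.

```python
def anova_degrees_of_freedom(e, r, n):
    """Degrees of freedom for each source of variation in the table above.

    e = number of environments
    r = repetitions per environment
    n = number of half-sib families
    """
    return {
        "Environment": e - 1,
        "Repetitions per Environment": (r - 1) * e,
        "HS families": n - 1,
        "HS families x Environment": (n - 1) * (e - 1),
        "Error": (n - 1) * (r - 1) * e,
    }


# Hypothetical design: 3 environments, 2 repetitions each, 10 families.
df = anova_degrees_of_freedom(e=3, r=2, n=10)
for source, value in df.items():
    print(f"{source:30s} {value}")

# The five sources together account for all e*r*n - 1 degrees of freedom.
assert sum(df.values()) == 3 * 2 * 10 - 1
```

The final check is a handy way to catch a mistake when writing out a table like this for a new design.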

The sources of variation in an ANOVA, including error, are inherent to the experiment you set up.  In my case there are four unique sources of variation: the different media pH environments where the half-sib seedlings are grown, repetitions within the environments, the half-sib families themselves, and the interaction between half-sib families and the environments they are grown in.  Here we identify significant differences based on ratios of mean squares. A mean square is a sum of squares divided by its degrees of freedom: for the R. viscosum half-sib families, for example, we sum the squared deviations of the family means from the overall mean and divide by the relevant degrees of freedom (n-1) for that source of variation.  We then calculate an F-statistic to test the significance of each source of variation.  To determine if there is a significant genotypic effect for your trait of interest in this mating design, you take the ratio:

F = MS(HS families) / MS(HS families x environment)

This will give you an F-statistic for the effect of the half-sib families.  The larger this value, the stronger the evidence for a significant genotypic effect.
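
As a rough sketch of how that ratio is computed, the snippet below builds both mean squares from a small table of family-by-environment means. The numbers are made up for illustration (three families, two environments, two repetitions), the function name is my own, and the design is assumed to be balanced.

```python
def half_sib_f_statistic(cell_means, r):
    """Return MS(HS families), MS(HS families x environment), and their ratio F.

    cell_means[i][j] is the mean of family i in environment j, averaged over
    r repetitions. Assumes every family was grown in every environment.
    """
    n = len(cell_means)       # number of half-sib families
    e = len(cell_means[0])    # number of environments
    family_means = [sum(row) / e for row in cell_means]
    env_means = [sum(row[j] for row in cell_means) / n for j in range(e)]
    grand_mean = sum(family_means) / n

    # Sum of squares for families: deviations of family means from the grand mean.
    ss_family = e * r * sum((m - grand_mean) ** 2 for m in family_means)
    # Sum of squares for the interaction: what is left in each cell after
    # removing the family and environment main effects.
    ss_interaction = r * sum(
        (cell_means[i][j] - family_means[i] - env_means[j] + grand_mean) ** 2
        for i in range(n)
        for j in range(e)
    )

    ms_family = ss_family / (n - 1)
    ms_interaction = ss_interaction / ((n - 1) * (e - 1))
    return ms_family, ms_interaction, ms_family / ms_interaction


# Made-up acidification means: rows are families, columns are environments.
cell_means = [
    [4.0, 6.0],
    [5.0, 9.0],
    [9.0, 9.0],
]
ms_fam, ms_fxe, f_stat = half_sib_f_statistic(cell_means, r=2)
print(f"MS(HS families) = {ms_fam:.2f}")
print(f"MS(HS families x environment) = {ms_fxe:.2f}")
print(f"F = {f_stat:.2f}")
```

To judge significance, F would then be compared against an F distribution with (n-1) and (n-1)(e-1) degrees of freedom, which a statistics package will do for you.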

ANOVAs can be constructed to analyze group means for any experimental design; the derivations just become lengthier, and the same principle still applies.  AMOVA is harder to grasp because we aren't looking at a mean, as is most commonly done when we perform an ANOVA. Rather, we are analyzing the differences among alleles at marker loci from plants across a geographic area. Any marker technology can be analyzed through an AMOVA, as long as different alleles can be detected to give a reliable estimate of heterozygosity or homozygosity (the presence of more than one allele or a single allele at a locus, respectively).
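
To show what "analyzing differences" rather than means looks like, here is a minimal sketch that splits the variation in presence/absence marker data into within-population and among-population parts using pairwise distances between plants. It is simplified to two levels (no regions), the marker profiles are invented for illustration, and the function names are my own rather than those of any AMOVA software.

```python
from itertools import combinations


def squared_distance(a, b):
    """Number of marker loci at which two 0/1 profiles differ.

    For presence/absence data this equals the squared Euclidean distance.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ssd(profiles):
    """Sum of squared deviations of a group, computed from pairwise distances:
    the sum over all pairs of squared distances, divided by the group size."""
    n = len(profiles)
    return sum(squared_distance(a, b) for a, b in combinations(profiles, 2)) / n


def partition_ssd(populations):
    """Split the total SSD into within-population and among-population parts."""
    everyone = [plant for pop in populations.values() for plant in pop]
    total = ssd(everyone)
    within = sum(ssd(pop) for pop in populations.values())
    return {"total": total, "within": within, "among": total - within}


# Invented presence/absence scores at four marker loci, two plants per population.
populations = {
    "population_1": [[1, 1, 0, 0], [1, 1, 0, 1]],
    "population_2": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
result = partition_ssd(populations)
for level, value in result.items():
    print(f"{level:7s} SSD = {value:.2f}")
```

A full AMOVA goes on to divide these sums by their degrees of freedom and to add the regional level, but the bookkeeping follows the same pattern as the ANOVA table.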

AMOVA Table

Excoffier, L., Smouse, P. E., & Quattro, J. M. (1992). Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131(2), 479–491.
The results of an AMOVA table are more commonly reported as proportions of the total variation, summarized by ϕ-statistics. A proportion is a more appropriate way to interpret marker data, as we are most interested in where the variation lies within a species.  If the proportion of variation among regions is 0.04 (low) while the proportion within populations is 0.60 (high), this means that most of the genetic variation in our marker data is present within populations, with very little distinguishing regions from one another. This scenario is common among outbreeding, wind-pollinated species such as forest trees that have high levels of heterozygosity.
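
For a worked version of that scenario, the sketch below turns three variance components into proportions and ϕ-statistics, using the definitions from Excoffier et al. (1992) as I understand them. The variance components are hypothetical: they are picked so that the among-region and within-population proportions come out to 0.04 and 0.60, and the among-population value is my own filler.

```python
def amova_summary(var_among_regions, var_among_pops, var_within_pops):
    """Proportions of variation and phi-statistics from AMOVA variance components."""
    total = var_among_regions + var_among_pops + var_within_pops
    proportions = {
        "among regions": var_among_regions / total,
        "among populations within regions": var_among_pops / total,
        "within populations": var_within_pops / total,
    }
    phi = {
        # Regions relative to the total variation.
        "phi_CT": var_among_regions / total,
        # Populations relative to the variation found within their region.
        "phi_SC": var_among_pops / (var_among_pops + var_within_pops),
        # Populations (regardless of region) relative to the total variation.
        "phi_ST": (var_among_regions + var_among_pops) / total,
    }
    return proportions, phi


# Hypothetical variance components, for illustration only.
proportions, phi = amova_summary(
    var_among_regions=0.4,
    var_among_pops=3.6,
    var_within_pops=6.0,
)
for level, share in proportions.items():
    print(f"{level:35s} {share:.2f}")
for name, value in phi.items():
    print(f"{name} = {value:.3f}")
```

Note that the within-population share is simply one minus phi_ST, which is why a low phi_ST goes hand in hand with most of the variation sitting inside populations.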

AMOVA is the simplest way to understand genetic variation across geographic areas, but there are more complex methods.  The program STRUCTURE is notable for its ability to infer population groupings from the marker data themselves.  The output of a STRUCTURE analysis is presented below, where the program has grouped and sorted various human ethnic groups by their genetic similarity.  Each color represents one inferred genetic group, so the more colors observed within a population, the more mixed its ancestry and the greater the diversity within that population.  The picture also changes with the number of groups assumed (K), which is varied over repeated runs as part of a STRUCTURE analysis.

https://anthrogenetics.files.wordpress.com/2010/04/rosenberg-2002-structure.jpg


A nice thing about STRUCTURE, although it is notably more complex, is the graphics it can generate.  They are prettier than tables of numbers, such as those included above.
