15
Lab 7. Estimating Population Structure

Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Embed Size (px)

Citation preview

Page 1: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Lab 7. Estimating Population Structure

Page 2: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Goals1. Estimate and interpret statistics

(AMOVA + Bayesian) that characterize population structure.

2. Demonstrate roles of gene flow and genetic drift on population structure.

Page 3: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Gene flow and Genetic driftGene flow maintains similar allele frequency in different subpopulations.

Genetic drift causes random differences in allele frequencies among small subpopulations.

Wright’s Island model: Assumes Gene flow occurs with equal probability from the continent (large source population) to each island (smaller subpopulations)

qm

q0

mm

m mm

q0 q0 q0

q0

Page 4: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Gene flow and Genetic driftAssuming equilibrium between gene flow (increasing variations) and genetic drift (reducing variation in finite population) and also assuming Wright’s island model, diversity among subpopulations(FST) can be calculated as :

2

2

)1)(12(2

)1(

mNN

mFST

If, m=0, FST =1; i.e. Strong genetic differentiation exists among subpopulations. If, m=1, FST =0; i.e. No genetic differentiation exists among subpopulations.

14

1

Nm

FST

Page 5: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

F Coefficients with different level of structure

F Formula Meaning

FITMeasure of deviation (MD) from HWE in total population.0 : No deviation from HWE in TP.Positive: Deviation due to deficiency of heterozygotes in TP. Negative: Deviation due to excess of heterozygotes in TP.

FSTMeasure of genetic differentiation among subpopulations.It is always positive.0 : No genetic differentiations among subpopulations.1 :Strong genetic differentiations among subpopulations.

FISMeasure of deviation from HWE within subpopulations.0 = No deviation from HWE within SP.Positive: Deviation due to deficiency of heterozygotes within SP. Negative: Deviation due to excess of heterozygotes within SP.

T

OTIT H

HHF

T

STST H

HHF

S

OSIS H

HHF

Page 6: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Parameter Formula Meaning

FSRMeasure of genetic differentiation among subpopulations within a region.0 : No genetic differentiation among subpopulations within a region.1 :Strong genetic differentiation among subpopulations within a region.

FRTMeasure of genetic differentiation among regions for the total population.0 : No genetic differentiation among regions in TP.1 :Strong genetic differentiation among regions in TP.

R

SRSR H

HHF

T

RTRT H

HHF

F Coefficients with diffent levels of structure

Page 7: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Estimation of F Coefficients using AMOVA

Parameter AMOVA (Arlequin)

FST φST or FST

FSR φSC or FSC

FRT φCT or FCT

Page 8: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

America

Africa

Eurasia

East Asia

Oceania

Population structure from worldwide human populationPopulation = subpopulation.Group = Regions

Page 9: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

-------------------------------------------------------------------------------------------------------------- Source of variations Percentage of variation-------------------------------------------------------------------------------------------------------------- Among groups(regions) 10

Among sub(populations) within a region 4

Within sub(populations) 86

Fixation Indices: FST : 0.14 FSC : 0.04 FCT : 0.10----------------------------------------------------------------------14 % of total genetic variation is due to differentiation among subpopulations.

86 % of total genetic variation is due to differentiation within subpopulations.

4 % of regional genetic variation is due to differentiation among subpopulations.

10 % of total genetic variation is due to differentiation among regions.

AMOVA result interpretations:

Page 10: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

10 87 4 13 24 25 25 2 37 50

Human structure data     Colombian Karitiana Maya Pima   SA NA

ID Population

46 Colombian 120 120 128 124 142 124 133 129

47 Colombian 120 120 128 124 146 124 129 129

48 Colombian 126 126 128 124 146 144 129 129

# of individuals

# of pops. # individuals in pops.

# of regions

# individuals in regions

Page 11: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Problem 1. File human_struc.xls contains data for 10 microsatellite loci used to genotype 41 human populations from a worldwide sample.

a.) Convert the file into Arlequin format and perform AMOVA based on this grouping of populations within regions using distance measures based on the IAM and the SMM. How do you interpret these results? Report values of the phi-statistics and their statistical significance for each AMOVA you run.

b.) Do you think that any of these regions can justifiably be divided into subregions? Pick a region, form a hypothesis for what would be a reasonable grouping of populations into subregions, then run AMOVA only for the region you selected using distance measures based on both the IAM and the SMM. Was your hypothesis supported by the data?

c.) How do the phi-statistics calculated from distance measures based on the SMM compare to those based on the IAM?

d.) GRADUATE STUDENTS: Which of the 5 initially defined regions has the highest diversity in terms of effective number of alleles? What is your biological explanation for this?

Page 12: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

How to choose K?

Page 13: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

K Log-likelihood

2 -1235

3 -1238

Picking the Best K

Page 14: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Picking the Best K

Page 15: Lab 7. Estimating Population Structure. Goals 1.Estimate and interpret statistics (AMOVA + Bayesian) that characterize population structure. 2.Demonstrate

Problem 2. Use Structure to further test the hypotheses you developed in Problem 1.

a.) Calculate the posterior probabilities to test whether:i. All populations form a single genetically homogeneous group.ii. There are two genetically distinct groups within your selected regioniii. There are three genetically distinct groups within your selected region.

b.) Use the ΔK method to determine the most likely number of groups. How does this compare to the method based on posterior probabilities?

c.) How do the groupings of subpopulations compare to your expectations from Problem 1?

d.) Is there evidence of admixture among the groups? If so, include a table or figure showing the proportion of each subpopulation assigned to each group.

e.) GRADUATE STUDENTS: Provide a brief, literature-based explanation for the groupings you observe.