Population Genetic Structure and Race/Ethnicity EPI 217 Molecular and Genetic Epidemiology Winter 2013

Population Genetic Structure and Race/Ethnicity

EPI 217 Molecular and Genetic Epidemiology

Winter 2013

Outline

• How is genetic variation structured in the human population?

• What is the relationship between population genetic variation and race/ethnicity?

• Racially (ancestrally) admixed populations: African Americans and Latino Americans

• Interpreting racial/ethnic differences and confounding

Outline

• Admixture analysis

• Admixture mapping

• Pharmacogenetics – irinotecan and colon cancer

What is the evidence regarding genetic structure in the human

population?

Genetic Markers

• 1900-1980: Blood groups, serum proteins (N of about 40)

• 1980-1990: DNA Restriction Fragment Length Polymorphisms (RFLPs) (biallelic, N of thousands)

• 1990-2000: Short Tandem Repeat (STR) Markers (multiallelic, N of tens of thousands)

• 2000-2009: Single Nucleotide Polymorphisms (SNPs, N of millions)

Results from Population Genetics Studies – Tree Diagrams

• Based on comparisons of populations

• Genetic distances are calculated between populations based on allele frequencies at a collection of loci; the larger the allele frequency differences, the greater the distance

• Tree diagrams created based on these genetic distances
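
As a rough illustration of the distance step described above, the sketch below computes a simple allele-frequency distance between populations. The population names, the frequencies, and the choice of mean squared frequency difference as the distance are all illustrative assumptions; the studies cited on these slides may use other distance measures.

```python
from itertools import combinations

# Hypothetical allele frequencies at four biallelic loci (illustrative values only).
freqs = {
    "PopA": [0.10, 0.80, 0.35, 0.60],
    "PopB": [0.15, 0.75, 0.40, 0.55],
    "PopC": [0.70, 0.20, 0.90, 0.10],
}

def genetic_distance(p, q):
    """Mean squared allele-frequency difference across loci."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

# Distance for every pair of populations.
distances = {
    (x, y): genetic_distance(freqs[x], freqs[y])
    for x, y in combinations(freqs, 2)
}

for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 4))

# A tree-building method would typically join the closest pair first.
closest = min(distances, key=distances.get)
```

Larger frequency differences give larger distances, so PopA and PopB (which differ by 0.05 at every locus) end up much closer to each other than either is to PopC.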

Bowcock et al, Nature 1994

• 30 microsatellite loci

• 14 populations, 148 subjects:
– African: CAR pygmy, Zaire pygmy, Lisongo
– Caucasian: Northern European, Italians
– Oceania: Melanesian, New Guinean, Australian
– East Asia: Chinese, Japanese, Cambodian
– Americas: Maya, Surui, Karitiana

Calafell et al, Eur J Hum Genet, 1998

• 45 microsatellite loci

• 10 populations, 504 subjects:
– African: CAR pygmy, Zaire pygmy
– Caucasian: Dane, Druze
– Oceania: Melanesian (Nasioi)
– East Asia: Chinese, Japanese, Yakut
– Americas: Maya, Surui

Genetic Cluster Analysis

• Based on individuals

• Genetic marker data used to create clusters of individuals with similar genotype profiles

• Number of clusters must be specified

• Individuals can be assigned proportionate membership in different clusters

• Human Genome Diversity Panel

• 52 Indigenous Populations from 5 Continents: Africa, Americas, Asia, Europe, Oceania, total of 1,056 people

• 377 STR Markers

Noah Rosenberg et al, Science, 2002

Jun Li et al, Science, 2008

• Human Genome Diversity Panel, 938 individuals from 51 populations, 5 continents

• 650,000 SNP Markers

Principal Components Analysis

• Method that can be applied to dense genotype data

• Defines a small number of orthogonal variables (linear combinations of genotypes) that explain most of the variability in a sample

• Based on “genetic similarity” matrix of all pairs of individuals in the sample

• Can reveal genetic clusters based on degree of relatedness of individuals
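
The points above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the genotypes are simulated, the two groups and all parameter values are hypothetical, and SNPs are only mean-centered (published methods usually also scale each SNP by its allele frequency).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2 groups of 20 individuals, 200 SNPs coded 0/1/2.
n_per_group, n_snps = 20, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.3, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, (n_per_group, n_snps)),
    rng.binomial(2, freq_b, (n_per_group, n_snps)),
]).astype(float)

# Center each SNP, then build the individual-by-individual similarity matrix.
centered = geno - geno.mean(axis=0)
similarity = centered @ centered.T / n_snps

# Eigenvectors of the similarity matrix give each individual's PC coordinates.
eigvals, eigvecs = np.linalg.eigh(similarity)
order = np.argsort(eigvals)[::-1]
pc1 = eigvecs[:, order[0]]
explained = eigvals[order[0]] / eigvals.sum()

print("share of variance on PC1:", round(float(explained), 3))
print("group means on PC1:", pc1[:n_per_group].mean(), pc1[n_per_group:].mean())
```

Each individual gets one coordinate per component, and individuals with similar genotype profiles fall close together on the leading components.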

Genes mirror geography within Europe

John Novembre, Toby Johnson, Katarzyna Bryc, Zoltán Kutalik, Adam R. Boyko, Adam Auton, Amit Indap, Karen S. King, Sven Bergmann, Matthew R. Nelson, Matthew Stephens & Carlos D. Bustamante

Nature, 2008

J Novembre et al. Nature 456, 98-101 (2008) doi:10.1038/nature07331

Population structure within Europe.

Out of Africa Theory of Human Evolution

Conclusion

• The primary source of genetic structure in the human population is based on continent of origin

• There are five major groupings – Africa, the Americas, East Asia, Europe/West Asia, and Oceania

• These groupings are not perfectly discrete, but neither is genetic variation continuous

• An additional, less prominent level of structure exists between national/ethnic groups within the major continental groupings

What about non-Indigenous Populations?

• These studies address questions of ancient human evolution, but not recent events.

• For example, they are not representative of current Western Hemisphere populations, as in the U.S., Central and South America

How are race, ethnicity and ancestry defined?

• Various definitions

• Race often defined in terms of geographic ancestry

• Ancestry defined in terms of country or nationality of origin

• Ethnicity defined in terms of shared socio-political-religious affiliation

• All are inter-related

How are race, ethnicity and ancestry defined?

• U.S. Census Race Categories:
– African/African American/Black
– European/White
– Asian
– Pacific Islander
– Native American/Alaskan Native
– 2 or More Races

• U.S. Census Ethnicity Category:
– Hispanic/Latino

Demography of U.S. in 2000

Group              Number (thousands)   % of Total
White              194,553              69.1
Black              33,948               12.1
Asian              10,123               3.6
Native American    2,069                0.7
Pacific Islander   354                  0.1
Hispanic           35,306               12.5
Other              5,070                1.8

What is the evidence regarding genetic structure and race?

• How much correlation is there between self-identified race/ethnicity (SIRE) (for example, using the categories above) and genetic structure in the population?

Family Blood Pressure Program (FBPP)

• Study of genetic and environmental determinants of hypertension in families

• Four networks, 15 field centers (collection sites), four major race/ethnicity groups: Caucasian (CAU), African American (AFR), East Asian (Chinese, Japanese) (EAS), Hispanic (Mexican American) (HIS)

FBPP Subjects

• Total of 3,636 individuals included (one per family)

• CAU 1349, 6 sites

• AFR 1308, 4 sites

• HIS 412, 1 site

• EAS 567 (407 CHI, 160 JAP), 5 sites

• 18 SIRE-site combinations total

FBPP Genetic Markers

• Genome Screen STR markers, all typed at the NHLBI sponsored Mammalian Genotyping Service, Marshfield, WI

• Total number of markers included = 366.

• Genetic distance analysis among SIRE-site groups

• Genetic cluster analysis

Multidimensional Scaling

• Based on distance matrix calculated by allele frequency differences between population groups.

• Provides a 2-dimensional picture of distance relationships among the populations
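
A minimal sketch of how a distance matrix becomes a 2-dimensional picture, using classical (Torgerson) multidimensional scaling. The group labels and distances are hypothetical, and the slides do not say which MDS variant was used in the FBPP analysis, so treat the method choice as an assumption.

```python
import numpy as np

# Hypothetical pairwise genetic distances among four population groups.
labels = ["G1", "G2", "G3", "G4"]
D = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
])

# Classical MDS: double-center the squared distances ...
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# ... then use the top two eigenvectors, scaled by sqrt(eigenvalue), as coordinates.
eigvals, eigvecs = np.linalg.eigh(B)
top = np.argsort(eigvals)[::-1][:2]
coords = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))

for name, (x, y) in zip(labels, coords):
    print(name, round(float(x), 3), round(float(y), 3))
```

Plotting `coords` places groups with small genetic distances near each other; in this toy input G1 and G2 land together, far from G3 and G4.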

Genetic Cluster Analysis: 4 Clusters

Group   Cluster A   Cluster B   Cluster C   Cluster D
CAU     1348        0           0           1
AFR     3           0           1305        0
HIS     1           0           0           411
CHI     0           407         0           0
JAP     0           160         0           0

Page 32: Population Genetic Structure and Race/Ethnicity EPI 217 Molecular and Genetic Epidemiology Winter 2013

GCA Classification versus SIRE

• Concordant: 3,631

• Discordant: 5

• Discordance Rate: .0014

• Conclusion: Very high correspondence between race/ethnicity groupings and genetic clusters
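
The concordance figures above follow directly from the 4-cluster table on the preceding slide. The short check below reproduces them; the only assumption is that each group's most common cluster is treated as its matching cluster.

```python
# Cluster counts from the 4-cluster table (rows: SIRE group; columns: clusters A-D).
counts = {
    "CAU": [1348, 0, 0, 1],
    "AFR": [3, 0, 1305, 0],
    "HIS": [1, 0, 0, 411],
    "CHI": [0, 407, 0, 0],
    "JAP": [0, 160, 0, 0],
}

total = sum(sum(row) for row in counts.values())
# Individuals falling in their group's most common cluster count as concordant.
concordant = sum(max(row) for row in counts.values())
discordant = total - concordant
rate = discordant / total

print(total, concordant, discordant, round(rate, 4))  # 3636 3631 5 0.0014
```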

Analysis of Group Differences

• For the major race/ethnicity groups, SIRE and GCA give nearly identical results with enough genetic markers

• Important environmental/social/cultural differences also exist between SIRE groups

• Therefore, race/ethnicity represents both social and genetic factors.

Analysis of Group Differences

• High correlation between SIRE and GCA leads to strong confounding between genetic and non-genetic factors when examining group differences in prevalence of diseases or traits.

• Therefore, no inferences can be made about etiology of group differences from the observed differences alone.

Example – Kistka et al, Am. J. Obstetrics and Gynecology, 2007, “Racial disparity in the frequency of recurrence of preterm birth”:

“In this report, we further analyzed the pattern of recurrent preterm birth stratified by race and found that the tendency to repeat preterm birth during the same week occurs for both whites and blacks, but the median age for preterm birth is shifted 2 weeks earlier in blacks. These findings together highlight the importance of race, particularly after correction for other risk factors, and suggest a probable genetic component that may underlie the public health problem presented by the racial disparity in preterm birth.”

Lack of evidence of explanatory factors does not imply a genetic cause for group difference

Genetic explanations need to be direct, not indirect

Genetic explanations should not be the default position

Genetic Admixture

• Even though the four ethnic groups were easily separable based on genetic markers, African Americans and Latino Americans typically have ancestry from multiple continents. Using the same genetic markers, it is possible to estimate for each individual the proportions of ancestry, or individual ancestry (IA) from each continental/ancestral group.

Admixture Estimates - FBPP

• Estimation of ancestry requires genotypes of individuals representing the original indigenous ancestors. These analyses included 1,378 unrelated Caucasians from the FBPP, 127 unrelated sub-Saharan Africans and 50 Native Americans from the World Diversity Panel.
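
To make the role of the reference genotypes concrete, here is a minimal sketch of estimating one person's individual ancestry (IA) by maximum likelihood. It assumes only two ancestral populations, known reference allele frequencies, and independent loci; every number is hypothetical, and the FBPP analysis itself may have used a different estimator.

```python
import math

# Hypothetical reference allele frequencies in two ancestral populations.
freq_pop1 = [0.90, 0.10, 0.80, 0.20, 0.85, 0.15]
freq_pop2 = [0.10, 0.90, 0.20, 0.80, 0.15, 0.85]
# Hypothetical genotypes of one admixed individual (copies of the reference allele).
genotype = [2, 0, 1, 1, 1, 0]

def log_likelihood(ia, genotype, f1, f2):
    """Binomial log-likelihood of the genotypes given proportion ia of pop1 ancestry."""
    total = 0.0
    for g, p1, p2 in zip(genotype, f1, f2):
        p = ia * p1 + (1 - ia) * p2  # expected allele frequency for this person
        total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

# Grid search over candidate ancestry proportions 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
ia_hat = max(grid, key=lambda ia: log_likelihood(ia, genotype, freq_pop1, freq_pop2))
print("estimated IA from population 1:", ia_hat)
```

Markers with large frequency differences between the ancestral populations carry the most information, which is why the reference samples matter.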

African Ancestry in African Americans

Ancestry in Mexican Americans from Starr County, Texas

Admixture Analysis

• Whether group differences have genetic or non-genetic sources can be examined within a single admixed population.

• Depends on variation in admixture levels within that population

• Examine correlation of individual ancestry (IA) with trait of interest (e.g. does blood pressure correlate with African ancestry?)
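
The correlation described above is usually summarized as a regression slope of the trait on IA. The sketch below fits that slope by ordinary least squares on made-up data; the IA and blood pressure values are hypothetical and no covariates are included, which a real analysis would need.

```python
# Hypothetical data: African individual ancestry (IA) and systolic blood pressure (mmHg).
ia = [0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
sbp = [118.0, 121.0, 119.0, 124.0, 122.0, 127.0, 125.0]

n = len(ia)
mean_x = sum(ia) / n
mean_y = sum(sbp) / n

# Ordinary least squares: slope b(IA) = cov(IA, trait) / var(IA).
sxx = sum((x - mean_x) ** 2 for x in ia)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ia, sbp))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the slope from the residual variance.
residuals = [y - (intercept + slope * x) for x, y in zip(ia, sbp)]
se = (sum(r ** 2 for r in residuals) / (n - 2) / sxx) ** 0.5

print(f"b(IA) = {slope:.1f} (SE {se:.1f})")
```

Because IA runs from 0 to 1 in this sketch, the slope is the expected trait difference between 0% and 100% ancestry from the population in question.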

Admixture Analysis - FBPP

• 3,207 African Americans

• 1,506 Mexican Americans

• Estimated IA and its correlation with blood pressure, hypertension, and BMI

Linear Regression on African IA in African Americans

Outcome   b(IA)
SBP       5.4 (4.5)
DBP       3.0 (3.1)
MAP       6.2 (3.3)
BMI       4.0 (2.0)*

Regression in Mexican Americans on African and Caucasian IA

Outcome   b(IA) African   b(IA) Caucasian
SBP       9.5 (21.6)      -8.9 (5.8)
DBP       18.9 (10.0)*    -1.0 (2.6)
MAP       15.6 (12.6)     -3.9 (3.3)
BMI       3.9 (6.0)       4.3 (1.7)*

Admixture Analysis

• Caveat: Still possibly subject to residual correlation and confounding

• For example, within African Americans, discrimination may be related to both skin pigment and adverse health outcomes

• Skin pigment is likely to be genetically correlated with degree of European versus African ancestry

Admixture Mapping

• Unlike ancestry estimates based on the entire genome, which may be confounded with non-genetic factors, ancestry at a specific genetic location is less likely to be so confounded

• The power of the method depends on how large the effect of an allele is on the trait, and the difference in the frequency of that allele between ancestral groups

Admixture Mapping

• If the admixture occurred recently in history (e.g. over the past 10 generations), then the ancestry excess will extend over large segments of the chromosome

• Thus, markers in the vicinity of the trait locus will also show excess ancestry from the population with the higher allele frequency
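
The idea of scanning for excess ancestry can be sketched as a case-only comparison of locus-specific ancestry against the genome-wide average. Everything below is a simplified illustration: the local-ancestry calls are hypothetical, and the z-score across loci stands in for the formal test statistics used in published admixture mapping studies.

```python
from statistics import mean, pstdev

# Hypothetical local-ancestry calls for 6 cases at 8 loci along the genome:
# number of chromosome copies (0, 1, or 2) inherited from ancestral population A.
local_ancestry = [
    [1, 2, 1, 2, 1, 2, 1, 1],
    [2, 1, 1, 2, 2, 1, 1, 2],
    [1, 1, 2, 2, 1, 1, 2, 1],
    [2, 2, 1, 2, 1, 2, 1, 1],
    [1, 1, 1, 2, 2, 1, 2, 2],
    [1, 2, 2, 2, 1, 1, 1, 1],
]

n_loci = len(local_ancestry[0])
# Average proportion of population-A ancestry among cases at each locus.
locus_means = [mean(person[j] for person in local_ancestry) / 2 for j in range(n_loci)]
genome_mean = mean(locus_means)
spread = pstdev(locus_means)

# Standardized ancestry excess per locus; a large value flags a candidate region.
z_scores = [(m - genome_mean) / spread for m in locus_means]
peak = max(range(n_loci), key=lambda j: z_scores[j])
print("peak locus index:", peak, "z =", round(z_scores[peak], 2))
```

In recently admixed populations neighboring loci would show a similar excess, since ancestry blocks span large chromosomal segments.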

Example

• Kopp et al, Nature Genetics 40, 1175 - 1184 (2008)

• MYH9 is a major-effect risk gene for focal segmental glomerulosclerosis

“We identified a chromosome 22 region with a genome-wide logarithm of the odds (lod) score of 9.2 and a peak lod of 12.4 centered on MYH9, a functional candidate gene expressed in kidney podocytes. Multiple MYH9 SNPs and haplotypes were recessively associated with FSGS (OR = 5.0, 95% CI = 3.5–7.1). This association extended to hypertensive ESKD (OR = 2.2, 95% CI = 1.5–3.4), but not type 2 diabetic ESKD. Genetic variation at the MYH9 locus substantially explains the increased burden of FSGS and hypertensive ESKD among African Americans.”

HOWEVER

Association of Trypanolytic ApoL1 Variants with Kidney Disease in African Americans

Genovese et al, Science, July 2010

Fig. 1 Association analysis in FSGS cohorts with logistic regression for alleles G1 and G2.

G Genovese et al. Science 2010;329:841-845

Here, we show that, in African Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 {FSGS odds ratio = 10.5 [95% confidence interval (CI) 6.0 to 18.4]; H-ESKD odds ratio = 7.3 (95% CI 5.6 to 9.5)}. The two APOL1 variants are common in African chromosomes but absent from European chromosomes, and both reside within haplotypes that harbor signatures of positive selection. ApoL1 (apolipoprotein L-1) is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease–associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African Americans.

Individual Ancestry and High Density SNP chips

• A chromosome from an individual of recent admixed ancestry constitutes a patchwork of chromosomal segments of distinct ancestry (ancestry blocks).

• Dense SNP chips (e.g. 500K random SNPs) can be used to accurately recreate ancestry blocks. Therefore, GWA chips can be used for both direct association and admixture mapping.

[Figure panels: truth, MHMM, HMM]

Ancestry of African Americans

• Zakharia et al, Genome Biology, 2009

• Approximately 450,000 random SNP markers

• Admixture estimation based on 7 indigenous African populations and Europeans, as well as African Americans

• Principal components analysis

Correlation between European IA in African Americans and PC1

Ancestry Correlation in Latino Spouses (Genome Biology, 2009)

Based on Ancestry Informative Markers (AIMs) in Mexican spouse pairs from Mexico City and SF Bay Area, and Puerto Rican spouse pairs from Puerto Rico and New York City.

Consequences of ancestry-related assortative mating

• Linkage disequilibrium (allelic associations) between unlinked loci (different chromosomes) persists at a high rate in these populations at loci which have large ancestral allele frequency differences.

• Therefore, ancestry is the most important covariate in any genomic association study in these populations

Population Specificity of Genetic Variation

• J.A. Schneider et al, 2003

• Re-sequenced 2,036 genes (about 5 Mb) from four ethnic groups: 20 African Americans, 20 Asians, 21 Caucasians, 18 Hispanics

Population Specificity of Alleles

• The distribution of alleles across populations depends on the allele frequency.

• Most (but not all) alleles with frequency greater than .30 are pan ethnic

• Most alleles with frequency less than .10 are race-specific

• Most alleles with frequency less than .02 are ethnicity specific.

Global Distribution of CCR5 Δ32 mutation

• Martinson et al, Nature Genetics, 1997

• Δ32 mutation confers resistance to HIV infection

Global Distribution of CCR5 Δ32 mutation

Region        Frequency
Europe        8.7%
Middle East   3.1%
South Asia    1.6%
East Asia     0.0%
Africa        0.0%
Oceania       0.0%
Americas      0.0%

Irinotecan and Colon Cancer

• Extreme side effects in some patients
– Severe diarrhea, neutropenia
– Recommended reduced starting dosage

• Metabolized by uridine diphosphate glucuronosyltransferase isoform 1A1 (UGT1A1)

• Homozygotes/compound heterozygotes for deficiency alleles at greatly increased risk for side effects

Frequency of UGT1A1 Deficiency Genotypes by Ethnic Group

Genotype          Blacks   Whites   Asians   Pacific Islanders
*28/*28           20%      15%      1%       <0.1%
*6/*6 + *6/*28    <0.1%    <0.1%    5.5%     ?

Relevance

• If one were to characterize genetic variation in this enzyme in Europeans only, the majority of sensitive Asians would be missed entirely.
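
A quick worked calculation shows why "the majority" is justified, using the Asian column of the genotype table on the previous slide. It assumes the two genotype classes do not overlap and that a screen developed only in Europeans would capture *28 but not *6, since *6 genotypes are below 0.1% in Whites in that table.

```python
# Deficiency-genotype frequencies among Asians, taken from the genotype table.
freq_28_homozygous = 0.01   # *28/*28
freq_6_genotypes = 0.055    # *6/*6 plus *6/*28

sensitive_total = freq_28_homozygous + freq_6_genotypes
# Share of sensitive patients whose genotype involves *6 and would go undetected.
fraction_missed = freq_6_genotypes / sensitive_total
print(f"{fraction_missed:.0%} of sensitive Asian patients missed")  # 85%
```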

Conclusions

• Genetic structure is greatest at the level of continental ancestry (correlated with race); a lower level of structure also exists at the level of ethnicity/nationality

• Common genetic variants are typically pan ethnic. Low frequency variants are typically race/ethnicity specific

• Low frequency variants are far more abundant than high frequency ones

Conclusions

• Race/ethnicity differences in disease prevalence are highly confounded between genes and environment

• A group difference alone does not indicate a source for that difference

Conclusions

• Admixed populations (African Americans and Latino Americans) can be used to examine racial differences, but confounding may remain

• Locus specific ancestry results are likely to be less confounded

Conclusions

• Gene-environment interactions are likely to be prominent in examining population differences

• Pharmacogenetics (e.g. irinotecan) provides important examples of race/ethnicity specific genetic variation in clinical practice.