Population Structure and Disease-Associations
02-223 How to Analyze Your Own Genome
Fall 2013
Population Structure and Genome-wide Association Analysis
• The mutation that gives the lactose persistence phenotype is more common in the Caucasian population than in the Asian population
• The allele for blonde hair color is also more common in the Caucasian population than in the Asian population
Population Structure and Genome-wide Association Analysis
• Population structure in data causes false positives in GWAS
– If samples in the case group are more related (come from the same population group), any SNPs more prevalent in the case population will be found significantly associated with the trait.
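The false-positive effect can be illustrated with a small worked example. This is a minimal sketch with hypothetical counts (not data from the lecture): within each of two populations, carrier status and case status are independent, but the populations differ in both allele frequency and case rate, so the pooled table shows a strong spurious association.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / margins

def p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts per population, in the order:
# (case carriers, case non-carriers, control carriers, control non-carriers).
# Population 1: allele common (80% carriers), trait common (70% cases).
pop1 = (560, 140, 240, 60)
# Population 2: allele rare (20% carriers), trait rare (30% cases).
pop2 = (60, 240, 140, 560)
# Ignoring population membership means simply adding the two tables.
pooled = tuple(x + y for x, y in zip(pop1, pop2))

chi2_pop1 = chi2_2x2(*pop1)      # no association within population 1
chi2_pop2 = chi2_2x2(*pop2)      # no association within population 2
chi2_pooled = chi2_2x2(*pooled)  # large statistic caused only by structure

print("within population 1:", chi2_pop1)
print("within population 2:", chi2_pop2)
print("pooled:", chi2_pooled, "p =", p_value_1df(chi2_pooled))
```

The SNP has no effect on the trait in either population, yet the pooled test rejects the null hypothesis decisively.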
Population Structure and Genome-wide Association Analysis
• What if we perform GWAS within each population group?
Half of the people have “aa” in both case and control groups
Half of the people have “AA” in both case and control groups
Accounting for Population Structure in Association Analysis
• Association mapping needs to account for population structure
• During the data collection process, one needs to design the study such that each population is represented in case/control groups in a balanced way
– In practice, this can be hard to control
– Cryptic population structure can still have an effect
Genomic Control (GC)
• Idea: Use the SNPs that are not associated with the trait to remove the effect of population stratification
• Genotype data consist of
– Candidate genes to be tested for associations
– L supplementary loci (null loci) for estimating the inflation factor λ
• GC uses the inflation factor λ to correct the association statistic of the SNP in the candidate gene
• Limitation: the inflation factor λ is assumed to be the same across the genome, ignoring population admixture
Devlin & Roeder, Biometrics 1999
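The correction step can be sketched in a few lines. The slides do not give the estimator, so this sketch assumes a commonly used form: λ is the median of the 1-df chi-square statistics at the null loci divided by the median of a chi-square(1) distribution (about 0.455), and each candidate statistic is divided by λ. The null statistics, the candidate statistic, and the rule of never letting λ fall below 1 are all illustrative assumptions.

```python
import math
import statistics

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.455

def p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def inflation_factor(null_stats):
    """Estimate lambda from chi-square statistics observed at the null loci."""
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: never use lambda < 1, which would inflate statistics.
    return max(lam, 1.0)

def gc_correct(stat, lam):
    """Deflate an association statistic by the inflation factor."""
    return stat / lam

# Hypothetical statistics at L = 5 null loci and at one candidate SNP.
null_stats = [0.2, 0.5, 0.91, 1.5, 3.0]
candidate_stat = 10.0

lam = inflation_factor(null_stats)
corrected_stat = gc_correct(candidate_stat, lam)

print("lambda:", lam)
print("p before GC:", p_value_1df(candidate_stat))
print("p after GC:", p_value_1df(corrected_stat))
```

Because a single λ rescales every statistic by the same amount, the ranking of SNPs is unchanged; only the significance threshold effectively moves, which is the limitation noted above.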
Genomic Control (GC)
P-value threshold before GC correction
P-value threshold after GC correction
Structured Association
• Idea: Within each subpopulation, an association between a genetic marker and the trait is a true association.
• Two-stage method
– Step 1:
• estimate the population structure by applying clustering algorithms on the genome data
• assign sampled individuals to population groups
– Step 2:
• Test for phenotype association within each population inferred in Step 1
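The two stages can be sketched on toy data. This is only an illustration under stated assumptions: a plain k-means (with a deterministic farthest-point start) stands in for the model-based clustering used in practice, the four "ancestry markers" and all counts are hypothetical, and the per-group test is a simple 2x2 chi-square.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def chi2_for(indices, carrier, is_case):
    """1-df chi-square statistic for carrier status vs. case status."""
    a = sum(1 for i in indices if is_case[i] and carrier[i])
    b = sum(1 for i in indices if is_case[i] and not carrier[i])
    c = sum(1 for i in indices if not is_case[i] and carrier[i])
    d = sum(1 for i in indices if not is_case[i] and not carrier[i])
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / margins

def make_population(markers, counts):
    """Expand 2x2 counts into per-individual (markers, carrier, is_case) records."""
    a, b, c, d = counts
    return ([(markers, True, True)] * a + [(markers, False, True)] * b +
            [(markers, True, False)] * c + [(markers, False, False)] * d)

# Hypothetical data: two populations with different marker genotypes,
# allele frequencies, and case rates; no association within either one.
people = (make_population((2, 2, 0, 0), (56, 14, 24, 6)) +
          make_population((0, 0, 2, 2), (6, 24, 14, 56)))
points = [p[0] for p in people]
carrier = [p[1] for p in people]
is_case = [p[2] for p in people]

# Step 1: infer population groups from the marker genotypes.
labels = kmeans(points, k=2)

# Step 2: test for association within each inferred group.
within = {g: chi2_for([i for i, lab in enumerate(labels) if lab == g], carrier, is_case)
          for g in sorted(set(labels))}
pooled_chi2 = chi2_for(list(range(len(people))), carrier, is_case)

print("pooled:", pooled_chi2)
print("within groups:", within)
```

The pooled statistic is large, while the statistic inside each inferred group is zero, which is the behavior the two-stage method relies on.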
Structured Association
• Cluster individuals into population groups and perform GWAS within each population group
Half of the people have “aa” in both case and control groups
Half of the people have “AA” in both case and control groups
Experiments: Lactose Persistence Phenotype
• Data : 1400 individuals from the control group of the WTCCC dataset, all of European descent. (The Wellcome Trust Case Control Consortium, Nature 2007)
• Genotype : 135.16-136.82Mb region on chromosome 2 (known to show geographical variation).
• Phenotype : Lactose persistence, fully determined by a particular mutation near the LCT gene (Enattah et al., 2002)
• Associated marker : SNP rs4988243 lies in a high linkage disequilibrium region (r² > 0.9) with this known genetic variant.
Experiments: Lactose Persistence
• Results from admixture clustering (Pritchard et al., Genetics 2000) of genotype data with four populations
• Given the results from Structure (genome composition, one column per individual in the figure below), individuals are grouped into four populations using the K-means algorithm
Experiments: Lactose Persistence
• Detecting the mutation that confers the lactose persistence phenotype on an individual
• Genomic control was not successful in detecting the true association SNP, in part because it ignores admixture
The correct SNP for lactose persistence phenotype
Genomic Control
Experiments: Lactose Persistence
• Detecting the mutation that confers the lactose persistence phenotype on an individual
• Once the population structure is discovered by Structure, sparse multivariate regression is run on each group separately
Lasso for structured association (for each subpopulation discovered by Structure)
The correct SNP for lactose persistence phenotype
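Running a sparse regression separately in each group can be sketched as follows. This is an illustrative stand-in, not the code used for the experiment: a small coordinate-descent lasso written from scratch, applied to hypothetical genotypes where SNP 0 drives the phenotype, SNP 1 only marks population membership, and SNP 2 is unrelated noise. The group labels are assumed to come from an earlier clustering step, and the penalty `lam=0.05` is an arbitrary choice.

```python
def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # SNP does not vary in this group; leave it at zero
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

def lasso_per_group(genotypes, phenotype, labels, lam):
    """Fit one lasso model inside each population group."""
    fits = {}
    n_snps = len(genotypes[0])
    for g in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        cols = [center([genotypes[i][j] for i in idx]) for j in range(n_snps)]
        X = [list(row) for row in zip(*cols)]
        y = center([phenotype[i] for i in idx])
        fits[g] = lasso_coordinate_descent(X, y, lam)
    return fits

def crossed(snp0_values, marker_value, snp2_values):
    """All combinations of SNP 0 and SNP 2 genotypes, with a fixed marker SNP 1."""
    return [(a, marker_value, b) for a in snp0_values for b in snp2_values]

# Hypothetical genotypes (0/1/2 allele counts) for two groups of 9 individuals.
genotypes = crossed((0, 1, 2), 2, (0, 1, 2)) + crossed((0, 0, 1), 0, (0, 1, 2))
labels = [0] * 9 + [1] * 9  # assumed output of the clustering step
phenotype = [1.0 if g[0] > 0 else 0.0 for g in genotypes]

fits = lasso_per_group(genotypes, phenotype, labels, lam=0.05)
for g, beta in fits.items():
    print("group", g, "coefficients:", beta)
```

Within each group the population-marker SNP is constant and therefore cannot be selected, so only the SNP that actually tracks the phenotype receives a nonzero coefficient.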
Summary
• Population structure and association study
– Alleles that are represented differently in different populations can appear falsely associated with the phenotype of interest
– It is important to detect the population structure in genomes and take this information into account in association analysis
• Statistical methods for correcting for population structure
– Genomic control
– Structured association