Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Keyconcepts• DenseSNPpanelsallowtheestimationoftheexpected
geneticcovariancebetweendistantrelatives(‘unrelateds’)• AmodelbaseduponestimatedrelationshipsfromSNPsis
equivalenttoamodelfittingallSNPssimultaneously• ThetotalgeneticvarianceduetoLDbetweencommonSNPs
and(unknown)causalvariantscanbeestimated• GeneticvariancecapturedbycommonSNPscanbeassigned
tochromosomesandchromosomesegments
2
1886
3
4
100yearslaterHeritabilityofhumanheight
h2 ~ 80%5
6
Disease Number of loci
Percent of Heritability Measure Explained
Heritability Measure
Age-related macular degeneration
5 50% Sibling recurrence risk
Crohn’s disease 32 20% Genetic risk (liability)
Systemic lupus erythematosus
6 15% Sibling recurrence risk
Type 2 diabetes 18 6% Sibling recurrence risk
HDL cholesterol 7 5.2% Phenotypic variance
Height 40 5% Phenotypic variance
Early onset myocardial infarction
9 2.8% Phenotypic variance
Fasting glucose 4 1.5% Phenotypic variance
WhereistheDarkMatter?
7
Hypothesistestingvs.Estimation
• GWAS=hypothesistesting– Stringentp-valuethreshold– Estimatesofeffectsbiased(“Winner’sCurse”)
• E(bhat|test(bhat)>T)>b{bfixed}• var(bhat)=var(b)+var(bhat|b){brandom}
• CanweestimatethetotalproportionofvariationaccountedforbyallSNPs?
8
Basicidea• Estimatesofadditivegeneticvariancefromknownpedigreeisunbiased– Ifmodeliscorrect– Despitevariationinidentitygiventhepedigree– PedigreegivescorrectexpectedIBD
• Unknownpedigree:estimategenome-wideIBDfrommarkerdata– Estimateadditivegeneticvariancegiventhisestimateofrelatedness
• Ideaisnotnew– (Evolutionary)geneticsliterature(Ritland,Lynch,Hill,others)
9
Closevsdistantrelatives
• Detectionofcloserelatives(fullsibs,parent-offspring,halfsibs)frommarkerdataisrelativelystraightforward
• Butcloserelativesmayshareenvironmentalfactors– Biasedestimatesofgeneticvariance
• Solution:useonly(very)distantrelatives
10
AmodelforasinglecausalvariantAA AB BB
frequency (1-p)2 2p(1-p) p2
x 0 1 2effect 0 b 2bz=[x-E(x)]/sx -2p/√{2p(1-p)} (1-p)/√{2p(1-p)} 2(1-p)/√{2p(1-p)}
yj = µ’ +xijbi +ej x=0,1,2{standardassociationmodel}
yj = µ +zijuj +ej u=bsx;µ =µ’+bsx
11
Multiple(m)causalvariants
yj = µ +Szijuj +ej
= µ +gj +ej
y = µ1 +g +e
= µ1 +Zu +e
12
Equivalence
Letubearandomvariable,u~N(0,su2)
Thensg2 =msu
2 and
var(y) =ZZ’su2 +Ise
2
=ZZ’(sg2/m)+Ise
2
=Gsg2 +Ise
2
13
Model with individual genome-wide additive values using relationships (G) at the causal variants is equivalent to a model fitting all causal variants
We can estimate genetic variance just as if we would do using pedigree relationships
Butwedon’thavethecausalvariantsIfweestimateG fromSNPs:
– loseinformationduetoimperfectLDbetweenSNPsandcausalvariants
– howmuchwelosedependson• densityofSNPs• allelefrequencyspectrumofSNPsvs.causalvariants
– estimateofvarianceàmissingheritability
14
LetA betheestimateofG fromNSNPs:
Ajk =(1/N)S {xij – 2pi)(xik – 2pi)/{2pi(1-pi)}
=(1/N)S zijzik
Data• ~4000 ‘unrelated’ individuals• Ancestry ~British Isles• Measurement on height (self-report or clinically measured)• GWAS on 300k (‘adults’) or 600k (16-year olds) SNPs
15
Lack of evidence for population stratification within the Australian sample
16
Methods• Estimate realised relationship matrix from
SNPs• Estimate additive genetic variance
22)var( eg ss IAVy +==
)1(2),cov(
)var()var(),cov(
ii
ikij
iikiij
iikiijijk pp
xxaxax
axaxA
-==
iii egy +=
Base population = current population
ïïî
ïïí
ì
=-
++-+
¹-
--
==
å
åå
kjpp
pxpxN
kjpp
pxpxN
AN
A
iii
iijiij
iii
iikiij
i ijkjk
,)1(22)21(11
,)1(2
)2)(2(11
22
17
Statistical analysis
22)var( eg ss IAVy +==
y standardised ~N(0,1)
No fixed effects other than mean
A estimated from SNPs
Residual maximum likelihood (REML)
18
h2 ~ 0.5 (SE 0.1)
19
Partitioning variation• If we can estimate the variance captured by
SNPs genome-wide, we should be able to partition it and attribute variance to regions of the genome
• “Population based linkage analysis”
20
Genome partitioning
21
• Partition additive genetic variance according to groups of SNPs– Chromosomes– Chromosome segments– MAF bins– Genic vs non-genic regions– Etc.
• Estimate genetic relationship matrix from SNP groups
• Analyse phenotypes by fitting multiple relationship matrices
• Linear model & REML (restricted maximum likelihood)
Application: the GENEVA Consortium• Data
– ~14,000 European Americans• ARIC• NHS• HPFS
– Affy 6.0 genotype data• ~600,000 after stringent QC
– Phenotypes on height, BMI, vWF and QT Interval
22
QC of SNPs
• 780,062 SNPs after QC steps listed in the table.
• Exclude 141,772 SNPs with MAF < 0.02 in European-ancestry group.
• Exclude 36,949 SNPs with missingness > 2% in all samples.
• Include autosomal SNPs only.
• End up with 577,778 SNPs.23
Results (genome-wide)
24
1
2
3
4
5
6
789
101112
13
14
15
16
17
1819
20
21
22
0
0.01
0.02
0.03
0.04
0.05
0 50 100 150 200 250
Varia
nceexplainedbyeachchromosom
e
Chromosomelength(Mb)
Slope=1.6×10-4
P =1.4×10-6
R2 =0.695
Height(combined)
12
34
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
2122
0
0.005
0.01
0.015
0.02
0.025
0 50 100 150 200 250
Varia
nceexplainedbyeachchromosom
eChromosomelength(Mb)
Slope=2.3×10-5
P =0.214R2 =0.076
BMI(combined)
25
Genome-partitioning: longer chromosomes explain more variation
1
234
5
6
7
8
9
10
11
12
13 14
15
16
17
1819
20
21 22
R²=0.511
0.000
0.002
0.004
0.006
0.008
0.010
0.00 0.01 0.02 0.03 0.04 0.05
Varia
nceexplainedbyGIANTheightSNPson
eachch
romosom
e
Varianceexplainedbyeachchromosome
Height (11,586 unrelated)
1
2
34
5
6
7
8
9
10
11
12
13
14
1516
17
18
19
20
2122
0.000
0.004
0.008
0.012
0.016
0.020
0.000 0.004 0.008 0.012 0.016 0.020
Varia
nceexplainedbychrom
osom
e(adjustedfortheFTO
SNP)
Varianceexplainedbychromosome(noadjustment)
BMI(11,586unrelated)
FTO
26
Results are consistent with reported GWAS
1
234
5
6
78
9
1011
12
13
14
151617
18
192021220.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
0.16
0 50 100 150 200 250
Varia
nceexplainedbyeachchromosom
e
Chromosomelength(Mb)
Slope=6.9×10-5
P =0.524R2 =0.021
vWF(ARIC)
27
Inference robust with respect to genetic architecture
1
2
3
4
5
6
7
8
9
1011
12
13
14
151617
18
192021
22
0.00
0.03
0.06
0.09
0.12
0.15
0.00 0.03 0.06 0.09 0.12 0.15
Varia
nceexplainedbychrom
osom
e(adjustedfortheABO
SNP)
Varianceexplainedbychromosome(noadjustment)
vWF(6,662unrelated)
ABO
0
0.01
0.02
0.03
0.04
0.05
0.06
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Varia
nceexplained
Chromosome
intergenic(± 20Kb)
genic(± 20Kb)
Height(combined)17,277proteincoding geneshGg
2 =0.328(s.e.=0.024)hGi
2 =0.126(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=2.1x10-10
0
0.005
0.01
0.015
0.02
0.025
0.03
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Varia
nceexplained
Chromosome
intergenic(± 20Kb)
genic(± 20Kb)
BMI(combined)17,277proteincoding geneshGg
2 =0.117(s.e.=0.023)hGi
2 =0.047(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=0.022
28
Genic regions explain variation disproportionately
Using imputed sequence data
• How much information is gained by using SNP array data imputed to a fully sequenced reference?
• How much is lost relative to whole genome sequencing?
29Yang et al. 2015 (Nature Genetics)
Accounting for LD and MAF spectrum allows unbiased estimation of genetic variance (GREML-LDMS)
30
0.0#
0.2#
0.4#
0.6#
0.8#
1.0#
1.2#
Random# More#common# Rarer# Rarer#&#DHS#
Herita
bility*es,m
ate*
GREML:SC#GREML:MS#LDAK#LDAK:MS#LDres#LDres:MS#GREML:LDMS#
Yang et al. 2015 (Nature Genetics)
0.0#
0.2#
0.4#
0.6#
0.8#
1.0#
Random# More#common# Rarer# Rarer#&#DHS#
Heritab
ility*es
,mate*
7MAF_4LD#
7MAF_3LD#
7MAF_2LD#
2MAF_2LD#
Very little difference in “taggability” between SNP chips
31
Genetic variation captured after imputation96% due to common variants73% due to rare variants
0.0#
0.2#
0.4#
0.6#
0.8#
1.0#
0# 0.1# 0.2# 0.3# 0.4# 0.5# 0.6# 0.7# 0.8# 0.9#
Prop
or%o
n'of'varia%o
n'captured
'
Imputa%on'R2'threshold'
Common#1#Affymetrix#6#
Common#1#Affymetrix#Axiom#
Common#1#Illumina#OmniExpress#
Common#1#Illumina#Omni2.5#
Common#1#Illumina#CoreExome#
Rare#1#Affymetrix#6#
Rare#1#Affymetrix#Axiom#
Rare#1#Illumina#OmniExpress#
Rare#1#Illumina#Omni2.5#
Rare#1#Illumina#CoreExome#
Yang et al. 2015 (Nature Genetics)
n = 45k data on height and BMI
32
Totals~60% for height~30% for BMI
0.00#
0.05#
0.10#
0.15#
0.20#
0.25#
<#0.1# 0.1#~#0.2# 0.2#~#0.3# 0.3#~#0.4# 0.4#~#0.5#
Varia
nce(explaine
d(
MAF(stra2fied(variant(group(
Height# BMI#
Yang et al. 2015 (Nature Genetics)
0.00#
0.02#
0.04#
0.06#
0.08#
0.10#
0.12#
0.14#
2.5e+5#~#0.001#
0.001#~#0.01#
0.01#~#0.1#
0.1#~#0.2#
0.2#~#0.3#
0.3#~#0.4#
0.4#~#0.5#
Varia
nce(explaine
d(
MAF(
1st#quar4le#(low#LD)#
2nd#quar4le#
3rd#quar4le#
4th#quar4le#(high#LD)#
0.00#
0.01#
0.02#
0.03#
0.04#
0.05#
0.06#
0.07#
2.5e,5#~#0.001#
0.001#~#0.01#
0.01#~#0.1#
0.1#~#0.2#
0.2#~#0.3#
0.3#~#0.4#
0.4#~#0.5#
Varia
nce(explaine
d(
MAF(
100%
80%
45%
16%
SlidebyRobertMaier 33
h2 overestimation?untaggedrarevariants?
better tagging of ungenotyped variants
samplesize/power
Partitioning variance of height
TotalvarianceHeritability (based on Twin or family studies)SNP heritability from imputation to sequenced referenceSNP-heritability (variance explained by all genotyped SNPs ontheChip)VarianceexplainedbygenomewidesignificantSNPs
missingheritability60%
Scaling revisitedu = bsx ~ N(0, su
2) implies
b2 proportional to su2/[2p(1-p)], so rare variants have larger allelic effect: natural
selection
If b2 = su2 then no relationship between frequency and effect size: neutral
model
In between: b2 = su2 [2p(1-p)]-s
Variance explained by SNP: 2p(1-p)su2[2p(1-p)]-s
= su2 [2p(1-p)]1-s
s = 0: common SNPs explain more variations = 1: all SNPs explain the same amount of variation
34
Multiple methods to estimate additive genetic variance
• Individual-level data– GREML– Haseman-Elston regression(yjyj) = µ + bAij
• Summary dataLDscore regression
• Consideration:– data availability– model assumptions– computation
35
Key concepts• Dense SNP panels allow the estimation of the expected
genetic covariance between distant relatives (‘unrelateds’)
• A model based upon estimated relationships from SNPs is equivalent to a model fitting all SNPs simultaneously
• The total genetic variance due to LD between common SNPs and (unknown) causal variants can be estimated
• Genetic variance captured by common SNPs can be assigned to chromosomes and chromosome segments
36