Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of...

Preview:

Citation preview

Estimationofquantitativegeneticparametersfromdistant

relativesusingmarkerdata

PeterM.Visscherpeter.visscher@uq.edu.au

1

Keyconcepts• DenseSNPpanelsallowtheestimationoftheexpected

geneticcovariancebetweendistantrelatives(‘unrelateds’)• AmodelbaseduponestimatedrelationshipsfromSNPsis

equivalenttoamodelfittingallSNPssimultaneously• ThetotalgeneticvarianceduetoLDbetweencommonSNPs

and(unknown)causalvariantscanbeestimated• GeneticvariancecapturedbycommonSNPscanbeassigned

tochromosomesandchromosomesegments

2

1886

3

4

100yearslaterHeritabilityofhumanheight

h2 ~ 80%5

6

Disease Number of loci

Percent of Heritability Measure Explained

Heritability Measure

Age-related macular degeneration

5 50% Sibling recurrence risk

Crohn’s disease 32 20% Genetic risk (liability)

Systemic lupus erythematosus

6 15% Sibling recurrence risk

Type 2 diabetes 18 6% Sibling recurrence risk

HDL cholesterol 7 5.2% Phenotypic variance

Height 40 5% Phenotypic variance

Early onset myocardial infarction

9 2.8% Phenotypic variance

Fasting glucose 4 1.5% Phenotypic variance

WhereistheDarkMatter?

7

Hypothesistestingvs.Estimation

• GWAS=hypothesistesting– Stringentp-valuethreshold– Estimatesofeffectsbiased(“Winner’sCurse”)

• E(bhat|test(bhat)>T)>b{bfixed}• var(bhat)=var(b)+var(bhat|b){brandom}

• CanweestimatethetotalproportionofvariationaccountedforbyallSNPs?

8

Basicidea• Estimatesofadditivegeneticvariancefromknownpedigreeisunbiased– Ifmodeliscorrect– Despitevariationinidentitygiventhepedigree– PedigreegivescorrectexpectedIBD

• Unknownpedigree:estimategenome-wideIBDfrommarkerdata– Estimateadditivegeneticvariancegiventhisestimateofrelatedness

• Ideaisnotnew– (Evolutionary)geneticsliterature(Ritland,Lynch,Hill,others)

9

Closevsdistantrelatives

• Detectionofcloserelatives(fullsibs,parent-offspring,halfsibs)frommarkerdataisrelativelystraightforward

• Butcloserelativesmayshareenvironmentalfactors– Biasedestimatesofgeneticvariance

• Solution:useonly(very)distantrelatives

10

AmodelforasinglecausalvariantAA AB BB

frequency (1-p)2 2p(1-p) p2

x 0 1 2effect 0 b 2bz=[x-E(x)]/sx -2p/√{2p(1-p)} (1-p)/√{2p(1-p)} 2(1-p)/√{2p(1-p)}

yj = µ’ +xijbi +ej x=0,1,2{standardassociationmodel}

yj = µ +zijuj +ej u=bsx;µ =µ’+bsx

11

Multiple(m)causalvariants

yj = µ +Szijuj +ej

= µ +gj +ej

y = µ1 +g +e

= µ1 +Zu +e

12

Equivalence

Letubearandomvariable,u~N(0,su2)

Thensg2 =msu

2 and

var(y) =ZZ’su2 +Ise

2

=ZZ’(sg2/m)+Ise

2

=Gsg2 +Ise

2

13

Model with individual genome-wide additive values using relationships (G) at the causal variants is equivalent to a model fitting all causal variants

We can estimate genetic variance just as if we would do using pedigree relationships

Butwedon’thavethecausalvariantsIfweestimateG fromSNPs:

– loseinformationduetoimperfectLDbetweenSNPsandcausalvariants

– howmuchwelosedependson• densityofSNPs• allelefrequencyspectrumofSNPsvs.causalvariants

– estimateofvarianceàmissingheritability

14

LetA betheestimateofG fromNSNPs:

Ajk =(1/N)S {xij – 2pi)(xik – 2pi)/{2pi(1-pi)}

=(1/N)S zijzik

Data• ~4000 ‘unrelated’ individuals• Ancestry ~British Isles• Measurement on height (self-report or clinically measured)• GWAS on 300k (‘adults’) or 600k (16-year olds) SNPs

15

Lack of evidence for population stratification within the Australian sample

16

Methods• Estimate realised relationship matrix from

SNPs• Estimate additive genetic variance

22)var( eg ss IAVy +==

)1(2),cov(

)var()var(),cov(

ii

ikij

iikiij

iikiijijk pp

xxaxax

axaxA

-==

iii egy +=

Base population = current population

ïïî

ïïí

ì

=-

++-+

¹-

--

==

å

åå

kjpp

pxpxN

kjpp

pxpxN

AN

A

iii

iijiij

iii

iikiij

i ijkjk

,)1(22)21(11

,)1(2

)2)(2(11

22

17

Statistical analysis

22)var( eg ss IAVy +==

y standardised ~N(0,1)

No fixed effects other than mean

A estimated from SNPs

Residual maximum likelihood (REML)

18

h2 ~ 0.5 (SE 0.1)

19

Partitioning variation• If we can estimate the variance captured by

SNPs genome-wide, we should be able to partition it and attribute variance to regions of the genome

• “Population based linkage analysis”

20

Genome partitioning

21

• Partition additive genetic variance according to groups of SNPs– Chromosomes– Chromosome segments– MAF bins– Genic vs non-genic regions– Etc.

• Estimate genetic relationship matrix from SNP groups

• Analyse phenotypes by fitting multiple relationship matrices

• Linear model & REML (restricted maximum likelihood)

Application: the GENEVA Consortium• Data

– ~14,000 European Americans• ARIC• NHS• HPFS

– Affy 6.0 genotype data• ~600,000 after stringent QC

– Phenotypes on height, BMI, vWF and QT Interval

22

QC of SNPs

• 780,062 SNPs after QC steps listed in the table.

• Exclude 141,772 SNPs with MAF < 0.02 in European-ancestry group.

• Exclude 36,949 SNPs with missingness > 2% in all samples.

• Include autosomal SNPs only.

• End up with 577,778 SNPs.23

Results (genome-wide)

24

1

2

3

4

5

6

789

101112

13

14

15

16

17

1819

20

21

22

0

0.01

0.02

0.03

0.04

0.05

0 50 100 150 200 250

Varia

nceexplainedbyeachchromosom

e

Chromosomelength(Mb)

Slope=1.6×10-4

P =1.4×10-6

R2 =0.695

Height(combined)

12

34

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

2122

0

0.005

0.01

0.015

0.02

0.025

0 50 100 150 200 250

Varia

nceexplainedbyeachchromosom

eChromosomelength(Mb)

Slope=2.3×10-5

P =0.214R2 =0.076

BMI(combined)

25

Genome-partitioning: longer chromosomes explain more variation

1

234

5

6

7

8

9

10

11

12

13 14

15

16

17

1819

20

21 22

R²=0.511

0.000

0.002

0.004

0.006

0.008

0.010

0.00 0.01 0.02 0.03 0.04 0.05

Varia

nceexplainedbyGIANTheightSNPson

eachch

romosom

e

Varianceexplainedbyeachchromosome

Height (11,586 unrelated)

1

2

34

5

6

7

8

9

10

11

12

13

14

1516

17

18

19

20

2122

0.000

0.004

0.008

0.012

0.016

0.020

0.000 0.004 0.008 0.012 0.016 0.020

Varia

nceexplainedbychrom

osom

e(adjustedfortheFTO

SNP)

Varianceexplainedbychromosome(noadjustment)

BMI(11,586unrelated)

FTO

26

Results are consistent with reported GWAS

1

234

5

6

78

9

1011

12

13

14

151617

18

192021220.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0 50 100 150 200 250

Varia

nceexplainedbyeachchromosom

e

Chromosomelength(Mb)

Slope=6.9×10-5

P =0.524R2 =0.021

vWF(ARIC)

27

Inference robust with respect to genetic architecture

1

2

3

4

5

6

7

8

9

1011

12

13

14

151617

18

192021

22

0.00

0.03

0.06

0.09

0.12

0.15

0.00 0.03 0.06 0.09 0.12 0.15

Varia

nceexplainedbychrom

osom

e(adjustedfortheABO

SNP)

Varianceexplainedbychromosome(noadjustment)

vWF(6,662unrelated)

ABO

0

0.01

0.02

0.03

0.04

0.05

0.06

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varia

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

Height(combined)17,277proteincoding geneshGg

2 =0.328(s.e.=0.024)hGi

2 =0.126(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=2.1x10-10

0

0.005

0.01

0.015

0.02

0.025

0.03

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varia

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

BMI(combined)17,277proteincoding geneshGg

2 =0.117(s.e.=0.023)hGi

2 =0.047(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=0.022

28

Genic regions explain variation disproportionately

Using imputed sequence data

• How much information is gained by using SNP array data imputed to a fully sequenced reference?

• How much is lost relative to whole genome sequencing?

29Yang et al. 2015 (Nature Genetics)

Accounting for LD and MAF spectrum allows unbiased estimation of genetic variance (GREML-LDMS)

30

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

1.2#

Random# More#common# Rarer# Rarer#&#DHS#

Herita

bility*es,m

ate*

GREML:SC#GREML:MS#LDAK#LDAK:MS#LDres#LDres:MS#GREML:LDMS#

Yang et al. 2015 (Nature Genetics)

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

Random# More#common# Rarer# Rarer#&#DHS#

Heritab

ility*es

,mate*

7MAF_4LD#

7MAF_3LD#

7MAF_2LD#

2MAF_2LD#

Very little difference in “taggability” between SNP chips

31

Genetic variation captured after imputation96% due to common variants73% due to rare variants

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

0# 0.1# 0.2# 0.3# 0.4# 0.5# 0.6# 0.7# 0.8# 0.9#

Prop

or%o

n'of'varia%o

n'captured

'

Imputa%on'R2'threshold'

Common#1#Affymetrix#6#

Common#1#Affymetrix#Axiom#

Common#1#Illumina#OmniExpress#

Common#1#Illumina#Omni2.5#

Common#1#Illumina#CoreExome#

Rare#1#Affymetrix#6#

Rare#1#Affymetrix#Axiom#

Rare#1#Illumina#OmniExpress#

Rare#1#Illumina#Omni2.5#

Rare#1#Illumina#CoreExome#

Yang et al. 2015 (Nature Genetics)

n = 45k data on height and BMI

32

Totals~60% for height~30% for BMI

0.00#

0.05#

0.10#

0.15#

0.20#

0.25#

<#0.1# 0.1#~#0.2# 0.2#~#0.3# 0.3#~#0.4# 0.4#~#0.5#

Varia

nce(explaine

d(

MAF(stra2fied(variant(group(

Height# BMI#

Yang et al. 2015 (Nature Genetics)

0.00#

0.02#

0.04#

0.06#

0.08#

0.10#

0.12#

0.14#

2.5e+5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

Varia

nce(explaine

d(

MAF(

1st#quar4le#(low#LD)#

2nd#quar4le#

3rd#quar4le#

4th#quar4le#(high#LD)#

0.00#

0.01#

0.02#

0.03#

0.04#

0.05#

0.06#

0.07#

2.5e,5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

Varia

nce(explaine

d(

MAF(

100%

80%

45%

16%

SlidebyRobertMaier 33

h2 overestimation?untaggedrarevariants?

better tagging of ungenotyped variants

samplesize/power

Partitioning variance of height

TotalvarianceHeritability (based on Twin or family studies)SNP heritability from imputation to sequenced referenceSNP-heritability (variance explained by all genotyped SNPs ontheChip)VarianceexplainedbygenomewidesignificantSNPs

missingheritability60%

Scaling revisitedu = bsx ~ N(0, su

2) implies

b2 proportional to su2/[2p(1-p)], so rare variants have larger allelic effect: natural

selection

If b2 = su2 then no relationship between frequency and effect size: neutral

model

In between: b2 = su2 [2p(1-p)]-s

Variance explained by SNP: 2p(1-p)su2[2p(1-p)]-s

= su2 [2p(1-p)]1-s

s = 0: common SNPs explain more variations = 1: all SNPs explain the same amount of variation

34

Multiple methods to estimate additive genetic variance

• Individual-level data– GREML– Haseman-Elston regression(yjyj) = µ + bAij

• Summary dataLDscore regression

• Consideration:– data availability– model assumptions– computation

35

Key concepts• Dense SNP panels allow the estimation of the expected

genetic covariance between distant relatives (‘unrelateds’)

• A model based upon estimated relationships from SNPs is equivalent to a model fitting all SNPs simultaneously

• The total genetic variance due to LD between common SNPs and (unknown) causal variants can be estimated

• Genetic variance captured by common SNPs can be assigned to chromosomes and chromosome segments

36

Recommended