Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of...

Estimationofquantitativegeneticparametersfromdistant

relativesusingmarkerdata

PeterM.Visscherpeter.visscher@uq.edu.au

Keyconcepts• DenseSNPpanelsallowtheestimationoftheexpected

geneticcovariancebetweendistantrelatives(‘unrelateds’)• AmodelbaseduponestimatedrelationshipsfromSNPsis

equivalenttoamodelfittingallSNPssimultaneously• ThetotalgeneticvarianceduetoLDbetweencommonSNPs

and(unknown)causalvariantscanbeestimated• GeneticvariancecapturedbycommonSNPscanbeassigned

tochromosomesandchromosomesegments

100yearslaterHeritabilityofhumanheight

h2 ~ 80%5

Disease Number of loci

Percent of Heritability Measure Explained

Heritability Measure

Age-related macular degeneration

5 50% Sibling recurrence risk

Crohn’s disease 32 20% Genetic risk (liability)

Systemic lupus erythematosus

6 15% Sibling recurrence risk

Type 2 diabetes 18 6% Sibling recurrence risk

HDL cholesterol 7 5.2% Phenotypic variance

Height 40 5% Phenotypic variance

Early onset myocardial infarction

9 2.8% Phenotypic variance

Fasting glucose 4 1.5% Phenotypic variance

WhereistheDarkMatter?

Hypothesistestingvs.Estimation

• GWAS=hypothesistesting– Stringentp-valuethreshold– Estimatesofeffectsbiased(“Winner’sCurse”)

• E(bhat|test(bhat)>T)>b{bfixed}• var(bhat)=var(b)+var(bhat|b){brandom}

• CanweestimatethetotalproportionofvariationaccountedforbyallSNPs?

Basicidea• Estimatesofadditivegeneticvariancefromknownpedigreeisunbiased– Ifmodeliscorrect– Despitevariationinidentitygiventhepedigree– PedigreegivescorrectexpectedIBD

• Unknownpedigree:estimategenome-wideIBDfrommarkerdata– Estimateadditivegeneticvariancegiventhisestimateofrelatedness

• Ideaisnotnew– (Evolutionary)geneticsliterature(Ritland,Lynch,Hill,others)

Closevsdistantrelatives

• Detectionofcloserelatives(fullsibs,parent-offspring,halfsibs)frommarkerdataisrelativelystraightforward

• Butcloserelativesmayshareenvironmentalfactors– Biasedestimatesofgeneticvariance

• Solution:useonly(very)distantrelatives

AmodelforasinglecausalvariantAA AB BB

frequency (1-p)2 2p(1-p) p2

x 0 1 2effect 0 b 2bz=[x-E(x)]/sx -2p/√{2p(1-p)} (1-p)/√{2p(1-p)} 2(1-p)/√{2p(1-p)}

yj = µ’ +xijbi +ej x=0,1,2{standardassociationmodel}

yj = µ +zijuj +ej u=bsx;µ =µ’+bsx

Multiple(m)causalvariants

yj = µ +Szijuj +ej

= µ +gj +ej

y = µ1 +g +e

= µ1 +Zu +e

Equivalence

Letubearandomvariable,u~N(0,su2)

Thensg2 =msu

var(y) =ZZ’su2 +Ise

=ZZ’(sg2/m)+Ise

=Gsg2 +Ise

Model with individual genome-wide additive values using relationships (G) at the causal variants is equivalent to a model fitting all causal variants

We can estimate genetic variance just as if we would do using pedigree relationships

Butwedon’thavethecausalvariantsIfweestimateG fromSNPs:

– loseinformationduetoimperfectLDbetweenSNPsandcausalvariants

– howmuchwelosedependson• densityofSNPs• allelefrequencyspectrumofSNPsvs.causalvariants

– estimateofvarianceàmissingheritability

LetA betheestimateofG fromNSNPs:

Ajk =(1/N)S {xij – 2pi)(xik – 2pi)/{2pi(1-pi)}

=(1/N)S zijzik

Data• ~4000 ‘unrelated’ individuals• Ancestry ~British Isles• Measurement on height (self-report or clinically measured)• GWAS on 300k (‘adults’) or 600k (16-year olds) SNPs

Lack of evidence for population stratification within the Australian sample

Methods• Estimate realised relationship matrix from

SNPs• Estimate additive genetic variance

22)var( eg ss IAVy +==

)1(2),cov(

)var()var(),cov(

iikiij

iikiijijk pp

xxaxax

iii egy +=

Base population = current population

ïïî

ïïí

iijiij

iikiij

i ijkjk

,)1(22)21(11

)2)(2(11

Statistical analysis

22)var( eg ss IAVy +==

y standardised ~N(0,1)

No fixed effects other than mean

A estimated from SNPs

Residual maximum likelihood (REML)

h2 ~ 0.5 (SE 0.1)

Partitioning variation• If we can estimate the variance captured by

SNPs genome-wide, we should be able to partition it and attribute variance to regions of the genome

• “Population based linkage analysis”

Genome partitioning

• Partition additive genetic variance according to groups of SNPs– Chromosomes– Chromosome segments– MAF bins– Genic vs non-genic regions– Etc.

• Estimate genetic relationship matrix from SNP groups

• Analyse phenotypes by fitting multiple relationship matrices

• Linear model & REML (restricted maximum likelihood)

Application: the GENEVA Consortium• Data

– ~14,000 European Americans• ARIC• NHS• HPFS

– Affy 6.0 genotype data• ~600,000 after stringent QC

– Phenotypes on height, BMI, vWF and QT Interval

QC of SNPs

• 780,062 SNPs after QC steps listed in the table.

• Exclude 141,772 SNPs with MAF < 0.02 in European-ancestry group.

• Exclude 36,949 SNPs with missingness > 2% in all samples.

• Include autosomal SNPs only.

• End up with 577,778 SNPs.23

Results (genome-wide)

101112

0 50 100 150 200 250

nceexplainedbyeachchromosom

Chromosomelength(Mb)

Slope=1.6×10-4

P =1.4×10-6

R2 =0.695

Height(combined)

0 50 100 150 200 250

eChromosomelength(Mb)

Slope=2.3×10-5

P =0.214R2 =0.076

BMI(combined)

Genome-partitioning: longer chromosomes explain more variation

R²=0.511

0.00 0.01 0.02 0.03 0.04 0.05

nceexplainedbyGIANTheightSNPson

eachch

romosom

Varianceexplainedbyeachchromosome

Height (11,586 unrelated)

0.000 0.004 0.008 0.012 0.016 0.020

nceexplainedbychrom

e(adjustedfortheFTO

Varianceexplainedbychromosome(noadjustment)

BMI(11,586unrelated)

Results are consistent with reported GWAS

151617

192021220.00

0 50 100 150 200 250

Chromosomelength(Mb)

Slope=6.9×10-5

P =0.524R2 =0.021

vWF(ARIC)

Inference robust with respect to genetic architecture

151617

192021

0.00 0.03 0.06 0.09 0.12 0.15

nceexplainedbychrom

e(adjustedfortheABO

Varianceexplainedbychromosome(noadjustment)

vWF(6,662unrelated)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

Height(combined)17,277proteincoding geneshGg

2 =0.328(s.e.=0.024)hGi

2 =0.126(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=2.1x10-10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

BMI(combined)17,277proteincoding geneshGg

2 =0.117(s.e.=0.023)hGi

2 =0.047(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=0.022

Genic regions explain variation disproportionately

Using imputed sequence data

• How much information is gained by using SNP array data imputed to a fully sequenced reference?

• How much is lost relative to whole genome sequencing?

29Yang et al. 2015 (Nature Genetics)

Accounting for LD and MAF spectrum allows unbiased estimation of genetic variance (GREML-LDMS)

Random# More#common# Rarer# Rarer#&#DHS#

Herita

bility*es,m

GREML:SC#GREML:MS#LDAK#LDAK:MS#LDres#LDres:MS#GREML:LDMS#

Yang et al. 2015 (Nature Genetics)

Random# More#common# Rarer# Rarer#&#DHS#

Heritab

ility*es

,mate*

7MAF_4LD#

7MAF_3LD#

7MAF_2LD#

2MAF_2LD#

Very little difference in “taggability” between SNP chips

Genetic variation captured after imputation96% due to common variants73% due to rare variants

0# 0.1# 0.2# 0.3# 0.4# 0.5# 0.6# 0.7# 0.8# 0.9#

n'of'varia%o

n'captured

Imputa%on'R2'threshold'

Common#1#Affymetrix#6#

Common#1#Affymetrix#Axiom#

Common#1#Illumina#OmniExpress#

Common#1#Illumina#Omni2.5#

Common#1#Illumina#CoreExome#

Rare#1#Affymetrix#6#

Rare#1#Affymetrix#Axiom#

Rare#1#Illumina#OmniExpress#

Rare#1#Illumina#Omni2.5#

Rare#1#Illumina#CoreExome#

n = 45k data on height and BMI

Totals~60% for height~30% for BMI

<#0.1# 0.1#~#0.2# 0.2#~#0.3# 0.3#~#0.4# 0.4#~#0.5#

nce(explaine

MAF(stra2fied(variant(group(

Height# BMI#

2.5e+5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

nce(explaine

1st#quar4le#(low#LD)#

2nd#quar4le#

3rd#quar4le#

4th#quar4le#(high#LD)#

2.5e,5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

nce(explaine

SlidebyRobertMaier 33

h2 overestimation?untaggedrarevariants?

better tagging of ungenotyped variants

samplesize/power

Partitioning variance of height

TotalvarianceHeritability (based on Twin or family studies)SNP heritability from imputation to sequenced referenceSNP-heritability (variance explained by all genotyped SNPs ontheChip)VarianceexplainedbygenomewidesignificantSNPs

missingheritability60%

Scaling revisitedu = bsx ~ N(0, su

2) implies

b2 proportional to su2/[2p(1-p)], so rare variants have larger allelic effect: natural

selection

If b2 = su2 then no relationship between frequency and effect size: neutral

In between: b2 = su2 [2p(1-p)]-s

Variance explained by SNP: 2p(1-p)su2[2p(1-p)]-s

= su2 [2p(1-p)]1-s

s = 0: common SNPs explain more variations = 1: all SNPs explain the same amount of variation

Multiple methods to estimate additive genetic variance

• Individual-level data– GREML– Haseman-Elston regression(yjyj) = µ + bAij

• Summary dataLDscore regression

• Consideration:– data availability– model assumptions– computation

Key concepts• Dense SNP panels allow the estimation of the expected

genetic covariance between distant relatives (‘unrelateds’)

• A model based upon estimated relationships from SNPs is equivalent to a model fitting all SNPs simultaneously

• The total genetic variance due to LD between common SNPs and (unknown) causal variants can be estimated

• Genetic variance captured by common SNPs can be assigned to chromosomes and chromosome segments

Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of...

Documents

Texture, Microstructure & Anisotropypajarito.materials.cmu.edu/rollett/27750/Introduction-14Jan20.pdf · Texture: Quantitative Description • Three (3) parameters needed to describe

Qualitative and Quantitative Perfusion Parameters Determined by 3D

Quantitative Parameters of Some Novellas by Roman Ivanychuk

GUIDELINES ON PROSTATE CANCER - Uroweb · PDF fileProstate Cancer 47 M - Distant Metastasis4 MX M0 M1 Distant metastasis cannot be assessed No distant metastasis Distant metastasis

Possible Usage of IASI L2 Profiles in Nowcasting - essl.org · Quantitative comparison of the GII retrieved parameters and the same parameters derived from the profiles of IASI L2

Qualitative and quantitative chest CT parameters as ... · ORIGINAL ARTICLE Qualitative and quantitative chest CT parameters as predictors of specific mortality in COVID-19 patients

Quantitative Assessment of the Physiological Parameters

Distant Echo

Distant hybridization

Simultaneous Estimations of Plasma Parameters using ... · Simultaneous Estimations of Plasma Parameters using Quantitative Spectroscopy (Plasma Devices Lab Microwave Tubes (MWT)

A Distant Mirror

Distant relationship

Quantitative Assessment of Cardiac Left Ventricular Parameters using MRI Scanners … · 2012-09-25 · Quantitative Assessment of Cardiac Left Ventricular Parameters using MRI Scanners

Quantitative evaluation of hemodynamic parameters by … · 2021. 5. 6. · Quantitative evaluation of hemodynamic parameters by echocardiography in patients with postcardiotomy cardiac

A quantitative assessment of the parameters of the role of

Distant Data Save

Distant Targets 20

PARAMETERS Quantitative parameter - Overall Pass percentage or Pass percentage in a particular subject is taken as quantitative parameter. Qualitative

Distant Writing 2012

Quantitative determination of pore and throat parameters