Human Genetics

Genetic Epidemiology


Family trees can have a lot of nuts

Genetic Epidemiology - Aims

1. Gene detection

2. Gene characterization

mode of inheritance

allele frequencies

→ prevalence, attributable risk

Genetic Epidemiology - Methods

• Aggregation

• Segregation

• Co-segregation

• Association

Segregation

Can the dichotomy or trichotomy be explained by Mendelian segregation?

• Dichotomy: affected and unaffected, or two distributions, determined by a dominant or recessive allele

• Trichotomy: three distributions are also possible

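Likelihood (parameter(s); data) ∝ Probability (data | parameter(s))

The joint probability of the genotypes and phenotypes of all the members of a pedigree can be written as

$$\prod_{i \in \text{founders}} P(G_i) \prod_{j \in \text{nonfounders}} P(G_j \mid G_{f_j}, G_{m_j}) \prod_{k \in \text{observed}} P(Y_k \mid G_k)$$

where $f_j$ and $m_j$ denote the father and mother of nonfounder $j$. Summing over the genotypes of all $n$ pedigree members gives the likelihood

$$L(\text{parameters};\, Y) = \sum_{G_1} \sum_{G_2} \cdots \sum_{G_n} \; \prod_{i \in \text{founders}} P(G_i) \prod_{j \in \text{nonfounders}} P(G_j \mid G_{f_j}, G_{m_j}) \prod_{k \in \text{observed}} P(Y_k \mid G_k).$$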

Transmission Probabilities

Transmission probability              Value if there is Mendelian segregation
P(AA transmits A) = τ_{AA A}          1
P(Aa transmits A) = τ_{Aa A}          ½
P(aa transmits A) = τ_{aa A}          0
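The likelihood and the transmission probabilities can be combined in a small numerical sketch for the simplest pedigree, a father-mother-child trio. Everything named here is an assumption made for illustration: the function names, the use of Hardy-Weinberg proportions for the founder genotype probabilities P(G_i), and the penetrance dictionary `f`.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")

# Mendelian transmission probabilities P(parental genotype transmits A).
TAU = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}


def founder_prob(g, q):
    """P(G_i) for a founder, assuming Hardy-Weinberg proportions; q = P(A)."""
    return {"AA": q * q, "Aa": 2 * q * (1 - q), "aa": (1 - q) ** 2}[g]


def child_prob(gc, gf, gm, tau=TAU):
    """P(G_j | G_f, G_m) built from the transmission probabilities."""
    tf, tm = tau[gf], tau[gm]
    return {
        "AA": tf * tm,
        "Aa": tf * (1 - tm) + (1 - tf) * tm,
        "aa": (1 - tf) * (1 - tm),
    }[gc]


def penetrance(y, g, f):
    """P(Y | G); y is 1 (affected), 0 (unaffected) or None (not observed)."""
    if y is None:
        return 1.0
    return f[g] if y == 1 else 1.0 - f[g]


def trio_likelihood(y_father, y_mother, y_child, q, f, tau=TAU):
    """Sum the joint probability over all genotype configurations of the trio."""
    total = 0.0
    for gf, gm, gc in product(GENOTYPES, repeat=3):
        total += (
            founder_prob(gf, q) * founder_prob(gm, q)
            * child_prob(gc, gf, gm, tau)
            * penetrance(y_father, gf, f)
            * penetrance(y_mother, gm, f)
            * penetrance(y_child, gc, f)
        )
    return total
```

Passing a non-Mendelian `tau` (for example all three values equal) gives the likelihood under a no-transmission alternative, which is the kind of comparison a segregation analysis makes. Larger pedigrees need a smarter summation than this brute-force enumeration.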

Ascertainment

• We examine segregating sibships

• The proportion of sibs affected is larger than expected on the basis of Mendelian inheritance

• The likelihood must be conditional on the mode of ascertainment

• We need to know the proband sampling frame
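To see why the proportion of affected sibs is inflated, here is a minimal sketch under one particular assumed mode of ascertainment: every sibship with at least one affected member is sampled (often called complete or truncate ascertainment). The slides do not say which mode applies, and the function names are hypothetical.

```python
from math import comb


def truncated_binomial(r, s, p):
    """P(r affected among s sibs | at least one affected).

    p is the segregation probability (e.g. 1/4 for a recessive trait with
    two carrier parents). Assumes every sibship with r >= 1 is ascertained.
    """
    if r < 1:
        return 0.0
    return comb(s, r) * p**r * (1 - p) ** (s - r) / (1 - (1 - p) ** s)


def expected_proportion_affected(s, p):
    """Expected fraction of affected sibs among ascertained sibships of size s."""
    return sum(r * truncated_binomial(r, s, p) for r in range(1, s + 1)) / s
```

For sibships of size 2 and p = 1/4, the expected proportion affected is 4/7 rather than 1/4, because sibships with no affected member never enter the sample.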

Cosegregation

• Chromosome segments are transmitted

• Cosegregation is caused by linked loci

ultimate statistical proof of genetic etiology

Methods of Linkage Analysis

• Trait model-based (parametric) – assume a genetic model underlying the trait

• Trait model-free (non-parametric) – no assumptions about the genetic model underlying the trait

• Ascertainment is often not an issue for locus detection by linkage analysis

Model-based Linkage Analysis

• If founder marker genotypes are unknown, we can

1) estimate them

2) use a database

• If founder marker genotypes are known or can be inferred exactly,

→ no increase in Type 1 error

→ smallest Type 2 error when the model is correct

• All parameters other than the recombination fraction are assumed known

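$$L(\theta;\, Y) = \sum_{G_1} \sum_{G_2} \cdots \sum_{G_n} \; \prod_{i \in \text{founders}} P(G_i) \prod_{j \in \text{nonfounders}} P(G_j \mid G_{f_j}, G_{m_j}) \prod_{k \in \text{observed}} P(Y_k \mid G_k).$$

$P(G_j \mid G_{f_j}, G_{m_j})$ is expressed as a function of 2-locus transmission probabilities

$$\tau_{\frac{AB}{ab}\,AB} = \tau_{\frac{AB}{ab}\,ab} = \frac{1-\theta}{2} \quad \text{and} \quad \tau_{\frac{AB}{ab}\,Ab} = \tau_{\frac{AB}{ab}\,aB} = \frac{\theta}{2}$$

where $\theta$ is the recombination fraction.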

Model-free Linkage Analysis

Identity-in-state versus Identity-by-descent

Two alleles are identical by descent if they are copies of the same parental allele

[Figure: pedigree with genotypes A1A1 and A1A2 in the upper row and A1A2 and A1A2 in the lower row, with the alleles shared IBD marked]

Sib pairs share

• 0, 1 or 2 alleles identical by descent at a marker locus

• 0, 1 or 2 alleles identical by descent at a trait locus

Linkage

The average proportion shared at any particular locus is ½
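The value ½ can be checked by enumerating the equally likely transmissions to two sibs. This is only an illustrative sketch; the allele labels `f1`, `f2`, `m1`, `m2` and the function name are made up for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sib_ibd_distribution():
    """Distribution of the number of alleles (0, 1, 2) a sib pair shares IBD.

    The four parental alleles are given distinct labels; each sib receives
    one paternal and one maternal allele, each choice having probability 1/2.
    """
    transmissions = list(product(("f1", "f2"), ("m1", "m2")))
    counts = Counter()
    for sib1, sib2 in product(transmissions, repeat=2):
        shared = (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


dist = sib_ibd_distribution()
# Proportion shared is (number shared) / 2, averaged over the distribution.
mean_proportion = sum(Fraction(k, 2) * prob for k, prob in dist.items())
```

The enumeration gives probabilities ¼, ½, ¼ for sharing 0, 1, 2 alleles, so the mean proportion shared is ½. Model-free linkage methods look for departures from this expectation at a marker among pairs that are similar for the trait.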

Relative Pair Model-Free Linkage Analysis

• We correlate relative-pair similarity (dissimilarity) for the trait of interest with relative-pair similarity (dissimilarity) for a marker

• Affected relative pair analysis: Do affected relative pairs share more marker alleles than expected if there is no linkage?

• No controls!

• Linkage between a trait locus and a marker locus → positive correlation

Association

• Causes of association between a marker and a disease

  • chance

  • stratification, population heterogeneity

  • very close linkage

  • pleiotropy

Causes of Allelic Association

Heterogeneity/stratification

Simpson's paradox: If we mix two populations that have both different disease prevalence and different marker allele prevalence, and there is no association between the disease and marker allele in each population, there will be an association between the disease and the marker allele in the mixed population.

This allelic association is nuisance association

The best solution to avoid this confounding is to study only ethnically homogeneous populations
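A small numerical illustration of this effect, using invented prevalences and carrier frequencies (none of these numbers come from the slides):

```python
def expected_table(n, prevalence, carrier_freq):
    """Expected disease-by-carrier counts when the two are independent
    within a single population of size n."""
    return {
        ("case", "carrier"): n * prevalence * carrier_freq,
        ("case", "noncarrier"): n * prevalence * (1 - carrier_freq),
        ("control", "carrier"): n * (1 - prevalence) * carrier_freq,
        ("control", "noncarrier"): n * (1 - prevalence) * (1 - carrier_freq),
    }


def odds_ratio(t):
    """Odds ratio for disease given carrier status."""
    return (t[("case", "carrier")] * t[("control", "noncarrier")]) / (
        t[("case", "noncarrier")] * t[("control", "carrier")]
    )


# Hypothetical populations differing in both prevalence and allele frequency.
pop1 = expected_table(10_000, prevalence=0.10, carrier_freq=0.8)
pop2 = expected_table(10_000, prevalence=0.01, carrier_freq=0.2)
mixed = {cell: pop1[cell] + pop2[cell] for cell in pop1}
```

The odds ratio is 1 within each population, yet `odds_ratio(mixed)` comes out at roughly 3, purely because the population with the higher disease prevalence also has the higher carrier frequency.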

(Tight) Linkage

Imagine a number of generations ago, a normal allele d mutated to a disease allele D on a particular chromosome on which the allele at a marker locus was A1

A1 d → (mutation) → A1 D

This chromosome is passed down through the generations, and now there are many copies. If the distance between D and A1 is small, recombinations are unlikely, so most D chromosomes carry A1

This is the type of allelic association we are interested in

Guarding Against Stratification

• Three solutions:

  • use a homogeneous population

  • use family-based controls

  • use genomic control

Matching on Ethnicity

• Close relatives are the best controls, but can lead to overmatching

• Cases and control family members must have the same family history of disease

[Figure panels: Siblings; Cousins]

Transmission Disequilibrium Test (TDT)

• A design that uses pseudosibs as controls

• Cases and their parents are typed for markers

Example trio: father A1A2, mother A2A2, affected child A1A2

• Transmitted genotype is A1A2

• Untransmitted genotype is A2A2

• Father transmits A1, does not transmit A2

• Mother transmits A2, does not transmit A2 (uninformative in terms of alleles)

• Build up a 2 x 2 table:

                         Transmitted
                         A1      A2
Untransmitted    A1      a       b
                 A2      c       d

• The counts a and d come from homozygous parents

• The counts b and c come from heterozygous parents

• McNemar's test:

$$\chi^2_1 = \frac{(b - c)^2}{b + c}$$
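A minimal sketch of the calculation, using only the off-diagonal counts b and c from the table. The function name is hypothetical; the p-value uses the standard identity that the upper tail of a χ² with 1 d.f. at x equals erfc(√(x/2)).

```python
from math import erfc, sqrt


def tdt(b, c):
    """McNemar-type TDT statistic and its asymptotic p-value.

    b and c are the off-diagonal counts of the transmitted/untransmitted
    table, i.e. the contributions of heterozygous parents.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square, 1 d.f.
    return chi2, p_value
```

For example, `tdt(30, 10)` gives a statistic of 10.0. The counts a and d never enter, which is why homozygous parents are uninformative.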

Genomic Control

• Calculate an association statistic for a candidate locus

• Calculate the same association statistic, from the same sample, for a set of unlinked loci

• Determine significance by reference to the results for the unlinked loci
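One common way to carry out these three steps, sketched here as an assumption rather than as the procedure used in the talk, is to estimate a variance inflation factor λ from the median of the 1-d.f. statistics at the unlinked loci and divide the candidate statistic by it. The function names and the choice to floor λ at 1 are illustrative.

```python
from math import erfc, sqrt
from statistics import median

# Median of a chi-square distribution with 1 d.f.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate lambda from 1-d.f. statistics at unlinked (null) loci."""
    lam = median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)  # assumption: never deflate the candidate statistic


def genomic_control(candidate_stat, null_stats):
    """Return the adjusted statistic, its chi-square(1) p-value, and lambda."""
    lam = inflation_factor(null_stats)
    adjusted = candidate_stat / lam
    return adjusted, erfc(sqrt(adjusted / 2)), lam
```

With no stratification the unlinked loci give λ near 1 and the candidate statistic is left essentially unchanged; with stratification λ exceeds 1 and the candidate statistic is scaled down accordingly.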

Linkage Between a Marker and a Disease

• Intrafamilial association

• Typically no population association

• Not affected by population stratification

• Population association if very close

Association versus Linkage

Allelic Association                           Linkage
• Association at the population level         Intrafamilial association
• Pinpoints alleles                           Pinpoints loci
• More powerful                               Less powerful
• More tests required                         Fewer tests required
• More sensitive to mistyping                 Less sensitive to mistyping
• Sensitive to population stratification      Not sensitive to population stratification

• Which is better?

What is the Best Design and Analysis?

• If heterogeneity / stratification could be an issue, genome scan desired: large extended pedigrees, type all (founders and non-founders) for 200-400 equi-spaced markers, for linkage analysis

• If heterogeneity / stratification is a non-issue: unrelated cases and controls for association analysis (genome scan?)

Note: cost, burden of multiple testing

A wise investigator, like a wise investor, would hedge bets with a judicious mix

Case-Control Data

• Consider a particular marker allele, A1, and a sample of cases and controls:

              Number of A1 alleles
              0       1       2       Total
Cases         r0      r1      r2      R
Controls      s0      s1      s2      S
Total         n0      n1      n2      N

• Consider the probability structure:

              Number of A1 alleles
              0       1       2
Cases         p0      p1      p2
Controls      q0      q1      q2

• Cochran-Armitage trend: test the null hypothesis

p2 + ½p1 = q2 + ½q1

without assuming the two alleles a person has are independent

Sasieni (1997) Biometrics 53:1253-1261

$$Y_1 = \frac{\left[(\hat p_2 + \tfrac{1}{2}\hat p_1) - (\hat q_2 + \tfrac{1}{2}\hat q_1)\right]^2}{\left(\dfrac{1}{R} + \dfrac{1}{S}\right) \dfrac{1}{N^2} \left[ N\left(n_2 + \tfrac{1}{4} n_1\right) - \left(n_2 + \tfrac{1}{2} n_1\right)^2 \right]}$$

asymptotically has a χ² distribution with 1 d.f.
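The trend statistic can be computed directly from the two rows of genotype counts. This is a sketch assuming the usual form of the Cochran-Armitage trend test with scores 0, ½, 1 and a pooled variance estimate; the function name and argument layout are hypothetical.

```python
def trend_test(cases, controls):
    """Cochran-Armitage trend statistic for genotype counts.

    cases = (r0, r1, r2) and controls = (s0, s1, s2), indexed by the
    number of A1 alleles carried.
    """
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    R, S = r0 + r1 + r2, s0 + s1 + s2
    N = R + S
    n1, n2 = r1 + s1, r2 + s2

    # Difference in estimated A1 allele frequency, cases minus controls.
    diff = (r2 + 0.5 * r1) / R - (s2 + 0.5 * s1) / S

    # Pooled variance of the per-person score (0, 1/2 or 1), which does not
    # assume the two alleles within a person are independent.
    score_var = (N * (n2 + 0.25 * n1) - (n2 + 0.5 * n1) ** 2) / N**2
    return diff**2 / ((1 / R + 1 / S) * score_var)
```

As a usage example with made-up counts, `trend_test((10, 40, 50), (30, 50, 20))` compares allele frequencies 0.70 and 0.45 in 100 cases and 100 controls and returns a statistic of about 23.7.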

Cochran-Armitage Trend Test

• Does not assume independence of alleles within a person

• Does assume independence of genotypes from person to person

• Is not valid if there is population stratification

• The increased variance due to stratification can be estimated from a random set of markers that are independent of the disease: genomic control. Devlin and Roeder (1999) Biometrics 55:997-1004

Case-only Studies

• Look at departure from

  A1A1        A1A*          A*A*
  p²          2p(1-p)       (1-p)²

  where p = P(A1) = p2 + ½p1

• Hardy-Weinberg Disequilibrium (HWD) test statistic:

$$\frac{\left[\hat p_2 - (\hat p_2 + \tfrac{1}{2}\hat p_1)^2\right]^2}{\text{estimated variance}} \to \chi^2_1$$

• Suggested as

  • more powerful (only cases needed)

  • more precise (signal decreases faster with distance from the causative locus)

Case-only Studies

• No controls

• There must be a difference in HWD between cases and controls

• Therefore we consider this HWD trend test:

$$Y_2 = \frac{\left\{\left[\hat p_2 - (\hat p_2 + \tfrac{1}{2}\hat p_1)^2\right] - \left[\hat q_2 - (\hat q_2 + \tfrac{1}{2}\hat q_1)^2\right]\right\}^2}{\text{estimated variance}}$$

• No power in the case of a multiplicative model

$$P(\text{affected} \mid A_1A^*)^2 = P(\text{affected} \mid A_1A_1)\, P(\text{affected} \mid A^*A^*)$$
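The lack of power under a multiplicative model can be seen numerically: if the population is in Hardy-Weinberg proportions, multiplicative penetrances leave the cases in Hardy-Weinberg proportions too, so there is no HWD signal among cases. The sketch below assumes Hardy-Weinberg proportions in the source population; the function names and the allele frequency 0.2 are invented for illustration.

```python
def case_genotype_freqs(p, penetrance):
    """Genotype frequencies (p2, p1, p0) among cases.

    p = P(A1) in a population assumed to be in Hardy-Weinberg proportions;
    penetrance = P(affected | A1A1), P(affected | A1A*), P(affected | A*A*).
    """
    f2, f1, f0 = penetrance
    weighted = (f2 * p * p, f1 * 2 * p * (1 - p), f0 * (1 - p) ** 2)
    prevalence = sum(weighted)
    return tuple(w / prevalence for w in weighted)


def hwd_coefficient(p2, p1):
    """Departure of the A1A1 frequency from the squared allele frequency."""
    return p2 - (p2 + 0.5 * p1) ** 2


p2_m, p1_m, _ = case_genotype_freqs(0.2, (0.81, 0.045, 0.0025))  # multiplicative
p2_r, p1_r, _ = case_genotype_freqs(0.2, (1.00, 0.10, 0.10))     # recessive
hwd_multiplicative = hwd_coefficient(p2_m, p1_m)
hwd_recessive = hwd_coefficient(p2_r, p1_r)
```

Here `hwd_multiplicative` is zero up to rounding error while `hwd_recessive` is about 0.12, which is why HWD-based tests respond to recessive inheritance but not to a multiplicative model.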

$$Y_1 = \frac{\hat b^2}{\widehat{\operatorname{var}}(\hat b)} \qquad\qquad Y_2 = \frac{\hat d^2}{\widehat{\operatorname{var}}(\hat d)}$$

Weighted average of the Cochran-Armitage trend test and the HWD trend test statistics

$$Y = \frac{\left[w\,|\hat b| + (1 - w)\,|\hat d|\right]^2}{w^2 \operatorname{var}(\hat b) + (1 - w)^2 \operatorname{var}(\hat d) + 2w(1 - w)\operatorname{cov}(|\hat b|, |\hat d|)}$$

We want to give more weight to b or d, whichever yields the larger signal

Therefore take

$$w = \frac{Y_1}{Y_1 + Y_2}$$
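A direct transcription of the weighted statistic into code, as a sketch only. It takes b and d to be the quantities tested by the Cochran-Armitage trend test and the HWD trend test respectively, which is an interpretation of the slides, and it assumes the caller supplies the variance and covariance estimates, whose formulas the slides do not give. The function name and the equal-weight fallback are hypothetical.

```python
def weighted_average_statistic(b, d, var_b, var_d, cov_abs):
    """Weighted average of the trend (b) and HWD trend (d) signals.

    var_b, var_d: estimated variances of b and d.
    cov_abs: estimated covariance of |b| and |d| (supplied by the caller).
    Returns (Y, w).
    """
    y1 = b * b / var_b
    y2 = d * d / var_d
    if y1 + y2 == 0:
        w = 0.5  # assumption: no signal in either test, so weight equally
    else:
        w = y1 / (y1 + y2)  # more weight to whichever test shows more signal
    numerator = (w * abs(b) + (1 - w) * abs(d)) ** 2
    denominator = (
        w * w * var_b + (1 - w) ** 2 * var_d + 2 * w * (1 - w) * cov_abs
    )
    return numerator / denominator, w
```

When d is zero the weight w is 1 and the statistic reduces to Y1; when b is zero it reduces to Y2. Because w depends on the data, the statistic does not follow a standard χ² distribution under the null hypothesis.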

• To investigate the null distribution of this average we simulate many different situations – sample sizes up to 10,000 cases and 10,000 controls – and generate $\hat p_0, \hat p_1, \hat p_2$ for cases and $\hat q_0, \hat q_1, \hat q_2$ for controls

• For all situations considered, the distribution is well approximated by a Gamma distribution

• Across the sample sizes and marker allele frequencies considered, the largest mean and the smallest variance occur for 10,000 cases and 10,000 controls and a marker allele frequency of 0.5

• For 10,000 cases and 10,000 controls, and marker allele frequency 0.5, the upper tail of the distribution is well approximated by a Gamma distribution with mean μ = 1.78 and variance σ² = 3.45

• We develop a prediction equation to determine percentiles of the null distribution for smaller sample sizes and marker allele frequencies

• We base goodness of fit on the root mean squared error (RMSE) of $\log_e \alpha$, calculated for various sample size combinations, from the variance among 50 replicate samples:

$$\text{RMSE} = \left[\frac{1}{50} \sum (\log_e \hat\alpha - \log_e \alpha)^2\right]^{1/2}$$

• With ~90% confidence, the true $\log_e \alpha$ lies in the interval $\log_e \hat\alpha \pm 1.645(\text{RMSE})$, i.e., $\hat\alpha$ is within $e^{\pm 1.645(\text{RMSE})}$-fold of the true α

• For total sample size (R + S) of 200 or larger and α = 0.0001 or larger, in the very worst case (R = S = 100, α = 0.0001), with 90% confidence $\hat\alpha$ could differ from the true α by a factor of at most ~4.8

• The average RMSE is 0.35, corresponding to being between 78% and 122% of the true α with 90% confidence

POWER

Genetic Models Simulated

                          Probability of being affected given
     Model                A1A1        A1A*        A*A*
1    Recessive 1          1.00        0.10        0.10
2    Recessive 2          1.00        0.05        0.05
3    Additive             1.00        0.50        0.00
4    Multiplicative       0.81        0.045       0.0025

• Marker loci placed at distances 0 – 6 cM from the disease susceptibility locus

• For type I error, no association between the disease and marker loci

• Each simulated population contains 500,000 individuals allowed to randomly mate for 50 generations after the appearance of a disease mutation

Tests Performed

Homogeneous populations

• HWD, cases only

• Allele test

• Allele test x HWD in cases

• HWD trend test

• Cochran-Armitage trend test

• Cochran-Armitage trend test x HWD trend test

• Weighted average

Population stratification

• Cochran-Armitage trend test with genomic control

• Product of this and the HWD trend test

• Weighted average with genomic control

Type I error, homogeneous population

[Figure legend: ∆ HWD test, cases only; ▲ product of the allele test and HWD test]

Type I error, population stratification

[Figure legend: ○ allele test; ◊ Cochran-Armitage trend test; ▲ product of the allele test and HWD test; ■ weighted average test; ● product of the Cochran-Armitage trend test and the HWD test]

Power, homogeneous population

[Figure legend: ■ weighted average test]

Power, population stratification

[Figure legend: □ HWD trend test; ♦ CA test with genomic control; ■ weighted average with genomic control]

Conclusions

• Under recessive inheritance, the weighted average has better performance than either the Cochran-Armitage trend test or the HWD trend test

• Has good performance for other models as well

• The product of the Cochran-Armitage trend test statistic and the HWD test statistic (cases only) has better power, but has inflated Type I error if there is population stratification

• The weighted average has good overall properties, automatically controls for marker mistyping


With acknowledgment to

Kijoung Song


Can we use evolutionary models, when we have large amounts of genetic data on a sample of cases and controls, to obtain a more powerful way of detecting loci involved in the etiology of disease?

Will these models bear fruit or nuts?