Linkage and association

Linkage and association

Sarah Medland

Genotypic similarity between relatives

IBS Alleles shared Identical By State “look the same”, may have the

same DNA sequence but they are not necessarily derived from a

known common ancestor - focus for associationIBD Alleles shared

Identical By Descent

are a copy of the

same ancestor allele

- focus for linkage

M1

Q1

M2

Q2

M3

Q3

M3

Q4

M1

Q1

M3

Q3

M1

Q1

M3

Q4

M1

Q1

M2

Q2

M3

Q3

M3

Q4

IBS IBD

2 1

In linkage analysis we will be estimating an additional variance component Q For each locus under analysis the coefficient of

sharing for this parameter will vary for each pair of siblings The coefficient will be the probability that the pair of

siblings have both inherited the same alleles from a common ancestor

Q A C E

PTwin1

E C A Q

PTwin2

MZ=1.0 DZ=0.5

MZ & DZ = 1.0

1 1 1 11 1 1 1

q a c e e c a q

Alternative approach

is a summary statistic Convenient Loss of information

.5 can mean p.ibd0=0 p.ibd1=1 p.ibd2=0

or

p.ibd0=0 p.ibd1=.6 p.ibd2=.2

Use all the information

Alternative approach

Model each of the possible outcomes IBD0 IBD1 IBD2

Weight each of the models by the probability that it is the correct model

The pairwise likelihood is equal to the sum of likelihood for each model multiplied by the probability it is the correct model

The combined likelihood is equal to the sum of all the pairwise likelihoods

DZ pairs

T2

QEQ F

q qf e

T1

1 111

E

e

1

F

f

1

.5

1

1m3 m4

T2

QEQ F

q qf e

T1

1 111

E

e

1

F

f

1

1

1m5 m6

T2

QEQ F

q qf e

T1

1 111

E

e

1

F

f

1

1

1

1m1 m2

* pIBD2

+

* pIBD1

+

* pIBD0

How to do this in mx? Script link_mix.mx

G2 DZ TWINS Data NInput=124 NModel=3 Missing =-99.00 Rectangular File=example3.dat Labels ….

Select pheno1 pheno2 z0_20 z1_20 z2_20 age1 sex1 age2 sex2;Definition_variables z0_20 z1_20 z2_20 age1 sex1 age2 sex2;

Tells Mx we will be using 3 different means and variance models

Tells Mx that these variables will be used as covariates – the values for these

variables will be updated for each case during the optimization – the mxo will show

the values for the final case

How to do this in mx?

Begin Matrices; X Lower nvar nvar = X1 Z Lower nvar nvar = Z1 D Lower nvar nvar = D1 B full 3 1 ! will contain IBD probabilities (from Genehunter) def var…

Matrix H 0.5

Specify B z0_20 z1_20 z2_20 ! put ibd probabilities in B

This script runs an AE model D is the QTL VC path coefficent

We are placing the prob. Of being IBD 0 1 & 2 in

the B matrix


Begin Algebra; T = X*X'+Z*Z'+D*D' ; ! total variance U = H@X*X' ; ! IBD 0 cov (=non-qtl cov) K = U + H@D*D' ; ! IBD 1 cov W = U + D*D' ; ! IBD 2 cov

A = T|U_ U|T_ ! IBD 0 matrix T|K_ K|T_ ! IBD 1 matrix T|W_ W|T ; ! IBD 2 matrix

Pre-computing the total variance and the covariance for the diff. IBD groups

Stacking the pre-computed covariance

matrices


Means G+O*R'| G+S*R'_G+O*R'| G+S*R'_G+O*R'| G+S*R'; Covariance A ; Weights B ;

The means matrix contains corrections for age and sex – it is repeated 3 times

and vertically stacked

Tells Mx to weight each of the means and var/cov matrices by the IBD

prob. which we placed in the B matrix

Summary Weighted likelihood approach more powerful

than pi-hat Quickly becomes unfeasible

3 models for sibship size 2 27 models for sibship size 4 Q: How many models for sibship size 3?

For larger sib-ships/arbitrary pedigrees pi-hat approach is method of choice

Association

Introduction

Association

Simplest design possibleCorrelate phenotype with genotype

Candidate genes for specific diseasescommon practice in medicine/genetics

Genome-wide association with millions available SNPs, can search whole genome exhaustively

Allelic Association

chromosomeSNPs trait variant

Genetic variation yields phenotypic variation

More copies of ‘B’ allele

0

0.2

0.4

0.6

0.8

1

1.2

-6 -4 -2 0 2 4 6

More copies of ‘b’ allele

2a

bb BBBb

d

midpoint

Genotype Genetic Value

BBBbbb

ad-a

Va (QTL) = 2pqa2 (no dominance)

Biometrical model

Basic premise of assoc. for qualitative trait Chose a phenotype & a candidate gene(s)

Collect 2 groups - cases and controls Unrelated individuals Matched for relevant covariates

Genotype your individuals for your gene(s) Count the % of cases & controls with each

genotype Run a chi-square test

Yi = + Xi + ei

whereYi = trait value for individual iXi = 1 if allele individual i has allele ‘A’

0 otherwise

i.e., test of mean differences between ‘A’ and ‘not-A’ individuals

0

0.2

0.4

0.6

0.8

1

1.2

X

Y

The equivalent for a quantitative trait- run a regression

Play with Association.xls

Practical – Find a gene for sensation seeking:

Two populations (A & B) of 100 individuals in which sensation seeking was measured

In population A, gene X (alleles 1 & 2) does not influence sensation seeking

In population B, gene X (alleles 1 & 2) does not influence sensation seeking

Mean sensation seeking score of population A is 90 Mean sensation seeking score of population B is 110 Frequencies of allele 1 & 2 in population A are .1 & .9 Frequencies of allele 1 & 2 in population B are .5 & .5

Sensation seeking score is the same across genotypes, within each population.

Population B scores higher than population A

Differences in genotypic frequencies

.01 .18 .81 .25 .50 .25Genotypic freq.

Suppose we are unaware of these two populations and have measured 200 individuals and typed gene X The mean sensation seeking score of this mixed population is 100What are our observed genotypic frequencies and means? Calculating genotypic frequencies in the mixed populationGenotype 11:1 individual from population A, 25 individuals from population B on a total of 200 individuals: (1+25)/200=.13Genotype 12: (18+50)/200=.34 Genotype 22: (81+25)/200=.53Calculating genotypic means in the mixed populationGenotype 11:1 individual from population A with a mean of 90, 25 individuals from population B with a mean of 110 = ((1*90) + (25*110))/26 =109.2Genotype 12: ((18*90) + (50*110))/68 = 104.7Genotype 22: ((81*90) + (25*110))/106 = 94.7

Now, allele 1 is associated with higher sensation seeking scores, while in both populations A and B, the gene was not associated with sensation seeking scores…

FALSE ASSOCIATION

.13 .34 .53Genotypic freq.

Gene X is the gene for sensation seeking!

What if there is true association?

allele 1 frequency 0.1


allele 1 = -2, allele 2 = +2

Pop mean = 90


allele 2 frequency 0.5.

allele 1 = -2 allele 2 = +2

Pop mean = 110

.01 .18 .81 .25 .50 .25Genotypic freq.

Calculate:

Genotypic means in mixed population Genotypic frequencies in mixed population Is there an association between the gene and

sensation seeking score? If yes which allele is the increaser allele?

Reversal effects

Overestimation

Underestimation

-3

-2

-1

0

1

2

3

4

50

.49

0.4

2

0.3

5

0.2

8

0.2

1

0.1

4

0.0

7

0.0

0

-0.0

7

-0.1

4

-0.2

1

-0.2

8

-0.3

5

-0.4

2

-0.4

9

Est

imate

d v

alu

e o

f alle

lic e

ffect

=-10

=-5

=5

=10

Genuine allelic effect=+2

Difference in gene frequency in subpopulations

Difference in subpopulation mean

False positives and false negatives

Posthuma et al., Behav Genet, 2004

How to avoid spurious association?True association is detected in people coming from the same genetic stratumCan check that individuals come from the same population using a large set of highly polymorphic genes – genomic controlCan use family members as controls – family based association

Fulker (1999) between/within association model

Fulker et al, AJHG, 1999

bij as Family Control

bij is the expected genotype for each individual Ancestors Siblings

wij is the deviation of each individual from this expectation Informative individuals

To be “informative” an individual’s genotype should differ from expected

Have heterozygous ancestor in pedigree βb≠ βw is a test for population stratification βw > 0 is a test for association free from stratification

BTW – this is on top of a linkage model

So…

4 tests for the price of one Test for pop-stratification aw=ab

Robust test for association aw=0

Test so see if linked loci is the functional variant

QTL≠ 0 in the presence of aw it is in LD with the

variant but is not the casual variant

Test for dominance effects if dominance is also

modeled

Combined Linkage & associationImplemented in QTDT (Abecasis et al., 2000) and Mx (Posthuma et al., 2004)

Association and Linkage modeled simultaneously:• Association is modeled in the means• Linkage is modeled in the (co)variances

QTDT: simple, quick, straightforward, but not so flexible in terms of modelsMx: less simple, but highly flexible

Implementation in Mx link_assoc.mx#define n 3 ! number of alleles is 3, coded 1, 2, 3…

G1: calculation group between and within effectsData Calc

Begin matrices; A Full 1 n free ! additive allelic effects within C Full 1 n free ! additive allelic effects between D Sdiag n n free ! dominance deviations within F Sdiag n n free ! dominance deviations between I Unit 1 n ! one's End matrices;

Specify A 100 101 102Specify C 200 201 202Specify D 800 801 802Specify F 900 901 902

The locus has 3 alleles

These 1*3 vector matrices contain the b/n & w/n additive effects of each of the 3 alleles

These 3*3 off-diagonal matrices contain the b/n & w/n dominance effects of each of the 3 alleles

Sticking it together…

Begin algebra; K = (A'@I) + (A@I') ; ! Within effects, additive L = D + D' ; ! Within effects, dominance W = K+L ; ! Within effects - additive and dominance in one matrix

M = (C'@I) + (C@I') ; ! Between effects, additive N = F + F' ; ! Between effects, dominance B = M+N ; ! Between effects - additive and dominance in one matrix End algebra ;

W = K+L ;K+L =

2*a1a1+a2+

d12a1+a3+

d13

a1+a2+ d12

2*a2a2+a3+

d23

a1+a3+ d13

a2+a3+ d23

2*a3

This is the 1/2 genotype mean composed of the simple additive

effects of allele 1 + allele 2 and any deviation from these simple additive

effects (dominance effects)

W = K+L ;K+L (parameter numbers) =

100+100101+100+

800102+100+

801

100+101+800

101+101102+101+

802

100+102+801

101+102+802

102+102

Between effects stick together in the same way

K = (A'@I) + (A@I') ; ! Within effects, additiveL = D + D' ; ! Within effects, dominanceW = K+L ; ! Within effects total

K = (A'@I) + (A@I') =

a1 1 a2 @ [1 1 1] + [a1 a2 a3] @ 1 = a3 1

a1 a1 a1 a1 a2 a3 a1a1 a1a2 a1a3a2 a2 a2 + a1 a2 a3 = a2a1 a2a2 a2a3a3 a3 a3 a1 a2 a3 a3a1 a3a2 a3a3

I = [ 1 1 1], A = [a1 a2 a3] D = 0 0 0

d21 0 0d31 d32 0

W = K+L =a1a1 a1a2 a1a3 0 d21 d31 a1a1 a1a2d21 a1a3d31a2a1 a2a2 a2a3 + d21 0 d32 = a2a1d21 a2a2 a2a3d32a3a1 a3a2 a3a3 d31 d32 0 a3a1d31 a3a2d32 a3a3

L = D + D' =

0 0 0 0 d21 d31 0 d21 d31d21 0 0 + 0 0 d32 = d21 0 d32d31 d32 0 0 0 0 d31 d32 0

M = (C'@I) + (C@I') ; ! Between effects, additive N = F + F' ; ! Between effects, dominance B = M+N ; ! Between effects - total

We have a sibpair with genotypes 1,1 and 1,2. μ1 = Grand mean + pair mean + ½ pair differenceμ2 = Grand mean + pair mean - ½ pair difference

To calculate the between-pairs effect, or the mean genotypic effect of this pair, we need matrix B: ((c1c1) + (c1c2f21)) / 2

To calculate the within-pair effect we need matrix W and the between pairs effect:

For sib1: (a1a1) + ((c1c1) + (c1c2f21)) / 2For sib2: (a1a2d21) - ((c1c1) + (c1c2f21)) / 2

W = a1a1 a1a2d21 a1a3d31a2a1d21 a2a2 a2a3d32a3a1d31 a3a2d32 a3a3

B = c1c1 c1c2f21 c1c3f31c2c1f21 c2c2 c2c3f32c3c1f31 c3c2f32 c3c3

G3: datagroup: sibship size two, DZ… Definition_variables

tw1a1 tw1a2tw2a1 tw2a2

… Begin Matrices;…G Full 1 nvar = G2 ! grand mean B Computed n n = B1 ! spurious and genuine genotypic effects (between)W Computed n n = W1 ! genuine genotypic effects (within)K Full 1 4 Fix ! Will contain first and second allele of twin1 L Full 1 4 Fix ! Will contain first and second allele of twin2 S Full 1 1 Fix ! Will contain 2 (for two individuals per family)…End Matrices;Matrix K 1 1 1 1Matrix L 1 1 1 1

Specify K tw1a1 tw1a2 tw1a1 tw1a2 Specify L tw2a1 tw2a2 tw2a1 tw2a2

Alleles 1 and 2 for twins 1 and 2 respectively

Alleles 1 and 2 for twins 1 and 2 respectively

We are going to put the allele numbers here

These matrices must be initialized –given default values

Sticking it together using part For sibpairs 1,1 and 1,2 To calculate the between-pairs effect, or the

mean genotypic effect of this pair, we need matrix B: ((c1c1) + (c1c2f21)) / 2

We can use the part function to draw out a specified element of a matrix


Sticking it together using part We can use the part function to draw out a

specified element of a matrix

So to draw out c3c1f31 we could say:

\part(B, K)

where K is a matrix containing 1 3 1 3


Sibpair with genotypes: 1,1 and 1,2

Specify K tw1a1 tw1a2 tw1a1 tw1a2 = 1 1 1 1 Specify L tw2a1 tw2a2 tw2a1 tw2a2 = 1 2 1 2

V = (\part(B,K) + \part(B,L) ) %S ; (c1c1 + c1c2f21)/2C = (\part(W,K) + \part(W,L) ) %S ; (a1a1 + a1a2d21)/2

Means G + F*R '+ V + (\part(W,K)-C) | G + I*R' + V +(\part(W,L)-C); =

G + F*R’ + (c1c1 + c1c1f21)/2 + (a1a1 - (a1a1 + a1a2d21)/2) | G + I*R' + (c1c1 + c1c1f21)/2 + (a2a1 - (a1a1 + a1a2d21)/2)

W = a1a1 a1a2d21 a1a3d31a2a1d21 a2a2 a2a3d32a3a1d31 a3a2d32 a3a3


Constrain sum additive allelic within effects = 0 Constraint ni=1Begin Matrices; A full 1 n = A1 O zero 1 1 End Matrices;Begin algebra; B = \sum(A) ;End Algebra;Constraint O = B ;end

Constrain sum additive allelic between effects = 0 Constraint ni=1Begin Matrices; C full 1 n = C1 ! O zero 1 1 End Matrices;Begin algebra; B = \sum(C) ;End Algebra;Constraint O = B ;end

!1.test for linkage in presence of full associationDrop D 2 1 1 end

!2.Test for population stratification: !between effects = within effects. Specify 1 A 100 101 102Specify 1 C 100 101 202Specify 1 D 800 801 802Specify 1 F 800 801 802end

!3.Test for presence of dominance Drop @0 800 801 802end

!4.Test for presence of full associationDrop @0 800 801 802 100 101end

!5.Test for linkage in absence of associationFree D 2 1 1 end

Model Test -2ll df Vs model Chi^2 Df-diff P-value

0 - - -

1 Linkage in presence of association

2 B=W

3 Dominance

4 Full association

5 Linkage in absence of association

Documents

Linkage and association