Lecture 21: Quantitative Traits I Date: 11/05/02 Review: covariance, regression, etc Introduction to quantitative genetics


Joint Density

Suppose you have two random variables X and Y. For each entity in the sample you take from the population you obtain a random pair (Xi, Yi).

The joint density function for two random variables is given by p(x,y) such that

$$P(y_1 \le Y \le y_2,\; x_1 \le X \le x_2) = \int_{y_1}^{y_2} \int_{x_1}^{x_2} p(x,y)\,dx\,dy$$

Conditional Density

The conditional density function for the random variable Y conditional on a particular realization x of the random variable X is p(y|x).

The joint and conditional densities are related: p(x,y) = p(y|x)p(x), where p(x) is the marginal density

$$p(x) = \int p(x,y)\,dy$$

Independent Random Variables

If the random variables X and Y are independent, then p(x,y) = p(x)p(y).

As a consequence, we also have p(y|x) = p(y) and p(x|y) = p(x).

If X and Y are dependent, then the relationship between them is either linear or nonlinear. In either case, a linear relationship serves as a first approximation to the true relationship.

Conditional Expectation

The expectation of the product is given by

$$E(XY) = \iint xy\,p(x,y)\,dx\,dy$$

The conditional expectation is given by

$$E(Y \mid X = x) = \int y\,p(y \mid x)\,dy$$

If X and Y are independent, then

$$E(Y \mid X) = E(Y)$$

Covariance

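Definition: The covariance of two random variables is defined as

$$\sigma(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y$$

and a covariance estimator is given by

$$\mathrm{Cov}(X,Y) = \frac{n}{n-1}\left(\overline{xy} - \bar{x}\,\bar{y}\right), \quad \text{where } \overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i$$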
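The estimator Cov(X,Y) = n/(n-1) (mean(xy) - mean(x) mean(y)) can be sketched in a few lines of Python. The function name `sample_covariance` and the four data points are made up for illustration; they do not come from the lecture.

```python
def sample_covariance(xs, ys):
    """Covariance estimator: n/(n-1) * (mean(xy) - mean(x) * mean(y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return n / (n - 1) * (mean_xy - mean_x * mean_y)


# Hypothetical paired observations (x_i, y_i), for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
cov_xy = sample_covariance(xs, ys)

# Cross-check against the equivalent form: sum of products of deviations / (n - 1).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov_check = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
```

Both forms give the same value, which is positive here because y tends to increase with x.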

Meaning of Covariance

The sign of the covariance implies something about how Y responds to changes in X or vice versa.

$\sigma(X,Y) > 0$: Y increases as X increases

$\sigma(X,Y) < 0$: Y decreases as X increases

$\sigma(X,Y) = 0$: there is no linear tendency

Regression and History

The use of a linear function as a first approximation of the relationship between two random variables is termed regression.

In fact, regression was first introduced in a genetics context.

Galton (1889) studied the average height of parents X and the height of offspring Y.

$$E(Y \mid X) = \alpha + \beta X, \qquad y = \alpha + \beta x + e$$

Least Squares Method

The least squares method finds estimates a and b for the coefficients of the assumed linear relationship by minimizing the mean squared error.

Because means, variances and covariances are available from phenotypic data, these estimates are particularly useful.

$$a = \bar{y} - b\,\bar{x}, \qquad b = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}$$

Homoscedasticity and Heteroscedasticity

Definition: If the variance of the residual error is constant regardless of the value of the independent variable x, then the regression E(Y|X) is homoscedastic.

Definition: There is heteroscedasticity in the data if the residual variance depends on the value of the independent variable x.

Transformations exist to achieve homoscedasticity.

Example: Regression

Suppose Cov(X,Y) = 10, Var(X) = 10, Var(Y) = 15 and the means E(X) = E(Y) = 0.

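Regress X on Y and Y on X. Because both means are 0, the intercept estimate in each case is

$$a = \bar{y} - b\,\bar{x} = 0$$

and the slope estimates are

$$b_{Y|X} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \frac{10}{10} = 1 \quad (Y \text{ on } X)$$

$$b_{X|Y} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)} = \frac{10}{15} = \frac{2}{3} \quad (X \text{ on } Y)$$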
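The two regressions in this example can be reproduced with a small helper. The function name `least_squares` is an illustrative choice; the formulas are the least squares estimates b = Cov(X,Y)/Var(predictor) and a = mean(response) - b mean(predictor).

```python
def least_squares(cov_xy, var_predictor, mean_predictor, mean_response):
    """Return (intercept a, slope b) for regressing the response on the predictor."""
    b = cov_xy / var_predictor
    a = mean_response - b * mean_predictor
    return a, b


# Values from the example: Cov(X,Y) = 10, Var(X) = 10, Var(Y) = 15, both means 0.
cov_xy, var_x, var_y = 10.0, 10.0, 15.0

a_y_on_x, b_y_on_x = least_squares(cov_xy, var_x, 0.0, 0.0)  # Y regressed on X
a_x_on_y, b_x_on_y = least_squares(cov_xy, var_y, 0.0, 0.0)  # X regressed on Y
```

The two slopes differ (1 versus 2/3) because each is scaled by the variance of its own predictor; their product, 2/3, equals the squared correlation for these values.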

Correlation

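Definition: The correlation of two random variables X and Y is defined as

$$\rho(X,Y) = \frac{\sigma(X,Y)}{\sigma_X\,\sigma_Y}$$

with estimator

$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

Correlations are scale independent: for constants c and d of the same sign,

$$r(cx, dy) = r(x, y)$$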

Correlations imply linear association.

r is the standardized regression coefficient that is obtained if x and y are scaled to have unit variance.

$r^2$ measures the proportion of Var(Y) that is explained by the regression on X, assuming E(Y|X) is linear.

Correlation (cont)

$r \approx 0$ implies that X and Y are not linearly associated

$|r| \approx 1$ implies that X and Y are strongly and linearly associated

Example: rats

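Joint frequency table: rows are X classes (50 to 190), columns are Y = 1 to 12. Each row lists its nonzero counts in column order, followed by the row total; blank cells are omitted.

X = 50:   1 3 1   (total 5)
X = 60:   1 6 2   (total 9)
X = 70:   2 10 17 12 4 1   (total 46)
X = 80:   1 1 11 8 18 10 9 3 2   (total 63)
X = 90:   2 5 7 18 30 28 12 5 1   (total 108)
X = 100:  3 5 10 25 37 35 21 7 2 1   (total 146)
X = 110:  1 4 12 19 38 37 29 6 2   (total 148)
X = 120:  2 6 9 21 36 26 30 14 6 1   (total 151)
X = 130:  4 4 9 12 35 29 17 17 6 1 1 1   (total 136)
X = 140:  1 4 6 9 12 27 15 6 2 1   (total 83)
X = 150:  3 2 13 11 6 6 2   (total 43)
X = 160:  2 1 11 11 9 3 4   (total 41)
X = 170:  1 1 1 2 4 2 2 1 1   (total 15)
X = 180:  1 1 2 2 2   (total 8)
X = 190:  1   (total 1)

Column totals (Y = 1 to 12): 15 34 68 129 258 235 156 71 29 4 3 1   (grand total 1003)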

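$$\bar{x} = 118.9, \quad \mathrm{Var}(X) = 623.1$$

$$\bar{y} = 5.49, \quad \mathrm{Var}(Y) = 2.9$$

$$E(XY) = 660.1$$

$$\mathrm{Cov}(X,Y) = \frac{1003}{1002}\left(E(XY) - \bar{x}\,\bar{y}\right) = 7.4$$

$$b = \frac{7.4}{623.1} = 0.01$$

$$r = \frac{7.4}{\sqrt{623.1 \times 2.9}} = 0.17$$

$$r^2 = 0.03$$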
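As a rough check on the rats example, the sketch below recomputes the mean and variance of Y from the column totals of the table (assuming, as the column headers suggest, that the columns are Y = 1 to 12) and then the slope, correlation and r squared from the reported summary values. The variable names are illustrative.

```python
import math

# Column totals of the frequency table: number of observations with Y = 1..12.
col_totals = [15, 34, 68, 129, 258, 235, 156, 71, 29, 4, 3, 1]
n = sum(col_totals)

# Marginal mean of Y and its n/(n-1)-corrected variance, from the totals alone.
mean_y = sum(y * c for y, c in enumerate(col_totals, start=1)) / n
mean_y_sq = sum(y * y * c for y, c in enumerate(col_totals, start=1)) / n
var_y = n / (n - 1) * (mean_y_sq - mean_y ** 2)

# Summary values reported in the example (taken as given, not recomputed).
var_x_reported = 623.1
var_y_reported = 2.9
cov_xy_reported = 7.4

b = cov_xy_reported / var_x_reported
r = cov_xy_reported / math.sqrt(var_x_reported * var_y_reported)
r_squared = r ** 2
```

With these inputs the recomputed mean and variance of Y agree with the reported 5.49 and 2.9 after rounding, and the regression explains only about 3% of the variance in Y.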

Quantitative Traits

Definition: A quantitative trait is one with a continuous distribution. In other words, it is a trait that is measured not counted.

It is assumed that quantitative traits are controlled by many genes, each with small effect. Environmental effects are also important.

Definition: A quantitative trait locus (QTL) is a locus controlling a quantitative trait.

Quantitative Trait – A Model

We start with a very simple scenario. Suppose there is one locus determining a quantitative trait. Suppose that there are only two alleles at this locus. We seek a model for this scenario. This model will have two parameters to account for the two degrees of freedom (when location is removed) among the 3 possible outcomes (genotypes).

Let the phenotypic value of a particular genotype be z. When environment has an effect, z is a consequence of both the underlying genotype and the environment. We can write z = G + E. Here G is the genotypic value and it is the expected phenotypic value averaged over all environments.

Each of the three genotypes has an associated genotypic value.

Phenotypic and Genotypic Value

Genotype:          B1B1          B1B2          B2B2
Genotypic value:   $G_{B_1B_1}$   $G_{B_1B_2}$   $G_{B_2B_2}$

Quantitative Trait – Model A

Genotype:          B1B1   B1B2     B2B2
Genotypic value:   0      (1+k)a   2a

Quantitative Trait – Model B

Genotype:          B1B1   B1B2   B2B2
Genotypic value:   -a     d      a

Model A – Parameter Meanings

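Genotype:          B1B1   B1B2     B2B2
Genotypic value:   0      (1+k)a   2a

Value of k    Genetic Interpretation
0             additive (heterozygote at the midpoint a)
1             B2 dominant (heterozygote equals B2B2)
-1            B1 dominant (heterozygote equals B1B1)
>1            overdominance
<-1           underdominance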

Example – Scaling Quantitative Trait

The Booroola (B) gene influences fecundity in Merino sheep.

Genotype Mean Litter Size

BB 2.66

Bb 2.17

bb 1.48

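Taking bb as the baseline (genotypic value 0), Bb as (1+k)a and BB as 2a:

$$a = \frac{2.66 - 1.48}{2} = 0.59$$

$$ak = (2.17 - 1.48) - 0.59 = 0.69 - 0.59$$

$$k = \frac{0.69 - 0.59}{0.59} = 0.17$$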
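The scaling in this example follows directly from Model A: the difference between the two homozygotes is 2a, and the heterozygote sits at (1+k)a above the lower homozygote. A minimal sketch, with an illustrative function name `scale_model_a`:

```python
def scale_model_a(g_low, g_het, g_high):
    """Return (a, k) such that the genotypic values are 0, (1+k)a, 2a
    after subtracting the value of the lower homozygote."""
    a = (g_high - g_low) / 2
    k = (g_het - g_low) / a - 1
    return a, k


# Mean litter sizes from the Booroola example: bb, Bb, BB.
a, k = scale_model_a(1.48, 2.17, 2.66)
```

Here k is small and positive, so the heterozygote lies slightly above the midpoint of the two homozygotes.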

Gene Content

Definition: The B1 gene content of a genotype is the number of copies of the B1 allele. The gene content for allele B1 in genotype B1B2 is 1.

At a single locus, the genotypic value is not a linear function of gene content unless k = 0.

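[Figure: genotypic value plotted against gene content (0, 1, 2). The genotypic values are 0, (1+k)a and 2a; the additive values are 0, a and 2a.]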

Partitioning Genotypic Value

Let N1 be the number of B1 alleles in the genotype.

Let N2 be the number of B2 alleles in the genotype.

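Then perform a multiple regression of the genotypic value on the independent variables N1 and N2:

$$G_{ij} = \mu_G + \alpha_1 N_1 + \alpha_2 N_2 + \delta_{ij}$$

Assume again only two alleles, so that N1 = 2 – N2. Call N2 = N. Then

$$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1) N + \delta_{ij}$$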

Predicted Genotypic Values

The predicted genotypic values are given as

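$$\hat{G}_{ij} = \begin{cases} \mu_G + 2\alpha_1 & i = j = 1 \\ \mu_G + \alpha_1 + \alpha_2 & i \ne j \\ \mu_G + 2\alpha_2 & i = j = 2 \end{cases}$$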

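Weighted Mean of α's is 0

$$E(G_{ij}) = E(\mu_G + \alpha_1 N_1 + \alpha_2 N_2 + \delta_{ij})$$

$$\mu_G = \mu_G + \alpha_1 E(N_1) + \alpha_2 E(N_2) + E(\delta_{ij})$$

$$0 = 2p_1\alpha_1 + 2p_2\alpha_2 + 0$$

$$p_1\alpha_1 + p_2\alpha_2 = 0$$

With $\alpha = \alpha_2 - \alpha_1$, it follows that

$$\alpha_1 = -p_2\,\alpha, \qquad \alpha_2 = p_1\,\alpha$$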

Slope of Regression Line

Recalling the formula for the slope of a regression line, we have

$$\alpha = \frac{\sigma(G,N)}{\sigma^2(N)}$$

We will now find expressions for the covariance and variance.

Derivation of Slope

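$$E(N) = 2p_1p_2 + 2p_2^2 = 2p_2$$

$$E(N^2) = 2p_1p_2 + 4p_2^2$$

$$\sigma^2(N) = E(N^2) - [E(N)]^2 = 2p_1p_2 + 4p_2^2 - 4p_2^2 = 2p_1p_2$$

$$\mu_G = 2p_1p_2\,a(1+k) + 2p_2^2\,a = 2p_2 a(1 + p_1 k)$$

$$E(GN) = 2p_1p_2\,a(1+k) + 4p_2^2\,a$$

$$\sigma(G,N) = E(GN) - \mu_G E(N) = 2p_1p_2\,a\,[1 + k(p_1 - p_2)]$$

$$\alpha = \frac{\sigma(G,N)}{\sigma^2(N)} = a\,[1 + k(p_1 - p_2)]$$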
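The closed form for the slope, a[1 + k(p1 - p2)], can be checked numerically by regressing G on N directly over the three genotypes with Hardy-Weinberg weights. The function names and the parameter values used at the end are illustrative assumptions.

```python
def slope_by_regression(a, k, p2):
    """Slope of G on N (number of B2 alleles), weighting genotypes by HWE frequencies."""
    p1 = 1.0 - p2
    freqs = [p1 * p1, 2 * p1 * p2, p2 * p2]   # B1B1, B1B2, B2B2
    n_vals = [0, 1, 2]
    g_vals = [0.0, (1 + k) * a, 2 * a]
    mean_n = sum(f * n for f, n in zip(freqs, n_vals))
    mean_g = sum(f * g for f, g in zip(freqs, g_vals))
    cov_gn = sum(f * (n - mean_n) * (g - mean_g)
                 for f, n, g in zip(freqs, n_vals, g_vals))
    var_n = sum(f * (n - mean_n) ** 2 for f, n in zip(freqs, n_vals))
    return cov_gn / var_n


def slope_closed_form(a, k, p2):
    """alpha = a * [1 + k * (p1 - p2)]."""
    p1 = 1.0 - p2
    return a * (1 + k * (p1 - p2))


# Arbitrary illustrative parameters; p2 must lie strictly between 0 and 1.
alpha_numeric = slope_by_regression(a=0.59, k=0.17, p2=0.3)
alpha_formula = slope_closed_form(a=0.59, k=0.17, p2=0.3)
```

The two values agree up to floating-point error, and both reduce to a when k = 0 or when p1 = p2.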

Average Effect of Allelic Substitution

The previous derivations were completed under the assumption of random mating and HWE.

The slope is the change in genotypic value associated with the addition of one more B2 allele. Because adding a B2 allele means replacing a B1 allele with B2, the slope is also called the average effect of allelic substitution.

Except under additivity (k=0), this substitution effect can only be defined in terms of the population.

Partitioning Genetic Variance

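Because we now have a linear function for genotypic value G,

$$G = \hat{G} + \delta$$

we can write the total genetic variance as

$$\sigma_G^2 = \sigma^2(\hat{G} + \delta) = \sigma^2(\hat{G}) + 2\sigma(\hat{G}, \delta) + \sigma^2(\delta)$$

but there is no covariance term, because the residual of a least squares regression is uncorrelated with the predicted value:

$$\sigma_G^2 = \sigma^2(\hat{G}) + \sigma^2(\delta)$$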

Additive and Dominance Components

The first term is additive genetic variance: the amount of variance of G that is explained by regression on N.

The second term is dominance genetic variance: the residual variance for the regression.

We seek an expression for both terms.

$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2$$

Derivation of Slope

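Genotype    B1B1                      B1B2                                   B2B2
N           0                         1                                      2
G           0                         (1+k)a                                 2a
Freq.*      $p_1^2$                   $2p_1p_2$                              $p_2^2$
GN          0                         (1+k)a                                 4a
N^2         0                         1                                      4
$\hat{G}$   $\mu_G + 2\alpha_1$       $\mu_G + \alpha_1 + \alpha_2$          $\mu_G + 2\alpha_2$
$\delta$    $-2\alpha_1 - \mu_G$      $(1+k)a - \alpha_1 - \alpha_2 - \mu_G$  $2a - 2\alpha_2 - \mu_G$

* Frequencies under random mating (Hardy-Weinberg proportions).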

Derivation of Genetic Variance Components

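$$\sigma_A^2 = \sigma^2(\hat{G}) = E[(\hat{G} - \mu_G)^2] = p_1^2(2\alpha_1)^2 + 2p_1p_2(\alpha_1 + \alpha_2)^2 + p_2^2(2\alpha_2)^2$$

$$= 4p_1^2p_2^2\alpha^2 + 2p_1p_2(p_1 - p_2)^2\alpha^2 + 4p_1^2p_2^2\alpha^2 = 2p_1p_2\,\alpha^2$$

$$\sigma_D^2 = \sigma^2(\delta) = E(\delta^2) = p_1^2(0 - \mu_G - 2\alpha_1)^2 + 2p_1p_2[(1+k)a - \mu_G - \alpha_1 - \alpha_2]^2 + p_2^2(2a - \mu_G - 2\alpha_2)^2$$

$$= p_1^2(2p_2^2 ak)^2 + 2p_1p_2(2p_1p_2 ak)^2 + p_2^2(2p_1^2 ak)^2 = (2p_1p_2\,ak)^2$$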

Genetic Variance Components

$$\sigma_A^2 = 2p_1p_2\,\alpha^2$$

$$\sigma_D^2 = (2p_1p_2\,ak)^2$$

Both components depend on gene frequencies, so they are specific to the population from which they are derived.

When k=0 (purely additive effects), then additive genetic variance is maximized when heterozygosity is maximized.

With dominance k>0, additive genetic variance is maximized at higher frequencies of the recessive allele. Rare recessive alleles cause little genetic variance because they are not often expressed.
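The two variance components can be verified numerically: compute the breeding values and dominance deviations genotype by genotype, take their frequency-weighted variances, and compare with the closed forms 2 p1 p2 alpha^2 and (2 p1 p2 a k)^2. This is a sketch under Model A with Hardy-Weinberg frequencies; the function name and the parameter values are illustrative.

```python
def variance_components(a, k, p2):
    """Return (var_A, var_D, var_G) computed by direct summation over genotypes."""
    p1 = 1.0 - p2
    freqs = [p1 * p1, 2 * p1 * p2, p2 * p2]           # B1B1, B1B2, B2B2
    g_vals = [0.0, (1 + k) * a, 2 * a]
    mu_g = sum(f * g for f, g in zip(freqs, g_vals))

    alpha = a * (1 + k * (p1 - p2))                   # average effect of substitution
    alpha1, alpha2 = -p2 * alpha, p1 * alpha          # additive effects of B1 and B2
    breeding = [2 * alpha1, alpha1 + alpha2, 2 * alpha2]
    delta = [g - mu_g - bv for g, bv in zip(g_vals, breeding)]

    var_a = sum(f * bv ** 2 for f, bv in zip(freqs, breeding))
    var_d = sum(f * d ** 2 for f, d in zip(freqs, delta))
    var_g = sum(f * (g - mu_g) ** 2 for f, g in zip(freqs, g_vals))
    return var_a, var_d, var_g


# Arbitrary illustrative parameters.
a, k, p2 = 1.0, 0.5, 0.3
p1 = 1.0 - p2
var_a, var_d, var_g = variance_components(a, k, p2)

alpha = a * (1 + k * (p1 - p2))
var_a_formula = 2 * p1 * p2 * alpha ** 2
var_d_formula = (2 * p1 * p2 * a * k) ** 2
```

The additive and dominance parts sum to the total genetic variance, which is the statement that the covariance term vanishes.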

Why?

Why have we partitioned the genotypic value into additive and dominance components?

When a parent transmits alleles to its offspring, the dominance deviation in the parent is irrelevant because only one allele per locus is transmitted in each gamete.

One may think of the additive genetic component as the heritable component of an individual's genotypic value.

Average Excess

There are multiple ways to measure the effect of an allele. The effect of allelic substitution is one. The additive effect $\alpha_i$ is another.

Definition: The average excess of an allele is the difference between the mean genotypic value of individuals carrying at least one copy of the allele and the mean genotypic value of a random individual from the entire population.

$$\alpha_2^* = G_{12}\,P(B_1B_2 \mid B_2) + G_{22}\,P(B_2B_2 \mid B_2) - \mu_G$$

Average Excess with Random Mating

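$$\alpha_2^* = G_{12}\,P(B_1B_2 \mid B_2) + G_{22}\,P(B_2B_2 \mid B_2) - \mu_G$$

$$= G_{12}\,p_1 + G_{22}\,p_2 - \mu_G$$

$$= a(1+k)p_1 + 2ap_2 - 2ap_2(1 + p_1 k)$$

$$= p_1 a\,[1 + k(p_1 - p_2)] = p_1\alpha = \alpha_2$$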

Breeding Value

Definition: The breeding value of an individual is the sum of the additive effects.

Genotype   Breeding Value
B1B1       $2\alpha_1$
B1B2       $\alpha_1 + \alpha_2$
B2B2       $2\alpha_2$

Breeding Value and Random Mating

Consider the expected genotypic values of progeny produced by parental genotypes.

Genotype   Breeding Value           Progeny Expected Genotypic Value   Deviation
B1B1       $2\alpha_1$              $a\,p_2(1+k)$                      $\alpha_1$
B1B2       $\alpha_1 + \alpha_2$    $a\,[p_2 + (1+k)/2]$               $(\alpha_1 + \alpha_2)/2$
B2B2       $2\alpha_2$              $a\,[2p_2 + p_1(1+k)]$             $\alpha_2$
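The table can be checked numerically: for each parental genotype, the expected genotypic value of its progeny under random mating, measured as a deviation from the population mean, should equal half the parent's breeding value. The sketch assumes Model A (genotypic values 0, (1+k)a, 2a) and Hardy-Weinberg frequencies; the function name and parameter values are illustrative.

```python
def progeny_deviations(a, k, p2):
    """Map each parental genotype to (progeny mean - mu_G, breeding value / 2)."""
    p1 = 1.0 - p2
    g11, g12, g22 = 0.0, (1 + k) * a, 2 * a
    mu_g = p1 * p1 * g11 + 2 * p1 * p2 * g12 + p2 * p2 * g22

    alpha = a * (1 + k * (p1 - p2))
    alpha1, alpha2 = -p2 * alpha, p1 * alpha

    # Expected progeny value given the allele passed on by the focal parent;
    # the mate contributes B1 with probability p1 and B2 with probability p2.
    via_b1 = p1 * g11 + p2 * g12
    via_b2 = p1 * g12 + p2 * g22

    progeny_mean = {
        "B1B1": via_b1,                    # always transmits B1
        "B1B2": 0.5 * (via_b1 + via_b2),   # transmits B1 or B2 with equal probability
        "B2B2": via_b2,                    # always transmits B2
    }
    breeding_value = {
        "B1B1": 2 * alpha1,
        "B1B2": alpha1 + alpha2,
        "B2B2": 2 * alpha2,
    }
    return {gt: (progeny_mean[gt] - mu_g, breeding_value[gt] / 2)
            for gt in progeny_mean}


# Arbitrary illustrative parameters.
result = progeny_deviations(a=1.0, k=0.5, p2=0.3)
```

Each pair in `result` holds two equal numbers, which is why the breeding value is described as the heritable part of the genotypic value.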

How to Use this Analysis

A common approach today:

Identify candidate loci that are potential contributors to the variation of the trait of interest.

Genotype a random sample of individuals using molecular markers.

Determine average phenotypic values within each genotypic class.

Estimate the fraction of total phenotypic variation associated with the candidate locus.

An Example

Consider the Booroola gene example in two random mating populations where the gene B is present at gene frequencies 0.5.

Genotype   BB     Bb     bb
G_ij       2.66   2.17   1.48
P_ij       0.25   0.50   0.25

Booroola (cont)

Value                  Estimate
Mean genotypic value   $\mu_G$ = 2.120
Additive effects       $\alpha_B$ = 0.295; $\alpha_b$ = -0.295
Breeding values        A_BB = 0.59; A_Bb = 0; A_bb = -0.59
Dominance deviations   D_BB = -0.05; D_Bb = 0.05; D_bb = -0.05

Booroola (cont)

Value                        Estimate
Additive genetic variance    0.1740
Dominance genetic variance   0.0025
Total genetic variance       0.1765
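The Booroola estimates can be reproduced from the three genotypic means and the allele frequency of 0.5. In this sketch, b plays the role of the baseline allele (B1) and B the role of B2; the variable names are illustrative.

```python
# Genotypic values (mean litter size) and allele frequency from the example.
g = {"BB": 2.66, "Bb": 2.17, "bb": 1.48}
p_B = 0.5
p_b = 1.0 - p_B
freq = {"BB": p_B ** 2, "Bb": 2 * p_B * p_b, "bb": p_b ** 2}

mu_g = sum(freq[gt] * g[gt] for gt in g)          # mean genotypic value

# Model A scaling with bb as the baseline genotype.
a = (g["BB"] - g["bb"]) / 2
k = (g["Bb"] - g["bb"]) / a - 1

# Average effect of substituting B for b, then the additive effect of each allele.
alpha = a * (1 + k * (p_b - p_B))
alpha_B, alpha_b = p_b * alpha, -p_B * alpha

breeding = {"BB": 2 * alpha_B, "Bb": alpha_B + alpha_b, "bb": 2 * alpha_b}
dominance = {gt: g[gt] - mu_g - breeding[gt] for gt in g}

var_a = 2 * p_B * p_b * alpha ** 2                # additive genetic variance
var_d = (2 * p_B * p_b * a * k) ** 2              # dominance genetic variance
var_g = sum(freq[gt] * (g[gt] - mu_g) ** 2 for gt in g)
```

The results match the tabulated estimates up to rounding: the mean is 2.12, the breeding values are 0.59, 0 and -0.59, and the variance components are about 0.1740 and 0.0025, summing to the total of about 0.1765.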
