Lecture 21: Quantitative Traits I
Date: 11/05/02
Review: covariance, regression, etc.
Introduction to quantitative genetics
Joint Density
Suppose you have two random variables X and Y. For each entity in the sample you take from the population you obtain a random pair (Xi, Yi).
The joint density function for two random variables is given by p(x,y) such that
$$P(y_1 \le Y \le y_2,\ x_1 \le X \le x_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} p(x,y)\,dx\,dy$$
Conditional Density
The conditional density function for the random variable Y conditional on a particular realization x of the random variable X is p(y|x).
The joint and conditional densities are related: p(x,y) = p(y|x)p(x), where p(x) is the marginal density
$$p(x) = \int p(x,y)\,dy$$
Independent Random Variables
If the random variables X and Y are independent, then p(x,y) = p(x)p(y).
As a consequence, we also have p(y|x) = p(y) and p(x|y) = p(x).
If X and Y are dependent, then the relationship between them is either linear or nonlinear. In either case, a linear relationship serves as a first approximation to the true relationship.
Conditional Expectation
The expectation of the product is given by
$$E(XY) = \iint xy\,p(x,y)\,dx\,dy$$
The conditional expectation is given by
$$E(Y \mid X = x) = \int y\,p(y \mid x)\,dy$$
If X and Y are independent, then
$$E(Y \mid X) = E(Y)$$
Covariance
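Definition: The covariance of two random variables is defined as
$$\sigma(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X\mu_Y$$
and a covariance estimator is given by
$$\mathrm{Cov}(X,Y) = \frac{n}{n-1}\left(\overline{xy} - \bar{x}\,\bar{y}\right), \quad \text{where } \overline{xy} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i$$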
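A minimal Python sketch of the covariance estimator, n/(n-1) times the difference between the mean of the products and the product of the means. The function name `cov_estimate` and the toy data are hypothetical, chosen only for illustration.

```python
def cov_estimate(xs, ys):
    """Covariance estimate: n/(n-1) * (mean(xy) - mean(x) * mean(y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return n / (n - 1) * (mean_xy - mean_x * mean_y)


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
cov_xy = cov_estimate(xs, ys)
print(cov_xy)
```

This form is algebraically the same as averaging the products of deviations from the means with divisor n - 1.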
Meaning of Covariance
The sign of the covariance implies something about how Y responds to changes in X or vice versa.
$$\sigma(X,Y) \begin{cases} > 0 & Y \text{ increases as } X \text{ increases} \\ < 0 & Y \text{ decreases as } X \text{ increases} \\ = 0 & \text{there is no linear tendency} \end{cases}$$
Regression and History
The use of a linear function as a first approximation of the relationship between two random variables is termed regression.
In fact, regression was first introduced in a genetics context.
Galton (1889) studied the average height of parents X and the height of offspring Y.
$$E(Y \mid X) = \alpha + \beta X, \qquad y = \alpha + \beta x + e$$
Least Squares Method
The least squares method finds estimates a and b for the coefficients of the assumed linear relationship by minimizing the mean squared error.
Because means, variances and covariances are available from phenotypic data, these estimates are particularly useful.
$$a = \bar{y} - b\,\bar{x}, \qquad b = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}$$
Homoscedasticity and Heteroscedasticity
Definition: If the variance of the residual error is constant regardless of the value of the independent variable x, then the regression E(Y|X) is homoscedastic.
Definition: There is heteroscedasticity in the data if the residual variance depends on the value of the independent variable x.
Transformations exist to achieve homoscedasticity.
Example: Regression
Suppose Cov(X,Y) = 10, Var(X) = 10, Var(Y) = 15 and the means E(X) = E(Y) = 0.
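Regress X on Y and Y on X. Because both means are 0, the intercept estimate is 0 in both cases,
$$a = \bar{y} - b\,\bar{x} = 0$$
and the slope estimates are
$$b_{Y \text{ on } X} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \frac{10}{10} = 1, \qquad b_{X \text{ on } Y} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(Y)} = \frac{10}{15} = \frac{2}{3}$$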
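The arithmetic of this example can be checked with a short Python sketch. The helper name `least_squares` is hypothetical; it takes the covariance, the variance and mean of the predictor, and the mean of the response.

```python
def least_squares(cov_xy, var_predictor, mean_predictor, mean_response):
    """Return (intercept a, slope b) from moments: b = Cov / Var, a = mean_response - b * mean_predictor."""
    b = cov_xy / var_predictor
    a = mean_response - b * mean_predictor
    return a, b


# Moments from the example: Cov(X,Y) = 10, Var(X) = 10, Var(Y) = 15, both means 0.
a_yx, b_yx = least_squares(10.0, 10.0, 0.0, 0.0)  # Y on X: predictor is X
a_xy, b_xy = least_squares(10.0, 15.0, 0.0, 0.0)  # X on Y: predictor is Y
print(a_yx, b_yx, a_xy, b_xy)
```

The two slopes differ because each regression divides the same covariance by the variance of its own predictor.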
Correlation
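Definition: The correlation of two random variables X and Y is defined as
$$\rho(X,Y) = \frac{\sigma(X,Y)}{\sigma_X\,\sigma_Y}$$
with estimator
$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
Correlations are scale independent: for constants c and d of the same sign,
$$r(cx, dy) = r(x, y)$$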
Correlations imply linear association.
r is the standardized regression coefficient that is obtained if x and y are scaled to have unit variance.
$r^2$ measures the proportion of Var(Y) that is explained if E(Y|X) is linear.
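As an illustration of scale independence, the following Python sketch computes the sample correlation before and after rescaling both variables by positive constants. The function name `correlation`, the toy data, and the scale factors are hypothetical. The n - 1 divisors of the covariance and variance estimators cancel in the ratio, so the sketch works with raw sums of squares and products.

```python
import math


def correlation(xs, ys):
    """Sample correlation: Cov(X,Y) / sqrt(Var(X) * Var(Y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]
r = correlation(xs, ys)
# Rescale x by 100 and y by 0.5 (both positive): r should not change.
r_scaled = correlation([100.0 * x for x in xs], [0.5 * y for y in ys])
print(r, r_scaled)
```

If exactly one of the two scale factors were negative, the magnitude of r would stay the same but its sign would flip.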
Correlation (cont)
$r \approx 0$ implies that X and Y are not linearly associated.
$|r| \approx 1$ implies that X and Y are strongly and linearly associated.
Example: rats
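Joint frequency table. Rows: X classes (50 to 190). Columns: Y = 1 to 12. Each row gives its nonempty cell counts from left to right, then the row total; empty cells are not shown, so counts are not aligned to columns.
50: 1 3 1 | 5
60: 1 6 2 | 9
70: 2 10 17 12 4 1 | 46
80: 1 1 11 8 18 10 9 3 2 | 63
90: 2 5 7 18 30 28 12 5 1 | 108
100: 3 5 10 25 37 35 21 7 2 1 | 146
110: 1 4 12 19 38 37 29 6 2 | 148
120: 2 6 9 21 36 26 30 14 6 1 | 151
130: 4 4 9 12 35 29 17 17 6 1 1 1 | 136
140: 1 4 6 9 12 27 15 6 2 1 | 83
150: 3 2 13 11 6 6 2 | 43
160: 2 1 11 11 9 3 4 | 41
170: 1 1 1 2 4 2 2 1 1 | 15
180: 1 1 2 2 2 | 8
190: 1 | 1
Column totals (Y = 1 to 12): 15 34 68 129 258 235 156 71 29 4 3 1 | 1003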
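$$\bar{x} = 118.9, \qquad \mathrm{Var}(X) = 623.1$$
$$\bar{y} = 5.49, \qquad \mathrm{Var}(Y) = 2.9$$
$$E(XY) = 660.1$$
$$\mathrm{Cov}(X,Y) = \frac{1003}{1002}\left[E(XY) - \bar{x}\,\bar{y}\right] = 7.4$$
$$b = \frac{7.4}{623.1} = 0.01$$
$$r = \frac{7.4}{\sqrt{623.1 \times 2.9}} = 0.17$$
$$r^2 = 0.03$$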
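The slope and correlation for the rat data can be recomputed from the quoted summary statistics with a short Python sketch. The variable names are hypothetical. Because the inputs are already rounded, the recomputed covariance agrees with the quoted 7.4 only approximately.

```python
import math

# Rounded summary statistics quoted in the example.
n = 1003
mean_x, var_x = 118.9, 623.1
mean_y, var_y = 5.49, 2.9
mean_xy = 660.1

# Covariance estimator: n/(n-1) * (mean(xy) - mean(x) * mean(y)).
cov_xy = n / (n - 1) * (mean_xy - mean_x * mean_y)
b = cov_xy / var_x                      # slope of Y on X
r = cov_xy / math.sqrt(var_x * var_y)   # correlation
r_squared = r ** 2                      # proportion of Var(Y) explained

print(f"Cov = {cov_xy:.2f}, b = {b:.3f}, r = {r:.2f}, r^2 = {r_squared:.2f}")
```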
Quantitative Traits
Definition: A quantitative trait is one with a continuous distribution. In other words, it is a trait that is measured not counted.
It is assumed that quantitative traits are controlled by many genes, each with small effect. Environmental effects are also important.
Definition: A quantitative trait locus (QTL) is a locus controlling a quantitative trait.
Quantitative Trait – A Model
We start with a very simple scenario. Suppose there is one locus determining a quantitative trait. Suppose that there are only two alleles at this locus. We seek a model for this scenario. This model will have two parameters to account for the two degrees of freedom (when location is removed) among the 3 possible outcomes (genotypes).
Let the phenotypic value of a particular genotype be z. When environment has an effect, z is a consequence of both the underlying genotype and the environment. We can write z = G + E. Here G is the genotypic value and it is the expected phenotypic value averaged over all environments.
Each of the three genotypes has an associated genotypic value.
Phenotypic and Genotypic Value
Genotypic values of the three genotypes: $G_{B_1B_1}$, $G_{B_1B_2}$, $G_{B_2B_2}$
Quantitative Trait – Model A
Genotype           B1B1    B1B2      B2B2
Genotypic value    0       (1+k)a    2a
Quantitative Trait – Model B
Genotype           B1B1    B1B2    B2B2
Genotypic value    -a      d       a
Model A – Parameter Meanings
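Genotype           B1B1    B1B2      B2B2
Genotypic value    0       (1+k)a    2a

Value of k    Genetic Interpretation
0             additive (heterozygote midway between the homozygotes)
1             B2 completely dominant (heterozygote equals B2B2)
-1            B1 completely dominant (heterozygote equals B1B1)
>1            overdominance
<-1           underdominance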
Example – Scaling Quantitative Trait
The Booroola (B) gene influences fecundity in Merino sheep.
Genotype Mean Litter Size
BB 2.66
Bb 2.17
bb 1.48
$$a = \frac{2.66 - 1.48}{2} = 0.59$$
$$ak = (2.17 - 1.48) - a = 0.69 - 0.59 = 0.10$$
$$k = \frac{0.69 - 0.59}{0.59} = 0.17$$
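The scaling step can be written as a small Python function that maps three genotype means onto the Model A scale 0, (1+k)a, 2a. The function name `model_a_parameters` is hypothetical; the inputs are the Booroola litter-size means from the table, with bb taken as the baseline genotype.

```python
def model_a_parameters(g_low, g_het, g_high):
    """Return (a, k) such that the shifted values are 0, (1+k)a, 2a."""
    a = (g_high - g_low) / 2       # half the difference between homozygotes
    k = (g_het - g_low) / a - 1    # heterozygote sits at (1+k)a above the baseline
    return a, k


# Booroola means: bb = 1.48, Bb = 2.17, BB = 2.66.
a, k = model_a_parameters(1.48, 2.17, 2.66)
print(round(a, 2), round(k, 2))
```

A positive k below 1 indicates partial dominance of B: the heterozygote lies above the midpoint of the two homozygotes but below BB.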
Gene Content
Definition: The B1 gene content of a genotype is the number of copies of the B1 allele. The gene content for allele B1 in genotype B1B2 is 1.
At a single locus, the genotypic value is not a linear function of gene content unless k = 0.
[Figure: genotypic values 0, (1+k)a, 2a plotted against gene content 0, 1, 2; vertical axis ticks at 0, a, 2a.]
Partitioning Genotypic Value
Let N1 be the number of B1 alleles in the genotype.
Let N2 be the number of B2 alleles in the genotype.
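Then regress the genotypic value on the independent variables N1 and N2 (multiple regression):
$$G_{ij} = \mu_G + \alpha_1 N_1 + \alpha_2 N_2 + \delta_{ij}$$
Assume again only two alleles, so that N1 = 2 - N2. Call N2 = N. Then
$$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$$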
Predicted Genotypic Values
The predicted genotypic values are given as
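$$\hat{G}_{ij} = \begin{cases} \mu_G + 2\alpha_1 & i = j = 1 \\ \mu_G + \alpha_1 + \alpha_2 & i = 1,\ j = 2 \\ \mu_G + 2\alpha_2 & i = j = 2 \end{cases}$$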
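Weighted Mean of α's is 0
$$E(G_{ij}) = \mu_G + \alpha_1 E(N_1) + \alpha_2 E(N_2) + E(\delta_{ij})$$
$$\mu_G = \mu_G + 2p_1\alpha_1 + 2p_2\alpha_2 + 0$$
$$p_1\alpha_1 + p_2\alpha_2 = 0$$
Writing $\alpha = \alpha_2 - \alpha_1$, it follows that
$$\alpha_1 = -p_2\,\alpha, \qquad \alpha_2 = p_1\,\alpha$$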
Slope of Regression Line
Recalling the formula for the slope of a regression line, we have
We will now find expressions for the covariance and variance.
$$\alpha = \alpha_2 - \alpha_1 = \frac{\sigma(G,N)}{\sigma^2_N}$$
Derivation of Slope
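$$\begin{aligned}
E(N) &= 2p_1p_2 + 2p_2^2 = 2p_2 \\
E(N^2) &= 2p_1p_2 + 4p_2^2 \\
\sigma^2_N &= E(N^2) - [E(N)]^2 = 2p_1p_2 + 4p_2^2 - 4p_2^2 = 2p_1p_2 \\
\mu_G = E(G) &= 2p_1p_2(1+k)a + 2ap_2^2 = 2ap_2(1 + p_1k) \\
E(GN) &= 2p_1p_2(1+k)a + 4ap_2^2 \\
\sigma(G,N) &= E(GN) - \mu_G\,E(N) \\
&= 2p_1p_2(1+k)a + 4ap_2^2 - 4ap_2^2(1 + p_1k) \\
&= 2ap_1p_2\,[1 + k(p_1 - p_2)] \\
\alpha = \frac{\sigma(G,N)}{\sigma^2_N} &= a\,[1 + k(p_1 - p_2)]
\end{aligned}$$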
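The closed form for the slope can be checked numerically. The Python sketch below regresses G on N using Hardy-Weinberg genotype frequencies and compares the result with a[1 + k(p1 - p2)]. The function name `hwe_slope` and the parameter values are hypothetical.

```python
def hwe_slope(a, k, p2):
    """Slope of the regression of genotypic value G on B2 gene content N under HWE."""
    p1 = 1.0 - p2
    freqs = [p1 * p1, 2 * p1 * p2, p2 * p2]   # B1B1, B1B2, B2B2
    n_vals = [0, 1, 2]                        # copies of B2
    g_vals = [0.0, (1 + k) * a, 2 * a]        # Model A genotypic values
    mean_n = sum(f * n for f, n in zip(freqs, n_vals))
    mean_g = sum(f * g for f, g in zip(freqs, g_vals))
    cov_gn = sum(f * g * n for f, g, n in zip(freqs, g_vals, n_vals)) - mean_g * mean_n
    var_n = sum(f * n * n for f, n in zip(freqs, n_vals)) - mean_n ** 2
    return cov_gn / var_n


# Hypothetical parameter values, for illustration only.
a, k = 0.59, 0.17
gaps = []
for p2 in (0.1, 0.3, 0.5, 0.9):
    closed_form = a * (1 + k * ((1.0 - p2) - p2))
    gaps.append(abs(hwe_slope(a, k, p2) - closed_form))
print(max(gaps))
```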
Average Effect of Allelic Substitution
The previous derivations were completed under the assumption of random mating and HWE.
The slope α is the average change in genotypic value associated with adding one more B2 allele. Because adding a B2 allele means replacing a B1 allele with B2, the slope is also called the average effect of allelic substitution.
Except under additivity (k=0), this substitution effect can only be defined in terms of the population.
Partitioning Genetic Variance
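Because we now have a linear function for the genotypic value G,
$$G = \hat{G} + \delta$$
we can write the total genetic variance as
$$\sigma^2_G = \sigma^2(\hat{G} + \delta) = \sigma^2_{\hat{G}} + \sigma^2_{\delta} + 2\sigma(\hat{G}, \delta)$$
but there is no covariance term, because the residual of a least-squares regression is uncorrelated with the predicted value:
$$\sigma^2_G = \sigma^2_{\hat{G}} + \sigma^2_{\delta}$$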
Additive and Dominance Components
The first term is additive genetic variance: the amount of variance of G that is explained by regression on N.
The second term is dominance genetic variance: the residual variance for the regression.
We seek an expression for both terms.
$$\sigma^2_G = \sigma^2_A + \sigma^2_D$$
Derivation of Slope
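Genotype     B1B1                          B1B2                                      B2B2
N            0                             1                                         2
G            0                             (1+k)a                                    2a
Freq.*       $p_1^2$                       $2p_1p_2$                                 $p_2^2$
GN           0                             (1+k)a                                    4a
$N^2$        0                             1                                         4
$\hat{G}$    $\mu_G + 2\alpha_1$           $\mu_G + \alpha_1 + \alpha_2$             $\mu_G + 2\alpha_2$
$\delta$     $-2\alpha_1 - \mu_G$          $(1+k)a - \alpha_1 - \alpha_2 - \mu_G$    $2a - 2\alpha_2 - \mu_G$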
Derivation of Genetic Variance Components
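Using $\alpha_1 = -p_2\alpha$, $\alpha_2 = p_1\alpha$, $\mu_G = 2ap_2(1 + p_1k)$ and $\alpha = a[1 + k(p_1 - p_2)]$:
$$\begin{aligned}
\sigma^2_A = E[(\hat{G} - \mu_G)^2] &= p_1^2(2\alpha_1)^2 + 2p_1p_2(\alpha_1 + \alpha_2)^2 + p_2^2(2\alpha_2)^2 \\
&= 4p_1^2p_2^2\alpha^2 + 2p_1p_2(p_1 - p_2)^2\alpha^2 + 4p_1^2p_2^2\alpha^2 \\
&= 2p_1p_2\,\alpha^2 \\
\sigma^2_D = E[(G - \hat{G})^2] &= p_1^2(0 - \mu_G - 2\alpha_1)^2 + 2p_1p_2[(1+k)a - \mu_G - \alpha_1 - \alpha_2]^2 + p_2^2(2a - \mu_G - 2\alpha_2)^2 \\
&= p_1^2(2akp_2^2)^2 + 2p_1p_2(2akp_1p_2)^2 + p_2^2(2akp_1^2)^2 \\
&= (2p_1p_2\,ak)^2
\end{aligned}$$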
Genetic Variance Components
$$\sigma^2_A = 2p_1p_2\,\alpha^2, \qquad \text{where } \alpha = a[1 + k(p_1 - p_2)]$$
$$\sigma^2_D = (2p_1p_2\,ak)^2$$
Both components depend on gene frequencies, so they are specific to the population from which they are derived.
When k=0 (purely additive effects), then additive genetic variance is maximized when heterozygosity is maximized.
With dominance k>0, additive genetic variance is maximized at higher frequencies of the recessive allele. Rare recessive alleles cause little genetic variance because they are not often expressed.
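The dependence of the variance components on gene frequency can be explored numerically. The Python sketch below evaluates the additive variance 2 p1 p2 a^2 [1 + k(p1 - p2)]^2 and the dominance variance (2 p1 p2 a k)^2 on a grid of B2 frequencies and reports where the additive variance peaks. The function names and the grid search are illustrative choices, not part of the lecture.

```python
def variance_components(a, k, p2):
    """Return (additive, dominance) genetic variance under Model A and random mating."""
    p1 = 1.0 - p2
    alpha = a * (1 + k * (p1 - p2))      # average effect of allelic substitution
    var_a = 2 * p1 * p2 * alpha ** 2
    var_d = (2 * p1 * p2 * a * k) ** 2
    return var_a, var_d


def argmax_additive(a, k, steps=1000):
    """Frequency of B2 on a grid in (0, 1) that maximizes the additive variance."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p2: variance_components(a, k, p2)[0])


p2_additive = argmax_additive(1.0, 0.0)  # k = 0: purely additive
p2_dominant = argmax_additive(1.0, 1.0)  # k = 1: B2 completely dominant
print(p2_additive, p2_dominant)
```

With k = 0 the peak is at p2 = 0.5, where heterozygosity is greatest. With k = 1 the peak moves to p2 = 0.25, that is, to a frequency of 0.75 for the recessive allele B1.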
Why?
Why have we partitioned the genotypic value into additive and dominance components?
When a parent transmits alleles to the offspring, the dominance deviation in the parent is irrelevant because only one gamete is transmitted.
We may think of the additive genetic component as the heritable component of an individual's genotypic value.
Average Excess
There are multiple ways to measure the effect of an allele. The average effect of allelic substitution α is one. The additive effect $\alpha_i$ is another.
Definition: The average excess of an allele is the difference between the mean genotypic value of individuals carrying at least one copy of the allele and the mean genotypic value of a random individual from the entire population.
$$\alpha_2^* = G_{12}\,P(B_1B_2 \mid B_2) + G_{22}\,P(B_2B_2 \mid B_2) - \mu_G$$
Average Excess with Random Mating
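$$\begin{aligned}
\alpha_2^* &= G_{12}\,P(B_1B_2 \mid B_2) + G_{22}\,P(B_2B_2 \mid B_2) - \mu_G \\
&= G_{12}\,p_1 + G_{22}\,p_2 - \mu_G \\
&= ap_1(1+k) + 2ap_2 - 2ap_2(1 + p_1k) \\
&= p_1 a\,[1 + k(p_1 - p_2)] \\
&= p_1\alpha = \alpha_2
\end{aligned}$$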
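Under random mating the average excess of B2 coincides with its additive effect, p1 a [1 + k(p1 - p2)]. The Python sketch below checks this numerically for a few frequencies. The function names and parameter values are hypothetical.

```python
def average_excess_b2(a, k, p2):
    """Average excess of B2: mean genotypic value given one B2 allele, minus the population mean."""
    p1 = 1.0 - p2
    g12, g22 = (1 + k) * a, 2 * a
    mu_g = 2 * p1 * p2 * g12 + p2 * p2 * g22  # B1B1 has value 0 and contributes nothing
    # Under random mating, a B2 allele is paired with B1 with probability p1
    # and with B2 with probability p2.
    return p1 * g12 + p2 * g22 - mu_g


def additive_effect_b2(a, k, p2):
    """Additive effect of B2: p1 times the average effect of allelic substitution."""
    p1 = 1.0 - p2
    return p1 * a * (1 + k * (p1 - p2))


# Hypothetical parameter values, for illustration only.
a, k = 0.59, 0.17
gaps = [abs(average_excess_b2(a, k, p2) - additive_effect_b2(a, k, p2))
        for p2 in (0.1, 0.5, 0.9)]
print(max(gaps))
```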
Breeding Value
Definition: The breeding value of an individual is the sum of the additive effects.
Genotype    Breeding Value
B1B1        $2\alpha_1$
B1B2        $\alpha_1 + \alpha_2$
B2B2        $2\alpha_2$
Breeding Value and Random Mating
Consider the expected genotypic values of progeny produced by parental genotypes.
Genotype    Breeding Value           Progeny Expected Genotypic Value    Deviation
B1B1        $2\alpha_1$              $ap_2(1+k)$                         $\alpha_1$
B1B2        $\alpha_1 + \alpha_2$    $a[p_2 + (1+k)/2]$                  $(\alpha_1 + \alpha_2)/2$
B2B2        $2\alpha_2$              $a[2p_2 + p_1(1+k)]$                $\alpha_2$
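The table says that the expected deviation of a parent's progeny from the population mean is half the parent's breeding value. A minimal Python sketch of that check under Model A and random mating follows; the function names and parameter values are hypothetical.

```python
def breeding_values(a, k, p2):
    """Breeding value of each genotype: the sum of its two additive effects."""
    p1 = 1.0 - p2
    alpha = a * (1 + k * (p1 - p2))
    alpha_1, alpha_2 = -p2 * alpha, p1 * alpha
    return {"B1B1": 2 * alpha_1, "B1B2": alpha_1 + alpha_2, "B2B2": 2 * alpha_2}


def progeny_deviations(a, k, p2):
    """Expected progeny genotypic value minus the population mean, per parental genotype."""
    p1 = 1.0 - p2
    g11, g12, g22 = 0.0, (1 + k) * a, 2 * a
    mu_g = p1 * p1 * g11 + 2 * p1 * p2 * g12 + p2 * p2 * g22
    # Expected offspring value given the allele received from the focal parent;
    # the other allele comes from a random mate (B1 with prob. p1, B2 with prob. p2).
    mean_if_b1 = p1 * g11 + p2 * g12
    mean_if_b2 = p1 * g12 + p2 * g22
    prob_b2 = {"B1B1": 0.0, "B1B2": 0.5, "B2B2": 1.0}  # chance the parent transmits B2
    return {geno: (1 - t) * mean_if_b1 + t * mean_if_b2 - mu_g
            for geno, t in prob_b2.items()}


# Hypothetical parameter values, for illustration only.
a, k, p2 = 0.59, 0.17, 0.3
bv = breeding_values(a, k, p2)
dev = progeny_deviations(a, k, p2)
for geno in bv:
    print(geno, round(bv[geno], 4), round(dev[geno], 4))
```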
How to Use this Analysis
A common approach today:
- Identify candidate loci that are potential contributors to the variation of the trait of interest.
- Genotype a random sample of individuals using molecular markers.
- Determine average phenotypic values within each genotypic class.
- Estimate the fraction of total phenotypic variation associated with the candidate locus.
An Example
Consider the Booroola gene example in two random mating populations where the gene B is present at gene frequencies 0.5.
Genotype    BB      Bb      bb
$G_{ij}$    2.66    2.17    1.48
$P_{ij}$    0.25    0.50    0.25
Booroola (cont)
Value                   Estimate
Mean genotypic value    $\mu_G = 2.120$
Additive Effects        $\alpha_B = 0.295$; $\alpha_b = -0.295$
Breeding Values         $A_{BB} = 0.59$; $A_{Bb} = 0$; $A_{bb} = -0.59$
Dominance Deviations    $D_{BB} = -0.05$; $D_{Bb} = 0.05$; $D_{bb} = -0.05$
Booroola (cont)
Value                         Estimate
Additive Genetic Variance     0.1740
Dominance Genetic Variance    0.0025
Total Genetic Variance        0.1765
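The Booroola estimates can be reproduced from the genotype means and a gene frequency of 0.5 with a short Python sketch. It regresses genotypic value on the number of B alleles, then derives additive effects, breeding values, dominance deviations, and the two variance components. The variable names are hypothetical.

```python
# Genotype means (litter size) and frequency of the B allele from the example.
g = {"BB": 2.66, "Bb": 2.17, "bb": 1.48}
p_B = 0.5
p_b = 1.0 - p_B
freq = {"BB": p_B ** 2, "Bb": 2 * p_B * p_b, "bb": p_b ** 2}
n_B = {"BB": 2, "Bb": 1, "bb": 0}  # gene content for allele B

mean_g = sum(freq[x] * g[x] for x in g)
mean_n = sum(freq[x] * n_B[x] for x in g)
cov_gn = sum(freq[x] * g[x] * n_B[x] for x in g) - mean_g * mean_n
var_n = sum(freq[x] * n_B[x] ** 2 for x in g) - mean_n ** 2
alpha = cov_gn / var_n  # average effect of allelic substitution

# Additive effects satisfy p_B * alpha_B + p_b * alpha_b = 0.
alpha_B, alpha_b = p_b * alpha, -p_B * alpha
breeding = {"BB": 2 * alpha_B, "Bb": alpha_B + alpha_b, "bb": 2 * alpha_b}
dominance = {x: g[x] - mean_g - breeding[x] for x in g}

var_a = sum(freq[x] * breeding[x] ** 2 for x in g)
var_d = sum(freq[x] * dominance[x] ** 2 for x in g)
var_g = var_a + var_d

print(round(mean_g, 3), round(alpha_B, 3), round(var_a, 4), round(var_d, 4), round(var_g, 4))
```

The dominance variance is small here because the heterozygote mean lies close to the midpoint of the two homozygote means.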