Lecture 5: Segregation Analysis I

Date: 9/10/02

Counting number of genotypes, mating types

Segregation analysis: dominant, codominant, estimating segregation ratio

Testing populations: polymorphism, heterogeneity, heterozygosity, allele frequency.

Probability: The Need for Permutations and Combinations

In genetics, the sample space often consists of all orders or arrangements of groups of objects (usually genes or alleles).

Permutations, combinations, and combinations with repetition are the standard tools for counting such arrangements.

Probability: Permutation

Definition: A permutation is an ordered arrangement of r elements chosen from n elements. The number of such permutations is often written nPr and is calculated as

$$ {}_nP_r = \frac{n!}{(n-r)!} $$

Example: How many different types of heterozygotes exist when there are l alleles and we distinguish order (e.g. paternal vs. maternal)? Choosing an ordered pair of distinct alleles gives lP2 = l(l - 1) ordered heterozygotes.

Probability: Combination

Definition: A combination is a selection of r objects from n objects without regard to order. The number of combinations is written nCr and has value

$$ {}_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!} $$

Example: How many different heterozygotes exist without regard to order when there are l types of alleles? Choosing 2 distinct alleles gives lC2 = l(l - 1)/2 heterozygotes.

Probability: Combination with Repetition

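Definition: Suppose there are n different types of elements and r are selected with replacement, without regard to order. Then the number of combinations is

$$ C'(n, r) = {}_{n+r-1}C_r $$

Examples:

How many genotypes are possible when there are l alleles? A genotype is 2 alleles drawn with replacement, so there are C'(l, 2) = l(l + 1)/2 genotypes.

How many mating types are possible when there are l alleles? A mating type is 2 genotypes drawn with replacement from the g = l(l + 1)/2 genotypes, so there are C'(g, 2) = g(g + 1)/2 mating types (6 when l = 2).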
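The counting formulas above can be checked by brute force. A minimal Python sketch, assuming Python 3.8+ for `math.perm` and `math.comb`; the helper names are hypothetical, not from the lecture. Each formula is compared against an explicit enumeration of alleles, genotypes, and mating types built with `itertools`.

```python
from itertools import combinations_with_replacement, permutations
from math import comb, perm


def ordered_heterozygotes(l: int) -> int:
    # lP2: ordered pairs of distinct alleles (paternal, maternal)
    return perm(l, 2)


def unordered_heterozygotes(l: int) -> int:
    # lC2: unordered pairs of distinct alleles
    return comb(l, 2)


def n_genotypes(l: int) -> int:
    # C'(l, 2) = (l + 2 - 1)C2: two alleles drawn with replacement, order ignored
    return comb(l + 2 - 1, 2)


def n_mating_types(l: int) -> int:
    # C'(g, 2): two genotypes drawn with replacement from g genotypes
    g = n_genotypes(l)
    return comb(g + 2 - 1, 2)


def brute_force(l: int) -> tuple:
    """Count the same four quantities by explicit enumeration."""
    alleles = range(l)
    genotypes = list(combinations_with_replacement(alleles, 2))
    return (
        len(list(permutations(alleles, 2))),                     # ordered heterozygotes
        sum(1 for a, b in genotypes if a != b),                  # unordered heterozygotes
        len(genotypes),                                          # genotypes
        len(list(combinations_with_replacement(genotypes, 2))),  # mating types
    )


for l in range(1, 6):
    formulas = (ordered_heterozygotes(l), unordered_heterozygotes(l),
                n_genotypes(l), n_mating_types(l))
    assert formulas == brute_force(l)
    print(l, formulas)
```

For l = 2 the formulas give 2 ordered heterozygotes, 1 unordered heterozygote, 3 genotypes, and 6 mating types, matching the six rows of the segregation tables.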

Review: Segregation Ratio

Recall that the law of segregation states that one of the two alleles of a parent is randomly selected to pass on to the offspring.

Definition: The segregation ratios are the predictable proportions of genotypes and phenotypes in the offspring of a particular parental cross, e.g. 1 AA : 2 AB : 1 BB from an AB x AB cross.

Segregation Ratio Distortion

Definition: Segregation ratio distortion is a departure from the expected segregation ratios. The purpose of segregation analysis is to detect significant segregation ratio distortion. A significant departure suggests that at least one of our model assumptions is wrong.

Segregation Analysis: What it Teaches Us

A significant departure from the expected ratios points to a problem with one of the following:

Genetic model for a single-locus gene: dominant, codominant, truly single locus.

Other genetic information: selection-free, completely penetrant.

Data quality: systematic error, non-random sampling.

Few important genes are single-locus, so single-locus analysis is often used to verify marker systems.

Segregation Analysis: Experimental Design

Either run a controlled cross with known expected segregation ratios, or sample the offspring of a particular mating type with known expected segregation ratios.

Then verify the segregation ratios by comparing the observed and expected counts.

Autosomal Dominant

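Expected offspring proportions by mating type (genotype columns DD, Dd, dd; phenotype columns Dominant, Recessive):

Mating type | DD   | Dd  | dd   | Dominant | Recessive
DD x DD     | 1    | 0   | 0    | 1        | 0
DD x Dd     | 0.5  | 0.5 | 0    | 1        | 0
DD x dd     | 0    | 1   | 0    | 1        | 0
Dd x Dd     | 0.25 | 0.5 | 0.25 | 0.75     | 0.25
Dd x dd     | 0    | 0.5 | 0.5  | 0.5      | 0.5
dd x dd     | 0    | 0   | 1    | 0        | 1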
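The table follows from the law of segregation alone: each parent transmits each of its two alleles with probability 1/2, independently of the other parent. A minimal Python sketch that regenerates the rows with exact fractions; the function names and the two-character genotype strings such as "Dd" are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Expected genotype proportions among offspring of parent1 x parent2.

    Each of the 4 (allele from parent1, allele from parent2) combinations
    has probability 1/4 under the law of segregation.
    """
    counts = Counter()
    for a, b in product(parent1, parent2):
        # Sort so that "dD" and "Dd" name the same genotype (uppercase first)
        counts["".join(sorted(a + b))] += Fraction(1, 4)
    return dict(counts)


def dominant_phenotypes(genotypes: dict) -> dict:
    # Any genotype carrying at least one D shows the dominant phenotype
    dominant = sum((f for g, f in genotypes.items() if "D" in g), Fraction(0))
    return {"Dominant": dominant, "Recessive": 1 - dominant}


mating_types = [("DD", "DD"), ("DD", "Dd"), ("DD", "dd"),
                ("Dd", "Dd"), ("Dd", "dd"), ("dd", "dd")]
for p1, p2 in mating_types:
    g = offspring_genotypes(p1, p2)
    row = [g.get(x, Fraction(0)) for x in ("DD", "Dd", "dd")]
    ph = dominant_phenotypes(g)
    print(f"{p1} x {p2}:", *map(float, row),
          float(ph["Dominant"]), float(ph["Recessive"]))
```

The genotype columns produced here are also the codominant table, where each genotype is its own phenotype.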

Autosomal Dominant: The Data and Hypothesis

Obtain a random sample of matings between affected (Dd) and unaffected (dd) individuals.

Sample n of their offspring and find that r are affected with the disease (i.e. Dd).

H0: proportion of affected offspring is 0.5

Autosomal Dominant: Binomial Test

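H0: p = 0.5

If r ≤ n/2, p-value = 2P(X ≤ r).

If r > n/2, p-value = 2P(X ≥ r) = 2P(X ≤ n - r), by the symmetry of the Binomial(n, 0.5) distribution.

$$ P(X \le c) = \sum_{x=0}^{c} \binom{n}{x} \left(\frac{1}{2}\right)^{n} $$

Example: we observe r = 29 affected offspring out of n = 50, giving p-value = 0.32.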
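A minimal sketch of the exact binomial test above, using only the standard library (`math.comb`, Python 3.8+); the function names are hypothetical. The p-value is capped at 1 because doubling the tail can exceed 1 when r = n/2.

```python
from math import comb


def binom_cdf_half(c: int, n: int) -> float:
    """P(X <= c) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, x) for x in range(c + 1)) / 2 ** n


def binomial_test_half(r: int, n: int) -> float:
    """Two-sided exact test of H0: p = 0.5 by doubling the smaller tail."""
    # By symmetry, P(X >= r) = P(X <= n - r) when p = 0.5
    c = r if r <= n / 2 else n - r
    return min(1.0, 2 * binom_cdf_half(c, n))


p_value = binomial_test_half(29, 50)
print(round(p_value, 2))
```

With r = 29 and n = 50 this should reproduce the lecture's p-value of about 0.32.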

Autosomal Dominant: Standard Normal Test

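For X ~ Binomial(n, p): μ = np and σ² = np(1 - p).

For large n, X is approximately N(np, np(1 - p)), so

$$ Z = \frac{X - np}{\left[np(1-p)\right]^{1/2}} \sim N(0, 1) \quad \text{approximately.} $$

Under H0, X ~ N(n/2, n/4) approximately, and

$$ z = \frac{r - n/2}{(n/4)^{1/2}} = 1.13 $$

Observing r = 29 (n = 50) gives p-value = 0.26.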
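A minimal sketch of the normal-approximation test, using the identity 2(1 - Φ(|z|)) = erfc(|z|/√2) so that only `math` is needed; the function name is hypothetical.

```python
from math import erfc, sqrt


def normal_test_half(r: float, n: int) -> tuple:
    """z statistic and two-sided p-value for H0: p = 0.5 (normal approximation)."""
    z = (r - n / 2) / sqrt(n / 4)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


z, p = normal_test_half(29, 50)
print(f"z = {z:.2f}, p-value = {p:.2f}")  # z = 1.13, p-value = 0.26
```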

Autosomal Dominant: Pearson Chi-Square Test

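The distribution of the sum of squares of k iid standard normal variables is defined as the chi-square distribution with k degrees of freedom. Squaring Z:

$$ Z^2 = \frac{(X - np)^2}{np(1-p)} = \frac{(X - np)^2}{np} + \frac{\left[(n - X) - n(1-p)\right]^2}{n(1-p)} \sim \chi^2_1 $$

The right-hand form is Pearson's sum of (observed - expected)²/expected over the two offspring classes. For r = 29, n = 50:

$$ z^2 = \frac{(r - n/2)^2}{n/4} = 1.28 $$

p-value = 0.26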
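A minimal sketch of the Pearson test for the 1:1 case. For 1 degree of freedom the chi-square upper tail is P(Z² > x) = erfc(√(x/2)), so no statistics library is needed; the helper names are hypothetical.

```python
from math import erfc, sqrt


def pearson_chi_square(observed, expected) -> float:
    """Pearson statistic: sum over classes of (o - e)^2 / e."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_1df(x: float) -> float:
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    return erfc(sqrt(x / 2))


r, n = 29, 50
x2 = pearson_chi_square([r, n - r], [n / 2, n / 2])
print(f"chi-square = {x2:.2f}, p-value = {chi2_sf_1df(x2):.2f}")  # chi-square = 1.28, p-value = 0.26
```

Because Z² is the square of the normal statistic, this p-value equals the one from the two-sided normal test.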

Continuity Correction

Both the normal and chi-square distributions are continuous, but our counts are discrete.

Continuity correction for the normal test: use r = 28.5 (r moved half a unit toward n/2); corrected p-value = 0.32.

Continuity correction for the chi-square test: use r = 28.5 and n - r = 21.5; corrected p-value = 0.32.
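A minimal sketch of both continuity corrections for the 1:1 case: each |observed - expected| is reduced by 0.5 before computing the statistic. The function names are hypothetical.

```python
from math import erfc, sqrt


def corrected_z(r: int, n: int) -> float:
    """Continuity-corrected z: move r half a unit toward n/2."""
    half = n / 2
    if r > half:
        r_cc = r - 0.5
    elif r < half:
        r_cc = r + 0.5
    else:
        r_cc = r
    return (r_cc - half) / sqrt(n / 4)


def corrected_chi_square(r: int, n: int) -> float:
    """Continuity-corrected Pearson statistic for two classes with 1:1 expected."""
    e = n / 2
    # Reduce each |o - e| by 0.5 (never below 0): 29 -> 28.5 and 21 -> 21.5
    return sum(max(abs(o - e) - 0.5, 0.0) ** 2 / e for o in (r, n - r))


z = corrected_z(29, 50)
x2 = corrected_chi_square(29, 50)
p_normal = erfc(abs(z) / sqrt(2))   # two-sided normal p-value
p_chi = erfc(sqrt(x2 / 2))          # chi-square (1 df) p-value
print(f"{p_normal:.2f} {p_chi:.2f}")  # 0.32 0.32
```

The corrected values agree with the exact binomial p-value of 0.32.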

Autosomal Dominant: Likelihood Ratio Test

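Write the likelihood:

$$ L(p) = \binom{n}{r} p^{r} (1-p)^{n-r} $$

Calculate the MLE under HA:

$$ \hat{p} = \frac{r}{n} $$

Calculate the G statistic (natural logarithms), where o_i and e_i are the observed and expected counts in the c = 2 offspring classes:

$$ G = 2\left[\log L(\hat{p}) - \log L(0.5)\right] = 2\sum_{i=1}^{c} o_i \log\frac{o_i}{e_i} = 2\left[r \log\frac{r}{0.5n} + (n - r)\log\frac{n - r}{0.5n}\right] $$

Determine the distribution of G: under H0, G ~ χ²₁ approximately.

Calculate the p-value: for r = 29, n = 50, G = 1.29 and p-value = 0.26.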
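A minimal sketch of the likelihood ratio (G) test for the 1:1 case, with a check that the sum-over-classes form equals twice the log-likelihood ratio. The function names are hypothetical; the chi-square (1 df) tail again uses erfc(√(x/2)).

```python
from math import erfc, log, sqrt


def g_statistic(observed, expected) -> float:
    """G = 2 * sum o_i * ln(o_i / e_i); classes with o_i = 0 contribute 0."""
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def log_lik(p: float, r: int, n: int) -> float:
    # Binomial log-likelihood without log C(n, r), which cancels in the ratio
    return r * log(p) + (n - r) * log(1 - p)


r, n = 29, 50
p_hat = r / n  # MLE under HA
G = g_statistic([r, n - r], [0.5 * n, 0.5 * n])
assert abs(G - 2 * (log_lik(p_hat, r, n) - log_lik(0.5, r, n))) < 1e-9
p_value = erfc(sqrt(G / 2))  # G ~ chi-square with 1 df under H0
print(f"p_hat = {p_hat}, G = {G:.3f}, p-value = {p_value:.2f}")
```

For r = 29, n = 50 this gives G ≈ 1.29 and a p-value of about 0.26, close to the Pearson result.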

Estimating Segregation Ratio: MOM

First moment: E(X) = np. Sample moment: r.

MOM: set np = r, giving the MOM estimate

$$ \hat{p} = \frac{r}{n} $$

Estimating Segregation Ratio: Likelihood Method

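Set the score (the derivative of log L(p) with respect to p) to 0:

$$ \frac{r}{\hat{p}} - \frac{n - r}{1 - \hat{p}} = 0 $$

Solve for the MLE:

$$ \hat{p} = \frac{r}{n} $$

The MLE coincides with the MOM estimate.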
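A quick numerical check, offered as a sketch rather than part of the lecture: a grid search over p confirms that the binomial log-likelihood is maximized at r/n, and the score vanishes there. The grid spacing of 0.001 is an arbitrary choice.

```python
from math import log


def log_likelihood(p: float, r: int, n: int) -> float:
    # Binomial log-likelihood up to the constant log C(n, r)
    return r * log(p) + (n - r) * log(1 - p)


def score(p: float, r: int, n: int) -> float:
    # Derivative of the log-likelihood with respect to p
    return r / p - (n - r) / (1 - p)


r, n = 29, 50
grid = [i / 1000 for i in range(1, 1000)]  # p values strictly inside (0, 1)
p_grid = max(grid, key=lambda p: log_likelihood(p, r, n))
print(p_grid, r / n, score(r / n, r, n))
```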

Estimating Confidence Interval for Segregation Ratio

Our estimate is X/n, where X is the random variable representing the number of "successes" observed and n is the sample size.

$$ E(X/n) = E(X)/n = np/n = p $$

$$ \mathrm{Var}(X/n) = \mathrm{Var}(X)/n^2 = np(1-p)/n^2 = p(1-p)/n $$

$$ \mathrm{SE}(X/n) = \left[\frac{\hat{p}(1-\hat{p})}{n}\right]^{1/2} $$

Therefore X/n is unbiased, and we can obtain an approximate confidence interval using a normal approximation with SE(X/n), in which p is replaced by its estimate p̂.

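$$ \hat{p} = \frac{29}{50} = 0.58 $$

$$ \mathrm{SE}(\hat{p}) = \left[\frac{(29/50)(21/50)}{50}\right]^{1/2} = 0.0698 $$

$$ 95\%\ \text{CI} = \left(\hat{p} - 1.96\,\mathrm{SE},\ \hat{p} + 1.96\,\mathrm{SE}\right) = (0.443,\ 0.717) $$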
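A minimal sketch of the normal-approximation (Wald) interval computed above; the function name and the default z = 1.96 for 95% coverage are illustrative choices.

```python
from math import sqrt


def wald_interval(r: int, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = r / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


p_hat, se, (lo, hi) = wald_interval(29, 50)
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}, CI = ({lo:.3f}, {hi:.3f})")
# p_hat = 0.58, SE = 0.0698, CI = (0.443, 0.717)
```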

Segregation Analysis: Codominant Loci I

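Expected offspring genotype proportions by mating type:

Mating type | DD   | Dd  | dd
DD x DD     | 1    | 0   | 0
DD x Dd     | 0.5  | 0.5 | 0
DD x dd     | 0    | 1   | 0
Dd x Dd     | 0.25 | 0.5 | 0.25
Dd x dd     | 0    | 0.5 | 0.5
dd x dd     | 0    | 0   | 1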

Segregation Analysis: Codominant Loci II

All 6 mating types are identifiable, and each can be tested for agreement with its expected segregation ratios.

One mating type (Dd x Dd) produces 3 types of offspring in a 1:2:1 ratio. For it, the binomial test does not apply, so a chi-square or likelihood ratio test (with 2 degrees of freedom) must be used.

Multiple Populations: Testing for Heterogeneity

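Suppose you observe offspring counts in n segregation classes in each of m populations, and let o_ij and e_ij be the observed and expected counts of class j in population i.

Calculate a total chi-square:

$$ \chi^2_{\text{total}} = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} $$

Calculate a pooled chi-square:

$$ \chi^2_{\text{pooled}} = \sum_{j=1}^{n} \frac{\left(\sum_{i=1}^{m} o_{ij} - \sum_{i=1}^{m} e_{ij}\right)^2}{\sum_{i=1}^{m} e_{ij}} $$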

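Then, under the hypothesis of no heterogeneity among populations,

$$ \chi^2_{\text{total}} - \chi^2_{\text{pooled}} \sim \chi^2_{(m-1)(n-1)} $$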

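Alternatively, one may calculate G statistics:

$$ G_{\text{total}} = 2 \sum_{i=1}^{m} \sum_{j=1}^{n} o_{ij} \log\frac{o_{ij}}{e_{ij}} $$

$$ G_{\text{pooled}} = 2 \sum_{j=1}^{n} \left(\sum_{i=1}^{m} o_{ij}\right) \log\frac{\sum_{i=1}^{m} o_{ij}}{\sum_{i=1}^{m} e_{ij}} $$

Then G_total - G_pooled is also distributed as χ² with (m - 1)(n - 1) degrees of freedom.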
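A minimal sketch of the heterogeneity test with both the chi-square and G versions. Two assumptions to flag: the `chi2_sf` helper is a standard-library stand-in for a statistics package, using the recurrence Q(x; v + 2) = Q(x; v) + (x/2)^(v/2) e^(-x/2) / Γ(v/2 + 1) for integer degrees of freedom; and the counts are hypothetical (three populations, smooth vs. wrinkled, expected 3:1), not Mendel's data.

```python
from math import erfc, exp, gamma, log, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(chi2_df > x) for a positive integer df."""
    x = max(x, 0.0)
    if df % 2 == 0:
        q, start = exp(-x / 2), 2          # Q(x; 2)
    else:
        q, start = erfc(sqrt(x / 2)), 1    # Q(x; 1)
    # Recurrence: Q(x; v + 2) = Q(x; v) + (x/2)^(v/2) * exp(-x/2) / Gamma(v/2 + 1)
    for v in range(start, df, 2):
        q += (x / 2) ** (v / 2) * exp(-x / 2) / gamma(v / 2 + 1)
    return q


def pearson(obs, exp_):
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp_))


def g_stat(obs, exp_):
    return 2 * sum(o * log(o / e) for o, e in zip(obs, exp_) if o > 0)


def heterogeneity(observed, ratio):
    """observed: m rows (populations) x n columns (classes); ratio: expected class proportions."""
    m, n = len(observed), len(ratio)
    expected = [[sum(row) * q for q in ratio] for row in observed]
    x2_total = sum(pearson(o, e) for o, e in zip(observed, expected))
    g_total = sum(g_stat(o, e) for o, e in zip(observed, expected))
    o_pooled = [sum(row[j] for row in observed) for j in range(n)]
    e_pooled = [sum(row[j] for row in expected) for j in range(n)]
    df = (m - 1) * (n - 1)
    x2_het = x2_total - pearson(o_pooled, e_pooled)
    g_het = g_total - g_stat(o_pooled, e_pooled)
    return x2_het, chi2_sf(x2_het, df), g_het, chi2_sf(g_het, df), df


# Hypothetical smooth/wrinkled counts for three plants, expected 3:1
counts = [[30, 10], [28, 12], [35, 5]]
x2_het, p_x2, g_het, p_g, df = heterogeneity(counts, [0.75, 0.25])
print(f"df = {df}: chi-square = {x2_het:.3f} (p = {p_x2:.3f}), G = {g_het:.3f} (p = {p_g:.3f})")
```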

Multiple Populations: Example

In Mendel’s F2 cross of smooth and wrinkled inbred pea lines, he sampled 10 plants and counted the number of smooth and wrinkled peas produced by each of those plants.

Is there heterogeneity between plants? Further tests show that a single gene controls smooth vs. wrinkled, and that smooth is dominant to wrinkled.

Screening Markers for Polymorphism

An important step in designing mapping studies is to find markers that show polymorphism. We are interested in tests for polymorphism.

A false negative would result if the marker was truly polymorphic, but our test showed it to be monomorphic.

A false positive would result if the marker was truly monomorphic, but our test showed it to be polymorphic.

Testing for Polymorphism: Backcross 1:1

You design a backcross experiment to test for polymorphism at a marker of interest. You sample n offspring of the backcross.

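P(monomorphic) = 2(0.5)^n, the probability that all n offspring fall into the same class, so that a truly polymorphic marker appears monomorphic (a false negative).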

Testing for Polymorphism: F2 codominant 1:2:1

You design a F2 cross with a marker that is codominant. You sample n F2 individuals.

P(monomorphic) = 2(0.25)^n + (0.5)^n

Testing for Polymorphism: F2 dominant marker

You design an F2 cross, but this time observe a dominant marker. You sample n F2 individuals.

P(monomorphic) = (0.75)^n + (0.25)^n

Power of Test for Polymorphism

[Figure: Power to Detect Polymorphism. Power (1 - P(monomorphic), from 0 to 1) versus sample size (1 to 19) for the 1:1 backcross, 1:2:1 F2 codominant, and 3:1 F2 dominant designs.]
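A minimal sketch that evaluates the three P(monomorphic) formulas and finds the smallest sample size reaching a target power, defined as 1 - P(monomorphic). The design labels and the 0.95 target are illustrative choices, not values from the lecture.

```python
def p_monomorphic(design: str, n: int) -> float:
    """Probability that all n offspring fall in one class (marker looks monomorphic)."""
    if design == "1:1":      # backcross
        return 2 * 0.5 ** n
    if design == "1:2:1":    # F2, codominant marker
        return 2 * 0.25 ** n + 0.5 ** n
    if design == "3:1":      # F2, dominant marker
        return 0.75 ** n + 0.25 ** n
    raise ValueError(f"unknown design: {design}")


def min_sample_size(design: str, power: float = 0.95) -> int:
    """Smallest n with 1 - P(monomorphic) >= power."""
    n = 1
    while 1 - p_monomorphic(design, n) < power:
        n += 1
    return n


for design in ("1:1", "1:2:1", "3:1"):
    print(design, min_sample_size(design))
```

For 95% power this gives n = 6 for the backcross, 5 for the codominant F2, and 11 for the dominant F2, consistent with the dominant-marker curve rising most slowly in the figure.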

Estimating Heterozygosity

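$$ H = 1 - \sum_{i=1}^{l} p_i^2 $$

Estimated from a sample of n alleles with sample frequencies p̂_i:

$$ \hat{H} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{l} \hat{p}_i^2\right) $$

$$ \mathrm{Var}(\hat{H}) = \frac{2}{n(n-1)}\left\{2(n-2)\left[\sum_{i=1}^{l} p_i^3 - \left(\sum_{i=1}^{l} p_i^2\right)^2\right] + \sum_{i=1}^{l} p_i^2 - \left(\sum_{i=1}^{l} p_i^2\right)^2\right\} $$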
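A minimal sketch of the heterozygosity estimate from allele counts, where n is the total number of alleles sampled. The variance uses the formula Var(Ĥ) = 2/(n(n - 1)) {2(n - 2)[Σp³ - (Σp²)²] + Σp² - (Σp²)²}, evaluated at the estimated frequencies; treat that formula and the plug-in step as assumptions of this sketch. The function name is hypothetical.

```python
def heterozygosity(allele_counts):
    """Return (H_hat, estimated Var(H_hat)) from allele counts.

    n is the total number of sampled alleles; the variance formula is
    evaluated at the sample allele frequencies (plug-in estimate).
    """
    n = sum(allele_counts)
    p = [c / n for c in allele_counts]
    s2 = sum(x ** 2 for x in p)   # sum of squared frequencies
    s3 = sum(x ** 3 for x in p)   # sum of cubed frequencies
    h_hat = n / (n - 1) * (1 - s2)
    var_h = 2 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)
    return h_hat, var_h


h, v = heterozygosity([2, 1, 1])
print(h, v)
```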

Estimating Allele Frequency

It is often assumed that alleles have equal frequencies when there are many alleles at a locus. This assumption can result in false positives for linkage, so it is important to estimate allele frequencies from data rather than assume them.

Suppose there are l possible alleles A_1, A_2, …, A_l. You observe n_ij individuals with genotype A_iA_j (n individuals in total).

You estimate the genotype frequencies as p̂_ij = n_ij/n.

Estimating Allele Frequencies

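$$ \hat{p}_i = \hat{p}_{ii} + \frac{1}{2}\sum_{j \ne i}^{l} \hat{p}_{ij} $$

$$ \mathrm{Var}(\hat{p}_i) = \frac{1}{2n}\left(p_i + p_{ii} - 2p_i^2\right) \overset{\text{HWE}}{=} \frac{p_i(1-p_i)}{2n} $$

$$ \mathrm{Cov}(\hat{p}_i, \hat{p}_j) = \frac{1}{4n}\left(p_{ij} - 4p_ip_j\right) \overset{\text{HWE}}{=} -\frac{p_ip_j}{2n} $$

Here HWE denotes Hardy-Weinberg equilibrium, under which p_ii = p_i² and p_ij = 2p_ip_j for i ≠ j.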
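A minimal sketch of these estimators. The input format is an assumption of this sketch: a dict mapping an unordered genotype (i, j) to its count n_ij, with each genotype under one key only. The genotype counts in the usage line are hypothetical.

```python
from collections import defaultdict


def allele_frequencies(genotype_counts):
    """Estimate allele frequencies and their variances from genotype counts.

    genotype_counts maps (i, j) to n_ij, the number of A_i A_j individuals.
    Returns (p_hat, var, var_hwe), with n = number of individuals.
    """
    n = sum(genotype_counts.values())
    p_geno = {g: c / n for g, c in genotype_counts.items()}  # p_hat_ij = n_ij / n
    p = defaultdict(float)
    for (i, j), f in p_geno.items():
        if i == j:
            p[i] += f          # homozygote frequency enters with weight 1
        else:
            p[i] += f / 2      # heterozygotes contribute half to each allele
            p[j] += f / 2
    var = {i: (p[i] + p_geno.get((i, i), 0.0) - 2 * p[i] ** 2) / (2 * n) for i in p}
    var_hwe = {i: p[i] * (1 - p[i]) / (2 * n) for i in p}  # assumes HWE
    return dict(p), var, var_hwe


def allele_covariance(genotype_counts, p, i, j):
    """Cov(p_hat_i, p_hat_j) = (p_ij - 4 p_i p_j) / (4n) for i != j."""
    n = sum(genotype_counts.values())
    p_ij = (genotype_counts.get((i, j), 0) + genotype_counts.get((j, i), 0)) / n
    return (p_ij - 4 * p[i] * p[j]) / (4 * n)


counts = {("A", "A"): 25, ("A", "B"): 50, ("B", "B"): 25}  # hypothetical sample
p, var, var_hwe = allele_frequencies(counts)
print(p, var, var_hwe, allele_covariance(counts, p, "A", "B"))
```

This sample happens to be in exact Hardy-Weinberg proportions, so the general and HWE variances coincide.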

Probability of Observing an Allele

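Suppose there is an allele A_i with frequency p_i. What is the probability of sampling at least one allele of type A_i among n individuals (2n alleles)?

$$ P(\text{observing at least one } A_i \text{ allele}) = 1 - (1 - p_i)^{2n} $$

Sample size calculation: to observe at least one A_i allele with probability π_i, we need

$$ n = \frac{\log(1 - \pi_i)}{2\log(1 - p_i)} $$

rounded up to the next integer.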
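A minimal sketch of the detection probability and the sample-size formula; the function names and the example values (p_i = 0.05, target 0.95) are illustrative.

```python
from math import ceil, log


def prob_observe(p_i: float, n: int) -> float:
    """P(at least one A_i allele among the 2n alleles of n individuals)."""
    return 1 - (1 - p_i) ** (2 * n)


def sample_size(p_i: float, target: float) -> int:
    """Smallest n with prob_observe(p_i, n) >= target."""
    return ceil(log(1 - target) / (2 * log(1 - p_i)))


n = sample_size(0.05, 0.95)
print(n, prob_observe(0.05, n))
```

For an allele at frequency 0.05, about 30 individuals are needed to see it with 95% probability.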

Probability of Observing Multiple Alleles

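Let π_i be the probability of observing at least one allele of type i.

There are C(l, m) ways of selecting m different alleles. For the j-th such set S_j, let π_jm be the probability of observing exactly those m alleles (at least one copy of each and none of the others), calculated from the π_i by treating the detection events as independent:

$$ \pi_{jm} = \prod_{i \in S_j} \pi_i \prod_{i \notin S_j} (1 - \pi_i) $$

Then the probability of observing k or more alleles is obtained by summing these probabilities over all sets of size m = k, k+1, …, l:

$$ P(k \text{ or more alleles}) = \sum_{m=k}^{l} \sum_{j=1}^{\binom{l}{m}} \pi_{jm} $$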
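A minimal sketch of the exact enumeration, treating allele detections as independent with probabilities π_i (passed as the list `pi`, a name chosen here). The cost grows as 2^l, which is why an approximation is needed for loci with many alleles. Requires Python 3.8+ for `math.prod`.

```python
from itertools import combinations
from math import prod


def prob_at_least_k(pi, k):
    """P(k or more distinct alleles observed), enumerating every allele set.

    pi[i] is the probability of observing allele i at least once.
    """
    l = len(pi)
    total = 0.0
    for m in range(k, l + 1):
        for chosen in combinations(range(l), m):   # the C(l, m) sets of size m
            s = set(chosen)
            # Probability of seeing exactly these m alleles and none of the others
            total += prod(pi[i] if i in s else 1 - pi[i] for i in range(l))
    return total


print(prob_at_least_k([0.9, 0.5, 0.2], 2))
```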

Approximate Probability of Observing k or More Alleles

The above procedure becomes computationally difficult when there are many alleles and the frequencies are unequal.

There is a Monte Carlo approximation. For each allele, draw an independent random variable I_i that equals 1 with probability π_i and 0 otherwise, and compute

$$ I = \sum_{i=1}^{l} I_i $$

Repeat this for b simulated trials. The proportion of trials with I ≥ k is an estimate of the probability of observing k or more alleles.
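A minimal sketch of the Monte Carlo approximation using the standard `random` module. The number of trials b and the fixed seed are illustrative choices; the standard error of the estimate shrinks as 1/√b.

```python
import random


def prob_at_least_k_mc(pi, k, b=100_000, seed=1):
    """Monte Carlo estimate of P(I >= k), where I = sum of independent Bernoulli(pi[i])."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(b):
        count = sum(rng.random() < p for p in pi)   # I = I_1 + ... + I_l
        if count >= k:
            hits += 1
    return hits / b


print(prob_at_least_k_mc([0.9, 0.5, 0.2], 2))
```

For these π_i the exact enumeration gives 0.55, so with b = 100,000 trials the estimate should typically lie within about 0.005 of it.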

Summary

Permutations and combinations: counting the number of genotypes, mating types, etc.

Testing segregation ratios for dominant and codominant loci.

Testing for population heterogeneity. Screening for polymorphism. Estimating heterozygosity and the probability of observing an allele.