
PBG 650 Advanced Plant Breeding Module 1: Introduction Population Genetics – Hardy Weinberg Equilibrium – Linkage Disequilibrium


Page 1: PBG 650 Advanced Plant Breeding Module 1: Introduction Population Genetics – Hardy Weinberg Equilibrium – Linkage Disequilibrium

PBG 650 Advanced Plant Breeding

Module 1:
• Introduction
• Population Genetics
  – Hardy Weinberg Equilibrium
  – Linkage Disequilibrium

Page 2

Plant Breeding

“The science, art, and business of improving plants for human benefit”

Considerations:
– Crop(s)
– Production practices
– End-use(s)
– Target environments
– Type of cultivar(s)
– Traits to improve
– Breeding methods
– Source germplasm
– Time frame
– Varietal release and intellectual property rights

Bernardo, Chapter 1

Page 3

Plant Breeding

A common mistake that breeders make is to improve productivity without sufficient regard for other characteristics that are important to producers, processors and consumers.

• Well-defined Objectives
• Good Parents
• Genetic Variation
• Good Breeding Methods
• Functional Seed System

⇒ Adoption of Cultivars by Farmers

Page 4

Quantitative Traits

• Continuum of phenotypes (metric traits)
• Often many genes with small effects
• Environmental influence is greater than for qualitative traits
• Specific genes and their mode of inheritance may be unknown
• Analysis of quantitative traits
  – population parameters
    • means
    • variances
  – molecular markers linked to QTL

Page 5

Populations

• In the genetic sense, a population is a breeding group
  – individuals with different genetic constitutions
  – sharing time and space
• In animals, mating occurs between individuals
  – ‘Mendelian population’
  – genes are transmitted from one generation to the next
• In plants, there are additional ways for a population to survive
  – self-fertilization
  – vegetative propagation
• Definition of ‘population’ may be slightly broader for plants
  – e.g., lines from a germplasm collection

Falconer, Chapt. 1; Lynch and Walsh, Chapt. 4

Page 6

What do population geneticists do?

Study genes in populations
– Frequency and interaction of alleles
– Mating patterns, genotype frequencies
– Gene flow
– Selection and adaptation vs random genetic drift
– Genetic diversity and relationship
– Population structure

Related Fields
– Evolutionary Biology (e.g., crop domestication)
– Landscape Genetics

Page 7

Gene and genotype frequencies

For a population of diploid organisms:

                                 Alleles          Genotypes
                                 A1      A2       A1A1    A1A2    A2A2
Frequencies                      p       q        P11     P12     P22
Count (alleles / individuals)    80      120      16      48      36
Proportions                      0.4     0.6      0.16    0.48    0.36

$$p = P_{11} + \tfrac{1}{2}P_{12} = 0.16 + 0.24 = 0.40$$

$$q = P_{22} + \tfrac{1}{2}P_{12} = 0.36 + 0.24 = 0.60$$

p + q = 1

P11 + P12 + P22 = 1

Bernardo, Chapter 2

Page 8

Gene frequencies (another way)

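                                 Alleles          Genotypes
                                 A1      A2       A1A1    A1A2    A2A2
Frequencies                      p       q        P11     P12     P22
Count (alleles / individuals)    80      120      16      48      36
Proportions                      0.4     0.6      0.16    0.48    0.36

Number of individuals = N = N11 + N12 + N22 = 100

Number of alleles = 2N = N1 + N2 = 200

$$p = \frac{N_{11} + \tfrac{1}{2}N_{12}}{N} = \frac{2N_{11} + N_{12}}{2N} = \frac{2 \times 16 + 48}{200} = 0.40$$

$$q = \frac{N_{22} + \tfrac{1}{2}N_{12}}{N} = \frac{2N_{22} + N_{12}}{2N} = \frac{2 \times 36 + 48}{200} = 0.60$$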
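The allele-counting calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequencies` is hypothetical, and the counts are the ones from the table (16, 48 and 36 individuals).

```python
def allele_frequencies(n11, n12, n22):
    """Return (p, q) at a biallelic locus from counts of A1A1, A1A2 and A2A2."""
    n = n11 + n12 + n22                 # number of diploid individuals
    if n == 0:
        raise ValueError("at least one individual is required")
    # Each homozygote carries two copies of its allele, each heterozygote one.
    p = (2 * n11 + n12) / (2 * n)
    q = (2 * n22 + n12) / (2 * n)
    return p, q


p, q = allele_frequencies(16, 48, 36)
print(f"p = {p:.2f}, q = {q:.2f}")
```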

Page 9

Allele frequencies in crosses

Inbred x inbred

Alleles are unknown, but allele frequencies at segregating loci are known

F1 and F2: p = q = 0.5

        p          q
BC1     0.75       0.25
BC2     0.875      0.125
BC3     0.9375     0.0625
BC4     0.96875    0.03125

Value of q is reduced by ½ in each backcross generation
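The backcross table can be regenerated by halving q repeatedly, starting from the F1 value of 0.5. The sketch below assumes p is the frequency of the recurrent-parent allele and q the frequency of the donor allele; the function name `backcross_frequencies` is hypothetical.

```python
def backcross_frequencies(n_backcrosses):
    """Expected allele frequencies after each backcross to the recurrent parent."""
    q = 0.5                          # F1: one allele from each inbred parent
    table = []
    for generation in range(1, n_backcrosses + 1):
        q /= 2                       # each backcross halves the donor contribution
        table.append((f"BC{generation}", 1 - q, q))
    return table


for name, p_bc, q_bc in backcross_frequencies(4):
    print(f"{name}: p = {p_bc}, q = {q_bc}")
```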

Page 10

Factors that may change gene frequencies

• Population size
  – changes may occur due to sampling
  assume ‘large’ population
• Differences in fertility and viability
  – parents may differ in fertility
  – gametes may differ in viability
  – progeny may differ in survival rate
  assume no selection
• Migration and mutation
  assume no migration and no mutation

Page 11

Factors that may change genotype frequencies

Changes in genotype frequency (not gene frequency)

• Mating system
  – assortative or disassortative mating
  – selfing
  – geographic isolation
  assume that mating occurs at random (panmixia)

Page 12

Hardy-Weinberg Equilibrium

• Assumptions
  – large, random-mating population
  – no selection, mutation, migration
  – normal segregation
  – equal gene frequencies in males and females
  – no overlap of generations (no age structure)

• Note that assumptions only need to be true for the locus in question

Gene and genotype frequencies remain constant from one generation to the next

Genotype frequencies in progeny can be predicted from gene frequencies of the parents

Equilibrium attained after one generation of random mating

Page 13

Hardy-Weinberg Equilibrium

               Genes in parents    Genotypes in progeny
               A1       A2         A1A1         A1A2         A2A2
Frequencies    p        q          P11 = p^2    P12 = 2pq    P22 = q^2
Example        0.4      0.6        0.16         0.48         0.36

Expected genotype frequencies are obtained by expanding the binomial

$$(p + q)^2 = p^2 + 2pq + q^2 = 1$$

                A1 (p = 0.4)    A2 (q = 0.6)
A1 (p = 0.4)    p^2 = .16       pq = .24
A2 (q = 0.6)    pq = .24        q^2 = .36

Page 14

Equilibrium with multiple alleles

For multiple alleles, expected genotype frequencies can be found by expanding the multinomial $(p_1 + p_2 + \dots + p_n)^2$

For example, for three alleles:

$$(p_1 + p_2 + p_3)^2 = p_1^2 + 2p_1p_2 + 2p_1p_3 + p_2^2 + 2p_2p_3 + p_3^2$$

Corresponding genotypes:

A1A1 A1A2 A1A3 A2A2 A2A3 A3A3

Lynch and Walsh (pg 57) describe equilibrium for autopolyploids
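The multinomial expansion for any number of alleles can be sketched in Python as follows. The allele frequencies used here (0.5, 0.3, 0.2) are illustrative assumptions, not values from the slides, and the function name `hwe_genotype_frequencies` is hypothetical.

```python
from itertools import combinations_with_replacement


def hwe_genotype_frequencies(allele_freqs):
    """Expected Hardy-Weinberg genotype frequencies for any number of alleles.

    allele_freqs maps allele names to frequencies that sum to 1.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        # Homozygotes contribute p_i^2, heterozygotes 2 * p_i * p_j.
        genotypes[a + b] = pa * pa if a == b else 2 * pa * pb
    return genotypes


freqs = hwe_genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
for genotype, frequency in freqs.items():
    print(genotype, round(frequency, 4))
```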

Page 15

Relationship between gene and genotype frequencies

• f(A1A2) has a maximum of 0.5, which occurs when p=q=0.5
• Most copies of a rare allele occur in heterozygotes
• Implications for
  – F1?
  – F2?
  – Any BC?

[Figure: genotype frequency (0 to 1) of A1A1, A1A2 and A2A2 plotted against the frequency of A2 (0 to 1)]

Page 16

Applications of the Hardy-Weinberg Law

• Predict genotype frequencies in random-mating populations
• Use frequency of recessive genotypes to estimate the frequency of a recessive allele in a population
  – Example: assume that the incidence of individuals homozygous for a recessive allele is about 1/11,000.
    $q^2 = 1/11{,}000 \Rightarrow q \approx 0.0095$
• Estimate frequency of individuals that are carriers for a recessive allele
    $p = 1 - 0.0095 = 0.9905 \Rightarrow 2pq = 0.0188 \approx 2\%$
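As a minimal sketch, the carrier calculation can be written in Python, assuming Hardy-Weinberg proportions hold. The function name `carrier_frequency` is hypothetical. Because the code does not round q before computing 2pq, its result can differ from the slide's 0.0188 in the last digit.

```python
import math


def carrier_frequency(recessive_incidence):
    """Return (q, 2pq) given the frequency of recessive homozygotes, q^2."""
    q = math.sqrt(recessive_incidence)   # q^2 = incidence under Hardy-Weinberg
    p = 1 - q
    return q, 2 * p * q


q_rec, carriers = carrier_frequency(1 / 11000)
print(f"q = {q_rec:.4f}, carriers = {carriers:.4f}")
```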

Page 17

Testing for Hardy-Weinberg Equilibrium

All genotypes must be distinguishable

            Genotypes                      Gene frequencies
            A1A1      A1A2      A2A2       A1        A2
Observed    233       385       129        0.5696    0.4304
Expected    242.36    366.26    138.38

N = N11 + N12 + N22 = 233 + 385 + 129 = 747

$$\hat{p}_1 = \frac{N_{11} + 0.5\,N_{12}}{N} = \frac{233 + 0.5(385)}{747} = 0.5696$$

$$E(N_{11}) = \hat{p}_1^{\,2}\,N = 0.5696^2 \times 747 = 242.36$$

Page 18

Chi-square test for Hardy-Weinberg Equilibrium

• Accept (fail to reject) H0: no reason to think that assumptions for Hardy-Weinberg equilibrium have been violated
  – does not tell you anything about the fertility of the parents
• When you reject H0, there is an indication that one or more of the assumptions is not valid
  – does not tell you which assumption is not valid

$$\chi^2 = \sum \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}} = 1.96$$

$$\text{critical } \chi^2_{1\,df} = 3.84$$

only 1 df because gene frequencies are estimated from the progeny data

Example in Excel
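The slides point to an Excel example; the same calculation can be sketched in Python for the observed counts 233, 385 and 129. The function name `hwe_chi_square` is hypothetical, and the 3.84 threshold is the critical value for 1 df quoted on the slide. The code does not round the allele frequency to four decimals, so the expected counts can differ from the slide's values in the second decimal place.

```python
def hwe_chi_square(n11, n12, n22):
    """Chi-square goodness-of-fit statistic for Hardy-Weinberg proportions."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)        # allele frequency estimated from the sample
    q = 1 - p
    observed = (n11, n12, n22)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


chi2, expected = hwe_chi_square(233, 385, 129)
CRITICAL_1DF = 3.84                       # 5% critical value for 1 df, from the slide
print(f"chi-square = {chi2:.2f}, reject H0: {chi2 > CRITICAL_1DF}")
```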

Page 19

Exact Test for Hardy-Weinberg Equilibrium

• Chi-square is only appropriate for large sample sizes
• If sample sizes are small or some alleles are rare, Fisher’s Exact test is a better alternative
  – Calculate the probability of all possible arrays of genotypes for the observed numbers of alleles
  – Rank outcomes in order of increasing probability
  – Reject those that constitute a cumulative probability of <5%

$$\Pr(N_{AA}, N_{Aa}, N_{aa} \mid n_A, n_a) = \frac{N!\; n_A!\; n_a!\; 2^{N_{Aa}}}{N_{AA}!\; N_{Aa}!\; N_{aa}!\; (2N)!}$$

Example in Excel

Weir (1996) Chapt. 3
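A minimal Python sketch of the exact test follows. It enumerates every genotype array compatible with the observed allele counts, then sums the probabilities of the arrays that are no more probable than the observed one. That sum is a p-value, and comparing it to 0.05 corresponds to the ranking-and-rejection procedure described above. The function names and the sample counts (5, 2, 3) are hypothetical.

```python
import math


def configuration_probability(n_AA, n_Aa, n_aa):
    """Probability of one genotype array given its allele counts, under HWE."""
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa
    n_a = 2 * n_aa + n_Aa
    # Work with log-factorials (lgamma(k + 1) = ln k!) to avoid huge integers.
    log_prob = (
        math.lgamma(n + 1) + math.lgamma(n_A + 1) + math.lgamma(n_a + 1)
        + n_Aa * math.log(2)
        - math.lgamma(n_AA + 1) - math.lgamma(n_Aa + 1) - math.lgamma(n_aa + 1)
        - math.lgamma(2 * n + 1)
    )
    return math.exp(log_prob)


def hwe_exact_p_value(n_AA, n_Aa, n_aa):
    """Sum the probabilities of all arrays no more probable than the observed one."""
    n_A = 2 * n_AA + n_Aa
    n_a = 2 * n_aa + n_Aa
    observed = configuration_probability(n_AA, n_Aa, n_aa)
    p_value = 0.0
    # The heterozygote count must have the same parity as n_A.
    for het in range(n_A % 2, min(n_A, n_a) + 1, 2):
        prob = configuration_probability((n_A - het) // 2, het, (n_a - het) // 2)
        if prob <= observed * (1 + 1e-9):    # small tolerance for rounding error
            p_value += prob
    return p_value


# Hypothetical small sample: 5 AA, 2 Aa, 3 aa
print(hwe_exact_p_value(5, 2, 3))
```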

Page 20

Likelihood Ratio Test

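$$\frac{L(\hat{\theta}_r \mid z)}{L(\hat{\theta} \mid z)}$$

Numerator: maximum of the likelihood function given the data (z) when some parameters are assigned hypothesized values

Denominator: maximum of the likelihood function given the data (z) when there are no restrictions

When the hypothesis is true:

$$LR = -2\ln\left[\frac{L(\hat{\theta}_r \mid z)}{L(\hat{\theta} \mid z)}\right] \sim \chi^2$$

df = # parameters assigned values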

Likelihood ratio tests for multinomial proportions are often called G-tests (for goodness of fit)

Lynch and Walsh Appendix 4

Page 21

Likelihood Ratio Test for HWE

$$G = -2\sum_{i=1}^{n}\sum_{j \ge i}^{n} N_{ij}\,\ln\!\left(\frac{\hat{N}_{ij}}{N_{ij}}\right)$$

where $\hat{N}_{ij}$ is the expected number and $N_{ij}$ is the observed number of the ijth genotype

Calculations in Excel
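As an alternative to the spreadsheet, the G statistic for a biallelic locus can be sketched in Python. The example reuses the counts 233, 385 and 129 from the chi-square slide, and the function name `hwe_g_statistic` is hypothetical. Genotype classes with an observed count of zero are skipped because they contribute nothing to the sum.

```python
import math


def hwe_g_statistic(n11, n12, n22):
    """Likelihood ratio (G) statistic for Hardy-Weinberg proportions."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)
    q = 1 - p
    observed = (n11, n12, n22)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # G = -2 * sum(N_ij * ln(expected_ij / N_ij)) over observed classes
    return -2 * sum(o * math.log(e / o) for o, e in zip(observed, expected) if o > 0)


g = hwe_g_statistic(233, 385, 129)
print(f"G = {g:.2f} (compare with the 1 df critical value 3.84)")
```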

Page 22

Gametic phase equilibrium

Lynch and Walsh, pg 94-100; Falconer, pg 15-19

Random association of alleles at different loci (independence)

        B       b
A       PAB     PAb     pA
a       PaB     Pab     pa
        pB      pb

Equilibrium: PAB = pA pB

Disequilibrium:

$$D_{AB} = P_{AB} - p_A p_B$$

$$D_{AB} = P_{AB}P_{ab} - P_{Ab}P_{aB}$$

Example:

        B       b
A       .40     .10     .50
a       .10     .40     .50
        .50     .50

DAB = 0.40 – 0.5*0.5 = 0.15

DAB = 0.4*0.4 – 0.1*0.1 = 0.15
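Both expressions for D can be checked numerically. The sketch below uses the gametic frequencies from the example (0.40, 0.10, 0.10, 0.40); the function name `gametic_disequilibrium` is hypothetical.

```python
def gametic_disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return D computed two ways from the four gametic frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("gametic frequencies must sum to 1")
    p_A = p_AB + p_Ab                     # marginal frequency of allele A
    p_B = p_AB + p_aB                     # marginal frequency of allele B
    d_observed_minus_expected = p_AB - p_A * p_B
    d_cross_product = p_AB * p_ab - p_Ab * p_aB
    return d_observed_minus_expected, d_cross_product


d1, d2 = gametic_disequilibrium(0.40, 0.10, 0.10, 0.40)
print(f"D = {d1:.2f} (observed minus expected), D = {d2:.2f} (cross product)")
```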

Page 23

Linkage Disequilibrium

• Nonrandom association of alleles at different loci
  – the covariance in frequencies of alleles between the loci
• Refers to frequencies of alleles in gametes (haplotypes)
• May be due to various causes in addition to linkage
  – ‘gametic phase disequilibrium’ is a more accurate term
  – ‘linkage disequilibrium’ (LD) is widely used to describe associations of alleles in the same or in different linkage groups

Page 24

Linkage Disequilibrium

Gametic types     AB        Ab        aB        ab
Observed          PAB       PAb       PaB       Pab
Expected          pA pB     pA pb     pa pB     pa pb
Disequilibrium    +D        -D        -D        +D

Excess of coupling phase gametes: +D
Excess of repulsion phase gametes: -D

Page 25

Sources of linkage disequilibrium

• Linkage
• Multilocus selection (particularly with epistasis)
• Assortative mating
• Random drift in small populations
• Bottlenecks in population size
• Migration or admixtures of different populations
• Founder effects
• Mutation

Page 26

Two locus equilibrium

• For two loci, it may take many generations to reach equilibrium even when there is independent assortment and all other conditions for equilibrium are met
  – New gamete types can only be produced when the parent is a double heterozygote

Parent AABb (single heterozygote) produces gametes: 0.5 AB, 0.5 Ab

Parent AaBb (double heterozygote) produces gametes: 0.25 AB, 0.25 aB, 0.25 Ab, 0.25 ab

Page 27

Decay of linkage disequilibrium

• In the absence of linkage, LD decays by one-half with each generation of random mating

c = recombination frequency

$$D_t = (1 - c)\,D_{t-1}$$

$$D_t = (1 - c)^t\,D_0$$

[Figure: disequilibrium (D, 0.00 to 0.25) versus generation (0 to 100) for c = .50, c = .20, c = .10 and c = .01]
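The decay of D under random mating, where D is multiplied by (1 - c) each generation, can be tabulated with a short Python sketch. The starting value D0 = 0.25 is an assumption taken from the top of the plot's axis, and the function names are hypothetical.

```python
import math


def ld_after(d0, c, t):
    """Disequilibrium after t generations of random mating: (1 - c)^t * D0."""
    return (1 - c) ** t * d0


def generations_to_fraction(c, fraction=0.5):
    """Generations needed for D to fall to `fraction` of its starting value."""
    return math.log(fraction) / math.log(1 - c)


for c in (0.50, 0.20, 0.10, 0.01):
    print(f"c = {c:.2f}: D after 10 generations = {ld_after(0.25, c, 10):.4f}, "
          f"generations to halve D = {generations_to_fraction(c):.1f}")
```

With c = 0.5 (no linkage) D halves in one generation, while with c = 0.01 it takes roughly 69 generations, which is why tight linkage preserves disequilibrium for so long.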

Page 28

Factors that delay approach to equilibrium

• Linkage

• Selfing – because it decreases the frequency of double heterozygotes

• Small population size – because it reduces the likelihood of obtaining rare recombinants

$$D_t = (1 - c)^t\,D_0$$

Page 29

Implications for breeding

• Gametic Phase Disequilibrium that is not due to linkage is eliminated by making the F1 cross
• Recombination occurs during selfing
• There would be greater recombination with additional random mating, but it may not be worth the time and resources

P1: A1A1B1B1 x P2: A2A2B2B2

F1: A1A2B1B2

gamete    frequency
A1B1      0.5*(1-c)
A1B2      0.5*c
A2B1      0.5*c
A2B2      0.5*(1-c)

[Figure: Effect of inbreeding on the frequency of a recombinant genotype. Frequency of A1A1B2B2 (0.00 to 0.25) versus c = recombination frequency (0 to 0.5); series: Inbreds, F2, F2 (adjusted)]
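The F1 gamete frequencies can be turned into a small Python sketch. It also computes the F2 frequency of the recombinant homozygote A1A1B2B2, which requires two A1B2 gametes to unite. Only the unadjusted F2 value is covered here; the "Inbreds" and "F2 (adjusted)" series in the figure are not reproduced. The function names are hypothetical.

```python
def f1_gamete_frequencies(c):
    """Gamete frequencies from the F1 A1A2B1B2 (coupling phase A1B1 / A2B2)."""
    if not 0 <= c <= 0.5:
        raise ValueError("recombination frequency must be between 0 and 0.5")
    return {
        "A1B1": 0.5 * (1 - c),   # parental
        "A1B2": 0.5 * c,         # recombinant
        "A2B1": 0.5 * c,         # recombinant
        "A2B2": 0.5 * (1 - c),   # parental
    }


def f2_recombinant_homozygote(c):
    """Frequency of A1A1B2B2 in the F2: union of two A1B2 gametes."""
    return f1_gamete_frequencies(c)["A1B2"] ** 2


for c in (0.1, 0.3, 0.5):
    print(f"c = {c}: f(A1A1B2B2) in F2 = {f2_recombinant_homozygote(c):.4f}")
```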

Page 30

Effect of mating system on LD decay

$$\tilde{c} = c\left(1 - \frac{s}{2 - s}\right)$$

$\tilde{c}$ = effective recombination rate
s = the fraction of selfing

[Figure: D' (0 to 1) versus generation (1 to 40) for c = 0.05, 0.25 and 0.50, each with s = 0.00 (outcrossing) and s = 0.99 (99% selfing); c = 0.50 corresponds to no linkage]
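The effect of selfing on LD decay can be explored with a short Python sketch. It assumes that the effective recombination rate is c(1 - s/(2 - s)), a commonly used approximation for partial selfing, and that D decays as (1 - effective rate)^t. Both function names are hypothetical, and the starting value of 1.0 simply mimics the scaled axis of the plot.

```python
def effective_recombination(c, s):
    """Effective recombination rate for recombination c and selfing fraction s.

    Assumes the approximation c * (1 - s / (2 - s)).
    """
    if not 0 <= s <= 1:
        raise ValueError("selfing fraction must be between 0 and 1")
    return c * (1 - s / (2 - s))


def ld_under_selfing(d0, c, s, t):
    """Disequilibrium after t generations, using the effective recombination rate."""
    return (1 - effective_recombination(c, s)) ** t * d0


for c in (0.05, 0.25, 0.50):
    for s in (0.00, 0.99):
        remaining = ld_under_selfing(1.0, c, s, 40)
        print(f"c = {c:.2f}, s = {s:.2f}: fraction of LD left after 40 generations = {remaining:.3f}")
```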

Page 31

Alternative measures of LD

• D is the covariance between alleles at different loci
• Maximum values of D depend on allele frequencies
• It is convenient to consider r2 to be the square of the correlation coefficient, but it can only attain a value of 1 when allele frequencies at the two loci are the same
• r2 indicates the degree of association between alleles at different loci due to various causes (linkage, mutation, migration)

$$r^2 = \frac{D_{AB}^2}{p_A\,p_a\,p_B\,p_b}$$
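As a sketch, r2 can be computed from the four gametic frequencies as the squared disequilibrium divided by the product of the four allele frequencies. The example reuses the gametic frequencies 0.40, 0.10, 0.10, 0.40 from the gametic phase equilibrium slide; the function name `r_squared` is hypothetical.

```python
def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 = D^2 / (pA * pa * pB * pb) from gametic (haplotype) frequencies."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    d = p_AB - p_A * p_B
    denominator = p_A * p_a * p_B * p_b
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


r2 = r_squared(0.40, 0.10, 0.10, 0.40)
print(f"r^2 = {r2:.2f}")
```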

Page 32

D – minimum and maximum values

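        B                   b
A       PAB = pApB + D      PAb = pApb - D      pA
a       PaB = papB - D      Pab = papb + D      pa
        pB                  pb

If D > 0, look for the maximum value D can have:

$$P_{Ab} = p_A p_b - D \ge 0 \;\Rightarrow\; D \le p_A p_b$$

$$P_{aB} = p_a p_B - D \ge 0 \;\Rightarrow\; D \le p_a p_B$$

$$D \le \min(p_A p_b,\; p_a p_B)$$

If D < 0, look for the minimum value D can have:

$$P_{AB} = p_A p_B + D \ge 0 \;\Rightarrow\; D \ge -p_A p_B$$

$$P_{ab} = p_a p_b + D \ge 0 \;\Rightarrow\; D \ge -p_a p_b$$

$$D \ge \max(-p_A p_B,\; -p_a p_b)$$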

fyi

Page 33

Alternative measures of LD

• D’ is scaled to have a minimum of 0 and a maximum of 1
• D’ indicates the degree to which gametes exhibit the maximum potential disequilibrium for a given array of allele frequencies
• D’=1 indicates that one of the haplotypes is missing
• D’ is very unstable for small sample sizes, so r2 is more widely utilized to measure LD

When DAB > 0:

$$D' = \frac{D_{AB}}{\min(p_A p_b,\; p_a p_B)}$$

When DAB < 0:

$$D' = \frac{D_{AB}}{(-1)\cdot\min(p_A p_B,\; p_a p_b)}$$

fyi
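A minimal Python sketch of D' follows, scaling D by the largest magnitude it could take for the given allele frequencies. The function name `d_prime` is hypothetical. The first example uses the gametic frequencies 0.40, 0.10, 0.10, 0.40 from the gametic phase equilibrium slide; the second is an invented case in which the Ab haplotype is missing.

```python
def d_prime(p_AB, p_Ab, p_aB, p_ab):
    """Scaled disequilibrium D' (0 to 1) from gametic (haplotype) frequencies."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    d = p_AB - p_A * p_B
    if d >= 0:
        d_max = min(p_A * p_b, p_a * p_B)   # bound when D > 0
    else:
        d_max = min(p_A * p_B, p_a * p_b)   # magnitude of the bound when D < 0
    if d_max == 0:
        raise ValueError("both loci must be polymorphic")
    return abs(d) / d_max


print(d_prime(0.40, 0.10, 0.10, 0.40))   # partial disequilibrium
print(d_prime(0.50, 0.00, 0.25, 0.25))   # hypothetical case with Ab missing
```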

Page 34

Testing for gametic phase disequilibrium

• Best when you can determine haplotypes

– inbred lines or doubled haploids

– haplotypes of double heterozygotes inferred from progeny tests

• Use a Goodness of Fit test if the sample size is large

– Chi-square

– G-test (likelihood ratio)

• Use Fisher’s exact test for smaller sample sizes

• Use a permutation test for multiple alleles

• Need a fairly large sample to have reasonable power for LD (~200 individuals or more)

See Weir (1996) pg 112-133 for more information

Page 35

Depiction of Linkage Disequilibrium

Disequilibrium matrix for polymorphic sites within sh1 in maize

[Figure: disequilibrium matrix showing r2 and probability values from Fisher’s Exact Test]

Flint-Garcia et al., 2003. Annual Review of Plant Biology 54: 357-374.

Page 36

Extent of LD in Maize

Linkage disequilibrium (r2) across the 10 maize chromosomes measured with 914 SNPs in a global collection of 632 maize inbred lines.

Average LD decay distance is 5–10 kb

Yan et al. 2009. PLoS ONE 4(12): e8451

Page 37

Extent of LD in Barley

Elite North American Barley

Average LD decay distance is ~5 cM

[Figure: r2 in two panels, "No adjustment for population structure" and "Adjusted for population structure"]

Waugh et al., 2009, Current Opinion in Plant Biology 12:218-222

Other studies
– Wild barley: LD decays within a gene
– Landraces: ~90 kb
– European germplasm: significant LD, mean 3.9 cM, median 1.16 cM

Page 38

References on linkage disequilibrium

Flint-Garcia et al., 2003. Structure of linkage disequilibrium in plants. Annual Review of Plant Biology 54: 357–374.

Gupta et al., 2005. Linkage disequilibrium and association studies in higher plants: present status and future prospects. Plant Molecular Biology 57: 461–485.

Mangin et al., 2012. Novel measures of linkage disequilibrium that correct the bias due to population structure and relatedness. Heredity 108: 285–291.

Slatkin, M. 2008. Linkage disequilibrium – understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics 9: 477–485.

Waugh, R., Jean-Luc Jannink, G.J. Muehlbauer, L. Ramsay. 2009. The emergence of whole genome association scans in barley. Current Opinion in Plant Biology 12(2): 218–222.

Yan, J., T. Shah, M.L. Warburton, E.S. Buckler, M.D. McMullen, et al. 2009. Genetic characterization and linkage disequilibrium estimation of a global maize collection using SNP markers. PLoS ONE 4(12): e8451.

Zhu et al., 2008. Status and prospects of association mapping in plants. The Plant Genome 1: 5–20.