Natural Selection and Haplotype Trees
Recall From the Neutral Theory:
• Rate of Neutral Evolution Over Long Periods of Time (interspecific) = (1/2N) × (2Nµ) = µ (the neutral mutation rate)
• Amount of Intraspecific Polymorphism (as measured by expected heterozygosity) = θ/(1+θ) ≅ θ = 4Nµ
• We do not know µ, and µ can vary from locus to locus or even base pair to base pair, but the Ratio of Intraspecific Polymorphism to Interspecific Divergence should be proportional to θ/µ = 4Nµ/µ = 4N for all loci and base pairs under neutrality
Maynard Smith (1970) Suggested This Ratio As a Test of Neutrality
• Problem Then: Interspecific Divergence Data Came from Amino Acid Sequencing, and Intraspecific Polymorphism Data Came From Protein Electrophoresis. The methods were non-comparable, so the Ratio was highly suspect for technical reasons
• This problem disappears with Haplotype Trees that span the intra- and interspecific domains.
Inversion Trees for Drosophila Anticipated This Approach:
E.g., The Hawaiian Drosophila Inversion Tree
Inversion Evolution In Hawaii
• Much Speciation Associated with Inter-Island Founder Events
• Most Founders Are Probably Single Gravid Females Inseminated By Multiple Males
• If a Female Mated with n Males, Then The Founder Events Have 2+n X Chromosomes Compared to 2+2n Autosomes.
• Templeton (1987) Used This to Test Ohta’s “Nearly Neutral” Theory That There Will Be Effectively Neutral Evolutionary Rates Whenever |s| < 1/(4N). This implies that X chromosomes should evolve more rapidly than Autosomes for Hawaiian Drosophila.
Can Test This Null Hypothesis With a Contingency Chi-Square Test

Tree Position                 Mutation Type I   Mutation Type II   Mutation Type III   Sum
Fixed Between Species         a                 b                  c                   nA = a+b+c
Polymorphic Within Species    d                 e                  f                   nB = d+e+f
Sum                           nI = a+d          nII = b+e          nIII = c+f          N = a+b+c+d+e+f

Under the null hypothesis, the expected number in any cell = (row sum)(column sum)/N. For example, the expected number in Mutation Type I, Tree Position Fixed = (nA)(nI)/N. After calculating the expected numbers, perform an ordinary chi-square test with degrees of freedom = (no. of rows − 1)(no. of columns − 1).
Basic Idea of The Contingency Chi-Square Test
Choose Tree Positions That Should Be Sensitive To the Operation of Natural Selection
a) Fixed differences represent mutations that arose within a species, increased in frequency until fixation, and then have persisted in time in the entire species. This demonstrates a certain degree of evolutionary success
b) Polymorphisms represent mutations that arose within a species but have not gone to fixation: less demonstrated evolutionary success
Choose Mutational Types That Should Be Sensitive to the Operation of Natural Selection
Inversion Evolution In Hawaii

                          X chrom   Chrom 2   Chrom 3   Chrom 4   Chrom 5
Fixed between species     59        11        14        32        11
Polymorphic within        13        9         18        35        11

χ2 = 23.19, df = 4
Control: The Mainland repleta Group of Drosophila

                          X chrom   Chrom 2   Chrom 3   Chrom 4   Chrom 5
Fixed between species     12        79        16        2         8
Polymorphic within        7         86        15        5         5

χ2 = 3.62, df = 4
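The contingency chi-square calculation described earlier can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis code: the function name `contingency_chi_square` is made up for this example, and the two tables are simply the Hawaiian and repleta counts listed above. The 5% critical value for 4 degrees of freedom (9.488) is a standard tabulated value.

```python
def contingency_chi_square(table):
    """Chi-square statistic and degrees of freedom for a rows x columns table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity = (row sum)(column sum)/N
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: fixed between species, polymorphic within species.
# Columns: X chromosome, chromosomes 2, 3, 4, 5.
hawaiian = [[59, 11, 14, 32, 11], [13, 9, 18, 35, 11]]
repleta = [[12, 79, 16, 2, 8], [7, 86, 15, 5, 5]]

CRITICAL_5PCT_DF4 = 9.488  # tabulated chi-square critical value, df = 4

for name, table in (("Hawaiian", hawaiian), ("repleta", repleta)):
    chi2, df = contingency_chi_square(table)
    print(name, round(chi2, 2), df, "reject" if chi2 > CRITICAL_5PCT_DF4 else "do not reject")
```

Run on these counts, the statistic should land close to the χ2 values reported on the slides, with homogeneity rejected for the Hawaiian inversions but not for the mainland control.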
Haplotype Trees Allow Simple Tests Of Hypotheses About Natural Selection Both Within and Between Species
Redundancy of the Genetic Code
The McDonald-Kreitman (1991) Test For Protein Coding DNA
Example: Eanes et al. [1993. PNAS 90:7475] surveyed Drosophila simulans and D. melanogaster using DNA sequencing to find levels of polymorphism at the G6PD locus.

               Amino acid replacement   Synonymous
polymorphic    2                        36
fixed          21                       26

Use Fisher’s Exact Test (too few observations for standard χ2 contingency tests) to reject homogeneity (neutrality) at p < 0.001.
Uses the Same Tree Positions as the Earlier Inversion Work: polymorphic within species or fixed between them.
Uses the mutational categories (for protein coding DNA) of nonsynonymous (amino acid replacement) and synonymous (silent). It is expected that replacement mutations should be more sensitive to selection than silent mutations.
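As a rough check on the G6PD example, Fisher's exact test for a 2×2 table can be written directly from the hypergeometric distribution. The sketch below is an assumption-laden illustration: the function name is invented here, and the two-sided p value is defined as the sum of the probabilities of all tables no more probable than the observed one, which is one common convention among several.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over every table at least as extreme (no more probable) than observed
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# G6PD counts: rows = polymorphic, fixed; columns = replacement, synonymous
p_value = fisher_exact_two_sided(2, 36, 21, 26)
print(p_value)
```

With these counts the p value should fall below the 0.001 threshold quoted on the slide.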
Positions in the Haplotype Tree
• Tip Branch: A branch leading to only one haplotype. (Recent and with no known descendants, unproven evolutionarily or an evolutionary dead end.)
• Interior Branch: A branch leading from one internal node to another. (Successful evolutionarily because an internal node has two or more descendant haplotypes, and interior mutations have persisted for longer periods of time on average than tip mutations.)
• Interspecific Branch: A branch leading from a part of the tree defined by haplotypes from one species to a haplotype or clade of haplotypes in another species or common ancestral species. (Successful evolutionarily because it has gone to fixation within a species and has persisted for long periods of time.)
• To Make These Categories Mutually Exclusive, Can Separate Tips and Interiors Only For Intraspecific Portions of the Haplotype Tree.
Evolution of Cytochrome Oxidase II

Region                    Tree Position   Silent   Replacement
N-term (transmembrane)    Tip             8        2
                          Interior        10       3
                          Fixed           60       6
                          Probability = 0.27
C-term (active site)      Tip             12       7
                          Interior        12       2
                          Fixed           53       2
                          Probability = 0.004
There is natural selection against amino acid changes in the C-terminal half.
Sometimes, small sample sizes require pooling.
Note, the McDonald-Kreitman test pools together tips and intraspecific interiors versus fixed. This pooling does not make sense.
A better pooling is tips versus intraspecific interiors and fixed (called “young” vs. “old”)
Nucleotide Polymorphism at the RpII215 Gene in Drosophila subobscura: Weak Selection on Synonymous Mutations (Llopart & Aguadé, Genetics 2000)
Synonymous codons are divided into preferred and unpreferred to reflect codon usage in this gene in the genus Drosophila.
Neutrality Implies That The Ratio of Fixed Mutations to Polymorphic Mutations Should Be Constant Across All Loci
The HKA Test (Hudson, Kreitman & Aguadé, Genetics 1987) Tests For Homogeneity Of This Ratio Across ≥ 2 Loci With a Chi-Square Test Statistic
Can Examine Ratio of Nonsynonymous Substitutions Per Nonsynonymous Site (dN) to Synonymous Substitutions Per Synonymous Site (dS) on Each Branch of a Haplotype Tree to Examine Heterogeneity in Natural Selection:
• Under Neutrality, Expect dN/dS = 1
• Under Purifying Selection (favors maintaining the current structure): Expect dN/dS < 1
• Under Directional Selection (favors change): Expect dN/dS > 1
One Potential Problem With This Approach is the Calculation of the Total Number of Nonsynonymous and Synonymous Substitutions (the denominators in dN and dS).
This requires a mutation model, and often the default is the Jukes-Cantor Model.
For example, these numbers could be strongly altered by a strong transition bias. It is essential to fit the mutational model to the data first.
DO NOT USE ANY PROGRAM THAT DOES NOT REQUIRE YOU TO FIT A MUTATION MODEL!
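To make the dependence on the mutation model concrete, here is a minimal sketch of how a Jukes-Cantor correction enters a dN/dS calculation. It assumes the counts of differences and of sites have already been obtained by some counting scheme; the function names and the example counts are hypothetical and are not taken from any study cited in these slides. Because Jukes-Cantor assumes equal rates for all substitution types, this is exactly the kind of default the warning above is about, and a model fitted to the data would give different values when there is a transition bias.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """dN/dS with a Jukes-Cantor correction applied to each class of sites."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return d_n / d_s


# Hypothetical counts, for illustration only
omega = dn_ds(nonsyn_diffs=5, nonsyn_sites=300, syn_diffs=20, syn_sites=100)
print(omega)  # below 1 here, the pattern expected under purifying selection
```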
E.g., Evolution of Abalone Sperm Lysin
Lysin Is Involved in Fertilization and Gamete Recognition
ω = dN/dS
Yang et al. (Mol. Biol. Evol. 2000) Considered Several Models:
• M0, only one ω on all branches
• M1, mixture of conserved sites (ω = 0) and neutral sites (ω = 1)
• M2, M1 plus some selected sites with ω > 1
• M3, a mixture of three site classes with ω0, ω1, and ω2
• M7, ω’s are distributed as a Beta pdf
• M8, M7 plus selected sites with ω > 1
• Use Bayesian procedures to identify specific sites with high probability of ω > 1
Sites Under Positive Selection Are Involved in Gamete Recognition of Egg By the Sperm
E.g., Evolution of the Pyrin Locus
This locus codes for An Activator of An Inactivating Factor Involved in Peritoneal Fluid Balance (and Other Areas of the Body).
Most Branches Have dN/dS < 1, Implying That Most of the Time Selection Maintains a Conserved Amino Acid Sequence
But Some Branches Have dN/dS > 1, Implying Episodes of Selectively Driven Change
Substitutions On the Human/Ape Lineage Are Functionally Important. Many Mutations Are Known at This Locus That Lead to An Autosomal Recessive Disease Called Familial Mediterranean Fever. Virtually All of These Mutations Are Reversals of the Amino Acid Substitutions On the Branch Leading to Humans and Apes.
In General, Many Human Genetic Diseases Are Caused By Mutations That Affect Sites Subject to Directional Selection Or to Purifying Selection.
Therefore, Evolutionary Analyses Are Now Standard As A Way Of Identifying Those Mutations Most Likely to Cause Disease In Humans.
These Studies Also Show That The Impact of Amino Acid Replacements Is Context Dependent
These Tests Are Also Being Used to Estimate Which Loci Are Selected in Specific Speciation Events. E.g., Evolution of the ASPM Locus In Humans
Loss of function mutations in this gene cause severe reductions in the cerebral cortical size of affected humans
[Figure: dN/dS ratios in the primate ASPM haplotype tree]
Studies Are Now Underway To Identify Those Genes Subjected to Positive Selection In The Lineage Leading To Humans. Results As Of 2004 (Vallender & Lahn, 2004):

Function of Gene              No. of Genes
Host-Pathogen Interactions    23
Reproduction                  14
Sensory Systems               7
Dietary Adaptation            4
Neurotransmission             3
Brain Size & Anatomy          3
Skin Color                    1
Other                         8
Coalescent Tests of Neutrality
• Can Compare Observed Statistics to Expected Values Under Neutral Coalescence (Incorporates Evolutionary Stochasticity)
• Can Compare Haplotype Tree Topologies to Expected Topologies Under Neutral Coalescence
Comparing Observed Statistics to Expected Values
Expected neutral heterozygosity = θ/(1+θ) ≅ θ
Given a sample of n DNA sequences, with πij being the number of nucleotide differences between sequences i and j, the observed estimate of the expected heterozygosity is:
\[ \Pi = \frac{2 \sum_{j=2}^{n} \sum_{i=1}^{j-1} \pi_{ij}}{n(n-1)} \]
Under a neutral coalescent model and the infinite sites model, the expected heterozygosity is also estimated by:
\[ \Theta = \frac{S}{\sum_{i=1}^{n-1} \frac{1}{i}} \]
where S is the observed number of segregating sites in the sample of n sequences
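Both estimators are easy to compute from an alignment. The following sketch assumes aligned sequences of equal length with no gaps or missing data; the function names and the four toy sequences are invented for illustration.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Pi: average number of nucleotide differences per pair of sequences."""
    n = len(seqs)
    total = sum(
        sum(a != b for a, b in zip(s, t))  # pi_ij for one pair (i, j)
        for s, t in combinations(seqs, 2)
    )
    return 2 * total / (n * (n - 1))


def segregating_sites(seqs):
    """S: number of alignment columns showing more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def watterson_theta(seqs):
    """Theta: S divided by the sum of 1/i for i = 1 .. n-1."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


# Toy alignment (hypothetical data)
sample = ["AAAA", "AAAT", "AATT", "ATTT"]
print(pairwise_diversity(sample), segregating_sites(sample), watterson_theta(sample))
```

For this toy sample the six pairwise differences sum to 10, so Π = 10/6, while S = 3 and Θ = 3/(1 + 1/2 + 1/3) = 18/11.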
[Figure: Evolutionary Process (Neutral, Constant Sized Population) → Current Population → Sample → Statistics Π and Θ]
Π versus Θ: The Tajima (1989) Test
\[ D = \frac{\Pi - \Theta}{\sqrt{\mathrm{Var}(\Pi - \Theta)}} \]
• Var(Π−Θ) is determined under the infinite sites model
• If D > 0 significantly, this implies balancing selection
• If D < 0 significantly, this implies directional selection
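A compact way to see how D is assembled is to code it from the summary statistics. The sketch below uses the standard variance constants from Tajima (1989) for the infinite sites model; the function name and the example inputs are illustrative assumptions, and the function returns only the statistic, not a significance level.

```python
import math


def tajimas_d(pi, num_segregating, n):
    """Tajima's D from Pi, the number of segregating sites S, and sample size n."""
    if num_segregating == 0 or n < 4:
        raise ValueError("D needs at least one segregating site and n >= 4")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta = num_segregating / a1  # Watterson's estimator
    # Estimated variance of (Pi - Theta) under the infinite sites model
    variance = e1 * num_segregating + e2 * num_segregating * (num_segregating - 1)
    return (pi - theta) / math.sqrt(variance)


# Hypothetical summary statistics: n = 10 sequences, S = 12, Pi = 2.1
print(tajimas_d(pi=2.1, num_segregating=12, n=10))
```

In this made-up example Π is smaller than Θ, so D comes out negative, reflecting an excess of low-frequency variants.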
Problems with The Tajima (1989) Test:
• Var(Π−Θ) is determined under the infinite sites model, and it is not easy computationally to deal with deviations from this model
• The null hypothesis is a neutral model in a constant sized population. Demographic changes as well as selection can cause D ≠ 0
• Fay & Wu’s H test (Genetics 155:1405-1413, 2000) is a variant of D that eliminates confounding with demography, but still depends upon the infinite sites model.
[Figure: expected haplotype tree topologies under (1) Neutral Genetic Drift, Stable Population Size; (2) Neutral Genetic Drift, Expanding Population Size; (3) Negative Selection; (4) Positive (Directional) Selection or Bottleneck; (5) Positive (Diversifying) Selection or Subdivision]
[Figure: Distribution of Recombinants In LPL, showing a Recombination Hotspot]
[Figure: Haplotype Network in 5’ Region of LPL]
[Figure: Haplotype Network in 3’ Region of LPL]
[Figure: the five expected tree topologies (neutral drift with stable population size, neutral drift with expanding population size, negative selection, directional selection or bottleneck, diversifying selection or subdivision), from Page and Holmes 1998]
Notice That The Ratio of The Branch Lengths Between Current Haplotypes And Deep Interior Nodes Is Quite Different In These Various Topologies. This Is A Robust Result Under Coalescent Theory.
Just As Ratios Are Frequently Used To Measure “Shape” In Morphology, Ratios Can Be Used To Measure The “Shape” Of An Evolutionary Tree
1. Calculate the nearest neighbor distance, dij, for each haplotype i within tree j
2. Calculate the average branch length in tree j of all branches that are not part of a nearest neighbor comparison (which have to be branches between interior nodes), aj
3. Calculate the statistics dij/aj for all i and j
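The three steps above can be sketched for a single tree as follows. This is one possible reading of the procedure, not the author's implementation: the tree is assumed to be given as a list of (node, node, branch length) edges, ties for nearest neighbor are handled by marking every tied path, and the function name and the toy tree are hypothetical.

```python
from collections import defaultdict


def shape_ratios(edges, haplotypes):
    """Return {haplotype: d_i / a} for one tree given as (u, v, length) edges."""
    adjacency = defaultdict(list)
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))

    def walk_from(start):
        # In a tree there is exactly one path from start to every other node
        dist, parent, stack = {start: 0.0}, {start: None}, [start]
        while stack:
            node = stack.pop()
            for nxt, length in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + length
                    parent[nxt] = node
                    stack.append(nxt)
        return dist, parent

    used_branches, nearest = set(), {}
    for h in haplotypes:
        dist, parent = walk_from(h)
        others = [o for o in haplotypes if o != h]
        nearest[h] = min(dist[o] for o in others)  # step 1: d_i
        for o in others:
            if dist[o] == nearest[h]:
                node = o
                while parent[node] is not None:  # mark the path back to h
                    used_branches.add(frozenset((node, parent[node])))
                    node = parent[node]

    # Step 2: average length of branches outside every nearest-neighbor path
    rest = [length for u, v, length in edges if frozenset((u, v)) not in used_branches]
    if not rest:
        raise ValueError("every branch lies on a nearest-neighbor path")
    a = sum(rest) / len(rest)
    return {h: nearest[h] / a for h in haplotypes}  # step 3


# Toy tree: haplotypes A-D at the tips, X and Y are interior nodes
toy_edges = [("A", "X", 1), ("B", "X", 1), ("X", "Y", 4), ("C", "Y", 2), ("D", "Y", 2)]
ratios = shape_ratios(toy_edges, ["A", "B", "C", "D"])
print(ratios)
```

In the toy tree only the X-Y branch lies outside the nearest-neighbor paths, so a = 4 and the ratios are 0.5 for A and B and 1.0 for C and D.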
For Example, 5’ vs. 3’ LPL
[Figure: histograms of dij/aj (horizontal axis, 0 to 2) for the 5’ Distribution and the 3’ Distribution]
Exact two-sided p value = 0.0000 (i.e., < 0.0001) based upon 10,000 random permutations
Shape Is Still Confounded With Demography, But Selection Is Often Locus Specific, So A Test For Deviation From Neutrality Is To Test For Shape Homogeneity Among Two Or More Trees. This Is Done With A Non-Parametric Exact Test
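A random-permutation version of such a homogeneity test can be sketched as below. Several things here are assumptions rather than details given in the slides: the test statistic (absolute difference in mean dij/aj between two trees), the function name, the fixed random seed, and the example values. A Monte Carlo permutation test like this one approximates the exact test by sampling permutations instead of enumerating all of them.

```python
import random


def permutation_test(sample_a, sample_b, n_perm=10_000, seed=0):
    """Two-sided permutation p value for a difference in mean shape ratio."""
    rng = random.Random(seed)

    def statistic(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    k = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign ratios to the two trees at random
        if statistic(pooled[:k], pooled[k:]) >= observed - 1e-12:
            hits += 1
    return hits / n_perm


# Hypothetical d_ij / a_j values for two trees
tree_1 = [0.10, 0.20, 0.15, 0.25, 0.30, 0.12, 0.18, 0.22]
tree_2 = [1.10, 1.40, 0.90, 1.60, 1.20, 1.30, 1.00, 1.50]
print(permutation_test(tree_1, tree_2))
```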
The Tree Topologies Indicate Either A Selective Sweep in the 5’ End and/or Balancing Selection in the 3’ End

DNA region contrasted to 3’ LPL   Kolmogorov-Smirnov p-Value   Permutation Test p-Value
PDHA1                             0.0000                       0.0074
Xq13.3                            0.0000                       0.0000
β-Globin                          0.0000                       0.6343
MX1                               0.0000                       0.0000
MC1R                              0.0002                       0.0313
EDN                               0.0000                       0.0000
ECP                               0.0000                       0.0000
MS205                             0.0000                       0.0000

Homogeneity of the 5’ LPL With These Other Genes Cannot Be Rejected
These Results Indicate That 3’ LPL Is An Extreme Outlier. Therefore, Balancing Selection Occurred At the 3’ End.
These Results Do Not Indicate A Selective Sweep At The 5’ End, But We Can Test That Hypothesis With Another Non-Parametric Test Using Recombination
[Figure: Haplotype Network in 3’ Region of LPL, with clades labeled T-1 through T-4]
12 Recombination Events Occurred Between T-1 Haplotypes and T-2, T-3, or T-4 Haplotypes
In All 12 Cases, the 5’ End Was Of The T-1 Type. Under Neutrality, This Has A Probability of (1/2)^12 ≈ 0.0002.
Therefore, the 5’ End Experienced A Selective Sweep Enhanced By Recombination
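The probability quoted for the 12 recombination events is a simple sign-test calculation, sketched here under the slide's assumption that, under neutrality, each recombinant is equally likely to carry either parental type at its 5’ end. The variable names are illustrative.

```python
# One-tailed sign test: all 12 recombinants carry the T-1 type at the 5' end
num_events = 12
prob_t1_under_neutrality = 0.5  # assumed equal chance of either parental type

p_all_t1 = prob_t1_under_neutrality ** num_events
print(p_all_t1)  # 1/4096, roughly 0.00024
```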
[Figure: Peeled Haplotype Network of LPL (the portion of the combined 5’/3’ tree that never experienced recombination)]
Evolution of Isocitrate Dehydrogenases
Maximum Parsimony (and other techniques) allows you to infer the states of extinct ancestors.
Ancestral Reconstruction Revealed That Only 8 Amino Acid Changes Are Responsible For The Diversified Functions of This Important Enzyme Family.
These Evolutionary Events Took Place 3.5 Billion Years Ago!
(Zhu, Golding & Dean, Sci. 307: 1279-1282, 2005)
[Figure: isocitrate dehydrogenase tree, with lineages marked as Uses NADP or Uses NAD]
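To illustrate what maximum parsimony ancestral reconstruction does, here is a minimal sketch of the bottom-up pass of Fitch's parsimony algorithm for a single character. It is a generic textbook technique, not the specific method of Zhu, Golding & Dean; the nested-tuple tree format, the function name, and the four-tip toy tree are all assumptions made for this example.

```python
def fitch(tree):
    """Return (candidate ancestral states, minimum number of changes).

    A tree is either a tip state (a string) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change is needed
        return shared, left_changes + right_changes
    # Children disagree: keep both possibilities and count one change
    return left_states | right_states, left_changes + right_changes + 1


# Toy tree with cofactor use scored at the tips (hypothetical data)
toy_tree = (("NAD", "NAD"), ("NADP", "NAD"))
root_states, changes = fitch(toy_tree)
print(root_states, changes)
```

For the toy tree the most parsimonious root state is NAD, with a single change to NADP on one tip lineage.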
Why Did NADP Use Evolve Independently Several Times?
Ancestral reconstruction shows that this change in phenotype was due to 5 specific amino acid substitutions. These changes were engineered into E. coli NADP Idh to make it NAD Idh, and isogenic strains were then competed in chemostats.
NAD Idh is much less fit than NADP Idh on acetate, yet more fit on glucose as an energy source.
Ancestral reconstruction shows that:
• A switch to NADP occurred multiple times
• 5 specific amino acid substitutions are the proximal cause for this switch
• Adaptations to acetate (& similar energy-poor resources) were the ultimate causes for these switches