Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008

Overview

Given a DNA sequence, how do we know when natural selection has occurred?

Different methods of answering this question

How does having the entire genome available change this?

Natural Selection

Introduction

Natural Selection

What sort of artifacts would this leave within the genome?

Introduction

Natural Selection

Introduction

The frequency of the long gene increases from one generation to the next.

It eventually reaches 100%, or fixation.
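The rise to fixation can be sketched with a simple deterministic model. This is only an illustration, not something taken from the slides: it assumes a haploid population in which carriers of the advantageous allele have relative fitness 1 + s, and the starting frequency and the value of s are arbitrary choices.

```python
def next_frequency(p, s):
    """One generation of haploid selection; carriers have relative fitness 1 + s (assumed model)."""
    return p * (1 + s) / (1 + p * s)


def generations_to_fixation(p0, s, threshold=0.999):
    """Count generations until the allele frequency reaches the (assumed) near-fixation threshold."""
    p, generations = p0, 0
    while p < threshold:
        p = next_frequency(p, s)
        generations += 1
    return generations


# Illustrative values only: start at 0.125 with a strong advantage s = 0.5.
trajectory = [0.125]
for _ in range(10):
    trajectory.append(next_frequency(trajectory[-1], s=0.5))
print([round(p, 3) for p in trajectory])
print(generations_to_fixation(0.125, 0.5))
```

With s = 0 the frequency does not move at all in this model, which is why a separate neutral model with random drift is needed as a baseline.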

Natural Selection: Gene Perspective

Introduction

Same process at the gene level

Let the yellow dot represent the advantageous allele

It begins at a low frequency (0.125 in this case)

Natural Selection: Gene Perspective

Introduction

During selection: the allele has risen in frequency!

Because of linkage, the nearby alleles have also risen in frequency

Natural Selection: Gene Perspective

Introduction

The allele has reached fixation!

As time goes on the nearby genes will slowly begin to reach fixation as well

Diversity has been lost

Natural Selection: Gene Perspective

Introduction

Effect of Selection on the Genome

Next challenge: How does this effect differ from what we would see without selection?

Neutral Theory (N.T.)

Problem: Need to distinguish natural selection from neutral change

Therefore: Need a null hypothesis

Solution: Create model that approximates neutral evolution

Introduction

Kimura, 1960s

N.T. & Genetic Drift

Most variation is neutral with respect to selection

Therefore most changes in frequency are due to genetic drift

Introduction

N.T. & Genetic Drift

A neutral allele has an equal probability of increasing or decreasing in frequency in the next generation
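A minimal sketch of drift, assuming a Wright-Fisher style model (a standard textbook choice, not one named on the slides): every gene copy in the next generation is drawn at random from the current one, so the frequency wanders with no directional push. The function name, population size and seed are illustrative.

```python
import random


def drift_trajectory(p0, population_size, generations, seed=0):
    """Allele frequency under pure drift; population_size counts gene copies (assumed model)."""
    rng = random.Random(seed)
    copies = round(p0 * population_size)
    trajectory = [copies / population_size]
    for _ in range(generations):
        p = copies / population_size
        # Each copy in the next generation carries the allele with probability p.
        copies = sum(1 for _ in range(population_size) if rng.random() < p)
        trajectory.append(copies / population_size)
    return trajectory


print(drift_trajectory(0.5, population_size=50, generations=20))
```

Once the frequency hits 0 or 1 it stays there, so even a neutral allele can be lost or fixed by chance alone.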

Introduction

N.T. & Mutation

New alleles are introduced at a constant rate (at a particular site)

To think about: How will this help us search for selection?

Introduction

N.T. & Recombination

Recombination occurs at a near-constant rate at a given position

Introduction

Testing the N.T.

How would natural selection differ from these assumptions?

Introduction

“Positive Natural Selection in the Human Lineage”

P. C. Sabeti, S. F. Schaffner, B. Fry, J. Lohmueller, P. Varilly, O. Shamovsky, A. Palma, T. S. Mikkelsen, D. Altshuler, E. S. Lander

Testing for Selection

Sabeti et al.

Review of the current state of genome-wide tests for selection

Five statistical tests which use divergence from neutral theory to test for selection

Ideas? Functional Alteration, Decreased Diversity, High Derived Alleles, Population Differences, Long Haplotypes

Sabeti et al.

I. Functional Alteration

Get a section of genome, and compare synonymous vs. non-synonymous mutations between two species

Definition of synonymous mutation

I. Functional Alteration

Sabeti et al.

Silent/Synonymous vs. Non-Synonymous
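The distinction can be sketched in code. The example below assumes two gap-free, in-frame, aligned coding sequences and uses the standard genetic code; it classifies each differing codon by whether the encoded amino acid changes. The function name and the test sequences are illustrative, and codons that differ at more than one position are treated as a single difference, which is a simplification.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    synonymous = non_synonymous = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid: silent change
        else:
            non_synonymous += 1  # amino acid (or stop) changes
    return synonymous, non_synonymous


# GCT -> GCC keeps alanine; AAA -> AGA swaps lysine for arginine.
print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```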

I. Functional Alteration

Sabeti et al.

Long time scale, because it is an interspecies metric

Limited value: only finds ongoing or recurrent selection

Use a Ka/Ks statistical test, or McDonald-Kreitman
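As a rough illustration of the McDonald-Kreitman idea, the sketch below computes the neutrality index, a commonly used summary of the 2x2 table of fixed versus polymorphic and synonymous versus non-synonymous changes. The counts are hypothetical, and a real analysis would add a significance test on the table.

```python
def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds): polymorphism ratio over divergence ratio.

    dn, ds: fixed non-synonymous / synonymous differences between species.
    pn, ps: non-synonymous / synonymous polymorphisms within a species.
    """
    return (pn / ps) / (dn / ds)


# Hypothetical counts: an excess of fixed non-synonymous changes gives an index below 1.
print(neutrality_index(dn=20, ds=10, pn=5, ps=10))
```

Under neutrality the two ratios are expected to match (index near 1); an index well below 1 is the pattern usually read as a hint of positive selection.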

II. Decreased Diversity

Sabeti et al.

Way of detecting a selective sweep

Requires that you know the ancestral gene and the derived genes

A derived gene is one that is a descendant of the ancestral one; it can be inferred by comparison with other species

II. Decreased Diversity

Sabeti et al.

The two small bars represent mutations. They are derived genes of the blue ancestor gene.

II. Decreased Diversity

Sabeti et al.

After the selective sweep the frequency of the derived alleles has jumped vis-a-vis the ancestral gene

II. Decreased Diversity

Sabeti et al.

A real example: derived alleles in red

II. Decreased Diversity

Sabeti et al.

Key idea: need to have ancestral genes present

The genes must not have reached fixation!

The pattern will be that of normal diversity of alleles but with skewed distribution of variation

Statistical Tests: Tajima’s D, Fu and Li’s D*
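For reference, a minimal sketch of Tajima's D using the standard published formula. It assumes haplotypes are given as equal-length strings of 0/1 characters with no missing data; returning 0.0 when there are no segregating sites is a convention chosen here, since the statistic is undefined in that case.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D from a list of equal-length haplotype strings (assumed input format)."""
    n = len(haplotypes)
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if s == 0:
        return 0.0  # undefined without segregating sites; 0.0 is a placeholder convention
    pairs = list(combinations(haplotypes, 2))
    # pi: average number of pairwise differences
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Only rare variants (each mutation seen once): D is negative.
print(tajimas_d(["1000", "0100", "0010", "0001", "0000", "0000"]))
# Only intermediate-frequency variants: D is positive.
print(tajimas_d(["1100", "1100", "1100", "0011", "0011", "0011"]))
```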

III. New Alleles (AKA High Frequency of Derived Alleles)

Another technique for detecting a selective sweep

Gene ‘hitch-hiking’

Limited diversity because of fixation

Key idea: low frequency of new genes, but high diversity of rare alleles

Sabeti et al.

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Gene has reached fixation

Low diversity in this region compared to other regions

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Next mutations slowly increase the diversity

Because they are all new the frequency remains low

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

As more time passes, any pre-selective-sweep alleles die out, and diversity is replaced by many derived alleles

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Real world example: Red dots indicate rare alleles

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Key Idea: The genes will have reached fixation and decreased diversity

The diversity will all be in the form of rare alleles (because they are new)

Statistical Test: Fay and Wu’s H
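A small sketch of the unnormalized form of Fay and Wu's H, which contrasts pairwise diversity with an estimator weighted toward high-frequency derived alleles. It assumes the derived state at each site has already been determined (for example from an outgroup); the input format and example counts are illustrative.

```python
def fay_wu_h(derived_counts, n):
    """H = pi - theta_H.

    derived_counts: number of derived copies at each segregating site (1 .. n-1).
    n: number of sampled chromosomes.
    """
    denom = n * (n - 1)
    pi = sum(2 * k * (n - k) for k in derived_counts) / denom
    theta_h = sum(2 * k * k for k in derived_counts) / denom
    return pi - theta_h


# Derived alleles at high frequency give a negative H.
print(fay_wu_h([9, 9, 8], n=10))
# Derived alleles that are all rare give a positive H.
print(fay_wu_h([1, 1, 2], n=10))
```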

Comparing Methods

What is the difference between decreased diversity and an increased frequency of new alleles?

Sabeti et al.

Vs.

IV. Population Differences

Requires population split

Disproportionate shift in gene frequencies

Limited utility

Sabeti et al.

IV. Population Differences

Sabeti et al.

IV. Population Differences

Sabeti et al.

Tall Tree Island

IV. Population Differences

Sabeti et al.

IV. Population Differences

Sabeti et al.

Two separated populations--specific gene will show disproportionate shift in frequency with respect to the other genes

Limited to cases where there are two populations

Statistical Test: F(st), P(excess)
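One simple way to compute F(st) for a single biallelic site is sketched below. It assumes two populations of equal size and uses the heterozygosity-based definition, (H_total - H_within) / H_total; other estimators exist and give somewhat different numbers.

```python
def fst_two_populations(p1, p2):
    """F(st) at one biallelic site from the allele frequency in each of two populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # site is monomorphic overall: no differentiation to measure
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


print(fst_two_populations(0.0, 1.0))  # fixed for different alleles
print(fst_two_populations(0.3, 0.3))  # identical frequencies
print(fst_two_populations(0.1, 0.8))  # strong but incomplete shift
```

A gene under selection in only one population would stand out as having a much higher value than the genome-wide background.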

V. Long Haplotypes

Based on Linkage Disequilibria (LD)

Long haploblock and high frequency

Sabeti et al.

V. Long Haplotypes

Under neutral conditions, a new allele has low frequency and high linkage disequilibrium

Sabeti et al.

V. Long Haplotypes

As time goes on and the neutral allele increases in frequency, recombination erodes the L.D.
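A sketch of how L.D. between two sites can be measured, and how it is expected to decay under the textbook relation D_t = D_0 (1 - r)^t. Haplotypes are assumed to be phased pairs of 0/1 alleles with both sites polymorphic; the recombination fraction r and the example data are illustrative.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic sites from phased (a, b) haplotypes of 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


def expected_d(d0, r, generations):
    """Expected D after some generations with recombination fraction r per generation."""
    return d0 * (1 - r) ** generations


# Perfect association between the two sites.
print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))
# Decay of that association with r = 0.01 over 100 generations.
print(expected_d(0.25, 0.01, 100))
```

An allele that reaches high frequency quickly has had few generations for this decay to act, which is why a long haplotype at high frequency is suspicious.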

Sabeti et al.

V. Long Haplotypes

Sabeti et al.

Genome-Wide Scanning

Better estimation of background rate

Helps to confirm previous studies

Suggests future areas of research

MORE POWER

Sabeti et al.

Genome-Wide Scanning

SNP: Single Nucleotide Polymorphism, a single-base variant (other types of mutation are excluded) that occurs at > 1% frequency

SNPs are the basis of many genome wide analyses

Sabeti et al.

“Forces Shaping the Fastest Evolving Regions in the Human Genome”

K. S. Pollard, S. R. Salama, B. King, A. D. Kern, T. Dreszer, S. Katzman, A. Siepel, J. S. Pedersen, G. Bejerano, R. Baertsch, K. R. Rosenbloom, J. Kent, D. Haussler

Background

Exploits the very recent sequencing of the chimp and human genomes

Uses the rate of allele replacement as test for selection

Assumption is that highly changing parts of the genome have been under selective pressure

Pollard et al.

Idea

Take the chimp and mouse genomes, find common regions

Compare these regions to human genome

Pollard et al.

Method Part I

First half: Find conserved regions. Use sequence tests to look for regions of 100bp with 96% similarity
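The window search can be sketched as a sliding identity count. This is not the authors' pipeline: it assumes two sequences that are already aligned without gaps, and it simply reports every start position where a 100 bp window is at least 96% identical. The function and parameter names are illustrative.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.96):
    """Start positions of windows where two pre-aligned sequences meet the identity threshold."""
    length = min(len(seq_a), len(seq_b))
    if length < window:
        return []
    matches = [a == b for a, b in zip(seq_a, seq_b)]
    count = sum(matches[:window])  # matches in the first window
    starts = []
    for start in range(length - window + 1):
        if start > 0:
            # Slide by one: add the base entering the window, drop the one leaving it.
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            starts.append(start)
    return starts


reference = "ACGT" * 30                 # 120 bp toy sequence
diverged = "NNNNN" + reference[5:]      # five mismatches at the start
print(conserved_windows(reference, diverged))
```

In the toy example the window starting at position 0 contains five mismatches (95% identity) and is rejected, while every later window passes.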

Pollard et al.

Results Part I

Conclusion: These areas represent genes with deep functionality

Method Part II

Pollard et al.

Search human genome for conserved regions

Method Part II

Pollard et al.

For every region that doesn’t match up, label Human Accelerated Region

Formal Description

Pollard et al.

Results Part II

Found 202 Human Accelerated Regions in total

These were regions where there had been rapid evolution in the past 5 million years

But evolution doesn’t mean selection

Pollard et al.

Possible Explanations

Relaxation of negative selection: ruled out because the HARs evolved faster than the neutral rate in 201 of 202 cases

Natural selection

Sudden change in mutation rate

Pollard et al.

But was it Selection?

Pollard et al.

A Digression

Biased Gene Conversion (BGC): Tendency to replace mismatched nucleotides with G or C

In all but two of the HARs there was no evidence of a selective sweep, but significant evidence of GC-favored replacement
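One simple way to look for GC-favored replacement is to count substitutions by direction. The sketch below assumes an inferred ancestral sequence aligned base-for-base with the derived (human) sequence, and tallies weak-to-strong (A/T to G/C) against strong-to-weak (G/C to A/T) changes; the names and example sequences are illustrative.

```python
WEAK = set("AT")    # bases forming two hydrogen bonds
STRONG = set("GC")  # bases forming three hydrogen bonds


def substitution_bias(ancestral, derived):
    """Return (weak_to_strong, strong_to_weak) substitution counts for aligned sequences."""
    weak_to_strong = strong_to_weak = 0
    for old, new in zip(ancestral, derived):
        if old in WEAK and new in STRONG:
            weak_to_strong += 1
        elif old in STRONG and new in WEAK:
            strong_to_weak += 1
    return weak_to_strong, strong_to_weak


# A->G and A->C are weak-to-strong; G->A is strong-to-weak.
print(substitution_bias("AATGC", "GCTAC"))
```

A large excess of weak-to-strong changes is the kind of pattern that points toward biased gene conversion rather than selection.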

Pollard et al.

A Digression

A new paper suggests that BGC hotspots differ between species

Conserved areas may suddenly become a BGC hotspot, explaining the HAR’s high BGC rates

Adaptation or biased gene conversion: Extending the null hypothesis of molecular evolution, Galtier & Duret 2007

Pollard et al.

General Implications

Illustrates the utility of genome-wide approaches: by using the full genome to establish a background rate, signals stand out from the noise

Weaknesses: the approach did not account for violations of the assumptions of neutral theory (mutation rate)

Pollard et al.

“Global Landscape of Recent Inferred Darwinian Selection for Homo Sapiens”

E. Wang, G. Kodama, P. Baldi, and R. K. Moyzis

Background

Ever-growing catalog of SNPs for human populations

SNP data can be used to construct haplotype maps

Can screen the whole genome for haplotype outliers

Wang et al.

Idea

Take only homozygotes

Bin the alleles together

Calculate the L.D. for each allele
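The three steps above might look something like the sketch below. This is one plausible reading of the slide, not the authors' code: individuals homozygous at a core SNP are binned by the allele they carry, and heterozygosity at a neighboring SNP is used as a proxy for how far L.D. has decayed around that allele. The input format and function name are assumptions.

```python
from collections import defaultdict


def heterozygosity_by_core_allele(individuals):
    """Fraction of heterozygous neighbor genotypes among homozygotes for each core allele.

    individuals: list of (core_genotype, neighbor_genotype); each genotype is a
    2-tuple of alleles, e.g. ("A", "A"). This layout is hypothetical.
    """
    bins = defaultdict(list)
    for core, neighbor in individuals:
        if core[0] == core[1]:  # keep only homozygotes at the core SNP
            bins[core[0]].append(neighbor[0] != neighbor[1])
    return {allele: sum(flags) / len(flags) for allele, flags in bins.items()}


sample = [
    (("A", "A"), ("C", "C")),
    (("A", "A"), ("C", "C")),
    (("G", "G"), ("C", "T")),
    (("G", "G"), ("C", "C")),
    (("A", "G"), ("C", "T")),  # heterozygous at the core SNP: ignored
]
print(heterozygosity_by_core_allele(sample))
```

Working only with homozygotes avoids having to infer phase: any heterozygous neighbor site in such an individual implies that the two chromosomes carrying the same core allele differ nearby.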

Wang et al.

Idea

Wang et al.

Formalized Description

Wang et al.

Description of the Formalized Description

Wang et al.

Expected decay of LD for an allele of a specific frequency

Description of the Formalized Description

Wang et al.

Description of the Formalized Description

Wang et al.

Selective sweep will be more resistant to decay

Description of the Formalized Description

Wang et al.

Normalize with respect to the sigmoidal curve

Advantages of Method

By using the whole genome, we can track not only L.D. but also the exponential decay of L.D. over distance. This helps to distinguish selective sweeps from demographic shifts such as bottlenecks

Wang et al.

Results

Wang et al.

Results

Wang et al.

“Darwin’s Fingerprint”: Using different datasets from different populations, certain areas show consistent evidence of selection

Discussion

Wang et al.

Compare regions to known gene functions

Six groups predominate

Test was well designed

Limited detection: Genes can't be at fixation

Overall Conclusions

It all comes down to statistics. What are the null assumptions? What are the alternative assumptions?

Genome-wide scans improve on earlier work by allowing us to exploit this elegant statistical method in new ways:

Improved data for the null hypothesis

Increased volume of potential candidates

Wang et al.

Thank You!
