Yeast Ribosomal DNA: Aging & Recombination
Michael J.T. O'Kelly
Van Oudenaarden Biophysics Lab, MIT
Hertz Fellowship Retreat, April 13, 2008
BubbleGene:“Evolving Mutually Perceptive Creatures for Combat”
In this talk:
●Motivation for my thesis work
●Scientific background: proteins implicated in aging
●pSNPs: mutation vs. recombination
Why study yeast ribosomal DNA?
Longevity research
Yeast is a model organism for longevity!
What does rDNA have to do with longevity?
Sir2: regulates silencing
Sir2 + Sir2
50% increased longevity
Fob1: regulates recombination
100% increased longevity
Fob1
Background: Ribosomal DNA
●Yeast rDNA consists of ~150 nearly identical* copies of a 9.1 kbp sequence encoding several ribosomal RNAs.
●Mutation strikes only one repeat at a time. Recombination either duplicates or eliminates repeats at random, homogenizing the rDNA.
●Mutation in the rDNA array occurs about every 1,000 generations.
●Repeats are gained or lost about every 30 generations, through several recombination mechanisms.
●A partial Single Nucleotide Polymorphism (pSNP) is a mutation shared by only a fraction of the rDNA repeats in a particular yeast strain.
Underlying process of pSNP propagation
This demo illustrates a simple model of recombination and mutation in rDNA.
Basic Implications of Neutral Mutation Model
●Any particular mutation has a 1/150 chance of becoming the new consensus.
●~700 Recombination Unit Events are required, on average, to resolve a mutation.
●We expect there to be ~30r unresolved mutations (pSNPs) in any strain, where r is the fraction of neutral basepairs in the rDNA.
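The 1/150 figure can be checked with a small simulation. The sketch below is not the demo from the talk; it assumes a simplified neutral model in which every recombination event duplicates one randomly chosen repeat and eliminates another, so the array size stays fixed. The function names (`resolve_mutation`, `fixation_fraction`) are made up for illustration, and the event counts of this toy model should not be expected to reproduce the talk's ~700 figure.

```python
import random


def resolve_mutation(n_repeats, rng):
    """Follow one new neutral mutation until it is lost or becomes consensus.

    Assumed model (not the talk's demo): each event duplicates one random
    repeat and eliminates one random repeat, keeping n_repeats constant.
    Returns (fixed, events).
    """
    mutant = 1  # the mutation starts in a single repeat
    events = 0
    while 0 < mutant < n_repeats:
        events += 1
        frac = mutant / n_repeats
        duplicated_is_mutant = rng.random() < frac
        eliminated_is_mutant = rng.random() < frac
        mutant += int(duplicated_is_mutant) - int(eliminated_is_mutant)
    return mutant == n_repeats, events


def fixation_fraction(n_repeats, trials, seed=0):
    """Fraction of simulated mutations that take over the whole array."""
    rng = random.Random(seed)
    fixed = sum(resolve_mutation(n_repeats, rng)[0] for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    # With 150 repeats the fraction should scatter around 1/150 (about 0.0067).
    print(fixation_fraction(150, trials=5000))
```

Because gaining and losing a mutant copy are equally likely at every event in this model, the chance of takeover equals the starting fraction, 1 out of n_repeats.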
Let's look for pSNPs using Whole-Genome Shotgun sequence data
Saccharomyces Genome Resequencing Project
“SGRP, the Saccharomyces Genome Resequencing Project, is a collaboration between the Sanger Institute and Prof. Ed Louis' group at the Institute of Genetics, University of Nottingham. Our goal is to advance understanding of genomic variation and evolution by analysing sequences from multiple strains of the two Saccharomyces species, S cerevisiae and S paradoxus.”
●36 strains of baker's yeast
●rDNA coverage: ~170x per strain
Finding pSNPs
Example: GATACATGTCTTGATAATGT
We use BLAST to align shotgun fragments, with a sliding window along the entire consensus rDNA sequence.
●Align all shotgun sequences that agree (mostly) with the target.
●Basepairs that deviate entirely are conventional Single Nucleotide Polymorphisms
●Basepairs that deviate sometimes are probably partial Single Nucleotide Polymorphisms
ttttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacg
                    ttgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaacc
                        tGATACATTTCTTGATAATGTtgcatatcagtaa
 tttctggctcattgatagattgttGATACATTTCTTGATCATGT
                       ttGATACATTTCTTGATAATGTtgcatatcagtaac
                 agattgttGATACATTTCTTGATAATGTtgcatatcagt
        ctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaac
               atagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
  ttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaac
        ctcattgatagattgttGATACATTTCTTGATAATGTtgcata
 tttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaac
           attgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccc
                  gattgttGATACATTTCTTGATCATGTtgcatatcagtaacgtaaccc
                       ttGATACATTTCTTGATAATGTtgcatatcagtaacgt
           attgatagattgttGATACATTTCTTGATAATGTt
       gctcattgatagattgttGATACATTTCTTGATAATGTtgcatat
                       ttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
         tcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
   tctggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcag
                  gattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaacccttg
                       ttGATACATTTCTTGATAATGTtgcatatcag
              gatagattgttGATACATTTCTTGATCATGTtgcat
     tggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagt
        ctcattgatagattgttGATACATTTCTTGATCATGTtgcatatcagtaa
                    ttgttGATACATTTCTTGATAATGTtgcatatcagt
     tggctcattgatagattgttGATACATTTCTTGATAATGTtgcatatcagt
ttttctggctcattgatagattgttGATACATTTCTTGATAATGTtgcat
              gatagattgttGATACATTTCTTGATAATGTtgcatatcagtaacgtaaccctt
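The counting step behind these alignments can be sketched in a few lines. This is a minimal illustration and not the pipeline used for the talk: it assumes the reads have already been aligned and trimmed to the 20-bp window (here a handful of windows copied from the reads shown for the example target), that there are no gaps, and that any position with partial disagreement is only a candidate until sequencing errors are filtered out. The function names are hypothetical.

```python
def disagreement_ratios(target, windows):
    """Fraction of aligned reads that disagree with the target at each position.

    Assumes every window is gap-free and exactly as long as the target.
    """
    ratios = []
    for i, ref_base in enumerate(target):
        column = [w[i] for w in windows]
        ratios.append(sum(base != ref_base for base in column) / len(column))
    return ratios


def classify_positions(ratios):
    """Label 1-based positions: full disagreement is a SNP, partial a candidate pSNP."""
    calls = {}
    for i, ratio in enumerate(ratios):
        if ratio == 1.0:
            calls[i + 1] = "SNP"
        elif ratio > 0.0:
            calls[i + 1] = "candidate pSNP"
    return calls


target = "GATACATGTCTTGATAATGT"
windows = ["GATACATTTCTTGATAATGT"] * 6 + ["GATACATTTCTTGATCATGT"] * 2
calls = classify_positions(disagreement_ratios(target, windows))
print(calls)  # {8: 'SNP', 16: 'candidate pSNP'}
```

In this toy input every read carries T instead of G at position 8 (a conventional SNP), while only some reads carry C instead of A at position 16 (a candidate pSNP).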
SNP & pSNP map for one yeast strain
Disagreement ratio shows some bp with 100% disagreement, some with moderate disagreement, and many probably spurious points of low disagreement.
Total coverage varies from 25x to 150x (ignoring indels for now).
Using Quality Scores to evaluate correctness of disagreement
Quality score: n=0-60 represents reliability of nucleotide determination.
Let's reject all scores worse than 30.
Then C is accepted as a probable pSNP, but G is rejected.
Base     G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  G  T  G  T
Quality 56 58 52 56 31 60 56 60 33 31 42 58 55 35 32 57 26 38 60 60

Base     G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  A  T  G  T
Quality 44 51 50 58 28 60 40 50 46 42 50 60 53 41 55 36 52 30 60 60

Base     G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  A  T  G  T
Quality 47 33 59 47 58 39 60 60 58 38 51 58 43 51 42 33 53 58 51 59

Base     G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  C  A  T  G  T
Quality 57 48 48 27 36 30 59 52 58 60 56 58 59 34 40 60 37 36 60 59

Base     G  A  T  A  C  A  T  T  T  C  T  T  G  A  T  A  A  T  G  T
Quality 59 55 56 60 45 43 55 45 31 59 59 60 44 60 60 44 59 59 52 60
$P_{\mathrm{error}} = 10^{-n/10}$
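As a rough illustration of the filter, the sketch below converts a quality score into an error probability with the formula above and applies the cutoff of 30. The function names and the default threshold argument are assumptions made for this example; the two test bases are the deviating C (score 60) and G (score 26) from the reads shown.

```python
def phred_error_probability(quality):
    """Error probability for quality score n: P_error = 10^(-n/10)."""
    return 10 ** (-quality / 10)


def accept_base(quality, min_quality=30):
    """Keep a base call only if its score is not worse than the cutoff."""
    return quality >= min_quality


# Deviating bases from the example reads: C has score 60, G has score 26.
for base, quality in [("C", 60), ("G", 26)]:
    verdict = "accepted" if accept_base(quality) else "rejected"
    print(base, quality, phred_error_probability(quality), verdict)
```

A score of 30 corresponds to a 1-in-1,000 chance that the base call is wrong, so the cutoff discards calls less reliable than that.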
SNP & pSNP map after Quality Score filter
Quality-filtered coverage is nearly as high as total coverage. Most basepairs that disagreed in only one alignment had low Quality.
pSNP fingerprints of 14 yeast strains
(Peak heights exaggerated for visibility.) Rapid variation observed in intergenic regions, as expected.
Finding insertions and deletions
●BLAST does pairwise alignments only
●Multiple-alignment necessary for comparing indels between strains
●Solution: run MUSCLE on windowed BLAST output
How do we remove erroneous indels? Q-scores do not apply.
SNP & pSNP map, including indels
Substitutions Insertions Deletions
We can now reliably eliminate sequencing errors of all sorts. Remaining variation reflects real underlying repeat variation.
What you just saw
●Whole genome shotgun sequencing data provides a snapshot of mutation propagation in rDNA across multiple repeats and strains of yeast.
●Though the repeats can't be sequenced formally, shotgun reads from random positions let us examine them statistically.
●Fob1 ensures homogeneity of coding regions while permitting experimentation in non-transcribed spacers.
Future implications
●pSNP analysis in other species. Dozens of WGS libraries are publicly available. Humans have more than 1,000 repeats.
●Map recombination and silencing activity according to position in rDNA array.
●Analysis of pSNPs in a phylogenetic context.
Thanks! To Alexander van Oudenaarden and the AvO Biophysics Lab, esp. Ben Kaufmann, Arjun Raj, & Rui Zhen Tan. To Justin Lee, Leonid Mirny, Ian Roberts, Steve James, Kaijen Hsiao, and my parents. The Fannie & John Hertz Foundation for support, and everyone! Everyone!