Kerstin Lindblad-Toh, Whitehead/MIT Center for Genome Research
Michael Kamal
Broad/MIT Center for Genome Research
A First Look at the Mouse Genome
• Preliminary mouse genome analysis
• Future directions (briefly)
Article available online: http://www.nature.com/nature/mousegenome/
Draft: BAC map, 6.5x shotgun coverage, genome assembly
Finished sequence: BAC-based coverage, finishing
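One way to see what 6.5x shotgun coverage buys is the idealized Lander-Waterman (Poisson) model. This model is an assumption and is not stated on the slide. If reads land uniformly at random, a given base is missed by every read with probability e^(-c) at coverage c. Real assemblies fall short of this because of repeats and cloning bias. A minimal sketch:

```python
import math

def fraction_uncovered(coverage: float) -> float:
    """P(a base is hit by zero reads) under a Poisson model: e^(-c)."""
    return math.exp(-coverage)

def fraction_covered(coverage: float) -> float:
    """P(a base is hit by at least one read) = 1 - e^(-c)."""
    return 1.0 - fraction_uncovered(coverage)

if __name__ == "__main__":
    c = 6.5            # draft shotgun coverage quoted on the slide
    genome_bp = 2.5e9  # mouse genome size quoted later in the talk
    print(f"covered: {fraction_covered(c):.4f}")  # 0.9985
    print(f"expected uncovered bases: {fraction_uncovered(c) * genome_bp:,.0f}")
```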
Whitehead Institute, Washington University St Louis, Sanger Institute, EBI
Mouse Genome Sequencing Consortium
C57BL/6J female
- 41 M reads
- 2 and 4 kb plasmids (90%)
- 10 kb plasmids (5%)
- 40 kb fosmids (5%)
- 155 kb and 200 kb BACs (RPCI-23 & 24)
- Whitehead Institute (WI): 54% of reads
Genome size: Mouse < Human (2.5 vs 2.9 Gb)
Less Transposon Activity in Mouse Lineage?
[Figure: total transposon-derived repeat content as % of genome: human 46%, mouse 37%]
[Figure: ancestral vs. lineage-specific repeats (Mb), human and mouse]
No! More transposon activity in the mouse lineage
More deletion in mouse
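We can convert the total-repeat percentages into absolute amounts as a quick check on the genome-size difference. This assumes the 46% and 37% are fractions of the 2.9 Gb human and 2.5 Gb mouse genomes respectively. The dictionaries and function names below are illustrative:

```python
# Genome sizes (Gb) and total transposon-derived repeat fractions from the slides.
GENOME_GB = {"human": 2.9, "mouse": 2.5}
REPEAT_FRACTION = {"human": 0.46, "mouse": 0.37}

def repeat_gb(species: str) -> float:
    """Absolute amount of transposon-derived sequence, in Gb."""
    return GENOME_GB[species] * REPEAT_FRACTION[species]

def non_repeat_gb(species: str) -> float:
    """Sequence not annotated as transposon-derived, in Gb."""
    return GENOME_GB[species] - repeat_gb(species)

for sp in ("human", "mouse"):
    print(f"{sp}: {repeat_gb(sp):.2f} Gb repeats, {non_repeat_gb(sp):.2f} Gb non-repeat")
```

Under that assumption, mouse carries about 0.4 Gb less repeat sequence (roughly 1.33 vs 0.93 Gb). That accounts for essentially all of the ~0.4 Gb size difference. The slides attribute this to more deletion in mouse rather than less transposon activity.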
Protein-coding gene count estimates falling (<30,000)
Mouse-Human Comparison
~22,500 evidence-based gene predictions
• ~99% have homologs (maybe 100%)
• ~96% have a homolog in a region of conserved synteny
• ~80% have a 1:1 ortholog
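As a rough illustration only, we can apply the quoted percentages to the ~22,500 evidence-based predictions to get approximate gene counts. This is an assumption: the slide does not say the percentages were computed over exactly that set. The names below are hypothetical:

```python
# Percentages from the slide; applying them to the ~22,500 evidence-based
# predictions is an illustrative assumption, not the paper's own tally.
PREDICTED_GENES = 22_500
FRACTIONS = {
    "have a human homolog": 0.99,
    "homolog in conserved synteny": 0.96,
    "1:1 ortholog": 0.80,
}

def approx_counts(total: int, fractions: dict[str, float]) -> dict[str, int]:
    """Round total * fraction for each category."""
    return {name: round(total * frac) for name, frac in fractions.items()}

for name, n in approx_counts(PREDICTED_GENES, FRACTIONS).items():
    print(f"~{n:,} {name}")
```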
Gene family expansions: reproduction, immunity
25 mouse-specific gene family cluster expansions:
• 14 reproduction
• 5 host defense, immunity
[Figure: exons vs. non-exons; values shown: 75%, 90%]
Large conserved elements (>100 bp): coding and non-coding
[Figure: large conserved elements at the PPAR locus]
How much of the genome is under selection?
Extremely high conservation: 560,000 anchors
Less than half are coding exons (~220,000)
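Here is a quick check of the "less than half" claim. It takes the slide's ~220,000 coding-exon anchors and 560,000 total anchors at face value, and both figures are approximate:

```python
TOTAL_ANCHORS = 560_000
CODING_EXON_ANCHORS = 220_000  # "~220,000" on the slide

def coding_share(coding: int, total: int) -> float:
    """Fraction of highly conserved anchors that fall in coding exons."""
    return coding / total

share = coding_share(CODING_EXON_ANCHORS, TOTAL_ANCHORS)
outside = TOTAL_ANCHORS - CODING_EXON_ANCHORS
print(f"coding-exon share: {share:.0%}")              # 39%
print(f"anchors outside coding exons: ~{outside:,}")  # ~340,000
```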
Nucleotide-level alignment: ~40% of genomes
Why so much?
Given the neutral substitution rate between mouse and human, the vast majority of truly orthologous sequence can be aligned!
Alignable does NOT imply Functional
Why so little?
Suppose:
• Ancestral genome ~2.9 Gb
• New transposons are offset by deletion
Ancestral genome remaining:
• in human = 73%
• in mouse = 57%
• in both = 73% x 57% ≈ 42%
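The "in both" figure multiplies the two retention fractions. This implicitly assumes that deletions in the two lineages hit the ancestral genome independently. Below is a minimal sketch of that estimate, also expressed in Gb of the ~2.9 Gb ancestral genome. The names are illustrative:

```python
ANCESTRAL_GB = 2.9                          # "Ancestral genome ~2.9 Gb"
RETAINED = {"human": 0.73, "mouse": 0.57}   # fraction of ancestral sequence left

def retained_in_both(p_human: float, p_mouse: float) -> float:
    """Expected fraction kept in both lineages, assuming deletions in the
    two lineages occur independently of each other."""
    return p_human * p_mouse

both = retained_in_both(RETAINED["human"], RETAINED["mouse"])
print(f"retained in both: {both:.1%}")  # 41.6%
print(f"about {both * ANCESTRAL_GB:.2f} Gb of shared ancestral sequence")
```

This ~42% ceiling on shared ancestral sequence explains why only ~40% can be aligned at the nucleotide level.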
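Neutral substitution rate: ~0.46 per site
[Figure: neutral substitutions per site since the mouse-human common ancestor: mouse branch 0.31, human branch 0.15]
Mouse lineage ~2x faster over 75 Myr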
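The per-lineage numbers can be turned into average per-year rates with a sketch like the one below. It assumes the slide's branch lengths are mouse 0.31 and human 0.15, which is consistent with "Mouse 2x faster". It also assumes both branches span the full 75 Myr and that rates are constant along each branch, which is a simplification:

```python
# Branch lengths (substitutions per site) read from the slide's tree figure.
BRANCH_SUBS_PER_SITE = {"mouse": 0.31, "human": 0.15}
DIVERGENCE_YEARS = 75e6  # "over 75 Myr"

def total_distance(branches: dict[str, float]) -> float:
    """Neutral distance between the two genomes = sum of the branch lengths."""
    return sum(branches.values())

def per_year_rates(branches: dict[str, float], years: float) -> dict[str, float]:
    """Average substitutions per site per year, assuming each branch spans `years`."""
    return {species: d / years for species, d in branches.items()}

rates = per_year_rates(BRANCH_SUBS_PER_SITE, DIVERGENCE_YEARS)
print(f"total: {total_distance(BRANCH_SUBS_PER_SITE):.2f} subs/site")
print(f"mouse/human rate ratio: {rates['mouse'] / rates['human']:.2f}")
for species, r in rates.items():
    print(f"{species}: {r:.2e} subs/site/year")
```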
Substitutions in ancestral repeats: roughly normal distribution
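Why do ~0.46 substitutions per site still leave neutral DNA alignable? One way to see it is to swap in the Jukes-Cantor model, which the slides do not name. It converts an expected number of substitutions per site, d, into an expected fraction of differing sites: p = 3/4 (1 - e^(-4d/3)). The model ignores indels, multiple-hit bias from unequal base composition, and rate variation, so treat it as a rough guide:

```python
import math

def jc_p_distance(d: float) -> float:
    """Expected fraction of mismatched sites after d substitutions/site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc_distance(p: float) -> float:
    """Inverse: substitutions/site implied by an observed mismatch fraction p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

p = jc_p_distance(0.46)
print(f"expected identity in neutral DNA: {1 - p:.0%}")  # 66%
```

Roughly two-thirds identity is well above the ~25% expected between unrelated sequences of uniform base composition. That is consistent with the slides' point that most truly orthologous sequence can be aligned.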
Proportion of genome under selection: ~5%
[Figure: conservation of neutral sequence (ancestral repeats) vs. the alignable portion of the whole genome, showing excess conservation; feature classes shown: introns, coding exons, 5’-UTR, 3’-UTR, upstream, downstream, CpG islands, known regulatory elements]
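One simple way to turn "excess conservation" into a number is sketched below. It is not necessarily the method behind the ~5% figure. Treat the whole-genome score distribution as a mixture of neutral-like sequence, modeled by ancestral repeats, and selected sequence with unknown weight π. The largest excess of the genome's upper tail over the neutral upper tail is then a lower bound on π. Function names and the toy inputs are hypothetical:

```python
from bisect import bisect_right
from collections.abc import Sequence

def tail_prob(sorted_scores: Sequence[float], t: float) -> float:
    """Empirical P(S > t); `sorted_scores` must be in ascending order."""
    n = len(sorted_scores)
    return (n - bisect_right(sorted_scores, t)) / n

def selected_fraction_lower_bound(genome: Sequence[float],
                                  neutral: Sequence[float]) -> float:
    """Lower bound on the mixing weight pi of selected sequence.

    If genome = (1 - pi) * neutral + pi * selected, then for every t:
    P_genome(S > t) - P_neutral(S > t) = pi * (P_sel(S > t) - P_neutral(S > t)) <= pi.
    Empirical tails only change at observed scores, so checking those suffices.
    """
    g, n = sorted(genome), sorted(neutral)
    best = 0.0
    for t in set(g) | set(n):
        best = max(best, tail_prob(g, t) - tail_prob(n, t))
    return best

# Toy scores built so that exactly 10% of "genome" windows come from a
# high-scoring selected class; these are NOT real conservation scores.
neutral = [1.0, 2.0] * 50
genome = [1.0] * 45 + [2.0] * 45 + [10.0] * 10
print(f"lower bound: {selected_fraction_lower_bound(genome, neutral):.2f}")  # 0.10
```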
Coding exons: only ~1.5%
What is the rest? UTRs, regulatory elements, RNA genes, structural elements?
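The ~5% and ~1.5% figures can be put in absolute terms. This assumes both are fractions of the ~2.5 Gb mouse genome. The function name is illustrative:

```python
# Fractions from the slides; applying them to ~2.5 Gb assumes both are
# fractions of the mouse genome.
GENOME_MB = 2_500.0
UNDER_SELECTION = 0.05
CODING_EXONS = 0.015

def breakdown(genome_mb: float, selected: float, coding: float) -> dict[str, float]:
    """Split sequence under selection into coding exons and everything else."""
    selected_mb = genome_mb * selected
    coding_mb = genome_mb * coding
    return {
        "under selection (Mb)": selected_mb,
        "coding exons (Mb)": coding_mb,
        "other selected (Mb)": selected_mb - coding_mb,
        "non-coding share of selected": (selected - coding) / selected,
    }

for label, value in breakdown(GENOME_MB, UNDER_SELECTION, CODING_EXONS).items():
    print(f"{label}: {value:.2f}")
```

Under these assumptions, about 125 Mb is under selection and roughly 70% of it lies outside coding exons. This is the "rest" the slide asks about.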
TNFα enhancer
[Figure: browser view with Conserved, RefSeq and Genscan tracks]
Human  ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCCGAGGGTTGAATGAGAGCTTTTCCCCGCCC
       ||||||||||||| ||||| |||||| ||||||||||||||||||||||||  ||||||||||     |||||||||||
Mouse  ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCAAGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC
Conserved binding sites marked in the alignment: NFat/Ets, CRE, k3-Nfat, Ets, Nfat, AP1, SP1
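The identity of the TNFα enhancer alignment can be recomputed directly from the two aligned rows. This sketch assumes the first row is human and the second mouse, following the slide's labels, and that "-" marks a gap. The function and class names are illustrative:

```python
from dataclasses import dataclass

# Aligned TNFα enhancer fragment as printed on the slide ("-" = gap).
HUMAN = "ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCCGAGGGTTGAATGAGAGCTTTTCCCCGCCC"
MOUSE = "ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCAAGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC"

@dataclass
class AlignmentStats:
    columns: int
    gap_columns: int
    matches: int

    @property
    def identity(self) -> float:
        """Matches over all alignment columns (gaps count as differences)."""
        return self.matches / self.columns

    @property
    def ungapped_identity(self) -> float:
        """Matches over columns where neither sequence has a gap."""
        return self.matches / (self.columns - self.gap_columns)

def alignment_stats(a: str, b: str) -> AlignmentStats:
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(a.upper(), b.upper()))
    gaps = sum(1 for x, y in pairs if x == "-" or y == "-")
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return AlignmentStats(len(pairs), gaps, matches)

def match_line(a: str, b: str) -> str:
    """'|' where bases are identical, space otherwise (mismatch or gap)."""
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))

stats = alignment_stats(HUMAN, MOUSE)
print(HUMAN)
print(match_line(HUMAN, MOUSE))
print(MOUSE)
print(f"{stats.matches}/{stats.columns} identical ({stats.identity:.1%}); "
      f"{stats.ungapped_identity:.1%} over ungapped columns")
```

For this fragment the sketch counts 69 identical columns out of 79, about 87%. Two of the 79 columns contain a gap.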
Mouse Genome summary
• 2.5 Gb in size (smaller than human, due to deletion)
• More lineage-specific repeats
• < 30,000 genes (>99% with homologs in human)
• Evolves 2x faster than human
• 95% of genome in blocks of conserved synteny
• 5% under selection (1.5% coding; function of the rest unknown)
• Large haplotype blocks of domesticus or musculus ancestry in inbred strains
Implications of mouse sequence
• Cloning of Classical mutations
• New Mutagenesis programs
• Identification of Quantitative Trait Loci (QTLs)
• Engineering Knock-outs, Knock-ins
• BAC transgenics
• Modeling human disease
• Understanding gene regulation
Future directions
• Finish mouse Genome
• Sequence more mammals (dog, chimp, marsupial)
•“Genomic accounting”
• Identify regulatory elements
• Mouse haplotype map
Genomic Alignments for Multiple Species
… all integrated with gene expression analysis
Acknowledgement
Whitehead Institute: Kerstin Lindblad-Toh, Michael C. Zody, David Jaffe, Claire Wade, Mark Daly, Jade Vinson, Elinor Karlsson, EJ Kulbokas, Nicole Stange-Thomann, Rob Nicol, Tim Holzer, Toby Bloom, Jill Mesirov, Chad Nusbaum, Bruce Birren, Eric Lander
Washington University: John McPherson, Bob Waterston
Sanger Institute: Jim Mullikin, Jane Rogers
Analysis Group: David Haussler, Jim Kent, Arian Smit, Chris Ponting, Webb Miller, Ross Hardison, Laura Elnitski, Inna Dubchak, Lior Pachter, Sean Eddy, Michael Brent, Roderic Guigo, Wayne Frankel, Carol Bult
Ensembl: Ewan Birney
• Mouse Liaison group
• University of Oklahoma
• Albert Einstein/Harvard
• NIH ISC
• TIGR
• CHORI