http://bejerano.stanford.edu 1
Tales from the Dark Side of Your Genome
Dept. of Developmental Biology, Dept. of Computer Science
Stanford University
Gill Bejerano
Biology has become a quantitative science
strings
circuits
time series
Biology has also become a meeting place
For Physicists, Mathematicians, Engineers, Biologists, Computer Scientists and more…
AYBABTU
Genomics in a nutshell
DNA: Functional and Non-Functional
DNA = linear molecule that carries instructions for making living organisms ~ long string(s) over a small alphabet
Alphabet of four {A,C,G,T}
Strings of length 10^4-10^11
...ACGTACGACTGACTAGCATCGACTACGACTAGCAC...
genetic instructions: how to... when to... where to...
“junk” DNA
One Cell, One Genome, One Replication
Every cell holds a copy of all its DNA = its genome.
The genome is replicated every cell division.
The human body is made of ~10^14 cells.
All originate from a single cell through repeated cell divisions.
[Figure: a cell and its genome (= all DNA, a DNA string); cell division; chicken (DNA) ≈ 10^14 copies of egg (DNA); chicken, egg, egg, egg]
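A rough worked check of the numbers on this slide: if every division doubled the cell count and no cell ever died (a simplifying assumption, not a claim made on the slide), growing from one cell to ~10^14 cells would take about log2(10^14) rounds of division. The function name is hypothetical.

```python
import math

def doublings_needed(target_cells: float) -> float:
    """Rounds of perfect doubling needed to grow from 1 cell to target_cells."""
    return math.log2(target_cells)

if __name__ == "__main__":
    n = doublings_needed(1e14)
    # Assumes idealized doubling: every cell divides, none are lost.
    print(f"~{n:.1f} doublings, i.e. at least {math.ceil(n)} rounds of division")
```

Under that idealization the genome is copied fewer than fifty times along any one lineage from egg to adult cell, even though the body ends up holding ~10^14 copies.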
Genes = How to make Proteins
[Figure labels: gene, DNA, cell]
Proteins: “the workhorses of every living cell”
DNA Replication is Imperfect
Medium Scale: substrings are duplicated, deleted, inverted
Large Scale: whole DNA strings are duplicated, deleted
...ACGTACGACTGACTAGCATCGACTACGA...
...ACGTACGACTGACTAGCATCGACTACGA........TCTGACTAGCATCGACTACGA...
...ACGTACGACTGACTAGCATCGACTACGA........TCTGACTAGCATCGACTACGA...
[Figure labels: functional, junk → substring duplication → functional, functional → functional divergence → functional’’, functional’]
So... More Genes... More Complexity!... Right?
1. Gene number does not correlate with Complexity
Gene families are important. Many are surprisingly old. But -
[Figure: # genes for fly, worm, human, weed, fish, rice; organisms range from 10^3 cells to 10^14 cells]
pre-genomic era: “100,000 genes to the human genome”
DNA Replication is Imperfect (contd)
Small Scale: single letters are substituted, erased, added
[Figure: chicken → egg → chicken; ...ACGTACGACTGACTAGCATCGACTACGA... copied with single-letter changes (T, T, C, A, T)]
functional: many changes are not tolerated
junk: “anything goes”
thus, sequence conservation over generations implies function!
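The argument on this slide can be illustrated with a toy simulation. It is a minimal sketch under loud assumptions: only substitutions are modeled, every change inside the "functional" half is rejected outright (the slide only says many changes are not tolerated), and the rate, length and generation count are arbitrary illustrative values. All names are hypothetical.

```python
import random

ALPHABET = "ACGT"

def replicate(seq, functional, rate, rng):
    """Copy seq once, substituting letters at the given per-letter rate.

    Substitutions at positions flagged functional are rejected (a crude
    stand-in for selection); substitutions elsewhere are kept.
    """
    out = []
    for letter, is_functional in zip(seq, functional):
        if rng.random() < rate and not is_functional:
            letter = rng.choice([c for c in ALPHABET if c != letter])
        out.append(letter)
    return "".join(out)

def identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def simulate(generations=500, length=200, rate=0.01, seed=0):
    """Return (functional-half identity, junk-half identity) vs. the ancestor."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(ALPHABET) for _ in range(length))
    half = length // 2
    functional = [i < half for i in range(length)]  # first half is "functional"
    seq = ancestor
    for _ in range(generations):
        seq = replicate(seq, functional, rate, rng)
    return identity(ancestor[:half], seq[:half]), identity(ancestor[half:], seq[half:])

if __name__ == "__main__":
    func_id, junk_id = simulate()
    print(f"functional half identity to ancestor: {func_id:.2f}")
    print(f"junk half identity to ancestor:       {junk_id:.2f}")
```

Read in reverse, this is the inference the slide draws: a region that still matches after many generations is a candidate for function.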
Sequence Conservation implies Function
(but which function/s?...)
Comparative Genomics of Distantly related species:
[Figure: human and mouse, descended from a mammalian ancestor]
...CTTTGCGA-TGAGTAGCATCTACTATTT...
...ACGTGGGACTGACTA-CATCGACTACGA...
functional region!
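The comparison above can be sketched as a percent-identity calculation over two aligned rows. This is a minimal sketch: the two rows are copied from the slide, while the function name and the choice to skip gap columns ("-") rather than count them as mismatches are assumptions made here.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Fraction of ungapped alignment columns where both rows carry the same letter."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns where either row has a gap
        compared += 1
        matches += (a == b)
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared

if __name__ == "__main__":
    row_1 = "CTTTGCGA-TGAGTAGCATCTACTATTT"
    row_2 = "ACGTGGGACTGACTA-CATCGACTACGA"
    print(f"identity over ungapped columns: {percent_identity(row_1, row_2):.0%}")
```

Sliding the same calculation along a whole-genome alignment and keeping the windows that score well above their surroundings is one simple way to flag candidate functional regions.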
2. Human Genome full of Conserved Non-Coding Elements
Human Genome: 3*10^9 letters
1.5% known function
>50% junk
compare to other species: >5% human genome functional
3x more functional DNA than known!
~10^6 substrings do not code for protein
[Science 2004 Breakthrough of the Year, 5th runner up]
What do they do then?
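The percentages on this slide translate into letter counts as follows. This is a minimal arithmetic sketch: the three constants are read off the slide, ">5%" is treated as a lower bound, and the function names are hypothetical.

```python
GENOME_LETTERS = 3e9         # "3*10^9 letters" on the slide
KNOWN_FRACTION = 0.015       # "1.5% known function"
FUNCTIONAL_FRACTION = 0.05   # ">5% ... functional", used here as a lower bound

def letters_in(fraction: float, genome_letters: float = GENOME_LETTERS) -> float:
    """Number of genome letters covered by the given fraction."""
    return fraction * genome_letters

def fold_increase(functional: float = FUNCTIONAL_FRACTION,
                  known: float = KNOWN_FRACTION) -> float:
    """How many times more functional DNA there is than DNA of known function."""
    return functional / known

if __name__ == "__main__":
    print(f"known function: ~{letters_in(KNOWN_FRACTION):.2g} letters")
    print(f"functional:     >{letters_in(FUNCTIONAL_FRACTION):.2g} letters")
    print(f"ratio:          ~{fold_increase():.1f}x")
```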
Gene regulation = when/where to make protein
Unicellular
[Figure: DNA with a control region (when & where) and a gene (how to)]
effective region ~10^3 letters
recognition site ~10 letters/protein
Vertebrate Gene Regulation
Multicellular
[Figure: DNA with a control region (when & where) and a gene (how to)]
effective region ~10^6 letters!!! (~10^3 letters)
3. Most Non-Coding Elements are likely cis-regulatory
9Mb
“IRX1 is a member of the Iroquois homeobox gene family. Members of this family appear to play multiple roles during pattern formation of vertebrate embryos.”
gene deserts
regulatory jungles
The Writing on the Wall…
gene deserts
regulatory jungles
25,000
1,000,000
DNA Conservation levels
[Bejerano et al., Science 2004]
Conserved elements between human and mouse are on average 85% identical. [mouse consortium, 2002]
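To see why perfectly identical stretches stand out against an 85% average, here is a back-of-the-envelope sketch. It assumes each letter matches independently with probability 0.85 (a simplification) and uses a 200-letter window, the length threshold in the cited Science 2004 paper rather than a number stated on this slide. The 3*10^9 genome size comes from an earlier slide; the function name is hypothetical.

```python
def prob_identical_window(per_letter_identity: float, window: int) -> float:
    """Chance that `window` consecutive letters all match, assuming independence."""
    return per_letter_identity ** window

if __name__ == "__main__":
    p = prob_identical_window(0.85, 200)
    print(f"P(200 letters all identical) ~ {p:.1e}")
    # Crude upper-end estimate: treat every genome position as a window start.
    print(f"expected such windows in 3*10^9 positions ~ {3e9 * p:.1e}")
```

Under these assumptions not even one such window would be expected by chance in the whole genome, so finding hundreds of them calls for an explanation beyond typical conservation.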
Ultraconserved Elements
[Bejerano et al., Science 2004]
fish
Ultraconserved Elements
Hundreds of long substrings identical between human-birds: they must have rejected many different changes.
But... all functions we understand in our genome are encoded using redundant codes.
E.g. Protein Coding Genes:
DNA – 10^8 letters over alphabet of 4.
Protein – 10^2 letters over alphabet of 20.
Coding: 3 DNA letters → 1 Protein letter.
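The redundancy mentioned above can be made concrete with the standard genetic code: 4^3 = 64 DNA triplets map onto 20 protein letters plus stop signals, so many single-letter DNA changes leave the protein unchanged. This is a minimal sketch using the standard code table; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate DNA in frame 0, three letters at a time; trailing letters are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def codons_per_amino_acid() -> Counter:
    """Count how many codons encode each amino acid (stop codons excluded)."""
    return Counter(aa for aa in CODON_TABLE.values() if aa != "*")

if __name__ == "__main__":
    print(len(CODON_TABLE), "codons for", len(codons_per_amino_acid()), "amino acids")
    # A third-position change (C -> T) that leaves the protein unchanged:
    print(translate("ATGGCC"), translate("ATGGCT"))
```

Because of this slack, even a strongly constrained protein-coding region is expected to tolerate some DNA changes, which is what makes perfectly identical stretches puzzling.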
[Bejerano et al., Science 2004]
No known function requires this much conservation
[Figure: conservation along the sequence (seq.) for CDS, ncRNA and TFBS, next to an unknown class (?)]
What do they do?
Genomic Distribution of Ultraconserved Elements
• exonic
• non
• possibly
Annotation by Association
Measure Correlation between genomic regions and annotation
[Figure: genome + heterogeneous body of knowledge → Testable Hypothesis]
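One common way to measure such an association is an enrichment test. The slide does not say which statistic was used, so the hypergeometric tail below is an assumption, and the gene counts in the example are made-up illustrative numbers. The idea: if a set of genomic regions sits next to genes carrying some annotation more often than chance predicts, the annotation suggests a testable hypothesis about those regions.

```python
from math import comb

def enrichment_p_value(total: int, annotated: int, selected: int, overlap: int) -> float:
    """Hypergeometric upper tail P(X >= overlap).

    total     -- all genes considered
    annotated -- genes carrying the annotation
    selected  -- genes near the regions of interest
    overlap   -- selected genes that also carry the annotation
    """
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)

if __name__ == "__main__":
    # Hypothetical numbers: 1000 genes, 50 annotated, 20 near our regions, 8 in both.
    p = enrichment_p_value(total=1000, annotated=50, selected=20, overlap=8)
    print(f"P(overlap >= 8 by chance) = {p:.2e}")
```

A small p-value here only flags a correlation; as the slide notes, the output is a hypothesis that still needs testing.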
Ultras are Functional
Back in 2004 we hypothesized:
481 ultraconserved elements
exonic subset – post transcriptional regulation [Ni et al., Genes Dev.; Lareau et al., Nature, 2007]
“nonexonic” subset – transcriptional regulators [Pennacchio et al., Nature, 2006]
Repeat made Regulatory Region
Reporter Gene, Minimal Promoter, Conserved Element
in situ
transgenic
Zoom to uc.351, 225Kb upstream of DACH
ultraconserved
e.d 12.5
[Nobrega et al., 2003]
A Vertebrate Innovation?
Only 24 ultras can be partially traced back through direct sequence search to Ciona, C. elegans or Drosophila.
All overlap coding exons from known genes (17 of which show clear evidence of alt-splicing inc. EIF2C1, DDX, BCL11A, EVI1, ZFR, CLK4, HNRPH1, GRIA3).
No intronic element in human was found to be coding in another species, although in some cases EST evidence indicates intron retention, presumably not as CDS.
Interestingly, ribosomal DNA (not part of the draft genomes) also harbors 6 ultraconserved elements in 18S, 28S.
Similar Phenomena in Flies
[Siepel, Bejerano et al., Genome Research 2005]
[Glazov, ..., Bejerano, Mattick, Genome Research 2005]
rich in conserved non-coding
fly-specific ultraconserved elements
Genomic Distribution of Ultraconserved Elements
• exonic
• non
• possibly
Repeats / Mobile Elements ("selfish DNA")
Human Genome: 3*10^9 letters
1.5% known function
>50% junk
Cis-reg & Ultra elements from Mobile Elements
[Yass is a small town in New South Wales, Australia.]
Co-option event, probably due to favorable genomic context
All other copies are destined to decay over time at a neutral rate
[Bejerano et al., Nature 2006]
Exapted Into Which Cellular Roles?
?
Human instances cluster together, found <1Mb from 35 TFs (P<3*10^-6).
No evidence for Transcription (Tx) as small RNAs, no orientation preference in introns, not in antisense Tx.
Repeat made Regulatory Region
Reporter Gene, Minimal Promoter, Conserved Element
in situ
transgenic
Co-option into Different Roles
repeat
protein coding
gene regulating
Relation to Human Disease
[Derti et al., Nature Genetics, 2006]
[Figure labels: SHH, LMBR1, 1Mb, Limb]
Lettice et al. HMG 2003 12: 1725-35
Ultras are Under Strong Human Selection
[Figure labels: Ultra DAF, NonSyn DAF]
[Katzman et al., Science, 2007]
Ultraconserved Non-coding RNA
[Calin et al., Cancer Cell, 2007]
miRNA complementarity
About 1/3 of all ultras are expressed.
Some are predicted to provide microRNA targets.
A few are anti-correlated with miRNA expression levels.
A few even act as oncogenes.
[Slide background: a wall of raw DNA sequence, GGTGCCAGGGAAAGGGCAGGAGGTGAGTGCTGGGAGGCAGCTGAGG...]
Touch an Ultra And You …?
Touch an Ultra And You -
DIY
Nadav Ahituv, Eddy Rubin, LBNL
Complete the Sentence: Ultra KO Mice Are ------
[Ahituv et al., 2007]
Unchangeable but expendable?
Under Strong Selection:
Selection coeff.
Functional:
And expendable??
[Figure labels: all, neutral, functional]
Primate-Dog Non-Exonic
Rodent-Specific Losses
[International Mouse Genome Sequencing Consortium, Nature, 2002]
Lost in Mouse & Rat (in <1000bp deletions)
ultras ~300 fold more persistent than neutral
What we do understand..
Ultraconserved elements exist.
They are maintained via strong on-going selection.
It is a heterogeneous bunch:
Some mediate splicing
Some regulate gene expression
Some express ncRNAs
(categories are not necessarily mutually exclusive)
Knockouts of four regulatory ultras do not lead to severe phenotypes (similar protein cases: Pbx2, Nkx6.2, Gli1)
What we don’t understand
Their functional density:
How did they come to be?
What is the selective advantage that lets them persist?
Kudos
Bejerano Lab: Cory McLean, Abraham Bassan, Shoa Clarke, Edward Chuong, Fah Sathirapongsasuti
UCSC: David Haussler, Craig Lowe, Jim Kent, Sofie Salama, the lot
LBNL: Eddy Rubin
UCSF: Nadav Ahituv
Edward Mallinckrodt, Jr Foundation
http://cs273a.stanford.edu fall quarter at a classroom near you..