Regulatory Genomics
Lecture 1 November 2012
Yitzhak (Tzachi) Pilpel
Course requirements
• Attendance and participation
• Five reading assignments
• A final take-home exam based on reading papers
• Course website
In total 13 or 14 meetings (not 17…). No meeting on Nov 15th.
Genomics marked the beginning of a new age in biology and medicine
1900: Rediscovery of Mendel's laws helps establish the science of genetics
1953: Watson and Crick identify DNA (the double helix) as the chemical basis of heredity
1977: Sanger and Gilbert derive methods of sequencing DNA
1980: DNA markers used to map human disease genes to chromosomal regions
1983: Huntington disease gene mapped to chromosome 4
1990: Human Genome Project (HGP) begins, an international effort to map and sequence all the genes in the human genome
1994-98: Genetic and physical mapping
2000: Working draft of the human genome sequence complete
2005: Release of Human Genome Project
Source: Health Policy Research Bulletin, volume 1, issue 2, September 2001
The genome browser
Number of protein-coding genes
Values shown: 20,210; 19,735; 13,601; 5,616; 20,568; 482 (Mycoplasma genitalium)
Organisms shown: mouse, fruit fly, mustard (Arabidopsis), worm (C. elegans), yeast (S. cerevisiae), Mycoplasma genitalium
How come we have so few genes, given that we are so complex?
19,735 / 21,710
• We have many non-protein-coding genes
• Our genes are longer and more complex
• Regulation of human gene activity is more complex
• Repeats (formerly known as “junk DNA”, yet not garbage) contribute to complexity
• Combinatorial interactions among genes and their products
The hierarchical structure of the genome
Lodish et al. Molecular Cell Biology (5th ed.). W.H. Freeman & Co., 2003.
Expressing the genome
The Central Dogma: a cellular context
The Central Dogma of Molecular Biology: DNA → mRNA → protein
Figure labels: inactive DNA, RNA
Evolution
Corrected view of evolution
The tree of life
How do genomes evolve?
Consider two distinct possibilities:
• Genomes evolve through many de novo “inventions”
• Genomes evolve predominantly by mixing and matching existing parts
Classification of protein structures
Very slow growth in the number of protein folds
Very few structural “inventions”
Comparing a given family (e.g. kinases) across species reveals few “inventions”
Analogy:
• Technology
• Language
Some basic evolutionary operations
• Mutating existing DNA
• Changing gene expression profiles
• Duplicating existing material (genes, chromosomes, genomes)
• Transferring genes from one organism to another
• Functionalizing “junk DNA”
• Reverse transcription?
Stress conditions induce a high DNA replication error rate
Because most newly arising mutations are neutral or deleterious, it has been argued that the mutation rate has evolved to be as low as possible, limited only by the cost of error-avoidance and error-correction mechanisms. But up to one per cent of natural bacterial isolates are 'mutator' clones that have high mutation rates. We consider here whether high mutation rates might play an important role in adaptive evolution. Models of large, asexual, clonal populations adapting to a new environment show that strong mutator genes (such as those that increase mutation rates by 1,000-fold) can accelerate adaptation, even if the mutator gene remains at a very low frequency (for example, 10^-5). …
Some basic evolutionary operations: changing gene expression profiles
A slight change in the expression program can make a big difference: an olfactory receptor can “smell the egg”
Spehr M, Gisselmann G, Poplawski A, Riffell JA, Wetzel CH, Zimmer RK, Hatt H. Identification of a testicular odorant receptor mediating human sperm chemotaxis. Science. 2003 Mar 28;299(5615):2054-8.
Abstract: Although it has been known for some time that olfactory receptors (ORs) reside in spermatozoa, the function of these ORs is unknown. Here, we identified, cloned, and functionally expressed a previously undescribed human testicular OR, hOR17-4. With the use of ratiofluorometric imaging, Ca²⁺ signals were induced by a small subset of applied chemical stimuli, establishing the molecular receptive fields for the recombinantly expressed receptor in human embryonic kidney (HEK) 293 cells and the native receptor in human spermatozoa. Bourgeonal was a powerful agonist for both recombinant and native receptor types, as well as a strong chemoattractant in subsequent behavioral bioassays. In contrast, undecanal was a potent OR antagonist to bourgeonal and related compounds. Taken together, these results indicate that hOR17-4 functions in human sperm chemotaxis and may be a critical component of the fertilization process.
Some basic evolutionary operations: duplicating existing material (genes, chromosomes, genomes)
Gene duplication might provide redundancy
Possible fates of a duplicated gene: nonfunctionalization, neofunctionalization, subfunctionalization
Figure: chromosome III duplicates in heat. x-axis: gene index (0-5,500); y-axis: log2(expression evo39 / evo30); series: all genes vs. chromosome III genes; P < 10^-100
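The plot compares per-gene log2 expression ratios of chromosome III genes against all genes. A minimal sketch of that comparison follows, assuming (hypothetically) that the numerator strain carries three copies of chromosome III and the denominator two, and that expression scales linearly with copy number. The function names and toy expression values are illustrative only, not data from the experiment.

```python
import math
from statistics import mean

def log2_ratios(expr_num, expr_den):
    """Per-gene log2(expr_num / expr_den) for genes measured in both strains."""
    return {g: math.log2(expr_num[g] / expr_den[g]) for g in expr_num if g in expr_den}

def mean_ratio_on_vs_off(ratios, chrom_of, chrom):
    """Mean log2 ratio of genes on `chrom` versus genes on all other chromosomes."""
    on = [r for g, r in ratios.items() if chrom_of[g] == chrom]
    off = [r for g, r in ratios.items() if chrom_of[g] != chrom]
    return mean(on), mean(off)

# If expression scaled linearly with copy number, going from 2 to 3 copies
# would shift chromosome III genes by log2(3/2), about 0.585.
expected_shift = math.log2(3 / 2)

# Hypothetical toy data (not from the experiment): expression in a strain with
# 3 copies of chromosome III (numerator) and in one with 2 copies (denominator).
three_copies = {"g1": 150.0, "g2": 300.0, "g3": 100.0, "g4": 80.0}
two_copies = {"g1": 100.0, "g2": 200.0, "g3": 100.0, "g4": 80.0}
chrom_of = {"g1": "III", "g2": "III", "g3": "IV", "g4": "V"}

ratios = log2_ratios(three_copies, two_copies)
on_iii, others = mean_ratio_on_vs_off(ratios, chrom_of, "III")
print(f"chrIII: {on_iii:.3f}, other genes: {others:.3f}, expected: {expected_shift:.3f}")
```

Real measurements are noisy, so a shift like this would normally be backed by a statistical test across thousands of genes rather than read off a handful of values.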
Heat shock tolerance correlates with chromosome III copy number
Figure: relative survival (y-axis, 0-3.5) of WT with two copies, the evolved strain with 3 copies, WT with one copy, and WT with 3 copies of chromosome III
Conclusions from the experiment
• Chromosomes are easily gained and lost in yeast evolution
• A more fine-tuned solution may follow chromosome duplication
• A striking similarity between repeated experiments
• A chromosome-condition specificity?
Sequence similarity between gene pairs
Many gene-duplicate distances correspond to 60-70 mya!
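Distances between duplicate pairs can be turned into rough ages. The sketch below assumes a Jukes-Cantor correction of the raw p-distance and a constant, hypothetical substitution rate of 1e-9 per site per year, with T = d / (2r) because both copies accumulate changes after the duplication. In practice synonymous-site divergence and a lineage-specific rate would be used; all numbers here are illustrative.

```python
import math

def p_distance(seq1, seq2):
    """Fraction of differing sites between two aligned sequences (gap columns skipped)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site from a p-distance (needs p < 0.75)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def divergence_time(d, rate_per_site_per_year):
    """Years since the two copies split: both lineages diverge, hence 2 * rate."""
    return d / (2 * rate_per_site_per_year)

# Hypothetical values for illustration only.
RATE = 1e-9            # assumed substitutions per site per year
d = jukes_cantor(0.12) # assumed p-distance of 12% between the two copies
t = divergence_time(d, RATE)
print(f"d = {d:.3f}, T = {t / 1e6:.1f} Myr")
```

With these assumed inputs the estimate lands at roughly 65 million years; a different rate scales T proportionally.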
Some basic evolutionary operations: transferring genes from one organism to another
Horizontal (“lateral”) gene transfer: transfer of genes between organisms, mostly under stress
Some basic evolutionary operations: functionalizing “junk DNA”
Evolution of transcriptional switches
Figure: changes in a TF binding site and their outcomes
Outcomes shown: similar function (neutral selection); disrupted function; altered function (TF1 to TF2); gained function (TF1)
Selection labels shown: low rate, purifying selection
Example sites: CACGCGTACACGCGTT, CACGAGTTCACGCGTT, CACACGTTCACGCGTTCACACGTTCACGCGTT
Evolution of transcription networks
pilpel@weizmann.ac.il
Repetitive elements in the human genome
• Alu elements are repetitive retrotransposons in the human genome.
• Alu elements are about 300 base pairs long and are therefore classified as short interspersed elements (SINEs).
• There are over one million Alu elements interspersed throughout the human genome.
• About 10% of the human genome consists of Alu sequences.
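A quick consistency check of the bullets above, assuming a haploid human genome of about 3.1 Gb (a value not given on the slide): one million copies of roughly 300 bp cover about 3 × 10^8 bp.

```python
# Back-of-envelope check of the "about 10%" figure, using the slide's numbers.
ALU_COPIES = 1_000_000   # "over one million" (a lower bound)
ALU_LENGTH_BP = 300      # "about 300 base pairs"
GENOME_SIZE_BP = 3.1e9   # assumed haploid human genome size (~3.1 Gb)

alu_bp = ALU_COPIES * ALU_LENGTH_BP
fraction = alu_bp / GENOME_SIZE_BP
print(f"{alu_bp:,} bp of Alu sequence is about {fraction:.1%} of the genome")
```

Many Alu copies are truncated, so this is only an order-of-magnitude check, but it agrees with the "about 10%" figure.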
Retrotransposition
Alus may contain binding sites for TFs, microRNAs…
Can the phenotype shape the genotype?
Classical Darwinian theory: Genotype → Phenotype
Lamarckian theory: Genotype → Phenotype, and Phenotype → Genotype
The Central Dogma: a cellular context
Figure labels: cell division, cell death, attacking a virus, differentiation, protein synthesis; cell membrane, nucleus
From parts to networks…
Reporter genes reveal spatio-temporal expression programs
In unicellular organisms, responses to environmental signals affect gene expression dramatically
Gasch et al. Mol Biol Cell. 2000 Dec;11(12):4241-57.
The transcriptome during the cell cycle
Spellman et al. Mol Biol Cell. 1998 Dec;9(12):3273-97.
Figure labels: coding DNA strand, non-coding strand, RNA
Transcription regulation
• The hardware
• The software
• The input
• The output
The initiation machinery complex
Transcription factors bind the DNA
Keys (regulators) can scan the genome in search of their locks (recognition sites), e.g. ATACGAT
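The “key and lock” picture can be sketched as an exact string search on both DNA strands, using the example site ATACGAT. This is a simplification: real transcription factors tolerate mismatches, so practical scanners score degenerate motifs (for instance with a weight matrix) instead of requiring exact matches. The toy sequence and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def scan(genome, site):
    """Exact-match scan of both strands for a recognition site.

    Returns (start, strand) tuples, where start is the 0-based position on
    the forward strand. A palindromic site is reported once, on '+'.
    """
    k = len(site)
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits

# Hypothetical toy sequence with one site on each strand.
print(scan("GGATACGATCCATCGTATGG", "ATACGAT"))
```

The reverse strand is searched by matching the reverse complement (ATCGTAT) on the forward strand, which avoids building a second copy of the genome.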
Transcription regulation: the software
The Lac Operon (Jacob and Monod)
http://esg-www.mit.edu:8001/esgbio/pge/lac.html
In the absence of lactose
In the presence of lactose
In the absence of glucose
The logic of the Lac operon regulation
Regulatory elements: CAP site, operator
Glucose   Lactose   Activity
+         -         OFF
-         +         ON
-         -         OFF
+         +         OFF
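The truth table reduces to one Boolean expression: the operon is ON only when lactose is present and glucose is absent. A minimal sketch, assuming the slide's simplification that the glucose-plus-lactose case counts as OFF (in reality expression there is low rather than zero); the function name is hypothetical.

```python
def lac_operon_on(glucose: bool, lactose: bool) -> bool:
    """ON only with lactose present (repressor released from the operator)
    and glucose absent (CAP able to bind the CAP site and activate)."""
    return lactose and not glucose

# Reproduce the truth table row by row.
for glucose, lactose in [(True, False), (False, True), (False, False), (True, True)]:
    state = "ON" if lac_operon_on(glucose, lactose) else "OFF"
    print(f"glucose {'+' if glucose else '-'}  lactose {'+' if lactose else '-'}  {state}")
```

Writing the logic as an AND of one activating and one negated input makes the combinatorial nature of the control explicit: neither signal alone is sufficient.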
Genomic Regulatory Logic
DNA binding proteins for unique pathways
A global map of combinatorial expression control
Figure: network of promoter motifs. Nodes include mRPE72, SWI5, SFF, SFF', MCM1, MCM1', ECB, SCB, MCB, PAC, mRRPE, mRRSE3, GCN4, BAS1, LYS14, RAP1, mRPE34, mRPE57, mRPE6, mRPE58, STRE, RPN4, ABF1, PDR, CCA, PHO4, AFT1, STE12, MIG1, CSRE, HAP234, ALPHA1', ALPHA1, ALPHA2, mRPE8, mRPE69. Conditions: heat shock, cell cycle, sporulation, diauxic shift, MAPK signaling, DNA damage.
• High connectivity
• Hubs
• Alternative partners in various conditions
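Connectivity, hubs and condition-specific partners can be read off a motif-pair network by counting each motif's partners. The sketch below uses hypothetical motif names and edges (not the pairs in the map above) purely to show the bookkeeping.

```python
from collections import defaultdict

# Hypothetical toy edges: (motif, motif, condition in which the pair acts together).
edges = [
    ("motifA", "motifB", "heat shock"),
    ("motifA", "motifC", "cell cycle"),
    ("motifA", "motifD", "sporulation"),
    ("motifB", "motifC", "heat shock"),
    ("motifE", "motifF", "DNA damage"),
]

def degrees(edge_list):
    """Number of distinct partners per motif (node degree)."""
    partners = defaultdict(set)
    for a, b, _ in edge_list:
        partners[a].add(b)
        partners[b].add(a)
    return {m: len(p) for m, p in partners.items()}

def partners_by_condition(edge_list, motif):
    """Which partners a motif pairs with in each condition."""
    result = defaultdict(set)
    for a, b, cond in edge_list:
        if a == motif:
            result[cond].add(b)
        elif b == motif:
            result[cond].add(a)
    return dict(result)

deg = degrees(edges)
hub = max(deg, key=deg.get)  # the most connected motif
print(hub, deg[hub])
print(partners_by_condition(edges, hub))
```

In this toy network motifA is the hub, and it pairs with a different partner in each condition, which is the pattern the "alternative partners" note describes.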
Transcription regulation: the input
5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT
5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG
5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT
5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC
5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA
5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA
5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA
AlignACE example
…HIS7 …ARO4 …ILV6 …THR4 …ARO1 …HOM2 …PRO3
300-600 bp of upstream sequence per gene are searched in Saccharomyces cerevisiae.
AAAAGAGTCA
AAATGACTCA
AAGTGAGTCA
AAAAGAGTCA
GGATGAGTCA
AAATGAGTCA
GAATGAGTCA
AAAAGAGTCA
**********
AlignACE example: the best motif
MAP score = 20.37
…HIS7 …ARO4 …ILV6 …THR4 …ARO1 …HOM2 …PRO3
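The aligned sites above can be summarized as a count matrix, a consensus and a log-odds weight matrix. This sketch only summarizes the alignment; it does not reproduce AlignACE's Gibbs-sampling search or its MAP score, and the pseudocount and uniform background are assumptions.

```python
import math
from collections import Counter

# The eight aligned sites from the AlignACE example.
sites = [
    "AAAAGAGTCA", "AAATGACTCA", "AAGTGAGTCA", "AAAAGAGTCA",
    "GGATGAGTCA", "AAATGAGTCA", "GAATGAGTCA", "AAAAGAGTCA",
]

def count_matrix(aligned):
    """Position frequency matrix: base counts at each column of the alignment."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("sites must all have the same length")
    return [Counter(s[i] for s in aligned) for i in range(width)]

def consensus(pfm):
    """Most frequent base per column (ties broken alphabetically)."""
    return "".join(min("ACGT", key=lambda b: (-col[b], b)) for col in pfm)

def log_odds(pfm, n_sites, pseudocount=0.5, background=0.25):
    """Log2-odds weight matrix against an assumed uniform background."""
    return [
        {b: math.log2((col[b] + pseudocount) / (n_sites + 4 * pseudocount) / background)
         for b in "ACGT"}
        for col in pfm
    ]

def score(pwm, seq):
    """Sum of per-position weights for a sequence as long as the motif."""
    return sum(col[base] for col, base in zip(pwm, seq))

pfm = count_matrix(sites)
pwm = log_odds(pfm, len(sites))
print(consensus(pfm))
print(round(score(pwm, consensus(pfm)), 2), round(score(pwm, "CCCCCCCCCC"), 2))
```

For these eight sites the consensus is AAATGAGTCA. The weight matrix can then be slid along a promoter, scoring each window, which tolerates the variable positions that an exact match would reject.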
Transcription regulation: the output
Expression regulation of genes determines complex spatio-temporal patterns
Monitor expression during the cell cycle
Figure: mRNA expression level (y-axis) over time (x-axis, 0-15) through two rounds of G1, S, G2, M
Genes can be clustered based on time-dependent expression profiles
Figure: genes plotted in a space whose axes are time-points 1, 2 and 3, and three panels of normalized expression against time-point (1-3)
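Before clustering, profiles are usually normalized so that genes are grouped by the shape of their time course rather than by absolute level. A minimal sketch, assuming z-score normalization (the normalization used in the figure is not specified) and Pearson correlation as the similarity measure; the function names are hypothetical.

```python
from statistics import mean, pstdev

def normalize(profile):
    """Z-score a time course: zero mean, unit (population) standard deviation.
    A flat profile has no shape, so it is mapped to all zeros."""
    mu = mean(profile)
    sd = pstdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(x - mu) / sd for x in profile]

def pearson(x, y):
    """Pearson correlation, computed as the mean product of z-scores.
    Returns 0 if either profile is flat."""
    zx, zy = normalize(x), normalize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical profiles over three time points.
gene_a = [1.0, 2.0, 3.0]
gene_b = [10.0, 20.0, 30.0]   # same shape, ten times the level
gene_c = [3.0, 2.0, 1.0]      # opposite shape
print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
```

After normalization gene_a and gene_b become identical points, so any distance-based clustering places them together despite the tenfold difference in level.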
The K-means algorithm
• Start with random positions of centroids (iteration 0).
• Assign each data point to its nearest centroid (iteration 1).
• Move each centroid to the center of its assigned points (iteration 1).
• Iterate until the cost stops decreasing (shown at iteration 3).
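The four steps above are Lloyd's algorithm. A self-contained sketch, assuming Euclidean distance and initializing the centroids at randomly chosen data points (one common way to pick "random positions"); the toy profiles are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels

def cost(points, centroids, labels):
    """Within-cluster sum of squared distances (the quantity k-means lowers)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical normalized profiles over three time points: two rising, two falling.
profiles = [(1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (3.0, 2.0, 1.0), (2.9, 2.0, 1.1)]
centroids, labels = kmeans(profiles, k=2)
print(labels, round(cost(profiles, centroids, labels), 3))
```

Neither step can increase the cost, so the loop converges, but only to a local minimum that depends on the starting centroids; running from several random seeds and keeping the lowest-cost result is a common safeguard.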
The diauxic shift
Genetic reprogramming of yeast metabolism upon glucose depletion
At the beginning, when glucose is abundant:
Glucose → pyruvate (2 ADP + 2 Pi → 2 ATP; NAD+ → NADH)
Pyruvate → ethanol / lactate (ferment)
Pyruvate → acetyl-CoA → TCA cycle (NAD+ → NADH; respire)
~20 hours later, when glucose is depleted:
Glucose → pyruvate (2 ADP + 2 Pi → 2 ATP; NAD+ → NADH)
Pyruvate → ethanol / lactate (ferment)
Pyruvate → acetyl-CoA → TCA cycle (NAD+ → NADH; respire, O2)
The promoter sequences of co-expressed genes
Promoter motifs and expression profiles
Motifs shown: CGGCCCCGCGGA, CTCCTCCCCCCCTTC, TGGCCAATCA, ATGTACGGGTG
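One common way to link a promoter motif to an expression cluster, offered here as an assumption rather than the lecture's method, is a hypergeometric test: how surprising is it that so many cluster members carry the motif, given how common the motif is genome-wide? The gene counts below are hypothetical.

```python
from math import comb

def hypergeom_sf(k, n_total, n_motif, n_cluster):
    """P(X >= k) for X ~ Hypergeometric: drawing n_cluster genes out of
    n_total genes, of which n_motif carry the motif."""
    upper = min(n_motif, n_cluster)
    hits = sum(comb(n_motif, i) * comb(n_total - n_motif, n_cluster - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_cluster)

# Hypothetical numbers: 6,000 genes, 300 with the motif, a cluster of 50 genes,
# 10 of which carry the motif (about 2.5 would be expected by chance).
print(f"{hypergeom_sf(10, 6000, 300, 50):.2e}")
```

A small tail probability suggests the motif is enriched in the cluster; when many motifs and clusters are tested, the p-values need a multiple-testing correction.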