Combinatorics of promoter regulatory elements determines
gene expression profiles
Yitzhak (Tzachi) Pilpel
Priya Sudarsanam
George Church
DJ Club, Feb. 2001
Goals of study
• Identify regulatory networks on a genome-wide scale
• Study the combinatorial nature of transcription regulation
• Propose a causal link between promoter sequence elements and expression patterns
[Figure; label: Time Point 1]
The current methodology for expression-regulatory motif analysis (Tavazoie et al.)
Collaboration?
• Co-occurrence (AND)
• Redundancy (OR)
In case of two motifs derived from a cluster
Two motifs derived from the same cell-cycle cluster
[Figure: normalized expression level vs. time, three panels: MCB and SCB; MCB but not SCB; SCB but not MCB]
Is this motif necessarily non-functional?
In case of multiple clusters that give rise to a motif
Condition-specific TF-TF interaction can be identified (in cell cycle)
[Figure: expression vs. time, three panels: Mcm1; Forkhead; Forkhead & Mcm1]
Assigning promoters to motifs: ScanACE (Hughes et al.)
Expression
A proposed reversed analysis method: ScanACE
To avoid circularity, we generated an expression-independent motif data set
• 327 motifs derived from the MIPS functional classification (Hughes et al.)
• 40 motifs of known TFs were added (27 overlapped with the MIPS-derived motifs)
Expression experiments used
• Cell cycle (Cho et al.)
• Sporulation (Chu et al.)
• Diauxic shift (DeRisi et al.)
• Heat shock (Eisen et al.)
• Cold shock (Eisen et al.)
• Reduction with DTT (Eisen et al.)
• MAPK signaling (Roberts et al.)
• NER (Jelinsky et al.)
• Peroxide (Cohen et al.)
Use a diversity of expression data to diagnose motifs
[Figure: expression vs. time in sporulation and cell-cycle experiments, shown for Ndt80 and for a putative motif]
The expression coherence score
[Figure: pairwise distances $d_{ij}$ between expression profiles, illustrated for Gene Set 1 and Gene Set 2]
Threshold $d_{ij}$: top 5%
Expression coherence = fraction of $(i, j)$ pairs with $d_{ij} < \text{threshold}$
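The score defined above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the slides do not name the distance measure, so Euclidean distance between expression profiles is an assumption here, as is deriving the threshold from the closest 5% of pairs in a background gene set. The function names and the toy profiles are hypothetical.

```python
import itertools
import math


def pairwise_distances(profiles):
    """Euclidean distance d_ij for every pair of profiles (metric is an assumption)."""
    return [math.dist(a, b) for a, b in itertools.combinations(profiles, 2)]


def distance_threshold(background_profiles, fraction=0.05):
    """Distance marking the closest `fraction` of background pairs (top 5% by default)."""
    distances = sorted(pairwise_distances(background_profiles))
    if not distances:
        raise ValueError("need at least two background profiles")
    index = max(0, int(fraction * len(distances)) - 1)
    return distances[index]


def expression_coherence(profiles, threshold):
    """Fraction of (i, j) pairs whose distance d_ij is below the threshold."""
    distances = pairwise_distances(profiles)
    if not distances:
        return 0.0
    return sum(1 for d in distances if d < threshold) / len(distances)


# Toy data (hypothetical): one tightly co-expressed set and one scattered set.
tight_set = [(0.0, 1.0, 2.0), (0.0, 1.0, 2.1), (0.0, 1.1, 2.0)]
loose_set = [(5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 5.0)]

tight_score = expression_coherence(tight_set, threshold=0.5)
loose_score = expression_coherence(loose_set, threshold=0.5)
background_threshold = distance_threshold(tight_set + loose_set, fraction=0.05)
```

In practice the threshold would be computed once from a large background (for example all genes in one experiment) and then reused for every motif's gene set.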
Identification of functional motifs
[Figure: two bar charts by condition. First chart (axis 0 to 0.3): cell cycle, sporulation, diauxic shift, heat shock, MAPK, NER. Second chart (axis 0 to 30): cell cycle, sporulation, diauxic shift, temp shift]
New significantly highly scoring motifs
For a motif with 300 occurrences in URs of the genome, the p-value for an expression coherence score of 0.1 is < 1e-12.
$P(p) \sim \mathrm{BinomCDF}(p, P, 0.05)$, where $p$ and $P$ are the number of correlated pairs and the total number of pairs, respectively.
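The order of magnitude quoted above can be checked with a short calculation. The sketch below rests on several assumptions that are not stated on the slide: the 300 occurrences are read as 300 genes, giving P = 300·299/2 pairs; pairs are treated as independent Bernoulli trials with success probability 0.05; and the p-value is taken as the upper tail P(X ≥ p) of the binomial distribution. The helper names are hypothetical, and the result is returned as log10 because the probability itself underflows a float.

```python
import math


def log_binom_pmf(k, n, q):
    """Natural log of the Binomial(n, q) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log1p(-q))


def log10_upper_tail(k_min, n, q):
    """log10 of P(X >= k_min) for X ~ Binomial(n, q), summed in log space."""
    if not 0 <= k_min <= n:
        raise ValueError("k_min must lie between 0 and n")
    logs = [log_binom_pmf(k, n, q) for k in range(k_min, n + 1)]
    largest = max(logs)
    total = sum(math.exp(value - largest) for value in logs)
    return (largest + math.log(total)) / math.log(10)


n_genes = 300                                   # assumption: one occurrence per gene
total_pairs = n_genes * (n_genes - 1) // 2      # P on the slide
correlated_pairs = round(0.1 * total_pairs)     # p for a coherence score of 0.1
log10_p_value = log10_upper_tail(correlated_pairs, total_pairs, 0.05)
print(f"log10(p-value) = {log10_p_value:.1f}")
```

Because gene pairs share genes, they are not truly independent, so this binomial model should be read as an approximation.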
For two motifs, rRPE and PAC
[Figure: histogram over expression coherence (x-axis 0.06 to 0.26, y-axis 0 to 200); labels: rRPE-PAC, PAC, rRPE]
For every combination of N = 2, 3 motifs
• Calculate the expression coherence score of the ORFs that have the N motifs
• Calculate the expression coherence score of the ORFs that have every possible subset of N-1 motifs
• Test (statistically) the hypothesis that the score of the ORFs with N motifs is significantly higher than that of the ORFs that have any subset of N-1 motifs
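The enumeration in the steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: Euclidean distance is assumed for the coherence score, the comparison set for each N-1 subset is taken to be the ORFs that carry the subset but lack the left-out motif (in the spirit of "MCB but not SCB"), and the statistical test itself is omitted because the slides do not specify it. All names and the toy data are hypothetical.

```python
import itertools
import math


def expression_coherence(profiles, threshold):
    """Fraction of profile pairs closer than `threshold` (Euclidean, assumed)."""
    pairs = list(itertools.combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if math.dist(a, b) < threshold) / len(pairs)


def score_combinations(motif_genes, gene_profiles, threshold, n=2):
    """Score every n-motif combination against each of its (n-1)-motif subsets."""
    results = []
    for combo in itertools.combinations(sorted(motif_genes), n):
        # ORFs whose promoters contain all n motifs.
        with_all = set.intersection(*(motif_genes[m] for m in combo))
        full_score = expression_coherence(
            [gene_profiles[g] for g in sorted(with_all)], threshold)
        subset_scores = {}
        for left_out in combo:
            kept = tuple(m for m in combo if m != left_out)
            # Assumption: ORFs with the kept motifs but without the left-out one.
            genes = set.intersection(*(motif_genes[m] for m in kept))
            genes = genes - motif_genes[left_out]
            subset_scores[kept] = expression_coherence(
                [gene_profiles[g] for g in sorted(genes)], threshold)
        results.append({
            "combination": combo,
            "score": full_score,
            "subset_scores": subset_scores,
            # Raw comparison only; a significance test would go here.
            "exceeds_all_subsets": full_score > max(subset_scores.values()),
        })
    return results


# Toy data (hypothetical): three ORFs with both motifs are co-expressed.
motif_genes = {
    "motif_A": {"g1", "g2", "g3", "g4", "g5"},
    "motif_B": {"g1", "g2", "g3", "g6", "g7"},
}
gene_profiles = {
    "g1": (0.0, 1.0, 2.0), "g2": (0.0, 1.0, 2.1), "g3": (0.0, 1.1, 2.0),
    "g4": (5.0, 0.0, 0.0), "g5": (0.0, 5.0, 0.0),
    "g6": (0.0, 0.0, 5.0), "g7": (3.0, 3.0, 3.0),
}
results = score_combinations(motif_genes, gene_profiles, threshold=0.5, n=2)
```

The same function handles N = 3 by passing `n=3`; the number of combinations grows quickly with the size of the motif set, so pre-computing pairwise distances once would be the natural optimization.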
Ribosomal motifs
[Figure: network of ribosomal motifs. Motifs: Rap1, rRSE3, rRPE, PAC, LYS, rRSE10, RPE58, RPE49, RPE34, OCSE15, RPE57, RPE69, RPE21, RPE6, RPE72, CCA, MERE17, RPE8, RPE17]
Rap1-rRPE, rRPE-PAC, PAC-rPPS2...
Cell cycle and sporulation motifs
[Figure: network of cell cycle and sporulation motifs. Motifs: MCB, SCB, Ndt80, SSF, Mcm1. Labels: middle sporulation; G1-S cell cycle; G2-M cell cycle. Conditions: cell cycle, sporulation]
Motif combinations establish sequence-expression causality
[Figure: genes 1 to 6; inter-GMC and intra-GMC comparison]
[Figure, cell cycle: axis 1-C.C (0.3 to 1.8) and expression coherence (0.2 to 1) for motifs 'MCB', 'cytok9', 'ndt80', 'Ume6', 'meiosis_3', 'SCB', 'CLB2', 'FKH1Sh']
Less than a minute on a PowerMac G4 (after pre-processing)
[Figure, sporulation: axis 1-C.C (0 to 1.8) and expression coherence (0.2 to 1) for motifs 'MCB', 'cytok9', 'ndt80', 'Ume6', 'meiosis_n3', 'SCB', 'CLB2', 'FKH1Sh']
From the literature:
1) Meiotic role of SWI6 in (Nucleic Acids Res. 1998)
2) Role for MCB in sporulation (Nature Genetics 2001)
We add:
• Different roles for MCB and SCB
• A potential role of SCB-fkh in giving rise to an Ndt80-type response
• Ndt80’s only synergistic partners in sporulation are cell cycle motifs
[Figure, NER: axes 0.3 to 1.5 and 0.2 to 0.8 for motifs 'Rap1', 'RPE6', 'PAC', 'rRPE', 'rRSE3', 'rRSE10', 'Abf1', 'REB1', 'CCA', 'RPN4', 'HAP234', 'LFTE17']
What can we infer about specific network architecture?
• Assess the contribution of each motif in a combination
• Establish a hierarchy of motifs
• Identify the logical association between motifs: OR for cases of redundancy, AND for cases of synergy
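One way to read the AND/OR distinction in terms of expression coherence scores is sketched below. The decision rule and the cutoff are assumptions made for illustration only; the slides do not give a concrete criterion. The idea is that redundancy (OR) shows up when genes with either motif alone are already coherent, while synergy (AND) shows up when only genes with both motifs are coherent.

```python
def classify_association(ec_both, ec_a_only, ec_b_only, cutoff):
    """Label a motif pair as 'AND', 'OR', or 'undetermined' from three scores.

    ec_both:   coherence of genes carrying both motifs
    ec_a_only: coherence of genes carrying motif A but not B
    ec_b_only: coherence of genes carrying motif B but not A
    cutoff:    hypothetical score above which a gene set counts as coherent
    """
    both_coherent = ec_both >= cutoff
    a_coherent = ec_a_only >= cutoff
    b_coherent = ec_b_only >= cutoff
    if both_coherent and a_coherent and b_coherent:
        return "OR"    # either motif alone suffices: redundancy
    if both_coherent and not a_coherent and not b_coherent:
        return "AND"   # only the combination is coherent: synergy
    return "undetermined"


# Toy scores (hypothetical values, not taken from the study).
synergy_label = classify_association(0.30, 0.02, 0.03, cutoff=0.10)
redundancy_label = classify_association(0.30, 0.25, 0.28, cutoff=0.10)
unclear_label = classify_association(0.30, 0.25, 0.03, cutoff=0.10)
```

The mixed case, where one motif alone is coherent and the other is not, is left as "undetermined" here; it could instead be read as a hint about which motif sits higher in the hierarchy.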
A global motif interaction map
[Figure: global motif interaction map. Motifs: RPN4, Abf1, HAP2-3-4, STRE, MCB, Gcr1, FKH1, Rap1, MERE11, MERE4, rRSE3, rRPE, rRSE10, CCA, LFTE17, OCSE15, Mcm1, FKH1Sh, SCB, Leu3, GCN4, PAC, RPE6, LYS14, cytokinesis9. Functional categories: cell cycle, ribosomal proteins, rRNA transcription, a.a. metabolism, stress, energy, chromosome structure]
What can we learn about global interactions?
• Identify central motif players
• Suggest regulatory role of un-annotated motifs
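A simple way to look for central players in such a map is to treat motifs as nodes, interacting pairs as edges, and rank nodes by their number of partners. The sketch below assumes that representation and uses degree as the measure of centrality; both choices, the function names, and the placeholder motif names are illustrative and not taken from the slides.

```python
from collections import defaultdict


def build_interaction_map(pairs):
    """Undirected graph: motif -> set of motifs it interacts with."""
    graph = defaultdict(set)
    for motif_a, motif_b in pairs:
        graph[motif_a].add(motif_b)
        graph[motif_b].add(motif_a)
    return dict(graph)


def rank_by_degree(graph):
    """Motifs sorted by number of partners (descending), ties broken by name."""
    return sorted(graph, key=lambda motif: (-len(graph[motif]), motif))


# Placeholder pairs (hypothetical): motif_B interacts with three others.
pairs = [("motif_A", "motif_B"), ("motif_B", "motif_C"), ("motif_B", "motif_D")]
interaction_map = build_interaction_map(pairs)
ranking = rank_by_degree(interaction_map)
```

An un-annotated motif whose partners all belong to one functional category would be a natural candidate for a role in that category, which is one way to read the second bullet above.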
Acknowledgments
• Priya Sudarsanam
• Barak Cohen
• John Aach
• Aimee Dudley
• Jason Hughes
• Rob Mitra
• Wayne Rindone
• Fritz Roth
• Uri Keich (UCSF)
• George Church
Genes defined by Motif Combination (GMC)
[Figure: genes 1, 2, 3, ..., N, each labelled with its GMC (GMC1, GMC2)]