From patterns to pathways. Causal analysis of gene expression data
Alexander Kel
BIOBASE GmbH
Halchtersche Strasse 33, D-38304 Wolfenbuettel
Germany
www.biobase.de
[Overview diagram. Databases: TRANSFAC, TRANSCompel, TRANSPATH, PathoDB, S/MARt DB, Cytomer, TRANSGenome. Views: mechanistic, semantic. Tools: Match, Patch, Catch, Pathway builder, Array analyser, TRANSPLORER, CMFinder]
The TRANSFAC® System comprises 7 databases:
TRANSFAC® Professional Suite
TRANSFAC® Professional
Transcription factor database
TRANSCompel® Professional
Composite elements database
PathoDB® Professional
Pathologically altered transcription factors
TRANSPRO™ Professional
Collection of human promoter sequences
S/MARt DB™ Professional
Scaffold/matrix attached regions database
Cytomer® Ontology of cells, structures, organs
TRANSPATH® Professional
Signal transduction pathways
TRANSFAC® Professional
Transcription factor database
Sequences and positions of AP-1 binding sites in human genes

Gene: position of the site
glutathione P-transferase: enhancer at -2500
hemoglobin, epsilon: -80 bp
Akt-2: -100 bp
IFN-: -89 bp
Apo AII: -792 bp
Melanotransferrin: -2013 bp
Collagenase: -72 bp
proto-oncogene c-myc: -335 bp
porphobilinogen deaminase: -162 bp
GM-CSF: enhancer at -3500

Site sequences: TGACTTT, TGACATC, TGTCACC, TGACTCA, TGAGTCA, TGAGTCA, TGATTTA, TGACTCA, TGACTCA
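GM-CSF, Homo sapiens
[Figure: the T-cell specific inducible enhancer at -3500 bp contains a series of AP-1/NFAT composite elements (CE). The promoter upstream of the transcription start (+1) carries the TATTT box, NF-κB p50/p65 and NF-κB c-Rel/p65 sites, HMG Y(I), the CD28 response element and two CBF sites; positions -54, -88 and -114 are marked]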
Structure of regulatory regions of eukaryotic genes
[Figure: transcription factors TF1, TF2 and TF3 bound to sites S1, S2 and S3, a histone acetylase, and the basal complex of TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH and RNA pol II]
Protein-DNA and protein-protein interactions in gene transcriptional regulation.
[Figure: layers I to III of transcription factors acting on a gene with its regulatory region and coding region. Sequence-specific DNA-binding factors TF1 to TF4 contact the DNA; non-DNA-binding factors include an interacting factor, an adapter, a co-activator and a HAT; the outcome is gene expression]
TRANSFAC: relational scheme
[Diagram of the linked entities: SITE, FACTOR, GENE, SYNONYMS, FEATURES, CLASS, SPECIES, MATRIX, SEQUENCE, METHOD, CELL, Q, FUNCTIONAL ELEMENT]
Manual annotation of the databases: input client
TRANSFAC: GENE table
TRANSFAC: SITE table
Structure of transcription factors
USF-1, dimer
DNA binding domain
Activation domain
oligomerization domain
Ligand- binding domain
Protein-protein interaction domain
TRANSFAC: FACTOR table, protein sequence
TRANSFAC: FACTOR table, protein domains
TRANSFAC: FACTOR table, structural and functional features
TRANSFAC: FACTOR table, links to other databases
TRANSFAC: classification of transcription factors
TRANSFAC: CLASS table
TRANSFAC 8.1 (2004-03-31): number of factor entries for different species
[Bar chart; categories: human, mouse, rat, other vertebrates, fruit fly, plants, fungi, other]
TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5′ regions of genes
[Histogram]
TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions
TRANSFAC: MATRIX table
TRANSCompel® Professional
Composite elements database
Mouse interleukin-2 gene promoter, composite element COMPEL:C00050 (NF-ATp / AP-1), positions -96 to -79 relative to the start of transcription (ST):
tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt…
AP-1 consensus: TGAGTCA
[Figure: factor F1 alone or factor F2 alone gives a low level of transcription; F1 and F2 together give synergistic activation of transcription]
Composite elements
Minimal functional units in which both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways.
Scheme of composite elements (CE)
N. Gene: positions
1. IgH **, Mus musculus
2. IL-2, Homo sapiens: -283, -268
3. IL-2, Homo sapiens: -167, -142
4. IL-2, Mus musculus: -167, -142
5. IgH **, Homo sapiens
6. Serum amyloid A1, Rattus norv: -117, -73
7. IRF-1, Mus musculus: -123, -113, -49, -40
Factor pairs shown: AP-1/Ets, AP-1/NFAT, AP-1/NF-κB, Ets/CBF, AP-1/Oct-2, NF-κB/C/EBP, NF-κB/STAT-1
Combinatorial regulation by the composite elements
Ternary complex NFATp - AP1 - DNA
Description of an evidence (experiment, cell type, two individual interactions)
Flat files
Link to the TRANSFAC GENE table
Link to EMBL
Link to the TRANSFAC FACTOR table
[Figure: signalling from the membrane receptor to the IL-2 composite element. Components shown: membrane receptor, Src, SH2 and SH3 adaptors, Ras (GDP/GTP), PLC, PI3-K, PKB/Akt, phosphorylation, IP3, Ca2+, Ca2+-dependent channel, calcineurin, ERK, JNK, p38 MAPK, NFATp, c-Fos, c-Jun, ATF-2. NFATp, c-Fos, c-Jun and ATF-2 move from the cytoplasm to the nucleus and act on the composite element of IL-2]
Cross-coupling of signal transduction pathways
Tissue-specific
32
Inducible
44 119
Cell-cycle dependent
1 2
Dev. stage-dependent
3
Ubiquitous constitutive
39 60 2 12
F1 F2
Tissue-specific
Indu-cible
Cell-cycle dep.
Dev. stage-dependent
Ubiquit. constitut.
2
Inducible/inducible
19 CEs ETS / AP-1, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways;
15 CEs NFATp / AP-1, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways;
14 CEs NF-κB / C/EBP: NF-κB is inducible by IL-1 and TNF; C/EBP is inducible by IL-6.
Inducible/constitutive
9 CEs ETS / Sp1: ETS factors are inducible through the Ras/Raf-dependent signalling pathway;
5 CEs Smad / TEF3: Smads are inducible by TGF-β signalling.
Inducible/tissue-restricted
CEs Pit-1 / AP-1: Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.
Mechanisms of functioning of synergistic composite elements
(F1, F2: factors; S1, S2: their sites)
1) Cooperative binding to DNA and ternary complex formation
2) A new protein surface for DNA recognition could be formed
3) Simultaneous interaction of activation domains with the components of the basal complex
4) Forming a new protein surface for interaction with the basal complex
5) Relief of autoinhibition as a result of protein-protein interactions
6) DNA bending by one of the transcription factors
7) DNA wrapping around a nucleosome allows transcription factors to interact
8) Recruitment of a HAT complex by one of the transcription factors

Mechanisms of functioning of antagonistic composite elements
(HAT complex versus HDAC complex)
1) Mutually exclusive binding of factor F1 (activator) and F2 (repressor)
2) Binding of F2 (repressor) results in conformational changes of F1 (activator)
TRANSPATH® Professional
Database on signal transduction pathways
TRANSPATH: map of IFN pathway
TRANSPATH® / TRANSFAC®
Extracellular ligand
Membrane receptor
Adaptor
Second messenger
Kinase(s)
Transcription factor
Target gene
TRANSPATH: molecules
TRANSPATH: molecule hierarchy
family: IL-1/Toll receptor family
family: TLRs
ortholog: TLR4, TLR5
basic: TLR4(h), TLR4(m), TLR5(h)
isoform: TLR4a(h), TLR4b(m)
modified form: TLR4(h)p
complexes: TLR4(h):MyD88(h)
TRANSPATH: reactions
• Binding
• Phosphorylation
• Dephosphorylation
• Degradation
• Acetylation
• Dissociation
• Transregulation
• Expression
• Activation
• ...
The elementary reaction step
[Scheme: educts A → products B, with an enzyme C acting on reaction R]
Reaction R, catalyzed by catalyst C, converts substance A into substance B.
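The elementary reaction step can be pictured as a small record type. The following is only a sketch: the class name `Reaction`, its field names and the chosen `effect` value are illustrative assumptions and do not reflect the actual TRANSPATH schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    """Hypothetical record for one elementary step: educts -> products."""
    name: str
    effect: str                     # one of the reaction types, e.g. "phosphorylation"
    educts: Tuple[str, ...]
    products: Tuple[str, ...]
    enzyme: Optional[str] = None    # catalyst, if any

    def describe(self) -> str:
        lhs = " + ".join(self.educts)
        rhs = " + ".join(self.products)
        by = f", catalyzed by {self.enzyme}" if self.enzyme else ""
        return f"Reaction {self.name} ({self.effect}){by}: {lhs} -> {rhs}"


# Reaction R, catalyzed by C, converts A into B (the effect type is an assumption).
r = Reaction(name="R", effect="phosphorylation",
             educts=("A",), products=("B",), enzyme="C")
print(r.describe())
```

Keeping educts and products as tuples lets the same record describe binding (two educts, one product) and dissociation (one educt, two products).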
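[Figure: TGF-β1 signalling drawn as pathway steps R1 to R5. Molecules shown: TGFβ1 (T), TGFR-II, TGFR-I, the complexes T:TR2p and T:TR2p:TR1p, Smad2, Smad2p, Smad4, the complex S2P:S4, NTP, NDP, and the target gene (tc)]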
Pathway steps:
Pathway steps depict the signalling in a more biochemical way.
In a semantic reaction, only the individual key molecules are given.
Semantic: TGFβ1 → TGF-RII → TGF-RI → Smad2 → Smad4 → gene
(reactions R1, R2, R3, R4, R5)
Info about a specific molecule
Parts of a molecule entry
Many synonyms make sure that you find your protein
External database links allow easy identification of proteins
Specific molecule (cont.)
Opens data entry of a specific reaction
Disease information and GO terminology
Localization of human APP
Specific reaction of APP(h)
Part of a reaction entry
Evaluation of this reaction is based on experimental evidence
Signal transduction pathways
Connecting path between two molecules
Connection between one specific molecule (magenta) and a group of molecules (transcription factors in blue)
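Finding a connecting path between two molecules amounts to a shortest-path search in a directed graph of signalling steps. The sketch below uses breadth-first search on the semantic TGF chain from the pathway-steps slide; the function name `connecting_path`, the edge list and the molecule spellings are assumptions made for illustration, not the TRANSPATH implementation.

```python
from collections import deque


def connecting_path(edges, source, target):
    """Return the shortest directed path from source to target, or None."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Semantic chain of the TGF pathway (names are illustrative spellings).
EDGES = [
    ("TGFbeta1", "TGF-RII"),
    ("TGF-RII", "TGF-RI"),
    ("TGF-RI", "Smad2"),
    ("Smad2", "Smad4"),
    ("Smad4", "gene"),
]

print(connecting_path(EDGES, "TGF-RII", "Smad4"))
```

Connecting one molecule to a group of molecules, such as all transcription factors, can reuse the same search by calling it once per member of the group.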
Oncostatin M pathway
B-cell antigen receptor pathway
PDGF pathway
Insulin pathway
Overview of a pathway – hand-drawn map
TRANSPATH: number of entries
[Bar chart: molecules, reactions and references for releases Professional 2.1, Professional 2.4 and Professional 3.1]
Statistics: TRANSPATH® 5.1 and NetPro 1.1
Main tables + NetPro
– Molecule: 18029 + 7333
– Reaction: 20199 + 30316
– Reference: 8258 + 9582
Molecules of mammalian origin
– Human: 2503, 3521
– Mouse: 1653, 2025
– Rat: 810, 1224
Prediction
26 588 predicted human gene products, of which 30.8% (~9000) seem to be signal transduction relevant (Venter et al., 2001)
=> 28% coverage of predicted proteins in TRANSPATH®
TRANSFAC® System
From patterns to pathways
Array analysis
The starting point: a set of induced genes from microarray experiments.
The conventional analysis: deduce the gene products and map them to the network of metabolic pathways (KEGG) to obtain biochemical effects.
Extension of conventional analysis: map the induced gene products to the network of regulatory pathways (TRANSPATH) to obtain biological effects.
Reasoning about experimental findings: promoter analysis of induced genes connected to network mapping (KEGG, TRANSPATH).
Identification of new targets: the promoter model is applied to the TRANSGENOME database, giving additional predicted genes and an extended predicted network.
Promoter analysis identifies additional target genes and extends the affected network.
Array analysis: causes and effects
Microarray: set of induced genes
Causes (indirect hints on causes): retrieval of upstream sequences, promoter analysis, network analysis, new target (TRANSPATH, TRANSFAC, TRANSGENOME)
Effects (modeling of effects): assignment of gene products, metabolic network mapping (KEGG), regulatory network mapping (TRANSPATH)
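Matrix score q of a site b_1 ... b_l of length l:

$$ q = \frac{\sum_{i=1}^{l} I(i)\, f(b_i, i) - \sum_{i=1}^{l} I(i)\, f^{\min}(i)}{\sum_{i=1}^{l} I(i)\, f^{\max}(i) - \sum_{i=1}^{l} I(i)\, f^{\min}(i)} \qquad (1) $$

Information vector:

$$ I(i) = \sum_{b \in \{A, T, G, C\}} f(b, i)\, \ln\bigl(4 f(b, i)\bigr) \qquad (2) $$

Example count matrix (14 positions) and its consensus:

A   9  2  1  0  1  0  0  0  0  1 15 13 13  7
C   8  3  1  1 13  3 29  0 22  8  9  1  4  8
G   4  2  2  2 15 26  0 29  7 17  3  7  9  8
T   8 22 25 26  0  0  0  0  0  3  2  8  3  6
    N  T  T  T  S  G  C  G  C  S  M  D  R  N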
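The information vector I(i) and the matrix score q can be tried out on the count matrix shown on this slide. The sketch below is not the TRANSFAC code: it assumes that f(b, i) is the relative frequency of base b in column i, that 0 · ln 0 is taken as 0, and that q is normalised between the lowest and highest attainable weighted sums. The function names are illustrative.

```python
import math

BASES = "ACGT"

# Count matrix from the slide (rows A, C, G, T; 14 positions).
COUNTS = {
    "A": [9, 2, 1, 0, 1, 0, 0, 0, 0, 1, 15, 13, 13, 7],
    "C": [8, 3, 1, 1, 13, 3, 29, 0, 22, 8, 9, 1, 4, 8],
    "G": [4, 2, 2, 2, 15, 26, 0, 29, 7, 17, 3, 7, 9, 8],
    "T": [8, 22, 25, 26, 0, 0, 0, 0, 0, 3, 2, 8, 3, 6],
}


def frequencies(counts):
    """Turn column counts into relative frequencies f(b, i)."""
    length = len(counts["A"])
    freqs = []
    for i in range(length):
        total = sum(counts[b][i] for b in BASES)
        freqs.append({b: counts[b][i] / total for b in BASES})
    return freqs


def information_vector(freqs):
    """I(i) = sum over b of f(b, i) * ln(4 * f(b, i)), with 0 * ln 0 = 0."""
    return [sum(f[b] * math.log(4 * f[b]) for b in BASES if f[b] > 0)
            for f in freqs]


def matrix_score(seq, freqs, info):
    """Score q in [0, 1] for a sequence of the same length as the matrix."""
    seq = seq.upper()
    if len(seq) != len(freqs):
        raise ValueError("sequence length must equal matrix length")
    current = sum(i_val * f[b] for i_val, f, b in zip(info, freqs, seq))
    lowest = sum(i_val * min(f.values()) for i_val, f in zip(info, freqs))
    highest = sum(i_val * max(f.values()) for i_val, f in zip(info, freqs))
    return (current - lowest) / (highest - lowest)


FREQS = frequencies(COUNTS)
INFO = information_vector(FREQS)

# Most frequent base per column (ties broken by base order).
best = "".join(max(BASES, key=lambda b: f[b]) for f in FREQS)
print(best, round(matrix_score(best, FREQS, INFO), 3))
```

Weighting each column by I(i) means that highly conserved positions, such as the central GCGC, dominate the score, while near-uniform flanking positions contribute little.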
TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWMs). It can use several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC® Professional database, other matrix libraries, and any user-developed matrix libraries. It can therefore search for a great variety of transcription factor binding sites. A search can be made using all matrices from the libraries or subsets of them.
Search for most probable binding sites regulating gene expression
Search for binding sites coinciding with SNPs
Mouse c-fos promoter (Matrix search for TF binding sites)
[Matrix search output for the sequence MMCFOS_1, shown in 60-bp blocks ending at positions 420, 480 and 540. Each block is overlaid with arrows for the predicted sites on both strands and their scores, for example V$CREB_01 (0.96), V$TATA_01 (0.95), V$MZF1_01 (0.99), V$AP4_Q6 (0.99), V$SP1_Q6 (0.88) and E2F (0.80, 0.82). The transcription start is marked]
Exon 2 sequence of human thyroid transcription factor-1 (TTF-1) gene (HS198161)
(Matrix search for TF binding sites)
[Matrix search output for the sequence HS198161_1, shown in 60-bp blocks ending at positions 540, 600, 660 and 720. Each block is overlaid with arrows for the predicted sites on both strands and their scores, for example V$NF1_Q6 (0.96), V$AP2_Q6 (0.95), V$DELTAEF1_01 (0.99), V$AP4_Q5 (0.92) and E2F (0.80 to 0.90)]
Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery.
Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66
Enhanceosome
Recognition method for T-cell specific composite elements NFAT/AP-1
5' ..WRGAAAA.. (NFATp) [8-12 bp] ..TGASTCA.. (AP-1) 3'
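The two consensus strings and the 8-12 bp spacing can be turned into a simple pattern search. The following is a rough sketch only, not the recognition method of the slides, which scores both sites with weight matrices. It assumes that 8-12 bp is the gap between the end of the NFAT consensus and the start of the AP-1 consensus, and it scans only the forward strand in one orientation.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    return "".join(IUPAC[c] for c in motif.upper())


def find_composite_elements(seq, left="WRGAAAA", right="TGASTCA",
                            min_gap=8, max_gap=12):
    """Return (left_start, right_start, gap) for each left/right pair found."""
    seq = seq.upper()
    left_re = re.compile(iupac_to_regex(left))
    right_re = re.compile(iupac_to_regex(right))
    hits = []
    for start in range(len(seq) - len(left) + 1):
        if not left_re.fullmatch(seq, start, start + len(left)):
            continue
        for gap in range(min_gap, max_gap + 1):
            r_start = start + len(left) + gap
            r_end = r_start + len(right)
            if r_end <= len(seq) and right_re.fullmatch(seq, r_start, r_end):
                hits.append((start, r_start, gap))
    return hits


# Synthetic test sequence (not a real promoter): NFAT-like site, 10-bp gap, AP-1 site.
demo = "CC" + "AGGAAAA" + "C" * 10 + "TGAGTCA" + "GG"
print(find_composite_elements(demo))
```

A consensus search of this kind misses degenerate sites; replacing the two regular expressions with matrix scores and a combined threshold is the natural next step.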
[Weight matrices for the two sites: one with positions 1 to 8 and one with positions 1 to 9, rows A, C, G, T]
[Scatter plot: NFAT term versus AP-1 term for NFAT/AP-1 composite elements (training set) and random sequences]
NFAT term = -log(1 - score_NFAT)
AP-1 term = -log(1 - score_AP-1)
[Equation: composite score combining the NFAT and AP-1 terms]
TTTGGCGCGAAA
Selection of motifs with high frequency in a window
[Figure: a motif (for example WSG) is counted within a window [ ] near the site. The frequency of the motif in the window is compared between promoters of cell-cycle genes and exon 2 sequences]

Motifs found in the local context of E2F sites in promoters of cell cycle-related genes

N   Motif (μ)  Window (w)  f_Y / f_N                  Utility  α_i
Positive characteristics
1   MGCG       [27,34]     0.0048 / 0.0041 = 1.179    0.80     -0.394
2   TTT        [39,41]     0.0112 / 0.0032 = 3.536    0.75      0.9618
3   CGSK       [17,38]     0.0851 / 0.0341 = 2.499    0.90      0.5353
4   HKCG       [13,16]     0.0675 / 0.0095 = 7.071    0.79      0.5904
5   VDWW       [17,46]     0.1233 / 0.0536 = 2.299    0.72      0.223
6   DWTT       [21,26]     0.0337 / 0.0000            0.80      0.5036
7   GSDM       [3,69]      0.0980 / 0.0559 = 1.754    0.82      0.595
Negative characteristics
8   VWS        [7,66]      0.1258 / 0.1932 = 0.651    0.91     -0.095
9   HSWY       [26,65]     0.0413 / 0.0813 = 0.508    0.79     -0.2297
10  VTV        [19,34]     0.0427 / 0.1354 = 0.315    0.71     -0.261
11  BAY        [7,65]      0.0274 / 0.0614 = 0.447    0.78     -0.566
Constant term: -5.6767

Score of context:

$$ d(X) = \sum_{i=0}^{k} \alpha_i\, f(\mu_i, w_i, X) $$
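A score of context built from motif frequencies in windows can be sketched as a weighted sum. Several things here are assumptions rather than the author's definitions: window bounds are taken as 1-based, inclusive start positions within the context sequence; the frequency is the fraction of those start positions at which the IUPAC motif matches; and the coefficients are the last-column values of rows 2 and 4 of the motif table. The function names are illustrative.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_frequency(seq, motif, window):
    """Fraction of start positions in the window where the motif matches."""
    first, last = window                      # 1-based, inclusive (assumed)
    pattern = re.compile("".join(IUPAC[c] for c in motif))
    starts = [p for p in range(first - 1, last) if p + len(motif) <= len(seq)]
    if not starts:
        return 0.0
    hits = sum(1 for p in starts if pattern.fullmatch(seq, p, p + len(motif)))
    return hits / len(starts)


def context_score(seq, characteristics, constant=0.0):
    """Weighted sum of motif frequencies plus a constant term."""
    seq = seq.upper()
    return constant + sum(coef * motif_frequency(seq, motif, window)
                          for motif, window, coef in characteristics)


# Two rows of the table, used for illustration: (motif, window, coefficient).
CHARACTERISTICS = [
    ("TTT", (39, 41), 0.9618),
    ("HKCG", (13, 16), 0.5904),
]

# Synthetic context: a run of T at positions 39-43, A everywhere else.
demo = "A" * 38 + "TTTTT" + "A" * 30
print(context_score(demo, CHARACTERISTICS))
```

Positive coefficients reward motifs that are enriched near true sites, and negative coefficients penalise motifs that are more frequent in the background set.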
+1 1000 3000 5000 7000 9000
+1 1000 3000 5000 7000 9000
-1000
-1000
Human uracil DNA-glycosylase (E2F sites)
+ score of context
ttTTTGCCGCGAAAag q=0.92 d=2.8 (known site)
SITEVIDEO system: building of E2F site recognition program (step 2)
SITEVIDEO system: building of E2F site recognition program (step 3)
Composite modules
[Figure: a window w upstream of the start of transcription contains the sites s_1^(k) ... s_{n_k}^(k) of the matrices k = 1 ... K; each matrix has its own weight and its own cut-off q_cut-off^(k)]
[Equations: the composite module score C is computed over the matrices k = 1 ... K from q_avr^(k)(w), which is taken over the sites s_i^(k) (i = 1 ... n_k) of matrix k that lie within the window w and whose score q exceeds q_cut-off^(k)]
Parameters of the model to be estimated: K (number of TF matrices), the weight and the cut-off q_cut-off^(k) of each matrix, and the window w.
Genetic Algorithms
Weight: TF matrix
1.000000 0.840072 V$E2F_19
0.954483 0.737637 V$TATA_01
0.888064 0.939687 V$CREB_01
0.816179 0.941583 V$SP1_Q6
0.039746 0.839702 V$TAL1BETAE47_01
Composite module in promoters of cell cycle-related genes
[Histogram: number of sequences versus composite module score, from -0.5 to 4.0, for exon-2 sequences and cell cycle-related promoters, with the cut-off q_cut-off marked]
[Equation: composite module score C summed over the five matrices k = 1 ... 5]
Mouse c-fos promoter
Cell cycle composite module
[Matrix search output for the sequence MMCFOS_1, shown in blocks ending at positions 420, 480, 540 and 580, with the predicted sites on both strands and their scores. The sites of the cell cycle composite module are highlighted, including E2F (0.80, 0.82), V$CREB_01 (0.96), V$TATA_01 (0.95) and V$SP1_Q6 (0.88). The transcription start is marked]
Computationally predicted E2F target genes confirmed by in vivo footprint

Columns: strand and sequence of the potential site; position relative to the start of transcription; score q; score of context d (where given). PCR primer positions are listed per gene.

c-fos, Hs (EMBL: HSFOS); PCR primers: -201 ->, +96 <-
  (-) gcCTTGGCGCGTGTcc   -165 .. -176     q = 0.92
  (-) ggGGTGGCGCGCGGgc   -92 .. -103      q = 0.84   d = 2.92
  (+) ccTCTGGCGCCACCgt   -90 .. -79       q = 0.88
  (-) acGGTGGCGCCAGAgg   -78 .. -89       q = 0.83

JunB, Hs (EMBL: HS207341); PCR primers: -27 ->, +313 <-
  (+) gcTATCGCGCCAGAga   79 .. 90         q = 0.89
  (-) tcTCTGGCGCGATAgc   91 .. 80         q = 0.91
  (-) ggGCTGGCGCGGGCgg   169 .. 158       q = 0.82   d = 3.17

tgf-β1, Hs (EMBL: HSTGFB1PR); PCR primers: -122 ->, +210 <-
  (+) ctGTTTGCGGGGCGga   -513 .. -502     q = 0.80   d = 2.03
  (+) ccCTTCGCGCCCTGgg   -298 .. -287     q = 0.91
  (+) ctCTTGGCGCGACGct   28 .. 39         q = 0.93
  (-) agCGTCGCGCCAAGag   40 .. 29         q = 0.83
  (+) ccTTTGCCGCCGGGga   85 .. 96         q = 0.85

p14ARF, Hs (EMBL: AF082338); PCR primers: -404 ->, -143 <-
  (-) ctCTCCGCGCGCGGga   -1384 .. -1395   q = 0.81   d = 4.11
  (-) gtCTTGGCGACCGTtg   -1009 .. -1020   q = 0.81
  (-) ggCCTGGCGCCGGAct   -739 .. -750     q = 0.81
  (+) tgATTGGCGGATAGag   -589 .. -578     q = 0.83
  (-) acTTTCCCGCCCTGtg   -265 .. -276     q = 0.86

Mcm4 (Cdc21), Hs (EMBL: HSU63630); PCR primers: -667 ->, -330 <-
  (-) gtTTTCGCGGGAAAac   -491 .. -502     q = 0.93   d = 3.53
  (-) ctTTCAGCGCCCGTgc   -409 .. -420     q = 0.82
  (+) gcAGTGGCGCCTCCcg   -377 .. -366     q = 0.80
  (+) ggCGTGGCGCGGAGcc   -175 .. -164     q = 0.83   d = 4.39
  (+) ctTGTCGCGCAGGTac   -93 .. -82       q = 0.86

mcm5 (P1-cdc46), Hs (EMBL: HS286B10); PCR primers: -211 ->, +88 <-
  (+) agTTTCGCGCCAAAtt   -187 .. -176     q = 0.99   d = 4.91
  (-) aaTTTGGCGCGAAAct   -175 .. -186     q = 1.00
  (+) ttTTTCCCGCGAAAct   8 .. 19          q = 0.89   d = 3.01
  (-) agTTTCGCGGGAAAaa   20 .. 9          q = 0.93   d = 4.21

Von Hippel-Lindau (VHL), Hs (EMBL: AF010238); PCR primers: -137 ->, +123 <-
  (+) aaGCTCGCGCCACTgc   -270 .. -259     q = 0.81
  (-) gcAGTGGCGCGAGCtt   -258 .. -269     q = 0.84
  (-) gtCTTCGCGCGCGCtc   -28 .. 39        q = 0.92   d = 2.22

B-myb, Hs (EMBL: HSBMYBDNA); PCR primers: -296 ->, +14 <-
  (-) gtCCTGGCGCGCGGgc   -72 .. -83       q = 0.83
  (+) cgCTTGGCGGGAGAta   -53 .. -42       q = 0.87   d = 1.18

nucleolin, Hs (EMBL: HSNUCLEO); PCR primers: -407 ->, -41 <-
  (-) ttTTTGGCGCCGGCtg   -297 .. -308     q = 0.97
  (-) ccGTGGGCGCGCGGgt   -256 .. -267     q = 0.81   d = 2.91

nucleolin, Cg (EMBL: CSNUCLEO); PCR primers: -538 ->, -198 <-
  (-) cgTTTGGCGCGGCTtg   -296 .. -307     q = 0.97   d = 6.67

nucleolin, Ms (EMBL: MMNUCLEO); PCR primers: -531 ->, -232 <-
  (-) agTTTGGCGCGGCTtg   -306 .. -317     q = 0.97   d = 1.76
Chromatin crosslinking
Immunoprecipitation
PCR
G1 G1/S S G2 G1 G1/S S G2
G1/S-cycle
G1/S-growth
a)
Relative importance   Cut-off value q_cut-off^(k)   Matrix AC   Matrix ID
 0.141420             0.923077                      M10009      V$E2F_19
 0.389941             0.947434                      M00175      V$AP4_Q5
 0.905325             0.838106                      M00088      V$IK3_01
-0.595259             0.856055                      M00098      V$PAX2_01
-0.982593             0.997639                      M00253      V$CAP_01
-0.814943             0.734697                      M00137      V$OCT1_03
b)
[Histogram of G1/S cycle vs. G1/S growth: number of observations versus site combination score, from -1.8 to 1.6]
Results of selecting a specific combination of sites that distinguishes G1/S cycle and G1/S growth promoters (microarray data).
E2F together with a set of additional factors can distinguish these two sets of promoters. AP-4 is a ubiquitous factor whose DNA-binding domain is similar in structure to those of E2F and Myc, the main cell cycle regulators. IK3 belongs to Ik-1...Ik-5, a family of zinc finger TFs that play a role in the development of lymphocytes. Pax-2 is known to be involved in regulating the cell cycle by inhibiting p53 transcription. Oct-1 is known to be differentially phosphorylated during the cell cycle and may have a role in the regulation of the G1/S growth promoters. As for the Cap site, it has already been speculated that the structure of the basal promoter may play an important role in differentiating gene expression during the cell cycle.
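[Figure: AP-1 (Jun/Fos) binds the consensus TGASTCA. In the human TNF promoter, region -107 to -74, overlapping sites for NFAT, NFAT/AP-1, NF-κB, C/EBP/AP-1 and VDR are used in different cell types: mast cells, T-cells, dendritic cells]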
Fuzzy puzzle hypothesis of the multipurpose structure of eukaryotic promoters: multiple regulatory messages are encoded in the same DNA sequence. A, B, C and D, E, F: two sets of TFs; 1, 2: two sites in DNA; BC: basal complex.
There's More Than One Way To Do It
(Convergent evolution)
RefSeq LocusLink symbol synonyms
NM_002421 4312 MMP1 CLG, CN2 matrix metalloproteinase 1 (interstitial collagenase)
NM_004530 4313 MMP2 CLG4, CLG4A matrix metalloproteinase 2 (gelatinase A, 72kD gelatinase, 72kD type IV collagenase)
NM_000611 966 CD59 MSK21, MIC11, MIN2, MIN1, MIN3 CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
NM_001972 1991 ELA2 elastase 2, neutrophil
NM_005317 3004 GZMM LMET1, MET1 granzyme M (lymphocyte met-ase 1)
NM_005532 3429 IFI27 P27 interferon, alpha-inducible protein 27
NM_001548 3434 IFIT1 GARG-16, IFNAI1, G10P1, IFI56 interferon-induced protein with tetratricopeptide repeats 1
NM_000565 3570 IL6R interleukin 6 receptor
NM_001565 3627 SCYB10 chemokine (C-X-C motif) ligand 10
NM_001572 3665 IRF7 IRF-7A interferon regulatory factor 7
NM_005564 3934 LCN2 NGAL lipocalin 2 (oncogene 24p3)
NM_005567 3959 LGALS3BP 90K, MAC-2-BP lectin, galactoside-binding, soluble, 3 binding protein
NM_002422 4314 MMP3 STMY, STMY1 matrix metalloproteinase 3 (stromelysin 1, progelatinase)
NM_002423 4316 MMP7 MPSL1, PUMP-1 matrix metalloproteinase 7 (matrilysin, uterine)
NM_004994 4318 MMP9 CLG4B matrix metalloproteinase 9 (gelatinase B, 92kD gelatinase, 92kD type IV collagenase)
NM_004995 4323 MMP14 MT1-MMP matrix metalloproteinase 14 (membrane-inserted)
NM_002428 4324 MMP15 MT2-MMP matrix metalloproteinase 15 (membrane-inserted)
NM_002534 4938 OAS1 IFI-4, OIASI, OIAS 2',5'-oligoadenylate synthetase 1 (40-46 kD)
NM_002787 5683 PSMA2 proteasome (prosome, macropain) subunit, alpha type, 2
NM_004586 6197 RPS6KA3 ribosomal protein S6 kinase, 90kD, polypeptide 3
NM_007315 6772 STAT1 STAT91 signal transducer and activator of transcription 1, 91kD
NM_003254 7076 TIMP1 CLGI, EPO, TIMP tissue inhibitor of metalloproteinase 1 (erythroid potentiating activity, collagenase inhibitor)
NM_003255 7077 TIMP2 tissue inhibitor of metalloproteinase 2
NM_000362 7078 TIMP3 SFD tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory)
NM_003684 8569 MKNK1 MNK1 MAP kinase-interacting serine/threonine kinase 1
NM_006417 10561 IFI44 p44, MTAP44 interferon-induced protein 44
AXX list of genes
>ELA2 elastase 2, neutrophil; chrom=19p13.3; LocusLink=1991; 15-AUG-2002;length=1200 ggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactga ggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccg tatcacagggccctgggtaaactgaggcaggtgacacagctgcatgtggccggtatcacggggccctggataaacagagg caggcgacacagctgcatgtggccggtatcacggggccctgggtaaactgaggcaggcgaggccacccccatcaagtccc tcaggtctaggtttggcaggtttggcaaaaacacagcaacgctcggttaaatctgaatttcgggtaagtatatcctgggc ctcatttggaagagacttagattaaaaaaaaaacgtcgagaccagcccggccaacacggtgaaaccccgtctctactaaa aatacaaaaaattagccaggcgcagtggctcacgcctgtgatcccagcactctgggaggctgaggcaggcggatcacccg aggtcagatgttcaagaccagcctggccgacagggcgaaacactgtctctactacaaatacaaaaattagccgggagtgg tggcaggtgcctgtaatctcagctattcaggaggctgaggcaggagaatcacttgaacctgggaggcggaggttgccgtg agccgggatcacgccaccgcactccagcctgggcgatagagcaagactctgtctccaaaaaaataaattaaaaaacccac attgattatctgacatttgaatgcgattgtgcatcctgaattttgtctggaggccccacccgagccaatccagcgtcttg tcccccttctcccccttttcatcaacgccctgtgccaggggagaggaagtggagggcgctggccggccgtggggcaatgc aacggcctcccagcacagggctataagaggagccgggcgggcacggaggggcagagaccccggagccccagccccaccat gaccctcggccgccgactcgcgtgtcttttcctcgcctgtgtcctgccggccttgctgctggggggtgagtttttgagtc caacctcccgctgctccctctgtcccgggttctgttcccacctctccatagagggccccaccagtgtgggtccctcatcc >MMP3 matrix metalloproteinase 3 (stromelysin 1, progelatinase); chrom=11q22.3; LocusLink=4314; 15-AUG-2002;length=1200 aaagttttacaaaatgtcttcctctgaatatgtttagagtcttgcattcaagcatttattatacaccaataatgtgagca acactttacttgacaaagaaacagaaaagaaaggaaaggaagaaaacagaagagcatgaagagaaaatttaggatggatt ctgttcttcaacttcaaagcatctgctaatttgaatttagggaggaggggaaaaggttgaaagagaataagacatgtgta gaagacaaggacagagagaatttcagtccggtaagcaatgtaattcatttcagttctacaactatttatggagcagctac gtgggcccatcacccattaataaattggttacagaattaaaaccaacccaaagggaatatacttccttctttttcacaga ccctctttgttctattctgcccatgaggttttcctcctcaagaaccagcaaatccaacgacagtcaatagcaggcattac aaatcagattcagaaaaataaatcaccccttctaaatttcttctagatattatcttttatgttttgagtataattgtata 
tagtatagactatagctatgtatgtacactttccacttacatcttttatttgcttttataatgtctttcttaaaataaaa ctgcttttagaagttctgcacaattctgatttttaccaagtcaacctacttcttctctcaaaaggacaaacataaattgt ctagtgaattccagtcaatttttccagaagaaaaaaaatgctccagttttctcctctaccaagacaggaagcacttcctg gagattaatcactgtgttgccttgcaaaattgggaaggttgagagaaattagtaaagtaggttgtatcatcctactttga atttggaatgtttggaaatggtcctgctgccatttggatgaaagcaaggatgagtcaagctgcgggtgatccaaacaaac actgtcactctttaaaagctgcgctcccgaggttggacctacaaggaggcaggcaagacagcaaggcatagagacaacat agagctaagtaaagccagtggaaatgaagagtcttccaatcctactgttgctgtgcgtggcagtttgctcagcctatcca ttggatggagctgcaaggggtgaggacaccagcatgaaccttgttcaggtaattaacactaactgacctggccaggtggg >IL6R interleukin 6 receptor; chrom=1; LocusLink=3570; 15-AUG-2002;length=1200 ttctctccttcctttccttccttcccctctatccctccttccctccctccctccctcctcccttccttttctttctttct tttctttttttttttttctttccagacagggtctcactgtcatccaggctggagtagcagcccccaatcacggctcactg taccctggatctcccggactcaagcaattttcccacctcagcttccctagtagctgggactataggtgtgtaccaccaca cccagctaatttttaaatttttttatagaaatgggggtctcactttgttacacaggctggtctagaattcctggactgaa gcaatccacccacccggctctcccaaagtgttggggttacaggcgtgagccactgcccctggtgttagtgtctgtctgtc aagtcaggagggcagccatgaacgttctgatgtctactgagcacgtgtggcccagaccgtgtgtcaggtgtttaggtgcc atccacagaaccttcctaataaccctgggcagcataggctttcttatctctgacagatgaggaaatggagactcagattc tgaaccgaagtcacagacacagtagatggtaggtctaaatggggacccaggtctatctgactgcaaagtccaaaccgttt ccttgcctctgctgcagcctgcgaggagcagctgggcagaaagactgtgcctttacggtggtgagtcttccgatgcccaa gcctcaccccagaccgatgaaatcagaatctctggagacccgacccagacattggtgggttttagggctcctggctgatt
Extract promoters using TRANSGenome
AXX promoter set
[Figure: histogram of VAR1 (tt1.STA 2v*188c); y-axis: percent of obs (0% to 100%); bins: <= 0.423, (0.423; 0.847], (0.847; 1.27], (1.27; 1.694], (1.694; 2.117], > 2.117; fitted curve: y = 13 * 0.42348 * normal(x; 1.503956; 0.895746)]
Importance  Core cut-off  Matr. cut-off  AC      Matrix       Factor
----------  ------------  -------------  ------  -----------  ---------------------------------------------
0.917751    0.877000      0.930000       M00062  V$IRF1_01    Interferon regulatory factor 1
0.323077    1.000000      0.948000       M00339  V$ETS1_B     Ets factors
0.640828    0.989000      0.982000       M00199  V$AP1_C      AP-1
0.276923    0.840000      0.853000       M00037  V$NFE2_01    NF-E2, an erythroid-specific factor
1.000000    0.756000      0.760000       M00481  V$AR_01      Androgen receptor
0.159172    0.869000      0.866000       M00699  V$ICSBP_Q6   Interferon Consensus Sequence binding protein
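The matrix cut-offs in the table above are thresholds on a site score between 0 and 1. As a rough illustration of how such a threshold is applied, here is a minimal sketch that scans a sequence with a count matrix and keeps windows whose normalised score reaches the cut-off. The count matrix, the test sequence and the simple (score - min) / (max - min) normalisation are assumptions made for this example; they are not a TRANSFAC matrix and not necessarily the scoring used by Match.

```python
# Hypothetical 4-position count matrix (one dict per position).
# The counts are made up for illustration only.
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 6, "G": 2, "T": 1},
]


def similarity(window, counts):
    """Normalised score in [0, 1]: (score - min) / (max - min)."""
    score = sum(pos[base] for pos, base in zip(counts, window))
    lo = sum(min(pos.values()) for pos in counts)
    hi = sum(max(pos.values()) for pos in counts)
    return (score - lo) / (hi - lo)


def scan(sequence, counts, cutoff):
    """Return (offset, window, score) for every window scoring >= cutoff."""
    width = len(counts)
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if set(window) <= set("ACGT"):  # skip windows with ambiguous symbols
            s = similarity(window, counts)
            if s >= cutoff:
                hits.append((i, window, round(s, 3)))
    return hits


# Made-up test sequence; only windows at or above the cut-off are reported.
hits = scan("ggTGACtcaTGAG", COUNTS, cutoff=0.85)
for offset, window, score in hits:
    print(offset, window, score)
```

Raising the cut-off trades sensitivity for specificity: fewer windows pass, but those that do are closer to the matrix consensus.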
Composite module found in the AXX promoters
Sites in the AXX promoter set (Yes)
Matrices: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6
No.  Site scores                          Char            Gene
0    0.951000 1.742000                    Char = 0.78964  ELA2 elastase 2, neutrophil
1    1.941000 0.984000 0.876000           Char = 1.50025  MMP3 matrix metalloproteinase 3
2    0.772000                             Char = 0.77200  IL6R interleukin 6 receptor
3    1.681000                             Char = 1.68100  MMP2 matrix metalloproteinase 2
4    0.964000 0.856000 0.764000 1.764000  Char = 1.59327  OAS1 2',5'-oligoadenylate synthetase 1
5    1.000000 0.880000 1.644000           Char = 2.52852  MMP1 matrix metalloproteinase 1
6    0.984000                             Char = 0.63057  TIMP1 tissue inhibitor of metalloproteinase 1
7    1.860000 0.939000                    Char = 1.85648  STAT1 signal transducer and activator of transc
8    1.987000 1.850000 0.812000           Char = 2.59763  MMP9 matrix metalloproteinase 9
9    0.868000 1.548000                    Char = 1.78836  MMP15 matrix metalloproteinase 15
10   0.985000 0.862000 1.575000           Char = 2.44492  MMP7 matrix metalloproteinase 7
11   0.780000                             Char = 0.78000  MMP14 matrix metalloproteinase 14
12   1.966000 0.853000                    Char = 1.49608  CD59 CD59 antigen p18-20
13                                        Char = 0.00000  LCN2 lipocalin 2 (oncogene 24p3)
14   1.921000 1.715000                    Char = 2.33563  GZMM granzyme M (lymphocyte met-ase 1)
15   0.802000                             Char = 0.80200  IFI27 interferon, alpha-inducible protein 27
16   0.975000 1.766000                    Char = 2.08100  TIMP3 tissue inhibitor of metalloproteinase 3
17   1.866000 1.852000                    Char = 2.00731  IFIT1 interferon-induced protein with tetratr
18   1.569000 1.892000                    Char = 1.87015  IFI44 interferon-induced protein 44
19   0.760000                             Char = 0.76000  MKNK1 MAP kinase-interacting serine/threonine
20   1.886000 0.810000                    Char = 2.54087  IRF7 interferon regulatory factor 7
21   0.765000                             Char = 0.76500  TIMP2 tissue inhibitor of metalloproteinase 2
22   0.948000 0.873000                    Char = 0.54803  LGALS3BP lectin, galactoside-binding, soluble
23   1.892000 0.885000                    Char = 1.87725  SCYB10
24                                        Char = 0.00000  PSMA2
Sites in the other human promoters (Not)
Matrices: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6
No.  Char
0    Char = 0.00000
1    Char = 0.00000
2    Char = 0.00000
3    Char = 0.00000
4    Char = 0.00000
5    Char = 0.00000
6    Char = 0.00000
7    Char = 0.00000
8    Char = 0.00000
9    Char = 0.00000
10   Char = 0.00000
11   Char = 0.00000
12   Char = 0.00000
13   Char = 0.00000
14   Char = 0.00000
15   Char = 0.00000
Insulin pathway
Part of the insulin signaling network in TRANSPATH
[Figure: network diagram; labeled nodes: Insulin, InsR, Ras, STAT1; one link marked "?"]
Signaling network analysis
AhR targets
[Figure: gene expression, log(Experiment/Control), scale from -4 to 10; scatter plot of predicted expression versus real expression, both axes from -4 to 10]
S41 distance = 0.417599 D2:0.658627 SIG:0.000000 MIN_LENGTH 3000.000000
 3.581248  1.000000  0.933000  M00026  V$AHR_Q5
 2.942371  1.000000  0.917000  M00639  V$HNF6_Q6
 0.798865  0.844000  0.900000  M00220  V$SREBP1_01
 0.409376  0.962000  0.926000  M00173  V$AP1_Q2
 0.055716  0.959000  0.989000  M00726  V$USF2_Q6
-1.329975  1.000000  0.959000  M00235  V$AHRARNT_01
-0.713625  1.000000  0.918000  M00156  V$RORA1_01
-0.668375  0.903000  0.854000  M00201  V$CEBP_C
Composite model correlates with the expression level
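One simple way to read a composite model with positive and negative terms is as a weighted sum of site scores per promoter, which is then compared with the measured log ratios. The sketch below follows that reading under stated assumptions: the weights, promoter names, site scores and expression values are all made up for illustration, and the linear form itself is an assumption rather than the documented model.

```python
from math import sqrt

# Hypothetical weights per matrix: positive terms raise the predicted
# expression, negative terms lower it. Values are illustrative only.
WEIGHTS = {"V$AHR_Q5": 3.5, "V$HNF6_Q6": 2.9, "V$AHRARNT_01": -1.3}


def predict(site_scores, weights):
    """Weighted sum of the best site score per matrix in one promoter."""
    return sum(weights[m] * s for m, s in site_scores.items() if m in weights)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Made-up promoters (best site score per matrix) and made-up log ratios.
promoters = {
    "geneA": {"V$AHR_Q5": 0.95, "V$HNF6_Q6": 0.92},
    "geneB": {"V$AHR_Q5": 0.93},
    "geneC": {"V$AHRARNT_01": 0.96},
    "geneD": {},
}
measured = {"geneA": 6.1, "geneB": 3.0, "geneC": -1.5, "geneD": 0.2}

names = sorted(promoters)
predicted = [predict(promoters[g], WEIGHTS) for g in names]
real = [measured[g] for g in names]
r = pearson(real, predicted)
print("correlation between real and predicted expression:", round(r, 3))
```

A promoter without any matching site gets a predicted value of zero, which corresponds to no change relative to the control.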
[Figure: promoter map from -1000 to +1000 around the TSS showing V$AHR_Q5 and V$AHRARNT_01 sites]
0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)
0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)
Composite module found in promoters of differentially expressed genes in liver of growth hormone-deficient mice (Sma1).
[Figure: histograms of Sma1Norm (x-axis from -0.1 to 0.5; y-axes: No of obs) for differentially expressed genes and non-changed genes]
Results of the ArrayAnalyzer™ search upstream of the TFs: growth hormone (GH) and receptor tyrosine kinases (RTK) are identified as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).
TRANSPATH and tools: ArrayAnalyzer and PathwayBuilder
In the next step, one can map the transcription factors found in the previous step onto the TRANSPATH signaling network. If the factors found are parts of the same cascades that were suggested in step 1, the probability increases that those factors are responsible for the coordinated gene regulation.
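The idea of searching upstream from several transcription factors for a shared regulator can be sketched as a breadth-first search on a directed graph. The network below is a toy with abstract, hypothetical node names; it is not TRANSPATH content, and the function names are invented for this example.

```python
from collections import deque

# Hypothetical toy network: each edge points from an upstream molecule
# to the molecule it acts on.
EDGES = [
    ("R", "K1"), ("K1", "K2"), ("K2", "TF_A"),
    ("R", "P1"), ("P1", "TF_B"),
    ("K3", "TF_C"),
]


def upstream(node, edges, max_steps):
    """Map every molecule upstream of `node` to its distance in steps."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_steps:
            continue  # do not expand beyond the allowed search depth
        for p in parents.get(cur, []):
            if p not in dist:
                dist[p] = dist[cur] + 1
                queue.append(p)
    del dist[node]
    return dist


def common_upstream(tfs, edges, max_steps):
    """Molecules that lie upstream of every factor, closest ones first."""
    reach = [upstream(tf, edges, max_steps) for tf in tfs]
    shared = set(reach[0]).intersection(*[set(r) for r in reach[1:]])
    return sorted(shared, key=lambda m: (sum(r[m] for r in reach), m))


print(common_upstream(["TF_A", "TF_B"], EDGES, max_steps=4))
```

Limiting the search depth keeps the candidate list short: the further upstream the search goes, the more molecules are reachable from any factor and the less informative a shared node becomes.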
cytokines, chemokines
membrane receptors
adaptor proteins
PI3K
Calcineurin, Ca2+ binding proteins
NF-ATs
Ras, Raf
ERK, JNK, MAPK
Jun, Fos
NF-AT/Jun:Fos
Groups that are statistically enriched by potential target genes for Jun:Fos and NFATs (as shown in the table above).
Other groups that contain potential target genes for Jun:Fos and NFATs.
Feedback loops in activating immune cells through NF-AT/AP-1
Network controlling S phase entry in response to a proliferative signal
[Figure: regulatory network with links marked "+" and "_", and phosphorylation marks "p" on pRB.
Signaling components: Erk-1, JNK, erk-1, c-ras, htf9a, MEK, Raf, Ras, Ran, RanBP1.
Genes: c-myc, cdc2, cycE, cycD1, cdk4, cycD3, e2f-1, rb1, B-myb, c-fos, c-ets, c-jun.
Proteins: c-Myc, B-Myb, c-Fos, c-Ets, c-Jun, pRB, E2F-1, DP-1; cyclin and kinase pairs as labeled: cycE cdc2, cycE cdk2, cycD1 cdk4, cycD3 cdk4.
Target groups: ada, odc, ts; nucleolins; cdc21, cdc46, p1 co-factor; histones: H1, H2B-143, H3-143; enzymes of nucleotide metabolism: dhfr, tk, cad; factors and enzymes of replication: DNA pol, cdc6, ori1.
Outcome: S-phase entry.]
1 <===========V$CREB_02(0.85) 2 <=======V$CREB_01(0.82) MMNUCLEO TCTCCCCAC-CACACCAGGAAGTCACCTCTCTCA----------ACCTG---GAGTTATA 225 1 <===========V$CREB_02(0.85) 2 <=======V$CREB_01(0.82) RNNUCIA1 TCTCCCACCACACACCAGGAAGTCACCTCTCTGA----------ACCTG---GAGTTATA 221 1 <===========V$CREB_02(0.85) 2 <=======V$CREB_01(0.82) CSNUCLEO CCTCC-AGCACACACCAGGAAGTCACCTCTCCGAGACCGTCCCCATCAG---GAGTTAAA 229 1 <===============V$TH1E47_01(0.85) HSNUCLEO TGGCCCTGT-GAGGCCAGAAAGTTACTTCTCCGAGGCCAGTTCCCCATGTCTGAGAAATA 229 ** * **** **** ** **** * * *** * * ============================================================================= 1 <==========V$DELTAEF1_01(0.82) MMNUCLEO CCTACCG-CGAGAGGTCACCGACATTACATGGATCGCTTGTGCACTGCTCGTA--CACAC 282 1 <======== ==V$DELTAEF1_01(0.87) RNNUCIA1 CCTACCG-CGTGAGGTCA--GAGATTAAATGGACTGTTTGTGCACTGCTCACA--CACAC 276 1 <======== ==V$DELTAEF1_01(0.84) CSNUCLEO TCTACCG-CGCGAGGTTG--GACATTAAGCGAGCTGTTTGAGCACTGCACACAGGCGCGC 286 1 <========= =V$DELTAEF1_01(0.84) HSNUCLEO TCTCCCAACTTGAGGTTCT-GTGGGGTAGGGGAGGGTTCGTGACTTTCTCACAGAAAACC 288 ** ** * ***** * * * * * * * * * * * ============================================================================= 1 <=======V$NKX25_02(0.84) 2 =========>V$CETS1P54_01(0.87) MMNUCLEO ACACACGCAC------------AACTGCTTTTATTAGGAGCT----CTCAGGAAAGCGGG 326 1 <=======V$NKX25_02(0.84) 2 =========>V$CETS1P54_01(0.87) RNNUCIA1 ACACACGCGCGCGCGCGCGCGAAATTGCTTTTATTAGGAGCT----CTCAGGAAAGTGGT 332 1 =======>V$NKX25_02(0.82) 2 <==========V$DELTAEF1_01(0.81) 3 =========>V$CETS1P54_01(0.84) CSNUCLEO ACACACGCACGC----------AACTGCCTTTATTGGGAGCTGTCTCTCAGGAGAACAGC 336 1 <=======V$NKX25_02(0.83) 2 <==========V$DELTAEF1_01(0.81) 3 =========>V$CETS1P54_01(0.86) HSNUCLEO TCGTACAGACCC-------CGCCACTGCCTTTATTAACAGCT----CTCAGGAGACTGCC 337 * ** * * *** ****** **** ******* * ============================================================================= MMNUCLEO GACTCGCATCA---TAGCCAAG----AAGCCGTTCGCGAC-TCCGCGGAGAACAGGCCGA 378 RNNUCIA1 
GGCTCGCATCAGGCTACCACAGCC--AAGAGGACCGCCACCTCTACCGAGGGCAGGCCAA 390 CSNUCLEO GGCCCGCGGCGCAACACTAGAGCCCCGGGATGTTCTCGGC-TCTGCCGAGGGCAG-CCGA 394 HSNUCLEO TGCAGGAGGGGGGTCGCTCCGGCC---CCATGCTCGCGGG-CAAGCAGGGATAAG--CTG 391 * * * * * * * * * ** *
============================================================================= MMNUCLEO GGCCCGCTCATCAGCCCGAGGGAACCCTAGG--CC------TTCCGGCGTTCT------- 423 RNNUCIA1 GGCCCACTAAACGGCCCGAATGAACTCTAGG--CC------TTCCGGCGCTCT------- 435 CSNUCLEO GGCC-GCGAGCTGGCCCCAGTGG-CTCTAGG--CCCTCAACTTCCGGCGCTCTCCGGCTC 450 HSNUCLEO TGCCTCCAAAAGGGCCAACGGGAACTCCGCGGTCCCTGAACTTCCGGTGCTGGAGG---A 448 *** * *** * * * * ** ****** * * ============================================================================= MMNUCLEO -TCAGCAGGACCACGCGGCG---------------------------------------- 442 RNNUCIA1 -CCAGCTCTTCAGCGCGGCGAACGTTCTAGGCCCCTGAGAAGTCCACCGGGAGGCGCAGG 494 CSNUCLEO CTCAGCGGGAACGCGCGGCGAGCAGTTGAGGCCGCCGCGGATTCCAACGGGTTGGGGACG 510 HSNUCLEO CTCCTCGCTCCAGGGCCACCAGGAGCCGCGGC---------------------GTGAGTG 487 * * ** * ============================================================================= MMNUCLEO --------------GGGGGAAA-----GCACCGAGAAACGCCCAGACCACCTGAGCATCG 483 RNNUCIA1 TTTCCGCTACGCGAGGGGGAAA-----TCCCCGAGAAATGCCCAGACCACCTAAGCACAG 549 CSNUCLEO TTCGC----AGCGCGGGGGATGCTCGGGCCACCCACCACCCCCCCACCCCCCCGGCCACG 566 HSNUCLEO CGTGCCGGAACCGAGGGCGGGG-----TCTCTGAGGAACTCCAAGGCTGCCCAAGCCTAC 542 *** * * * ** * ** ** ============================================================================= MMNUCLEO CCGCCC--------ATGCTGCCTCGGAACACCTGAGGGAATCCGGGCCACGCCGCCACCT 535 RNNUCIA1 ACGTCC--------ATGCGGCGTACGGATACCTGAGGGAATCCGGGCCATACCGCCACCT 601 CSNUCLEO AGGCCCGGAGCTCCAGGTAGCAGTGCAGCACTAGGCGGCGTCCGGGCCACGCCGCCCAAT 626 HSNUCLEO GGACCC---------AGCCACATTGGCGAACC----GGAGACCGCCCGATTCCACCACC- 588 ** * * ** ** *** * * ** ** ============================================================================= 1 <=======V$E2F_02(1.00) MMNUCLEO ACCCGCG--CCTCACACACAAGCCGCGCCAAACTCGCCCGTCCCACTGCGCAGGCGTGGG 593 1 <=======V$E2F_02(1.00) RNNUCIA1 ACTCGCG--CCTCACTC--AAGCCGCGCCAAACTCGCGCGTTTCACTGCGCAGGCGTGTA 657 1 <=======V$E2F_02(1.00) CSNUCLEO TCCCCCGAGCCCCTTCCACAAGCCGCGCCAAACGGGTCTG---CACCGCGCAGGCG--GC 681 1 <=======V$E2F_02(1.00) HSNUCLEO 
-CCCGCGCTCCCCTCAC--AGCCGGCGCCAAAAACGCCAGTCCCACGACGCAGGC----- 640 * * ** ** * * * * ******** * * *** *******
Phylogenetic footprint of promoter regions of nucleolin genes
HSNUCLEO - Homo sapiens; CSNUCLEO - Cricetulus griseus; MMNUCLEO - Mus musculus; RNNUCIA1 - Rattus norvegicus
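The asterisk rows in the alignment mark columns where all four species carry the same nucleotide; runs of such columns are the candidate footprints. A minimal sketch of that bookkeeping follows. The aligned sequences are short made-up strings, not fragments of the nucleolin promoters, and the minimum block length is an arbitrary choice for the example.

```python
def conservation_line(aligned):
    """Return a string with '*' under columns where all sequences agree."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        # A column counts as conserved only if it holds one base and no gap.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


def conserved_blocks(line, min_len):
    """Return half-open (start, end) ranges of '*' runs of at least min_len."""
    blocks, start = [], None
    for i, ch in enumerate(line + " "):  # sentinel closes a trailing run
        if ch == "*" and start is None:
            start = i
        elif ch != "*" and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


# Made-up toy alignment with one gap column and two mismatching columns.
aligned = [
    "ACGT-GCGCCAAAC",
    "ACGTAGCGCCAAAC",
    "TCGT-GCGCCAAAT",
]
line = conservation_line(aligned)
blocks = conserved_blocks(line, min_len=5)
for seq in aligned:
    print(seq)
print(line)
print(blocks)
```

Predicted binding sites that fall inside such blocks in every species are better supported than sites found in one sequence only.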
TFBS identification via pattern search
[Figure: panels 1), 2), 3) with rows labeled A, T, G, C; comparison plot for Kernel, MEME, CONSENSUS and GIBBS, x-axis from 0.65 to 0.95, y-axis from 0.000 to 1.000]
Table 1. Comparison of the 3 programs that perform best at low values.
Value  Kernel  MULTIPROFILER  PROJECTION
0.65   0.205   0.208          0.260
0.7    0.165   0.255          0.304
Result of a comparison of four different pattern discovery programs on sets of simulated sequences with implanted TF binding sites for one matrix. y-axis: the averaged sum of squared differences between the revealed matrix and the original one; x-axis: values that are the probabilities of the "consensus nucleotide" at each position of the matrix.
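The error measure on the y-axis can be illustrated with a small calculation. In the sketch below, the original matrix gives the consensus nucleotide a fixed probability at every position and splits the remainder equally among the other three bases; the revealed matrix is built from a handful of made-up sites. Averaging the summed squared differences over positions is an assumption of this example, since the caption does not say what the average is taken over.

```python
BASES = "ACGT"


def consensus_matrix(consensus, p):
    """Frequency matrix where the consensus base has probability p at each
    position and the other three bases share the remainder equally."""
    rest = (1.0 - p) / 3.0
    return [{b: (p if b == c else rest) for b in BASES} for c in consensus]


def frequency_matrix(sites):
    """Column-wise base frequencies of equally long sites."""
    n = len(sites)
    return [{b: col.count(b) / n for b in BASES} for col in zip(*sites)]


def matrix_error(found, original):
    """Sum of squared frequency differences, averaged over positions."""
    total = sum(
        (f[b] - o[b]) ** 2 for f, o in zip(found, original) for b in BASES
    )
    return total / len(original)


# Made-up example: a 4-position consensus with probability 0.7, and four
# hypothetical sites standing in for the output of a discovery program.
original = consensus_matrix("TGAC", 0.7)
found = frequency_matrix(["TGAC", "TGAC", "TGAT", "AGAC"])
error = matrix_error(found, original)
print("error:", error)
```

A lower value means the program recovered a matrix closer to the one used to implant the sites; identical matrices give zero.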
Gradual evolution by fixation of multiple substitutions (Protein functional centres)
Edited biopolymer by fixation of a small number of substitutions (Protein folding)
Evolution at once by fixation of single substitutions (Regulatory regions of eukaryotic genes)
Three mechanisms of biopolymer evolution
Thank you !
www.biobase.de