From patterns to pathways. Causal analysis of gene expression data Alexander Kel BIOBASE GmbH Halchtersche Strasse 33 D-38304 Wolfenbuettel Germany [email protected]

From patterns to pathways. Causal analysis of gene expression data

Alexander Kel

BIOBASE GmbH

Halchtersche Strasse 33, D-38304 Wolfenbuettel

Germany

[email protected] www.biobase.de




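[Figure: overview of the databases (TRANSFAC, TRANSCompel, TRANSPATH, PathoDB, S/MARt DB, Cytomer, TRANSGenome) and the tools built on them (Match, Patch, Catch, CMFinder, Pathway builder, Array analyser, TRANSPLORER); links between them are marked as mechanistic or semantic.]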

The TRANSFAC® System comprises 7 databases:

TRANSFAC® Professional Suite

TRANSFAC® Professional: transcription factor database
TRANSCompel® Professional: composite elements database
PathoDB® Professional: pathologically altered transcription factors
TRANSPRO™ Professional: collection of human promoter sequences
S/MARt DB™ Professional: scaffold or matrix attached regions database
Cytomer®: ontology of cells, structures, organs
TRANSPATH® Professional: signal transduction pathways

TRANSFAC® Professional

Transcription factor database

…cis

trans

Human genes: sequences and positions of AP-1 binding sites

Gene                          Position
glutathione P-transferase     enhancer at -2500
hemoglobin, epsilon           -80 bp
Akt-2                         -100 bp
IFN-                          -89 bp
Apo AII                       -792 bp
Melanotransferrin             -2013 bp
Collagenase                   -72 bp
proto-oncogene c-myc          -335 bp
porphobilinogen deaminase     -162 bp
GM-CSF                        enhancer at -3500

Site sequences, in extraction order (the pairing with the genes above was lost): TGACTTT, TGACATC, TGTCACC, TGACTCA, TGAGTCA, TGAGTCA, TGATTTA, TGACTCA, TGACTCA

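[Figure: GM-CSF gene, Homo sapiens. A T-cell specific inducible enhancer at -3500 bp carries several AP-1/NFAT composite elements (CE). The promoter upstream of the start of transcription (ST, +1) carries a TATTT box, AP-1/NFAT composite elements, NF-κB p50/p65 and NF-κB c-Rel/p65 sites, HMG Y(I), a CD28 response element and CBF sites; marked positions: -114, -88, -54.]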

Structure of regulatory regions of eukaryotic genes

[Figure: basal transcription complex (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, RNA pol II) and a histone acetylase, contacted by transcription factors TF1, TF2 and TF3 bound to sites S1, S2 and S3.]

Protein-DNA and protein-protein interactions in gene transcriptional regulation.

Transcription factors

Sequence-specific DNA binding: TF1, TF2, TF3, TF4
Non-DNA binding: adapter, co-activator, HAT, interacting factor

[Figure: the factors are arranged in three layers (Layer I, II, III) above the DNA; the gene consists of a regulatory region and a coding region; output: expression.]

[Figure: relational scheme. Table names shown: SITE, FACTOR, GENE, SYNONYMS, FEATURES, CLASS, SPECIES, MATRIX, SEQUENCE, METHOD, CELL, Q, FUNCTIONAL ELEMENT.]

TRANSFAC: relational scheme

Manual annotation of the databases: input client

TRANSFAC: GENE table

TRANSFAC: SITE table

Structure of transcription factors

USF-1, dimer

DNA binding domain

Activation domain

oligomerization domain

Ligand-binding domain

Protein-protein interaction domain

Structure of transcription factors

TRANSFAC: FACTOR table, protein sequence

TRANSFAC: FACTOR table, protein domains

TRANSFAC: FACTOR table, structural and functional features

TRANSFAC: FACTOR table, links to other databases

TRANSFAC: classification of transcription factors

TRANSFAC: CLASS table

TRANSFAC 8.1 (2004-03-31): number of factor entries for different species

[Bar chart. Categories: human, mouse, rat, other vertebrates, fruit fly, plants, fungi, other. Axis ticks: 0 to 1400.]

[Histogram. Axis ticks: 0 to 800.]

TRANSFAC 8.1 (2004-03-31): distribution of experimentally known TFBS in 5' regions of genes.

TRANSFAC: FACTOR table, protein-DNA and protein-protein interactions

TRANSFAC: MATRIX table

TRANSCompel® Professional

Composite elements database

Mouse interleukin-2 gene promoter, COMPEL:C00050 (NF-ATp and AP-1), positions -96 to -79 relative to ST

tgccacacaggtagactcttTTGAAAATAtgTGTAATAtgtaaaa catcgtgaca cccccatatt…

AP-1 consensus: TGAGTCA

[Figure: factors F1 and F2. F1 alone: low level of transcription. F2 alone: low level of transcription. F1 and F2 together: synergistic activation of transcription.]

Composite elements

Minimal functional units where both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene expression and provide cross-coupling of different signal transduction pathways.

N   Gene                               Positions of the CE
1.  IgH, Mus musculus
2.  IL-2, Homo sapiens                 -283 .. -268
3.  IL-2, Homo sapiens                 -167 .. -142
4.  IL-2, Mus musculus                 -167 .. -142
5.  IgH, Homo sapiens
6.  Serum amyloid A1, Rattus norv.     -117 .. -73
7.  IRF-1, Mus musculus                -123 .. -113 and -49 .. -40

Factor pairs shown in the schemes, in extraction order (the pairing with the genes above was lost): AP-1/Ets, AP-1/NFAT, AP-1/NF-κB, Ets/CBF, AP-1/Oct-2, NF-κB/C/EBP, NF-κB/STAT-1

Combinatorial regulation by the composite elements

Ternary complex NFATp - AP1 - DNA

Description of an evidence (experiment, cell type, two individual interactions)

flat files

Link to the TRANSFAC GENE table

Link to EMBL

Link to the TRANSFAC FACTOR table

[Figure: signalling from a membrane receptor to the IL-2 gene. Cytoplasm: Src (SH2, SH3 domains), adaptors, Ras (GDP/GTP), PLC, PI3-K, PKB/Akt, IP3, Ca2+ and a Ca2+-dependent channel, calcineurin, ERK, JNK, p38 MAPK. Nucleus: NFATp, c-Fos, c-Jun and ATF-2, regulated by phosphorylation (P), bind the composite element of the IL-2 gene.]

Cross-coupling of signal transduction pathways

[Table: number of composite elements for each pair of factor classes (F1, F2). Classes: tissue-specific, inducible, cell-cycle dependent, developmental stage-dependent, ubiquitous constitutive. Counts in extraction order: 32; 44, 119; 1, 2; 3; 39, 60, 2, 12; 2. The assignment of counts to cells was lost.]

Inducible/inducible

19 CEs ETS / AP-1, providing cross-coupling of Ras/Raf- and PKC-dependent signalling pathways;

15 CEs NFATp / AP-1, providing cross-coupling of Ca2+- and PKC-dependent signalling pathways;

14 CEs NF-κB / C/EBP: NF-κB is inducible by IL-1 and TNF-α; C/EBP is inducible by IL-6.

Inducible/constitutive

9 CEs ETS / Sp1: ETS factors are inducible through the Ras/Raf-dependent signalling pathway;

5 CEs Smad / TEF3: Smads are inducible by TGF-β signalling.

Inducible/tissue-restricted

CEs Pit-1 / AP-1: Pit-1 is a pituitary-restricted transcription factor, whereas AP-1 and Ets are ubiquitous inducible factors.

Mechanisms of functioning of synergistic composite elements

1) Cooperative binding to DNA and ternary complex formation
2) A new protein surface for DNA recognition could be formed
3) Simultaneous interaction of activation domains with the components of the basal complex
4) Forming a new protein surface for interaction with the basal complex
5) Relief of autoinhibition as a result of protein-protein interactions
6) DNA bending by one of the transcription factors
7) DNA wrapping around a nucleosome allows transcription factors to interact
8) Recruitment of a HAT complex by one of the transcription factors

Mechanisms of functioning of antagonistic composite elements

1) Mutually exclusive binding of factor F1 (activator) and F2 (repressor)
2) Binding of F2 (repressor) results in conformational changes of F1 (activator)

[Figures: schematic drawings of factors F1 and F2 on sites S1 and S2, with HAT and HDAC complexes.]

TRANSPATH® Professional

Database on signal transduction pathways

TRANSPATH: map of IFN pathway

TRANSPATH®

TRANSFAC®

Extracellular ligand

Membrane receptor

Adaptor

Second messenger

Kinase(s)

Transcription factor

Target gene

TRANSPATH: molecules

TRANSPATH: molecule hierarchy

family: IL-1/Toll receptor family
family: TLRs
ortholog: TLR4, TLR5
basic: TLR4(h), TLR4(m), TLR5(h)
isoform: TLR4a(h), TLR4b(m)
modified form: TLR4(h)p
complexes: TLR4(h):MyD88(h)

TRANSPATH: reactions

• Binding
• Phosphorylation
• Dephosphorylation
• Degradation
• Acetylation
• Dissociation
• Transregulation
• Expression
• Activation
• ...

The elementary reaction step

[Figure: educt A is converted into product B by reaction R; the enzyme C acts on R.]

Reaction R, catalyzed by catalyst C, converts substance A into substance B.
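The elementary reaction step described here (educts, products, an optional catalyst) maps naturally onto a small record type. The following is a minimal sketch only: the class name `Reaction`, its field names and the `describe` helper are hypothetical illustrations, not the TRANSPATH schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    """One elementary reaction step (hypothetical layout, not the TRANSPATH schema)."""
    name: str                        # reaction identifier, e.g. "R"
    educts: Tuple[str, ...]          # substances consumed
    products: Tuple[str, ...]        # substances produced
    catalyst: Optional[str] = None   # enzyme acting on the reaction, if any

    def describe(self) -> str:
        """Render the step as one sentence, mirroring the slide's wording."""
        educts = " + ".join(self.educts)
        products = " + ".join(self.products)
        by = f", catalyzed by {self.catalyst}," if self.catalyst else ""
        return f"Reaction {self.name}{by} converts {educts} into {products}."


step = Reaction(name="R", educts=("A",), products=("B",), catalyst="C")
print(step.describe())
```

Keeping educts and products as tuples lets the same type hold binding or dissociation steps with more than one participant.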

[Figure: TGF-β signalling drawn as pathway steps R1 to R5. Molecules shown: TGFβ1 (T), TGFβR-II (TR2, TR2p), TGFβR-I (TR1p), Smad2, Smad2p, Smad4, the complex S2P:S4, NTP/NDP, and the target gene (tc).]

Pathway steps:

Pathway steps depict the signaling in a more biochemical way.

In a semantic reaction, just individual key molecules are given.

Semantic: TGFβ1 -> TGF-βRII -> TGF-βRI -> Smad2 -> Smad4 -> gene

Pathway steps: R1, R2, R3, R4, R5

Info about a specific molecule

Parts of a molecule entry

Many synonyms make sure that you find your protein

External database links allow easy identification of proteins

Specific molecule (cont.)

Opens data entry of a specific reaction

Parts of a molecule entry

Disease information and GO terminology

Localization of human APP

Specific reaction of APP(h)

Evaluation of this reaction is based on experimental evidence

Part of a reaction entry


Signal transduction pathways

Connecting path between two molecules
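Finding a connecting path between two molecules amounts to a shortest-path search in a directed graph whose edges are reactions. The sketch below uses breadth-first search; it is an illustration under stated assumptions, not the method used by the TRANSPATH interface. The function name `find_path` is hypothetical, and the example edges are the semantic TGF-beta chain from the slides, with the arrow direction assumed.

```python
from collections import deque

# Directed "semantic" edges; arrow direction is an assumption.
EDGES = [
    ("TGFbeta1", "TGF-betaRII"),
    ("TGF-betaRII", "TGF-betaRI"),
    ("TGF-betaRI", "Smad2"),
    ("Smad2", "Smad4"),
    ("Smad4", "gene"),
]


def find_path(edges, source, target):
    """Return the shortest directed path from source to target, or None."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)

    previous = {source: None}   # visited nodes and how each was reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(find_path(EDGES, "TGFbeta1", "gene"))
```

Searching from one molecule to a group (for example all transcription factors) only requires stopping at the first visited node that belongs to the group.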

Connection between one specific molecule (magenta) and a group of molecules (transcription factors in blue)

Oncostatin M pathway

B-cell antigen receptor pathway

PDGF pathway

Insulin pathway

Overview of a pathway – hand-drawn map

TRANSPATH: number of entries

[Chart: numbers of molecules, reactions and references per release; x-axis label "Release Professional 2.1"; axis ticks 0 to 12000.]

Statistics: TRANSPATH® 5.1 and NetPro 1.1

Main tables + NetPro
- Molecule 18029 + 7333
- Reaction 20199 + 30316
- Reference 8258 + 9582

Molecules of mammalian origin
- Human 2503 3521
- Mouse 1653 2025
- Rat 810 1224

Prediction
26 588 predicted human gene products, of which 30.8% (~9000) seem to be signal transduction relevant (Venter et al., 2001)

=> 28% coverage of predicted proteins in TRANSPATH®

TRANSFAC® System

From patterns to pathways

The starting point: a set of induced genes from microarray experiments

Array analysis

The conventional analysis: deduce the gene products and map them to the network of metabolic pathways (KEGG) to find biochemical effects

Extension of conventional analysis: map the induced gene products to the network of regulatory pathways (TRANSPATH) to find biological effects

Reasoning about experimental findings: promoter analysis of induced genes connected to network mapping (KEGG, TRANSPATH), leading to the identification of new targets

Promoter analysis identifies additional target genes and extends the affected network: promoter model, TRANSGENOME database, additional predicted genes, extended predicted network

Array analysis

[Figure: workflow starting from the microarray (set of induced genes). Effects branch: assignment of gene products, metabolic network mapping (KEGG), regulatory network mapping (TRANSPATH), modeling of effects. Causes branch: retrieval of upstream sequences (TRANSGENOME), promoter analysis (TRANSFAC), network analysis (TRANSPATH), new target. The microarray itself gives only indirect hints on causes.]

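Score $q$ of a candidate site $b_1 b_2 \dots b_l$ against a matrix of length $l$:

$$q = \frac{\sum_{i=1}^{l} I(i)\, f(i, b_i) - \sum_{i=1}^{l} I(i)\, f^{\min}(i)}{\sum_{i=1}^{l} I(i)\, f^{\max}(i) - \sum_{i=1}^{l} I(i)\, f^{\min}(i)} \qquad (1)$$

$$I(i) = \sum_{b \in \{A, T, G, C\}} f(i, b)\, \ln\bigl(4 f(i, b)\bigr) \qquad (2)$$

Here $f(i, b)$ is the frequency of nucleotide $b$ at position $i$ of the matrix, and $f^{\min}(i)$ and $f^{\max}(i)$ are the lowest and highest frequencies at position $i$.

Example matrix (nucleotide counts per position):

Pos   1   2   3   4   5   6   7   8   9  10  11  12  13  14
A     9   2   1   0   1   0   0   0   0   1  15  13  13   7
C     8   3   1   1  13   3  29   0  22   8   9   1   4   8
G     4   2   2   2  15  26   0  29   7  17   3   7   9   8
T     8  22  25  26   0   0   0   0   0   3   2   8   3   6
      N   T   T   T   S   G   C   G   C   S   M   D   R   N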
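The scoring scheme on this slide weights each matrix position by an information value I(i) = Σ_b f(i,b) ln(4 f(i,b)) and rescales the weighted sum to the range 0 to 1 using the lowest and highest achievable sums. The sketch below applies that scheme to the 14-column count matrix shown on the slide. It is a minimal illustration, not the TRANSPLORER or Match implementation; the function names are hypothetical, and treating a zero frequency as contributing 0 to I(i) is an assumption.

```python
import math

# Count matrix from the slide (14 positions, consensus NTTTSGCGCSMDRN).
COUNTS = {
    "A": [9, 2, 1, 0, 1, 0, 0, 0, 0, 1, 15, 13, 13, 7],
    "C": [8, 3, 1, 1, 13, 3, 29, 0, 22, 8, 9, 1, 4, 8],
    "G": [4, 2, 2, 2, 15, 26, 0, 29, 7, 17, 3, 7, 9, 8],
    "T": [8, 22, 25, 26, 0, 0, 0, 0, 0, 3, 2, 8, 3, 6],
}


def frequencies(counts):
    """Convert per-position counts into per-position frequency dicts f(i, b)."""
    columns = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in "ACGT")
        columns.append({b: counts[b][i] / total for b in "ACGT"})
    return columns


def information(column):
    """I(i) = sum_b f ln(4 f); a zero frequency contributes 0 (assumption)."""
    return sum(f * math.log(4.0 * f) for f in column.values() if f > 0.0)


def matrix_score(columns, site):
    """Score q in [0, 1]: (current - min) / (max - min), weighted by I(i)."""
    if len(site) != len(columns):
        raise ValueError("site length must equal matrix length")
    current = lowest = highest = 0.0
    for column, base in zip(columns, site.upper()):
        weight = information(column)
        current += weight * column[base]
        lowest += weight * min(column.values())
        highest += weight * max(column.values())
    return (current - lowest) / (highest - lowest)


FREQS = frequencies(COUNTS)
best_site = "".join(max(col, key=col.get) for col in FREQS)
worst_site = "".join(min(col, key=col.get) for col in FREQS)
print(best_site, matrix_score(FREQS, best_site))
```

Scanning a promoter then reduces to sliding a window of the matrix length along both strands and keeping windows whose score exceeds a matrix-specific cut-off.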

TRANSPLORER (TRANScription exPLORER) is a software package for the analysis of transcription regulatory sequences. Currently, the TRANSPLORER site prediction tool uses collections of position weight matrices (PWMs). It can draw on several matrix sources: the largest and most up-to-date library of matrices, derived from the TRANSFAC® Professional database, other matrix libraries, and any user-developed matrix libraries. It can therefore search for a great variety of transcription factor binding sites. A search can be made using all matrices from the libraries or subsets of them.

http://www.biobase.de/pages/products/databases



Search for most probable binding sites regulating gene expression

Search for binding sites coinciding with SNPs

Mouse c-fos promoter (Matrix search for TF binding sites)

1 <------------V$IK1_01(0.86) -----...V$CREBP1CJUN_01(0.85) 2 <-----------V$IK2_01(0.90) -----...V$CREB_01(0.96) 3 ----------->V$AP2_Q6(0.87) <-------------V$GKLF_01(0.87) 4-->V$ATF_01(0.89) <-------V$MZF1_01(0.99) ----...V$ELK1_01(0.87) 5 <-----------V$AP2_Q6(0.92) <------------V$SP1_Q6(0.88) 6>V$AP1FJ_Q2(0.89) <-------------V$GKLF_01(0.85) 7>V$AP1_Q2(0.87) <-------------V$GKLF_01(0.86) 8->V$CREB_Q2(0.86) <---------V$CETS1P54_01(0.90) 9->V$CREB_Q4(0.90) <---------V$NRF2_01(0.90) 10 <-------------V$GC_01(0.88) 11 ----------->V$CAAT_01(0.87) 12 <------------V$TCF11_01(0.87) 13 ----------->V$AP2_Q6(0.87) 14 <---------V$USF_Q6(0.93) 16 --------...V$ATF_01(0.94) 17 -------...V$AP1FJ_Q2(0.95) 20 -------...V$CREBP1_Q2(0.93) 21 -------...V$CREB_Q2(0.95) 23 ---...V$IK2_01(0.85) MMCFOS_1 GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG 420 1-->V$CREBP1CJUN_01(0.85) -------------->V$BARBIE_01(0.86) 2-->V$CREB_01(0.96) -------------->V$TATA_01(0.95) 3 ----------->V$CAAT_01(0.91) --------->V$AP4_Q5(0.95) 4----------->V$ELK1_01(0.87) --------------------->V$HEN1_01(0.87) 5 --------->V$AP4_Q5(0.88) <---...V$CMYB_01(0.93) 6 <---------V$CDPCR3HD_01(0.93) --...V$VMYB_02(0.89) 7 <--------------V$TATA_01(0.88) 8 --------------------->V$HEN1_02(0.87) 9 <---------------------V$HEN1_02(0.86) 10 <-----------------V$AP4_01(0.88) 11 ----------->V$LMO2COM_01(0.93) 12 <-----------V$LMO2COM_01(0.93) 13 <-----------V$MYOD_01(0.88) 17--->V$AP1FJ_Q2(0.95) <---------V$AP4_Q6(0.99) 20---->V$CREBP1_Q2(0.93) <---------V$MYOD_Q6(0.96) 21---->V$CREB_Q2(0.95) Transcription start 23-------->V$IK2_01(0.85) 24 <=========== E2F (0.80) MMCFOS_1 TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC 480 1 <-----------------V$CMYB_01(0.91) -------...V$ER_Q6(0.86) 2 <-----------V$LMO2COM_01(0.90) <----...V$TCF11_01(0.87) 3 --------->V$MYOD_Q6(0.90) -------->V$STAT_01(0.93) 4 --------->V$VMYB_01(0.89) <--------V$STAT_01(0.89) 5--------------V$CMYB_01(0.93) -------->V$LMO2COM_02(0.93) 
6------>V$VMYB_02(0.89) <-----------V$CAAT_01(0.85) 7 -------->V$VMYB_02(0.88) 8 -------------->V$EVI1_04(0.86) 9 ------------->V$GATA1_02(0.93) 12 <------------V$ZID_01(0.85) 13 <----------V$CP2_01(0.97) 14 ---------->V$GATA_C(0.92) 15 ----------------->V$CMYB_01(0.86) 16 --------->V$CREL_01(0.91) 24 <=========== E2F (0.82) MMCFOS_1 CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA 540

1------------V$AHRARNT_01(0.90) <-----------------V$NF1_Q6(0.85) 2--------V$NMYC_01(0.89) --------->V$AP4_Q5(0.91) 3------>V$USF_Q6(0.89) --------->V$AP4_Q6(0.85) 4------V$USF_C(0.86) ------------...V$YY1_02(0.86) 5 --------->V$AP4_Q5(0.91) 6 --------->V$AP4_Q6(0.86) 7 --------->V$AP4_Q5(0.92) 8 --------->V$AP4_Q6(0.86) 9 --------->V$AP4_Q5(0.86) HS198161_1 ACGCGCAGCAGCAGGCGCAGCACCAGGCGCAGGCCGCGCAGGCGGCGGCAGCGGCCATCT 540 1 ----------------->V$NF1_Q6(0.96) 2 <-----------------V$NF1_Q6(0.90) 3 --------->V$USF_Q6(0.87) 4------->V$YY1_02(0.86) ---------->V$CP2_01(0.88) 5 --------->V$AP4_Q5(0.92) ----------->V$CAAT_01(0.85) 6 --------->V$AP4_Q6(0.85) --------->V$AP4_Q5(0.86)

7 ------...V$CP2_01(0.86) 8 ===========> E2F (0.81) 9 ===========> E2F (0.90)

HS198161_1 CCGTGGGCAGCGGTGGCGCCGGCCTTGGCGCACACCCGGGCCACCAGCCAGGCAGCGCAG 600 1 <---------V$CETS1P54_01(0.89) <--------...V$GATA_C(0.86) 2 ----------------->V$NF1_Q6(0.85) <-------...V$GATA1_02(0.90) 3 --------->V$CETS1P54_01(0.90) <-------...V$GATA1_03(0.92) 4 <--------------------V$R_01(0.88) <-----...V$LMO2COM_02(0.90) 5 <---------------V$AHRARNT_01(0.86) 6 ----------->V$AP2_Q6(0.95) 7---->V$CP2_01(0.86) <-------...V$GATA1_04(0.87)

8 <----...V$CETS1P54_01(0.87) 9 ===========> E2F (0.80)

HS198161_1 GCCAGTCTCCGGACCTGGCGCACCACGCCGCCAGCCCCGCGGCGCTGCAGGGCCAGGTAT 660 1--V$GATA_C(0.86) <---------V$CETS1P54_01(0.89) 2------V$GATA1_02(0.90) --------...V$DELTAEF1_01(0.96) 3------V$GATA1_03(0.92) <---...V$CEBPB_01(0.88) 4---V$LMO2COM_02(0.90) 5 <-----------V$IK2_01(0.92) 6 <---------------V$E47_02(0.87) 7-----V$GATA1_04(0.87) 8-----V$CETS1P54_01(0.87) 9 <--------------V$E47_01(0.86) 10 ---------->V$DELTAEF1_01(0.99) 11 <-----------V$LMO2COM_01(0.94) 12 <-----------V$MYOD_01(0.87) 13 --------->V$MYOD_Q6(0.91) 14 ------->V$USF_C(0.93) HS198161_1 CCAGCCTGTCCCACCTGAACTCCTCGGGCTCGGACTACGGCACCATGTCCTGCTCCACCT 720

Exon 2 sequence of human thyroid transcription factor-1 (TTF-1) gene (HS198161)

(Matrix search for TF binding sites)

Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery.

Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66

Enhanceosome

Recognition method for T-cell specific composite elements NFAT/AP-1

NFATp site ..WRGAAAA.. and AP-1 site ..TGASTCA.., separated by 8-12 bp (5' to 3')

[Count matrices for the NFATp site (positions 1 to 8) and the AP-1 site (positions 1 to 9), rows A, C, G, T; the cell boundaries were lost in extraction.]

[Scatter plot of the transformed NFAT and AP-1 scores for the NFAT/AP-1 training set and for random sequences; axis ticks 0.7 to 6.7 and 0.7 to 4.7.]

NFAT = -log(1-scoreNFAT)

AP-1 = -log(1-scoreAP-1)
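The transform -log(1 - score) stretches the upper end of the score range, so that differences between very good matches (for example 0.95 versus 0.99) become visible on the plot axes. A minimal sketch follows, assuming the natural logarithm (the slide does not state the base) and a hypothetical function name `stretch_score`.

```python
import math


def stretch_score(score):
    """Return -log(1 - score); natural log is an assumption, base not given in the slides."""
    if not 0.0 <= score < 1.0:
        raise ValueError("score must lie in [0, 1)")
    return -math.log(1.0 - score)


for s in (0.50, 0.90, 0.99):
    print(s, round(stretch_score(s), 3))
```

A score of exactly 1.0 has no finite transform, so the sketch rejects it rather than returning infinity.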

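Composite score

[Equation: composite score CE of an NFAT/AP-1 pair as a function of the two transformed scores and a weight w; the formula was lost in extraction. Constants in extraction order: 3.5, 0.88, 4.7, 1.47, 10.17.]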

Selection of motifs with high frequency in a window

[Figure: an E2F site (TTTGGCGCGAAA) with a motif, e.g. WSG, counted in a window [ ] placed relative to the site. The frequency of the motifs in the window is compared between promoters of cell-cycle genes and exon 2 sequences.]

Motifs found in the local context of E2F sites in promoters of cell cycle-related genes

N    Motif  Window (w)  \hat{f}_Y / \hat{f}_N         Utility  Coefficient
Positive characteristics
1    MGCG   [27,34]     0.0048 / 0.0041 = 1.179       0.80     -0.394
2    TTT    [39,41]     0.0112 / 0.0032 = 3.536       0.75      0.9618
3    CGSK   [17,38]     0.0851 / 0.0341 = 2.499       0.90      0.5353
4    HKCG   [13,16]     0.0675 / 0.0095 = 7.071       0.79      0.5904
5    VDWW   [17,46]     0.1233 / 0.0536 = 2.299       0.72      0.223
6    DWTT   [21,26]     0.0337 / 0.0000               0.80      0.5036
7    GSDM   [3,69]      0.0980 / 0.0559 = 1.754       0.82      0.595
Negative characteristics
8    VWS    [7,66]      0.1258 / 0.1932 = 0.651       0.91     -0.095
9    HSWY   [26,65]     0.0413 / 0.0813 = 0.508       0.79     -0.2297
10   VTV    [19,34]     0.0427 / 0.1354 = 0.315       0.71     -0.261
11   BAY    [7,65]      0.0274 / 0.0614 = 0.447       0.78     -0.566
Constant term: -5.6767

Score of context:

$$d(X) = \sum_{i=0}^{k} \alpha_i \, f(\mu_i, w_i, X)$$

where $f(\mu_i, w_i, X)$ is the frequency of motif $\mu_i$ in window $w_i$ of sequence $X$, $\alpha_i$ is the coefficient in the last column, and $\alpha_0 = -5.6767$ is the constant term.
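The context score combines, for each motif, its frequency inside a window with a coefficient, plus a constant term. The sketch below shows one way to compute such a score for IUPAC motifs. Several points are assumptions rather than facts from the slides: the window bounds are taken as 0-based, inclusive start positions of the motif within the sequence passed in, frequency is the fraction of those start positions that match, and the names `motif_matches`, `window_frequency` and `context_score` are hypothetical.

```python
# IUPAC nucleotide codes and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_matches(motif, segment):
    """True if the upper-case segment has the motif's length and fits every IUPAC code."""
    return len(motif) == len(segment) and all(
        base in IUPAC[code] for code, base in zip(motif, segment)
    )


def window_frequency(sequence, motif, first, last):
    """Fraction of start positions first..last (inclusive) where the motif matches."""
    starts = range(first, last + 1)
    hits = sum(motif_matches(motif, sequence[s:s + len(motif)]) for s in starts)
    return hits / len(starts)


def context_score(sequence, terms, constant):
    """Constant plus the coefficient-weighted sum of window frequencies.

    terms: iterable of (motif, (first, last), coefficient).
    """
    return constant + sum(
        coefficient * window_frequency(sequence, motif, first, last)
        for motif, (first, last), coefficient in terms
    )


# Toy example: one term taken from the table (motif TTT, coefficient 0.9618),
# with an illustrative window that fits the short toy sequence.
toy = "ACGTTTGG"
print(context_score(toy, [("TTT", (3, 4), 0.9618)], constant=-5.6767))
```

With real data the sequence would be the flank of a candidate E2F site and the windows would be the ones listed in the table, interpreted relative to the site.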

Human uracil DNA-glycosylase (E2F sites)

[Figure: predicted E2F sites along the gene, from -1000 to +9000 relative to the start of transcription (+1), shown without and with the score of context.]

ttTTTGCCGCGAAAag q=0.92 d=2.8 (known site)

SITEVIDEO system: building of E2F site recognition program (step 2)

SITEVIDEO system: building of E2F site recognition program (step 3)

Composite modules

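[Figure: a window $w$ upstream of the start of transcription. For each matrix $k = 1, \dots, K$ it contains sites $s_1^{(k)}, \dots, s_{n_k}^{(k)}$ that score above the cut-off $q_{cut\text{-}off}^{(k)}$; each matrix carries a weight $\varphi^{(k)}$.]

Composite module score:

$$C(w) = \sum_{k=1}^{K} \varphi^{(k)} \, q_{avr}^{(k)}(w)$$

where $q_{avr}^{(k)}(w)$ is computed from the scores $q(s_i^{(k)})$, $i = 1, \dots, n_k$, of the sites of matrix $k$ that lie within the window $w$ and satisfy $q(s_i^{(k)}) > q_{cut\text{-}off}^{(k)}$.

Parameters of the model to be estimated

K: number of TF matrices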


Genetic Algorithms

Weight     Cut-off    TF matrix
1.000000   0.840072   V$E2F_19
0.954483   0.737637   V$TATA_01
0.888064   0.939687   V$CREB_01
0.816179   0.941583   V$SP1_Q6
0.039746   0.839702   V$TAL1BETAE47_01

[Histogram: number of sequences against the composite module score (axis ticks -0.5 to 4.0) for exon-2 sequences and for cell cycle-related promoters.]

Composite module in promoters of cell cycle-related genes

[Equation: composite module score C as a sum over the 5 matrices above, using the weights and the cut-offs $q_{cut\text{-}off}^{(k)}$.]

Mouse c-fos promoter

Cell cycle composite module

1 <------------V$IK1_01(0.86) -----...V$CREBP1CJUN_01(0.85) 2 <-----------V$IK2_01(0.90) -----...V$CREB_01(0.96) 3 ----------->V$AP2_Q6(0.87) <-------------V$GKLF_01(0.87) 4-->V$ATF_01(0.89) <-------V$MZF1_01(0.99) ----...V$ELK1_01(0.87) 5 <-----------V$AP2_Q6(0.92) <------------V$SP1_Q6(0.88) 6>V$AP1FJ_Q2(0.89) <-------------V$GKLF_01(0.85) 7>V$AP1_Q2(0.87) <-------------V$GKLF_01(0.86) 8->V$CREB_Q2(0.86) <---------V$CETS1P54_01(0.90) 9->V$CREB_Q4(0.90) <---------V$NRF2_01(0.90) 10 <-------------V$GC_01(0.88) 11 ----------->V$CAAT_01(0.87) 12 <------------V$TCF11_01(0.87) 13 ----------->V$AP2_Q6(0.87) 14 <---------V$USF_Q6(0.93) 16 --------...V$ATF_01(0.94) 17 -------...V$AP1FJ_Q2(0.95) 20 -------...V$CREBP1_Q2(0.93) 21 -------...V$CREB_Q2(0.95) 23 ---...V$IK2_01(0.85) MMCFOS_1 GAGCGCCCGCAGAGGGCCTTGGGGCGCGCTTCCCCCCCCTTCCAGTTCCGCCCAGTGACG 420 1-->V$CREBP1CJUN_01(0.85) -------------->V$BARBIE_01(0.86) 2-->V$CREB_01(0.96) -------------->V$TATA_01(0.95) 3 ----------->V$CAAT_01(0.91) --------->V$AP4_Q5(0.95) 4----------->V$ELK1_01(0.87) --------------------->V$HEN1_01(0.87) 5 --------->V$AP4_Q5(0.88) <---...V$CMYB_01(0.93) 6 <---------V$CDPCR3HD_01(0.93) --...V$VMYB_02(0.89) 7 <--------------V$TATA_01(0.88) 8 --------------------->V$HEN1_02(0.87) 9 <---------------------V$HEN1_02(0.86) 10 <-----------------V$AP4_01(0.88) 11 ----------->V$LMO2COM_01(0.93) 12 <-----------V$LMO2COM_01(0.93) 13 <-----------V$MYOD_01(0.88) 17--->V$AP1FJ_Q2(0.95) <---------V$AP4_Q6(0.99) 20---->V$CREBP1_Q2(0.93) <---------V$MYOD_Q6(0.96) 21---->V$CREB_Q2(0.95) Transcription start 23-------->V$IK2_01(0.85) 24 <----------- E2F (0.80) MMCFOS_1 TAGGAAGTCCATCCATTCACAGCGCTTCTATAAAGGCGCCAGCTGAGGCGCCTACTACTC 480 1 <-----------------V$CMYB_01(0.91) -------...V$ER_Q6(0.86) 2 <-----------V$LMO2COM_01(0.90) <----...V$TCF11_01(0.87) 3 --------->V$MYOD_Q6(0.90) -------->V$STAT_01(0.93) 4 --------->V$VMYB_01(0.89) <--------V$STAT_01(0.89) 5--------------V$CMYB_01(0.93) -------->V$LMO2COM_02(0.93) 
6------>V$VMYB_02(0.89) <-----------V$CAAT_01(0.85) 7 -------->V$VMYB_02(0.88) 8 -------------->V$EVI1_04(0.86) 9 ------------->V$GATA1_02(0.93) 12 <------------V$ZID_01(0.85) 13 <----------V$CP2_01(0.97) 14 ---------->V$GATA_C(0.92) 15 ----------------->V$CMYB_01(0.86) 16 --------->V$CREL_01(0.91) 24 <----------- E2F (0.82) MMCFOS_1 CAACCGCGACTGCAGCGAGCAACTGAGAAGACTGGATAGAGCCGGCGGTTCCGCGAACGA 540 1----------->V$ER_Q6(0.86) 2--------V$TCF11_01(0.87) 3 --------->V$AP4_Q5(0.91) 4 --------->V$AP4_Q6(0.87) 5 ---------->V$AP1FJ_Q2(0.93) 6 ---------->V$AP1_Q2(0.90) 7 ---------->V$AP1_Q4(0.87) 8 <-----------V$IK2_01(0.94) MMCFOS_1 GCAGTGACCGCGCTCCCACCCAGCTCTGCTCTGCAGCTCC 580

Computationally predicted E2F target genes confirmed by in vivo footprint

For each gene: EMBL entry and positions of the PCR primers; then one line per potential site with strand, sequence, position relative to the start of transcription, score q and, where given, score of context d.

c-fos, Hs (HSFOS); PCR primers -201 ->, +96 <-
  (-) gcCTTGGCGCGTGTcc   -165 .. -176     q=0.92
  (-) ggGGTGGCGCGCGGgc   -92 .. -103      q=0.84   d=2.92
  (+) ccTCTGGCGCCACCgt   -90 .. -79       q=0.88
  (-) acGGTGGCGCCAGAgg   -78 .. -89       q=0.83

JunB, Hs (HS207341); PCR primers -27 ->, +313 <-
  (+) gcTATCGCGCCAGAga   79 .. 90         q=0.89
  (-) tcTCTGGCGCGATAgc   91 .. 80         q=0.91
  (-) ggGCTGGCGCGGGCgg   169 .. 158       q=0.82   d=3.17

TGF-β1, Hs (HSTGFB1PR); PCR primers -122 ->, +210 <-
  (+) ctGTTTGCGGGGCGga   -513 .. -502     q=0.80   d=2.03
  (+) ccCTTCGCGCCCTGgg   -298 .. -287     q=0.91
  (+) ctCTTGGCGCGACGct   28 .. 39         q=0.93
  (-) agCGTCGCGCCAAGag   40 .. 29         q=0.83
  (+) ccTTTGCCGCCGGGga   85 .. 96         q=0.85

p14ARF, Hs (AF082338); PCR primers -404 ->, -143 <-
  (-) ctCTCCGCGCGCGGga   -1384 .. -1395   q=0.81   d=4.11
  (-) gtCTTGGCGACCGTtg   -1009 .. -1020   q=0.81
  (-) ggCCTGGCGCCGGAct   -739 .. -750     q=0.81
  (+) tgATTGGCGGATAGag   -589 .. -578     q=0.83
  (-) acTTTCCCGCCCTGtg   -265 .. -276     q=0.86

Mcm4 (Cdc21), Hs (HSU63630); PCR primers -667 ->, -330 <-
  (-) gtTTTCGCGGGAAAac   -491 .. -502     q=0.93   d=3.53
  (-) ctTTCAGCGCCCGTgc   -409 .. -420     q=0.82
  (+) gcAGTGGCGCCTCCcg   -377 .. -366     q=0.80
  (+) ggCGTGGCGCGGAGcc   -175 .. -164     q=0.83   d=4.39
  (+) ctTGTCGCGCAGGTac   -93 .. -82       q=0.86

mcm5 (P1-cdc46), Hs (HS286B10); PCR primers -211 ->, +88 <-
  (+) agTTTCGCGCCAAAtt   -187 .. -176     q=0.99   d=4.91
  (-) aaTTTGGCGCGAAAct   -175 .. -186     q=1.00
  (+) ttTTTCCCGCGAAAct   8 .. 19          q=0.89   d=3.01
  (-) agTTTCGCGGGAAAaa   20 .. 9          q=0.93   d=4.21

Von Hippel-Lindau (VHL), Hs (AF010238); PCR primers -137 ->, +123 <-
  (+) aaGCTCGCGCCACTgc   -270 .. -259     q=0.81
  (-) gcAGTGGCGCGAGCtt   -258 .. -269     q=0.84
  (-) gtCTTCGCGCGCGCtc   -28 .. 39        q=0.92   d=2.22

B-myb, Hs (HSBMYBDNA); PCR primers -296 ->, +14 <-
  (-) gtCCTGGCGCGCGGgc   -72 .. -83       q=0.83
  (+) cgCTTGGCGGGAGAta   -53 .. -42       q=0.87   d=1.18

nucleolin, Hs (HSNUCLEO); PCR primers -407 ->, -41 <-
  (-) ttTTTGGCGCCGGCtg   -297 .. -308     q=0.97
  (-) ccGTGGGCGCGCGGgt   -256 .. -267     q=0.81   d=2.91

nucleolin, Cg (CSNUCLEO); PCR primers -538 ->, -198 <-
  (-) cgTTTGGCGCGGCTtg   -296 .. -307     q=0.97   d=6.67

nucleolin, Ms (MMNUCLEO); PCR primers -531 ->, -232 <-
  (-) agTTTGGCGCGGCTtg   -306 .. -317     q=0.97   d=1.76

Chromatin crosslinking

Immunoprecipitation

PCR

G1 G1/S S G2 G1 G1/S S G2

G1/S-cycle

G1/S-growth

a)
Relative importance   Cut-off value   Matrix AC   Matrix ID
 0.141420             0.923077        M10009      V$E2F_19
 0.389941             0.947434        M00175      V$AP4_Q5
 0.905325             0.838106        M00088      V$IK3_01
-0.595259             0.856055        M00098      V$PAX2_01
-0.982593             0.997639        M00253      V$CAP_01
-0.814943             0.734697        M00137      V$OCT1_03

b)
[Histogram of G1/S cycle vs. G1/S growth: number of observations (0 to 5) against the site combination score (axis ticks -1.8 to 1.6).]

Results of the selection of a specific combination of sites that distinguishes G1/S cycle from G1/S growth promoters (microarray data).

E2F and a set of additional factors can distinguish these two sets of promoters. AP-4 is a ubiquitous factor whose DNA-binding domain is structurally similar to those of E2F and Myc, the main cell cycle regulators. IK3 belongs to a family of zinc finger transcription factors (Ik-1 to Ik-5) that play a role in the development of lymphocytes. Pax-2 is known to be involved in regulating the cell cycle by inhibiting p53 transcription. Oct-3 is known to be differentially phosphorylated during the cell cycle and may have a role in the regulation of the G1/S growth promoters. As for the Cap site, it has already been speculated that the structure of the basal promoter may play an important role in differentiating gene expression during the cell cycle.

[Figure: the AP-1 site TGASTCA is bound by Jun/Fos. In the human TNF promoter, the region from -107 to -74 carries overlapping sites for NFAT, NFAT/AP-1, NF-kB, C/EBP/AP-1 and VDR, which are used in different cell types: mast cells, T-cells, dendritic cells.]

Fuzzy puzzle hypothesis of the multipurpose structure of eukaryotic promoters: coding of multiple regulatory messages in the same DNA sequence. A, B, C and D, E, F: two sets of TFs; 1, 2: two sites in DNA; BC: basal complex.

There's More Than One Way To Do It

(Convergent evolution)

AXX list of genes

RefSeq | LocusLink | symbol | synonyms | description
NM_002421 | 4312 | MMP1 | CLG, CN2 | matrix metalloproteinase 1 (interstitial collagenase)
NM_004530 | 4313 | MMP2 | CLG4, CLG4A | matrix metalloproteinase 2 (gelatinase A, 72kD gelatinase, 72kD type IV collagenase)
NM_000611 | 966 | CD59 | MSK21, MIC11, MIN2, MIN1, MIN3 | CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344)
NM_001972 | 1991 | ELA2 | | elastase 2, neutrophil
NM_005317 | 3004 | GZMM | LMET1, MET1 | granzyme M (lymphocyte met-ase 1)
NM_005532 | 3429 | IFI27 | P27 | interferon, alpha-inducible protein 27
NM_001548 | 3434 | IFIT1 | GARG-16, IFNAI1, G10P1, IFI56 | interferon-induced protein with tetratricopeptide repeats 1
NM_000565 | 3570 | IL6R | | interleukin 6 receptor
NM_001565 | 3627 | SCYB10 | | chemokine (C-X-C motif) ligand 10
NM_001572 | 3665 | IRF7 | IRF-7A | interferon regulatory factor 7
NM_005564 | 3934 | LCN2 | NGAL | lipocalin 2 (oncogene 24p3)
NM_005567 | 3959 | LGALS3BP | 90K, MAC-2-BP | lectin, galactoside-binding, soluble, 3 binding protein
NM_002422 | 4314 | MMP3 | STMY, STMY1 | matrix metalloproteinase 3 (stromelysin 1, progelatinase)
NM_002423 | 4316 | MMP7 | MPSL1, PUMP-1 | matrix metalloproteinase 7 (matrilysin, uterine)
NM_004994 | 4318 | MMP9 | CLG4B | matrix metalloproteinase 9 (gelatinase B, 92kD gelatinase, 92kD type IV collagenase)
NM_004995 | 4323 | MMP14 | MT1-MMP | matrix metalloproteinase 14 (membrane-inserted)
NM_002428 | 4324 | MMP15 | MT2-MMP | matrix metalloproteinase 15 (membrane-inserted)
NM_002534 | 4938 | OAS1 | IFI-4, OIASI, OIAS | 2',5'-oligoadenylate synthetase 1 (40-46 kD)
NM_002787 | 5683 | PSMA2 | | proteasome (prosome, macropain) subunit, alpha type, 2
NM_004586 | 6197 | RPS6KA3 | | ribosomal protein S6 kinase, 90kD, polypeptide 3
NM_007315 | 6772 | STAT1 | STAT91 | signal transducer and activator of transcription 1, 91kD
NM_003254 | 7076 | TIMP1 | CLGI, EPO, TIMP | tissue inhibitor of metalloproteinase 1 (erythroid potentiating activity, collagenase inhibitor)
NM_003255 | 7077 | TIMP2 | | tissue inhibitor of metalloproteinase 2
NM_000362 | 7078 | TIMP3 | SFD | tissue inhibitor of metalloproteinase 3 (Sorsby fundus dystrophy, pseudoinflammatory)
NM_003684 | 8569 | MKNK1 | MNK1 | MAP kinase-interacting serine/threonine kinase 1
NM_006417 | 10561 | IFI44 | p44, MTAP44 | interferon-induced protein 44

>ELA2 elastase 2, neutrophil; chrom=19p13.3; LocusLink=1991; 15-AUG-2002;length=1200 ggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactga ggcaggcgacacagctgcatgtggccggtatcacagggccctgggtaaactgaggcaggcgacacagctgcatgtggccg tatcacagggccctgggtaaactgaggcaggtgacacagctgcatgtggccggtatcacggggccctggataaacagagg caggcgacacagctgcatgtggccggtatcacggggccctgggtaaactgaggcaggcgaggccacccccatcaagtccc tcaggtctaggtttggcaggtttggcaaaaacacagcaacgctcggttaaatctgaatttcgggtaagtatatcctgggc ctcatttggaagagacttagattaaaaaaaaaacgtcgagaccagcccggccaacacggtgaaaccccgtctctactaaa aatacaaaaaattagccaggcgcagtggctcacgcctgtgatcccagcactctgggaggctgaggcaggcggatcacccg aggtcagatgttcaagaccagcctggccgacagggcgaaacactgtctctactacaaatacaaaaattagccgggagtgg tggcaggtgcctgtaatctcagctattcaggaggctgaggcaggagaatcacttgaacctgggaggcggaggttgccgtg agccgggatcacgccaccgcactccagcctgggcgatagagcaagactctgtctccaaaaaaataaattaaaaaacccac attgattatctgacatttgaatgcgattgtgcatcctgaattttgtctggaggccccacccgagccaatccagcgtcttg tcccccttctcccccttttcatcaacgccctgtgccaggggagaggaagtggagggcgctggccggccgtggggcaatgc aacggcctcccagcacagggctataagaggagccgggcgggcacggaggggcagagaccccggagccccagccccaccat gaccctcggccgccgactcgcgtgtcttttcctcgcctgtgtcctgccggccttgctgctggggggtgagtttttgagtc caacctcccgctgctccctctgtcccgggttctgttcccacctctccatagagggccccaccagtgtgggtccctcatcc >MMP3 matrix metalloproteinase 3 (stromelysin 1, progelatinase); chrom=11q22.3; LocusLink=4314; 15-AUG-2002;length=1200 aaagttttacaaaatgtcttcctctgaatatgtttagagtcttgcattcaagcatttattatacaccaataatgtgagca acactttacttgacaaagaaacagaaaagaaaggaaaggaagaaaacagaagagcatgaagagaaaatttaggatggatt ctgttcttcaacttcaaagcatctgctaatttgaatttagggaggaggggaaaaggttgaaagagaataagacatgtgta gaagacaaggacagagagaatttcagtccggtaagcaatgtaattcatttcagttctacaactatttatggagcagctac gtgggcccatcacccattaataaattggttacagaattaaaaccaacccaaagggaatatacttccttctttttcacaga ccctctttgttctattctgcccatgaggttttcctcctcaagaaccagcaaatccaacgacagtcaatagcaggcattac aaatcagattcagaaaaataaatcaccccttctaaatttcttctagatattatcttttatgttttgagtataattgtata 
tagtatagactatagctatgtatgtacactttccacttacatcttttatttgcttttataatgtctttcttaaaataaaa ctgcttttagaagttctgcacaattctgatttttaccaagtcaacctacttcttctctcaaaaggacaaacataaattgt ctagtgaattccagtcaatttttccagaagaaaaaaaatgctccagttttctcctctaccaagacaggaagcacttcctg gagattaatcactgtgttgccttgcaaaattgggaaggttgagagaaattagtaaagtaggttgtatcatcctactttga atttggaatgtttggaaatggtcctgctgccatttggatgaaagcaaggatgagtcaagctgcgggtgatccaaacaaac actgtcactctttaaaagctgcgctcccgaggttggacctacaaggaggcaggcaagacagcaaggcatagagacaacat agagctaagtaaagccagtggaaatgaagagtcttccaatcctactgttgctgtgcgtggcagtttgctcagcctatcca ttggatggagctgcaaggggtgaggacaccagcatgaaccttgttcaggtaattaacactaactgacctggccaggtggg >IL6R interleukin 6 receptor; chrom=1; LocusLink=3570; 15-AUG-2002;length=1200 ttctctccttcctttccttccttcccctctatccctccttccctccctccctccctcctcccttccttttctttctttct tttctttttttttttttctttccagacagggtctcactgtcatccaggctggagtagcagcccccaatcacggctcactg taccctggatctcccggactcaagcaattttcccacctcagcttccctagtagctgggactataggtgtgtaccaccaca cccagctaatttttaaatttttttatagaaatgggggtctcactttgttacacaggctggtctagaattcctggactgaa gcaatccacccacccggctctcccaaagtgttggggttacaggcgtgagccactgcccctggtgttagtgtctgtctgtc aagtcaggagggcagccatgaacgttctgatgtctactgagcacgtgtggcccagaccgtgtgtcaggtgtttaggtgcc atccacagaaccttcctaataaccctgggcagcataggctttcttatctctgacagatgaggaaatggagactcagattc tgaaccgaagtcacagacacagtagatggtaggtctaaatggggacccaggtctatctgactgcaaagtccaaaccgttt ccttgcctctgctgcagcctgcgaggagcagctgggcagaaagactgtgcctttacggtggtgagtcttccgatgcccaa gcctcaccccagaccgatgaaatcagaatctctggagacccgacccagacattggtgggttttagggctcctggctgatt

Extract promoters using TRANSGenome

AXX promoter set

[Figure: Histogram (tt1.STA 2v*188c) of VAR1. y-axis: percent of observations (0% to 100%); x-axis bins: <= 0.423, (0.423; 0.847], (0.847; 1.27], (1.27; 1.694], (1.694; 2.117], > 2.117. Fitted curve: y = 13 * 0.42348 * normal(x; 1.503956; 0.895746).]

Importance  Core cut-off  Matr. cut-off  AC      Matrix       Factor
----------  ------------  -------------  ------  -----------  ---------------------------------------------
0.917751    0.877000      0.930000       M00062  V$IRF1_01    Interferon regulatory factor 1
0.323077    1.000000      0.948000       M00339  V$ETS1_B     Ets factors
0.640828    0.989000      0.982000       M00199  V$AP1_C      AP-1
0.276923    0.840000      0.853000       M00037  V$NFE2_01    NF-E2, an erythroid-specific factor
1.000000    0.756000      0.760000       M00481  V$AR_01      Androgen receptor
0.159172    0.869000      0.866000       M00699  V$ICSBP_Q6   Interferon Consensus Sequence binding protein

Composite module found in the AXX promoters

Sites in the AXX promoter set: Yes
Matrices: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6

0   0.951000 1.742000                     Char = 0.78964  ELA2 elastase 2, neutrophil
1   1.941000 0.984000 0.876000            Char = 1.50025  MMP3 matrix metalloproteinase 3
2   0.772000                              Char = 0.77200  IL6R interleukin 6 receptor
3   1.681000                              Char = 1.68100  MMP2 matrix metalloproteinase 2
4   0.964000 0.856000 0.764000 1.764000   Char = 1.59327  OAS1 2',5'-oligoadenylate synthetase 1
5   1.000000 0.880000 1.644000            Char = 2.52852  MMP1 matrix metalloproteinase 1
6   0.984000                              Char = 0.63057  TIMP1 tissue inhibitor of metalloproteinase 1
7   1.860000 0.939000                     Char = 1.85648  STAT1 signal transducer and activator of transc
8   1.987000 1.850000 0.812000            Char = 2.59763  MMP9 matrix metalloproteinase 9
9   0.868000 1.548000                     Char = 1.78836  MMP15 matrix metalloproteinase 15
10  0.985000 0.862000 1.575000            Char = 2.44492  MMP7 matrix metalloproteinase 7
11  0.780000                              Char = 0.78000  MMP14 matrix metalloproteinase 14
12  1.966000 0.853000                     Char = 1.49608  CD59 CD59 antigen p18-20
13                                        Char = 0.00000  LCN2 lipocalin 2 (oncogene 24p3)
14  1.921000 1.715000                     Char = 2.33563  GZMM granzyme M (lymphocyte met-ase 1)
15  0.802000                              Char = 0.80200  IFI27 interferon, alpha-inducible protein 27
16  0.975000 1.766000                     Char = 2.08100  TIMP3 tissue inhibitor of metalloproteinase 3
17  1.866000 1.852000                     Char = 2.00731  IFIT1 interferon-induced protein with tetratr
18  1.569000 1.892000                     Char = 1.87015  IFI44 interferon-induced protein 44
19  0.760000                              Char = 0.76000  MKNK1 MAP kinase-interacting serine/threonine
20  1.886000 0.810000                     Char = 2.54087  IRF7 interferon regulatory factor 7
21  0.765000                              Char = 0.76500  TIMP2 tissue inhibitor of metalloproteinase 2
22  0.948000 0.873000                     Char = 0.54803  LGALS3BP lectin, galactoside-binding, soluble
23  1.892000 0.885000                     Char = 1.87725  SCYB10
24                                        Char = 0.00000  PSMA2

Sites in the other human promoters: Not
Matrices: V$IRF1_01 V$ETS1_B V$AP1_C V$NFE2_01 V$AR_01 V$ICSBP_Q6

0   Char = 0.00000
1   Char = 0.00000
2   Char = 0.00000
3   Char = 0.00000
4   Char = 0.00000
5   Char = 0.00000
6   Char = 0.00000
7   Char = 0.00000
8   Char = 0.00000
9   Char = 0.00000
10  Char = 0.00000
11  Char = 0.00000
12  Char = 0.00000
13  Char = 0.00000
14  Char = 0.00000
15  Char = 0.00000
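The Char values listed for the AXX promoters are consistent with a weighted sum: each per-matrix site score multiplied by the Importance weight of that matrix. The Python sketch below assumes this reading. The function name `composite_score` is hypothetical, and the assignment of individual scores to matrices is an assumption, since the listing does not show which column each score belongs to. The assignments used here are the ones that reproduce the listed Char values.

```python
# Importance weights copied from the composite-module table.
IMPORTANCE = {
    "V$IRF1_01": 0.917751,
    "V$ETS1_B": 0.323077,
    "V$AP1_C": 0.640828,
    "V$NFE2_01": 0.276923,
    "V$AR_01": 1.000000,
    "V$ICSBP_Q6": 0.159172,
}


def composite_score(site_scores):
    """Weighted sum of per-matrix site scores (assumed form of the Char value)."""
    return sum(IMPORTANCE[matrix] * score for matrix, score in site_scores.items())


# Assumed column assignments (not stated in the listing):
il6r = composite_score({"V$AR_01": 0.772})                        # listed Char = 0.77200
timp1 = composite_score({"V$AP1_C": 0.984})                       # listed Char = 0.63057
ela2 = composite_score({"V$ETS1_B": 0.951, "V$NFE2_01": 1.742})   # listed Char = 0.78964

for gene, value in [("IL6R", il6r), ("TIMP1", timp1), ("ELA2", ela2)]:
    print(f"{gene}: {value:.5f}")
```

Promoters without any site above the cut-offs get a score of zero under this scheme, which matches the rows of the "other human promoters" listing.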

Insulin pathway

[Figure: Part of the insulin signaling network in TRANSPATH. Nodes shown include Insulin, InsR, Ras, and STAT1; one element is marked "?".]

Signaling network analysis

AhR targets: gene expression, log(Experiment/Control)

[Figure: scatter plot of predicted expression against real expression, both as log(Experiment/Control), with axes from -4 to 10.]

S41 distance = 0.417599 D2:0.658627 SIG:0.000000 MIN_LENGTH 3000.000000

 3.581248  1.000000  0.933000  M00026  V$AHR_Q5
 2.942371  1.000000  0.917000  M00639  V$HNF6_Q6
 0.798865  0.844000  0.900000  M00220  V$SREBP1_01
 0.409376  0.962000  0.926000  M00173  V$AP1_Q2
 0.055716  0.959000  0.989000  M00726  V$USF2_Q6
-1.329975  1.000000  0.959000  M00235  V$AHRARNT_01
-0.713625  1.000000  0.918000  M00156  V$RORA1_01
-0.668375  0.903000  0.854000  M00201  V$CEBP_C

Composite model correlates with the expression level

[Figure: positions of V$AHR_Q5 and V$AHRARNT_01 sites relative to the TSS, from -1000 to +1000.]

0.0983 * V$TCF11MAFG_01(0.821)
0.0471 * V$FOXO4_01(0.961)
0.0301 * V$IPF1_Q4(0.852)
0.0410 * V$AR_01(0.851)
0.0766 * V$GR_Q6(0.971)
0.0482 * V$STAT1_02(0.995)
0.0508 * V$CEBPB_01(0.98)
0.0281 * V$STAT5A_02(0.826)

0.1040 * V$CETS1P54_02(0.949) -50- V$TCF4_Q5(0.908)
0.0751 * V$TCF1P_Q6(0.726) -50- V$STAT6_01(0.861)
0.0728 * V$SF1_Q6(0.684) -50- V$SMAD3_Q6(0.833)
0.0419 * V$ELK1_02(0.862) -50- V$GRE_C(0.842)

[Figure: histograms of the Sma1Norm score (x-axis from -0.1 to 0.5) for differentially expressed genes and non-changed genes; y-axes: number of observations (0 to 450 and 0 to 40).]

Composite module found in promoters of differentially expressed genes in the liver of growth hormone-deficient mice (Sma1).

The ArrayAnalyzer™ search upstream of the TFs identifies growth hormone (GH) and receptor tyrosine kinases (RTK) as potential key molecules involved in the differential expression of genes in the liver of growth hormone-deficient mice (Sma1).

TRANSPATH and tools, ArrayAnalyzer and PathwayBuilder4

In the next step, one can map the transcription factors found in the previous step onto the TRANSPATH signaling network. If the factors found are parts of the same cascades that were suggested in step 1, the probability increases that those factors are responsible for the coordinated gene regulation.
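The idea of searching the signaling network upstream of a set of transcription factors can be sketched as a breadth-first search against the direction of signal flow, keeping only the molecules that reach every factor within a step limit. This is a minimal illustration, not the ArrayAnalyzer algorithm. The toy network, its simplified edges, and the function names are all hypothetical and are not taken from TRANSPATH.

```python
from collections import deque

# Hypothetical, heavily simplified toy network; edges point downstream.
DOWNSTREAM = {
    "cytokine": ["membrane receptor"],
    "membrane receptor": ["adaptor protein", "calcineurin"],
    "adaptor protein": ["Ras"],
    "Ras": ["Raf", "JNK"],
    "Raf": ["ERK"],
    "ERK": ["Fos"],
    "JNK": ["Jun"],
    "calcineurin": ["NF-AT"],
}

# Reverse the edges once so the search can walk upstream.
UPSTREAM = {}
for source, targets in DOWNSTREAM.items():
    for target in targets:
        UPSTREAM.setdefault(target, []).append(source)


def upstream_distances(target, max_steps):
    """Return {molecule: number of steps upstream of target}, up to max_steps."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand beyond the step limit
        for parent in UPSTREAM.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    del dist[target]
    return dist


def common_upstream(factors, max_steps):
    """Molecules upstream of all factors, with the longest distance to any of them."""
    per_factor = [upstream_distances(f, max_steps) for f in factors]
    shared = set(per_factor[0]).intersection(*per_factor[1:])
    return {node: max(d[node] for d in per_factor) for node in shared}


key_molecules = common_upstream(["Fos", "Jun", "NF-AT"], max_steps=6)
print(sorted(key_molecules.items(), key=lambda item: item[1]))
```

Tightening `max_steps` narrows the candidates to molecules closer to the factors; in this toy network, Fos and Jun alone share Ras within three steps.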

[Figure: signaling network with the groups: cytokines, chemokines; membrane receptors; adaptor proteins; PI3K; calcineurin, Ca2+ binding proteins; NF-ATs; Ras, Raf; ERK, JNK, MAPK; Jun, Fos; NF-AT/Jun:Fos.]

Legend:
Groups that are statistically enriched in potential target genes for Jun:Fos and NFATs (as shown in the table above).
Other groups that contain potential target genes for Jun:Fos and NFATs.

Feedback loops in activating immune cells through NF-AT/AP-1

Network controlling S phase entry in response to a proliferative signal

[Figure: regulatory network leading to S-phase entry. Labels shown: Erk-1, JNK, MEK, Raf, Ras, Ran, RanBP1; genes c-myc, B-myb, c-fos, c-ets, c-jun, e2f-1, rb1, cdc2, cycE, cycD1, cycD3, cdk4, erk-1, c-ras, htf9a; proteins c-Myc, B-Myb, c-Fos, c-Ets, c-Jun, cycE cdc2, cycE cdk2, cycD1 cdk4, cycD3 cdk4, pRB, E2F-1, DP-1. Target groups: ada, odc, ts; nucleolins; cdc21, cdc46, p1 co-factor; histones H1, H2B-143, H3-143; enzymes of nucleotide metabolism: dhfr, tk, cad; factors and enzymes of replication: DNA pol, cdc6, ori1. Links are marked with + and _ signs. Outcome: S-phase entry.]

          1 <===========V$CREB_02(0.85)
          2 <=======V$CREB_01(0.82)
MMNUCLEO  TCTCCCCAC-CACACCAGGAAGTCACCTCTCTCA----------ACCTG---GAGTTATA  225
          1 <===========V$CREB_02(0.85)
          2 <=======V$CREB_01(0.82)
RNNUCIA1  TCTCCCACCACACACCAGGAAGTCACCTCTCTGA----------ACCTG---GAGTTATA  221
          1 <===========V$CREB_02(0.85)
          2 <=======V$CREB_01(0.82)
CSNUCLEO  CCTCC-AGCACACACCAGGAAGTCACCTCTCCGAGACCGTCCCCATCAG---GAGTTAAA  229
          1 <===============V$TH1E47_01(0.85)
HSNUCLEO  TGGCCCTGT-GAGGCCAGAAAGTTACTTCTCCGAGGCCAGTTCCCCATGTCTGAGAAATA  229

          1 <==========V$DELTAEF1_01(0.82)
MMNUCLEO  CCTACCG-CGAGAGGTCACCGACATTACATGGATCGCTTGTGCACTGCTCGTA--CACAC  282
          1 <======== ==V$DELTAEF1_01(0.87)
RNNUCIA1  CCTACCG-CGTGAGGTCA--GAGATTAAATGGACTGTTTGTGCACTGCTCACA--CACAC  276
          1 <======== ==V$DELTAEF1_01(0.84)
CSNUCLEO  TCTACCG-CGCGAGGTTG--GACATTAAGCGAGCTGTTTGAGCACTGCACACAGGCGCGC  286
          1 <========= =V$DELTAEF1_01(0.84)
HSNUCLEO  TCTCCCAACTTGAGGTTCT-GTGGGGTAGGGGAGGGTTCGTGACTTTCTCACAGAAAACC  288

          1 <=======V$NKX25_02(0.84)
          2 =========>V$CETS1P54_01(0.87)
MMNUCLEO  ACACACGCAC------------AACTGCTTTTATTAGGAGCT----CTCAGGAAAGCGGG  326
          1 <=======V$NKX25_02(0.84)
          2 =========>V$CETS1P54_01(0.87)
RNNUCIA1  ACACACGCGCGCGCGCGCGCGAAATTGCTTTTATTAGGAGCT----CTCAGGAAAGTGGT  332
          1 =======>V$NKX25_02(0.82)
          2 <==========V$DELTAEF1_01(0.81)
          3 =========>V$CETS1P54_01(0.84)
CSNUCLEO  ACACACGCACGC----------AACTGCCTTTATTGGGAGCTGTCTCTCAGGAGAACAGC  336
          1 <=======V$NKX25_02(0.83)
          2 <==========V$DELTAEF1_01(0.81)
          3 =========>V$CETS1P54_01(0.86)
HSNUCLEO  TCGTACAGACCC-------CGCCACTGCCTTTATTAACAGCT----CTCAGGAGACTGCC  337

MMNUCLEO  GACTCGCATCA---TAGCCAAG----AAGCCGTTCGCGAC-TCCGCGGAGAACAGGCCGA  378
RNNUCIA1  GGCTCGCATCAGGCTACCACAGCC--AAGAGGACCGCCACCTCTACCGAGGGCAGGCCAA  390
CSNUCLEO  GGCCCGCGGCGCAACACTAGAGCCCCGGGATGTTCTCGGC-TCTGCCGAGGGCAG-CCGA  394
HSNUCLEO  TGCAGGAGGGGGGTCGCTCCGGCC---CCATGCTCGCGGG-CAAGCAGGGATAAG--CTG  391

MMNUCLEO  GGCCCGCTCATCAGCCCGAGGGAACCCTAGG--CC------TTCCGGCGTTCT-------  423
RNNUCIA1  GGCCCACTAAACGGCCCGAATGAACTCTAGG--CC------TTCCGGCGCTCT-------  435
CSNUCLEO  GGCC-GCGAGCTGGCCCCAGTGG-CTCTAGG--CCCTCAACTTCCGGCGCTCTCCGGCTC  450
HSNUCLEO  TGCCTCCAAAAGGGCCAACGGGAACTCCGCGGTCCCTGAACTTCCGGTGCTGGAGG---A  448

MMNUCLEO  -TCAGCAGGACCACGCGGCG----------------------------------------  442
RNNUCIA1  -CCAGCTCTTCAGCGCGGCGAACGTTCTAGGCCCCTGAGAAGTCCACCGGGAGGCGCAGG  494
CSNUCLEO  CTCAGCGGGAACGCGCGGCGAGCAGTTGAGGCCGCCGCGGATTCCAACGGGTTGGGGACG  510
HSNUCLEO  CTCCTCGCTCCAGGGCCACCAGGAGCCGCGGC---------------------GTGAGTG  487

MMNUCLEO  --------------GGGGGAAA-----GCACCGAGAAACGCCCAGACCACCTGAGCATCG  483
RNNUCIA1  TTTCCGCTACGCGAGGGGGAAA-----TCCCCGAGAAATGCCCAGACCACCTAAGCACAG  549
CSNUCLEO  TTCGC----AGCGCGGGGGATGCTCGGGCCACCCACCACCCCCCCACCCCCCCGGCCACG  566
HSNUCLEO  CGTGCCGGAACCGAGGGCGGGG-----TCTCTGAGGAACTCCAAGGCTGCCCAAGCCTAC  542

MMNUCLEO  CCGCCC--------ATGCTGCCTCGGAACACCTGAGGGAATCCGGGCCACGCCGCCACCT  535
RNNUCIA1  ACGTCC--------ATGCGGCGTACGGATACCTGAGGGAATCCGGGCCATACCGCCACCT  601
CSNUCLEO  AGGCCCGGAGCTCCAGGTAGCAGTGCAGCACTAGGCGGCGTCCGGGCCACGCCGCCCAAT  626
HSNUCLEO  GGACCC---------AGCCACATTGGCGAACC----GGAGACCGCCCGATTCCACCACC-  588

          1 <=======V$E2F_02(1.00)
MMNUCLEO  ACCCGCG--CCTCACACACAAGCCGCGCCAAACTCGCCCGTCCCACTGCGCAGGCGTGGG  593
          1 <=======V$E2F_02(1.00)
RNNUCIA1  ACTCGCG--CCTCACTC--AAGCCGCGCCAAACTCGCGCGTTTCACTGCGCAGGCGTGTA  657
          1 <=======V$E2F_02(1.00)
CSNUCLEO  TCCCCCGAGCCCCTTCCACAAGCCGCGCCAAACGGGTCTG---CACCGCGCAGGCG--GC  681
          1 <=======V$E2F_02(1.00)
HSNUCLEO  -CCCGCGCTCCCCTCAC--AGCCGGCGCCAAAAACGCCAGTCCCACGACGCAGGC-----  640

Phylogenetic footprint of promoter regions of nucleolin genes

HSNUCLEO - Homo sapiens; CSNUCLEO - Cricetulus griseus; MMNUCLEO - Mus musculus; RNNUCIA1 - Rattus norvegicus
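A phylogenetic footprint rests on finding alignment columns that are identical in all species. The Python sketch below marks such columns with `*`, in the style of a ClustalW consensus line. The function name `conserved_columns` is hypothetical, and the 12-column segment is taken from the aligned region around the predicted V$E2F_02 site. Treating a gap as non-conserved is an assumption of this sketch.

```python
def conserved_columns(aligned):
    """Return a marker line with '*' where all aligned sequences share the same base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        first = column[0]
        # A gap never counts as conserved in this sketch.
        identical = first != "-" and all(base == first for base in column)
        marks.append("*" if identical else " ")
    return "".join(marks)


# 12 alignment columns from the last block of the nucleolin promoter alignment.
segment = {
    "MMNUCLEO": "AGCCGCGCCAAA",
    "RNNUCIA1": "AGCCGCGCCAAA",
    "CSNUCLEO": "AGCCGCGCCAAA",
    "HSNUCLEO": "GCCGGCGCCAAA",
}

marker = conserved_columns(list(segment.values()))
for name, seq in segment.items():
    print(f"{name}  {seq}")
print(f"{'':8}  {marker}")
print("conserved columns:", marker.count("*"))
```

Runs of consecutive `*` that overlap a predicted binding site are the positions where the prediction is supported by conservation across the four species.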

TFBS identification via pattern search

[Figure: three panels, 1) to 3), each with nucleotide rows A, T, G, C.]

[Figure: comparison of Kernel, MEME, CONSENSUS, and GIBBS; x-axis from 0.65 to 0.95; y-axis from 0.000 to 1.000.]

Result of the comparison of four different pattern discovery programs on sets of simulated sequences with implanted TF binding sites for one matrix. y-axis: the averaged sum of squared differences between the revealed matrix and the original one; x-axis: the probability of the "consensus nucleotide" in each position of the matrix.

Table 1. Comparison of the 3 programs that perform best at low values of the consensus-nucleotide probability.

Probability  Kernel  MULTIPROFILER  PROJECTION
0.65         0.205   0.208          0.260
0.7          0.165   0.255          0.304
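The comparison measure, a sum of squared differences between the revealed matrix and the original one, can be sketched in Python as follows. Averaging over matrix positions is an assumption here, since the caption does not say what the sum is averaged over. The helper names, and the choice of spreading the remaining probability equally over the three non-consensus nucleotides, are also assumptions for illustration.

```python
BASES = "ACGT"


def consensus_matrix(consensus, p):
    """Frequency matrix with probability p for the consensus base at each position.

    The remaining probability is split equally among the other three bases
    (an assumption of this sketch).
    """
    rest = (1.0 - p) / 3.0
    return [{base: (p if base == c else rest) for base in BASES} for c in consensus]


def matrix_distance(revealed, original):
    """Sum of squared frequency differences, averaged over matrix positions."""
    if len(revealed) != len(original):
        raise ValueError("matrices must have the same number of positions")
    total = sum(
        (rev[base] - orig[base]) ** 2
        for rev, orig in zip(revealed, original)
        for base in BASES
    )
    return total / len(original)


original = consensus_matrix("TGACTCA", p=0.7)

# A perfect reconstruction has distance zero.
perfect = matrix_distance(original, original)

# An uninformative result: uniform frequencies at every position.
uniform = [{base: 0.25 for base in BASES} for _ in original]
worst_case = matrix_distance(uniform, original)

print(f"perfect: {perfect:.3f}, uniform: {worst_case:.3f}")
```

Under these assumptions a program that recovers nothing scores about 0.27 at a consensus probability of 0.7, which gives a rough scale for reading the values in the comparison.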

Gradual evolution by fixation of multiple substitutions (Protein functional centres)

Edited biopolymer by fixation of a small number of substitutions (Protein folding)

Evolution at once by fixation of single substitutions (Regulatory regions of eukaryotic genes)

Three mechanisms of biopolymer evolution

Thank you !

www.biobase.de
