
March 03 Identification of Transcription Factor Binding Sites Presenting: Mira & Tali


Page 1

March 03

Identification of Transcription Factor Binding Sites

Presenting: Mira & Tali

Page 2

Goal

[Figure: several regulatory regions, each containing the sequence AGCCA]

Motif - Binding site???

Page 3

Why Bother?

Gene expression regulation

Co-regulation

UNDERSTAND

Page 4

Difficulties

Multiple factors for a single gene

Variability in binding sites
• The nature of variability is NOT well understood
• Usually transitions
• Insertions and deletions are uncommon

Location, location, location…

Page 5

Experimental methods

EMSA - Electrophoretic mobility shift assay

Nuclease protection assay

NOT ENOUGH!!!!!

Page 6

So, what can we do?

Find conserved sequences in regulatory regions

1. Define what you want to find

2. Define what is a good result

3. Decide how to find it…

Page 7

Principal Methods:

Global optimum: enumerative methods
• Going over ALL possibilities
• Taking the best one

Advantage: Certainty

Disadvantage: Limited to small search spaces
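The enumerative idea can be sketched in a few lines. This is a minimal illustration, not any published tool: it assumes the "score" of a word is simply its raw occurrence count across the input regions, and the three example regions are made up. The 4**length candidates show why the approach is limited to small search spaces.

```python
from itertools import product

def count_occurrences(seq, word):
    """Count (possibly overlapping) occurrences of word in seq."""
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq[i:i + len(word)] == word)

def best_word(sequences, length):
    """Go over ALL 4**length words and take the most frequent one."""
    best, best_count = None, -1
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        total = sum(count_occurrences(s, word) for s in sequences)
        if total > best_count:  # ties keep the first word enumerated
            best, best_count = word, total
    return best, best_count

# Hypothetical regulatory regions, each containing AGCCA once.
regions = ["TTAGCCATT", "GGAGCCAGG", "CAGCCAC"]
result = best_word(regions, 5)
print(result)
```

Raw counts are only a stand-in for a real significance measure; a word can be frequent simply because its bases are common in the genome.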

Page 8

Principal Methods:

Local optimum: Gibbs sampling, AlignACE
• Start somewhere (arbitrary)
• Next step direction: chosen with probability proportional to what we “gain” from it
• We can get anywhere with some probability

Advantage: Basically good results, faster

Disadvantage: You can never know whether the result is the global optimum
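The three steps above can be sketched as a simple site sampler. This is a teaching sketch under stated assumptions (exactly one site per sequence, a uniform background of 0.25 per base, a pseudocount of 1, sequences containing only A, C, G, T); it is not the AlignACE implementation, and the example regions are made up.

```python
import random

def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    # Start somewhere (arbitrary): a random site in every sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build a profile from the sites of all OTHER sequences.
            counts = [{b: 1.0 for b in "ACGT"} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][other[pos[j] + k]] += 1.0
            total = (len(seqs) - 1) + 4.0  # observations + pseudocounts
            # Weight of each candidate start = profile vs. background ratio.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    w *= (counts[k][seq[start + k]] / total) / 0.25
                weights.append(w)
            # Next step is drawn in proportion to the "gain"; every start
            # keeps a non-zero probability thanks to the pseudocounts.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

regions = ["TTAGCCATT", "GGAGCCAGG", "CAGCCACTT"]
sites = gibbs_motif_search(regions, width=5)
print(sites)
```

Because the walk is random, different seeds can settle on different answers, which is exactly the "you can never know" caveat on the slide.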

Page 9

Articles Overview

Identifying motifs
• Expression patterns
• Phylogenetic footprinting

Identifying networks
• Common motifs in expression clusters
• Combinatorial analysis

Page 10

Discovery of novel transcription factor binding sites by statistical overrepresentation

S. Sinha, M. Tompa

Goal: Identify binding sites in yeast

Use sets of co-regulated genes

Identify over-represented upstream sequences

Enumeration: YMF algorithm

Page 11

What constitutes a motif? (tailored for S. cerevisiae)

In S. cerevisiae, the motif is typically 6-10 conserved bases

Spacers varying in length (1-11 bp)
• Usually located in the middle

Taken from SCPD - S. cerevisiae promoter database

ACCNNNNNNGTT
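A motif of this shape (two conserved halves around a spacer of N's) maps naturally onto a regular expression. The sketch below is only an illustration: the function name and the example upstream sequence are made up, and the halves ACC and GTT with a 6 bp spacer are taken from the example above.

```python
import re

def find_spaced_motif(seq, left, right, spacer_len):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    # The lookahead (?=...) lets the search report overlapping matches.
    pattern = re.compile(f"(?=({left}[ACGT]{{{spacer_len}}}{right}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

upstream = "TTGACCAGTCATGTTAA"  # hypothetical upstream region
hits = find_spaced_motif(upstream, "ACC", "GTT", 6)
print(hits)
```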

Page 12

How do we measure motifs?

Z-score: motif over-representation

P_max(X): probability of Z-score >= X
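As a rough illustration of a z-score, the sketch below compares an observed motif count with counts of the same motif in background sets. This empirical background is a swapped-in simplification: the expected count and standard deviation could instead be derived analytically from a background sequence model, which is not attempted here. All numbers are invented.

```python
from statistics import mean, pstdev

def z_score(observed, background_counts):
    """(observed - expected) / standard deviation, from background samples."""
    mu = mean(background_counts)
    sigma = pstdev(background_counts)
    if sigma == 0:
        raise ValueError("background counts have zero variance")
    return (observed - mu) / sigma

# Hypothetical: the motif occurs 12 times in the co-regulated set and
# 1 to 5 times in five background sets of the same size.
z = z_score(12, [2, 4, 3, 5, 1])
print(round(z, 2))
```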

Page 13

YMF algorithm (Yeast Motif Finder)

INPUT:

A set of promoter regions

Motif length - l
• modest values (e.g. 6)

Maximum number of spacers allowed - w (e.g. 11)

Transition matrix

Page 14

YMF algorithm

Post Processing:

FindExplanators: artificial overrepresentation
• TCACGCT (motif)
• CACGCTA (artifact)

W-score

Co-expression score
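The motif/artifact pair above differs only by a one-base shift, so a high score for one drags the other along. One simple way to filter such artifacts is sketched below; this is an assumed greedy filter, not the actual FindExplanators procedure, and the scores are invented for illustration.

```python
def longest_shift_overlap(a, b):
    """Longest suffix of one word that equals a prefix of the other."""
    best = 0
    for x, y in ((a, b), (b, a)):
        for k in range(1, min(len(x), len(y)) + 1):
            if x[-k:] == y[:k]:
                best = max(best, k)
    return best

def drop_shifted_artifacts(scored_motifs, max_shift=1):
    """Keep the best motifs; drop lower-scoring near-copies of a kept motif."""
    kept = []
    for motif, score in sorted(scored_motifs, key=lambda ms: -ms[1]):
        is_artifact = any(
            longest_shift_overlap(motif, k) >= min(len(motif), len(k)) - max_shift
            for k, _ in kept)
        if not is_artifact:
            kept.append((motif, score))
    return kept

# Hypothetical scores.
ranked = [("TCACGCT", 9.1), ("CACGCTA", 7.4), ("GGATCCA", 5.0)]
explained = drop_shifted_artifacts(ranked)
print(explained)
```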

Page 15

Experiments

Validate YMF results
• Running YMF on regulons with known binding sites (SCPD)

Run YMF on MIPS catalogs (MIPS - Munich Information center for Protein Sequences)
• Functional
• Mutant phenotype

Page 16

Validation

Page 17

New binding sites or false positives?

Page 18

A novel site candidate

Page 19

Further research

Validation of novel binding sites and transcription factors

Modification of the algorithm to be applicable to other organisms

Page 20

Systematic determination of genetic network architecture

Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell, Raymond J. Cho, George M. Church

Goal: Identify co-regulated networks of genes in yeast

Cluster by expression patterns

Identify upstream sequence patterns

AlignACE: Aligns Nucleic Acid Conserved Elements

Page 21

Clusters

Cluster - a group of genes with a similar expression pattern

Cluster’s members
• Tend to participate in common processes
• Tend to be co-regulated

Page 22

Clusters

10-54

Page 23

Identifying motifs using AlignACE

18 motifs from 12 clusters were found.

7 of the motifs found had been identified experimentally

And what about the others????

Page 24

Scanning for more binding sites

Once a significant motif was found, the whole genome was scanned for it

Most motifs were cluster specific

Page 25

Why so few motifs?

Overly stringent rules for defining a “significant” motif

Post-transcriptional regulation (mRNA stability)

Some clusters represent “noise”

Page 26

“Tightness”

“Tightness” of a cluster
• How close the members of a particular cluster are to its mean

A strong correlation between the presence of significant motifs and the “tightness” of a cluster
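One plausible way to put a number on “tightness” is the average Euclidean distance from each member's expression profile to the cluster mean, with smaller values meaning a tighter cluster. The exact measure is an assumption here, and the profiles are made up.

```python
import math

def tightness(profiles):
    """Mean Euclidean distance of cluster members to the cluster mean."""
    n = len(profiles)
    dims = len(profiles[0])
    centroid = [sum(p[d] for p in profiles) / n for d in range(dims)]
    return sum(math.dist(p, centroid) for p in profiles) / n

# Hypothetical expression profiles (one list per gene, one value per time point).
tight_cluster = [[1.0, 2.0], [1.0, 2.0]]
loose_cluster = [[0.0, 0.0], [2.0, 0.0]]
print(tightness(tight_cluster), tightness(loose_cluster))
```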

Page 27

Things to remember

Discovering regulons and motifs using expression-based clustering
• Minimal biases
• Validation as a methodology for new organisms

Identifying expected cis-regulatory motif EACH TIME!!

Page 28

Identifying regulatory networks by combinatorial analysis of promoter elements

by Yitzhak Pilpel, Priya Sudarsanam & George M. Church

Goals:

Understand transcriptional network

Identify motif combinations affecting expression patterns in yeast

Page 29

Basic definitions

Expression coherence (EC) score

Synergistic motifs: EC(a&b) > EC(a\b), EC(b\a)
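The slide text does not spell out the EC formula, so the sketch below assumes one common reading: EC is the fraction of gene pairs in a set whose expression profiles lie within a Euclidean distance threshold of each other. The threshold, profiles, and function names are illustrative assumptions. The synergy test follows the inequality above.

```python
import math
from itertools import combinations

def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose profiles are within `threshold` (assumed definition)."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if math.dist(a, b) <= threshold)
    return close / len(pairs)

def is_synergistic(ec_both, ec_a_only, ec_b_only):
    """EC(a&b) must exceed both EC(a\\b) and EC(b\\a)."""
    return ec_both > ec_a_only and ec_both > ec_b_only

# Hypothetical profiles: two similar genes and one outlier.
ec = expression_coherence([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]], threshold=1.5)
print(ec, is_synergistic(0.6, 0.2, 0.3))
```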

Page 30

Methods:

A database of motifs

Gene sets

Calculating EC score

Significant synergistic combinations

Visualizing the transcriptional network

Understanding the effect of individual motifs and combinations of motifs

Page 31

GMC

GMC - Gene Motif Combination.

Motif numbers: (m1, m2, m3, m4, m5) = (1,0,1,1,0)

Synergistic motif combination: EC(n motifs) > max(EC(n-1 motifs))

GMC - what is it good for?
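A GMC can be represented as a binary tuple, one entry per motif, and genes sharing the same tuple can then be grouped so that an EC score can be computed per group. The sketch below shows only the grouping step; gene names and motif assignments are hypothetical.

```python
from collections import defaultdict

def group_by_gmc(gene_motifs, motif_order):
    """Map each binary motif vector (GMC) to the genes that carry it."""
    groups = defaultdict(list)
    for gene, motifs in gene_motifs.items():
        gmc = tuple(1 if m in motifs else 0 for m in motif_order)
        groups[gmc].append(gene)
    return dict(groups)

motif_order = ["m1", "m2", "m3", "m4", "m5"]
gene_motifs = {  # hypothetical promoter contents
    "geneA": {"m1", "m3", "m4"},
    "geneB": {"m1", "m3", "m4"},
    "geneC": {"m2"},
}
gmcs = group_by_gmc(gene_motifs, motif_order)
print(gmcs)
```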

Page 32

Combinograms

Clustering

GMCs

Page 33

Combinograms - what are they good for?

They help visualize the connection between a single motif and a specific expression pattern

They also show which motif is more critical in determining the expression pattern.

Page 34

Motif synergy map: visualizing transcription networks

Page 35

Conclusion

The combinogram importance

The motif synergy map importance

Page 36

Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes

Lee Ann McCue, William Thompson, C. Steven Carmack, Michael P. Ryan, Jun S. Liu, Victoria Derbyshire and Charles E. Lawrence

Goals:

Identifying novel TF binding sites in E. coli

Describing transcription regulatory network

Finding orthologs

Identify upstream sequence patterns

Local optimum: Gibbs sampling algorithm

Page 37

Methods:

Data set: one E. coli gene and orthologs

Gibbs sampling algorithm

Motif

MAP score - a measure of overrepresentation of motif

Page 38

Applying the method on a small scale - Validation

• Choosing 190 E. coli genes.
• Creating 184 data sets.
• Running Gibbs sampling algorithm.
• More than 67% success in the prediction for the most probable motif.

Page 39

Motif Model

Page 40

Identification of the YijC binding sites

A strongly predicted site was upstream of the fabA, fabB and yqfA genes.

Chromatography - identifying the factor.

Page 41

Identifying the YijC binding sites and predicting gene function

Mass spectrometry identification - YijC

Predicting a function for yqfA.

[Figure: weight; fabA, fabB, yqfA, fadB]

Page 42

Applying the method genome-wide

Choosing 2113 E. coli ORFs.

For 2097 a TF-binding site was predicted.

Page 43

MAP scores - ortholog distribution

[Figure legend: Study set, Full set]

Page 44

Adding binding sites for known TFs

Building a TF binding site model for known TFs.

Scanning E. coli upstream regions.

187 new probable sites.
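The "build a model, then scan" step can be sketched with a log-odds position weight matrix. This is a generic illustration, not the model used in the paper: it assumes aligned sites of equal length, a uniform background of 0.25, a pseudocount of 1, and an arbitrary score cutoff. The sites and the upstream sequence are made up.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds weight matrix from aligned binding sites of equal length."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for k in range(len(sites[0])):
        column = [s[k] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm

def scan(seq, pwm, cutoff):
    """Return (start, score) for every window scoring at least `cutoff`."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[k][seq[start + k]] for k in range(width))
        if score >= cutoff:
            hits.append((start, score))
    return hits

known_sites = ["AGCCA", "AGCCA", "AGCCT", "AGCCA"]  # hypothetical known sites
pwm = build_pwm(known_sites)
hits = scan("TTTAGCCAGGG", pwm, cutoff=5.0)  # hypothetical upstream region
print(hits)
```

The cutoff trades sensitivity against false positives, which is why predicted sites are reported as "probable" rather than confirmed.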

Page 45

Building a regulatory network

Required steps:
• Identifying motif models
• Clustering the models

Problem: Specificity

Page 46

Conclusion

What have we gained so far?

A better prediction of gene function.

New possibilities for identification of TF binding sites and the TFs which bind them!!!