82
Predicting the targets of mRNA-binding proteins + GeneMANIA & ISOLATE Quaid Morris Banting and Best Department of Medical Research, Departments of Computer Science and Molecular Genetics

Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Predicting the targets of mRNA-binding proteins

+ GeneMANIA & ISOLATE

Quaid Morris

Banting and Best Department of Medical Research,

Departments of Computer Science and Molecular Genetics

Page 2: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

The Donnelly Centre, University of Toronto

Page 3: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

3

Xiao Li Hilal Kazan

mRNA-binding

proteins

ISOLATE for in

silico purification of

tumour samples

Gerald Quon

Outline

Collaborators: Paul Boutros & Syed Haider (OICR), Blaise Clarke (UHN)

Ang Cui,

Amit Deshwar

(no pictures avail)

Collaborators: Tim Hughes & Deb Ray (Donnelly), Howard Lipshitz (U of T, MoGen),

Wei Jiao (U of T, MoGen)

Page 4: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

4

GeneMANIA: gene function prediction for the masses

Gary

Bader

Sara Mostafavi David Warde-Farley Khalid Zuberi

Christian Lopes Jason Montojo Harold Rodriguez

Max Franz

Deb Ray Chris

Grouios

Farzana

Kazi

Page 5: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

5

Other projects in the lab

Cancer networks (w/ Sara and Gerald)

GWAS & QTLs of complex disease (w/ David)

Anna Goldenberg, PhD

Hossein Radfar

microRNA target prediction

Page 6: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

mRNA-binding proteins

Page 7: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Post-transcriptional regulation (PTR)

transcription

poly-adenylation

export

mRNA stability

AAAAAA

Nucleus

Cytoplasm

pre-mRNA

mature mRNA

genomic DNA

splicing vs

vs AAA

vs

localization mRNA translation rate

vs

vs

spliced mRNA

vs

C C

Page 8: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Patterns of mRNA localization before the MZT

Adapted from LeCuyer et al, (2008) Cell 131

Red: Nucleus

Green: mRNA

Page 9: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

9

Recipe for modeling PTR

• Identify the trans-acting regulators:

– RNA-binding proteins and other ncRNAs

• Identify sequences recognized by ncRNAs (e.g. miRNAs):

– TargetScan, PicTar, miRanda, GenMiR, PITA, rna22, etc.

• Identify sequences recognized by RNA-binding proteins:

– RBPDB (http://rbpdb.ccbr.utoronto.ca) is a start

• Design algorithms to predict post-transcriptional fate given cis-elements and the mRNA sequence.

9

Page 10: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

10

Page 11: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

RNAcompete Method

dsDNA Pool

Cleave linker/ RNA Transcription/

DNase

ssDNA

Linker Ligation

Primer Extension

PCR

Label pulldown

RNA

Hybridize To Array/ Computational

Analysis

RNA Pool

Pulldown RNA

Affinity Matrix

Epitope Tag

RBP

RNA

Agilent microarray

RNA Pool

Strip ssDNA

RNA Pulldown

Unstructured RNA

Structured RNA (8loop)

Structured RNA (7loop)

Label RNA pool

Deb Ray and Tim Hughes

Hilal Kazan

Page 12: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Validation of RNAcompete

RNAcompete 7mers

0 6

0

6

Log affinity from Set A (a.u.) 2 4

2

4

8 10

8

10 UUUUUUU

UUUAUUU

UUUGUUU

10 highest ratio probes

(RED indicates motif match)

AGACCUUCAUGUUCUGUUUCUUUUACUGCUUUGGUUGU

AGAAGUUUGUUUUGGUUUUUGCUUUCUCCUCGUGUGCA

AGAUUUUUUACUUUCAUGUCUAACUCACCUUUGGGGAA

AGGUUUCGUUUUGUGUACUUCUCCCUAGUUUCUUUGGC

AGGAGUUAGUUUGGUUUUACUUUGUGUCCGCCUACGGU

AGAAGUAGGGAAGUUUUUUUCGUUGAUUUAGUGGGCCU

AGAUUAUUUCUCGUUAUAAUAAUGCGUUUAGUGCCCAC

AGAUUCAGAUUUAGCACCACUGUUGUAGUAUGUUUUGU

AGAAUUGAUUUAUUUUUGUUGUCUCACCCAUCUGCGUU

AGACUGCUAACUUAUUUUUUUGUCGUCCUAACGUUGCA

AGGGCCAAUGCGGUGCAGGCGCUGCGUUGGC

AGGUUCAAACGCAUGCGGGUCAAGCUACGAACGCCACU

AGAGAUGGAUUGGGCUGGCACCGGUCCAUC

AGGGCGUUUAACGGCGGGUCCCGUUAGGCGC

AGAGAGUGCAUAGCUGGCUUCUAUGCGCUC

AGGGCCAAUGCGGUGCAGGCGCUGCGUUGGC

AGAGUGUCAGGUACAUAAAGUAGCUCAAUGAGUUCGUU

AGAGUCCAGAACGGCAGGCACGUUUUGGAC

AGAGACAGGAGCUUAUUCAUUGAUCAGCGAUCGCG

AGAGGUUGAGACGGCAGGUCCCGUCUCGGCC

(RED indicates motif match BLUE indicates intended paired bases GREEN indicates predicted paired bases)

Log

affi

nit

y fr

om

Set

B (

a.u

.)

0 15

0

15

5 10

5

10

GCUGGCC

GCUGGUC

GCGGGCC

RNAcompete 7mers

10 highest ratio probes

Vts1 HuR

Log affinity from Set A (a.u.)

Log

affi

nit

y fr

om

Set

B (

a.u

.)

Page 13: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

FUSIP1

AAAGAAA

GAAAGAA

AAAGAAG

HuR/ELAV1

UUUGUUU

UUUUUUU

UUUAUUU

PTB

UCUUUCU

CUUUCUU

RBM4

GCGCGCG

GCGCGGG

CGCGCGG

GCGAGCG

GAGCGAG

CGAGCGG

SFRS1/SF2/ASF

SLM2 AGAUAAA

AUUAAAC

AUAAACA

U1A

UUGCACU

AUUGCAC

UUGCACG

Vts1

GCUGGUC

CGCUGGC

YB1

CCUGCGG

0

5 10 15

5

10

15

GCGGGCC

0

-2

0 2 4

0

2

4

-2 6 8

6

8

0 2 4

0

5

-2 6 8

10

10

GCCUGCG

CUGCGGU

-2

0 2 4

0

2

4

-2 6 8

6

-2

0 2 4

0

2

4

-2 6 8

6

-2

0 2 4

0

2

4

-2 6

6

0

2 4 6

2

4

6

0 8 10

8

10

-5

0 5

0

5

-5 10

10

-4

0 4

-2

0

2

-4 8

4 CUUUCCU

SetA and SetB 7-mers are highly correlated for all 9 RBPs profiled and reflect known sequence binding preferences where known.

Log affinity from Set B (a.u.)

Log

affi

nit

y fr

om

Set

A (

a.u

.)

Page 14: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Gerber et. 2004, PLoS Biology

RIP-chip for detecting in vivo RBP targets

Page 15: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

YB1 RNAcompete predicts its in vivo targets

Skabkina 2005 (50.4)

PWM A (96.6) PWM B (58.6)

7mers A (98.8) 7mers B (98.7)

1-Specificity 1 0.8 0.6 0.4 0.2 0

0

Sen

siti

vity

1

0.8

0.6

0.4

0.2

UCCA(A/G)CAA Skabkina 2005

CCUGCGG GCCUGCG CUGCGGU GGUCUGC CCCUGCG

Top five 7-mers

RNAcompete PWM A

YB1 co-immunoprecipitates (cDNAs) from (Dong, J. et al. RNA Biol 2009), 95 positives, 99 negatives

Page 16: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

SF2/ASF RNAcompete predicts its in vivo targets

1-Specificity 1 0.8 0.6 0.4 0.2 0

0

Sen

siti

vity

1

0.8

0.6

0.4

0.2

Sanford 2009

Sanford 2008

Clip-seq data from (Sanford et al Genome Research 2009) 296 positives, 314 negatives

RNAcompete PWM A

Sanford 2009 A (87.9) Sanford 2009 B (90.3) Tacke 1995 A (47.7) Tacke 1995 B (54.4) Liu 1998 A (51.6) Liu 1998 B (58.3)

PWM A (90.7) PWM B (90.6)

7mers A (91.2) 7mers B (91.5)

Page 17: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

17

RNAcompete summary

• Measures in vitro affinity of an RBP for ~250k custom-designed 35-mer RNA sequences,

• These affinities can be summarized into highly reproducible 7-mer scores (or affinities),

• RNAcompete 7-mer scores predict in vivo targets for the RBPs that we’ve looked at.

17

Page 18: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Good news:

We’re doing

RNAcompete for all

Drosophila and human

RBPs!

Page 19: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Bad news:

RBP sequence binding

preferences alone might

not be good enough.

Page 20: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Most potential Puf3p sites are not Puf3p-bound

mRNAs whose

3’UTR

contains a

Puf3p site

158

244

Puf3p bound

mRNAs

mRNAs in control

but not Puf3p

bound

Puf3p RIP-chip data from Gerber et. 2004, PLoS Biology

Page 21: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Sequence-specific RBPs bind ssRNA

DNA: B-Form Helix

Xiao Li

Transcription

factor goes

here

RNA: A-Form Helix

Page 22: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

22

Page 23: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Other RBPs recognize different RNA structures

C

N G

G

N(0-3)

VTS1

Staufen

HIV-1 TAT

A

G

A

C

U C

U

A

G

U

C

Aviv et al, Nat Struct Biol (2003), 10:614

Wickham et al, Mol Cell Biol (1999) 19:2220

Weeks et al, Cell (1992) 70:1069 Hilal Kazan

5’ 3’

Page 24: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

1. Does the secondary structure of naked mRNA

constrain RBP binding?

2. Are mRNA secondary structure predictions made with

current tools useful?

Page 25: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Accessibility: Probability being unpaired in the thermodynamic ensemble

5’ 3’

Ensemble:

5’ 3’

1 0.5 0

Targ

et S

ite

Accessib

ility

mRNA

Page 26: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

26

RNAplfold: local thermodynamic folding

mRNA transcript

Local folding windows

w

l u

Page 27: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Proportion of unbound sequences

(false positive rate)

Pro

port

ion o

f bound s

equences

(tru

e p

ositiv

e r

ate

)

0.2 0.4 0.6 0.8 1.0

0.15 0.10 0.05 0

Me

dia

n s

ite

acce

ssib

ility

Area under the

ROC curve: 74%

(P < 2 x 10-14,

ranksum test)

Accessibility

# of target sites

1.0

0.8

0.6

0.4

0.2

0

Target site accessibility predicts Puf3p binding

Bound

No

t b

ou

nd

Bo

un

d

Page 28: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Ranking by site number (#TS)

3 and 3 2 and 2

Area under curve (AUROC): 1. 0 vs. 0.5

FP rate

TP r

ate

Ranking by total accessibility (#ATS)

2.7 1.7 1.3 1.0

1 0.7

1

0.8 0.9

0.7 0.6

Bound

Unbound

#ATS

#TS

Consensus sites

RIP-chip result

0.1

0

0.9

ROC curve for predicting in vivo binding

Scoring accessibility for RBPs with >1 binding sites

0 0 1

1

#ATS Score

#TS Score

Page 29: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Accessibility significantly increases predictive

accuracy for 71% of RBPs tested

Pub1 Nab2

Puf3

Yll032c

Vts1 PTB

PTB PTB

Puf4

Msl5 Puf3

Pumilio

Puf4

HuR

Khd1

*

*

*

* * *

* *

* * *

*

* *

60%

Ran

kin

g b

y to

tal a

cces

sib

ility

(A

UR

OC

)

Ranking by # of target sites (AUROC)

50%

70%

80%

50% 60% 70% 80%

* indicates a significant

improvement in AUROC

(FDR < 0.05, Delong-Delong-

Clarke-Pearson test, BH

correction)

Page 30: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

30

Page 31: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

32

Page 32: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

34

Page 33: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

1. Does the secondary structure of naked mRNA

constrain RBP binding?

Yes.

2. Are mRNA secondary structure predictions made with

current tools useful?

Yes, much more useful than expected.

Also, we have developed an in silico benchmark for

testing predictions of mRNA secondary structure.

Citation: Li, Quon, Lipshitz, Morris, RNA 2010 (16)

Page 34: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

37

Page 35: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

38

Page 36: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Representing structural context by annotating bases

hairpin

loop

hairpin

loop

multiloop

external

region

internal loop

bulge loop

stem P: paired

U: unpaired

P: paired

L: loop

U: unstructured

M: miscellaneous

Page 37: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Representing ensembles of structures using our structural annotation (i)

AGACGCGCGCGUUCGCCGCGCUCGGCGCAUGC

1 UPPMPPPPPPLLLLPPMPPPPPPMPPLLLLPP

2 UUUPPMPPPPPLLLLLPPPPPMPPUUUUUUUU

3 UUUUPPPPMPPMMMPPLLLPPMPPPPPPUUUU

4 UPPPPLLLLPPPPMPPPPLLLLPPPPUUUUUU

5 UUUUUUUUPPPPMPPPPPLLLLPPPPPMPPPP

...

AGACGCGCGCGUUCGCCGCGCUCGGCGCAUGC

P: paired L: loop M: miscellaneous U: unstructured

Annotation

profile

Page 38: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

6.1

5.6

.

.

0.5

0.1

RBP

affinities

sequences and

annotation profiles

RNAcontext inputs and outputs

RNAcontext

..GGGAUACCC..

..GGGCUACCG..

.

.

.

..CCCAUACCC..

..GGGAAACCC..

.

.

.

P

L

M

U

0

1 relative

preference

base

preference

structural

preference

Page 39: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

RNAcontext motif model

binding site score

structural context

base identity × =

Roider et al (2006) Bioinformatics 23:134

*

*

Page 40: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

44

Page 41: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

46

Page 42: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Comparison with other motif models

Proteins RNAcontext MEMERIS MatrixREDUCE Precision

improvement

Reduction

in error

RBM4 91% 43% 63% 28% 75.7%

Fusip 53% 31% 32% 21% 30.9%

VTS1 65% 58% 56% 7% 16.7%

YB1 17% 7% 11% 6% 6.8%

SLM2 81% 49% 77% 4% 17.4%

SFRS1 (SF2/ASF)

70% 50% 66% 4% 11.8%

HuR 96% 74% 94% 2% 33.3%

PTB 69% 26% 67% 2% 6.1%

U1A 30% 27% 21% 3% 4.1%

Average precision on held-out data

Bold

indicates stat.

sig. reduction

in error

Page 43: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

i) Predicted motifs and structural contexts ii) Relative structural

preferences

0 0.5 1

Legend

Paired

Loop

Misc

Unstructured

Page 44: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

49

Page 45: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

RNA-map from SFRS1 knock-down

more exclusion no change more inclusion

Alternatively spliced exon sequences

RNAcontext predicted sequence preferences

Mea

n 7

-mer

aff

init

y

% exon length

structural preference: Loop 1.0 Multi/Internal 0.5

0-10% 20-30% 40-50% 60-70% 80-90%

Change in splicing level

Page 46: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Summary

• is a new discriminative motif model for finding RBP binding preferences

• incorporates structural information with a new way of representing secondary structure

• recovers known sequence and structure binding preferences of most RBPs

RNAcontext:

Page 47: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Acknowledgements: mRNA

Morris Lab Hilal Kazan Xiao Li, Gerald Quon Timothy Hughes Debashish Ray Kate Cook Esther Chan: PWMs Shaheynoor Talukder

Benjamin Blencowe Arneet Saltzman: SF2 knockdown Sidharth Chaudhry: cloning Helpful Feedback: Craig Smibert (U Toronto) Howard Lipshitz (U Toronto) Frank Sicheri (U Toronto) Mariana Kekis (Hughes Lab) Reagent gifts: Carol Lutz: U1A (UMDNJ) Craig Smibert: Vts1 Mariano Garcia-Blanco: PTB (Duke) Kimitoshi Kohno: YB1 (U of OEH)

Howard Lipshitz

Page 48: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

The GeneMANIA project: gene function prediction for the

masses

Page 49: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Genomics revolution, the bad news

• noisy,

• redundant,

• incomplete,

• mysterious,

• massive

Functional genomic datasets are:

Page 50: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

56

How can genomic data be used in the lab?

?!?

Microarray expression profiles

Genetic interactions

CHiP-chip regulatory networks

Protein-protein interactions

Page 51: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Use cases:

1. What is the function of this gene?

2. Give me more genes like these.

3. Here’s a list of genes -- what does it mean?

Page 52: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

58

Page 53: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Recommender systems

Page 54: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Google can’t do biology

Page 55: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana
Page 56: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

62

Label propagation

MCA1

CDC48

CPR3

TDH2

GeneMANIA gene scoring via label propagation

MCA1 CDC48

CPR3

TDH2

MCA1

CDC48

CPR3

TDH2

Guilt-by-association

-1 …

……

…...

.+1

Discriminant Value

Page 57: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

The problem of gene multi-functionality

• Gene function could be a/the:

– Biological process,

– Biochemical/molecular function,

– Subcellular/Cellular localization,

– Regulatory targets,

– Temporal expression pattern,

– Phenotypic effect of deletion.

Some networks may be better for some types of gene function than others

Page 58: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

64

Page 59: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

65

Three rules for network weighting

– Reliability

• Noisy networks get low weight

– Relevance

• Irrelevant networks get low weight

– Redundancy

• Redundant weights share their weights

Page 60: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

66

Page 61: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

68

http://www.genemania.org/plugin/

Page 62: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

69

Page 63: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

70

Page 64: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

71

http://cytoscapeweb.cytoscape.org/

Page 65: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Data providers

+ MODs + ENSEMBL + Gene Ontology

Page 66: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Who?

Gary Bader Quaid Morris

Sara Mostafavi

David Warde-Farley

Ovi Comes

Khalid Zuberi

Christian Lopes

Jason Montojo

Harold Rodriguez Max Franz

Sylva Donaldson

Farzana Kazi

Principal

Investigators

Students

Developers

Outreach

Page 67: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

GeneMANIA: the next steps

1. More data, organisms, and functions

2. Grouping functions together

3. Better handling of profiling data

Page 68: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

ISOLATE: a tool for purifying heterogeneous tumour array data

Page 69: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Tumour samples are heterogeneous

A B

A)An endometrial endometrioid carcinoma with significant amounts of healthy tissue.

B) A sample from the same endometrial endometrioid carcinoma, but almost exclusively tumor.

Page 70: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

77

Page 71: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

78

Page 72: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Acknowledgements

Morrislab

Gerald Quon

Ang Cui

Khalid Zuberi

Amit Deshwar

Paul Boutros (OICR)

Syed Haider

Peter Zandstra (Donnelly)

Wendy Qiao

Rae Yeang (Sick Kids)

Blaise Clarke (UHN)

Funding

Page 73: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

Questions?

Page 74: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

RNA probe set / pool design

81% of all 10-mers >= 12 repeats of all 8-mers >= 64 repeats of all 7-mers

Unstructured or weakly structured sequences

Stem-loops containing loops with <= 7 bases

99 – 100% represented, some 6-loops are repeated. 10bp length stems

Stem-loops containing loops with 8 bases

59% represented 10bp length stems

Two independent 35nt probe sets (SetA and SetB), each probe set contains:

Sequence type Representation

+ ~30k replicate probes + controls on an Agilent 244k format

Now also available: 8-mer design

Page 75: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

PTB 7mers A (74.8)

7mers B (74.6)

Perez 1997 UCUU (65.8) Perez 1997 UCUUC (62.6)

PWM A (71.8) PWM B (74.7)

Oberstrass 2005 CUNN (73.5) Oberstrass 2005 YCN (67.3) Oberstrass 2005 YCU (75.9)

7-mer scores more accurately predict transcripts that co-purify with PTB

1-Specificity

0 1 0.8 0.6 0.4 0.2 0

Sen

siti

vity

1

0.8

0.6

0.4

0.2

CUUUCUU UUUCUUU UCUUUCU UUACUUU UCUAUCU

PWM A

5 highest scoring 7-mers

PTB co-immunoprecipitates (cDNAs) from (Gama-Carvalho, M et al GB 2006)

2,951 positives, 2,052 negatives

Page 76: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

83

Page 77: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

84

Page 78: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

85

Page 79: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

86

Page 80: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

87

Page 81: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

88

Page 82: Predicting the targets of mRNA-binding proteinsSara Mostafavi David Warde-Farley Khalid Zuberi Christian Lopes Jason Montojo Harold Rodriguez Max Franz Chris Deb Ray Grouios Farzana

89