FunCoup: reconstructing protein networks in the worm and other animals Andrey Alexeyenko, Erik Sonnhammer Stockholm Bioinformatics Center

Page 1:

FunCoup: reconstructing protein networks in the worm and other animals

Andrey Alexeyenko, Erik Sonnhammer

Stockholm Bioinformatics Center

Page 2:

C. elegans computed interactomes

Page 3:

FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms

[Figure: is there a link (?) between worm proteins A and B? Labels: Find orthologs*; Mouse, Human, Fly, Yeast; High-throughput evidence]

Page 4:

FunCoup
• Each piece of data is evaluated
• Data FROM many eukaryotes (7)
• Practical maximum of data sources (>60)
• Predicted networks FOR a number of eukaryotes (8)
• Organism-specific efficient and robust Bayesian frameworks
• Orthology-based information transfer and phylogenetic profiling
• Networks predicted for different types of functional coupling (metabolic, signaling etc.)

Page 5:

C. elegans’ benefit from the model species data integration:

Li & Vidal’s set: 5535 pairs

IntAct (Oct. 2007): 4517 pairs

6841

Other C. elegans data

36000 predicted C. elegans pairs

Page 6:

Data sources in FunCoup:

Species:
• H. sapiens
• M. musculus
• R. norvegicus
• D. melanogaster
• C. elegans
• S. cerevisiae
• A. thaliana

Types:
• Protein-protein interactions
• Protein domain associations
• Protein-DNA interactions
• mRNA expression
• Protein expression
• miRNA targeting
• Sub-cellular co-localization
• Phylogenetic profiling

Page 7:

Multilateral data transfer

[Figure labels: Human, Ciona, Worm, Mouse, Rat, Fly, Yeast, Arabidopsis, FunCoup]

Data from the same species is an important but not indispensable component of the framework. Hence, a network can be constructed for an organism with no experimental datasets at all.

Page 8:

InParanoid

Proteome A

Proteome B

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Maido Remm, Christian E. V. Storm and Erik L. L. Sonnhammer. Journal of Molecular Biology 314, 5, 14 December 2001, Pages 1041-1052

Reciprocally best hits ~ seed orthologs

Inparalogs
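The "reciprocally best hits ~ seed orthologs" idea on this slide can be sketched as follows. This is only an illustration of the reciprocal-best-hit step, not InParanoid itself (which also clusters inparalogs around the seeds); the protein IDs and similarity scores are made up, and the function names are hypothetical.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring hit in the other proteome."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b AND b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between proteome A and proteome B (assumption).
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 200.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a3"): 195.0, ("b2", "a2"): 170.0}

seed_orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(seed_orthologs)
```

Here a2's best hit is b2, but b2's best hit is a3, so a2 is not a seed ortholog of b2.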

Page 9:

How does orthology work?

Log overlap between KEGG pathways and complexes (Gavin et al., 2006)

[Figure: y-axis: Log overlap, KEGG vs. "Gavin et al., 2006"; x-axis: 1 to 7; species: yeast, worm, fly, mouse, human, thaliana; pair classes: Core-Core, Core-Modu, Core-Attr, Modu-Modu, Modu-Attr, DiffModules, Attr-Attr]

Page 10:

Comparing networks

Rat Human Mouse

Page 11:

Conclusions

FunCoup:
• is a flexible, exhaustive, and robust framework to infer confident functional links
• enables practical web access to candidate interactions in both small and global-scale network context
• is open towards better data quality and coverage

http://FunCoup.sbc.su.se

Page 12:

Acknowledgements:
• Carsten Daub
• Kristoffer Forslund
• Anna Henricson
• Olof Karlberg
• Martin Klammer
• Mats Lindskog
• Kevin O’Brien
• Tomas Ohlson
• Sanjit Rupra
• Gabriel Östlund
• Sean Hooper
• All previous interaction network developers

Page 13:

Talk outline

Other network resources

Why FunCoup

Orthology and InParanoid

Implementation

Applications and future development

Page 14:

FunCoup is a naïve Bayesian network (NBN)

Bayesian inference:

C: Genes A and B are functionally coupled (A<->B)

E: Genes A and B are co-expressed

P(C|E) = (P(C) * P(E|C)) / P(E)
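A small worked example of the Bayes formula on this slide may help. The numbers are invented for illustration, and P(E) is expanded with the law of total probability, which the slide does not spell out; `posterior` is a hypothetical helper name.

```python
def posterior(prior_c, p_e_given_c, p_e_given_not_c):
    """P(C|E) by Bayes' rule, with P(E) = P(C)P(E|C) + P(not C)P(E|not C)."""
    p_e = prior_c * p_e_given_c + (1.0 - prior_c) * p_e_given_not_c
    return prior_c * p_e_given_c / p_e


# Illustrative numbers only (assumptions, not FunCoup estimates):
# 1% of gene pairs are coupled; co-expression is seen in 60% of coupled
# pairs and in 10% of uncoupled pairs.
p_coupled_given_coexpressed = posterior(
    prior_c=0.01, p_e_given_c=0.6, p_e_given_not_c=0.1
)
print(round(p_coupled_given_coexpressed, 4))
```

With these made-up inputs, observing co-expression raises the probability of coupling from 1% to roughly 5.7%.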

Page 15:

Problem: In situations with multiple inparalogs, how to deal with alternative evidence?

Solution: Treat ALL inparalogs equally, and choose the BEST value.
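One possible reading of "treat all inparalogs equally and choose the best value" is sketched below. It assumes that "best" means the largest evidence value (for example, the highest co-expression correlation); the slide does not define it, and the IDs, values, and function name are hypothetical.

```python
from itertools import product


def best_inparalog_evidence(inparalogs_a, inparalogs_b, evidence):
    """Return the best evidence value over all inparalog pairs, or None.

    Every inparalog of A is paired with every inparalog of B without
    weighting; "best" is taken to be the maximum (an assumption).
    """
    values = [
        evidence[frozenset((x, y))]
        for x, y in product(inparalogs_a, inparalogs_b)
        if frozenset((x, y)) in evidence
    ]
    return max(values) if values else None


# Hypothetical fly inparalogs of worm genes A and B, with made-up
# co-expression values for the pairs that were measured.
fly_a = ["flyA1", "flyA2"]
fly_b = ["flyB1"]
coexpression = {
    frozenset(("flyA1", "flyB1")): 0.35,
    frozenset(("flyA2", "flyB1")): 0.82,
}

best = best_inparalog_evidence(fly_a, fly_b, coexpression)
print(best)
```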

Page 16:

Problem: Absolute probabilities of functional coupling (FC) are intractable. The full Bayesian network is impossible.

[Figure: full network around A<->B with pairwise conditional probabilities P(B|C), P(C|B); P(B|A), P(A|B); P(B|D), P(D|B); P(A|C), P(C|A); P(D|C), P(C|D); P(A|D), P(D|A)]

Solution: Naïve Bayesian network. Calculate a belief change instead (likelihood ratios, LR). Assume NO data dependency.

[Figure: A<->B with four evidence nodes, each contributing P(E|+) / P(E|-)]
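Under the independence assumption, the likelihood ratios of separate evidence sources multiply, so their logarithms add. The sketch below shows that combination with invented probabilities; summing natural-log ratios is one common convention and is an assumption here, as are the function names.

```python
import math


def log_likelihood_ratio(p_e_given_pos, p_e_given_neg):
    """log( P(E|+) / P(E|-) ) for one piece of evidence."""
    return math.log(p_e_given_pos / p_e_given_neg)


def combined_score(evidence):
    """Sum of log LRs; valid only if the evidence sources are independent."""
    return sum(log_likelihood_ratio(pos, neg) for pos, neg in evidence)


# (P(E|+), P(E|-)) for three hypothetical data sources (made-up numbers).
evidence = [(0.30, 0.10), (0.20, 0.05), (0.10, 0.20)]

score = combined_score(evidence)

# The belief change applies to the odds: posterior odds = prior odds * LR.
prior_odds = 0.01 / 0.99
posterior_odds = prior_odds * math.exp(score)
print(round(score, 3), round(posterior_odds, 4))
```

The third source has LR below 1 and therefore lowers the score, which is how evidence against coupling enters the same sum.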

Page 17:

[Figure labels: gene evolution; functional link]

Problem: How to establish optimal bridges between species?

Solution: Via groups of orthologs that emerged from speciation.

Page 18:

Homologs

Proteome A

Proteome B

Homologs: proteins with similar sequence and, thus, common origin

Page 19:

An InParanoid cluster of orthologs

Inparalogs

Page 20:

Problem: Some LR are weak and arise due to non-representative sampling.

Solution: Enforce a confidence check and remove insignificant nodes.

[Figure: A<->B with four evidence nodes, each P(E|+) / P(E|-), checked with a χ2-test]
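A confidence check of this kind could look like the sketch below: a Pearson χ2 test on a 2x2 table of how often an evidence category occurs among positive and negative training pairs. The table layout, the counts, and the 0.05 cut-off are assumptions for illustration; the slides do not say how FunCoup sets up its test.

```python
import math


def chi_square_2x2(pos_in, pos_out, neg_in, neg_out):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 d.o.f.).

    Rows: positive / negative training pairs.
    Columns: pair falls inside / outside the evidence category.
    No continuity correction; all margins are assumed to be non-zero.
    """
    observed = [[pos_in, pos_out], [neg_in, neg_out]]
    row_totals = [pos_in + pos_out, neg_in + neg_out]
    col_totals = [pos_in + neg_in, pos_out + neg_out]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: a well-supported node and a weakly supported one.
chi2_strong, p_strong = chi_square_2x2(40, 60, 10, 90)
chi2_weak, p_weak = chi_square_2x2(3, 97, 2, 98)

keep_strong = p_strong < 0.05  # hypothetical significance threshold
keep_weak = p_weak < 0.05
print(chi2_strong, keep_strong, keep_weak)
```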

Page 21:

Reciprocally best hits

Proteome A

Proteome B

Page 22:

Problem: Definitions and notions of FC vary.

Solution: Multinet. Decide which types of FC are needed (provide as positive training sets) and perform the previous steps customized for each type.

[Figure: link types A<>B, A||B, and A|B, with P(E|+) / P(E|-) evidence nodes]

Page 23:

Proteins of the Parkinson’s disease pathway (KEGG #05020)

Physical protein-protein interaction

“Signaling” link

Metabolic “non-signaling” link

Multinet presents several link types in parallel

Page 24:

The limits of data integration

[Figure, two panels: area under ROC at specificity > 96% (y-axis, 0.004 to 0.013) versus No. of species (1 to 5) and versus No. of features (4 to 44); series: PCA-processed and Raw data]

Page 25:

FunCoup’s web interface

Hooper S., Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics. 2005 Dec 15;21(24):4432-3. Epub 2005 Sep 27.

http://FunCoup.sbc.su.se

Page 26:

Reconstructing the “regulatory blueprint”* in C. intestinalis

*Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science, 26:1183-7.

Proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]

18 links mentioned in [*] AND found by FunCoup

Links found by FunCoup (about 140)

The rest, 202 links from [*] that FunCoup did not find, not shown

Page 27:

Orthologs

[Figure labels: Functional link; Inparalogs; C. elegans; D. melanogaster; human; S. cerevisiae]

Overview and comparison of ortholog databases. Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Drug Discovery Today: Technologies (2006) v. 3; 2, 137-143

Page 28:

Problem: Distribution areas informative of FC may vary.

Solution: Find them individually for each data set and FC class, accounting for the joint “feature – class” distribution.

[Figure: Pearson r axis from -1 through 0 to 1, with positive (+) and negative (-) examples marked along it]
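One way to find the informative regions of a feature is to bin its values and estimate a likelihood ratio per bin from the positive and negative training pairs. The sketch below does that for Pearson r; the bin edges, the pseudocount, and the example values are all assumptions, not FunCoup's actual discretization.

```python
def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) holding value; last bin is closed."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


def likelihood_ratios(pos_values, neg_values, edges, pseudocount=1.0):
    """Per-bin P(bin|+) / P(bin|-), smoothed with a pseudocount."""
    n_bins = len(edges) - 1
    pos_counts = [pseudocount] * n_bins
    neg_counts = [pseudocount] * n_bins
    for value in pos_values:
        pos_counts[bin_index(value, edges)] += 1
    for value in neg_values:
        neg_counts[bin_index(value, edges)] += 1
    pos_total = sum(pos_counts)
    neg_total = sum(neg_counts)
    return [(p / pos_total) / (n / neg_total)
            for p, n in zip(pos_counts, neg_counts)]


# Made-up Pearson r values for coupled (+) and uncoupled (-) training pairs.
edges = [-1.0, 0.0, 0.5, 1.0]
positive_r = [0.7, 0.8, 0.9, 0.6, 0.2, -0.1]
negative_r = [-0.5, -0.2, 0.1, 0.3, -0.7, 0.6]

lrs = likelihood_ratios(positive_r, negative_r, edges)
print([round(lr, 3) for lr in lrs])
```

In this toy data only the highest bin has LR above 1, so only strong positive correlation would count as evidence for coupling; a different data set or FC class could yield a different informative region.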

Page 29:

Validation

Jack-knife procedure:
• Take “positive” and “negative” sets
• Split each randomly as 50:50
• Use the first parts to train the algorithm, the second to test the performance
• Repeat a number of times
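The four steps above can be sketched as a small loop. The `train_threshold` and `accuracy` functions are toy stand-ins (a threshold halfway between class means, scored by accuracy) and the scores are invented; in practice the training and evaluation steps would be the FunCoup pipeline and its own performance measure.

```python
import random


def split_half(items, rng):
    """Shuffle a copy of items and split it 50:50."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


def repeated_split_validation(positives, negatives, train, evaluate,
                              repeats=10, seed=0):
    """Train on one random half of each set, test on the other; repeat."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        pos_train, pos_test = split_half(positives, rng)
        neg_train, neg_test = split_half(negatives, rng)
        model = train(pos_train, neg_train)
        results.append(evaluate(model, pos_test, neg_test))
    return results


def train_threshold(pos, neg):
    """Toy model (assumption): threshold halfway between the class means."""
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def accuracy(threshold, pos, neg):
    """Fraction of test items on the correct side of the threshold."""
    correct = sum(x > threshold for x in pos) + sum(x <= threshold for x in neg)
    return correct / (len(pos) + len(neg))


# Made-up scores for positive and negative gene pairs.
positives = [0.60, 0.70, 0.80, 0.90, 0.65, 0.75]
negatives = [0.10, 0.20, 0.30, 0.35, 0.15, 0.25]

scores = repeated_split_validation(positives, negatives,
                                   train_threshold, accuracy, repeats=5)
print(scores)
```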

Analysis of variance (ANOVA):
• Introduce features A, B, C in the workflow of FunCoup (e.g., using PCA, selecting nodes of BN by relevance, ways of using ortholog data etc.)
• Run FunCoup with all possible combinations of absence/presence of A, B, C to produce a balanced and orthogonal ANOVA design with replicates
• Study effects of A, B, C or their combinations AxB, BxC, ..., AxBxC to see if they influence the performance significantly (as if all other effects did not exist)
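The "all possible combinations of absence/presence" design is a full factorial layout. The sketch below only enumerates the runs; the number of replicates is an assumption, and running FunCoup and fitting the ANOVA are left out.

```python
from itertools import product


def full_factorial(factors, replicates):
    """List every on/off combination of the factors, repeated per replicate."""
    runs = []
    for levels in product([False, True], repeat=len(factors)):
        setting = dict(zip(factors, levels))
        for replicate in range(1, replicates + 1):
            runs.append({**setting, "replicate": replicate})
    return runs


# Three workflow features, each absent or present; 3 replicates (assumption).
design = full_factorial(["A", "B", "C"], replicates=3)
print(len(design))

# Balance check: each feature is present in exactly half of the runs.
present_counts = {name: sum(run[name] for run in design) for name in "ABC"}
print(present_counts)
```

Because every combination appears equally often, each main effect and interaction can be estimated separately from the others, which is what makes the design balanced and orthogonal.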

Page 30: