FunCoup: reconstructing protein networks in the worm and other animals
Andrey Alexeyenko,
Erik Sonnhammer
Stockholm Bioinformatics Center
C. elegans computed interactomes
FunCoup is a data integration framework for discovering functional coupling in eukaryotic proteomes using data from model organisms.
[Figure: worm proteins A and B with an unknown (?) link between them; find their orthologs in mouse, human, fly and yeast, which have high-throughput evidence]
FunCoup
• Each piece of data is evaluated
• Data FROM many eukaryotes (7)
• Practical maximum of data sources (>60)
• Predicted networks FOR a number of eukaryotes (8)
• Organism-specific, efficient and robust Bayesian frameworks
• Orthology-based information transfer and phylogenetic profiling
• Networks predicted for different types of functional coupling (metabolic, signaling etc.)
What C. elegans gains from integrating data from model species:
[Figure: Li & Vidal's set, 5535 pairs; IntAct (Oct. 2007), 4517 pairs; 6841; other C. elegans data; 36000 predicted C. elegans pairs]
Data sources in FunCoup:
Species:
• H. sapiens
• M. musculus
• R. norvegicus
• D. melanogaster
• C. elegans
• S. cerevisiae
• A. thaliana
Types:
• Protein-protein interactions
• Protein domain associations
• Protein-DNA interactions
• mRNA expression
• Protein expression
• miRNA targeting
• Sub-cellular co-localization
• Phylogenetic profiling
Multilateral data transfer
[Figure: data exchange between FunCoup and human, Ciona, worm, mouse, rat, fly, yeast and Arabidopsis]
Data from the target species itself are an important but not indispensable component of the framework. Hence, a network can be constructed even for an organism with no experimental datasets at all.
InParanoid
[Figure: proteome A and proteome B]
Remm M, Storm CEV, Sonnhammer ELL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 314(5), 14 December 2001, pp. 1041-1052.
Reciprocally best hits ~ seed orthologs
Inparalogs
How does orthology work?
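The slide equates reciprocally best hits with seed orthologs and then attaches inparalogs. A minimal Python sketch of that idea, assuming precomputed pairwise similarity scores (for example BLAST bit scores); the score format, function names and toy data are hypothetical, and the inparalog rule is a simplified version of the one described by Remm et al. (InParanoid adds further refinements not shown here).

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical input format: similarity scores keyed by (query, target).
Scores = Dict[Tuple[str, str], float]


def best_hit(query: str, targets: List[str], scores: Scores) -> Optional[str]:
    """Return the highest-scoring target for `query`, or None if it has no hits."""
    hits = [(scores[(query, t)], t) for t in targets if (query, t) in scores]
    return max(hits)[1] if hits else None


def reciprocal_best_hits(a_prots: List[str], b_prots: List[str],
                         scores_ab: Scores, scores_ba: Scores) -> List[Tuple[str, str]]:
    """Seed ortholog pairs: a's best hit in B is b, and b's best hit in A is a."""
    pairs = []
    for a in a_prots:
        b = best_hit(a, b_prots, scores_ab)
        if b is not None and best_hit(b, a_prots, scores_ba) == a:
            pairs.append((a, b))
    return pairs


def inparalogs(seed: str, partner: str, same_species: List[str],
               scores_within: Scores, scores_between: Scores) -> Set[str]:
    """Simplified rule: proteins of the seed's own species that score higher
    against the seed than the seed scores against its ortholog partner."""
    cutoff = scores_between[(seed, partner)]
    return {p for p in same_species
            if p != seed and scores_within.get((seed, p), 0.0) > cutoff}


# Toy scores for proteome A (a1..a3) and proteome B (b1, b2).
A, B = ["a1", "a2", "a3"], ["b1", "b2"]
s_ab = {("a1", "b1"): 100, ("a1", "b2"): 40, ("a2", "b1"): 90, ("a3", "b2"): 80}
s_ba = {("b1", "a1"): 100, ("b1", "a2"): 90, ("b2", "a3"): 80, ("b2", "a1"): 40}
s_aa = {("a1", "a2"): 150, ("a1", "a3"): 30}

seeds = reciprocal_best_hits(A, B, s_ab, s_ba)  # [("a1", "b1"), ("a3", "b2")]
print(seeds, inparalogs("a1", "b1", A, s_aa, s_ab))
```

Here a2 is not a seed ortholog (its best hit b1 prefers a1), but it joins a1's cluster as an inparalog because a1 and a2 are more similar to each other than a1 is to b1.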
Log overlap between KEGG pathways and complexes (Gavin et al., 2006)
[Figure: log overlap, KEGG vs. "Gavin et al., 2006", for yeast, worm, fly, mouse, human and A. thaliana across seven protein pair categories: Core-Core, Core-Modu, Core-Attr, Modu-Modu, Modu-Attr, DiffModules, Attr-Attr]
Comparing networks
[Figure: rat, human and mouse networks]
Conclusions
FunCoup:
• is a flexible, exhaustive and robust framework for inferring confident functional links
• enables practical web access to candidate interactions in both small and global-scale network contexts
• is open to better data quality and coverage
http://FunCoup.sbc.su.se
Acknowledgements:
• Carsten Daub
• Kristoffer Forslund
• Anna Henricson
• Olof Karlberg
• Martin Klammer
• Mats Lindskog
• Kevin O’Brien
• Tomas Ohlson
• Sanjit Rupra
• Gabriel Östlund
• Sean Hooper
• All previous interaction network developers
Talk outline
Other network resources
Why FunCoup
Orthology and InParanoid
Implementation
Applications and future development
FunCoup is a naïve Bayesian network (NBN)
Bayesian inference:
C: genes A and B are functionally coupled (A<->B)
E: genes A and B are co-expressed
P(C|E) = P(C) × P(E|C) / P(E)
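To make Bayes' rule concrete, a small Python sketch that expands P(E) by the law of total probability. The numbers are purely illustrative assumptions, not FunCoup estimates; they show how strongly the result depends on the prior P(C).

```python
def posterior(prior_c: float, p_e_given_c: float, p_e_given_not_c: float) -> float:
    """Bayes' rule P(C|E) = P(C) * P(E|C) / P(E), with
    P(E) = P(C) * P(E|C) + (1 - P(C)) * P(E|not C)."""
    p_e = prior_c * p_e_given_c + (1.0 - prior_c) * p_e_given_not_c
    return prior_c * p_e_given_c / p_e


# Illustrative numbers only: 1% of gene pairs coupled; co-expression observed
# in 60% of coupled pairs and in 10% of uncoupled pairs.
print(round(posterior(0.01, 0.6, 0.1), 4))  # 0.0571
```

Even fairly strong co-expression evidence leaves the absolute probability low when the prior is small, and that prior is itself hard to estimate.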
Problem: In situations with multiple inparalogs, how should alternative evidence be handled?
Solution: Treat ALL inparalogs equally, and choose the BEST value.
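A minimal Python sketch of "treat all inparalogs equally and choose the best value", assuming that each evidence value is a number where higher means stronger support and that evidence is symmetric in the gene pair. The data layout and function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Optional, Tuple

Evidence = Dict[Tuple[str, str], float]


def lookup(evidence: Evidence, x: str, y: str) -> Optional[float]:
    """Evidence is treated as symmetric: (x, y) and (y, x) mean the same pair."""
    value = evidence.get((x, y))
    return value if value is not None else evidence.get((y, x))


def best_inparalog_value(group_a: Iterable[str], group_b: Iterable[str],
                         evidence: Evidence) -> Optional[float]:
    """Score every inparalog pair equally and keep the best value (None if no data)."""
    values = []
    for x, y in product(group_a, group_b):
        v = lookup(evidence, x, y)
        if v is not None:
            values.append(v)
    return max(values) if values else None


# Hypothetical: worm gene A has human inparalogs h1 and h2; worm gene B has h3.
human_evidence = {("h1", "h3"): 0.2, ("h3", "h2"): 0.7}
print(best_inparalog_value(["h1", "h2"], ["h3"], human_evidence))  # 0.7
```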
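Problem: Absolute probabilities of FC are intractable; the full Bayesian network is infeasible.
Solution: A naïve Bayesian network. Calculate a belief change instead (likelihood ratios, LR), and assume NO dependency between data sources.
[Figure: for A<->B, the full Bayesian network needs conditional probabilities between every pair of evidence nodes (P(B|C), P(C|B), P(B|A), P(A|B), P(B|D), P(D|B), P(A|C), P(C|A), P(D|C), P(C|D), P(A|D), P(D|A)); the naïve network needs only one likelihood ratio per evidence node, LR = P(E|+) / P(E|-), where + and - denote coupled and non-coupled pairs]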
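Under the "no data dependency" assumption, the LRs of the individual evidence nodes multiply, so their logarithms add. A minimal Python sketch of that standard naïve Bayes combination; the LR values and the prior are hypothetical, and the slides do not specify how FunCoup reports its final score.

```python
import math
from typing import Iterable


def combined_log_lr(lrs: Iterable[float]) -> float:
    """Naïve assumption: evidence types are independent given the class,
    so their likelihood ratios multiply (their logs add)."""
    return sum(math.log(lr) for lr in lrs)


def posterior_from_lrs(prior: float, lrs: Iterable[float]) -> float:
    """Posterior odds = prior odds * product of LRs, converted back to a probability."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * math.exp(combined_log_lr(lrs))
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical LRs P(E|+)/P(E|-) from three data sets for one gene pair.
lrs = [4.0, 0.5, 2.0]
print(round(combined_log_lr(lrs), 4))          # 1.3863 (= ln 4)
print(round(posterior_from_lrs(0.01, lrs), 4))  # 0.0388
```

The summed log-LR is the belief change and does not need the prior; only converting it back to an absolute probability does.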
Problem: How to establish optimal bridges between species?
Solution: Via groups of orthologs that emerged from speciation.
Homologs: proteins with a common origin, usually recognized by their sequence similarity.
[Figure: homologs between proteome A and proteome B; an InParanoid cluster of orthologs with its inparalogs; legend: gene evolution, functional link]
Problem: Some LRs are weak and arise from non-representative sampling.
Solution: Enforce a confidence check and remove insignificant nodes.
[Figure: the LR of each evidence node, P(E|+) / P(E|-), for A<->B is checked with a χ2-test]
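The slide does not say how the χ2-test is set up. A minimal Python sketch, assuming each evidence node is tested with a 2×2 table of evidence present/absent versus positive/negative training pairs, using Pearson's statistic without continuity correction and the tabulated critical value for 1 degree of freedom at α = 0.05. The counts and names are hypothetical.

```python
from typing import List

CHI2_CRIT_1DF_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table [[a, b], [c, d]]:
    rows = evidence present/absent, columns = positive/negative pairs."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed: List[int] = [a, b, c, d]
    expected = [rows[i] * cols[j] / n for i in (0, 1) for j in (0, 1)]
    if min(expected) == 0:
        return 0.0  # an empty row or column carries no information
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def keep_node(a: int, b: int, c: int, d: int,
              critical: float = CHI2_CRIT_1DF_05) -> bool:
    """Keep an evidence node only if its association with the class is significant."""
    return chi2_2x2(a, b, c, d) > critical


print(chi2_2x2(30, 10, 70, 90))   # 12.5
print(keep_node(12, 10, 88, 90))  # False
```

In the second table the evidence is almost equally frequent among positives and negatives, so its LR would be close to 1 by chance and the node is dropped.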
[Figure: reciprocally best hits between proteome A and proteome B]
Problem: Definitions and notions of FC vary.
Solution: Multinet. Decide which types of FC are needed (provide them as positive training sets) and repeat the previous steps customized for each type.
[Figure: separate naïve Bayesian networks, each with its own LRs P(E|+) / P(E|-), for different link types between A and B (A<>B, A|B, A||B)]
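A minimal Python sketch of the Multinet idea: the same evidence is scored once per FC class, and only the positive training set changes. It assumes binary (present/absent) evidence per gene pair and Laplace smoothing; real FunCoup evidence is mostly continuous, and all names and toy data here are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def binary_lr(evidence_pairs: Set[Pair], positives: Set[Pair],
              negatives: Set[Pair], pseudo: float = 1.0) -> float:
    """LR = P(E|+) / P(E|-) for present/absent evidence, with Laplace smoothing."""
    p_pos = (len(evidence_pairs & positives) + pseudo) / (len(positives) + 2 * pseudo)
    p_neg = (len(evidence_pairs & negatives) + pseudo) / (len(negatives) + 2 * pseudo)
    return p_pos / p_neg


def train_multinet(evidence_sets: Dict[str, Set[Pair]],
                   positive_sets: Dict[str, Set[Pair]],
                   negatives: Set[Pair]) -> Dict[str, Dict[str, float]]:
    """One LR table per FC class; only the positive training set changes between classes."""
    return {
        fc_class: {name: binary_lr(ev, pos, negatives) for name, ev in evidence_sets.items()}
        for fc_class, pos in positive_sets.items()
    }


evidence = {"ppi": {("a", "b"), ("c", "d")}, "coexpr": {("a", "b"), ("e", "f")}}
positives = {"physical": {("a", "b"), ("c", "d")}, "metabolic": {("e", "f"), ("g", "h")}}
negatives = {("x", "y"), ("x", "z"), ("y", "z"), ("c", "e")}
print(train_multinet(evidence, positives, negatives))
```

With these toy sets, protein-protein interaction evidence is more informative (higher LR) for the "physical" class than for the "metabolic" class, which is the kind of difference Multinet is meant to expose.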
[Figure: proteins of the Parkinson’s disease pathway (KEGG #05020); legend: physical protein-protein interaction, “signaling” link, metabolic “non-signaling” link]
Multinet presents several link types in parallel.
The limits of data integration
[Figure, two panels: area under the ROC curve at specificity > 96% (axis from 0.004 to 0.013) versus the number of species (1 to 5, left) and the number of features (4 to 44, right); series: PCA-processed and raw data]
FunCoup’s web interface
Hooper S., Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics. 2005 Dec 15;21(24):4432-3. Epub 2005 Sep 27.
http://FunCoup.sbc.su.se
Reconstructing the “regulatory blueprint”* in C. intestinalis
*Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science, 26:1183-7.
Proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]
• 18 links mentioned in [*] AND found by FunCoup
• Links found by FunCoup (about 140)
• The remaining 202 links from [*] that FunCoup did not find are not shown
[Figure legend: orthologs; functional link; inparalogs; species: C. elegans, D. melanogaster, human, S. cerevisiae]
Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discovery Today: Technologies (2006) 3(2):137-143.
Problem: The areas of a feature's distribution that are informative of FC may vary.
Solution: Find them individually for each data set and FC class, accounting for the joint “feature – class” distribution.
[Figure: positive (+) and negative (-) gene pairs along Pearson r, from -1 to 1]
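One simple way to see which areas of a continuous feature are informative is to bin it and compute an LR per bin. A minimal Python sketch, assuming fixed, hand-picked bin edges on Pearson r and Laplace smoothing; the slides do not say how FunCoup places its boundaries, and the toy values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def binned_lrs(pos_values: Sequence[float], neg_values: Sequence[float],
               edges: Sequence[float], pseudo: float = 1.0) -> List[float]:
    """LR = P(bin|+) / P(bin|-) for each interval of a continuous feature.
    `edges` are sorted cut points; k edges give k + 1 bins."""
    n_bins = len(edges) + 1
    pos_counts = [0] * n_bins
    neg_counts = [0] * n_bins
    for v in pos_values:
        pos_counts[bisect_right(edges, v)] += 1
    for v in neg_values:
        neg_counts[bisect_right(edges, v)] += 1
    lrs = []
    for p, q in zip(pos_counts, neg_counts):
        p_pos = (p + pseudo) / (len(pos_values) + n_bins * pseudo)
        p_neg = (q + pseudo) / (len(neg_values) + n_bins * pseudo)
        lrs.append(p_pos / p_neg)
    return lrs


# Hypothetical Pearson r values for positive (+) and negative (-) gene pairs.
pos_r = [0.9, 0.8, 0.7, 0.6, 0.1, -0.1]
neg_r = [0.1, -0.2, 0.0, 0.3, -0.6, 0.6]
print([round(x, 2) for x in binned_lrs(pos_r, neg_r, edges=[-0.5, 0.0, 0.5])])
# [0.5, 1.0, 0.5, 2.5]
```

Only the highest bin (r ≥ 0.5) favours coupling here; a different data set or FC class could have its informative area elsewhere, which is why the bins are derived separately for each.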
Validation
Jack-knife procedure:
• Take the “positive” and “negative” sets
• Split each randomly 50:50
• Use the first halves to train the algorithm and the second halves to test its performance
• Repeat a number of times
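The validation steps above can be sketched in Python as repeated random 50:50 splits. The `train_and_score` callback is a hypothetical stand-in for training and testing FunCoup, and the toy threshold classifier and data exist only to make the example runnable.

```python
import random
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[list, list, list, list], float]


def split_half(items: Sequence, rng: random.Random) -> Tuple[list, list]:
    """Random 50:50 split (the first half is one smaller if the length is odd)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def repeated_half_split(positives: Sequence, negatives: Sequence,
                        train_and_score: Scorer, n_repeats: int = 10,
                        seed: int = 0) -> List[float]:
    """Train on the first halves, test on the second halves, repeat."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        pos_train, pos_test = split_half(positives, rng)
        neg_train, neg_test = split_half(negatives, rng)
        scores.append(train_and_score(pos_train, neg_train, pos_test, neg_test))
    return scores


def toy_train_and_score(pos_train, neg_train, pos_test, neg_test) -> float:
    """Hypothetical classifier: a threshold halfway between the training means."""
    threshold = (sum(pos_train) / len(pos_train) + sum(neg_train) / len(neg_train)) / 2
    correct = sum(x > threshold for x in pos_test) + sum(x <= threshold for x in neg_test)
    return correct / (len(pos_test) + len(neg_test))


pos = [0.6 + 0.01 * i for i in range(20)]
neg = [0.01 * i for i in range(20)]
print(repeated_half_split(pos, neg, toy_train_and_score, n_repeats=5))
# [1.0, 1.0, 1.0, 1.0, 1.0]
```

Fixing the random seed makes the splits reproducible; the spread of the returned scores across repeats indicates how stable the performance estimate is.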
Analysis of variance (ANOVA):
• Introduce features A, B, C into the FunCoup workflow (e.g. using PCA, selecting BN nodes by relevance, different ways of using ortholog data)
• Run FunCoup with all possible combinations of absence/presence of A, B, C to produce a balanced, orthogonal ANOVA design with replicates
• Study the effects of A, B, C and their combinations A×B, B×C, ..., A×B×C to see whether they significantly influence performance, with all other conditions held fixed
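A minimal Python sketch of the balanced two-level design and of effect estimates for main effects and interactions. The performance function is a hypothetical stand-in for running FunCoup; this computes effect sizes only, and a full ANOVA with F-tests (for example statsmodels' `anova_lm` on an OLS fit) would be needed to judge significance.

```python
from itertools import product
from statistics import mean
from typing import Dict, List, Sequence

Run = Dict[str, bool]


def full_factorial(factors: Sequence[str]) -> List[Run]:
    """Every absence/presence combination of the factors: 2**k balanced, orthogonal runs."""
    return [dict(zip(factors, levels))
            for levels in product([False, True], repeat=len(factors))]


def effect(design: List[Run], results: List[List[float]], terms: Sequence[str]) -> float:
    """Effect of one factor (main effect) or several (interaction) in a 2-level design:
    mean response where the product of +/-1 codes is positive minus where it is
    negative. Replicates of each run are pooled."""
    plus: List[float] = []
    minus: List[float] = []
    for run, replicates in zip(design, results):
        sign = 1
        for t in terms:
            sign *= 1 if run[t] else -1
        (plus if sign > 0 else minus).extend(replicates)
    return mean(plus) - mean(minus)


def perf(run: Run) -> float:
    """Hypothetical performance: A helps, A with B helps a little more, C does nothing."""
    return 0.80 + 0.04 * run["A"] + 0.02 * (run["A"] and run["B"])


design = full_factorial(["A", "B", "C"])
results = [[perf(run)] * 2 for run in design]  # two identical replicates per run
for terms in (["A"], ["B"], ["C"], ["A", "B"]):
    print("x".join(terms), round(effect(design, results, terms), 3))
```

Because every factor appears equally often at each level, each effect can be estimated independently of the others, which is what makes the balanced, orthogonal design convenient.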