Overview of Modern Graph Analysis Tools
Keiichiro Ono
Cytoscape Core Developer Team
UC San Diego, Trey Ideker Lab / National Resource for Network Biology
5/24/2016 Ideker Lab Meeting
Recap
Cytoscape Session File — for sharing results
But what about process?
http://www.the-scientist.com/?articles.view/articleNo/43632/title/Get-With-the-Program/
https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938
http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
Reproducibility…it’s a known issue
Data Preparation / Analysis / Visualization
Advanced Users: Cytoscape for Interactive Visualization
R/Python for Data Manipulation / Analysis
Lab Notebook for in silico Experiments
Interactive Command-Line + Markdown-based Documents
Question
• Cytoscape is a desktop application
• Point & click GUI operation
• Easy to use, but how can we make our workflow reproducible?
REST
What is cyREST?
- Platform-independent, RESTful API module for Cytoscape
- Lets you access basic Cytoscape data objects programmatically
- Now it’s a Cytoscape Core feature!
Get full network with unique ID 52 as JSON
GET http://localhost:1234/v1/networks/52
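The same request can be made from Python with only the standard library. This is a minimal sketch that assumes Cytoscape is running with cyREST on its default port 1234 (as in the URL above) and that a network with SUID 52 exists; `network_url` and `get_network` are hypothetical helper names, not part of cyREST.

```python
import json
import urllib.request

# Base URL taken from the slide; assumes cyREST listens on localhost:1234.
CYREST_BASE = "http://localhost:1234/v1"


def network_url(suid, base=CYREST_BASE):
    """Build the cyREST URL for one network, identified by its SUID."""
    return f"{base}/networks/{suid}"


def get_network(suid, base=CYREST_BASE, timeout=10):
    """GET the full network as JSON and decode it into Python objects.

    Requires a running Cytoscape instance with cyREST enabled.
    """
    with urllib.request.urlopen(network_url(suid, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(network_url(52))
    # net = get_network(52)  # uncomment when Cytoscape is running
```

Because `base` is a parameter, the same helper could point at a Cytoscape instance on a remote machine instead of `localhost`.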
But don’t use cyREST (directly)!
Language-Specific Shims
For Python For R
RCy3
• R wrapper for cyREST
• Now a part of Bioconductor
• Easy to install
• Natural API for R users
py2cytoscape
• Python wrapper for cyREST
• Supports a high-level API
• Cytoscape.js viewer included
• Support for iOS/Android
Example
Creating an empty network with raw cyREST API
…and with py2cytoscape
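The code shown on this slide is not preserved. As a rough sketch of the raw-API side only, the snippet below POSTs a Cytoscape.js-style JSON document with no nodes or edges to `/v1/networks`. The endpoint, the payload shape, and the `networkSUID` key in the response reflect our understanding of cyREST v1 and are assumptions to check against the cyREST documentation for your version; `empty_network_payload` and `create_empty_network` are hypothetical names.

```python
import json
import urllib.request

CYREST_BASE = "http://localhost:1234/v1"  # assumed default cyREST address


def empty_network_payload(name):
    """Cytoscape.js-style JSON for a network with no nodes or edges."""
    return {"data": {"name": name}, "elements": {"nodes": [], "edges": []}}


def create_empty_network(name, base=CYREST_BASE, timeout=10):
    """POST the payload to cyREST and return the new network's SUID.

    Assumption: cyREST answers with a JSON object like {"networkSUID": 123}.
    """
    body = json.dumps(empty_network_payload(name)).encode("utf-8")
    req = urllib.request.Request(
        f"{base}/networks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["networkSUID"]
```

Every step here (URL, headers, serialization, response parsing) is written by hand; this boilerplate is what the language-specific wrappers hide, since py2cytoscape's `CyRestClient` exposes network creation as a single method call.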
http://nbviewer.jupyter.org/gist/keiono/73da21846b6f73de70122bdb545c1c14
https://github.com/cytoscape/cyREST/wiki/Running-your-workflow-in-the-clouds
Now you have…
• Programmatic access to Cytoscape functions
• Notebooks to run your workflows
• Remote machines (clusters/clouds) for CPU intensive tasks
Graph Libraries as Analytic Engines for Cytoscape
In-Memory Graph Analysis
N < millions
NetworkX
Pros:
- Easy to install
- Covers most basic graph operations
Cons:
- Slow!
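To illustrate the kind of basic operations NetworkX covers, here is a short sketch on Zachary's karate club graph, which ships with NetworkX. It assumes NetworkX is installed (`pip install networkx`); the choice of graph and measures is ours, not from the slides.

```python
import networkx as nx

# Zachary's karate club: 34 members, 78 friendship ties (bundled with NetworkX).
G = nx.karate_club_graph()

# Basic statistics.
n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()
n_components = nx.number_connected_components(G)

# Degree centrality: each node's degree divided by (n - 1).
centrality = nx.degree_centrality(G)
hub = max(centrality, key=centrality.get)

# Unweighted shortest-path distance between the two club leaders (nodes 0 and 33).
dist = nx.shortest_path_length(G, source=0, target=33)

print(n_nodes, n_edges, n_components, hub, dist)
```

These calls are convenient, but NetworkX is implemented in pure Python, which is why it slows down on large graphs.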
igraph
Pros:
- Has a lot of analysis features: standard graph statistics, community detection, label propagation, etc.
- Fast (compared to NetworkX)
Cons:
- Weird API (for Python Users)
graph-tool
Pros:
- Fast (optimized with C++)
- Nice visualization features
Cons:
- Hard to install
Parallel Graph Analytics (PGX)
- Oracle’s experimental project
- There are many unknowns because it is still at an early experimental release, but it has many features, just like igraph
Don’t use NetworkX for large data sets…
FYI: GPU-Based Layouts
~100x faster
Out-of-Core Graph Analysis
N > billions
GraphX
• Part of Apache Spark Project
• Industry Standard
• Lots of documentation and support from the community
• You can use Python and R, but in the Spark world, Scala is still the first-class citizen…
End-to-end PageRank performance (20 iterations, 3.7B edges)
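The benchmark above runs PageRank. For reference, the sketch below is a plain-Python power iteration with the common damping factor 0.85 and a fixed 20 iterations, matching the iteration count in the benchmark title. It is a textbook illustration on an adjacency dict, not how GraphX or any other engine implements it, and `pagerank` is a hypothetical helper. Rank held by dangling nodes (no out-edges) is spread uniformly so the scores keep summing to 1.

```python
def pagerank(out_links, damping=0.85, iterations=20):
    """Power-iteration PageRank.

    out_links maps each node to a list of nodes it links to. Every node
    must appear as a key (use an empty list for nodes without out-edges).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes is redistributed evenly to all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each node splits its damped rank equally among its out-links.
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Small star: every leaf links to the hub, the hub links back to all leaves.
    star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
    for node, score in sorted(pagerank(star).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 3))
```

Each iteration touches every edge once, so on a graph with billions of edges the per-iteration cost is what distributed engines such as GraphX parallelize.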
GraphLab Create
• Commercial Service by Dato
• High-level API and data structures
• SFrame/SGraph
• Dato’s own version of scalable DataFrames
• (Semi) automatic parallel processing
Neo4j v3
- Focuses on storing arbitrarily large graph data (billions of nodes/edges)
- Has some analysis features
- Now natively supports Python
Summary
• Don’t use NetworkX unless it’s necessary!
• Don’t use the raw cyREST API if you are a Python/R user
• There are lots of new graph analysis tools
• Some of them are a bit hard to install/set up
• Candidates for CI services (?)
• We deploy them to servers, and you access them through a simple API
2016 Keiichiro Ono