Quick introduction to Cytoscape & tutorial for undergraduate students. 5/12/2014 @ UCSD
Biological Network Visualization with Cytoscape
Keiichiro Ono
Cytoscape Core Developer Team
UC, San Diego Trey Ideker Lab / National Resource for Network Biology
5/12/2014 Workshop for Undergraduate Bioinformatics Club at UCSD
Made with Cytoscape
Keiichiro Ono
Cytoscape Core Developer
Area of Interest: Data Integration & Visualization
Computer Science, Biology
Data Visualization
Programming: Java, JavaScript, Python, R, etc.
Software Engineering, Web Development
Practitioner > Researcher
Outline
• Part 1: Introduction to Cytoscape
  • What is Cytoscape?
  • Basic Features
• Part 2: Hands-On Tutorial
  • Visualize gene expression values and network
  • Import data from public databases (optional)
What is Cytoscape?
An Open Source Platform for Biological Network Data Integration, Analysis and Visualization
Cytoscape
- Open Source (LGPL)
- Free for both commercial and academic use
- Developed and maintained by universities, companies, and research institutions
- De facto standard software in the biological network research community
- Expandable by Apps
  - This is why Cytoscape is a Platform, not a simple desktop application
[Figure: network diagram with nodes labeled by gene symbols such as EP300, PPARG, SMARCA4, and HTT]
Protein-Protein Interactions
Directed Network
KEGG Pathway (TCA Cycle) visualized by Cytoscape KGMLReader
Large-Scale Network Analysis and Visualization
Human Interactome data from BioGRID visualized by Cytoscape
…But why do we need such a tool for biology?
C. Elegans Interactome from BioGRID Database
?
Biological Networks
- Do not tell us anything by themselves
- Just a big hairball…
Module 1
Module 2
In other words…
Module 1
Need a tool to extract meaningful biological modules
Basic Use Case
Networks
Public Interaction Databases
List of Genes
Other Data
Network Data Analysis
- Analysis
  - Graph Analysis: NetworkX, igraph, Cytoscape
  - Python: Pandas, NumPy, SciPy
  - Excel
- Visualization
  - Desktop: Gephi, Cytoscape, matplotlib
  - Web: Cytoscape.js, sigma.js, d3, NVD3, d3.chart, Google Charts
- Data Storage
  - Graph: Neo4j, GraphX
  - Document: MongoDB
  - Relational: MySQL
- IPython
- 3rd Party Apps
- NetworkAnalyzer
3 Basic Steps of Data Visualization with Cytoscape
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <!-- Created by igraph -->
  <key id="degree" for="node" attr.name="degree" attr.type="double"/>
  <key id="betweenness" for="node" attr.name="betweenness" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"> <data key="degree">79</data> <data key="betweenness">0</data> </node>
    <node id="n1"> <data key="degree">9</data> <data key="betweenness">167</data> </node>
    <node id="n2"> <data key="degree">18</data> <data key="betweenness">75</data> </node>
    <node id="n3"> <data key="degree">8</data> <data key="betweenness">12</data> </node>
    <node id="n4"> <data key="degree">26</data> <data key="betweenness">210</data> </node>
    <node id="n5"> <data key="degree">29</data> <data key="betweenness">320</data> </node>
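A GraphML file like the excerpt above carries analysis results (here, degree and betweenness) as node attributes. The following is only a minimal sketch of reading those attributes back into a table with Python's standard library. The sample document is a shortened, closed copy of the excerpt, the edge in it is hypothetical, and `read_node_attributes` is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, closed sample in the same shape as the igraph export above.
# The single edge is hypothetical and only makes the sample complete.
GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="degree" for="node" attr.name="degree" attr.type="double"/>
  <key id="betweenness" for="node" attr.name="betweenness" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="degree">79</data><data key="betweenness">0</data></node>
    <node id="n1"><data key="degree">9</data><data key="betweenness">167</data></node>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>
"""

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_node_attributes(graphml_text):
    """Return {node_id: {attribute_name: float_value}} for every node."""
    root = ET.fromstring(graphml_text.encode("utf-8"))
    # <key> elements map short ids to human-readable attribute names.
    names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    table = {}
    for node in root.findall("g:graph/g:node", NS):
        row = {}
        for data in node.findall("g:data", NS):
            row[names[data.get("key")]] = float(data.text)
        table[node.get("id")] = row
    return table


node_table = read_node_attributes(GRAPHML)
```

Each entry of `node_table` corresponds to one row of a node table, which is the form in which such attributes are later mapped to visual properties.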
Data Integration
Analysis
Visualization
Network Data
Annotated Networks
Attributes
Analyzed Data
Apps
Cytoscape Apps
- Extension programs that add new features to Cytoscape (formerly called Plugins)
- Large App developer/user community
  - This is why Cytoscape is so successful in the life science community!
Quick Overview of Apps
A travel guide to Cytoscape plugins. Rintaro Saito, Michael E Smoot, Keiichiro Ono, Johannes Ruscheinski, Peng-Liang Wang, Samad Lotia, Alexander R Pico, Gary D Bader, Trey Ideker (2012) Nature Methods 9 (11) p. 1069-1076
Tips for Learning Tools
Choose the Right Tool
Analysis / Visualization / Data Preparation
Data Visualization Tools
http://selection.datavisualization.ch/
Tools
• In some cases, you can finish exactly the same task using different tools
  • Example: Data preparation (cleansing)
• But if you choose the right tool, you can finish it as much as 100x faster
  • Example: Re-formatting complex data sets
  • Excel vs. Python script
• Some recommendations:
  • R/Bioconductor, Python/Pandas, Git/GitHub/Gist
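As a rough illustration of the "Excel vs. Python script" point, here is a small sketch that re-formats a wide table (one row per gene, one column per time point) into a long table (one row per gene and time point). The table contents, the column names `t1` to `t3`, and the function name `wide_to_long` are all assumptions made up for this example. It uses only the standard library.

```python
import csv
import io

# Hypothetical wide table: one row per gene, one column per time point.
WIDE = "gene\tt1\tt2\tt3\nGAL1\t0.5\t1.2\t2.0\nGAL4\t-0.3\t0.1\t0.4\n"


def wide_to_long(text):
    """Turn one row per gene into one row per (gene, time point) pair."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        gene = record["gene"]
        for column, value in record.items():
            if column != "gene":
                rows.append((gene, column, float(value)))
    return rows


long_rows = wide_to_long(WIDE)
for row in long_rows:
    print(*row, sep="\t")
```

Once written, the same few lines work unchanged on a table with thousands of genes, which is where a script tends to pay off over manual editing.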
Learning Tools = Saving Your Time
Hands-on: Introduction to Data Visualization with Cytoscape
50-60 min.
Data Visualization
- Goal: Help others understand your data
- Emphasize what you want to tell
  - Use color, shape, and size of objects effectively!
- Excellent resource for data visualization
  - Tamara Munzner’s Web Site: http://www.cs.ubc.ca/~tmm/
Data Visualization
Today’s Goal
Story: I want to show gene expression changes over time as a network diagram
[Figure: large yeast network with nodes labeled by systematic ORF names such as YPL201C, YML007W, and YGL208W]
What is Great Visualization…?
Design is complicated, because humans are complicated. Design is a process to avoid bad designs.
Mike Bostock (New York Times Visualization Team. Creator of D3.js)
It is hard to generalize the design process, but we can avoid pitfalls by following some basic rules.
Avoid Chartjunk
Edward Tufte
http://en.wikipedia.org/wiki/File:Chartjunk-example.svg
Every pixel should carry information.
Edward Tufte
Avoid Data Overload
• Mapping too many attributes makes your visualization awful!
• It is hard to see the overall trend of your data sets if too many channels are used in one image
“Great Artists Steal…”
[Figure: sample network with nodes labeled by gene names such as GAL1, GAL4, GAL80, and MIG1]
- Map gene expression values to color
- Avoid using more colors in other components (edge/label)
- If necessary, map other data to non-overlapping visual properties (edge score to width)
Part 1: Session File and Basic Navigation
Cytoscape 3.1 Desktop
Toolbar
Network Panel
Bird’s Eye View
Table Browser
Network Views
Table Browser
Local Column
Table Tabs
List Data (Values in [ ])
Shared Column
Session File
- Snapshot of your workspace
  - Networks
  - Tables
  - Visual Styles
  - System Properties
Open a Session
- Click the folder icon
- Or, File → Open
Exercise 1: Loading a session
Navigation
- Pan: Middle-Click + Drag, or Command + Left-Click + Drag on Mac
- Zoom
  - In: Mouse Wheel Up
  - Out: Mouse Wheel Down
- Selection: Left-Click and Drag
- Fit to Window
  - Selected region
  - Entire network
First Neighbors of Nodes: Ctrl+6
Create New Sub-Network From Selection: Ctrl+N
- Ctrl (Command on Mac) + G
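To make the two operations above concrete, here is a minimal sketch of what "first neighbors" and "sub-network from selection" mean on a plain edge list. It is an illustration of the idea, not Cytoscape's implementation. The edge list and the function names `first_neighbors` and `subnetwork` are hypothetical.

```python
# Hypothetical undirected edge list; the gene names are only labels.
EDGES = [("GAL4", "GAL80"), ("GAL4", "GAL1"), ("GAL1", "GAL7"), ("MIG1", "SUC2")]


def first_neighbors(edges, selected):
    """Return the selected nodes plus every node one edge away from them."""
    result = set(selected)
    for a, b in edges:
        if a in selected:
            result.add(b)
        if b in selected:
            result.add(a)
    return result


def subnetwork(edges, nodes):
    """Keep only the edges whose two endpoints are both in `nodes`."""
    return [(a, b) for a, b in edges if a in nodes and b in nodes]


selection = first_neighbors(EDGES, {"GAL4"})
sub_edges = subnetwork(EDGES, selection)
```

Starting from GAL4, the selection grows to GAL4, GAL80, and GAL1, and the sub-network keeps only the two edges among those three nodes.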
Part 2: Data Import
Network Data Formats
- SIF
- GML
- XGMML
- GraphML
- BioPAX
- PSI-MI
- SBML
- KGML (KEGG)
- Excel
- Text Table
  - CSV
  - Tab
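Of the formats listed, SIF is the simplest to write by hand. The sketch below assumes the common whitespace-delimited SIF layout, `source interaction target [target ...]`, with a line holding a single name standing for an isolated node. The sample text and the function name `parse_sif` are made up for illustration, and node names containing spaces (which need tab-delimited files) are not handled.

```python
def parse_sif(text):
    """Return a list of (source, interaction, target) edges from SIF text."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or isolated node: no edge to record
        source, interaction = fields[0], fields[1]
        # One SIF line may list several targets for the same source.
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges


# Hypothetical sample: "pd" and "pp" are interaction-type labels.
SAMPLE_SIF = "GAL4 pd GAL1 GAL7\nGAL80 pp GAL4\nMIG1\n"
sif_edges = parse_sif(SAMPLE_SIF)
```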
BRCA1
- NCBI Gene ID: 672
- On Chromosome 17
- GO Terms: DNA Repair, Cell Cycle, DNA Binding
- Ensembl ID: ENSG00000012048
Data Tables for Cytoscape
- Example:
  - Numeric
    - Gene expression profiles
    - Network statistics calculated in other applications, such as R
    - Confidence scores for edges
  - Text (or categorical)
    - GO annotation for genes
    - List of genes related to disease X
    - Targets for FDA approved drugs
    - Genes on KEGG Pathway Y
    - Clusters / groups / communities calculated in external programs
    - …
Your Data Sets
- Anything saved as a table can be loaded into Cytoscape
  - Excel
  - Tab Delimited Document
  - CSV
- As long as a proper mapping key is available, Cytoscape can map them to your networks.
Mapping Key in the Network
Mapping Key in the Table
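The mapping key works like a join column: a row of the data table is attached to the node whose key value matches. The following is a minimal sketch of that idea, not Cytoscape's import code. The node list, the `expression` column, its values, and the function name `attach_table` are assumptions for illustration.

```python
# Hypothetical network nodes; the "name" column serves as the mapping key.
nodes = [{"name": "YPL248C"}, {"name": "YML051W"}, {"name": "YBR020W"}]

# Hypothetical data table keyed by the same identifiers.
expression_table = {
    "YPL248C": {"expression": -0.2},
    "YBR020W": {"expression": 2.1},
}


def attach_table(nodes, table, key="name"):
    """Copy table columns onto each node whose key value appears in the table.

    Returns the key values that had no matching row, so mismatched
    identifiers are easy to spot.
    """
    unmatched = []
    for node in nodes:
        row = table.get(node[key])
        if row is None:
            unmatched.append(node[key])
        else:
            node.update(row)
    return unmatched


missing = attach_table(nodes, expression_table)
```

If the identifiers in the table use a different naming scheme than the network (for example gene symbols versus systematic names), every node ends up in `missing`, which is the usual symptom of a wrong mapping key.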
Exercise 2: Loading network and tables
Part 3: Visualization
Layouts
Automatic Layout
- Choose the proper algorithm
  - Tree-like data: Hierarchical Layout
  - Scale-free network: Force-directed Layout
  - Circular process: Circular Layout
- Tweak parameters if necessary
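A layout algorithm is, in the end, a function from a network to node coordinates. As the simplest possible example, here is a sketch of a circular layout that spaces nodes evenly on a circle. It is an illustration only and not the algorithm Cytoscape ships. The function name `circular_layout` and the `radius` parameter are assumptions.

```python
import math


def circular_layout(node_ids, radius=100.0):
    """Place nodes evenly on a circle and return {node_id: (x, y)}."""
    count = len(node_ids)
    positions = {}
    for index, node_id in enumerate(node_ids):
        # Spread the nodes over the full circle in equal angular steps.
        angle = 2 * math.pi * index / count
        positions[node_id] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


layout = circular_layout(["a", "b", "c", "d"])
```

Here `radius` plays the role of a layout parameter: changing it rescales the picture without changing the order of the nodes.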
Manual Layout
- Tweak the result of the automatic layout
  - Scale
  - Align
  - Rotate
Exercise 3: Apply layouts
Visual Style
- Collection of mappings from Attributes to Visual Properties
Visual Styles
- Defaults + Mappings
  - Expression values to node color
  - Gene function to node shape
  - Interaction detection method to edge line type
  - Confidence score to edge width
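The "Defaults + Mappings" idea can be sketched in a few lines: a mapping is consulted first, and the default value is used when the data has no entry. The code below is only an illustration of that lookup order. The property names, the function categories, the shape names, and `node_shape` are hypothetical and do not reflect Cytoscape's internal API.

```python
# Hypothetical default values, used when no mapping applies.
DEFAULTS = {"node_shape": "ellipse", "edge_width": 1.0}

# Hypothetical discrete mapping: gene function -> node shape.
SHAPE_BY_FUNCTION = {"kinase": "triangle", "transcription factor": "diamond"}


def node_shape(node):
    """Look up the shape for a node, falling back to the default."""
    return SHAPE_BY_FUNCTION.get(node.get("function"), DEFAULTS["node_shape"])


shapes = [
    node_shape({"name": "A", "function": "kinase"}),
    node_shape({"name": "B", "function": "transporter"}),
    node_shape({"name": "C"}),
]
```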
Core Idea: Data Controls The View
Data Controls The View
• Photoshop / Illustrator
  • You control the pixels and objects on the display
• Data Visualization Tools (including Cytoscape)
  • Data points are mapped to visual properties
    • Color
    • Size
Data Controls The View
Expression Values To Node Colors
Discrete Mapping Editor
Continuous Mapping Editor
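A continuous mapping from expression values to node colors can be thought of as interpolation between color stops. The sketch below assumes a blue-white-red gradient over a value range of -2 to 2, with values outside the range clamped to the ends. The range, the colors, and the function names are assumptions for illustration, not settings taken from the tutorial session.

```python
def lerp_color(low, high, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(l + (h - l) * t) for l, h in zip(low, high))


def expression_to_color(value, vmin=-2.0, vmax=2.0):
    """Map a value to a hex color: blue at vmin, white at the midpoint, red at vmax."""
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    value = max(vmin, min(vmax, value))  # clamp out-of-range values
    mid = (vmin + vmax) / 2
    if value <= mid:
        rgb = lerp_color(blue, white, (value - vmin) / (mid - vmin))
    else:
        rgb = lerp_color(white, red, (value - mid) / (vmax - mid))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


colors = {v: expression_to_color(v) for v in (-2.0, 0.0, 2.0, 5.0)}
```

A discrete mapping, by contrast, is a plain lookup table from category to value with no interpolation.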
Exercise 4: Create New Visual Style
Part 4: Web Services (Optional)
Cytoscape Ecosystem
Dawn of Web-Based Visualization
Cytoscape Family
- Cytoscape 3.1.0
- cytoscape.js: Library for web applications
Cytoscape.js Network Visualization Library Running on Web Browsers
What is cytoscape.js?
A JavaScript library for network visualization, not a web application!
You need to write some code to use it in web browsers…
Cytoscape Desktop (For Users)
- Complete desktop application for network analysis and visualization
- Written in Java
- Expandable by Apps
Cytoscape.js (For Developers)
- A JavaScript library for network visualization, not a web application
- Written in JavaScript
- Expandable by Extensions
[Figure: Cytoscape Desktop (Data Integration, Analysis, Visualization) compared with Cytoscape.js (Visualization, Minimal Analysis)]
[Figure: Layout and Visual Style on both Desktop and Web]
Integration to Cytoscape
New in Cytoscape 3.1.0: Export Networks and Visual Styles to Cytoscape.js Format
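For a sense of what network data for Cytoscape.js looks like, here is a small sketch that builds the `elements` structure Cytoscape.js reads, in which each node and edge carries a `data` object and edges name their `source` and `target` by node id. The file that Cytoscape Desktop exports may contain additional fields beyond these. The sample network and the function name `to_cytoscapejs` are assumptions for illustration.

```python
import json


def to_cytoscapejs(node_ids, edges):
    """Build a dict in the nodes/edges "elements" layout read by Cytoscape.js."""
    return {
        "elements": {
            "nodes": [{"data": {"id": n}} for n in node_ids],
            "edges": [
                {"data": {"id": f"{s}-{t}", "source": s, "target": t}}
                for s, t in edges
            ],
        }
    }


# Hypothetical two-node network.
network = to_cytoscapejs(["GAL4", "GAL80"], [("GAL80", "GAL4")])
text = json.dumps(network, indent=2)
print(text)
```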
Future
Cytoscape Cyberinfrastructure
Internet
Service 1 Service 2
NDEx (DB)
Web Browser
Cytoscape Desktop
Getting Help
- Two Google Groups
  - cytoscape-discuss@googlegroups.com
  - cytoscape-helpdesk@googlegroups.com
- ANY question is OK!
Further Readings
• My presentation slides
• http://www.slideshare.net/keiono
• (This deck of slides will be uploaded tonight)
Further Readings 1
- Introduction to Network Biology
  - Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part I. Experimental Techniques and Databases. PLoS Comput Biol 3(3): e42. doi:10.1371/journal.pcbi.0030042
  - Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners. PLoS Comput Biol 3(4): e43. doi:10.1371/journal.pcbi.0030043
Further Readings 2
- Overview of Cytoscape Apps (Plugins)
  - A travel guide to Cytoscape plugins. Rintaro Saito, Michael E Smoot, Keiichiro Ono, Johannes Ruscheinski, Peng-Liang Wang, Samad Lotia, Alexander R Pico, Gary D Bader, Trey Ideker (2012) Nature Methods 9 (11) p. 1069-1076
- Sample Protocol (based on 2.x)
  - Integration of biological networks and gene expression data using Cytoscape. Cline, et al. Nature Protocols, 2, 2366-2382 (2007).
Further Readings 3
- Cytoscape Tutorial Booklet: Analysis and Visualization of Biological Networks with Cytoscape
- http://www.rbvi.ucsf.edu/Outreach/Workshops/ISMBTutorial.pdf