Quick Introduction to Cytoscape for Undergraduates

Quick introduction to Cytoscape & tutorial for undergraduate students. 5/12/2014 @ UCSD


Biological Network Visualization with Cytoscape

Keiichiro Ono
Cytoscape Core Developer Team
UC San Diego, Trey Ideker Lab / National Resource for Network Biology

5/12/2014 Workshop for Undergraduate Bioinformatics Club at UCSD

Sample Data Files:

http://cl.ly/VTJs

Made with Cytoscape

Keiichiro Ono

Cytoscape Core Developer
Area of Interest: Data Integration & Visualization

Computer Science / Biology

- Data Visualization
- Programming: Java, JavaScript, Python, R, etc.
- Software Engineering
- Web Development

Practitioner > Researcher

Outline

• Part 1: Introduction to Cytoscape

• What is Cytoscape?

• Basic Features

• Part 2: Hands-On Tutorial

• Visualize gene expression values and network

• Import data from public databases (optional)

What is Cytoscape?

An Open Source Platform for Biological Network Data Integration, Analysis and Visualization

Cytoscape

- Open Source (LGPL)
- Free for both commercial and academic use
- Developed and maintained by universities, companies, and research institutions
- De facto standard software in the biological network research community
- Expandable by Apps
  - This is why Cytoscape is a platform, not a simple desktop application

(Figure: example network with nodes such as EP300, PPARG, SMARCA4, HTT, PRNP, and HSP90AB1)

Protein-Protein Interactions

Directed Network

KEGG Pathway (TCA Cycle) visualized by Cytoscape KGMLReader

Large-Scale Network Analysis and Visualization

Human Interactome data from BioGRID visualized by Cytoscape

…But why do we need such a tool for biology?

C. elegans Interactome from BioGRID Database

Biological Networks

- Do not tell us anything by themselves
- Just a big hairball…

In other words…

Need a tool to extract meaningful biological modules

(Figure: the hairball separated into Module 1 and Module 2)

Basic Use Case

- Networks / Public Interaction Databases
- List of Genes
- Other Data

Network Data Analysis

- Analysis
  - Graph Analysis: NetworkX, igraph, Cytoscape
  - Python: Pandas, NumPy, SciPy
  - Excel
- Visualization
  - Desktop: Gephi, Cytoscape, matplotlib
  - Web: Cytoscape.js, sigma.js, d3, NVD3, d3.chart, Google Charts
- Data Storage
  - Graph: Neo4j, GraphX
  - Document: MongoDB
  - Relational: MySQL
- IPython
- 3rd Party Apps: NetworkAnalyzer

3 Basic Steps of Data Visualization with Cytoscape

```xml
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <!-- Created by igraph -->
  <key id="degree" for="node" attr.name="degree" attr.type="double"/>
  <key id="betweenness" for="node" attr.name="betweenness" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="degree">79</data><data key="betweenness">0</data></node>
    <node id="n1"><data key="degree">9</data><data key="betweenness">167</data></node>
    <node id="n2"><data key="degree">18</data><data key="betweenness">75</data></node>
    <node id="n3"><data key="degree">8</data><data key="betweenness">12</data></node>
    <node id="n4"><data key="degree">26</data><data key="betweenness">210</data></node>
    <node id="n5"><data key="degree">29</data><data key="betweenness">320</data></node>
    <!-- remaining nodes and edges omitted -->
  </graph>
</graphml>
```
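The GraphML excerpt under this heading was written by igraph and carries per-node degree and betweenness columns. As a minimal sketch of producing a similar file without igraph, the code below uses only the Python standard library to count each node's degree from an edge list and store it as a GraphML `degree` column. The helper name `edges_to_graphml` and the sample edges are hypothetical; betweenness is left out because it is better computed with a graph library such as NetworkX or igraph.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges, directed=True):
    """Return GraphML text with a numeric 'degree' column for every node (hypothetical helper)."""
    # Degree = number of edge endpoints touching each node (in-degree + out-degree)
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1

    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare the node column, mirroring the <key> elements in the igraph output
    ET.SubElement(root, "key", {"id": "degree", "for": "node",
                                "attr.name": "degree", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", {
        "id": "G", "edgedefault": "directed" if directed else "undirected"})

    for node in sorted(degree):
        node_el = ET.SubElement(graph, "node", {"id": node})
        ET.SubElement(node_el, "data", {"key": "degree"}).text = str(degree[node])
    for i, (source, target) in enumerate(edges):
        ET.SubElement(graph, "edge", {"id": f"e{i}", "source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample_edges = [("n0", "n1"), ("n0", "n2"), ("n1", "n2"), ("n2", "n3")]
    print(edges_to_graphml(sample_edges))
```

Saving the returned string as a `.graphml` file and importing it should give a network whose node table includes a `degree` column, ready to be mapped to a visual property.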

1. Data Integration: Network Data + Attributes → Annotated Networks
2. Analysis: Annotated Networks → Analyzed Data
3. Visualization

Apps

Cytoscape Apps

- Extension programs that add new features to Cytoscape (formerly called Plugins)
- Large app developer/user community
- This is why Cytoscape is so successful in the life science community!

(As of 4/5/2014)

APPS.CYTOSCAPE.ORG

Quick Overview of Apps

Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, Pico AR, Bader GD, Ideker T (2012) A travel guide to Cytoscape plugins. Nature Methods 9(11): 1069-1076.

Tips for Learning Tools

Choose the Right Tool

- Data Preparation
- Analysis
- Visualization

Data Visualization Tools

http://selection.datavisualization.ch/


Tools

• In some cases, you can finish exactly the same task using different tools

  • Example: Data preparation (cleansing)

• But if you choose the right tool, you can do it much faster (sometimes 100x faster) than with others

  • Example: Re-formatting complex data sets

  • Excel vs. Python script

• Some recommendations:

  • R/Bioconductor, Python/Pandas, Git/GitHub/Gist
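To make the Excel vs. Python comparison concrete, here is a minimal sketch of one common re-formatting job: turning a wide expression table (one column per time point) into a long table with one row per gene and time point. It uses only the standard `csv` module; the column names (`gene`, `t0`, `t1`, ...) and the helper names `wide_to_long` and `rows_to_csv` are hypothetical. With Pandas, `DataFrame.melt` does the same reshaping in a single call.

```python
import csv
import io


def wide_to_long(wide_csv_text, id_column="gene"):
    """Convert a wide table (one column per condition) into (id, condition, value) rows."""
    reader = csv.DictReader(io.StringIO(wide_csv_text))
    rows = []
    for record in reader:
        for column, value in record.items():
            if column == id_column:
                continue  # the ID is repeated on every output row instead
            rows.append((record[id_column], column, float(value)))
    return rows


def rows_to_csv(rows, header=("gene", "time", "expression")):
    """Serialize long-format rows back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    wide = "gene,t0,t1,t2\nGAL1,0.1,1.5,2.3\nGAL4,-0.2,0.4,0.9\n"
    print(rows_to_csv(wide_to_long(wide)))
```

The script is reusable: the same two functions handle a file with 10 or 10,000 genes, which is where a script starts to pay off over manual editing in a spreadsheet.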

Learning Tools = Saving Your Time

Hands-on: Introduction to Data Visualization with Cytoscape

50-60 min.

Data Visualization

- Goal: Help others understand your data

- Emphasize what you want to say

- Use color, shape, and size of objects effectively!

- Excellent resource for data visualization:

  - Tamara Munzner’s Web Site: http://www.cs.ubc.ca/~tmm/

Today’s Goal

Story: I want to show gene expression changes over time as a network diagram.

(Figure: sample yeast network; nodes are labeled with systematic ORF names such as YPL201C and YOR327C)

What is Great Visualization…?

Design is complicated, because humans are complicated. Design is a process to avoid bad designs.

Mike Bostock (New York Times Visualization Team; creator of D3.js)

It is hard to generalize the design process, but we can avoid pitfalls by following some basic rules.

Avoid Chartjunk

Edward Tufte

http://en.wikipedia.org/wiki/File:Chartjunk-example.svg

Every pixel should carry information.

Edward Tufte

Avoid Data Overload

• Mapping too many attributes makes your visualization awful!

• It is hard to see the overall trend of your data sets if too many channels are used in an image

“Great Artists Steal…”

(Figure: example yeast network with nodes labeled by gene names such as GAL1, GAL4, GAL80, GCN4, and PHO4)

- Map gene expression values to node color

- Avoid using additional colors in other components (edges/labels)

- If necessary, map other data to non-overlapping visual properties (e.g., edge score to edge width)
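Mapping a numeric edge score to width is a continuous mapping: the range of scores is rescaled linearly onto a range of widths. A minimal sketch, assuming hypothetical width bounds of 1 to 8 and clamping scores outside the chosen data range; the function name `score_to_width` is illustrative and not part of Cytoscape, and the edge scores in the usage example are made up.

```python
def score_to_width(score, score_min, score_max, width_min=1.0, width_max=8.0):
    """Linearly map an edge score onto a line width, clamping out-of-range scores."""
    if score_max == score_min:
        return width_min  # degenerate range: every edge gets the same width
    # Clamp so that outliers do not produce absurdly thin or thick edges
    clamped = min(max(score, score_min), score_max)
    fraction = (clamped - score_min) / (score_max - score_min)
    return width_min + fraction * (width_max - width_min)


if __name__ == "__main__":
    # Made-up confidence scores, for illustration only
    edge_scores = {("GAL4", "GAL1"): 0.9, ("GAL4", "GAL7"): 0.5, ("MIG1", "GAL4"): 0.2}
    lo, hi = min(edge_scores.values()), max(edge_scores.values())
    for edge, score in edge_scores.items():
        print(edge, round(score_to_width(score, lo, hi), 2))
```

Because width is a separate visual channel from color, this mapping can coexist with an expression-to-color mapping on the nodes without the two competing for attention.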

Part 1: Session File and Basic Navigation

Cytoscape 3.1 Desktop

- Toolbar
- Network Panel
- Bird’s Eye View
- Table Browser
- Network Views

Table Browser

- Table Tabs
- Local Column
- Shared Column
- List Data (values in [ ])

Session File

- Snapshot of your workspace:
  - Networks
  - Tables
  - Visual Styles
  - System Properties

Open a Session

- Click the folder icon
- Or, File → Open

Exercise 1: Loading a session

Navigation

- Pan: Middle-Click + Drag, or Command + Left-Click + Drag on Mac
- Zoom
  - IN: Mouse Wheel UP
  - OUT: Mouse Wheel DOWN
- Selection: Left-Click and Drag
- Fit to Window
  - Selected region
  - Entire network

First Neighbors of Nodes

Ctrl+6

Create New Sub-Network From Selection

Ctrl+N

- Ctrl (Command on Mac) + G
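Selecting first neighbors and creating a sub-network from the selection can be expressed as two small graph operations on an edge list. A minimal sketch, assuming edges are treated as undirected when looking for neighbors and that the new network keeps every edge whose two endpoints are both selected (an induced subgraph); the function names are hypothetical and are not Cytoscape's API.

```python
def first_neighbors(edges, selected):
    """Return the selected nodes plus every node sharing an edge with one of them."""
    expanded = set(selected)
    for source, target in edges:
        # Treat edges as undirected: either endpoint pulls in the other
        if source in selected:
            expanded.add(target)
        if target in selected:
            expanded.add(source)
    return expanded


def subnetwork(edges, nodes):
    """Keep only edges whose endpoints are both in `nodes` (induced subgraph)."""
    return [(s, t) for s, t in edges if s in nodes and t in nodes]


if __name__ == "__main__":
    edges = [("GAL4", "GAL1"), ("GAL4", "GAL7"), ("GAL80", "GAL4"), ("MIG1", "SUC2")]
    selection = first_neighbors(edges, {"GAL1"})
    print(sorted(selection))
    print(subnetwork(edges, selection))
```

Note that `first_neighbors` checks membership against the original selection, so one call expands by exactly one step; calling it again on its result would add second neighbors.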

Part 2: Data Import

Network Data Formats

- SIF
- GML
- XGMML
- GraphML
- BioPAX
- PSI-MI
- SBML
- KGML (KEGG)
- Excel
- Text Table (CSV, Tab-delimited)
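SIF (Simple Interaction Format) is the simplest format in this list: each line holds a source node, an interaction type, and one or more target nodes, and a line with a single name declares a node without edges. A minimal reader sketch; `parse_sif` is a hypothetical helper, and the rule of splitting on tabs when a line contains one (so node names may contain spaces) and on any whitespace otherwise is an assumption of this sketch. The interaction types `pd` (protein-DNA) and `pp` (protein-protein) in the example are conventional labels.

```python
def parse_sif(text):
    """Parse SIF text into (nodes, edges); edges are (source, interaction, target)."""
    nodes, edges = set(), []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # Tabs take priority so names may contain spaces; otherwise split on whitespace
        fields = [f.strip() for f in line.split("\t")] if "\t" in line else line.split()
        source = fields[0]
        nodes.add(source)
        if len(fields) == 1:
            continue  # a lone name is a node without interactions
        if len(fields) == 2:
            raise ValueError(f"interaction type without target: {raw_line!r}")
        interaction = fields[1]
        for target in fields[2:]:
            nodes.add(target)
            edges.append((source, interaction, target))
    return nodes, edges


if __name__ == "__main__":
    sif = "GAL4 pd GAL1 GAL7 GAL10\nGAL80 pp GAL4\nHSP42\n"
    nodes, edges = parse_sif(sif)
    print(sorted(nodes))
    print(edges)
```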

BRCA1

- NCBI Gene ID: 672
- On Chromosome 17
- GO Terms: DNA Repair, Cell Cycle, DNA Binding
- Ensembl ID: ENSG00000012048

Data Tables for Cytoscape

- Example:
  - Numeric
    - Gene expression profiles
    - Network statistics calculated in other applications, such as R
    - Confidence scores for edges
  - Text (or categorical)
    - GO annotation for genes
    - List of genes related to disease X
    - Targets of FDA-approved drugs
    - Genes on KEGG Pathway Y
    - Clusters / groups / communities calculated in external programs
    - …

Your Data Sets

- Anything saved as a table can be loaded into Cytoscape:
  - Excel
  - Tab-delimited document
  - CSV
- As long as a proper mapping key is available, Cytoscape can map the table to your networks.

Mapping Key in the Network

Mapping Key in the Table
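Conceptually, importing a table is a join: the key column of each row is matched against a key in the network (for example, the node name), and the remaining columns become node attributes. A minimal sketch with the standard `csv` module; the column names `gene` and `expression` and the helper `import_node_table` are hypothetical, values are kept as text, and rows whose key matches no node are reported instead of imported.

```python
import csv
import io


def import_node_table(node_names, table_text, key_column):
    """Attach table columns to nodes whose name equals the row's key value."""
    attributes = {name: {} for name in node_names}
    unmatched = []
    for row in csv.DictReader(io.StringIO(table_text)):
        key = row.pop(key_column)
        if key in attributes:
            attributes[key].update(row)  # remaining columns become node attributes
        else:
            unmatched.append(key)  # no node in the network carries this key
    return attributes, unmatched


if __name__ == "__main__":
    network_nodes = ["GAL1", "GAL4", "GAL80"]
    table = "gene,expression\nGAL1,2.3\nGAL4,0.9\nPHO5,-1.1\n"
    attrs, missing = import_node_table(network_nodes, table, key_column="gene")
    print(attrs)
    print("not in network:", missing)
```

The unmatched list is the practical point: if many rows fail to match, the network and the table probably use different identifier types (for example, gene symbols vs. systematic ORF names).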

Exercise 2: Loading network and tables

Part 3: Visualization

Layouts

Automatic Layout

- Choose a proper algorithm
  - Tree-like data: Hierarchical Layout
  - Scale-free network: Force-directed
  - Circular process: Circular Layout

- Tweak parameters if necessary

Manual Layout

- Tweak the result of an automatic layout
  - Scale
  - Align
  - Rotate
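Of the layouts above, the circular layout is simple enough to sketch directly: nodes are placed at equal angular spacing on a circle. The second function is a rough analogue of the manual Scale and Rotate tools, applied about the centroid of the positions. Radius, start angle, and node order are arbitrary assumptions here; Cytoscape's own layouts have their own parameters, so this only illustrates the geometry.

```python
import math


def circular_layout(nodes, radius=100.0, center=(0.0, 0.0)):
    """Place nodes evenly on a circle; returns {node: (x, y)}."""
    n = len(nodes)
    positions = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / n  # equal angular spacing, starting at angle 0
        positions[node] = (center[0] + radius * math.cos(angle),
                           center[1] + radius * math.sin(angle))
    return positions


def scale_and_rotate(positions, factor=1.0, degrees=0.0):
    """Scale, then rotate counterclockwise, all positions about their centroid."""
    if not positions:
        return {}
    cx = sum(x for x, _ in positions.values()) / len(positions)
    cy = sum(y for _, y in positions.values()) / len(positions)
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    result = {}
    for node, (x, y) in positions.items():
        dx, dy = (x - cx) * factor, (y - cy) * factor
        result[node] = (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
    return result


if __name__ == "__main__":
    pos = circular_layout(["GAL1", "GAL4", "GAL7", "GAL80"], radius=50.0)
    print(scale_and_rotate(pos, factor=2.0, degrees=45.0))
```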

Exercise 3: Apply layouts

Visual Style

- Collection of mappings from Attributes to Visual Properties

Visual Styles

- Defaults + Mappings
  - Expression values to node color
  - Gene function to node shape
  - Interaction detection method to edge line type
  - Confidence score to edge width
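The "Defaults + Mappings" idea can be modeled as a dictionary of default visual values plus a list of mappings, each reading one table column and writing one visual property. A minimal sketch with a discrete mapping (gene function to node shape); the property names, column names, functional categories, and class names below are hypothetical and do not correspond to Cytoscape's internal identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class DiscreteMapping:
    """Map specific column values to specific visual property values."""
    column: str
    visual_property: str
    table: dict

    def apply(self, row):
        return self.table.get(row.get(self.column))  # None if the value is unmapped


@dataclass
class VisualStyle:
    defaults: dict
    mappings: list = field(default_factory=list)

    def render(self, row):
        """Start from the defaults, then let each mapping override its property."""
        props = dict(self.defaults)
        for mapping in self.mappings:
            value = mapping.apply(row)
            if value is not None:
                props[mapping.visual_property] = value
        return props


if __name__ == "__main__":
    style = VisualStyle(
        defaults={"node_shape": "ellipse", "node_color": "#CCCCCC"},
        mappings=[DiscreteMapping("function", "node_shape",
                                  {"transcription factor": "diamond", "kinase": "triangle"})],
    )
    print(style.render({"name": "GAL4", "function": "transcription factor"}))
    print(style.render({"name": "GAL1", "function": "galactokinase"}))
```

Nodes whose value is not listed in the mapping (here, GAL1's "galactokinase") simply keep the default, which is why a sensible default matters as much as the mapping itself.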

Core Idea: Data Controls The View

Data Controls The View

• Photoshop / Illustrator

  • You control the pixels and objects on the display

• Data Visualization Tools (including Cytoscape)

  • Data points are mapped to visual properties

    • Color

    • Size

Expression Values To Node Colors

Discrete Mapping Editor

Continuous Mapping Editor
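A continuous mapping, such as expression values to node colors, interpolates between control points. A minimal sketch of a three-point gradient, assuming the values are expression log ratios: negative values blend toward blue, zero is white, and positive values blend toward red. The control points at -2, 0, and +2, the colors, and the function names are assumptions chosen for illustration; values beyond the end points are clamped to the end colors.

```python
def interpolate_color(value, points):
    """Piecewise-linear color mapping; `points` is a sorted list of (value, (r, g, b))."""
    if value <= points[0][0]:
        return points[0][1]  # clamp below the first control point
    if value >= points[-1][0]:
        return points[-1][1]  # clamp above the last control point
    for (v0, c0), (v1, c1) in zip(points, points[1:]):
        if v0 <= value <= v1:
            t = (value - v0) / (v1 - v0)
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))
    raise ValueError("points must be sorted by value")


def to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 ints as #RRGGBB."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


BLUE_WHITE_RED = [(-2.0, (0, 0, 255)), (0.0, (255, 255, 255)), (2.0, (255, 0, 0))]

if __name__ == "__main__":
    for expression in (-3.0, -1.0, 0.0, 1.0, 2.5):
        print(expression, to_hex(interpolate_color(expression, BLUE_WHITE_RED)))
```

A diverging blue-white-red scale centered on zero suits log ratios because "no change" gets the most neutral color; a discrete mapping would instead assign one fixed color per category.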

Exercise 4: Create New Visual Style

Part 4: Web Services (Optional)

Cytoscape Ecosystem

Dawn of Web-Based Visualization

Cytoscape Family

- Cytoscape 3.1.0: Desktop application
- cytoscape.js: Library for web applications

Cytoscape.js: Network Visualization Library Running on Web Browsers

What is cytoscape.js?

A JavaScript library for network visualization, not a web application!

You need to write some code to use it in web browsers…

Cytoscape (Desktop): For Users

- Complete desktop application for network analysis and visualization
- Written in Java
- Expandable by Apps

Cytoscape.js: For Developers

- A JavaScript library for network visualization, not a web application
- Written in JavaScript
- Expandable by Extensions

Cytoscape Desktop

- Data Integration
- Analysis
- Visualization: Layout, Visual Style

Cytoscape.js (Web)

- Minimal Analysis
- Visualization: Layout, Visual Style

Integration with Cytoscape

New in Cytoscape 3.1.0: Export Networks and Visual Styles to Cytoscape.js Format
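Cytoscape.js describes a graph as JSON `elements`: node objects whose `data` holds an `id`, and edge objects whose `data` holds an `id` plus `source` and `target` node IDs. A minimal sketch of building that structure from an edge list in Python; it covers only the elements, not the positions or visual style that the Cytoscape 3.1.0 export can also include, and the extra `name` field, the generated edge IDs, and the helper name `to_cyjs_elements` are assumptions for illustration.

```python
import json


def to_cyjs_elements(edges):
    """Build a Cytoscape.js-style elements object from (source, target) pairs."""
    # dict.fromkeys removes duplicates while keeping first-seen order
    node_ids = list(dict.fromkeys(node for edge in edges for node in edge))
    return {
        "elements": {
            "nodes": [{"data": {"id": n, "name": n}} for n in node_ids],
            "edges": [
                {"data": {"id": f"e{i}", "source": s, "target": t}}
                for i, (s, t) in enumerate(edges)
            ],
        }
    }


if __name__ == "__main__":
    print(json.dumps(to_cyjs_elements([("GAL4", "GAL1"), ("GAL80", "GAL4")]), indent=2))
```

In a web page, the `elements` object would then be passed to the `cytoscape({ ... })` initializer along with a container and a style; that JavaScript side is the "some code" mentioned earlier.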

Future

Cytoscape Cyberinfrastructure

(Figure: Cytoscape Desktop and web browsers connected over the Internet to Service 1, Service 2, and NDEx (DB))

Getting Help

- Two Google Groups
  - cytoscape-discuss@googlegroups.com
  - cytoscape-helpdesk@googlegroups.com
- ANY question is OK!

Further Readings


• My presentation slides

• http://www.slideshare.net/keiono

• (This deck of slides will be uploaded tonight)

Further Readings 1: Introduction to Network Biology

- Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part I. Experimental Techniques and Databases. PLoS Comput Biol 3(3): e42. doi:10.1371/journal.pcbi.0030042

- Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners. PLoS Comput Biol 3(4): e43. doi:10.1371/journal.pcbi.0030043

Further Readings 2: Overview of Cytoscape Apps (Plugins)

- Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, Pico AR, Bader GD, Ideker T (2012) A travel guide to Cytoscape plugins. Nature Methods 9(11): 1069-1076.

- Sample Protocol (based on 2.x)
  - Cline MS, et al. (2007) Integration of biological networks and gene expression data using Cytoscape. Nature Protocols 2: 2366-2382.

Further Readings 3

- Cytoscape Tutorial Booklet: Analysis and Visualization of Biological Networks with Cytoscape

- http://www.rbvi.ucsf.edu/Outreach/Workshops/ISMBTutorial.pdf

2014 Keiichiro Ono kono@ucsd.edu