Proteomics and systems biology Bing Zhang, Ph.D. Associate Professor Department of Biomedical Informatics Vanderbilt University School of Medicine [email protected]


No man is an island, entire of itself; every man is a piece of the continent, a part of the main. If a clod be washed away by the sea, Europe is the less, as well as if a promontory were, as well as if a manor of thy friend's or of thine own were. Any man's death diminishes me because I am involved in mankind; and therefore never send to know for whom the bell tolls; it tolls for thee.

John Donne, Devotions upon Emergent Occasions, 1624

No man is an island

The impact of a specific genetic abnormality is not restricted to the activity of the gene product that carries it, but can spread along the links of the network and alter the activity of gene products that otherwise carry no defects.

Barabasi et al., Nature Reviews Genetics, 2011

[Figure labels: Disease; Pathways/Networks]

No gene is an island

Omics opportunities and challenges

• Genome: CNV, mutation, methylation
• Transcriptome: expression, splicing
• Proteome: expression, modification

[Figure: elephant illustration; from data for individual genes to an understanding of biological systems]

Outline

• Pathway analysis
  - Motivation
  - Pathway databases
  - Over-representation analysis
  - Gene set enrichment analysis
• Network analysis
  - Network mapping
  - Network representation
  - Network properties
  - Network-based applications

Omics studies generate gene/protein lists

• Genomics
  - Genome-wide association study (GWAS)
  - Whole genome or exome sequencing
• Transcriptomics
  - mRNA profiling
    - Microarray
    - RNA-Seq
  - Protein-DNA interaction
    - Chromatin immunoprecipitation (ChIP)-Seq
• Proteomics
  - Protein profiling
    - LC-MS/MS
  - Protein-protein interaction
    - Yeast two-hybrid
    - Affinity pull-down/LC-MS/MS

Microarray, RNA-Seq, and proteomics data are reduced, by differential expression analysis or clustering, to lists of genes or proteins with potential biological interest.

[Figure: differential expression plot; axes log2(ratio) and -log10(p value)]

Example list: IPI00375843 IPI00171798 IPI00299485 IPI00009542 IPI00019568 IPI00060627 IPI00168262 IPI00082931 IPI00025084 IPI00412546 IPI00165528 IPI00043992 IPI00384992 IPI00006991 IPI00021885 ……

Different ways of data organization

Gene-centric organization:

• Gene 1
  - Pathway 1
  - Pathway 2
• Gene 2
  - Pathway 1
  - Pathway 3
• Gene 3
  - Pathway 2
  - Pathway 3
• Gene 4
  - Pathway 3

Pathway-centric organization:

• Pathway 1
  - Gene 1
  - Gene 2
• Pathway 2
  - Gene 1
  - Gene 3
• Pathway 3
  - Gene 2
  - Gene 3
  - Gene 4
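The two organizations above hold the same information, and one can be derived from the other by inverting the mapping. A minimal Python sketch, using the toy genes and pathways from the slide (the names `gene_to_pathways` and `invert` are illustrative, not taken from any tool):

```python
from collections import defaultdict

# Gene-centric view, copied from the slide.
gene_to_pathways = {
    "Gene 1": ["Pathway 1", "Pathway 2"],
    "Gene 2": ["Pathway 1", "Pathway 3"],
    "Gene 3": ["Pathway 2", "Pathway 3"],
    "Gene 4": ["Pathway 3"],
}


def invert(mapping):
    """Turn a gene -> pathways mapping into a pathway -> genes mapping."""
    inverted = defaultdict(list)
    for gene, pathways in mapping.items():
        for pathway in pathways:
            inverted[pathway].append(gene)
    return dict(inverted)


# Pathway-centric view, the form that pathway analysis works with.
pathway_to_genes = invert(gene_to_pathways)
for pathway in sorted(pathway_to_genes):
    print(pathway, pathway_to_genes[pathway])
```

The pathway-centric form is the one used by the enrichment methods later in the lecture, since each pathway becomes a gene set that can be compared with a gene list of interest.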


Advantages of pathway analysis

• Better interpretation
  - From interesting genes to interesting biological themes
• Improved robustness
  - Robust against noise in the data
• Improved sensitivity
  - Detecting minor but concordant changes in a pathway


Pathway databases

• Databases
  - BioCarta (http://www.biocarta.com/genes/index.asp)
  - KEGG (http://www.genome.jp/kegg/pathway.html)
  - MetaCyc (http://metacyc.org)
  - Pathway Commons (http://www.pathwaycommons.org)
  - Reactome (http://www.reactome.org)
  - STKE (http://stke.sciencemag.org/cm)
  - Signaling Gateway (http://www.signaling-gateway.org)
  - WikiPathways (http://www.wikipathways.org)
• Limitations
  - Limited coverage
  - Inconsistency among different databases
  - Relationships between pathways are not defined

Gene Ontology

• Structured, precisely defined, controlled vocabulary for describing the roles of genes and gene products
• Three organizing principles: molecular function, biological process, and cellular component
  - Example: dopamine receptor D2, the product of human gene DRD2
  - Molecular function: dopamine receptor activity
  - Biological process: synaptic transmission
  - Cellular component: plasma membrane
• Terms in GO are linked by several types of relationships
  - is_a (e.g. plasma membrane is_a membrane)
  - part_of (e.g. membrane is part_of cell)
  - has_part
  - regulates
  - occurs_in
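Because GO terms are linked by relationships such as is_a and part_of, a term implies every ancestor reachable by following those links upward. A minimal Python sketch of that traversal, using only the two example relationships given for GO (the `relations` list and the `ancestors` helper are illustrative names, not part of any GO software):

```python
# Toy fragment of the ontology: (child term, relationship, parent term).
relations = [
    ("plasma membrane", "is_a", "membrane"),
    ("membrane", "part_of", "cell"),
]


def ancestors(term, relations):
    """Collect every term reachable by following links from child to parent."""
    parents = {}
    for child, _relationship, parent in relations:
        parents.setdefault(child, set()).add(parent)

    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("plasma membrane", relations)))
```

This sketch treats all relationship types alike; which relationship types should propagate annotations is a design choice that it does not address.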

Annotating genes using GO terms

• Two types of GO annotations
  - Electronic annotation
  - Manual annotation
• All annotations must:
  - be attributed to a source
  - indicate what evidence was found to support the GO term-gene/protein association
• Types of evidence codes
  - Experimental codes: IDA, IMP, IGI, IPI, IEP
  - Computational codes: ISS, IEA, RCA, IGC
  - Author statement: TAS, NAS
  - Other codes: IC, ND
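Evidence codes make it possible to filter GO annotations by how they were derived. A minimal Python sketch, assuming annotations are held as simple (gene, term, evidence code) tuples; the gene names `GENE_A` to `GENE_C` and the helper `evidence_group` are hypothetical, while the code groups are the ones listed for GO evidence codes in this lecture:

```python
# Evidence code groups as listed in the lecture.
EVIDENCE_GROUPS = {
    "experimental": {"IDA", "IMP", "IGI", "IPI", "IEP"},
    "computational": {"ISS", "IEA", "RCA", "IGC"},
    "author statement": {"TAS", "NAS"},
    "other": {"IC", "ND"},
}


def evidence_group(code):
    """Return the group an evidence code belongs to, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


# Hypothetical annotations, for illustration only.
annotations = [
    ("GENE_A", "synaptic transmission", "IDA"),
    ("GENE_B", "synaptic transmission", "IEA"),
    ("GENE_C", "plasma membrane", "TAS"),
]

# Keep only annotations supported by experimental evidence.
experimental_only = [a for a in annotations if evidence_group(a[2]) == "experimental"]
print(experimental_only)
```

Restricting to experimental codes trades coverage for reliability; whether that trade is worthwhile depends on the analysis.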

Annotating genes using GO terms

Example: the GO term "synaptic transmission" (226 human genes) is a child of the parent term "cell-cell signaling". Genes annotated to it include:

……
DLGAP1   discs, large (Drosophila) homolog-associated protein 1
DLGAP2   discs, large (Drosophila) homolog-associated protein 2
DNM1     dynamin 1
DOC2A    double C2-like domains, alpha
DRD1     dopamine receptor D1
DRD1IP   dopamine receptor D1 interacting protein
DRD2     dopamine receptor D2
DRD3     dopamine receptor D3
DRD4     dopamine receptor D4
DRD5     dopamine receptor D5
……

Access GO

• Downloads (http://www.geneontology.org)
  - Ontologies
    - http://www.geneontology.org/page/download-ontology
  - Annotations
    - http://www.geneontology.org/page/download-annotations
• Web-based access
  - AmiGO: http://www.godatabase.org
  - QuickGO: http://www.ebi.ac.uk/QuickGO

Coverage of GO annotations

         Homo sapiens         Mus musculus
         #term    #gene       #term    #gene
GO/BP    6502     15228       6227     15709
GO/MF    3144     16389       2961     17287
GO/CC     947     16765        882     16801

Other “gene sets”

• Transcription factor targets
• miRNA targets
• Protein complexes
• Cytogenetic bands
• Disease-related genes
• Drug targets
• Experimentally-derived gene sets
• ……


Over-representation analysis: concept

Is the observed overlap significantly larger than the expected value?

[Figure: workflow. All identified proteins (1733) are filtered for significant proteins to give the differentially expressed protein list (260 proteins), which is compared with the category extracellular space (83 genes). Diagram labels: 1305, 180 and 83 (annotated); Observed: 22; Random: 9.2.]

Protein IDs shown: IPI00375843 IPI00171798 IPI00299485 IPI00009542 IPI00019568 IPI00060627 IPI00168262 IPI00082931 IPI00025084 IPI00412546 IPI00165528 IPI00043992 IPI00384992 IPI00006991 IPI00021885 ……

Gene symbols shown: MMP9 SERPINF1 A2ML1 F2 FN1 LYZ TNXB FGG MPO FBLN1 THBS1 HDLBP GSN FBN1 CA2 P11 CCL21 FGB ……

Over-representation analysis: hypergeometric test

                         Significant proteins   Non-significant proteins   Total
Proteins in the group    k                      j-k                        j
Other proteins           n-k                    m-n-j+k                    m-j
Total                    n                      m-n                        m

Hypergeometric test: given a total of m proteins where j proteins are in the functional group, if we pick n proteins randomly, what is the probability of having k or more proteins from the group?

\[ p = \sum_{i=k}^{\min(n,j)} \frac{\binom{m-j}{n-i}\binom{j}{i}}{\binom{m}{n}} \]

[Figure: Venn diagram of the n selected proteins and the j proteins in the group, within m total proteins; the observed overlap is k]

Zhang et al. Nucleic Acids Res. 33:W741, 2005
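The hypergeometric p-value can be computed directly from its definition: sum, for every overlap i from k up to min(n, j), the number of ways to draw i group proteins and n - i other proteins, divided by the number of ways to draw n proteins from m. A minimal Python sketch using the symbols m, j, n and k as defined for the test; the numeric values below are hypothetical and are not taken from the lecture's example:

```python
from math import comb


def expected_overlap(m, j, n):
    """Expected number of group proteins among n proteins drawn at random."""
    return n * j / m


def hypergeom_pvalue(m, j, n, k):
    """P(overlap >= k) when n proteins are drawn from m, of which j are in the group."""
    upper = min(n, j)
    favorable = sum(comb(m - j, n - i) * comb(j, i) for i in range(k, upper + 1))
    return favorable / comb(m, n)


# Hypothetical numbers, for illustration only.
m, j, n, k = 1000, 50, 100, 12
print(expected_overlap(m, j, n))   # 5.0
print(hypergeom_pvalue(m, j, n, k))
```

In practice the test is run once per functional category, so the resulting p-values usually need a multiple-testing adjustment before interpretation.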


Over-representation analysis: WebGestalt

WebGestalt (http://www.webgestalt.org)

• 8 organisms: Human, Mouse, Rat, Dog, Fruitfly, Worm, Zebrafish, Yeast
• Input: 196 ID types with mapping to Entrez Gene ID (~200 ID types)
  - Microarray probe IDs: Affymetrix, Agilent, Codelink, Illumina (e.g. 92546_r_at, 92545_f_at, 96055_at, 102105_f_at, 102700_at, ……)
  - Gene IDs: Gene Symbol, GenBank, Ensembl Gene, RefSeq Gene, UniGene, Entrez Gene, SGD, MGI, Flybase ID, Wormbase ID, ZFIN
  - Protein IDs: UniProt, IPI, RefSeq Peptide, Ensembl Peptide
  - Genetic variation IDs: dbSNP
• Gene sets: 59,278 functional categories with genes identified by Entrez Gene IDs (~60K gene sets)
  - Gene Ontology: Biological Process, Molecular Function, Cellular Component
  - Pathway: KEGG, Pathway Commons, WikiPathways
  - Network module: transcription factor targets, microRNA targets, protein interaction modules
  - Disease and drug: disease association genes, drug association genes
  - Chromosomal location: cytogenetic bands
• Statistical analysis
• Usage, Jul. 1, 2013 to Jun. 30, 2014: 49,136 visits from 18,213 visitors

Zhang et al. Nucleic Acids Res. 33:W741, 2005
Wang et al. Nucleic Acids Res. 41:W77, 2013

WebGestalt output: an enriched pathway

[Figure: TGF Beta Signaling Pathway; legend: input genes]

WebGestalt output: Enriched GO terms

[Figure: enriched GO terms; example: response to unfolded proteins, 12 genes, adjp=1.32e-08]

Limitations of over-representation analysis

• Does not account for the order of genes in the significant gene list
• Arbitrary thresholding may lead to loss of information


Gene Set Enrichment Analysis: concept

• Do genes in a gene set tend to be located at the top or bottom of the ranked gene list?

Gene Set Enrichment Analysis: method

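Running-sum statistic, computed while walking down the ranked gene list:

• +1/k for each gene that is in the gene set S
• -1/(n-k) for each gene that is not in S

k: number of genes in the gene set S
n: number of all genes in the ranked gene list

Subramanian et al. PNAS 102:15545, 2005
http://www.broad.mit.edu/gsea/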
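The +1/k and -1/(n-k) increments define a running sum over the ranked list; the enrichment score is taken here as the largest deviation of that sum from zero. A minimal Python sketch of this unweighted form (the published method can also weight hits by their ranking metric, which is not shown; the gene names and the function `enrichment_score` are hypothetical):

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic: +1/k at a hit, -1/(n-k) at a miss."""
    n = len(ranked_genes)
    members = set(gene_set) & set(ranked_genes)
    k = len(members)
    if k == 0 or k == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / k if gene in members else -1.0 / (n - k)
        # Track the running-sum value that lies farthest from zero.
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]   # hypothetical ranked list
print(enrichment_score(ranked, {"g1", "g2"}))   # set concentrated at the top
print(enrichment_score(ranked, {"g5", "g6"}))   # set concentrated at the bottom
```

A positive score indicates a set concentrated near the top of the list and a negative score a set concentrated near the bottom. Assessing significance requires a null distribution, which this sketch does not compute.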


Exercise

• http://bioinfo.vanderbilt.edu/zhanglab/?q=node/410
• Use WebGestalt (http://www.webgestalt.org) to identify GO categories and pathways enriched in proteins up-regulated in the poor-prognosis colorectal cancer (CRC) subtype (Zhang et al. Nature, 513:382-387, 2014).

Proteomics and systems biology (II)


Mapping protein-protein interaction networks

• Experimental
  - Yeast two-hybrid
  - Tandem affinity purification
• Computational (based on evidence of association)
  - Gene fusion (form part of a single protein in other organisms)
  - Ortholog interaction (orthologs interact in other organisms)
  - Phylogenetic profiling (co-existence in different organisms)
  - Microarray gene co-expression (co-expression in various conditions)
• PPI data integration
  - Voting
  - Probabilistic integration (Jansen et al. Science 302:449, 2003)

[Figure: phylogenetic profiling. Valencia et al. Curr. Opin. Struct. Biol, 12:368, 2002]

PPI databases

• Database of Interacting Proteins (DIP): http://dip.doe-mbi.ucla.edu/
• The Molecular INTeraction database (MINT): http://mint.bio.uniroma2.it/mint/
• The Biomolecular Object Network Databank (BOND): http://bond.unleashedinformatics.com/
• The General Repository for Interaction Datasets (BioGRID): http://www.thebiogrid.org/
• Human Protein Reference Database (HPRD): http://www.hprd.org
• Online Predicted Human Interaction Database (OPHID): http://ophid.utoronto.ca
• iRef: http://wodaklab.org/iRefWeb
• HI-II-14: http://interactome.dfci.harvard.edu/H_sapiens/
• The International Molecular Exchange Consortium (IMEX): http://www.imexconsortium.org

Mapping transcriptional regulatory networks

• Experimental
  - Chromatin immunoprecipitation (ChIP)
    - ChIP-chip
    - ChIP-seq
• Computational
  - Promoter sequence analysis
  - Reverse engineering from gene expression data
• Public databases
  - Transfac (http://www.gene-regulation.com)
  - MSigDB (http://www.broadinstitute.org/gsea/msigdb)
  - hPDI (http://bioinfo.wilmer.jhu.edu/PDI/)

Mapping metabolic networks

• Public databases
  - KEGG (http://www.genome.jp/kegg/)
  - MetaCyc (http://metacyc.org/)


Graph representation of networks

• Graph: a set of objects called nodes (or vertices) connected by links called edges. In mathematics and computer science, a graph is the basic object of study in graph theory.

[Figure: RNA polymerase II drawn as a graph, with a node and an edge labelled. Cramer et al. Science 292:1863, 2001]

Undirected graph vs directed graph

• Protein interaction network (Krogan et al. Nature 440:637, 2006)
  - Nodes: proteins
  - Edges: physical interactions
  - Undirected
• Transcriptional regulatory network (Shen-Orr et al. Nat Genet, 31:64, 2002)
  - Nodes: genes
  - Edges: transcriptional regulation
  - Directed: TF -> target gene
• Metabolic network (Ravasz et al. Science 297:1551, 2002)
  - Nodes: metabolites
  - Edges: enzymes
  - Directed: substrate -> product

Unweighted graph vs weighted graph

• Protein interaction network (Krogan et al. Nature 440:637, 2006)
  - Nodes: proteins
  - Edges: physical interactions
  - Unweighted
• Gene co-expression network
  - Nodes: genes
  - Edges: co-expression level
  - Weighted; becomes unweighted after correlation filtering (>0.8)
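Correlation filtering turns a weighted co-expression network into an unweighted one: every gene pair gets a correlation weight, and only pairs above the cutoff are kept as edges. A minimal Python sketch, assuming Pearson correlation and a signed cutoff of 0.8 (the slide gives only the threshold, so both choices are assumptions; the expression values and gene names are hypothetical):

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression profiles across four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}

# Weighted network: every gene pair carries its correlation as the edge weight.
weighted_edges = {
    (g1, g2): pearson(expression[g1], expression[g2])
    for g1, g2 in combinations(sorted(expression), 2)
}

# Unweighted network: keep only the pairs that pass the cutoff.
unweighted_edges = {pair for pair, r in weighted_edges.items() if r > 0.8}
print(unweighted_edges)
```

Filtering discards the weights of the retained edges as well as the weaker edges, so the cutoff should be chosen with the downstream analysis in mind.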


Degree, path, shortest path

• Degree: the number of edges adjacent to a node. A simple measure of node centrality. In a directed graph, in degree counts incoming edges and out degree counts outgoing edges.
• Path: a sequence of nodes such that from each node there is an edge to the next node in the sequence.
• Shortest path: a path between two nodes such that the sum of the distances of its constituent edges is minimized.

[Figure examples: YDL176W, degree 3; MetR, out degree 3 and in degree 1]
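Degree and shortest path are straightforward to compute from an edge list. A minimal Python sketch for a directed graph in which every edge has distance 1, so that breadth-first search finds a shortest path (the node names and edges are hypothetical; node "A" is given out degree 3 and in degree 1 to mirror the MetR example):

```python
from collections import deque

# Hypothetical directed edges as (source, target) pairs.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "A"), ("C", "E")]


def degrees(edges, node):
    """Return (in degree, out degree) of a node in a directed edge list."""
    in_degree = sum(1 for _, target in edges if target == node)
    out_degree = sum(1 for source, _ in edges if source == node)
    return in_degree, out_degree


def shortest_path(edges, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, []).append(target)

    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(degrees(edges, "A"))
print(shortest_path(edges, "A", "E"))
```

For weighted graphs, where edge distances differ, breadth-first search is no longer sufficient and an algorithm such as Dijkstra's would be needed instead.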


Properties of complex networks

• Scale-free
• Modular
• Hierarchical
• Small world

Obama vs Lady Gaga: who is more influential?

          Twitter followers (in degree)   Twitter following (out degree)
Obama     7,035,548                       701,301
Gaga      8,873,525                       144,263
Eminem    3,509,469                       0

A second set of counts is shown for the same accounts (dates not given):

          Twitter followers (in degree)   Twitter following (out degree)
Obama     28,490,739                      664,606
Gaga      35,158,014                      136,511
Eminem    13,946,813                      0

Role of hubs in biological networks

• Based on data from the model organisms S. cerevisiae and C. elegans, hubs:
  - Correspond to essential genes
  - Tend to be older and to have evolved more slowly
  - Tend to be more abundant
  - Have a larger diversity of phenotypic outcomes resulting from their deletion

Vidal et al. Cell, 144:986, 2011

Scale-free topology and network robustness

• Robust against random damage

• Yet fragile against selective damage (for example, targeted removal of hubs)
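The contrast between random and selective damage can be seen by removing nodes and measuring the largest connected component that remains. A minimal Python sketch on a toy hub-and-spoke network (a deliberately extreme stand-in for a hub-dominated topology, not a real scale-free network; the node names and the helper `largest_component` are hypothetical):

```python
def largest_component(adjacency, removed=()):
    """Size of the largest connected component after deleting the removed nodes."""
    removed = set(removed)
    seen, best = set(), 0
    for start in adjacency:
        if start in removed or start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in removed and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


# Toy network: one hub linked to six low-degree nodes.
leaves = ["n1", "n2", "n3", "n4", "n5", "n6"]
adjacency = {"hub": set(leaves), **{leaf: {"hub"} for leaf in leaves}}

print(largest_component(adjacency))                   # intact network
print(largest_component(adjacency, removed=["n1"]))   # a typical random hit
print(largest_component(adjacency, removed=["hub"]))  # selective hit on the hub
```

Most nodes in such a network are low-degree, so a randomly chosen node is unlikely to be the hub and its loss barely changes connectivity, whereas removing the hub fragments the network.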


Modularity

• Modularity refers to the organization of a network into modules: groups of physically or functionally linked molecules (nodes) that work together to achieve a relatively distinct function.
• Examples
  - Transcriptional module: a set of co-regulated genes
  - Protein complex: an assembly of proteins that builds up some cellular machinery and commonly spans a dense sub-network of proteins in a protein interaction network
  - Signaling pathway: a chain of interacting proteins propagating a signal in the cell

[Figures: protein interaction modules (Palla et al, Nature, 435:841, 2005); gene co-expression modules (Shi et al, BMC Syst Biol, 4:74, 2010)]

Making sense of the hairballs using network hierarchies

• Nodes ordered in one dimension on the basis of network structure
• Data visualized as tracks

www.netgestalt.org
Shi et al., Nature Methods, 2013


Network visualization

• Cytoscape (http://www.cytoscape.org)
• Network visualization tools: Gehlenborg et al. Nature Methods, 7:S56, 2010

[Figure legend]
• Node label color: protein type (red, blue, black: HNE targets, transcription factors, other proteins)
• Node size: prediction cutoff level (loose to stringent)
• Node color: protein function (cell cycle, RNA splicing, both, neither)

Network distance vs functional similarity

• Proteins that lie closer to one another in a protein interaction network are more likely to have similar functions and to be involved in similar biological processes.

Sharan et al. Mol Syst Biol, 3:88, 2007

Gene function prediction: neighborhood majority voting

Direct neighbor (local)

Iterative majority voting (global)
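Direct-neighbor majority voting assigns an unannotated protein the function that is most common among its annotated neighbors. A minimal Python sketch of this local variant only (the iterative, global variant is not shown; the protein names, the network and the annotations are hypothetical):

```python
from collections import Counter

# Hypothetical interaction network as an adjacency list.
network = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1"],
    "P3": ["P1"],
    "P4": ["P1"],
}

# Hypothetical known annotations; P1 is the protein to be predicted.
known_function = {"P2": "cell cycle", "P3": "cell cycle", "P4": "RNA splicing"}


def predict_function(protein, network, known_function):
    """Return the most common function among annotated direct neighbors, or None."""
    votes = Counter(
        known_function[neighbor]
        for neighbor in network.get(protein, [])
        if neighbor in known_function
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


print(predict_function("P1", network, known_function))
```

Ties are broken here by the order in which neighbors are listed, which is arbitrary; a real application would need an explicit tie-breaking rule.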


Gene function prediction: module-based approach

[Figure panels: module-based; direct neighbor voting]

Candidate disease gene prioritization

Kohler et al. Am J Hum Genet. 82:949, 2008


Outline

• Pathway analysis
  - Motivation
  - Pathway databases
  - Over-representation analysis
  - Gene set enrichment analysis
• Network analysis
  - Network mapping
  - Network representation
  - Network properties
  - Network-based applications

From data for individual genes to understanding of biological systems