Tim Conrad, VL Network Analysis, SS16 1 Based on slides by J Ruan (U Texas) and U von Luxburg (U Tübingen) VL Network Analysis (19401701) SS2016 Week 5 Tim Conrad AG Medical Bioinformatics Institut für Mathematik & Informatik, Freie Universität Berlin

VL Network Analysis (19401701) SS2016 Week 5 · medicalbioinformatics.de/...Analysis/...ss16_week5.pdf · Tim Conrad, VL Network Analysis, SS16 1 Based on slides by J Ruan (U Texas) and

Tim Conrad, VL Network Analysis, SS16 1

Based on slides by J Ruan (U Texas) and U von Luxburg (U Tübingen)

VL Network Analysis (19401701)

SS2016 Week 5

Tim Conrad AG Medical Bioinformatics Institut für Mathematik & Informatik, Freie Universität Berlin

Community structure

Source: M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical Review E 69, 026113 (2004).

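Modularity function (Q)

Consider edges that fall within a community or between a community and the rest of the network. Define modularity Q:

$$Q = \frac{1}{2L} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2L} \right] \delta(c_i, c_j)$$

A: adjacency matrix
L: total number of links
k_i: degree of the i-th node
c_i: label of the module to which the i-th node belongs
δ: indicator function, 1 if both nodes are in the same cluster, 0 otherwise

The term k_i k_j / 2L is the expected number of edges between i and j when the probability of an edge between two vertices is proportional to their degrees.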
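A minimal sketch of modularity, Q = (1/2L) Σ_ij [A_ij − k_i k_j / 2L] δ(c_i, c_j), with the symbols defined on this slide. It assumes an undirected, unweighted graph without self-loops, stored as a dense NumPy adjacency matrix; the function name `modularity` and the two-triangle example are illustrative, not from the slides.

```python
import numpy as np

def modularity(A, labels):
    """Modularity Q of a partition of an undirected, unweighted graph.

    A      : symmetric n x n adjacency matrix (0/1 entries, no self-loops)
    labels : length-n sequence, labels[i] = c_i (community of node i)
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                          # degrees k_i
    two_L = A.sum()                            # 2L: every link is counted twice in A
    expected = np.outer(k, k) / two_L          # k_i k_j / 2L (degree-based null model)
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    return float(((A - expected) * same).sum() / two_L)

# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

print(round(modularity(A, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571 (= 5/14)
```

Putting all nodes into one community always gives Q = 0, so a positive Q means more within-community edges than the degree-based null model predicts.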

HQcut

• Ruan & Zhang, Physical Review E 2008

• Apply Qcut to get communities with largest Q

• Recursively search for sub-communities within each community

• When to stop?
  – Q value of sub-network is small, or
  – Q is not statistically significant (estimated by a Monte-Carlo method)
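The recursion and its two stopping rules can be sketched as below. This is not Qcut or the published HQcut code. As a hypothetical stand-in for Qcut it uses Newman's leading-eigenvector bisection; the threshold `min_q`, significance level `alpha`, number of random networks `n_rand`, and the null model (degree-preserving double edge swaps) are all illustrative assumptions.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q; A is a symmetric 0/1 matrix."""
    k = A.sum(axis=1)
    two_L = A.sum()
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_L) * same).sum() / two_L)

def bisect(A):
    """Hypothetical stand-in for Qcut: split nodes by the sign of the
    leading eigenvector of the modularity matrix B = A - k k^T / 2L."""
    k = A.sum(axis=1)
    B = A - np.outer(k, k) / A.sum()
    _, vecs = np.linalg.eigh(B)             # eigenvalues come in ascending order
    labels = (vecs[:, -1] >= 0).astype(int)
    return labels, modularity(A, labels)

def rewire(A, rng, n_swaps):
    """Degree-preserving randomisation by double edge swaps."""
    A = A.copy()
    edges = [tuple(e) for e in np.argwhere(np.triu(A) > 0)]
    if len(edges) < 2:
        return A
    for _ in range(n_swaps):
        e1, e2 = rng.integers(len(edges), size=2)
        if e1 == e2:
            continue
        (a, b), (c, d) = edges[e1], edges[e2]
        # reject swaps that would create self-loops or duplicate edges
        if len({a, b, c, d}) < 4 or A[a, d] or A[c, b]:
            continue
        A[a, b] = A[b, a] = A[c, d] = A[d, c] = 0
        A[a, d] = A[d, a] = A[c, b] = A[b, c] = 1
        edges[e1], edges[e2] = (a, d), (c, b)
    return A

def hierarchical_split(A, nodes=None, min_q=0.1, alpha=0.05, n_rand=50, rng=None):
    """Recursively split a network into sub-communities (HQcut-style control flow)."""
    if rng is None:
        rng = np.random.default_rng(0)
    if nodes is None:
        nodes = np.arange(len(A))
    sub = A[np.ix_(nodes, nodes)]
    if len(nodes) < 4 or sub.sum() == 0:
        return [nodes.tolist()]
    labels, q = bisect(sub)
    # Stopping rule 1: Q of the sub-network is small (or nothing was split off)
    if q < min_q or labels.min() == labels.max():
        return [nodes.tolist()]
    # Stopping rule 2: Q not significant. Monte-Carlo p-value against the same
    # bisection applied to degree-preserving randomised copies of the sub-network
    null_q = [bisect(rewire(sub, rng, 10 * int(sub.sum())))[1] for _ in range(n_rand)]
    p = (1 + sum(nq >= q for nq in null_q)) / (1 + n_rand)
    if p > alpha:
        return [nodes.tolist()]
    result = []
    for side in (0, 1):
        result += hierarchical_split(A, nodes[labels == side], min_q, alpha, n_rand, rng)
    return result

# Two 5-cliques joined by one edge: expected to come back as two communities
A = np.zeros((10, 10))
for block in (range(0, 5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1
A[4, 5] = A[5, 4] = 1
print(hierarchical_split(A))
```

Each clique is not split further: its best bisection has Q = 0, so the first stopping rule fires before any random networks are generated.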

Applications to a PPI network

• Protein-protein interaction (PPI) network
  – Vertices: proteins
  – Edges: interactions detected by experiments

• Motivation:
  – Community = protein complex?

• Protein complex
  – Group of proteins associated via interactions
  – Elementary functional unit in the cell
  – Prediction from PPI network is important

Experiments

• Data set
  – A yeast protein-protein interaction network (Krogan et al., Nature 2006)
  – 2708 proteins, 7123 interactions

• Algorithms:
  – Qcut, HQcut, Newman

• Evaluation
  – ~300 known protein complexes in MIPS
  – How well does a community match a known protein complex?
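The slides do not define the matching score. One widely used choice, assumed here and not necessarily the score behind the results below, is the overlap score |C ∩ K|² / (|C|·|K|) between a community C and a known complex K, which equals 1 only when the two sets are identical. The function names and protein labels are hypothetical.

```python
def overlap_score(community, complex_):
    """|C ∩ K|^2 / (|C| * |K|): 1.0 only if the two sets are identical."""
    c, k = set(community), set(complex_)
    if not c or not k:
        return 0.0
    return len(c & k) ** 2 / (len(c) * len(k))

def best_match(community, complexes):
    """Return (best score, index of the best-matching known complex)."""
    scores = [overlap_score(community, k) for k in complexes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best

# Hypothetical protein names, not taken from MIPS
known = [{"A", "B", "C"}, {"D", "E"}]
print(best_match({"A", "B", "C", "X"}, known))   # (0.75, 0)
print(best_match({"D", "E"}, known))             # (1.0, 1)
```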

Results

                                     Newman    Qcut      HQcut
# of communities                     56        93        316
Max community size                   312       264       60
# of matched communities             53        52        216
Communities with matching score = 1  5 (9%)    7 (13%)   43 (20%)
Average matching score               0.56      0.55      0.70
# of novel predictions               3         41        100

Communities found by HQcut

Small ribosomal subunit (90%)

RNA poly II mediator (83%)

Proteasome core (90%)

Exosome (94%)

gamma-tubulin (77%)

respiratory chain complex IV (82%)

Lecture outline

• Gene expression analysis
• Converting data to networks
• Applications of network clustering methods

Gene Expression Analysis

The early steps of a microarray study

• Scientific question (biological)
• Study design (biological/statistical)
• Conducting experiment (biological)
• Preprocessing/normalising data (statistical)
• Finding differentially expressed genes (statistical)

1st  Classical statistics                                  t-tests, ANOVA                    Since 1950s
2nd  High-dimensional feature selection; machine learning  SAM, Limma; SVM, neural networks  Since 1990s
3rd  Group-based enrichment analysis                       GSEA, GSA, Globaltest             Since 2003
4th  Pathway analysis                                      SPIA, TopoGSA                     Since 2007

Specific Filtering

• t-statistic (one-way ANOVA F-statistic if > 2 groups)
  – problem: there often isn’t enough data (few replicates) to estimate the variances reliably

• Fold change: simplest method; ratio of expression levels (but as microarray data are typically log-transformed, it is calculated as a difference of means)
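Both filters for a single probe set, as a minimal sketch assuming log2-scale intensities. The Welch (unequal-variance) form of the t-statistic is one choice among several, and the function names and numbers are hypothetical.

```python
import numpy as np

def log_fold_change(x, y):
    """Fold change on log2-scale data: difference of the group means
    (log2 of the ratio of geometric means on the raw scale)."""
    return float(np.mean(x) - np.mean(y))

def welch_t(x, y):
    """Two-sample t-statistic with unequal group variances (Welch)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return float((x.mean() - y.mean()) / se)

obese = [8.0, 9.0, 10.0]   # hypothetical log2 intensities of one probe set
lean = [6.0, 7.0, 8.0]
print(log_fold_change(obese, lean))     # 2.0, i.e. a 4-fold change
print(round(welch_t(obese, lean), 3))   # 2.449
```

With three replicates per group, each variance estimate rests on two degrees of freedom, which is the weakness the slide points out and the reason moderated statistics such as SAM and Limma exist.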

A data example

• Lee et al. (2005) compared adipose tissue (abdominal subcutaneous adipocytes) between obese and lean Pima Indians

• Samples were hybridised on HGu95e-Affymetrix arrays (12639 genes/probe sets)

• Available as GDS1498 on the GEO database

The “Result”

Probe Set ID  log.ratio  pvalue  adj.p
73554_at         1.4971  0.0000  0.0004
91279_at         0.8667  0.0000  0.0017
74099_at         1.0787  0.0000  0.0104
83118_at        -1.2142  0.0000  0.0139
81647_at         1.0362  0.0000  0.0139
84412_at         1.3124  0.0000  0.0222
90585_at         1.9859  0.0000  0.0258
84618_at        -1.6713  0.0000  0.0258
91790_at         1.7293  0.0000  0.0350
80755_at         1.5238  0.0000  0.0351
85539_at         0.9303  0.0000  0.0351
90749_at         1.7093  0.0000  0.0351
74038_at        -1.6451  0.0000  0.0351
79299_at         1.7156  0.0000  0.0351
72962_at         2.1059  0.0000  0.0351
88719_at        -3.1829  0.0000  0.0351
72943_at        -2.0520  0.0000  0.0351
91797_at         1.4676  0.0000  0.0351
78356_at         2.1140  0.0001  0.0359
90268_at         1.6552  0.0001  0.0421

What happened to the biology???

Naive functional analyses

• Manually annotate the list of differentially expressed (DE) genes
  – Extremely time-consuming, not systematic, user-dependent
• Group together genes with similar function
• Conclude that the functional categories with the most DE genes are important in the disease/condition under study
  – BUT this may not be the right conclusion

The Gene Ontology Consortium

GO Consortium

• Developed three structured, controlled vocabularies (ontologies)

• They describe gene products in terms of their associated
  – biological processes,
  – cellular components and
  – molecular functions
  in a species-independent manner

• Has become a major resource for microarray data interpretation

The Gene Ontology

• Molecular Function: basic activity or task
  – e.g. catalytic activity, calcium ion binding

• Biological Process: broad objective or goal
  – e.g. signal transduction, immune response

• Cellular Component: location or complex
  – e.g. nucleus, mitochondrion

Slightly more informative results

Probe Set ID | Gene Symbol | Gene Title | GO biological process term | GO molecular function term | log.ratio | pvalue | adj.p
73554_at | CCDC80 | coiled-coil domain contain | --- | --- | 1.4971 | 0.0000 | 0.0004
91279_at | C1QTNF5 /// | C1q and tumor necrosis fa | visual perception /// embry | --- | 0.8667 | 0.0000 | 0.0017
74099_at | --- | --- | --- | --- | 1.0787 | 0.0000 | 0.0104
83118_at | RNF125 | ring finger protein 125 | immune response /// mod | protein binding /// zinc ion | -1.2142 | 0.0000 | 0.0139
81647_at | --- | --- | --- | --- | 1.0362 | 0.0000 | 0.0139
84412_at | SYNPO2 | synaptopodin 2 | --- | actin binding /// protein bin | 1.3124 | 0.0000 | 0.0222
90585_at | C15orf59 | chromosome 15 open rea | --- | --- | 1.9859 | 0.0000 | 0.0258
84618_at | C12orf39 | chromosome 12 open rea | --- | --- | -1.6713 | 0.0000 | 0.0258
91790_at | MYEOV | myeloma overexpressed | --- | --- | 1.7293 | 0.0000 | 0.0350
80755_at | MYOF | myoferlin | muscle contraction /// bloo | protein binding | 1.5238 | 0.0000 | 0.0351
85539_at | PLEKHH1 | pleckstrin homology doma | --- | binding | 0.9303 | 0.0000 | 0.0351
90749_at | SERPINB9 | serpin peptidase inhibitor, | anti-apoptosis /// signal tra | endopeptidase inhibitor ac | 1.7093 | 0.0000 | 0.0351
74038_at | --- | --- | --- | --- | -1.6451 | 0.0000 | 0.0351
79299_at | --- | --- | --- | --- | 1.7156 | 0.0000 | 0.0351
72962_at | BCAT1 | branched chain aminotran | G1/S transition of mitotic c | catalytic activity /// branch | 2.1059 | 0.0000 | 0.0351
88719_at | C12orf39 | chromosome 12 open rea | --- | --- | -3.1829 | 0.0000 | 0.0351
72943_at | --- | --- | --- | --- | -2.0520 | 0.0000 | 0.0351
91797_at | LRRC16A | leucine rich repeat contain | --- | --- | 1.4676 | 0.0000 | 0.0351
78356_at | TRDN | triadin | muscle contraction | receptor binding | 2.1140 | 0.0001 | 0.0359
90268_at | C5orf23 | chromosome 5 open read | --- | --- | 1.6552 | 0.0001 | 0.0421

• If we are lucky, some of the top genes mean something to us

• But what if they don’t?

• And what are the results for other genes with similar biological functions?

Major bioinformatic developments

• Requires annotating the entire set of genes

• The Gene Ontology Consortium (www.geneontology.org)

• Automated, statistical approaches for annotating gene lists and performing functional profiling

Functional profiling tools

• Identify GO categories with significantly more DE genes than expected by chance (i.e. over-represented among DE genes relative to their representation on the array as a whole)

• Correct for testing multiple GO categories

• Hypergeometric distribution or Fisher’s exact test
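A minimal sketch of this workflow in plain Python. The upper tail of the hypergeometric distribution is the one-sided Fisher's exact test for over-representation; for the multiple-testing step the sketch assumes the Benjamini-Hochberg false discovery rate procedure, which is one common choice but not named on the slide. Function names and category counts are hypothetical.

```python
from math import comb

def enrichment_pvalue(k, N, K, n):
    """One-sided hypergeometric p-value P(X >= k) (= one-sided Fisher's exact test).

    N: genes on the array      K: array genes annotated to the GO category
    n: DE genes                k: DE genes annotated to the GO category
    """
    hi = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), hi + 1))
    return tail / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                       # walk from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical numbers: 1000 genes on the array, 100 DE genes and three
# GO categories given as (K, k) = (genes in category, DE genes in category)
categories = [(50, 15), (40, 5), (200, 22)]
raw = [enrichment_pvalue(k, 1000, K, 100) for K, k in categories]
print(benjamini_hochberg(raw))
```

The first category has 15 DE genes where about 5 (= 50 · 100 / 1000) are expected by chance, so it should come out strongly enriched; the other two are close to their expected counts.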

Biological Interpretation

• Interpretation still requires substantial work
  – search literature and public databases
  – likely functional consequences of the changes
  – are the genes identified as significant within each GO category up- or down-regulated?
  – genes within a category can have opposite effects, e.g. apoptosis would include genes that induce or repress apoptosis

More than GO…

• Methods for incorporating biological knowledge into microarray analysis

• The type of knowledge we deal with is rather simple: we know groups/sets of genes that, for example:
  – have a similar function (e.g. GO)
  – belong to the same pathway
  – are located on the same chromosome, etc.

• We will assume these groupings to be given
  – i.e. we will not discuss methods for detecting pathways, networks or gene clusters

What is a pathway?

• No clear definition

• Wikipedia: “In biochemistry, metabolic pathways are series of chemical reactions occurring within a cell. In each pathway, a principal chemical is modified by chemical reactions.”

• These pathways describe enzymes and metabolites

• But often the word “pathway” is also used to describe gene regulatory networks or protein interaction networks

• In all cases a pathway describes a biological function very specifically

What is a Gene Set?

• Just what it says: a set of genes!
  – All genes involved in a pathway are an example of a Gene Set
  – All genes corresponding to a Gene Ontology term are a Gene Set
  – All genes mentioned in a paper of Smith et al. might form a Gene Set

• A Gene Set is a much more general and less specific concept than a pathway

• Still: we will sometimes use the two words interchangeably, as the analysis methods are mainly the same

What is Gene Set/Pathway analysis?

• The aim is to give one number (score, p-value) to a Gene Set/Pathway

• Are many genes in the pathway differentially expressed (up-regulated/down-regulated)?

• Can we give a number (p-value) to the probability of observing these changes just by chance?
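One simple way to attach a single p-value to a gene set is a competitive gene-sampling test: compare the set's mean absolute per-gene statistic with that of random gene sets of the same size. This is a minimal sketch under that assumption, not GSEA or any specific published method; it ignores correlation between genes, which is why methods such as GSEA permute sample labels instead. Names and data are hypothetical.

```python
import numpy as np

def gene_set_pvalue(gene_scores, set_idx, n_perm=2000, seed=0):
    """Competitive gene-sampling test: is the mean |score| of the gene set
    larger than that of random gene sets of the same size?"""
    rng = np.random.default_rng(seed)
    scores = np.abs(np.asarray(gene_scores, dtype=float))
    set_idx = np.asarray(set_idx)
    observed = scores[set_idx].mean()
    null = np.array([
        scores[rng.choice(len(scores), size=len(set_idx), replace=False)].mean()
        for _ in range(n_perm)
    ])
    # add-one correction keeps the Monte-Carlo p-value strictly positive
    return float((1 + np.sum(null >= observed)) / (1 + n_perm))

# Hypothetical per-gene t-statistics: 100 genes, the first five strongly changed
t_stats = np.zeros(100)
t_stats[:5] = 5.0
print(gene_set_pvalue(t_stats, [0, 1, 2, 3, 4]))
```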

Classes of Gene Set Analysis

Khatri et al. PLOS Comp Bio. 8:1 2012

DAVID

GSEA

Reactome FI network

PARADIGM

Limitations of Gene Set Enrichment Analysis

• Many possible gene sets – diseases, molecular function, biological process, cellular compartment, pathways...

• Gene sets are heavily overlapping; need to sort through lists of enriched gene sets!

• “Bags of genes” obscure regulatory relationships among them.

Pathway Analysis

Pathway Databases

• Advantages:
  – Usually curated.
  – Biochemical view of biological processes.
  – Cause and effect captured.
  – Human-interpretable visualizations.

• Disadvantages:
  – Sparse coverage of genome.
  – Different databases disagree on boundaries of pathways.

KEGG

Reactome

• Hand-curated pathways in human.
• Rigorous curation standards: every reaction traceable to primary literature.
• Automatically projected pathways to non-human species.
• 22 species; 1112 human pathways; 5078 proteins.
• Features:
  – Google-map style reaction diagrams with overlays;
  – Find pathways containing your gene list;
  – Calculate gene overrepresentation in pathways;
  – Find corresponding pathways in other species.

• Open access.

Pathway Commons

Pathway Colorization

• Main feature offered by all pathway databases.
• Upload a gene list.
• Database calculates an enrichment score on each pathway and displays a ranked list.
• Browse into pathways of interest; download colorized pictures.

Example from Reactome

Example: Reactome FI Network

Curated human data (Version 35): 5078 proteins, 4166 reactions, 3870 complexes, 1112 pathways. Only ~25% of genome!

Goal: add a “corona” of uncurated interaction data around the scaffold of curated pathway data.

More than pathways

Networks

• Pathways capture only the “well understood” portion of biology.
• Networks cover less well understood relationships:
  – Genetic interactions
  – Physical interactions
  – Coexpression
  – GO term sharing
  – Adjacency in pathways

Gene Expression Networks

Microarray data

• Data organized into a matrix
  – Rows are genes
  – Columns are samples representing different time points, conditions, tissues, etc.

• Analysis techniques
  – Differential expression analysis
  – Classification and clustering
  – Regulatory network construction
  – Enrichment analysis

• Characteristics of microarray data
  – High dimensionality and noise
  – Underlying topology unknown, often irregular shape

(Figure: expression heat map with genes as rows and samples as columns; red: high activity, green: low activity)

Microarray data clustering

• Many clustering algorithms available
  – K-means
  – Hierarchical
  – Self-organizing maps
  – Parameters hard to tune
  – Do not consider network topology

• Analyze genes in each cluster:
  – Common functions?
  – Common regulation?
  – Predict functions for unknown genes?

(Figure: clustered expression heat map with genes as rows and samples as columns; red: high activity, green: low activity)

From Data to Networks

Network-based data analysis

Distances & Similarity
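Two distance measures commonly used for gene expression profiles, sketched under the assumption that genes are rows of a NumPy matrix (the slide itself does not name specific measures). Euclidean distance compares absolute expression levels, while correlation distance, 1 − Pearson correlation, compares only the shape of the profiles. The function names are illustrative.

```python
import numpy as np

def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows (genes) of X."""
    X = np.asarray(X, dtype=float)
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(D2, 0.0))   # clip tiny negative rounding errors

def correlation_distance(X):
    """1 - Pearson correlation between rows: 0 for identical profile shapes,
    2 for perfectly anti-correlated profiles."""
    return 1.0 - np.corrcoef(np.asarray(X, dtype=float))

X = [[1, 2, 3, 4],     # gene g1
     [2, 4, 6, 8],     # g2: same shape as g1, twice the level
     [4, 3, 2, 1]]     # g3: opposite trend
print(euclidean_distances(X))
print(correlation_distance(X))
```

Here g1 and g2 have correlation distance 0 but Euclidean distance √30 ≈ 5.48, while g1 and g3 are maximally far apart in correlation distance (2). The choice of measure therefore decides which genes become neighbours in the graphs below.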

Directed k-nearest neighbor graph
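A minimal sketch of the directed k-nearest-neighbour graph, assuming Euclidean distance and a dense distance matrix (fine for small data); the function name is illustrative. Each point gets an arc to each of its k nearest neighbours (k smaller than the number of points), so the relation need not be symmetric.

```python
import numpy as np

def directed_knn_graph(X, k):
    """A[i, j] = 1 if j is among the k nearest neighbours of i (Euclidean)."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)               # a point is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]       # indices of the k closest points
    A = np.zeros(D.shape, dtype=int)
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1
    return A

X = [[0.0], [1.0], [3.0], [10.0]]   # four points on a line
print(directed_knn_graph(X, k=1))
```

With k = 1, the point at 10 has an arc to 3, but the nearest neighbour of 3 is 1, so no arc points back.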

Undirected k-nearest neighbor graph
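Two common ways to make the directed graph undirected, as in von Luxburg's spectral clustering tutorial: keep an edge if either point is among the other's k nearest neighbours (the usual kNN graph), or only if both are (the mutual kNN graph). A sketch, again assuming Euclidean distance; the names are illustrative.

```python
import numpy as np

def directed_knn_graph(X, k):
    """A[i, j] = 1 if j is among the k nearest neighbours of i (Euclidean)."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    nbrs = np.argsort(D, axis=1)[:, :k]
    A = np.zeros(D.shape, dtype=int)
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1
    return A

def knn_graph(X, k, mutual=False):
    """Symmetrise: 'or' gives the kNN graph, 'and' the mutual kNN graph."""
    A = directed_knn_graph(X, k)
    return (A & A.T) if mutual else (A | A.T)

X = [[0.0], [1.0], [3.0], [10.0]]
print(knn_graph(X, k=1))               # edges 0-1, 1-2, 2-3
print(knn_graph(X, k=1, mutual=True))  # only edge 0-1 survives
```

The mutual version tends to disconnect points in sparse regions (here 2 and 3), while the "or" version connects every point to at least k others.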

epsilon-neighborhood Graph
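A minimal sketch of the ε-neighbourhood graph: connect every pair of points whose distance is smaller than ε. It assumes Euclidean distance and an unweighted graph; the function name and the value of `eps` are illustrative.

```python
import numpy as np

def epsilon_graph(X, eps):
    """A[i, j] = 1 if the Euclidean distance between points i and j is < eps."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    A = (D < eps).astype(int)
    np.fill_diagonal(A, 0)   # no self-loops
    return A

X = [[0.0], [1.0], [3.0], [10.0]]
print(epsilon_graph(X, eps=2.5))
```

The point at 10 ends up isolated: a single ε that suits dense regions of the data leaves outlying points unconnected, unlike the kNN graph, which gives every point at least k neighbours.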