Statistical Inference Using Graphs for Protein Complex Identification Denise Scholtens Robert Gentleman Marc Vidal Workshop on Statistical Inference, Computing, and Visualization for Graphs Stanford University August 1-2, 2003


Statistical Inference Using Graphs for Protein Complex Identification

Denise Scholtens
Robert Gentleman
Marc Vidal

Workshop on Statistical Inference, Computing, and Visualization for Graphs

Stanford University
August 1-2, 2003


Graphic from: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis


High-throughput Protein Complex Identification

• Gavin, et al. (Nature, 2002)
  – TAP: Tandem Affinity Purification
• Ho, et al. (Nature, 2002)
  – HMS-PCI: High-throughput Mass Spectrometric Protein Complex Identification


Protein Complex Identification Using TAP Data

Spoke Model

Matrix Model

Bader, et al. (Nature Biotechnology, 2002)
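The two models named on this slide differ in which pairwise interactions they infer from a single purification. The following is a minimal sketch, assuming the usual reading of the terms (spoke: the bait is linked to each hit; matrix: every pair of proteins in the purification is linked). The protein names and helper functions are hypothetical and are not taken from the slides.

```python
from itertools import combinations


def spoke_edges(bait, hits):
    """Spoke model: one undirected edge between the bait and each hit."""
    return {frozenset((bait, h)) for h in hits}


def matrix_edges(bait, hits):
    """Matrix model: an edge between every pair of proteins in the purification."""
    return {frozenset(pair) for pair in combinations([bait, *hits], 2)}


# Hypothetical purification: one bait that pulled down three hits.
bait, hits = "Bait", ["Hit1", "Hit2", "Hit3"]
spoke = spoke_edges(bait, hits)
matrix = matrix_edges(bait, hits)

print(len(spoke), len(matrix))  # 3 6
```

Every spoke edge is also a matrix edge, so the matrix model can only add inferred interactions relative to the spoke model.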


Protein-Complex Affiliation Network Incidence Matrix

A =

      C1  C2  C3  C4  C5  …  Cm
P1     1   0   1   1   1  …
P2     1   0   0   1   1  …
P3     1   0   0   0   1  …
P4     0   0   1   1   1  …
P5     0   1   0   0   0  …
P6     0   1   0   0   0  …
P7     0   1   0   0   0  …
…
Pn     0   0   0   0   0  …


Cohesive vs. Dynamic Protein Complexes

Cohesive Complex: a complex of invariable composition whose proteins are associated only with that complex and its particular function


Cohesive Complex Affiliation Network Incidence Matrix

A =

        C1
Bait     1
Hit 1    1
Hit 2    1
Hit 3    1
Hit 4    1
Hit 5    1


Cohesive vs. Dynamic Protein Complexes

Dynamic Complex: a complex composed of proteins that may also be involved in other complexes


Dynamic Complex Affiliation Network Incidence Matrices

A =

        C1  C2  C3  C4  C5
Bait     1   1   1   1   1
Hit 1    1   0   0   0   0
Hit 2    0   1   0   0   0
Hit 3    0   0   1   0   0
Hit 4    0   0   0   1   0
Hit 5    0   0   0   0   1

A =

        C1  C2
Bait     1   1
Hit 1    1   0
Hit 2    0   1
Hit 3    1   0
Hit 4    0   1
Hit 5    1   0

A =

        C1  C2
Bait     1   1
Hit 1    1   1
Hit 2    1   1
Hit 3    0   1
Hit 4    0   1
Hit 5    0   1


All 5 “complexes” above would yield the same TAP Data:
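The point can be checked directly on the cohesive and dynamic incidence matrices written out on the preceding slides. This is a minimal sketch, assuming that a single TAP purification reports a protein as a hit whenever it shares at least one complex with the bait. The helper `bait_hits` is hypothetical, and only the four matrices that appear in the text are included.

```python
# Rows are proteins in the order Bait, Hit 1, ..., Hit 5; columns are complexes.
cohesive = [[1], [1], [1], [1], [1], [1]]
dynamic_a = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
dynamic_b = [[1, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
dynamic_c = [[1, 1], [1, 1], [1, 1], [0, 1], [0, 1], [0, 1]]


def bait_hits(A, bait_row=0):
    """Row indices of proteins that share at least one complex with the bait.

    Assumption: this membership overlap is what one purification can reveal.
    """
    bait = A[bait_row]
    return [
        i
        for i, row in enumerate(A)
        if i != bait_row and any(b and r for b, r in zip(bait, row))
    ]


observed = [bait_hits(A) for A in (cohesive, dynamic_a, dynamic_b, dynamic_c)]
for hits in observed:
    print(hits)
```

Under that assumption each matrix gives the same list of hits, even though the underlying complex structure differs in every case.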


Statistical Inference Problem

• What is A?
• A captures the cohesive/dynamic distinction.
• At best, we observe all but the main diagonal of X=AA’.
• Current analyses focus on X, not on A.
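Written out entry by entry, and assuming A is the n × m protein-by-complex incidence matrix with A_{ik} = 1 when protein i belongs to complex k, the product is:

```latex
X_{ij} = (AA')_{ij} = \sum_{k=1}^{m} A_{ik} A_{jk}, \qquad i, j = 1, \dots, n
```

For i ≠ j, X_{ij} counts the complexes that proteins i and j share. The diagonal entry X_{ii} counts the complexes that contain protein i, and this is the part that is not observed.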


Protein Complex Data as a Directed Graph

?


Cohesive Complex described in Gavin, et al.


Dynamic Complex described in Gavin, et al.


Largest Connected Component in Gavin, et al. using Bait Proteins Only, Colored by Outdegree
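A small sketch of the bookkeeping behind a bait-only digraph and its outdegrees, assuming edges are directed from each bait to the hits it pulled down. The purification data and protein names below are hypothetical, and whether the slide computes outdegree on the full graph or on the bait-only subgraph is not stated, so the bait-only choice here is also an assumption.

```python
# Hypothetical TAP purifications: each bait maps to the hits it pulled down.
purifications = {
    "B1": ["B2", "H1", "H2"],
    "B2": ["B1", "B3"],
    "B3": ["H3"],
}

# Directed edges run bait -> hit (an assumption about edge orientation).
edges = {(bait, hit) for bait, hits in purifications.items() for hit in hits}

# Keep only edges whose endpoints are both baits.
baits = set(purifications)
bait_edges = {(u, v) for u, v in edges if v in baits}

# Outdegree of each bait within the bait-only subgraph.
outdegree = {b: sum(1 for u, _ in bait_edges if u == b) for b in sorted(baits)}
print(outdegree)  # {'B1': 1, 'B2': 2, 'B3': 0}
```

Because only baits are ever used to pull down other proteins, restricting to baits keeps exactly the protein pairs that were tested in both directions.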


Gavin Data Ho Data


SubGraph of Bait Proteins from Previous Graphs with Outdegree 7

Gavin Data Ho Data


Examples of Distinct Complexes Identified by Gavin, et al.


Back to Affiliation Networks

A =

      C1
B1     1
B2     1
B3     1

X=AA’ =

      B1  B2  B3
B1     1   1   1
B2     1   1   1
B3     1   1   1

One Three-Way Conversation


Affiliation Networks

A =

      C1  C2  C3
B1     1   1   0
B2     1   0   1
B3     0   1   1

X=AA’ =

      B1  B2  B3
B1     2   1   1
B2     1   2   1
B3     1   1   2

Three Two-Way Conversations
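The two affiliation networks can be compared numerically. The sketch below computes X=AA’ for the one three-way and the three two-way incidence matrices, then masks the main diagonal to mimic what is observable. The helper names `gram` and `off_diagonal` are hypothetical.

```python
def gram(A):
    """Return X = AA' for an incidence matrix A given as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]


def off_diagonal(X):
    """Replace the main diagonal with None to mimic the unobserved entries."""
    return [
        [None if i == j else x for j, x in enumerate(row)]
        for i, row in enumerate(X)
    ]


one_three_way = [[1], [1], [1]]
three_two_way = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]

X1, X2 = gram(one_three_way), gram(three_two_way)
print(X1)  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(X2)  # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
print(off_diagonal(X1) == off_diagonal(X2))  # True
```

The two X matrices differ only on the diagonal, so once the diagonal is hidden the two very different complex structures become indistinguishable.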


Statistical Inference Problem

• Which A is correct?
  – A uniquely defines X, but the observable part of X (everything off the main diagonal) does not uniquely define A.
• Extra information and directed graph model for the TAP data
  – Cellular Component Data
  – Gene Expression Data
  – Hit Data


Possible Use of Hit Data to Help Estimate A


Conclusions

• In the protein complex setting, directed graphs are useful for exploratory data analysis (EDA), as well as for framing the correct questions for statistical inference.

• Statistical inference problem for cohesive and dynamic protein complex identification should focus on A, not X.

• Digraph model of the TAP data better reflects what we actually observe, and is informative for estimating A.