3-D Structural Analysis of Protein Interaction Networks Gives New Insight Into Protein Function, Network Topology and Evolution

CSB Seminar, Philip M. Kim, Gerstein Lab

New Haven, CT, January 19th, 2006



Page 2

060119_CSB_Talk_PMK

MOTIVATION

[Illustrative figure: protein A and its partners B1 to B4 (examples: a Cdk/cyclin complex; part of the RNA-pol complex), drawn from the network perspective and from the structural biology perspective]

There remains a rich source of knowledge unmined by network theorists!

Page 3

OUTLINE

Interaction Networks and their properties

Network properties revisited

A 3-D structural point of view

Conclusions

Page 4

OUTLINE

Interaction Networks and their properties

Network properties revisited

A 3-D structural point of view

Conclusions

Page 5

PROTEIN INTERACTION NETWORKS IN YEAST

Source: Gavin et al. Nature (2002), Uetz et al. Nature (2000), Cytoscape and DIP

Description and methodologies

• Determined by:

– Large-scale yeast two-hybrid screens

– TAP tagging

– Literature curation

• Currently over 20,000 unique interactions are available in yeast

• Spawned a field of computational “graph theory” analyses that view proteins as “nodes” and interactions as “edges”

[Illustrative figure: a snapshot of the current interactome, from DIP (Database of Interacting Proteins)]

Page 6

TINY GLOSSARY: DEGREE AND HUBS

C: Degree = 1

A: Degree = 5

A is a “Hub”*

*The definition of hubs is somewhat arbitrary; usually a degree cutoff is used

Source: PMK
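The degree and hub definitions in this glossary can be made concrete with a short Python sketch. It assumes interactions are given as an undirected list of protein-name pairs; the hub cutoff of 5 is a hypothetical value chosen only so that the toy network reproduces the glossary example (A is a hub with degree 5, C has degree 1).

```python
from collections import defaultdict


def degrees(edges):
    """Return {protein: degree} for an undirected interaction list.

    Duplicate edges and self-interactions are ignored, so the degree is the
    number of distinct interaction partners.
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a self-interaction adds no partner
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(ps) for protein, ps in partners.items()}


def hubs(edges, cutoff=5):
    """Return proteins whose degree reaches the (arbitrary) hub cutoff."""
    return {protein for protein, k in degrees(edges).items() if k >= cutoff}


# Toy network mirroring the glossary: A has five partners, C has one.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")]
deg = degrees(toy_edges)
print(deg["A"], deg["C"])         # 5 1
print(hubs(toy_edges, cutoff=5))  # {'A'}
```

Because the cutoff is a parameter, the same code shows how the set of “hubs” shifts as the arbitrary threshold changes.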

Page 7

INTERESTING PROPERTIES OF INTERACTION NETWORKS

Source: Various, see following slides

Network topology

Network Evolution

Relationship of topology and genomic features

Examples of studies

• What distribution does the degree (number of interaction partners) follow?

• What is the relationship between the degree and a protein’s essentiality?

• Is there a relationship between a protein’s connectivity and its expression profile?

• What is the relationship between a protein’s evolutionary rate and its degree?

• How did the observed network topology evolve?

OVERVIEW

Page 8

INTERACTION NETWORKS ARE SCALE-FREE – THEIR TOPOLOGY IS DOMINATED BY SO-CALLED HUBS

Source: Barabasi, A. and Albert, R., Science (1999)

• So-called scale-free topology has been observed in many kinds of networks (among them interaction networks)

• Scale-freeness: a small number of highly connected hubs and a large number of poorly connected nodes (“power-law behavior”)

• Topology is dominated by “hubs”

• Scale-freeness is in stark contrast to a normal (Gaussian) distribution

$p(k) \sim k^{-\gamma}$, where $p(k)$ is the fraction of nodes with degree $k$
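As a rough illustration of the power law above, the Python sketch below tabulates the empirical p(k) and estimates γ by maximum likelihood, using the discrete approximation described by Clauset, Shalizi and Newman (2009) rather than a straight-line fit on a log-log plot. The toy degree list is made up, k_min is an assumed lower cutoff, and the approximation becomes more accurate as k_min grows; on real interaction data a goodness-of-fit test would also be needed before calling a network scale-free.

```python
import math
from collections import Counter


def degree_distribution(degree_values):
    """Empirical p(k): fraction of proteins whose degree equals k."""
    counts = Counter(degree_values)
    n = len(degree_values)
    return {k: c / n for k, c in sorted(counts.items())}


def powerlaw_exponent_mle(degree_values, k_min=1):
    """Approximate maximum-likelihood estimate of gamma in p(k) ~ k^(-gamma).

    Uses the discrete approximation
        gamma = 1 + n / sum(ln(k / (k_min - 0.5)))
    over the n degrees with k >= k_min.
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degree_values if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical degree list, e.g. produced by counting partners per protein.
toy_degrees = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 8]
print(degree_distribution(toy_degrees))
print(round(powerlaw_exponent_mle(toy_degrees, k_min=1), 2))
```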

Page 9

HUBS TEND TO BE IMPORTANT PROTEINS, THEY ARE MORE LIKELY TO BE ESSENTIAL PROTEINS AND TEND TO BE MORE CONSERVED

Source: Jeong et al. Nature (2001), Yu et al. TiG (2004) and Fraser et al. Science (2002)

• By now it is well documented that proteins with a large degree tend to be essential proteins in yeast.

(“Hubs are essential”)

• Likewise, it has been found that hubs tend to evolve more slowly than other proteins

(“Hubs are slower evolving”)

Page 10

… OR ARE THEY? THERE IS AN ONGOING DEBATE ABOUT THE RELATIONSHIP BETWEEN EVOLUTIONARY RATE AND DEGREE

Source: See text

Two positions: “Yes, hubs are more conserved” versus “No, the relationship is unclear”

EXAMPLES

• Fraser et al. Science (2002)

• Fraser et al. BMC Evol. Biol. (2003)

• Wuchty Genome Res. (2004)

• Jordan et al. Genome Res. (2002)

• Hahn et al. J. Mol. Evol. (2004)

• Jordan et al. BMC Evol. Biol. (2003)

• Fraser Nature Genetics (2005)

But the “Yes” side appears to be winning

Page 11

THERE IS A RELATIONSHIP BETWEEN NETWORK TOPOLOGY AND GENE EXPRESSION DYNAMICS

Source: Han et al. Nature (2004) and Yu*, Kim* et al. (Submitted)

[Figure: histogram of co-expression correlation (frequency vs. co-expression correlation)]
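Co-expression between two interacting proteins is often measured as the Pearson correlation of their expression profiles across many conditions, and a histogram of such values gives a frequency plot like the one on this slide. A minimal Python sketch, assuming Pearson correlation and profiles stored as equal-length lists keyed by protein name (the profiles here are invented):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def coexpression_of_interactions(edges, expression):
    """Map each interacting pair to the correlation of its expression profiles.

    Pairs in which either protein lacks expression data are skipped.
    """
    return {
        (a, b): pearson(expression[a], expression[b])
        for a, b in edges
        if a in expression and b in expression
    }


# Hypothetical expression profiles over four conditions.
profiles = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0], "C": [4.0, 3.0, 2.0, 1.0]}
print(coexpression_of_interactions([("A", "B"), ("A", "C"), ("A", "Z")], profiles))
# {('A', 'B'): 1.0, ('A', 'C'): -1.0}
```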

Page 12

SCALE-FREENESS GENERALLY EVOLVES THROUGH PREFERENTIAL ATTACHMENT (THE RICH GET RICHER)

Source: Albert et al. Rev. Mod. Phys. (2002) and Middendorf et al. PNAS (2005)

• Theoretical work shows that a mechanism of preferential attachment leads to a scale-free topology

(“The rich get richer”)

The duplication-mutation model: description

ILLUSTRATIVE

• In interaction networks, gene duplication followed by mutation of the duplicated gene is generally thought to lead to preferential attachment

• Simple reasoning: a hub has more partners than a non-hub, so it is more likely that one of its partners gets duplicated, which gives the hub a new interaction

[Figure: gene duplication; the interaction partners of A are more likely to be duplicated]
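The duplication-mutation reasoning can be explored with a small simulation. This Python sketch is one simple variant written for illustration: the starting pair, the single edge-retention probability p_keep and the rule of retrying when a copy keeps no interaction are assumptions, not the specific models cited on this slide. Because every retained partner of a duplicated protein gains an edge, proteins with many partners gain new edges more often, which is the preferential attachment described above.

```python
import random


def duplication_mutation(n_nodes, p_keep=0.5, seed=0):
    """Grow an undirected network by gene duplication followed by mutation.

    One simple variant: start from a single interacting pair; at each step
    duplicate a randomly chosen protein, let the copy inherit each of the
    original's interactions independently with probability p_keep (the others
    are lost to mutation), and retry if the copy keeps no interaction.
    Returns an adjacency dict {protein: set of partners}.
    """
    if not 0 < p_keep <= 1:
        raise ValueError("p_keep must be in (0, 1]")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # start from a single interacting pair
    while len(adj) < n_nodes:
        original = rng.choice(list(adj))
        kept = {p for p in adj[original] if rng.random() < p_keep}
        if not kept:
            continue  # an isolated copy would not join the network
        new_protein = len(adj)  # proteins are numbered 0, 1, 2, ...
        adj[new_protein] = kept
        for partner in kept:
            # Each partner of the duplicated protein gains an edge, so the
            # chance of gaining an edge grows with a protein's degree.
            adj[partner].add(new_protein)
    return adj


network = duplication_mutation(500, p_keep=0.5, seed=1)
top_degrees = sorted((len(ps) for ps in network.values()), reverse=True)
print("largest degrees:", top_degrees[:5])
```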

Page 13

OUTLINE

Interaction Networks and their properties

Network properties revisited

A 3-D structural point of view

Conclusions

Page 14

THERE IS A PROBLEM WITH SCALE-FREENESS AND REALLY BIG HUBS IN INTERACTION NETWORKS

Source: DIP, Institut fuer Festkoerperchemie (Univ. Tuebingen)

[Figure: a really big hub (>200 interactions)]

Gedankenexperiment

How many neighbors can a protein have at most? (By analogy with sphere packing, at most 12 equal-sized spheres can touch a central one.)

• Clearly, a protein is very unlikely to have >200 simultaneous interactors.

• Some of the >200 interactions are most likely false positives.

• Others are likely to be mutually exclusive interactors (i.e. binding to the same interface).

Conclusion

• There appears to be an obvious discrepancy between >200 and 12.

ILLUSTRATIVE

Wouldn’t it be great to be able to see the different binding interfaces?

Page 15

UTILIZING PROTEIN CRYSTAL STRUCTURES, WE CAN DISTINGUISH THE DIFFERENT BINDING INTERFACES

Source: PMK

ILLUSTRATIVE

• Interactome (~20,000 interactions): use a high-confidence filter, then map Pfam domains to all proteins in the interactome

• PDB (~10,000 structures of interactions*): homology mapping of Pfam domains to all structures of interactions

• Combine with all structures of yeast protein complexes

• Annotate interactions with available structures, discard all others

• Distinguish interfaces

*Many redundant structures
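The annotation step can be sketched in Python, assuming Pfam domain assignments and a set of domain pairs seen in contact in PDB structures (the kind of information iPfam provides) are already available as plain Python data. Here “a domain pair seen in contact in some structure” is treated as structural support; the high-confidence filter and homology mapping are not reproduced, and all protein and domain names are hypothetical.

```python
from itertools import product


def annotate_interactions(interactions, domains, structural_pairs):
    """Annotate interactions with structurally observed domain-domain contacts.

    interactions     : iterable of (protein_a, protein_b) pairs
    domains          : dict mapping protein -> set of domain names
    structural_pairs : set of frozensets holding the two domain names seen in
                       contact in some structure (a one-element frozenset
                       stands for two copies of the same domain in contact)

    Returns {(protein_a, protein_b): set of (domain_a, domain_b)}; interactions
    with no structurally supported domain pair are discarded.
    """
    annotated = {}
    for a, b in interactions:
        supported = {
            (da, db)
            for da, db in product(domains.get(a, ()), domains.get(b, ()))
            if frozenset((da, db)) in structural_pairs
        }
        if supported:
            annotated[(a, b)] = supported
    return annotated


# Hypothetical proteins and domain labels, for illustration only.
domains = {"P1": {"DomA"}, "P2": {"DomB", "DomC"}, "P3": {"DomD"}}
structural_pairs = {frozenset(("DomA", "DomB"))}
print(annotate_interactions([("P1", "P2"), ("P1", "P3")], domains, structural_pairs))
# {('P1', 'P2'): {('DomA', 'DomB')}}
```

The recorded domain pair is what later allows the interfaces on each protein to be told apart.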

Page 16

SHORT DIGRESSION: THIS ALLOWS US TO DISTINGUISH SYSTEMATICALLY BETWEEN SIMULTANEOUSLY POSSIBLE AND MUTUALLY EXCLUSIVE INTERACTIONS

[Figure: examples of simultaneously possible interactions and of mutually exclusive interactions]

Source: PMK
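Once interfaces are assigned, the distinction on this slide reduces to comparing interface labels. In this Python sketch each protein maps its partners to the interface they bind on that protein (the labels are hypothetical and would come from the structural annotation); two interactions sharing a protein are labeled mutually exclusive if they use the same interface and simultaneously possible otherwise.

```python
from itertools import combinations


def classify_interaction_pairs(interfaces):
    """Label every pair of interactions that share a protein.

    interfaces : dict mapping protein -> {partner: interface label on that protein}

    Returns a list of (protein, partner_1, partner_2, label) tuples.
    """
    labels = []
    for protein, partner_interface in interfaces.items():
        for p1, p2 in combinations(sorted(partner_interface), 2):
            same = partner_interface[p1] == partner_interface[p2]
            labels.append(
                (protein, p1, p2,
                 "mutually exclusive" if same else "simultaneously possible")
            )
    return labels


# Hypothetical hub A: B1 and B2 use interface i1, B3 uses interface i2.
hub_interfaces = {"A": {"B1": "i1", "B2": "i1", "B3": "i2"}}
for row in classify_interaction_pairs(hub_interfaces):
    print(row)
```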

Page 17

SIMULTANEOUSLY POSSIBLE INTERACTIONS (“PERMANENT”) MORE OFTEN LINK PROTEINS THAT ARE FUNCTIONALLY SIMILAR, CO-EXPRESSED AND CO-LOCATED

Source: PMK

[Figure: four bar-chart panels comparing mutually exclusive with simultaneously possible interactions: fraction in the same biological process, fraction with the same molecular function, fraction in the same cellular component, and co-expression correlation (p << 0.001 in every panel); bar values shown: 0.24, 0.14, 0.33, 0.18, 0.23, 0.17, 0.27, 0.12]

Page 18

THIS IS WHAT THE RESULTING NETWORK LOOKS LIKE

Source: PDB, Pfam, iPfam and PMK

The Structural Interaction Dataset (SID): properties

• Represents a “very high confidence” network

• A total of 873 nodes and 1269 interactions, each of which is structurally characterized

• 438 interactions are classified as mutually exclusive and 831 as simultaneously possible

• While much smaller than DIP, it is of similar size to other high-confidence datasets

Page 19

OUTLINE

Interaction Networks and their properties

Network properties revisited

A 3-D structural point of view

Conclusions

Page 20

REMEMBER THE NETWORK PROPERTIES WE DESCRIBED BEFORE?

Source: Various, see following slides

Network topology

Network Evolution

Relationship of topology and genomic features

Examples of studies

• What distribution does the degree (number of interaction partners) follow?

• Does the network easily separate into more than one component?

• What is the relationship between the degree and a protein’s essentiality?

• Is there a relationship between a protein’s connectivity and its expression profile?

• What is the relationship between a protein’s evolutionary rate and its degree?

• How did the observed network topology evolve?

OVERVIEW

Page 21

THERE DO NOT APPEAR TO BE REALLY BIG HUBS OF THE KIND SEEN BEFORE – IS THE TOPOLOGY STILL SCALE-FREE?

Source: PMK

Degree distribution properties

• With a maximum of 13 interactions per protein, there are no “really big hubs” in this network

• Note that other high-confidence datasets (of similar size) still contain proteins with a much higher degree

• The degree distribution appears to cut off much earlier and to be less scale-free than those of other networks

[Figure: degree distribution of our dataset (SID) compared with conventional datasets (e.g. DIP)]

Page 22

IT’S REALLY ONLY THE MULTI-INTERFACE HUBS THAT ARE SIGNIFICANTLY MORE LIKELY TO BE ESSENTIAL

Source: PMK

[Figure: percentage of essential proteins for the entire genome, all proteins in our dataset, single-interface hubs only and multi-interface hubs only; bar values shown: 64.9%, 31.8%, 32.3%, 15.1%]
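The comparison on this slide can be sketched by counting, for each hub, how many distinct interfaces its partners use. The degree cutoff of 5 and the rule that two or more distinct interfaces make a multi-interface hub are illustrative assumptions, since the exact thresholds are not stated here; the proteins and the essentiality set are invented.

```python
def classify_hubs(interfaces, degree_cutoff=5, min_interfaces=2):
    """Split hubs into single-interface and multi-interface hubs.

    interfaces : dict protein -> {partner: interface label on that protein}
    A protein counts as a hub if it has at least degree_cutoff partners; a hub
    whose partners use at least min_interfaces distinct interfaces counts as
    multi-interface. Both thresholds are illustrative assumptions.
    """
    single, multi = set(), set()
    for protein, partner_interface in interfaces.items():
        if len(partner_interface) < degree_cutoff:
            continue  # not a hub
        n_interfaces = len(set(partner_interface.values()))
        (multi if n_interfaces >= min_interfaces else single).add(protein)
    return single, multi


def percent_essential(proteins, essential):
    """Percentage of the given proteins that are in the essential set."""
    if not proteins:
        raise ValueError("empty protein set")
    return 100.0 * len(proteins & essential) / len(proteins)


# Hypothetical data: H1 binds all partners through one interface, H2 through two.
interfaces = {
    "H1": {f"x{i}": "i1" for i in range(5)},
    "H2": {f"y{i}": ("i1" if i < 3 else "i2") for i in range(5)},
    "L1": {"x0": "i1"},
}
single, multi = classify_hubs(interfaces)
essential = {"H2"}  # hypothetical essentiality annotation
print(sorted(single), sorted(multi))  # ['H1'] ['H2']
print(percent_essential(single, essential), percent_essential(multi, essential))  # 0.0 100.0
```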

Page 23

DATE-HUBS AND PARTY-HUBS ARE REALLY SINGLE-INTERFACE AND MULTI-INTERFACE HUBS

Source: Han et al. Nature (2004) and PMK

[Figure: expression correlation for all proteins in our dataset, single-interface hubs only and multi-interface hubs only (bar values shown: 0.2, 0.17, 0.25), with a frequency histogram of expression correlation]
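Han et al. (2004) separated hubs by their average co-expression with their partners: high for party hubs, low for date hubs. A minimal Python sketch of that per-hub average, assuming Pearson correlation of expression profiles and invented data; the resulting values could then be compared between single-interface and multi-interface hubs.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def mean_partner_coexpression(hub, partners, expression):
    """Average Pearson correlation between a hub's profile and its partners'.

    Partners without expression data are skipped.
    """
    values = [pearson(expression[hub], expression[p]) for p in partners if p in expression]
    if not values:
        raise ValueError("no partner has expression data")
    return mean(values)


# Invented profiles: P1 rises with the hub, P2 falls.
profiles = {"H": [1.0, 2.0, 3.0, 4.0], "P1": [2.0, 4.0, 6.0, 8.0], "P2": [4.0, 3.0, 2.0, 1.0]}
print(mean_partner_coexpression("H", ["P1", "P2", "Z"], profiles))  # 0.0
```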

Page 24

AND ONLY MULTI-INTERFACE HUBS EVOLVE MORE SLOWLY; SINGLE-INTERFACE HUBS DO NOT

Source: PMK

[Figure: evolutionary rate (dN/dS) for the entire genome, all proteins in our dataset, single-interface hubs only and multi-interface hubs only; bar values shown: 0.029, 0.077, 0.047, 0.051]

Page 25

… OR ARE THEY? THERE IS AN ONGOING DEBATE ABOUT THE RELATIONSHIP BETWEEN EVOLUTIONARY RATE AND DEGREE

Source: See text

Two positions: “Yes, hubs are more conserved” versus “No, the relationship is unclear”

• Fraser et al. Science (2002)

• Fraser et al. BMC Evol. Biol. (2003)

• Wuchty Genome Res. (2004)

• Jordan et al. Genome Res. (2002)

• Hahn et al. J. Mol. Evol. (2004)

• Jordan et al. BMC Evol. Biol. (2003)

But the “Yes” side appears to be winning

This debate may have arisen because both sides were looking at the wrong variable!

Page 26

IN FACT, EVOLUTIONARY RATE CORRELATES BEST WITH THE FRACTION OF AVAILABLE SURFACE AREA INVOLVED IN INTERFACES

Source: PMK

DATA IN BINS

Small portion of surface area involved in interfaces – fast evolving

Large portion of surface area involved in interfaces – slow evolving
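The binned relationship on this slide, and a rank correlation that does not depend on the choice of bins, can be sketched in Python. The per-protein inputs (fraction of surface area in interfaces and dN/dS) are hypothetical numbers, and equal-count bins plus Spearman correlation are assumptions about how such an analysis might be done, not a description of the method used in the talk.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def binned_means(x, y, n_bins):
    """Sort points by x, cut them into n_bins nearly equal-sized bins and
    return (mean x, mean y) for each non-empty bin."""
    points = sorted(zip(x, y))
    n = len(points)
    bins = []
    for b in range(n_bins):
        chunk = points[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            bins.append((sum(p[0] for p in chunk) / len(chunk),
                         sum(p[1] for p in chunk) / len(chunk)))
    return bins


# Hypothetical per-protein values.
interface_fraction = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60]
dn_ds = [0.09, 0.07, 0.06, 0.05, 0.03, 0.02]
print(binned_means(interface_fraction, dn_ds, n_bins=3))
print(spearman(interface_fraction, dn_ds))  # -1.0 for this perfectly decreasing toy data
```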

Page 27

IS THERE A DIFFERENCE BETWEEN SINGLE-INTERFACE HUBS AND MULTI-INTERFACE HUBS WITH RESPECT TO NETWORK EVOLUTION?

Source: PMK

[Figure, two panels: “The duplication-mutation model” (gene duplication; the interaction partners of A are more likely to be duplicated) and “In the structural viewpoint”]

If these models were correct, there would be an enrichment of paralogs among the partners B

Page 28

MULTI-INTERFACE HUBS DO NOT APPEAR TO EVOLVE BY GENE DUPLICATION – THE DUPLICATION-MUTATION MODEL CAN ONLY EXPLAIN THE EXISTENCE OF SINGLE-INTERFACE HUBS

Source: PMK

[Figure: fraction of paralogs between pairs of proteins for random pairs, pairs with the same partner, pairs with the same partner but a different interface, and pairs with the same partner and the same interface; bar values shown: 0.00%, 0.15%, 0.07%, 0.003%]

But that also means that the duplication-mutation model cannot explain the full current interaction network!
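The paralog comparison on this slide can be sketched in Python by looking at every pair of proteins that share an interaction partner and splitting the pairs by whether they bind that partner through the same or a different interface. The interface labels and the paralog set (for example from a sequence-similarity search) are hypothetical inputs.

```python
from itertools import combinations


def paralog_fraction_by_interface(interfaces, paralogs):
    """Fraction of paralogous pairs among proteins sharing an interaction partner.

    interfaces : dict protein -> {partner: interface label on that protein}
    paralogs   : set of frozensets {protein_1, protein_2} known to be paralogs

    Pairs are split by whether both partners bind the shared protein through
    the same interface. A pair sharing several partners is counted once per
    shared partner.
    """
    counts = {"same interface": [0, 0], "different interface": [0, 0]}  # [paralogous, total]
    for protein, partner_interface in interfaces.items():
        for p1, p2 in combinations(sorted(partner_interface), 2):
            key = ("same interface" if partner_interface[p1] == partner_interface[p2]
                   else "different interface")
            counts[key][1] += 1
            if frozenset((p1, p2)) in paralogs:
                counts[key][0] += 1
    return {key: (par / total if total else float("nan"))
            for key, (par, total) in counts.items()}


# Hypothetical hub A: B1 and B2 share interface i1 and are paralogs; B3 uses i2.
interfaces = {"A": {"B1": "i1", "B2": "i1", "B3": "i2"}}
paralogs = {frozenset(("B1", "B2"))}
print(paralog_fraction_by_interface(interfaces, paralogs))
# {'same interface': 1.0, 'different interface': 0.0}
```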

Page 29

OUTLINE

Interaction Networks and their properties

Network properties revisited

A 3-D structural point of view

Conclusions

Page 30

CONCLUSIONS

• The topology of a direct physical interaction network is much less dominated by hubs than previously thought

• Several genomic features that were previously thought to correlate with the degree are in fact related to the number of interfaces rather than the degree

• Specifically, a protein’s evolutionary rate appears to depend on the fraction of its surface area involved in interactions rather than on its degree

• The current network growth model (duplication-mutation) can explain only part of the currently known networks

PRELIMINARY

Source: PMK

Page 31

ACKNOWLEDGEMENTS

Mark Gerstein

The nets group (Haiyuan, Jason, Brandon, Tara, Kevin, Zhengdong and Alberto)

The Gerstein Lab

Page 32

OUT

Page 33

BACKUP