
A computational study of protein folding pathways

Reducing the computational complexity of the folding process using the building block folding model.

Nurit Haspel, Chung-Jung Tsai, Haim Wolfson and Ruth Nussinov

The building blocks model (Chung-Jung Tsai)

• Protein folding is a hierarchical process.
• A protein is constructed from HFUs (hydrophobic folding units).
• HFU - the result of a combinatorial assembly of building blocks.
• Building block - a contiguous, highly populated fragment.
• The building block model allows illustrating the protein folding pathway.

An outline of the building blocks algorithm

• Scoring function - measures the relative stability of a candidate building block.
• Three ingredients:
– Compactness
– Degree of isolation
– Hydrophobicity
• The result - an “anatomy tree” that illustrates the most probable folding route.

The Scoring Function

$$\mathrm{SCORE}_{BB}(Z,H,I) = \frac{Z^{1}_{avg} - Z}{Z^{1}_{Dev}} + \frac{H - H^{1}_{avg}}{H^{1}_{Dev}} + \frac{I^{1}_{avg} - I}{I^{1}_{Dev}} + \frac{Z^{2}_{avg} - Z}{Z^{2}_{Dev}} + \frac{H - H^{2}_{avg}}{H^{2}_{Dev}} + \frac{I^{2}_{avg} - I}{I^{2}_{Dev}}$$

Z - Compactness
H - Hydrophobicity
I - Isolation

Compactness, Hydrophobicity and Isolation definitions

• Compactness - $Z = \dfrac{ASA_{Surf}}{(36\pi\, VOL^{2})^{1/3}}$

• Hydrophobicity - $H = \dfrac{ASA^{Buried}_{Non}}{ASA^{Buried}_{Non} + ASA^{Surf}_{Non}}$

• Isolation - $I = \dfrac{ASA^{EB}_{Non}}{ASA_{frag}}$
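The three ingredients can be sketched as small functions. This is only an illustration under stated assumptions: the function and argument names are hypothetical, "Non" is read as non-polar surface area, and the hydrophobicity and isolation ratios follow the reading of the slide given in the comments. The sphere check relies on the fact that (36π·VOL²)^(1/3) is the surface area of a sphere with volume VOL, so a perfect sphere has Z = 1.

```python
import math


def compactness(asa_surf: float, volume: float) -> float:
    """Z = ASA_Surf / (36 * pi * VOL^2)^(1/3).

    The denominator is the surface area of a sphere with the same
    volume, so Z is 1 for a sphere and grows as the shape gets less compact.
    """
    return asa_surf / (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0)


def hydrophobicity(asa_non_buried: float, asa_non_surf: float) -> float:
    """H = buried non-polar ASA / (buried + surface non-polar ASA).

    Assumption: this is the ratio intended on the slide.
    """
    return asa_non_buried / (asa_non_buried + asa_non_surf)


def isolation(asa_non_eb: float, asa_frag: float) -> float:
    """I = ASA_Non^EB / ASA_frag (assumed reading of the slide)."""
    return asa_non_eb / asa_frag


# Sanity check with a sphere of radius 2 (arbitrary units).
radius = 2.0
sphere_area = 4.0 * math.pi * radius ** 2
sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
z_sphere = compactness(sphere_area, sphere_volume)

# Made-up surface areas, only to exercise the ratios.
h_example = hydrophobicity(30.0, 10.0)
i_example = isolation(5.0, 50.0)
```

In practice the surface areas and volume would come from a solvent-accessibility program run on the fragment coordinates; the numbers above are invented for the check.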

The Cutting Procedure

• Locating a basket of candidate building blocks (relatively stable contiguous fragments):
– Assign a stability score to all the candidate fragments.
– Collect the local minima in the “fragment map” (best score in a given radius).
• Recursively splitting the protein top-down:
– Search the “basket” for a set of fragments that constitute the whole fragment, allowing a short overlap (7 residues) and a gap of up to 15 residues.
– Minimum building block size - 15.
– No node can have only one child (except for the root).
– Stop when the node cannot be split any further.
– In this work, building blocks up to level 6.

Example - Annexin III

Example (cont.)

Example (cont.)

Usefulness of the anatomy tree

• It is possible to see whether a protein folds through single or multiple route(s).
– These routes can be observed by inspecting the fragment map (there can be more than one way to construct a tree).
• Sequential versus non-sequential folding.
– Sequential – contact is made only between consecutive building blocks.
– A binary anatomy tree indicates a sequential folder.
• Fast versus slow folding.
– Sequentially folding proteins usually fold faster.
• Climbing up the tree allows us to illustrate the folding process.

Critical building blocks (Sandeep Kumar)

• Some building blocks may be considered critical for correct folding.
• A critical building block is in contact with other building blocks in the protein.
• It is likely to be inserted between sequentially connected building blocks.
• Without it, the other building blocks are likely to mis-associate.
• The structure and sequence of a critical BB are more likely to be conserved.

Critical building block algorithm

• For each building block j:
– Compute its differential contacting surface area, DiffContSa(j).
– Compute its critical building block index:

$$\mathrm{CIndex}(j) = \frac{\mathrm{DiffContSa}(j)}{\mathrm{TotSa}(j)} \times \frac{\mathrm{ProtBuryAsa}(j)}{\mathrm{SolvExpAsa}(j)}$$

– Compute its Z-score, Zscore(j), from CIndex(j).

Critical building blocks (cont.)

A building block is critical if:

• It is found at most levels below the hydrophobic folding unit level.
• It has a consistently high CIndex at different levels.
• Its CIndex is significant by at least 2 standard deviations in at least one level of protein anatomy.
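The "at least 2 standard deviations" criterion can be illustrated for a single anatomy level. This sketch assumes the Z-score is the usual standardized value (CIndex minus the level mean, divided by the level's population standard deviation), which the slides do not spell out. The building block names and CIndex values are invented, and the other two criteria (presence at most levels, consistently high CIndex) are not covered here.

```python
from statistics import mean, pstdev
from typing import Dict, List


def cindex_z_scores(cindex: Dict[str, float]) -> Dict[str, float]:
    """Standardize the CIndex values of the building blocks at one level.

    Assumption: Z = (CIndex - mean) / population standard deviation.
    """
    values = list(cindex.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All blocks have the same index: none stands out.
        return {name: 0.0 for name in cindex}
    return {name: (value - mu) / sigma for name, value in cindex.items()}


def significant_blocks(cindex: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Names of blocks whose CIndex is at least `threshold` deviations above the mean."""
    z = cindex_z_scores(cindex)
    return sorted(name for name, score in z.items() if score >= threshold)


# Invented CIndex values for seven building blocks at one level.
level_cindex = {
    "bb1": 0.10, "bb2": 0.12, "bb3": 0.11, "bb4": 0.09,
    "bb5": 0.10, "bb6": 0.13, "bb7": 0.90,
}
flagged = significant_blocks(level_cindex)
```

Note that with a population standard deviation a single outlier among n blocks can reach a Z-score of at most √(n − 1), so very small levels can never pass a threshold of 2 under this assumption.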

The goals of my research

• Clustering the building blocks according to their 3-D structures, using a rigid matching algorithm.
• Analyzing the building blocks: sequence, stability distribution, size.
• Analyzing the clusters: size, stability score distribution, sequence conservation, criticalness conservation.

The goals of my research (cont.)

• Analyzing the critical building blocks: position within the protein, relative stability, sequence and structure conservation.
• Developing an algorithm that assigns a set of building blocks to a protein sequence, using sequence similarity, relative stability and more information.

Clustering the building blocks

• Each cluster has representative members (one or more).
• For each building block structure:
– Go over the clusters.
– Match with cluster representative(s).
– If it matches (1.5 Å RMSD, 70% size) - join the building block to the cluster.
• If no match is found - open a new cluster with this building block as a representative.
• Problem - O(n²) comparisons, where n is the number of clusters.
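The clustering loop above can be sketched as follows. The structural comparison itself (rigid matching with a 1.5 Å RMSD and 70% size criterion) is not reproduced here; it is passed in as a function, and the toy matcher used in the example is a hypothetical stand-in that compares a size and a single made-up shape number. Each cluster is assumed to have one representative, its first member.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def greedy_cluster(items: List[T], matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Assign each item to the first cluster whose representative it matches.

    clusters[i][0] is the representative of cluster i. If no cluster
    matches, the item opens a new cluster and becomes its representative.
    """
    clusters: List[List[T]] = []
    for item in items:
        for cluster in clusters:
            if matches(item, cluster[0]):
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


# Hypothetical stand-in for the rigid structural matcher:
# a block is (size in residues, one-number "shape" descriptor).
Block = Tuple[int, float]


def toy_match(a: Block, b: Block) -> bool:
    size_ratio = min(a[0], b[0]) / max(a[0], b[0])
    return abs(a[1] - b[1]) <= 1.5 and size_ratio >= 0.7


blocks: List[Block] = [(20, 0.0), (22, 1.0), (40, 0.5), (21, 5.0), (38, 0.2)]
clusters = greedy_cluster(blocks, toy_match)
```

Every new block is compared with each existing representative in turn, which is where the quadratic cost noted on the slide comes from when most blocks open their own cluster.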

Clustering of the building blocks

[Figure: a building block compared against Cluster 1, Cluster 2, ... Cluster n]

Making clustering more efficient

• Dividing the building blocks into SCOP families (proteins from the same family usually produce the same building blocks).
• Clustering each family and then merging all the clusters - reduces the number of clusters at each instance.

Building block and cluster data

Distribution of number of clusters

An example of a cluster

Sequence analysis of the clusters

• Sequence clustering of each structural cluster (using BLAST).
• Creating a non-redundant sequence dataset.
• Goal - finding a connection between (short) sequences and structures.

Statistical analysis of the clusters and of the critical building blocks

• Stability score distribution among cluster members.
• Criticalness score distribution among cluster members.
• Position distribution of the critical building blocks.
• Stability score as a function of criticalness score.

An example of stability distribution

Criticalness score distribution within a cluster

An N-terminus critical building block example

A C-terminus critical building block example

A mid-sequence critical building block example

Distribution of the position inside the protein - all-alpha, level 3

Stability vs. Criticalness score example

Stability score of critical and non-critical building blocks (histogram)

[Figure panels: Non-critical, Critical]

Final goal

Given a sequence and using the information accumulated so far - is there a way of matching a set of building blocks to it?

The building block assignment algorithm

• Perform sequence alignment of the protein sequence against the building block sequence database.
• Construct a directed, acyclic graph.
– Each matching building block is a graph vertex and is assigned a score depending on the sequence alignment score, building block stability and other parameters.
– Directed edges connect the fragments that match consecutive areas in the protein sequence, allowing short overlaps and small gaps.
– Edge score – average score of connected vertices.

The building block assignment algorithm (cont.)

• Add fictitious “start” and “target” vertices.
• Connect start to all starting vertices.
• Connect all ending vertices to target.
• Find the shortest path from start to target using the single-source shortest path algorithm.
• The path is an “optimal” building block assignment covering the protein sequence.
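The graph construction and shortest-path search can be sketched as a dynamic program over the matches sorted by start position, which is a topological order of the graph. Several details here are assumptions rather than the authors' choices: vertex scores are expressed as costs (lower is better), the fictitious start and target vertices have cost 0 so that a path's total cost equals the sum of its vertex costs, and the overlap and gap limits (7 and 15 residues) are borrowed from the cutting procedure. The `Match` class and the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Match:
    name: str
    start: int   # first residue covered (inclusive)
    end: int     # last residue covered (inclusive)
    cost: float  # vertex score as a cost: lower is better (assumption)


def assign_building_blocks(
    matches: List[Match], seq_len: int, max_overlap: int = 7, max_gap: int = 15
) -> Tuple[float, List[str]]:
    """Shortest start-to-target path in the DAG of building block matches."""
    order = sorted(matches, key=lambda m: (m.start, m.end))
    n = len(order)
    inf = float("inf")
    dist = [inf] * n
    prev: List[Optional[int]] = [None] * n

    # Edges from the fictitious start vertex (cost 0): average = cost / 2.
    for i, m in enumerate(order):
        if m.start <= max_gap:
            dist[i] = m.cost / 2.0

    # Relax edges in topological order; edge weight = average of both vertices.
    for i, u in enumerate(order):
        if dist[i] == inf:
            continue
        for j in range(i + 1, n):
            v = order[j]
            gap = v.start - u.end - 1  # negative values mean overlap
            if v.start > u.start and v.end > u.end and -max_overlap <= gap <= max_gap:
                candidate = dist[i] + (u.cost + v.cost) / 2.0
                if candidate < dist[j]:
                    dist[j], prev[j] = candidate, i

    # Edges into the fictitious target vertex (cost 0).
    best, best_cost = None, inf
    for i, m in enumerate(order):
        total = dist[i] + m.cost / 2.0
        if seq_len - 1 - m.end <= max_gap and total < best_cost:
            best, best_cost = i, total

    path: List[str] = []
    while best is not None:
        path.append(order[best].name)
        best = prev[best]
    return best_cost, path[::-1]


# Invented matches on a 60-residue sequence.
candidates = [
    Match("A", 0, 19, -3.0), Match("B", 15, 39, -2.0), Match("C", 20, 41, -1.0),
    Match("D", 38, 59, -4.0), Match("E", 45, 59, -1.5),
]
total_cost, assignment = assign_building_blocks(candidates, seq_len=60)
```

For the invented data the selected assignment is A, B, D with a total cost of -9.0. Because the graph is acyclic, one pass in topological order is enough and negative costs pose no problem; if no chain of matches covers the sequence, the function returns an infinite cost and an empty path.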

Illustration of the algorithm

Example – ROP protein from E. coli (1rpo)

Example – Myoglobin from sea hare (1mba)

Suggestions for future work

• Improving the algorithm and adding new parameters to it (secondary structure alignment, trying other building blocks from the same cluster as the matching building blocks, etc.).
• Combinatorial assembly – Yuval’s work.
• Further cluster analysis – inquiring into sequence conservation.
• Conformation stability measurements (molecular dynamics…)

Conclusions

Using the hierarchical folding model, it may be possible to reduce the folding complexity by assigning local substructures and then assembling them.