A network-based representation of protein fold space


Spencer Bliven

Qualifying Examination 6/6/2011

Overview
1. Background & Motivation
2. Preliminary Research
3. Proposed Future Research

Fold Space
What protein folds are possible?
  Discrete or Continuous? Both? Neither?
What portion of fold space is utilized by nature?
Long debated questions. Why?
  Understanding of structure-function relationship
  Protein design/engineering
  Protein evolution
  Classification

Previous Work

Orengo, Flores, Taylor, Thornton. Protein Eng (1993) vol. 6 (5) pp. 485-500

Holm and Sander. J Mol Biol (1993) vol. 233 (1) pp. 123-38

Holm and Sander. Science (1996) vol. 273 (5275) pp. 595-603

Shindyalov and Bourne. Proteins (2000) vol. 38 (3) pp. 247-60

Hou, Sims, Zhang, Kim. PNAS (2003) vol. 100 (5) pp. 2386-90

Taylor. Curr Opin Struct Biol (2007) vol. 17 (3) pp. 354-61

Sadreyev et al. Curr Opin Struct Biol (2009) vol. 19 (3) pp. 321-8

[Figure labels: α, α+β, β, α/β]

Why can we do better?
More structures
Sampling of globular folds “saturated”
  Few novel folds being discovered
  Geometric arguments for saturation of small protein folds
Recent all-vs-all computation
  Cluster sequences to 40% identity
  17,852 representatives (updated weekly)
  189 million FATCAT rigid-body alignments

[Figure: PDB content growth chart; label: 73503]
Source: http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=total&seqid=100 (accessed 5/31/2011)

Structural Similarity Graph
Nodes: PDB chains, non-redundant to 40%
Edges: FATCAT-rigid alignments
“Significant” edges:
  p < 0.001
  Length > 25
  Coverage > 50
Hierarchically cluster to reduce complexity in visualization
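
A minimal sketch of how the significance filter above could turn a list of pairwise alignments into a graph. The `Alignment` record, its field names, and the sample values are hypothetical; only the three thresholds come from the slide, and coverage is assumed here to be a percentage.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one pairwise FATCAT-rigid alignment."""
    chain_a: str
    chain_b: str
    p_value: float
    length: int       # aligned length (assumed to be in residues)
    coverage: float   # assumed to be a percentage


def is_significant(aln, max_p=0.001, min_length=25, min_coverage=50.0):
    """Apply the three thresholds listed on the slide."""
    return (aln.p_value < max_p
            and aln.length > min_length
            and aln.coverage > min_coverage)


def build_graph(alignments):
    """Return an undirected adjacency map containing only significant edges."""
    graph = defaultdict(set)
    for aln in alignments:
        if is_significant(aln):
            graph[aln.chain_a].add(aln.chain_b)
            graph[aln.chain_b].add(aln.chain_a)
    return dict(graph)


# Made-up example values, for illustration only.
example = [
    Alignment("1ABC.A", "2DEF.A", 1e-5, 120, 80.0),   # passes all thresholds
    Alignment("1ABC.A", "3GHI.B", 0.02, 90, 70.0),    # p-value too high
    Alignment("2DEF.A", "3GHI.B", 1e-4, 20, 90.0),    # alignment too short
]
graph = build_graph(example)
```

Chains with no significant edge do not appear in the adjacency map at all, so isolated nodes would need to be added separately if they matter for the visualization.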

[Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]

Agreement with SCOP

Class p < 10^-6
Fold p < 10^-7
Superfamily p < 10^-10

Continuity

Grishin. J Struct Biol (2001) vol. 134 (2-3) pp. 167-85

Skolnick claims ≤ 7 intermediates between any two proteins
We observe network diameter = 15

Can find interesting paths
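
The diameter and the "interesting paths" above are both shortest-path quantities, so a breadth-first search is one plausible way to compute them. The sketch below is an illustration on a made-up toy graph, not the code used for the reported value; a path of d edges passes through d - 1 intermediate nodes.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter(graph):
    """Longest shortest path over all pairs of mutually reachable nodes."""
    return max(max(bfs_distances(graph, node).values()) for node in graph)


def shortest_path(graph, start, goal):
    """One shortest path from start to goal as a list of nodes, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Toy undirected graph (hypothetical node names).
toy = {
    "A": {"B"},
    "B": {"A", "C", "E"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": {"B"},
}
toy_diameter = diameter(toy)
toy_path = shortest_path(toy, "A", "D")
```

Running one search per node costs O(V·(V+E)) overall. On a disconnected graph this version reports the largest diameter among the connected components.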

[Figure labels: C4, C5, C6, C7]

Symmetry

Beta Propellers

Symmetry
Functionally important
  Protein evolution (e.g. beta-trefoil)
  DNA binding
  Allosteric regulation
  Cooperativity
Widespread (~20% of proteins)
Focus of algorithmic work

FGF-1 Lee & Blaber. PNAS 2011

TATA Binding Protein (1TGH)

Hemoglobin (4HHB)

Cross-class example
3GP6.A: PagP, modifies lipid A; f.4.1 (transmembrane beta-barrel)
1KT6.A: Retinol-binding protein; b.60.1 (Lipocalins)

Summary of Preliminary Research

Calculated all-vs-all alignment
  Prlić A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics (2010) vol. 26 (23) pp. 2983-2985
Built network of significant alignments
  Approximately matches SCOP classifications
Improved structural alignment algorithms
  Identify symmetry, circular permutations, topology-independent alignments
  Discussed more in report

Future Research
Improve the network
  1. Improve all-vs-all comparison algorithm
  2. Tune parameters during graph generation
Annotate the network & draw biological inferences
  3. Annotate nodes with functional information
  4. Compare with other networks
Create new networks
  5. Enhance structural comparison algorithms

1. Improve all-vs-all comparison algorithm

Need domain decomposition
Use Combinatorial Extension (CE)

2. Tune parameters during graph generation

Don’t use p-values
  Shouldn’t compare p-values, statistically*
  Not normalized by secondary structure
  Not accurate due to multiple testing problem
Use TM-score
  RMSD, normalized to the alignment length
Determine optimal thresholds for determining “significance”
  For instance, train an SVM

* Technically ok here, since one-to-one with the FATCAT score
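
For reference, a minimal sketch of the TM-score for one fixed superposition, using the usual definition: each aligned residue pair contributes 1 / (1 + (d/d0)^2), and the sum is divided by a reference chain length L, with d0 = 1.24·(L - 15)^(1/3) - 1.8. The function names and the 0.5 Å floor on d0 for short chains are assumptions made for this illustration. The full score also maximizes over superpositions, which is omitted here.

```python
def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 (in angstroms).

    The floor for short chains is an assumption of this sketch; without it
    the formula is not usable for lengths of 15 residues or fewer.
    """
    if target_length <= 15:
        return floor
    return max(floor, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: C-alpha distances (angstroms) of the aligned residue pairs.
    target_length: length of the reference chain used for normalization.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect full-length alignment scores 1; aligning half the chain
# perfectly scores 0.5, regardless of chain length.
perfect = tm_score([0.0] * 100, 100)
half = tm_score([0.0] * 50, 100)
```

Because unaligned residues contribute nothing and the divisor is the chain length rather than the number of aligned pairs, short high-quality alignments are not rewarded the way they can be with raw RMSD.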

FATCAT p-value by Class

Performs poorly on all-alpha in “twilight zone”
Terrible on membrane proteins
  Probably reflects non-structural considerations in SCOP assignment

3. Annotate nodes with functional information

SCOP/CATH classifications
GO terms
Metal binding
Ligand binding
Symmetry

[Figure legend: a, b, a/b, a+b, Multi, Membrane, Small]

4. Compare with other networks

Define other types of network over the set of protein representatives
  Protein-protein interactions
  Co-expression

Correlate to the structural similarities

Structural similarity

Protein-protein interaction
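
One simple way to correlate two networks defined over the same set of representatives is to measure how many edges they share. The sketch below uses the Jaccard index of the two edge sets; this measure is an assumption chosen for illustration, not a method named in the proposal, and the node names are made up.

```python
def edge_set(graph):
    """Undirected edges as frozensets, so (a, b) and (b, a) coincide."""
    return {
        frozenset((a, b))
        for a, neighbors in graph.items()
        for b in neighbors
        if a != b
    }


def edge_jaccard(graph_a, graph_b):
    """Shared edges divided by all edges present in either network."""
    edges_a, edges_b = edge_set(graph_a), edge_set(graph_b)
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)


# Toy adjacency maps over the same three hypothetical proteins.
structural = {"P1": {"P2", "P3"}, "P2": {"P1"}, "P3": {"P1"}}
interaction = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}}
overlap = edge_jaccard(structural, interaction)
```

Here the two networks share one edge (P1-P2) out of three distinct edges. Judging whether an observed overlap is larger than chance would additionally need a null model, such as shuffling node labels.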

5. Enhance structural comparison algorithms

Improve automated pseudo-symmetry detection

Find topology-independent relationships

[Figure label: C3]

Summary
Fold space as network
Improve network creation
Annotate network with functional information
Improve structural similarity detection

Acknowledgments

Bourne Lab
  Philip Bourne
  Andreas Prlić
  Lab & PDB members

Qualifying Exam Committee
  Ruben Abagyan
  Patricia Jennings
  Andy McCammon

Collaborators
  Philippe Youkharibache
  Jean-Pierre Changeux

Rotation Advisors
  Pavel Pevzner
  Philip Bourne
  José Onuchic & Pat Jennings
  Mike MacCoss
  Virgil Woods
