Protein Structure Prediction Graham Wood Charlotte Deane


Protein Structure Prediction

Graham Wood

Charlotte Deane


The problem - in brief

MVLSEGEWQL

VLHVWAKVEA

DVAGHGQDIL

AKYKELGYQG

[Figure: sequence + (databases, algorithms, software) = structure]


Why is protein structure prediction needed?

• Essential functioning of cells is mediated by proteins

• It is protein structure that leads to protein function

• 3D structure determination (by X-ray crystallography or NMR) is expensive, slow and difficult

• Assists in the engineering of new proteins


Terminology

Target

- the unknown structure you are trying to model

Parent

- a known structure which provides a basis for modelling


The problem - more detail

[Figure labels: Biologist; Physicist; Protein databases; Energy; Configuration space; example sequence EKGPDLYLIPLT]


CASP: Critical Assessment of Structure Prediction

Timeline: Jan-Apr, May, Jun, Jul, Aug, Sept, Oct, Nov, Dec

1. Organisers: call for structures

2. Biologists: give sequences to organisers

3. Organisers: publish sequences on web

4. Caspers: predict structure from sequence

5. Biologists: structure determination, then give structures to organisers

6. Expert assessment

7. 4-day meeting


Degree of evolutionary conservation

Less conserved, information poor → More conserved, information rich

DNA seq → Protein seq → Structure → Function

DNA seq: ACAGTTACACCGGCTATGTACTATACTTTG

Protein seq: HDSFKLPVMSKFDWEMFKPCGKFLDSGKLG


Three main approaches (in order of current success)

1. Comparative modelling

2. Fold recognition

3. De novo


Comparative modelling

[Figure: target sequence EKGPDLYLIPLT and close homologues. Labels: Energy; Conserved backbone; Variable backbone; Side chains]


Comparative modelling (protein building)

1. Prepare the raw materials

2. Build the model (two methods)

3. Check the model

4. Accept or reject the model


C1: Preparing the raw materials

1. Given target AA sequence (e.g. EKGPDLYLIPLT)

2. Identify parents (homologues)

3. Structurally align parents

4. Align target to parents


Structurally conserved regions and structurally variable regions

SCR - secondary structure region

SVR - loop region
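One simple way to separate SCRs from SVRs can be sketched as follows, assuming the parents are already structurally aligned and each comes with a per-column secondary structure string (`H` helix, `E` strand, `-` loop or gap). A column is called SCR only if every parent has regular secondary structure there. The rule and the function names `label_regions` and `runs` are illustrative assumptions, not the definition used in any particular modelling program.

```python
def label_regions(parent_ss):
    """Label each alignment column 'SCR' or 'SVR'.

    parent_ss: list of equal-length strings, one per parent, using
    'H' (helix), 'E' (strand) and '-' (loop or gap).
    """
    if not parent_ss:
        return []
    length = len(parent_ss[0])
    if any(len(s) != length for s in parent_ss):
        raise ValueError("secondary structure strings must be aligned")
    labels = []
    for column in zip(*parent_ss):
        # Conserved only if every parent is helix or strand at this column.
        conserved = all(state in "HE" for state in column)
        labels.append("SCR" if conserved else "SVR")
    return labels


def runs(labels):
    """Collapse per-column labels into (label, start, end) runs, end exclusive."""
    out = []
    for i, lab in enumerate(labels):
        if out and out[-1][0] == lab:
            out[-1] = (lab, out[-1][1], i + 1)
        else:
            out.append((lab, i, i + 1))
    return out


# Two toy parents: a helix, a loop of slightly different length, then a strand.
parents = [
    "HHHH--EEE",
    "HHHH---EE",
]
regions = runs(label_regions(parents))
print(regions)  # [('SCR', 0, 4), ('SVR', 4, 7), ('SCR', 7, 9)]
```

The SVR run in the middle is where the parents disagree, which is where loop building would be needed.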


C2: Building (choice of two methods)

Assemble fragments

• Determine SCRs and build associated backbone

• Determine SVRs and build rest of backbone

Use spatial restraints

Attach and orient side-chains

Refine model


C2: Building (choice of two methods)

Assemble fragments

• Determine SCRs and build associated backbone

• Determine SVRs and build rest of backbone

Use spatial restraints

• Optimally satisfy spatial restraints

Orient side-chains

Refine model


Extrapolation

D T N V A Y C N K D


C3: Test model (C4: then accept or reject)

• Examine the model in the light of all experimental data

• PROCHECK, VERIFY3D, PROSA II, Visual inspection using 3D software, JOY
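Programs such as PROCHECK run many stereochemical checks on a model. As one small illustration of the kind of test involved, the sketch below flags consecutive Cα atoms whose separation is far from the roughly 3.8 Å expected across a trans peptide bond. The tolerance, the function name `ca_distance_outliers` and the toy coordinates are assumptions for illustration; this is not how any of the named programs is implemented.

```python
import math


def ca_distance_outliers(ca_coords, expected=3.8, tolerance=0.3):
    """Return (i, i + 1, distance) for consecutive C-alpha pairs that look wrong.

    ca_coords: list of (x, y, z) tuples in angstroms, in chain order.
    expected, tolerance: assumed values, not taken from any specific tool.
    """
    outliers = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            outliers.append((i, i + 1, round(d, 2)))
    return outliers


# Toy model: the last pair is 6.0 angstroms apart, i.e. a likely chain break.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 6.0, 0.0)]
print(ca_distance_outliers(model))  # [(2, 3, 6.0)]
```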


Problems in comparative modelling

• Aligning the target to the parents

• The packing of secondary structure elements in the core

• The long insertions and deletions in the structurally variable regions


Fold Recognition

?Target


Fold recognition

[Figure: target sequence EKGPDLYLIPLT and structurally similar proteins. Label: Energy]


Fold recognition (protein finding)

1. Obtain library of non-duplicate folds

2. Perform sequence-structure alignment

3. Assess success of alignment

• Biologist – use substitution matrix

• Physicist – use potentials

4. Accept or reject the model


Sequence-structure alignment

1. Construct sequence profile

2. Use profile to score the sequence

[Figure labels: Target; Parent; BLASTP; OWL; MULTAL; Dynamic programming algorithm; Score]
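The dynamic programming step can be sketched as below, assuming a Needleman-Wunsch style global alignment with a linear gap penalty. The `score` callback stands in for the profile: it returns the score for placing a target residue at a given parent position. The identity scorer, the gap value and all function names are illustrative assumptions, not the scheme used by any particular fold recognition program.

```python
def global_alignment_score(target, n_positions, score, gap=-2):
    """Best global alignment score of `target` against `n_positions` columns.

    score(residue, j) gives the score of placing `residue` at column j.
    gap is a linear per-position gap penalty (assumed value).
    """
    n, m = len(target), n_positions
    # dp[i][j] = best score aligning target[:i] with the first j columns.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(target[i - 1], j - 1),  # align pair
                dp[i - 1][j] + gap,                              # gap in parent
                dp[i][j - 1] + gap,                              # gap in target
            )
    return dp[n][m]


def identity_profile(parent):
    """Toy 'profile': +1 if the residue matches the parent, otherwise -1."""
    def score(residue, j):
        return 1 if residue == parent[j] else -1
    return score


parent = "AD"
print(global_alignment_score("ACD", len(parent), identity_profile(parent)))  # 0
```

In the toy run the best alignment pairs A with A and D with D and leaves C against a gap, giving 1 - 2 + 1 = 0. A real profile would replace `identity_profile` with position-specific scores.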


Amino acid substitutions are constrained by local environments

Different substitution patterns

Environment-specific substitution tables


Definition of local environments

• Main-chain conformation and secondary structure (α-helix, β-strand, coil and positive φ)

• Solvent accessibility (accessible and inaccessible)

• Hydrogen bonds (side-chain to main-chain NH, side-chain to main-chain CO and side-chain to side-chain)
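To see how many distinct local environments these three features define, the sketch below enumerates them. It assumes that each of the three hydrogen bond types is independently present or absent, which is an interpretation of the list above rather than something it states explicitly.

```python
from itertools import product

MAIN_CHAIN = ["alpha-helix", "beta-strand", "coil", "positive-phi"]
ACCESSIBILITY = ["accessible", "inaccessible"]
HBOND_TYPES = [
    "side-chain to main-chain NH",
    "side-chain to main-chain CO",
    "side-chain to side-chain",
]

# Each environment is (main-chain class, accessibility, hydrogen bond flags),
# where the flags say which of the three bond types are present (assumption).
environments = [
    (conformation, accessibility, hbonds)
    for conformation, accessibility in product(MAIN_CHAIN, ACCESSIBILITY)
    for hbonds in product([False, True], repeat=len(HBOND_TYPES))
]

print(len(environments))  # 4 * 2 * 2**3 = 64
```

Under that assumption there are 64 environments, each of which would get its own substitution table.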


Substitution scores

Probability that amino acid a in environment E is replaced by amino acid b:

$$P(b \mid a, E) = \frac{f_{ab}^{E}}{\sum_{c} f_{ac}^{E}}$$

where $f_{ab}^{E}$ is the frequency of observing amino acid a in environment E replaced by b.

Log odds score scaled to the nearest integer:

$$S_{ab}^{E} = \mathrm{round}\left(\log\left(\frac{P(b \mid a, E)}{P_{b}}\right)\right)$$

where $P_{b}$ is the background probability of observing amino acid b (a match occurring by chance).
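A minimal sketch of how such a table could be computed for one environment, following the definitions on this slide: normalise the replacement counts for each amino acid a, divide by the background probability of b, take the log and round. The natural logarithm, the skipping of zero counts and the toy numbers are assumptions; the slide does not specify them.

```python
import math


def substitution_scores(counts, background):
    """Return S[a][b] = round(log(P(b | a, E) / P_b)) for one environment E.

    counts[a][b] is f_ab^E: how often a is seen replaced by b in environment E.
    background[b] is P_b. Pairs with a zero count are skipped (assumption).
    """
    scores = {}
    for a, row in counts.items():
        total = sum(row.values())            # sum over c of f_ac^E
        scores[a] = {}
        for b, f_ab in row.items():
            if f_ab == 0:
                continue                     # log(0) is undefined
            p_b_given_a = f_ab / total       # P(b | a, E)
            scores[a][b] = round(math.log(p_b_given_a / background[b]))
    return scores


# Toy counts for a single, made-up environment.
counts = {"A": {"A": 80, "S": 20}}
background = {"A": 0.1, "S": 0.1}
scores = substitution_scores(counts, background)
print(scores)  # {'A': {'A': 2, 'S': 1}}
```

Here P(A | A, E) = 0.8 is eight times the background, giving round(log 8) = 2, while P(S | A, E) = 0.2 gives round(log 2) = 1.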


Scoring with potentials

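Energy potential

$$E_{k}^{ab}(s) = RT \log\left(1 + m_{ab}\sigma\right) - RT \log\left(1 + m_{ab}\sigma \frac{f_{k}^{ab}(s)}{f_{k}(s)}\right)$$

Solvation potential

$$E_{a}(r) = -RT \log\left(\frac{f_{a}(r)}{f(r)}\right)$$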
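The solvation term can be illustrated with a short sketch: the energy of residue type a at accessibility r is minus RT times the log of how often a is seen at that accessibility relative to all residues. The accessibility bins, the counts and the value of RT (about 0.593 kcal/mol near 298 K) are assumptions for illustration, and the function assumes no bin has a zero count.

```python
import math

RT = 0.593  # kcal/mol near 298 K (assumed temperature)


def solvation_potential(counts_a, counts_all):
    """E_a(r) = -RT * log(f_a(r) / f(r)) for each accessibility bin r.

    counts_a[r]: observations of residue type a in bin r.
    counts_all[r]: observations of all residue types in bin r.
    Both lists must have the same length and no zero entries.
    """
    total_a = sum(counts_a)
    total_all = sum(counts_all)
    energies = []
    for n_a, n_all in zip(counts_a, counts_all):
        f_a = n_a / total_a        # f_a(r): relative frequency for residue a
        f = n_all / total_all      # f(r): relative frequency for all residues
        energies.append(-RT * math.log(f_a / f))
    return energies


# Made-up counts for a hydrophobic residue in three bins:
# buried, intermediate, exposed.
leu_counts = [60, 30, 10]
all_counts = [100, 100, 100]
energies = solvation_potential(leu_counts, all_counts)
print([round(e, 2) for e in energies])
```

With these toy counts the buried bin comes out favourable (negative) and the exposed bin unfavourable (positive), which is the behaviour expected for a hydrophobic residue.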


The Novel Fold Problem

?

asdghklprtwecvm

nasetyasdghklprtwecvm

nasety


De novo – new fold methods

[Figure: target sequence EKGPDLYLIPLT. Labels: Energy; Segment configurations; Sets of local configurations]


Defining a “New Fold”

• CATH

– Somewhat objective

• SCOP

– No objective definition

– Tends towards evolutionary relationships

• Ask A. Murzin


New fold approach

• All structure information is in the AA sequence (Anfinsen, Science, 1973)

• Seek “lowest free energy conformation”

• Tactic is to simplify the problem, for example

– Simplified model of protein (one atom per residue)

– Simple or knowledge-based potential function

• Assist in detecting distant homologues


New fold recognition (structure discovery)

1. Set up domain and objective function

2. Perform optimisation

3. Check the model

4. Accept or reject the model


De Novo (biologist): ROSETTA (Baker et al.)

Domain of objective function

[Figure: sequence, divided into 9-residue segments, each with a set of local structures consistent with the local sequence]


De Novo (biologist): ROSETTA

Objective function to be maximised

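$$P(\text{structure} \mid \text{sequence}) = \frac{P(\text{structure})\, P(\text{sequence} \mid \text{structure})}{P(\text{sequence})}$$

Annotations on the terms: constant; function of energy

$$\prod_{i} P(AA_{i} \mid E_{i})$$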
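Assuming the product on this slide approximates P(sequence | structure) as a product over residues of P(AA_i | E_i), with E_i taken to be the local environment of residue i in the candidate structure, its logarithm is a simple sum. The sketch below shows that sum; the two environment classes and the probability table are invented for illustration and are not ROSETTA's actual terms.

```python
import math


def log_p_sequence_given_structure(sequence, environments, p_aa_given_env):
    """Sum of log P(AA_i | E_i) over residues.

    sequence: string of one-letter amino acid codes.
    environments: one environment label per residue, from the candidate structure.
    p_aa_given_env[env][aa]: assumed lookup table of P(aa | env).
    """
    if len(sequence) != len(environments):
        raise ValueError("need one environment per residue")
    return sum(
        math.log(p_aa_given_env[env][aa])
        for aa, env in zip(sequence, environments)
    )


# Invented table: leucine prefers buried positions, lysine exposed ones.
table = {
    "buried": {"L": 0.20, "K": 0.02},
    "exposed": {"L": 0.05, "K": 0.10},
}

good = log_p_sequence_given_structure("LK", ["buried", "exposed"], table)
bad = log_p_sequence_given_structure("LK", ["exposed", "buried"], table)
print(good > bad)  # True
```

A structure that buries the leucine and exposes the lysine scores higher than the reverse, which is the sense in which the term rewards structures that fit the sequence.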


De Novo (biologist): ROSETTA

Maximising the probability of the sequence

1. Choose each local conformation and start with a fully extended chain

2. Generate a neighbouring conformation

3. Accept in simulated annealing style, using P(structure|sequence)

4. Do this many times and cluster results – use centre of largest cluster as prediction
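Steps 1 to 3 can be sketched as a toy fragment-replacement search with simulated annealing. Everything specific here is an assumption for illustration: the conformation is a string of per-residue states, the fragments are 3 residues long rather than 9, and the "energy" simply counts differences from a hidden reference string in place of a real function of P(structure|sequence). It shows the accept/reject pattern, not ROSETTA's scoring or move set.

```python
import math
import random

NATIVE = "HHHHCCEEEECC"   # hidden toy reference, used only by the toy energy
FRAGMENTS = ["HHH", "EEE", "CCC", "HHC", "HCC", "CCE", "CEE", "EEC", "ECC"]
FRAG_LEN = 3


def energy(conf):
    """Toy energy: number of positions that differ from NATIVE."""
    return sum(a != b for a, b in zip(conf, NATIVE))


def propose(conf, rng):
    """Neighbouring conformation: overwrite one random window with a fragment."""
    start = rng.randrange(len(conf) - FRAG_LEN + 1)
    new = list(conf)
    new[start:start + FRAG_LEN] = rng.choice(FRAGMENTS)
    return new


def anneal(initial, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Return the lowest-energy conformation seen and its energy."""
    rng = random.Random(seed)
    current, e_cur = list(initial), energy(initial)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = propose(current, rng)
        e_new = energy(candidate)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
            current, e_cur = candidate, e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return "".join(best), e_best


start = "E" * len(NATIVE)   # stand-in for the fully extended chain
prediction, score = anneal(start)
print(prediction, score)
```

Repeating the run with different `seed` values would give the set of results that step 4 clusters; the clustering itself is not shown.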


De Novo (physicist): ASTROFOLD (Floudas et al.)

1. Predict α-helices and β-strands

2. Predict β-sheets and disulphide bridges using integer linear programming (ILP)

3. Use deterministic global optimisation, with energy function and constraints to predict tertiary structure


Testing of prediction servers - LiveBench

Server          Type        Sensitivity     Specificity     Added Value
                            Easy    Hard    All     Hard    Easy    Hard
Pcons2          Consensus   6       4       2       2       3       3
ShotGun on 5    Consensus   1       2       4       4       7       5
ShotGun on 3    Consensus   2       1       1       1       2       2
Shotgun-INBGU   Threading   3       3       3       3       4       1
INBGU           Threading   7       5       6       9       5       6
Fugue3          Threading   14      8       9       8       15      9
Fugue2          Threading   12      7       8       7       10      8
Fugue1          Threading   17      14      14      11      16      15
mGenTHREADER    Threading   8       11      16      13      6       11
GenTHREADER     Threading   13      12      17      15      8       13
3D-PSSM         Threading   5       10      12      12      12      10
ORFeus          Sequence    4       6       7       6       1       4
FFAS            Sequence    9       9       5       5       9       7
Sam-T99         Sequence    10      15      13      16      11      16
Superfamily     Sequence    15      13      11      10      17      12
ORF-BLAST       BLAST       11      16      10      14      14      14
PDB-BLAST       BLAST       16      17      15      17      13      17
BLAST           BLAST       18      18      18      18      18      18


Review - comparative modelling

[Figure: target sequence EKGPDLYLIPLT and close homologues. Labels: Energy; Conserved backbone; Variable backbone; Side chains]


Review - fold recognition

[Figure: target sequence EKGPDLYLIPLT and structurally similar proteins. Label: Energy]


Review - new fold methods

[Figure: target sequence EKGPDLYLIPLT. Labels: Energy; Segment configurations; Sets of local configurations]


Summary: Prediction Methods

• Comparative modelling

– There exists a protein with clear homology

– PSI-BLAST

• Fold recognition

– There exists a protein of similar fold (analogy)

– DALI (CATH & SCOP)

• Novel fold methods

– The sequence has a new fold

• Better methods are still needed for it all to be useful!