Protein Structure Prediction
Graham Wood
Charlotte Deane
The problem - in brief
MVLSEGEWQL
VLHVWAKVEA
DVAGHGQDIL
…
AKYKELCYOG
Databases + Algorithms = Software
Why is protein structure prediction needed?
• Essential functioning of cells is mediated by proteins
• It is protein structure that leads to protein function
• 3D structure determination is expensive, slow and difficult (by X-ray crystallography or NMR)
• Assists in the engineering of new proteins
Terminology
Target
- the unknown structure you are trying to model
Parent
- a known structure which provides a basis for modelling
The problem - more detail
Configuration space
Energy
EKGPDLYLIPLT
Protein databases
EKGPDLYLIPLT
Biologist Physicist
CASP: Critical Assessment of Structure Prediction
Jan-Apr May Jun Jul Aug Sept Oct Nov Dec
Biologists
Caspers
Organisers
Call for structures
Publish seqs on web
Give sequences to organisers
Structure determination
Give structures to organisers
Predict structure from sequence
Expert assessment
4 day mtg
Degree of evolutionary conservation
Less conserved: information poor
More conserved: information rich
DNA seq
Protein Seq Structure
Function
ACAGTTACACCGGCTATGTACTATACTTTG
HDSFKLPVMSKFDWEMFKPCGKFLDSGKLG
Three main approaches (in order of current success)
1. Comparative modelling
2. Fold recognition
3. De novo
Comparative modelling
Conserved backbone
Energy
EKGPDLYLIPLT
Target
Close homologues
Variable backbone
Side chains
Comparative modelling (protein building)
1. Prepare the raw materials
2. Build the model (two methods)
3. Check the model
4. Accept or reject the model
C1: Preparing the raw materials
Structurally align parents
Align target to parents
Given target AA sequence: EKGPDLYLIPLT
Identify parents (homologues)
loop region
secondary structure region
Structurally conserved regions and structurally variable regions
SCR
SVR
C2: Building (choice of two methods)
Attach and orient side-chains
Refine model
Determine SCRs and build associated backbone
Determine SVRs and build rest of backbone
Assemble fragments Use spatial restraints
C2: Building (choice of two methods)
Orient side-chains
Refine model
Determine SCRs and build associated backbone
Determine SVRs and build rest of backbone
Assemble fragments Use spatial restraints
Optimally satisfy spatial restraints
Extrapolation
D T N V A Y C N K D
C3: Test model (C4: then accept or reject)
• Examine the model in the light of all experimental data
• PROCHECK, VERIFY3D, PROSA II, Visual inspection using 3D software, JOY
Problems in comparative modelling
• Aligning the target to the parents
• The packing of secondary structure elements in the core
• The long insertions and deletions in the structurally variable regions
Fold Recognition
?Target
Fold recognition
Energy
EKGPDLYLIPLT
Target Structurally similar proteins
Fold recognition (protein finding)
1. Obtain library of non-duplicate folds
2. Perform sequence-structure alignment
3. Assess success of alignment
• Biologist – use substitution matrix
• Physicist – use potentials
4. Accept or reject the model
Sequence-structure alignment
1. Construct sequence profile
2. Use profile to score the sequence
Target Parent
BLASTP
OWL MULTAL
Dynamic programming algorithm
Score
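The dynamic programming step can be sketched as a global (Needleman-Wunsch style) alignment. This is a minimal illustration, not the method of any particular server: the function names, the linear gap penalty of -2 and the identity scoring function are assumptions. In fold recognition the `score` callable would be replaced by a profile-based or environment-specific substitution score.

```python
def global_alignment_score(target, parent, score, gap=-2):
    """Best global alignment score of two sequences (Needleman-Wunsch).

    `score(a, b)` returns the score for aligning residue a with residue b;
    `gap` is an assumed linear penalty per gapped position.
    """
    n, m = len(target), len(parent)
    # dp[i][j] = best score for aligning target[:i] with parent[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(target[i - 1], parent[j - 1]),  # match or mismatch
                dp[i - 1][j] + gap,  # gap in parent
                dp[i][j - 1] + gap,  # gap in target
            )
    return dp[n][m]


def identity_score(a, b):
    """Toy scoring function (assumption): +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


print(global_alignment_score("EKGPD", "EKPD", identity_score))
```

Passing the scoring function as an argument keeps the recursion unchanged when the toy identity score is swapped for a substitution table or a potential.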
Amino acid substitutions are constrained by local environments
Different substitution patterns
Environment-specific substitution tables
• Main-chain conformation and secondary structure (α-helix, β-strand, coil and positive φ)
• Solvent accessibility (accessible and inaccessible)
• Hydrogen bonds (side-chain to main-chain NH, side-chain to main-chain CO and side-chain to side-chain)
Definition of local environments
Substitution scores
$$P(b \mid a, E) = \frac{f^{E}_{ab}}{\sum_{c} f^{E}_{ac}}$$
Probability that amino acid a in environment E is replaced by amino acid b
$$S^{E}_{ab} = \operatorname{round}\left(\log\frac{P(b \mid a, E)}{P_b}\right)$$
Log-odds score, rounded to the nearest integer
$f^{E}_{ab}$: frequency of observing amino acid a in environment E replaced by b
$P_b$: background probability of observing amino acid b (a match occurring by chance)
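As a rough sketch of how such a table could be computed, the code below turns substitution counts for a single environment into rounded log-odds scores: the probability of a being replaced by b is the count for (a, b) divided by the row total, and the score is the rounded log of that probability over the background probability of b. The counts, the background values, the function name and the choice of natural logarithm are all illustrative assumptions, not data from the slides.

```python
import math


def substitution_scores(counts, background):
    """Rounded log-odds scores for one environment E.

    counts[a][b]  -- how often residue a (in environment E) is replaced by b
    background[b] -- background probability of observing residue b
    """
    scores = {}
    for a, row in counts.items():
        total = sum(row.values())
        for b, f_ab in row.items():
            if total == 0 or f_ab == 0:
                continue  # no observations: leave the score undefined
            p_b_given_a = f_ab / total
            # natural log is an assumption; the slide does not give the base
            scores[(a, b)] = round(math.log(p_b_given_a / background[b]))
    return scores


# Hypothetical counts for alanine in one environment
toy_counts = {"A": {"A": 80, "G": 20}}
toy_background = {"A": 0.1, "G": 0.1}
print(substitution_scores(toy_counts, toy_background))
```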
Scoring with potentials
Energy potential
$$E^{ab}_{k}(s) = RT \log\left(1 + m_{ab}\right) - RT \log\left(1 + m_{ab}\,\frac{f^{ab}_{k}(s)}{f_{k}(s)}\right)$$
Solvation potential
$$E_{a}(r) = -RT \log\frac{f_{a}(r)}{f(r)}$$
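A minimal numerical sketch of knowledge-based potentials of this kind, assuming the usual inverse-Boltzmann forms: a solvation term equal to -RT log of the ratio between a residue-specific frequency and the overall frequency, and a pair term RT log(1 + m) - RT log(1 + m × ratio). The function names, the reduced units (RT = 1) and the example frequencies are assumptions made for illustration.

```python
import math


def solvation_potential(f_a_r, f_r, rt=1.0):
    """-RT log(f_a(r) / f(r)): negative when residue type a is
    over-represented at burial level r relative to all residues."""
    return -rt * math.log(f_a_r / f_r)


def pair_potential(f_ab_k_s, f_k_s, m_ab, rt=1.0):
    """RT log(1 + m_ab) - RT log(1 + m_ab * f_ab_k(s) / f_k(s)).

    m_ab is the number of observations for the residue pair; with few
    observations the potential stays close to zero.
    """
    ratio = f_ab_k_s / f_k_s
    return rt * math.log(1 + m_ab) - rt * math.log(1 + m_ab * ratio)


# Hypothetical frequencies: the pair is seen twice as often as average
print(pair_potential(0.2, 0.1, m_ab=10))
print(solvation_potential(0.2, 0.1))
```

Both terms are zero when the specific frequency equals the reference frequency, and become favourable (negative) when the specific frequency is higher.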
The Novel Fold Problem
?
asdghklprtwecvm
nasetyasdghklprtwecvm
nasety
De novo – new fold methods
Energy
EKGPDLYLIPLT
Segment configurations Sets of local configurations
Defining a “New Fold”
• CATH
– Somewhat objective
• SCOP
– No objective definition
– Tends towards evolutionary relationships
• Ask A. Murzin
New fold approach
• All structure information is in the AA sequence (Anfinsen, Science, 1973)
• Seek “lowest free energy conformation”
• Tactic is to simplify the problem, for example
– Simplified model of protein (one atom per residue)
– Simple or knowledge-based potential function
• Assist in detecting distant homologues
New fold recognition (structure discovery)
1. Set up domain and objective function
2. Perform optimisation
3. Check the model
4. Accept or reject the model
De novo (biologist): ROSETTA (Baker et al.)
Domain of objective function
[Figure: the sequence is divided into 9-residue segments; each segment has a set of local structures consistent with the local sequence]
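De novo (biologist): ROSETTA
Objective function to be maximised
$$P(\text{structure} \mid \text{sequence}) = \frac{P(\text{structure})\,P(\text{sequence} \mid \text{structure})}{P(\text{sequence})}$$
Constant: $P(\text{sequence})$
Function of energy
$$\prod_{i} P(AA_i \mid E_i)$$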
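One way to read the per-residue term is as a product over positions of P(AA_i | E_i), which is more conveniently evaluated as a sum of logarithms. The sketch below assumes that factorisation; the two environment labels and the probability table are made up for illustration and are not taken from ROSETTA.

```python
import math


def log_p_sequence_given_structure(sequence, environments, p_aa_given_env):
    """Sum over residues of log P(AA_i | E_i).

    sequence       -- amino acids, one letter per residue
    environments   -- environment label E_i of each residue in the candidate structure
    p_aa_given_env -- p_aa_given_env[E][AA] = P(AA | E)
    """
    return sum(
        math.log(p_aa_given_env[env][aa])
        for aa, env in zip(sequence, environments)
    )


# Hypothetical table: leucine prefers buried sites, lysine exposed ones
toy_table = {
    "buried": {"L": 0.5, "K": 0.1},
    "exposed": {"L": 0.1, "K": 0.5},
}
good = log_p_sequence_given_structure("LK", ["buried", "exposed"], toy_table)
bad = log_p_sequence_given_structure("LK", ["exposed", "buried"], toy_table)
print(good > bad)
```

Because P(sequence) is the same for every candidate structure of a given target, comparing candidates only requires the numerator of the Bayes expression.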
De novo (biologist): ROSETTA
Maximising the probability of the sequence
1. Choose each local conformation and start with a fully extended chain
2. Generate a neighbouring conformation
3. Accept in simulated annealing style, using P(structure|sequence)
4. Do this many times and cluster results – use centre of largest cluster as prediction
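The four steps above can be sketched on a toy problem. This is not ROSETTA: a conformation is reduced to one integer "fragment choice" per position, the energy function is a made-up stand-in for -log P(structure|sequence), the cooling schedule is an assumed geometric one, and clustering is simplified to picking the most frequent final result over several runs.

```python
import math
import random
from collections import Counter


def anneal(energy, n_positions, n_choices, steps=2000,
           t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over discrete per-position choices."""
    rng = random.Random(seed)
    state = [0] * n_positions  # stand-in for the fully extended chain
    e = energy(state)
    best, best_e = list(state), e
    for step in range(steps):
        # geometric cooling from t_start to t_end (assumed schedule)
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        cand = list(state)
        cand[rng.randrange(n_positions)] = rng.randrange(n_choices)  # neighbour
        cand_e = energy(cand)
        # Metropolis criterion: always accept improvements,
        # accept worse moves with probability exp(-dE / t)
        if cand_e <= e or rng.random() < math.exp(-(cand_e - e) / t):
            state, e = cand, cand_e
            if e < best_e:
                best, best_e = list(state), e
    return best, best_e


def predict(energy, n_positions, n_choices, runs=10):
    """Repeat annealing and return the most frequent result
    (a crude stand-in for 'centre of the largest cluster')."""
    results = [tuple(anneal(energy, n_positions, n_choices, seed=s)[0])
               for s in range(runs)]
    return Counter(results).most_common(1)[0][0]


TARGET = [2, 1, 3, 0, 2, 1]  # hypothetical "native" choices


def toy_energy(state):
    """Number of positions that differ from the hypothetical native state."""
    return sum(1 for s, t in zip(state, TARGET) if s != t)


prediction = predict(toy_energy, n_positions=6, n_choices=4)
print(prediction, toy_energy(prediction))
```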
De novo (physicist): ASTROFOLD (Floudas et al.)
1. Predict α-helices and β-strands
2. Predict β-sheets and disulphide bridges using ILP
3. Use deterministic global optimisation, with energy function and constraints to predict tertiary structure
Testing of prediction servers - LiveBench
                          Sensitivity   Specificity   Added Value
Server         Type       Easy   Hard    All   Hard   Easy   Hard
Pcons2         Consensus     6      4      2      2      3      3
ShotGun on 5   Consensus     1      2      4      4      7      5
ShotGun on 3   Consensus     2      1      1      1      2      2
Shotgun-INBGU  Threading     3      3      3      3      4      1
INBGU          Threading     7      5      6      9      5      6
Fugue3         Threading    14      8      9      8     15      9
Fugue2         Threading    12      7      8      7     10      8
Fugue1         Threading    17     14     14     11     16     15
mGenTHREADER   Threading     8     11     16     13      6     11
GenTHREADER    Threading    13     12     17     15      8     13
3D-PSSM        Threading     5     10     12     12     12     10
ORFeus         Sequence      4      6      7      6      1      4
FFAS           Sequence      9      9      5      5      9      7
Sam-T99        Sequence     10     15     13     16     11     16
Superfamily    Sequence     15     13     11     10     17     12
ORF-BLAST      BLAST        11     16     10     14     14     14
PDB-BLAST      BLAST        16     17     15     17     13     17
BLAST          BLAST        18     18     18     18     18     18
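The table can be summarised with a few lines of code. The sketch below assumes the entries are ranks (1 = best; each column is a permutation of 1 to 18) and uses an unweighted mean over the six columns, which is a crude summary chosen for illustration rather than anything LiveBench itself reports.

```python
# (server, type, sens_easy, sens_hard, spec_all, spec_hard, added_easy, added_hard)
LIVEBENCH = [
    ("Pcons2", "Consensus", 6, 4, 2, 2, 3, 3),
    ("ShotGun on 5", "Consensus", 1, 2, 4, 4, 7, 5),
    ("ShotGun on 3", "Consensus", 2, 1, 1, 1, 2, 2),
    ("Shotgun-INBGU", "Threading", 3, 3, 3, 3, 4, 1),
    ("INBGU", "Threading", 7, 5, 6, 9, 5, 6),
    ("Fugue3", "Threading", 14, 8, 9, 8, 15, 9),
    ("Fugue2", "Threading", 12, 7, 8, 7, 10, 8),
    ("Fugue1", "Threading", 17, 14, 14, 11, 16, 15),
    ("mGenTHREADER", "Threading", 8, 11, 16, 13, 6, 11),
    ("GenTHREADER", "Threading", 13, 12, 17, 15, 8, 13),
    ("3D-PSSM", "Threading", 5, 10, 12, 12, 12, 10),
    ("ORFeus", "Sequence", 4, 6, 7, 6, 1, 4),
    ("FFAS", "Sequence", 9, 9, 5, 5, 9, 7),
    ("Sam-T99", "Sequence", 10, 15, 13, 16, 11, 16),
    ("Superfamily", "Sequence", 15, 13, 11, 10, 17, 12),
    ("ORF-BLAST", "BLAST", 11, 16, 10, 14, 14, 14),
    ("PDB-BLAST", "BLAST", 16, 17, 15, 17, 13, 17),
    ("BLAST", "BLAST", 18, 18, 18, 18, 18, 18),
]


def mean_rank(row):
    """Unweighted mean of the six rank columns for one server."""
    ranks = row[2:]
    return sum(ranks) / len(ranks)


def mean_rank_by_type(rows):
    """Average of the per-server mean ranks within each method type."""
    by_type = {}
    for row in rows:
        by_type.setdefault(row[1], []).append(mean_rank(row))
    return {t: sum(v) / len(v) for t, v in by_type.items()}


ranked = sorted(LIVEBENCH, key=mean_rank)
by_type = mean_rank_by_type(LIVEBENCH)
print(ranked[0][0], ranked[-1][0])
print(min(by_type, key=by_type.get))
```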
Review - comparative modelling
Conserved backbone
Energy
EKGPDLYLIPLT
Target
Close homologues
Variable backbone
Side chains
Review - fold recognition
Energy
EKGPDLYLIPLT
Target Structurally similar proteins
Review - new fold methods
Energy
EKGPDLYLIPLT
Segment configurations Sets of local configurations
Summary: Prediction Methods
• Comparative modelling
– There exists a protein with clear homology
– PSI-BLAST
• Fold recognition
– There exists a protein of similar fold (analogy)
– DALI (CATH & SCOP)
• Novel fold methods
– The sequence has a new fold
• Better methods are still needed for all of this to be useful!