Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins
Zhong Chen and Ying Xu
Department of Biochemistry and Molecular Biology and
Institute of Bioinformatics
University of Georgia
Outline
1. Background information
2. Statistical analysis of known membrane protein structures
3. Structure prediction at residue level
4. Helix packing at atomistic level
5. Linking predictions at residue and atomistic levels
Membrane Proteins
Roles in biological processes:
• Receptors;
• Channels, gates and pumps;
• Electric/chemical potential;
• Energy transduction
> 50% new drug targets are membrane proteins (MP).
[Figure: examples of a beta structure and a helical structure]
Membrane Proteins
20-30% of the genes in a genome encode MPs, yet < 1% of the structures in the Protein Data Bank (PDB) are MPs, owing to difficulties in experimental structure determination.
Membrane Proteins
Prediction for transmembrane (TM) segments (α-helix or β-sheet) based on sequence alone is very accurate (up to 95%);
Open question: prediction of the tertiary structure of the TM segments. How do these α-helices/β-sheets arrange themselves within the constraints of the bi-lipid layer?
Helical structures are relatively easier to solve computationally
Membrane Protein Structures
Difficult to solve experimentally
Computational techniques could possibly play a significant role in solving MP structures, particularly helical structures
Statistical analysis of known structures:
• Unveil the underlying principles for MP structure and stability;
• Develop knowledge-based propensity scale and energy functions.
Structure prediction at residue level
Structure prediction at atomistic level: MC, MD
multi-scale, hierarchical computational framework
High Level Plan
Database for Known MP Structures: Helical Bundles
Redundant database
• 50 pdb files
• 135 protein chains
Non-redundant database (identity < 30%)
• 39 pdb files
• 95 protein chains (avg. length ~220 AA)
Statistics-based energy functions
Length of bi-lipid layer: ~60 Å
• Central region (30 Å)
• Terminal regions (on either side of the central region)
Three energy terms:
• Lipid-facing potential
• Residue-depth potential
• Inter-helical interaction potential
Lipid-facing Propensity Scale
Residue Termini Central
ILE 0.84 1.33
VAL 0.71 1.30
LEU 0.89 1.30
PHE 1.03 1.38
CYS 0.37 0.67
MET 0.57 0.80
ALA 0.69 0.79
GLY 0.84 0.44
THR 0.79 0.61
SER 1.04 0.51
TRP 1.11 1.89
TYR 0.73 1.04
PRO 1.01 0.60
HIS 1.27 1.61
ASP 1.56 1.08
GLU 2.10 0.93
ASN 1.02 0.71
GLN 1.44 0.71
LYS 2.59 1.97
ARG 1.42 1.16
$$\mathrm{LF\_scale}(AA) = \frac{\text{fraction of AA that are lipid-facing}}{\text{fraction of AA that are in the interior}}$$
The most hydrophobic residues (ILE, VAL, LEU) prefer the surface of MPs in the central region, but prefer interior positions in the terminal regions;
Small residues (GLY, ALA, CYS, THR) tend to be buried in the helix bundle;
Bulky residues (LYS, ARG, TRP, HIS) are likely to be found on the surface.
This propensity scale reflects both hydrophobic interactions and helix packing
Helical Wheel and Moment Analysis
Lipid facing vector prediction: state of the art
kPROT: avg. error ~41º
Samatey Scale: 61º
Hydrophobicity scales: 65 ~68º
[Figure: helical-wheel plot with lipid-facing vectors; X (Angstrom) versus Y (Angstrom), both axes from -30 to 30]
* Average prediction error: 41 degrees
The magnitude of each thin vector is proportional to the LF-propensity, and the overall lipid-facing vector is the sum of all thin vectors.
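The vector sum described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the central-region column of the propensity table, places residue i on the helical wheel at i × 100 degrees (an ideal α-helix, which the slides do not specify), and the function names are hypothetical.

```python
import math

# Central-region lipid-facing propensities, copied from the table above.
LF_CENTRAL = {
    "ILE": 1.33, "VAL": 1.30, "LEU": 1.30, "PHE": 1.38, "CYS": 0.67,
    "MET": 0.80, "ALA": 0.79, "GLY": 0.44, "THR": 0.61, "SER": 0.51,
    "TRP": 1.89, "TYR": 1.04, "PRO": 0.60, "HIS": 1.61, "ASP": 1.08,
    "GLU": 0.93, "ASN": 0.71, "GLN": 0.71, "LYS": 1.97, "ARG": 1.16,
}

# Assumption: ideal alpha-helix, 100 degrees of rotation per residue.
DEGREES_PER_RESIDUE = 100.0


def lipid_facing_angle(residues, scale=LF_CENTRAL):
    """Direction (degrees, 0 to 360) of the sum of all thin vectors."""
    x = y = 0.0
    for i, name in enumerate(residues):
        phi = math.radians(i * DEGREES_PER_RESIDUE)
        # Each thin vector points at the residue's wheel position and
        # has a length equal to its lipid-facing propensity.
        x += scale[name] * math.cos(phi)
        y += scale[name] * math.sin(phi)
    return math.degrees(math.atan2(y, x)) % 360.0


def angular_error(predicted, reference):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(predicted - reference) % 360.0
    return min(diff, 360.0 - diff)
```

The prediction error quoted on the slide would correspond to `angular_error` between the predicted angle and the lipid-facing direction measured in the known structure.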
Residue-Depth Potential
- hydrophobic residues tend to be located in the hydrocarbon core;
- hydrophilic residues tend to be closer to terminal regions;
- aromatic residues prefer the interface region.
$$V_{lp}(a,z) = -\ln\frac{N_{obs}(a,z)}{N_{exp}(a,z)}$$
[Figure: V_lp (kcal/mol, -0.6 to 1.0) versus z (Å, 0 to 40) for LEU, TRP and GLU]
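A residue-depth potential of this kind can be sketched from observed and expected counts. The slides do not say how the expected count is obtained, so this sketch assumes it is the count that would arise if residue type and depth were independent. The bin width, the pseudocount, the inverse-Boltzmann sign convention (frequent placements get negative energies) and the function name are also assumptions.

```python
import math
from collections import Counter


def depth_potential(observations, bin_width=5.0, pseudocount=1.0):
    """Knowledge-based potential from (residue_name, depth_in_angstrom) pairs.

    Returns a dict mapping (residue_name, depth_bin) to -ln(N_obs / N_exp).
    """
    pair_counts = Counter()
    residue_counts = Counter()
    bin_counts = Counter()
    for name, z in observations:
        depth_bin = int(abs(z) // bin_width)
        pair_counts[(name, depth_bin)] += 1
        residue_counts[name] += 1
        bin_counts[depth_bin] += 1

    total = sum(residue_counts.values())
    potential = {}
    for name in residue_counts:
        for depth_bin in bin_counts:
            # Pseudocounts keep the logarithm finite for unseen combinations.
            n_obs = pair_counts[(name, depth_bin)] + pseudocount
            # Assumed reference state: residue type independent of depth.
            n_exp = (residue_counts[name] * bin_counts[depth_bin] / total
                     + pseudocount)
            potential[(name, depth_bin)] = -math.log(n_obs / n_exp)
    return potential
```

With real data, `observations` would be collected from the non-redundant structure database, and the resulting values would be dimensionless unless scaled by an energy unit.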
TM Helix Tilt Angle Prediction
major pVIII coat protein of the filamentous fd bacteriophage (1MZT)
[Figure: probability density / sin(θ) versus tilt angle θ (0 to 90 degrees) for σ_lp = 1.0, 5.0 and 10.0 kcal/mol]
Experimental value: 26º
GEM value: 23º
Inter-Helical Pair-wise Potential
$$V_{pw}(i,j,r) = -\ln\frac{N_{obs}(i,j,r)}{N_{exp}(i,j,r)}$$
$$N_{exp}(i,j,r) = \left(\frac{r}{r_{cutoff}}\right) N_{obs}(i,j,r_{cutoff}), \qquad r_{cutoff} = 15.0\ \text{Å}$$
[Figure: energy (-2.00 to 6.00) versus distance (1 to 15 angstrom) for ILE-VAL, GLY-GLY and ARG-ARG]
Statistical energy potentials (summary)
1. Three residue-based statistical potentials were derived from the database: (a) lipid-facing propensity, (b) residue-depth potential, (c) inter-helical pair-wise potential;
2. The lipid-facing scale predicted the lipid-facing direction for a single helix with an uncertainty of ~ ±40º;
3. The residue-depth potential was able to predict the tilt angle for a single helix with high accuracy;
4. More data are needed to make the inter-helical pair-wise potential more reliable.
Key Prediction Steps
• Structure prediction through optimizing our statistical potential (weighted sum);
• Idealized and rigid helical backbone configurations;
• Monte Carlo moves: translations, rotations, rotation about the helix axis;
• Wang-Landau sampling technique for MC simulation;
• Principal component analysis.
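One of the rigid-body moves listed above, rotation about the helix axis, can be sketched with Rodrigues' rotation formula. The slides do not describe how the moves are implemented, so the function names and the representation of the helix as a list of 3D points are illustrative assumptions.

```python
import math


def rotate_about_axis(point, axis_point, axis_dir, angle_deg):
    """Rotate `point` by `angle_deg` about the line that passes through
    `axis_point` with direction `axis_dir` (Rodrigues' formula)."""
    norm = math.sqrt(sum(c * c for c in axis_dir))
    k = [c / norm for c in axis_dir]                 # unit axis vector
    v = [p - a for p, a in zip(point, axis_point)]   # point relative to axis
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(kc * vc for kc, vc in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return [r + a for r, a in zip(rotated, axis_point)]


def rotate_helix(coords, axis_point, axis_dir, angle_deg):
    """Apply the same rotation to every point of a rigid helix."""
    return [rotate_about_axis(p, axis_point, axis_dir, angle_deg)
            for p in coords]
```

Because every point receives the same rotation, distances within the helix are preserved, which matches the rigid-backbone assumption on the slide.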
Wang-Landau Method for MC
Observation: if a random walk is performed with probability proportional to the reciprocal of the density of states, then a flat energy histogram is obtained:
$$p(E) \propto 1/g(E)$$
The density of states is not known a priori.
In Wang-Landau, g(E) is initially set to 1 and modified "on the fly". Monte Carlo moves are accepted with probability
$$p(E_1 \to E_2) = \min\left(\frac{g(E_1)}{g(E_2)},\ 1\right)$$
Each time an energy level E is visited, its density of states is updated by a modification factor f > 1, i.e.,
$$g(E) \to g(E) \cdot f$$
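The acceptance rule and the on-the-fly update can be tried on a toy system whose density of states is known exactly. The sketch below is not the helix-packing simulation: it assumes a system of independent two-state sites whose energy is the number of "up" sites, so that g(E) is a binomial coefficient. The flatness criterion and the schedule f → √f are the usual choices from the Wang-Landau literature, not something stated on the slides.

```python
import math
import random


def wang_landau(n_sites=8, ln_f_final=1e-4, flatness=0.8, seed=0):
    """Estimate ln g(E) for a toy system whose energy E is the number of
    'up' sites among n_sites two-state sites (exact g(E) = C(n_sites, E)).

    Returns ln g(E) - ln g(0) for E = 0 .. n_sites.
    """
    rng = random.Random(seed)
    state = [0] * n_sites
    energy = 0
    ln_g = [0.0] * (n_sites + 1)   # g(E) = 1 initially, so ln g(E) = 0
    ln_f = 1.0                     # modification factor f = e > 1
    while ln_f > ln_f_final:
        hist = [0] * (n_sites + 1)
        flat = False
        while not flat:
            for _ in range(1000):
                i = rng.randrange(n_sites)
                new_energy = energy + (1 - 2 * state[i])
                # Accept with probability min(g(E1) / g(E2), 1).
                log_ratio = ln_g[energy] - ln_g[new_energy]
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    state[i] ^= 1
                    energy = new_energy
                ln_g[energy] += ln_f   # g(E) -> g(E) * f
                hist[energy] += 1
            flat = min(hist) > flatness * sum(hist) / len(hist)
        ln_f /= 2.0                    # f -> sqrt(f)
    return [value - ln_g[0] for value in ln_g]
```

Working with ln g(E) instead of g(E) avoids overflow, since g(E) grows by a factor f at every visit.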
Wang-Landau Method for MC
Advantages:
1. simple formulation and general applicability;
2. Entropy and free energy information derivable from g(E);
3. Each energy state is visited with equal probability, so energy barriers are overcome with relative ease.
Principal Component Analysis
Purpose:
- analyze the conformation variations during a simulation, and
- identify the most important conformational degrees of freedom.
Covariance matrix:
* A large part of the system’s fluctuations can be described in terms of only a few PCA eigenvectors.
[Figure: eigenvalue percentage (0 to 80) versus eigenvector index (0 to 13)]
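A minimal sketch of the analysis, assuming the usual covariance definition C_ij = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩ over simulation snapshots. The slides do not show how the eigenvectors are computed; power iteration is used here only to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def covariance_matrix(frames):
    """frames: list of equal-length coordinate vectors, one per snapshot."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        d = [f[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / n
    return cov


def leading_component(cov, iterations=500):
    """Largest eigenvalue and its eigenvector, by power iteration.

    The fixed start vector is a simplification: if it happens to be
    orthogonal to the leading eigenvector, use a random start instead.
    """
    dim = len(cov)
    vec = [float(i + 1) for i in range(dim)]
    norm = math.sqrt(sum(c * c for c in vec))
    vec = [c / norm for c in vec]
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(c * c for c in new))
        if norm == 0.0:
            break
        vec = [c / norm for c in new]
    eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                     for i in range(dim) for j in range(dim))
    return eigenvalue, vec


def explained_fraction(cov, eigenvalue):
    """Share of the total variance (matrix trace) carried by one eigenvalue."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

The eigenvalue percentage plotted on the slide corresponds to `explained_fraction` times 100 for each eigenvector.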
A Model System: Glycophorin (GpA) Dimer
22 residues, 189 atoms
EITLIIFGVMAGVIGTILLISY
•GxxxG motif
•Ridges-into-grooves
Glycophorin (GpA) Dimer (1AFO)
A: GEM (global energy minimum): RMSD = 3.6 Å, E = -114.6 kcal/mol
B: LEM (local energy minimum): RMSD = 0.8 Å, E = -93.9 kcal/mol
RED: experiment
GREY: simulation
[Figure: energy (kcal/mol, -120 to 0) versus RMSD (Å, 0 to 15), with points A and B marked]
Helices A and B of Bacteriorhodopsin (1QHJ)
A: GEM: RMSD = 2.7 Å, E = -94 kcal/mol
B: LEM: RMSD = 0.9 Å, E = -86 kcal/mol
RED: experiment
GREY: simulation
[Figure: energy (kcal/mol, -90 to 0) versus RMSD (Å, 0 to 20), with points A and B marked]
Bacteriorhodopsin (1QHJ)
RMSD = 5.0 Å
[Figure: experimental structure and computational prediction, with helices labeled A to G]
[Figure: energy (kcal/mol, -600 to 100) versus RMSD (Å, 0 to 20)]
Residue-level structure prediction (Summary)
1. A computational scheme was established for TM helix structure prediction at residue level;
2. For two-helix systems, LEM structures very close to native structures (RMSD < 1.0 Å) were consistently predicted;
3. For a seven-helix bundle, a packing topology within 5.0 Å of the crystal structure was identified as one of the LEMs.
Key Prediction Steps
• Structure prediction through optimizing an atom-level energy potential:
  - CHARMM19 force field for helix-helix interaction;
  - Knowledge-based energy function for lipid-helix interaction;
• Idealized, rigid helix backbone with flexible side chains;
• Apply helix orientation constraint (i.e., N-terminus inside/outside the cell);
• MC moves: translations, rotations, rotation about the helix axis, and side-chain torsional rotation;
• Wang-Landau algorithm for MC simulation.
CHARMM19 Polar Hydrogen Force Field
- nonpolar hydrogen atoms are combined with the heavy atoms they are bound to;
- polar hydrogen atoms are modeled explicitly.
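$$V = V_{vdw} + V_{es} + V_{lp}$$
$$V_{vdw} = \sum_{i<j} \varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - 2\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$$
$$V_{es} = \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0 D\, r_{ij}}$$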
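The van der Waals and electrostatic terms can be sketched as below. This is a simplified illustration, not the CHARMM19 implementation: it assumes energies in kcal/mol, distances in Å and charges in elementary charge units (hence the Coulomb constant 332.06), a single ε and σ for all atom pairs instead of per-pair parameters, and a constant dielectric D. With the factor 2 on the r⁻⁶ term, σ is the distance at which the pair energy reaches its minimum of −ε.

```python
import math

# Assumed unit system: 1 / (4 * pi * eps0) in kcal * Angstrom / (mol * e^2).
COULOMB_CONSTANT = 332.06


def vdw_energy(r, epsilon, sigma):
    """epsilon * [(sigma / r)^12 - 2 * (sigma / r)^6]; minimum -epsilon at r = sigma."""
    ratio6 = (sigma / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def electrostatic_energy(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy of two point charges at distance r."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


def interhelical_energy(atoms_a, atoms_b, epsilon, sigma, dielectric=1.0):
    """Sum of both terms over all atom pairs between two helices.

    Each atom is a tuple (x, y, z, charge). Using one epsilon and one
    sigma for every pair is a simplification made for this sketch.
    """
    total = 0.0
    for xa, ya, za, qa in atoms_a:
        for xb, yb, zb, qb in atoms_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            total += vdw_energy(r, epsilon, sigma)
            total += electrostatic_energy(r, qa, qb, dielectric)
    return total
```

Only pairs between the two helices are summed, which fits the rigid-backbone setting where intra-helix backbone distances do not change.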
2D Wang-Landau Sampling in PC1 and E Spaces
[Figure: P(PC1) on a log scale (1.0E-08 to 1.0E+00) versus PC1 (Å, -14 to 10) at 150 K and 300 K, with LEM1 and LEM2 marked]
$$P(PC1) \propto \sum_{E} g(PC1, E)\, \exp(-E/kT)$$
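Once a two-dimensional density of states g(PC1, E) is available, the distribution along PC1 at any temperature follows from a Boltzmann-weighted sum over energy. The sketch below assumes g is stored as ln g on a grid of PC1 and energy bins, with energies in kcal/mol. The function name and the data layout are hypothetical.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal / (mol * K)


def pc1_distribution(ln_g, energies, temperature):
    """Normalized P(PC1) from ln g(PC1, E).

    ln_g[i][j] is ln g for PC1 bin i and energy bin j; empty bins can be
    marked with float("-inf"). energies[j] is the energy of bin j.
    """
    beta = 1.0 / (K_B * temperature)
    log_terms = [
        [row[j] - beta * energies[j] for j in range(len(energies))]
        for row in ln_g
    ]
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(max(row) for row in log_terms)
    weights = [sum(math.exp(t - shift) for t in row) for row in log_terms]
    total = sum(weights)
    return [w / total for w in weights]
```

Because g(PC1, E) does not depend on temperature, a single Wang-Landau run is enough to produce the curves at several temperatures.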
Effect of Helix-Lipid Interactions: Helices A&B of Bacteriorhodopsin
[Figure, two panels: probability (0 to 0.7) versus RMSD (Å, 0 to 10) at 150 K, 306 K and 524 K; one panel with helix-helix interactions only, the other with helix-helix & helix-lipid interactions]
Helix-lipid interactions play a critical role in the correct packing of helices
Effect of Helix-Lipid Interactions: Helix A&B of Bacteriorhodopsin (BR)
Four LEM structures: RMSD = 4.4 Å, 0.2 Å, 5.7 Å and 7.1 Å
Hydrocarbon core region: 30 Å
All four LEM structures share essentially the same contact surfaces. In the native structure, the polar N-termini of both helices are located outside the hydrocarbon core region, resulting in low helix-lipid energy.
Docking of a Seven-helix Bundle: Bacteriorhodopsin (1QHJ)
7 helices, 174 residues, 1619 atoms
• CHARMM19 + lipid-helix potential;
• One month CPU time on one PC
[Figure: initial configuration and crystal structure, with helices A and B labeled]
Atom-level Structure Prediction (Summary)
1. The Wang-Landau algorithm proved to be effective for the energetics study of TM helix packing;
2. Prediction results for two-helix and seven-helix structures are highly promising;
3. Practical application of the Wang-Landau method to large systems requires further work.
Correspondence between simulations at two levels
[Figure: interhelical VDW energy (-7 to 0) versus residue number (0 to 20), comparing CHARMM19 and knowledge-based potentials]
A multi-scale hierarchical modeling approach is feasible and practical:
• LEMs identified at the residue level can be used as candidates for atomistic simulation;
• PC vectors from residue-level simulation can be used to improve search speed in atomistic simulation.
Future Work
1. Further improvement of the residue-based folding potentials;
2. Speed-up and parallelization of Wang-Landau sampling;
3. Construct a hierarchical computational framework, and develop corresponding software package.