Upload
frederick-lindsey
View
230
Download
0
Embed Size (px)
Citation preview
Prof Shoba RanganathanProf Shoba Ranganathan
Dept. of Chemistry and Biomolecular Sciences, Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia &Macquarie University, Sydney, Australia &
Dept of Biochemistry, Yong Loo Lin School of MedicineDept of Biochemistry, Yong Loo Lin School of MedicineNational University of SingaporeNational University of Singapore
([email protected])([email protected])
Biomolecular Modeling:Biomolecular Modeling: building a 3D protein building a 3D protein
structure from its sequencestructure from its sequence
Why protein structure?Why protein structure? In the factory of the living cell, proteins are the
workers, performing a variety of tasks
Each protein adopts a particular folding pattern that determines its functionThe 3D structure of a protein brings
into close proximity residues that are far apart in the amino acid sequence
How does a protein fold?How does a protein fold? Most newly synthesized proteins fold
without assistance! Ribonuclease A: denatured protein
could refold and recover its activity (C. Anfinsen -1966)“Structure implies function”
The amino acid sequence encodes the protein’s structural information
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model
The basicsThe basics Proteins are linear heteropolymers: one or more
polypeptide chains Repeat units: 20 amino acid residues
Range from a few 10s-1000s Three-dimensional shapes (“folds”)
adopted vary enormously Experimental methods: X-ray
crystallography, electron microscopy and NMR (nuclear magnetic resonance)
The (L-)amino acidThe (L-)amino acid
N
R
C
O
O
C
+
-
Amino
Carboxylate
Side chain = H,CH3,…
Backbone
The peptide bondThe peptide bond
Coplanar atomsCoplanar atoms
Levels of protein structureLevels of protein structure
Zeroth: amino acid composition Primary
This is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself
Levels of protein structureLevels of protein structure
Secondary Local organization of the protein backbone: -
helix, -strand (which assemble into -sheets), turn and interconnecting loop
Ramachandran / phi-psi plotRamachandran / phi-psi plot
-helix (right
handed)
-sheet
-helix (left handed)
Levels of protein structureLevels of protein structure
Tertiary packing of secondary
structure elements into a compact spatial unit
“Fold” or domain – this is the level to which structure prediction is currently possible
Levels of protein structureLevels of protein structure
Quaternary Assembly of homo- or
heteromeric protein chains
Usually the functional unit of a protein, especially for enzymes
Structural classesStructural classes
All- (helical) All- (sheet)
(parallel -sheet)
Structural classesStructural classes
(antiparallel -sheet)
Structural informationStructural information
Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics http://www.rcsb.org/pdb > 45,744 structures of proteins Also contains structures of DNA,
carbohydrates, protein-DNA complexes and numerous small ligand molecules.
The PDB dataThe PDB data
Text files Each entry is identified by a unique 4-
letter code: say 1emg 1emg entry
Header information Atomic coordinates in Å (1 Ångstrom
= 1.0e-10 m)
PDB Header detailsPDB Header details identifies the molecule, any modifications, date of
release of PDB entry
organism, keywords, method Authors, reference, resolution if X-ray structure Sequence, x-reference to sequence databases
HEADER GREENFLUORESCENT PROTEIN 12-NOV-98 1EMG TITLE GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T TITLE 2 SUBSTITUTION, Q80R) COMPND MOL_ID: 1; COMPND 2 MOLECULE: GREEN FLUORESCENT PROTEIN; COMPND 3 CHAIN: A; COMPND 4 ENGINEERED: YES; COMPND 5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R COMPND 6 SUBSTITUTION; COMPND 7 BIOLOGICAL_UNIT: MONOMER
The data itselfThe data itself
ATOM 1 N SER A 2 29.089 9.397 51.904 1.00 81.75 ATOM 2 CA SER A 2 27.883 10.162 52.185 1.00 79.71ATOM 3 C SER A 2 26.659 9.634 51.463 1.00 82.64 ATOM 4 O SER A 2 26.718 8.686 50.686 1.00 81.02 ATOM 5 CB SER A 2 28.039 11.660 51.932 1.00 75.59ATOM 6 OG SER A 2 27.582 12.038 50.639 1.00 43.28-------ATOM 1737 CD1 ILE A 229 39.535 21.584 52.346 1.00 41.62TER 1738 ILE A 229
Coordinates for each heavy (non-hydrogen) atom from the first residue to the last
Any ligands (starting with HETATM) follow the biomacromolecule
O of water molecules (also HETATM) at the end
Structural FamiliesStructural Families SCOP - Structural Classification Of
Proteins http://scop.mrc-lmb.cam.ac.uk/scop
FSSP – Family of Structurally Similar Proteins http://www.ebi.ac.uk/dali/fssp/
CATH – Class, Architecture, Topology, Homology http://www.biochem.ucl.ac.uk/bsm/cath
Structure comparison factsStructure comparison facts
Proteins adopt a limited number of topologies. Homologous sequences show very
similar structures, with strong conservation in secondary structural elements: variations in non-conserved regions.
In the absence of sequence homology, some folds are preferred by vastly different sequences.
Structure comparison factsStructure comparison facts The “active site” (a collection of functionally
critical residues) is remarkably conserved, even when the protein fold is different. Structural models (especially those based
on homology) provide insights into possible function for new proteins.
Implications for protein engineering ligand/drug design, function assignment of genomic data.
Visualizing PDB informationVisualizing PDB information RASMOL: most popular, available for all platforms (Sayle et al, 2005) http://www.bernstein-plus-sons.com/software/rasmol
DeepView Swiss-PDBViewer: from Swiss-Prot (Guex & Peitsch, 1997) http://tw.expasy.org/spdbv/
Chemscape Chime Plug-in: for PC and Mac http://www.mdli.com/products/framework/chemscape
PyMOL: Very good, available for all platforms (DeLano, W.L. The PyMOL Molecular Graphics System, 2002) http://pymol.sourceforge.net
RASMOL views - SH2 domain RASMOL views - SH2 domain
All-atom model Space-filling model
Atom colors: N O C S
C Trace Ribbon
Rainbow coloring: N to C Coloring: by structural units
RASMOL views – 1sha RASMOL views – 1sha
Homologous foldsHomologous folds Hemoglobin and
erythrocruorin: 31% sequence identity
Analogous foldsAnalogous folds Hemoglobin and
phycocyanin: 9% sequence identity
Surface PropertiesSurface PropertiesCro repressor –
DNA complex Basic residues
in blue Acidic residues
in red
Mapping Functional RegionsMapping Functional Regions
Immunoglobulin light chain - dimer
Hydrophobhic residues in magenta
Hydrophilic and charged residues in cyan
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model
Siblings and CousinsSiblings and Cousins Siblings or homologues: sequences with at least
30% sequence identity over an alignment length of at least 125 residues and conservation of function.
Cousins or paralogues: < 30% identity but with conservation of function
Both show structural conservation Homologues located using a database search tool
such as BLAST (free webserver): http://www.ncbi.nlm.nih.gov/BLAST
Paralogues require a more sensitive method such as PSI-BLAST
Multiple Sequence AlignmentMultiple Sequence AlignmentFinding the best way to match the residues ofrelated sequences Identical residues must be lined up The rest should be arranged, based on
observed substitution in protein families chemical similarity charge similarity
Where it is impossible to get the residues to line up, the biological concept of insertion/deletion in invoked: the ‘gap’ in alignments
MSA MethodsMSA Methods CLUSTALW / CLUSTALX (Thompson et al, 1997):
freely available for all platforms and one of the best alignment programs
http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html
MAXHOM (Sander & Schneider, 1991): alignment based on maximum homology; available via the PredictProtein webserver, free for academics
http://cubic.bioc.columbia.edu/predictprotein/
MALIGN (Johnson et al, 1994): freely available UNIX program, based on the structural alignment of protein families
http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html
Alignment ChecksAlignment Checks Conservation of functionally important residues:
e.g. the catalytic triad (Asp-Ser-His) that are essential for serine proteinase activity
Line up of structurally important residues: e.g. cysteines forming disulfide bonds
Overall, maximizing the alignment of “like” residues
Completely conserved residues usually indicate some conserved structural or functional role, especially buried charges
Sequence Motifs & PatternsSequence Motifs & Patterns From the analysis of the alignment of
protein families Conserved sequence features, usually
associated with a specific function PROSITE (Hulo et al, 2006) database for
protein “signature” patterns: http://www.expasy.ch/prosite
Aligned Sequence FamiliesAligned Sequence Families From alignments of homologous
sequences: PRINTS PRODOM:
http://www.toulouse.inra.fr/prodom.html
From Hidden Markov Model based methods: PFAM: http://www.sanger.ac.uk/Pfam
Protein DomainsProtein Domains Most proteins are composed of structural subunits
called domains A domain is a compact unit of protein structure,
usually associated with a function. It is usually a “fold” - in the case of monomeric
soluble proteins. A domain comprises normally only one protein
chain: rare examples involving 2 chains are known.
Domains can be shared between different proteins: like a LEGO block
Protein ArchitecturesProtein Architectures Beads-on-a-string: sequential location: tyrosine-
protein kinase receptor TIE-1 (immunoglobulin, EGF, fibronectin type-3 and protein kinase).
Domain insertions: “plugged-in” - pyruvate kinase (1pyk)
SMART: smart.embl-heidelberg.deSimple Modular Architecture Retrieval Tool
Dissection into DomainsDissection into Domains A sequence, usually > 125 residues should
be routinely checked to see how many domains are present.
Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence
E.g. NP_002917 shows similarity to G-protein regulators:
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model
Structural HomologuesStructural Homologues
BLASTP vs. PDB database or PSI-BLAST: look for 4-character PDB ID E < 0.005
Domain coverage: at least 60% coverage is recommended
Gaps: we don’t want them. Choose between: few gaps and reasonable similarity scores or lots of gaps and high similarity scores?
Small Proteins: Disulfide bondsSmall Proteins: Disulfide bonds BLAST-type methods may not locate
homologues, if Conserved Domain search is not turned on.
Are the Cys residues conserved? Gaps: where are they on the structure?
gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein) four-disulfide core'.
CD-Length = 46 residues, 100.0% aligned Score = 43.9 bits (102), Expect = 1e-06
Q:49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97 D: 1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP46
Metal-binding domainsMetal-binding domains
C2H2 Zinc Finger 2 Cys & 2 His binding to
Zinc Not detected even by CD-
search in BLAST Detected by Pfam &
SMART Sequence Pattern:#-X-C-X(1-5)-C-X3-#-X5-#-
X2-H-X(3-6)-[H/C]
Structure Prediction MethodsStructure Prediction Methods Secondary Structure Prediction: identify local
structural elements such as helices, strands and loops.
> 75% accuracy achievable PredictProtein or PHD
http://cubic.bioc.columbia.edu/pp/ PSIPRED
http://bioinf.cs.ucl.ac.uk/psipred/ SSPro
http://promoter.ics.uci.edu/BRNN-PRED/
Folds from Secondary Folds from Secondary Structure PredictionsStructure Predictions
Assembling SSEs into folds is a combinatorial problem
Current methods depend on available structural data for mapping predictions: FORREST
http://abs.cit.nih.gov/foresst/foresst.html TOPITS from the PHD server
http://cubic.bioc.columbia.edu/pp
Tertiary Structure PredictionTertiary Structure Prediction Fold recognition/Threading: < 20% identity
typically Best results obtained by combining several
database search and knowledge-based tools: 3D-PSSM
http://www.sbg.bio.ic.ac.uk/~3dpssm/ FUGUE
http://www-cryst.bioc.cam.ac.uk/fugue/
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model
One or many templates?One or many templates? Sequence similarity: extract template
sequences and align with query: select the most similar structure
Completeness: Missing data? REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 MET A 1 REMARK 465 THR A 230
REMARK 470 M RES CSSEQI ATOMS REMARK 470 GLU A 5 OE2 REMARK 470 GLU A 6 CG CD OE1 OE2 REMARK 470 GLU A 17 OE1
One or many templates?One or many templates? X-ray or NMR?:
Lowest resolution X-ray structure X-ray and then NMR NMR average over assembly
One or many?: Structure alignment of C atoms If 2 templates are very close, keep only one Keep templates that provide new information
Many templatesMany templates Sequence alignment from structure
comparison of templates (SSA) can be different from a simple sequence alignment (SA).
For model building, 1. align templates structurally
2. extract the corresponding SSA
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to the Template Structure(s)
6. Building the Model
Query - Template AlignmentQuery - Template Alignment >40% identity: any alignment method is OK Below this, checks are essential.
Collect close sequence homologues (about 10) and align to query to get MSA (multiple sequence alignment)
Collect several structural templates (at least 5) and align them using structure comparison methods: extract the SSA (structural sequence alignment)
Align MSA to SSA using profile alignment Extract query and selected template(s) from the
final alignment – QTA.
QTA ChecksQTA Checks Residue conservation checks
Functional regions Patterns/motifs conserved?
Indels Combine gaps separated by few residues
Editing the alignment Move gaps from secondary structures to
loops Within loops, move gaps to loop ends, i.e.
turnaround point of backbone
QTA ChecksQTA Checks Residue conservation checks
Functional regions Patterns/motifs conserved?
Indels Combine gaps separated by few residues
Editing the alignment Move gaps from secondary structures to
loops Within loops, move gaps to loop ends, i.e.
turnaround point of backbone
Visual Inspection of IndelsVisual Inspection of Indels 2-residue
deletion from sequence alignment
End-of-loop 2-residue deletion
1. Understanding Protein Structure
2. A Quick Overview of Sequence Analysis
3. Finding a Structural Homologue
4. Template Selection
5. Aligning the Query Sequence to Template Structure(s)
6. Building the Model
Input for Model BuildingInput for Model Building
Query sequence Template structure
Template sequence Query-template sequence alignment
Methods AvailableMethods Available
1. WHATIF (Vriend G, 1990) : High quality models where template is
available Indels not modelled
Side chain rotamers In silico mutations In silico disulfide bond creation
Methods AvailableMethods Available
2. SWISS-MODEL (Schwede et al, 2003) : Automatic modeling mode with multiple
templates Query + template input High Homology situations DeepView for input file creation
Methods AvailableMethods Available3. MODELLER (Sali & Blundell, 1993) :
High quality models Sequence alignment Structure analysis/alignment Multiple templates Multiple chains Ligand/cofactor present
4. ESyPred3D (uses MODELLER): QTAs from several methods & neural networks http://www.fundp.ac.be/urbm/bioinfo/esypred/
Methods AvailableMethods Available5. ICM (Ruben et al, 1994) :
High quality models Loop modelling
Multiple templates not possible Sequence/Structure alignment/analysis Ab initio peptide modeling Secondary structure prediction
6. Geno3D (Combet et al, 2002) : Automated modelling Distance geometry used for loops http://geno
Methods AvailableMethods Available
7. 3D-JIGSAW (Bates et al, 2001) : Automatic modeling mode Interactive user mode to select templates Multiple templates Multidomain protein modeling
Methods AvailableMethods Available
8. CPH-MODELS (Lund et al, 1997) : Fully automated FASTA search for templates Not validated
Automatic or Manual Mode?Automatic or Manual Mode? Automatic: High homology
Manual Medium/Low homology Template from structure prediction Multiple templates Multiple chains Ligand present
How good is the model?How good is the model?
Structural Quality Analysis
PROCHECK (Laskowski et al, 1993) :
WHATIF (Vriend G, 1990) : ERRAT (Colovos & Yeates, 1993) :
Improving ill-defined regionsImproving ill-defined regions
Iterative model building Rebuild or anneal bad regions Check/edit alignment and rebuild
Molecular dynamics and/or Monte Carlo simulations Compute intensive Input files need to be set up Optional
Molecular Modeling ProtocolMolecular Modeling Protocol Resources required
The query sequence Personal computer with internet
connectivity RASMOL/DeepView for PDB structure
visualization CLUSTALX sequence alignment software Access to a UNIX workstation MODELLER/ICM – UNIX software WHATIF – UNIX/PC software PROCHECK – UNIX or Windows software
MM Protocol – Input FilesMM Protocol – Input Files
Minimum requirement Query sequence Template structure
Template sequence Query-Template alignment
Ex 1. High Homology CaseEx 1. High Homology Case
Human SOX9 WT - homologous to SRY (PDB: 1HRY) - 49% identity
S9WT: ..AGAACAATGG.. highest SOXCORE: ..GCAACAATCT.. least Mutants (campomelic dysplasia):
F12L: No DNA binding H65Y: Minimal binding P70R: altered specificity; no SOXCORE A19V: near WT but normal binding
1. SOX9 Models1. SOX9 Models
WT & P70R models built
C overlay:
WT-SRY ~ 0.72 Å J. Biol. Chem. 274
(1999) 24023
SOX9 & P70Rbased on SRY
Ex 1. SOX9 Ex 1. SOX9 + DNA Models+ DNA Models
Observed disease-linked mutations mapped
Other residues in DNA-binding groove determined
SOX9-WT
SOX9-P70R
SRY
Ex 2. Low Homology SituationEx 2. Low Homology Situation Pigments from reef-building corals: similar to
Pocilloporin fluoresce under UV and visible radiation similar to the Green Fluorescent Protein -
GFP (19.6% identity) contain ‘QYG’ instead of ‘SYG’ in GFP, as
proposed fluorophore
2. Alignment of POC4 & GFP2. Alignment of POC4 & GFP
2. POC4 Model2. POC4 Model Barrel ends open C-ter not included -sheet OK ‘QYG’ fits the site! 26 residues within 5Å
of QYG (only 19 in GFP)
Increased thermal stability
UV protection
Ex 3. Small Disulfide-bonded Ex 3. Small Disulfide-bonded Protein: Complement Factor HProtein: Complement Factor H
20 tandem homologous units = SCRs (short consensus repeat) or “sushi” regions
Each SCR is ~ 60 aa: conserved Y, P, G 2 disulfide bridges:1-3 & 2-4 Linkers of 3-8 aa
Heparin binding SCRs: 7 (high affinity) & 20
Previous SCRs required for activity: minimum constructs are fH67 and fH18-20
C1
C3
C2C4
N-ter
C-ter
HV loop
3. Sequence Alignment of Close 3. Sequence Alignment of Close Functional HomologuesFunctional Homologues
hfH 385 C L R K C Y F P Y L E N G Y N Q N H G R K F V Q G K S I D V A
fHR-3 83 C L R K C Y F P Y L E N G Y N Q N Y G R K F V Q G N S T E V A
bfH 292 C L R Q C I F N Y L E N G H N Q H R E E K Y L Q G E T V R V H
mfH 385 C V R K C V F H Y V E N G D S A Y W E K V Y V Q G Q S L K V Q
Consensus * : * : * * * : * * * . . : : * * : : *
hfH 416 C H P G Y A L P K A - Q T T V T C M E N G W S P T P R C I R 444
fHR-3 114 C H P G Y G L P K V R Q T T V T C T E N G W S P T P R C I R 143
bfH 323 C Y E G Y S L Q N D - Q N T M T C T E S G W S P P P R C I R 351
mfH 416 C Y N G Y S L Q N G - Q D T M T C T E N G W S P P P K C I R 444
Consensus * : * * . * : * * : * * * . * * * * . * : * * *
Site A Site d Site B Site c
3. Templates for fH SCRs 6-73. Templates for fH SCRs 6-7 hfH SCRs 15&16 (fH1516; PDB ID: 1HFH) Vaccinia virus complement control protein
domains 3&4 (vcp34; PDB ID: 1VVC)
Orientations differ considerably Vcp34 28% identical to hfH67 compared to
hfH1516 (25%) !
hfH15 hfH16
vcp3vc
p4
hfH15 hfH16
vcp3
vcp4
3. Query-Templates alignment3. Query-Templates alignment
3. hfH67 model3. hfH67 model
hfH67
hfH1516
Sialic acid
Heparin
disaccharide repeat
3. Locating residues for 3. Locating residues for mutation from modelmutation from model
Arg-404Arg-387
Lys-405 His
-402
Lys-410
Lys-388SCRs 6&7 SCRs 15&16
Pacific Symposium of Biocomputing 2000, 5:155Pacific Symposium of Biocomputing 2000, 5:155
Ex 4: Protein EngineeringEx 4: Protein Engineering Thermolysin-like
protease unstable at high temperatures (> 40 ºC unlike trypsin)
Homology Model built G8 & N60 suited for
disulfide bond Double Mutant
functional at 92.5 ºC
J Mansfeld et al. Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond” J. Biol. Chem. 1997 272: 11152-11156.
Ex 5. Multiple chains: Human Ex 5. Multiple chains: Human Hand, Foot & Mouth Disease Hand, Foot & Mouth Disease Virus capsidVirus capsid 2000 outbreak of
HFMD in Singapore: thousands of children affected – 4 deaths (The Lancet, 2000, 356, 1338)
Major etiological agent: EV71 (enterovirus group)
Neurological complications
5. EV71 genome structure5. EV71 genome structureCapsid Replication
VP4 VP2 VP3 VP1 2A 2B 2C 3A,B 3C 3D
5’ AUG 3’ UAGVPg
PolyA
EV71 specific primersPan-enterovirus primers
95/94% (RNA) homology coxsackievirus A16/B3 Only 1% difference between neurovirulent and
non-neuro virulent isolates Most variations in non-capsid regions Within capsid regions, VP1 shows maximum
variability relative to other Evs Differences in capsid region 1: VP1 & VP2, 2: VP3
5. Picornaviridae5. PicornaviridaeIcosahedralCapsid
5. Template hunt5. Template hunt
BLASTP against PDB sequences VP1: 3 templates
1BEV 38.7% (bovine enterovirus) 1EAH 36.5% (poliovirus type 2 strain Lansing) 1FPN 38.0% (human rhinovirus serotype 2)
VP2: 1BEV 56.7% VP3: 1BEV 54.9% VP4: 1BEV* 50.0%
5. Fixing the VP1 Alignment5. Fixing the VP1 Alignment
Structural alignment of templates: using VAST (Gibrat, Madej, & Bryant, 1996)
Extract corresponding sequence alignment
Match HFMDV VP1 to aligned templates using profile alignment in CLUSTALW
5. VP1 alignment to templates5. VP1 alignment to templates
VP11BEV1 14 Q A A G A L V A G T S T S T H S V A T D S T P A L Q A A E T G A T S T A R D E S M I E T R T I V P T H G I H E T S V E S F F G R S S L V G M1EAH1 24 - - - A N N L P D T Q S S G P A H S - K E T P A L T A V E T G A T N P L V P S D T V Q T R H V I Q K R T R S E S T V E S F F A R G A C V A I1FPN1 15 - - - - L V V P N I N S S N P T T S - N S A P A L D A A E T G H T S S V Q P E D V I E T R Y V Q T S Q T R D E M S L E S F L G R S G C I H EEV711 23 A L P A P T G Q N T Q V S S H R L D T G E V P A L Q A A E I G A S S N T S D E S M I E T R C V L N S H S T A E T T L D S F F S R A G L V G E
. . * . . * * * * . * * : . . . : : * * : . : * : : : * * : . * . . :
1BEV1 84 P L L A T - - - - - - G T S I T H W R I D F R E F V Q L R A K M S W F T Y M R F D V E F T I I A T S S - T G Q N V T T E Q H T T Y Q V M Y V1EAH1 90 I E V D N D - - - - - S K L F S V W K I T Y K D T V Q L R R K L E F F T Y S R F D M E F T F V V T S N Y T D A N N G H A L N Q V Y Q I M Y I1FPN1 80 S K L E V T L A N Y N K E N F T V W A I N L Q E M A Q I R R K F E L F T Y T R F D S E I T L V P C I S A L - - - S Q D I G H I T M Q Y M Y VEV711 93 I D L P L E - G T T N P N G Y A N W D I D I T G Y A Q M R R K V E L F T Y M R F D A E F T F V A C T P - - - - - T G E V V P Q L L Q Y M F V
: : * * . * : * * . . * * * * * * * : * : : * * : :
1BEV1 147 P P G A P V P S N Q D S F Q W Q S G C N P S V F A D T D G P P A Q F S V P F M S S A N A Y S T V Y D G Y A R F M - - - D T - - - D P D R Y G1EAH1 161 P P G A P I P G K W N D Y T W Q T S S N P S V F Y T Y G A P P A R I S V P Y V G I A N A Y S H F Y D G F A K V P L A G Q A S T E G D S L Y G1FPN1 147 P P G A P V P N S R D D Y A W Q S G T N A S V F W Q H G Q A Y P R F S L P F L S V A S A Y Y M F Y D G Y D E - - - - - - - - - - Q D Q N Y GEV711 157 P P G A P K P E S R E S L A W Q T A T N P S V F V K L T D P P A Q V S V P F M S P A S A Y Q W F Y D G Y P T F G - - - E H K Q E K D L E Y G
* * * * * * . : . * * : . * . * * * . . : . * : * : : . * . * * . * * * : * *
1BEV1 211 I L P S N F L G F M Y F R T L E D - - - A A H Q V R F R I Y A K I K H T S C W I P R A P R Q A P Y K K R Y N L V F S - - G - D S D R I C S N1EAH1 231 A A S L N D F G S L A V R V V N D H N P T K L T S K I R V Y M K P K H V R V W C P R P P R A V P Y Y G P - G V D Y K - - D - G L A P - L P G1FPN1 207 T A N T N N M G S L C S R I V T E K H I H K V H I M T R I Y H K A K H V K A W C P R P P R A L E Y T R A H R T N F K I E D R S I Q T A I V TEV711 224 A C P N N M M G T F S V R T V G S S - K S K Y P L V V R I Y M R M K H V R A W I P R P M R N Q N Y L F K A N P N Y A - - G N S I K P T G T S
* : * : * : . * : * : * * . * * * . * * : . .
1BEV1 275 R A S L T S Y1EAH1 296 K - G L T T Y1FPN1 277 R P I I T T AEV711 291 R T A I T T -
: : * :
281301283296
strandsheliceshelices
Pocket-factor binding residues
5. Model building steps5. Model building steps Build all 4 capsid proteins (VP1-VP4)
together to ensure 3D fit Use 1BEV alone for VP2-VP4 For VP1: use aligned 1BEV, 1EAH,
1FPN Check model
5. Round 1: 5. Round 1: VP1VP1,,VP2VP2,,VP3VP3,,VP4VP4
Clip hanging ends
Re-position problem loops:
adjust gaps in alignment
Build again
5. Round 2: Pentamer Check5. Round 2: Pentamer Check
Loops look OK Build pentamer Publish…. Oops: clash in
pentamer assembly. Go back
5. Close encounters of the 35. Close encounters of the 3rdrd Kind Kind
Build only VP3 pentamer
N-terminus of each VP3 hydrogen-bonded
Also, in BEV, Asp-Lys ion pair
First 25 aa overlay v. well
5. Fourth foray: build with 5 VP3s5. Fourth foray: build with 5 VP3s
Only first 50aa of the other 4 VP3s included
Model resulted in knots due to insufficient refinement cycles
However, VP3 pentameric region OK
5. Fifth 5. Fifth and final and final attemptattempt
5. Canyon Pit and Antigenic Sites5. Canyon Pit and Antigenic Sites
Cardiovirus
Neurovirulent Polio (mouse)
Poliovirussites
5. Putative antigenic sites5. Putative antigenic sites
VP2VP1
5. HFMDV Conclusions 5. HFMDV Conclusions Unique surface loops identified for
Immunodiagnostic assays Vaccine design Antibodies being generated
Canyon pit: depth is similar to BEV Mapping the antigenic regions of other related enteroviruses
on the HFMDV surface: specific VP1 and VP2 sites buried Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C. Phoon:
Dept. of Microbiology, NUS Applied Bioinformatics, Vol 1, issue 1, 43-52: invited research
article