Upload
vulien
View
214
Download
0
Embed Size (px)
Citation preview
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Technical University of Denmark - DTUDepartment of systems biology
B CELL EPITOPES AND PREDICTIONS
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
OUTLINE
• What is a B-cell epitope?
• How can you predict B-cell epitopes?
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
WHAT IS A B-CELL EPITOPE?
Antibody Fabfragment
• B-cell epitopes:
• Accessible structural feature of a pathogen molecule.
• Antibodies are developed to bind the epitope specifically using the complementary determining regions (CDRs).
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
THE BINDING INTERACTIONS
• Salt bridges
• Hydrogen bonds
• Hydrophobic interactions
• Van der Waals forces
Binding strength
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
B-CELL EPITOPE CLASSIFICATION
Linear epitopesOne segment of the amino acid
chainDiscontinuous epitope (with
linear determinant)
Discontinuous epitopeSeveral small segments
brought into proximity by the protein fold
B-cell epitope: structural feature of a molecule or pathogen, accessible and recognizable by B-cell receptors
and antibodies
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
BINDING OF A DISCONTINUOUS EPITOPE
Antibody FAB fragment complexed with Guinea Fowl Lysozyme (1FBI).
Black: Light chain, Blue: Heavy chain, Yellow: Residues with atoms distanced < 5Å from FAB antibody
fragments.
Guinea Fowl Lysozyme KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNSQNRNTDGS
DYGVLNSRWWCNDGRTPGSRNLCNIPCSALQSSDITATANCAKKIVSDG
GMNAWVAWRKCKGTDVRVWIKGCRL
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
B-CELL EPITOPE ANNOTATION• Linear epitopes:
• Chop sequence into small pieces and measure binding to antibody
• Discontinuous epitopes:
• Measure binding of whole protein to antibody
• The best annotation method : X-ray crystal structure of the antibody-epitope complex
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
B-CELL EPITOPE DATA BASES
• Databases:
• IEDB, Los Alamos HIV database, Protein Data Bank, AntiJen, BciPep
• Large amount of data available for linear epitopes
• Few data available for discontinuous
Friday, 11 June 2010
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS
Technical University of Denmark - DTUDepartment of systems biology
B CELL EPITOPE PREDICTION
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
SEQUENCE-BASED METHODS FOR PREDICTION OF LINEAR
EPITOPES• Protein hydrophobicity – hydrophilicity algorithms
• Parker, Fauchere, Janin, Kyte and Doolittle, Manavalan• Sweet and Eisenberg, Goldman, Engelman and Steitz (GES), von Heijne
• Protein flexibility prediction algorithm • Karplus and Schulz
• Protein secondary structure prediction algorithms • PsiPred (D. Jones)
• Protein “antigenicity” prediction :• Hopp and Woods, WellingTSQDLSVFPLASCCKDNIASTSVTLGCLVTGYLPMSTTVTWDTGSLNKNVTTFPTTFHETYGLHSIVSQVTASGKWAKQRFTCSVAHAESTAINKTFSACALNFIPPTVKLFHSSCNPVGDTHTTIQLLCLISGYVPGDMEVIWLVDGQKATNIFPYTAPGTKEGNVTSTHSELNITQGEWVSQKTYTCQVTYQGFTFKDEARKCSESDPRGVTSYLSPPSPL
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
PROPENSITY SCALES: THE PRINCIPLE
• The Parker hydrophilicity scale
• Derived from experimental data
D 2.46E 1.86N 1.64S 1.50Q 1.37G 1.28K 1.26T 1.15R 0.87P 0.30H 0.30C 0.11A 0.03Y -0.78V -1.27M -1.41I -2.45 F -2.78L -2.87W -3.00
Hydrophilicity
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
(-2.78 + -1.27 + 2.46 +1.86 + 1.26 + 0.87 + 0.3)/7 = 0.39
Prediction scores:
0.39 0.1 0.6 0.9 1.0 1.2 2.6 1.0 0.9 0.5 -0.5
Epitope
….LISTFVDEKRPGSDIVEDLILKDENKTTVI….
PROPENSITY SCALES: THE PRINCIPLE
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
TURN PREDICTION AND B-CELL EPITOPES
• Pellequer found that 50% of the epitopes in a data set of 11 proteins were located in turns
Turn propensity scales for each position in the turn were used for epitope prediction.
Pellequer et al.,Immunology letters, 1993
1
2
3
4
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
BLYTHE AND FLOWER 2005
• Extensive evaluation of propensity scales for epitope prediction
• Conclusion:
–Basically all the classical scales perform close to random!
–Other methods must be used for epitope prediction
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
BEPIPRED• Parker hydrophilicity scale
• PSSM
• PSSM based on linear epitopes extracted from the AntiJen database
• Combination of the Parker prediction scores and PSSM leads to prediction score
• Tested on the Pellequer dataset and epitopes in the HIV Los Alamos database
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
PSSM
! A ! C D ………… G S T V W
Pos 1 7.28
Pos 2 9.39
Pos 3 0.3
Pos 4 5.2
Pos 5 7.9
….LISTFVDEKRPGSDIVEDLILKDENKTTVI….
2.46+1.86+1.26+0.87+0.3 = 6.75 Prediction value
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
ROC EVALUATION
Evaluation on HIV Los
Alamos data set
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
BEPIPRED PERFORMANCE• Pellequer data set:
–Levitt AROC = 0.66
–Parker AROC = 0.65
–BepiPred AROC = 0.68
• HIV Los Alamos data set
–Levitt AROC = 0.57
–Parker AROC = 0.59
–BepiPred AROC = 0.60
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
BEPIPRED• BepiPred conclusion:
• On both of the evaluation data sets, Bepipred was shown to perform better
• Still the AROC value is low compared to T-cell epitope prediction tools!
• Bepipred is available as a webserver :
• www.cbs.dtu.dk/services/BepiPred
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
PREDICTION OF LINEAR EPITOPES
Con• only ~10% of epitopes can
be classified as “linear” • weakly immunogenic in
most cases• most epitope peptides do
not provide antigen-neutralizing immunity
• in many cases represent hypervariable regions
Pro• easily predicted
computationally • easily identified experimentally• immunodominant epitopes in
many cases • do not need 3D structural
information• easy to produce and check
binding activity experimentally
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
SEQUENCE BASED PREDICTION METHODS
• Linear methods for prediction of B cell epitopes have low performances
• The problem is analogous to the problems of representing the surface of the earth on a two-dimensional map
• Reduction of the dimensions leads to distortions of scales, directions, distances
• The world of B-cell epitopes is 3 dimensional and therefore more sophisticated methods must be developed
Regenmortel 1996,Meth. of Enzym. 9.
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
SO WHAT IS MORE SOPHISTICATED?
• Use of the three dimensional structure of the pathogen protein
• Analyze the structure to find surface exposed regions
• Additional use of information about conformational changes, glycosylation and trans-membrane helices
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
SOURCES OF THREE-DIMENSIONAL STRUCTURES• Experimental determination
• X-ray crystallography • NMR spectroscopy
• Both methods are time consuming and not easily done in a larger scale
• Structure prediction
• Homology modeling• Fold recognition
• Less time consuming, but there is a possibility of incorrect predictions, specially in loop regions
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
PROTEIN STRUCTURE PREDICTION METHODS
• Homology/comparative modeling >25% sequence identity (seq 2 seq alignment)
• Fold-recognition <25% sequence identity (Psi-blast search/ PSSM 2 seq
alignment)
• Ab initio structure prediction 0% sequence identity
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
WHAT DOES ANTIBODIES RECOGNIZE IN A PROTEIN?
A: Everything accessible to a 10 Å probe on a protein surfaceNovotny J. A static accessibility model of protein antigenicity.
Int Rev Immunol 1987 Jul;2(4):379-89
probe
Antibody Fabfragment
Protrusion index
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
THE CEP SERVER
• Conformational epitope server
http://202.41.70.74:8080/cgi-bin/cep.pl
• Uses protein structure as input
• Finds stretches in sequences which are surface exposed
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
THE DISCOTOPE SERVER
• CBS server for prediction of discontinuous epitopes
• Uses protein structure as input
• Combines propensity scale values of amino acids in discontinuous epitopes with surface exposure
• http://www.cbs.dtu.dk/services/DiscoTope
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
DISCOTOPE
• Prediction of residues in discontinuous B cell epitopes using protein 3D structures
Pernille Haste Andersen, Morten Nielsen and Ole Lund, Protein Science 2006
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
• Structures of antibodies/antigen protein complexes in the Protein DataBank
• Dr. Andrew Martin’s SACS database (available at http://www.bioinf.org.uk/abs/sacs) was used to get an overview of PDB entries
• Epitopes in the data set were identified by finding residues within 4Å from heavy or light chains in the Abs
• We used homology grouping and cross-validation for the training and testing of the method to avoid biasing towards specific antigens
• The 5 sets used for cross-validated training/testing are available at:http://www.cbs.dtu.dk/suppl/immunology/DiscoTope.php
A DATA SET OF DISCONTINUOUS B CELL
EPITOPES
An example: The epitope of the outer surface protein A from Borrelia
Burgdorferi (1OSP)
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
AMINO ACIDS IN EPITOPES
Amino Acid G A V L I M P F W S
e/E 0.09 0.07 0.05 0.08 0.04 0.02 0.06 0.03 0.01 0.08
. 0.07 0.08 0.07 0.10 0.06 0.03 0.05 0.05 0.02 0.07
Amino acid C T Q N H Y E D K R
e/E 0.03 0.08 0.04 0.04 0.02 0.04 0.06 0.07 0.07 0.04
. 0.03 0.06 0.04 0.05 0.02 0.03 0.04 0.04 0.05 0.04
Guillermo Carbajosa]
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
LOG-ODDS RATIOS OF AMINO ACIDS IN DISCONTINUOUS EPITOPES
Frequencies of amino acids in epitope residues compared to frequencies of non-epitope residues
Several discrepancies compared to the Parker hydrophilicity scale
Predictive performance (AUC) of B cell epitopes:Parker 0.614Epitope log–odds 0.634
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
DiscoTope prediction value
DISCOTOPE: A PREDICTION METHOD USING 3D STRUCTURES
A combination method:
• Addition of epitope log-odds values of residues in spatial proximity
• Contact numbers
.LIST..FVDEKRPGSDIVED……ALILKDENKTTVI.
-0.145
+0.346+1.136
Contact number : 7Sum of log-odds values
w
+0.691+0.346+1.136+1.180+1.164
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
• Receiver Operator Characteristics (ROC) curves were used for performance measures
• The reported performance is an average of the AUC values of the non-homologous groups of antigens:
– Parker 0.614 Seq.-based
– Epitope log–odds 0.634 Seq.-based
–Contact numbers 0.647 Str.-based
•Naccess 0.673 Str.-based
–DiscoTope 0.711 Seq./Str.-based
DISCOTOPE : PREDICTION OF DISCONTINUOUS EPITOPES
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
EVALUATION EXAMPLE AMA1• Apical membrane antigen 1 from
Plasmodium falciparum (not used for training/testing)
• Two epitopes were identified using phage-display, sequence variance analysis and point-mutation
(green backbone)
• Most residues identified as epitopes were successfully predicted by DiscoTope
(black side chains)
DiscoTope is available as webserver:http://www.cbs.dtu.dk/services/DiscoTope/
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
Vol. 24 no. 12 2008, pages 1459–1460BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btn199
Structural bioinformatics
PEPITO: improved discontinuous B-cell epitope prediction usingmultiple distance thresholds and half sphere exposureMichael J. Sweredoski1,2 and Pierre Baldi1,2,*1Department of Computer Science and 2Institute for Genomics and Bioinformatics, University of California, Irvine,92697-3435, California, USA
Received on March 3, 2008; revised on April 18, 2008; accepted on April 20, 2008
Advance Access publication April 28, 2008
Associate Editor: Anna Tramontano
ABSTRACT
Motivation: Accurate prediction of B-cell epitopes is an important
goal of computational immunology. Up to 90% of B-cell epitopes are
discontinuous in nature, yet most predictors focus on linear
epitopes. Even when the tertiary structure of the antigen is available,
the accurate prediction of B-cell epitopes remains challenging.
Results: Our predictor, PEPITO, uses a combination of amino-acid
propensity scores and half sphere exposure values at multiple
distances to achieve state-of-the-art performance. PEPITO achieves
an area under the curve (AUC) of 75.4 on the Discotope dataset.
Additionally, we benchmark PEPITO as well as the Discotope
predictor on the more recent Epitome dataset, achieving AUCs of
68.3 and 66.0, respectively.
Availability: PEPITO is available as part of the SCRATCH suite of
protein structure predictors via www.igb.uci.edu.
Contact: [email protected]
Supplementary information: Supplementary data are available at
Bioinformatics online.
1 INTRODUCTION
B-cell epitope prediction is an important, but unsolved problem inbioinformatics. The ability to accurately predict B-cell epitopeswould aid researchers in a variety of immunological applications.Initial attempts at predicting B-cell epitopes involved the
calculation of propensity scales (Hopp and Woods, 1981).While this information can be useful in predicting B-cellepitopes, Blythe and Flower (2005) showed that propensityscales alone are not enough to accurately predict epitopes.Many of the previous predictors have focused on linear B-cell
epitopes. Some of these methods include ABCpred (Saha andRaghava, 2006), BEPITOPE (Odorico and Pellequer, 2003),Bepipred (Larsen et al., 2006) and PEOPLE (Alix, 1999).However, past surveys have estimated that only 10% of theB-cell epitopes are continuous (van Regenmortel, 1996).Additionally, van Regenmortel (2006) noted that even linearepitopes adopt a conformational structure and therefore thedistinction is somewhat blurred. Far fewer predictors have beendeveloped for discontinuous B-cell epitopes. One of the firstmethods explicitly created for identification of discontinuousepitopes was conformational epitope predictor (CEP)
(Kulkarni-Kale et al., 2005). Another method described byRapberger et al. (2007) incorporates epitope–paratope shapecomplementarity to predict interaction sites. One of the mostrecent, state-of-the-art, predictors of discontinuous epitopes isDiscotope (Andersen et al., 2006), which uses both contactnumbers (i.e. the number of C! atoms within a certain distancethreshold) and an amino-acid propensity scale.Our predictor, PEPITO, attempts to overcome some of the
limitations of previous predictors by incorporating an amino-acid propensity scale along with side chain orientation andsolvent accessibility information using half sphere exposurevalues (Hamelryck, 2005). To increase robustness, PEPITOuses propensity scales and half sphere exposure values atmultiple distance thresholds from the target residue.
2 METHODS
2.1 DatasetsWe obtained epitope datasets for benchmarking prediction methodsfrom both the Discotope Supplementary Materials (Andersen et al.,2006) and Epitome (Schlessinger et al., 2006). The two datasets containdifferent sets of protein chains and differ in their epitope/non-epitopeclassification rules. The Discotope dataset, which consists of 75 proteinchains, labels all residues in antigen chains within 4 A of an antibody asepitopes. The Epitome dataset, which consists of 140 protein chains,seeks to eliminate incidental contacts by labeling residues in the antigenwithin 6 A of the complementary determining regions of the antibodychains as epitopes.
We derived two additional datasets, C[Discotope] and C[Epitome],from the set of protein chains that are common to both the Epitomeand Discotope datasets. The two datasets differ in the method used toidentify epitope residues. Eight hundred and seventy-five of the residuesin the derived datasets are defined as epitopes using both methods. Fourhundred and seventy-one of the residues in the derived datasets aredefined as epitopes using the Epitome method but not the Discotopemethod. One hundred and nine of the residues in the derived datasetsare defined as epitopes using the Discotope method but not the Epitomemethod. The assertions by Schlessinger et al. (2006) would indicate thatthe 471 residues are integral to the antigen–antibody binding while the109 residues result from incidental contacts.
Testing procedures require that the protein chains present in thedatasets be clustered to prevent any one family from dominating theperformance measures. Protein families were previously annotated forthe Discotope dataset. UniqueProt (Mika and Rost, 2003) was used toidentify protein families in the Epitome dataset and the two deriveddatasets.*To whom correspondence should be addressed.
! The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 1459
BioMed Central
!"#$%&%'(%)!"#$%&'()*%+&',-&.,+&/0-#-0,'&"(+",1%12
BMC Bioinformatics
Open AccessSoftwareElliPro: a new structure-based tool for the prediction of antibody epitopesJulia Ponomarenko*1,2, Huynh-Hoa Bui3, Wei Li, Nicholas Fusseder, Philip E Bourne1,2, Alessandro Sette4 and Bjoern Peters4
Address: 1San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA, 2Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA, 3Isis Pharmaceuticals, Inc., 1896 Rutherford Road, Carlsbad, California 92008, USA and 4La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, California 92037, USA
Email: Julia Ponomarenko* - [email protected]; Huynh-Hoa Bui - [email protected]; Wei Li - [email protected]; Nicholas Fusseder - [email protected]; Philip E Bourne - [email protected]; Alessandro Sette - [email protected]; Bjoern Peters - [email protected]* Corresponding author
AbstractBackground: Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highlydesirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity,solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton andcolleagues proposed a method for identifying continuous epitopes in the protein regions protrudingfrom the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures ofantibody-protein complexes.
Results: Here we present ElliPro, a web-tool that implements Thornton's method and, togetherwith a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows theprediction and visualization of antibody epitopes in a given protein sequence or structure. ElliProhas been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures ofantibody-protein complexes. In comparison with six other structure-based methods that can beused for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when themost significant prediction was considered for each protein. Since the rank of the best predictionwas at most in the top three for more than 70% of proteins and never exceeded five, ElliPro isconsidered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro isavailable at http://tools.immuneepitope.org/tools/ElliPro.
Conclusion: The results from ElliPro suggest that further research on antibody epitopesconsidering more features that discriminate epitopes from non-epitopes may further improvepredictions. As ElliPro is based on the geometrical properties of protein structure and does notrequire training, it might be more generally applied for predicting different types of protein-proteininteractions.
Published: 2 December 2008
BMC Bioinformatics 2008, 9:514 doi:10.1186/1471-2105-9-514
Received: 24 September 2008Accepted: 2 December 2008
This article is available from: http://www.biomedcentral.com/1471-2105/9/514
© 2008 Ponomarenko et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
RECENT DEVELOPMENTS
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
SECONDARY STRUCTURE IN EPITOPES
Sec struct: H T B E S G I .
Log odds ratio -0.19 0.30 0.21 -0.27 0.24 -0.04 0.00 0.17
H: Alpha-helix (hydrogen bond from residue i to residue i+4)
G: 310-helix (hydrogen bond from residue i to residue i+3)I: Pi helix (hydrogen bond from residue i to residue i+5)
E: Extended strandB: Beta bridge (one residue short strand)
S: Bend (five-residue bend centered at residue i) T: H-bonded turn (3-turn, 4-turn or 5-turn)
. : Coil
Guillermo Carbajosa]
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
RATIONAL VACCINE DESIGN
>PATHOGEN PROTEINKVFGRCELAAAMKRHGLDNYRGY
SLGNWVCAAKFESNF
Rational Vaccine Design
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
RATIONAL B-CELL EPITOPE DESIGN
• Protein target choice
• Structural analysis of antigen
Known structure or homology modelPrecise domain structurePhysical annotation (flexibility, electrostatics, hydrophobicity)Functional annotation (sequence variations, active sites, binding sites, glycosylation sites, etc.)
Known 3D structure
Model
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
• Protein target choice
• Structural annotation
• Epitope prediction and ranking
RATIONAL B-CELL EPITOPE DESIGN
Surface accessibilityProtrusion indexConserved sequenceGlycosylation status
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
RATIONAL B-CELL EPITOPE DESIGN
• Protein target choice
• Structural annotation
• Epitope prediction and ranking
• Optimal Epitope presentation
Fold minimization, orDesign of structural mimics Choice of carrier (conjugates, DNA plasmids, virus like particles)Multiple chain protein engineering
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
MULTI-EPITOPE PROTEIN DESIGN
Rational optimization of epitope-VLP chimeric proteins:
Design a library of possible linkers (<10 aa)
Perform global energy optimization in VLP (virus-like particle) context
Rank according to estimated energy strain
B-cellepitope
T-cellepitope
Friday, 11 June 2010
Technical University of Denmark - DTUDepartment of systems biology
CE
NT
ER
FOR
BIO
LOG
ICA
L SE
QU
EN
CE
AN
ALY
SIS
ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial
CONCLUSIONS
• Rational vaccines can be designed to induce strong and epitope-specific B-cell responses
• Selection of protective B-cell epitopes involves structural, functional and immunogenic analysis of the pathogenic proteins
• When you can: Use protein structure for prediction
• Structural modeling tools are helpful in prediction of epitopes, design of epitope mimics and optimal epitope presentation
Friday, 11 June 2010