Upload
danim123
View
214
Download
0
Embed Size (px)
Citation preview
7/27/2019 Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen
1/2
Vol. 24 no. 11 2008, pages 13971398
BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btn128
Sequence analysis
Accurate approximation method for prediction of class I MHC
affinities for peptides of length 8, 10 and 11 using prediction
tools trained on 9mersClaus Lundegaard*, Ole Lund and Morten NielsenCenter for Biological Sequence Analysis CBS, Department of Systems Biology, The Technical University of
Denmark DTU, Kemitorvet Build. 208, 2800 Lyngby, Denmark
Received on February 8, 2008; revised and accepted on April 4, 2008
Advance Access publication April 14, 2008
Associate Editor: Burkhard Rost
ABSTRACT
Summary: Several accurate prediction systems have been devel-
oped for prediction of class I major histocompatibility complex
(MHC):peptide binding. Most of these are trained on binding affinity
data of primarily 9mer peptides. Here, we show how prediction
methods trained on 9mer data can be used for accurate binding
affinity prediction of peptides of length 8, 10 and 11. The method
gives the opportunity to predict peptides with a different length than
nine for MHC alleles where no such peptides have been measured.
As validation, the performance of this approach is compared to
predictors trained on peptides of the peptide length in question. In
this validation, the approximation method has an accuracy that is
comparable to or better than methods trained on a peptide length
identical to the predicted peptides.
Availablility: The algorithm has been implemented in the web-
accessible servers NetMHC-3.0: http://www.cbs.dtu.dk/services/
NetMHC-3.0, and NetMHCpan-1.1: http://www.cbs.dtu.dk/services/
NetMHCpan-1.1
Contact: [email protected]
Supplementary information: Supplementary data are available at
Bioinformatics online
1 INTRODUCTION
Determination of peptide binding to MHC class I is an
important step in cytotoxic T cell lymphocyte (CTL) epitope
discovery methods for class I MHC peptide binding. These
methods have become increasingly accurate (Lundegaard et al.,
2007; Moutaftsi et al., 2006; Peters et al., 2006), limiting the
effort significantly. Most MHCs, however, prefer peptides of
the length 9, making available binding data of 9mer peptides
significantly more abundant than data for other lengths such as8, 10 and 11mers as these more rarely binds to the MHCs. Since
the amount of available data is crucial for the developing of
accurate predictions (Yu et al., 2002), the number of accurate
predictors of these other lengths is limited. Ligand motifs have
been elucidated for several MHCs and groups of MHCs (Lund
et al., 2004; Rammensee et al., 1999; Sette and Sidney, 1998).
According to these motifs, the most important (anchor) residues
are in general the positions 2, 3 and the C-terminal, disregarding
the peptide length. However, for a limited number of non-
human MHCs other peptide positions might be the primary
anchors (see Supplementary Material). Using this knowledge,
we exploited the possibility of generating pseudo 9mers frompeptides of other length by fixating these positions and inserting
or deleting residues at other positions. This resulted in a simple
though remarkably accurate method to overcome the length
problem using affinity predictions by 9mer predictors of such
pseudo 9mers. This method will in principle work with any type
of existing MHC 9mer binding prediction methods.
2 METHODS
2.1 9mer predictions
Here, we used predictions generated by NetMHC-3.0 (http://www.
cbs.dtu.dk/services/NetMHC-3.0) (Buus et al., 2003; Nielsen et al.,
2003). However, any 9mer:MHC binding prediction algorithm thataccepts unknown amino acids (i.e. X) can be used.
2.2 Prediction of 8mer affinities
In 8mer peptides (e.g. EIGHTMER) an X is inserted repeatedly
at either position 4, 5, 6, 7 or 8, resulting in five new pseudo
peptides of length 9; EIGXHTMER, EIGHXTMER, EIGHTXMER,
EIGHTMXER and EIGHTMEXR (Fig. 1A). The final predicted
affinity is calculated as the geometrical mean of the five predicted
affinities in nano Molar units.
2.3 Prediction of 10 and 11mer affinities
The longermers (e.g. TENELEVENS) are converted into 9mers by
deleting 1 (10mers) or 2 (11mers) residues at positions 4, 5, 6, 7, 8 or 9,
resulting in six new pseudo-peptides; TENLEVENS, TENEEVENS,TENELVENS, TENELEENS, TENELEVNS and TENELEVES
(Fig. 1B). The final predicted affinity is calculated as the geometrical
mean of the six predicted affinities in nano Molar units (Fig. 1C).
2.4 Evaluation
The method was evaluated using peptide IC50 and Kd affinity data
extracted from the web site of the Immune Epitope Database
and Analysis resource (IEDB) (Sette et al., 2005). This resulted
in the 8mer, 9mer and 10mer evaluation data available at*To whom correspondence should be addressed.
The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 1397
http://www.cbs.dtu.dk/services/http://www.cbs.dtu.dk/services/http://www/http://www/http://www.cbs.dtu.dk/services/http://www.cbs.dtu.dk/services/7/27/2019 Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen
2/2
http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xls, http://
www.cbs.dtu.dk/services/NetMHC-3.0/evalset_10mers_all.xls and http://
www.cbs.dtu.dk/services/NetMHC-3.0/evalset_11mers.xls, respectively.
8mer data: 1975 measurements distributed on 35 MHC alleles. 10mer
data: 13 507 measurements distributed on 31 MHC alleles. 11mer data:
181 measurements, distributed on 25 MHC alleles. We evaluated the
accuracy by Pearson correlation coefficients (PCC), and area under
receiver operating characteristic (ROC) curves (AUC) using a binding
cutoff of 500 nM.
3 RESULTS
We predicted affinities for 8mer, 10mer and 11mer data using
the approximation method described in Methods. The overall
PCCs for all predictions within each dataset were 0.69, 0.73 and
0.74, respectively. The overall AUCs were 0.86, 0.87 and 0.89,
respectively. To calculate an AUC value, we needed both
negative (IC50 or Kd500 nM) and positive (IC50 or Kd 5
500 nM) affinity data, thus we removed alleles having only
binders or non-binders from the evaluation. This lead to 27
compared alleles on 8mer peptides. The resulting PCC values
had a mean of 0.72. Acceptable AUC values (above 0.7) were
obtained for 25 of the 27 covered alleles (Table 1). To evaluate
the 10mer approximation, we calculated PCC and AUC values
for 27 alleles. Using a 500 nM threshold, 26 of 27 alleles had
AUCs above 0.7 (Table 1).
To compare the approximation method with specifically
trained methods, we used artificial neural networks (ANNs)
previously trained as described in (Nielsen et al., 2003) on
10mer data. For 10mers, 2037 new data points covering 16
alleles had become available since training of 10mer specific
ANNs, available at http://www.cbs.dtu.dk/services/NetMHC-
3.0/evalset_10mers.xls. AUC values were calculated for eachallele using either ANNs trained on 10mers or the approxima-
tion method described here (Supplementary Fig. 1). For 12 of
the 16 alleles the approximation method performed better than
the 10mer trained ANNs (P50.01).
For the currently small number of alleles for which the
primary peptide anchor position(s) are in positions 48 the
approximation method will not work well. Examples of such
alleles and how to identify these are described in the
Supplementary Material.
Conflict of Interest: none declared.
REFERENCES
Buus,S. et al. (2003) Sensitive quantitative predictions of peptide-MHC binding
by a Query by Committee artificial neural network approach. Tissue
Antigens, 62, 378384.
Lund,O. et al. (2004) Definition of supertypes for HLA molecules using clustering
of specificity matrices. Immunogenetics, 55, 797810.
Lundegaard,C. et al. (2007) Modeling the adaptive immune system: predictions
and simulations. Bioinformatics, 23, 32653275.
Moutaftsi,M. et al. (2006) A consensus epitope prediction approach identifies the
breadth of murine T(CD8)-cell responses to vaccinia virus. Nat. Biotechnol.,
24, 817819.
Nielsen,M. et al. (2003) Reliable prediction of T-cell epitopes using neural
networks with novel sequence representations. Protein Sci., 12, 10071017.
Peters,B. et al. (2006) A community resource benchmarking predictions of peptide
binding to MHC-I molecules. PLoS Comput. Biol., 2, e65.
Rammensee,H. et al. (1999) SYFPEITHI: database for MHC ligands and peptide
motifs. Immunogenetics, 50, 213219.
Sette,A. et al. (2005) A roadmap for the immunomics of category A-C pathogens.
Immunity, 22, 155161.
Sette,A. and Sidney,J. (1998) HLA supertypes and supermotifs: a functional
perspective on HLA polymorphism. Curr. Opin. Immunol., 10, 478482.
Yu,K. et al. (2002) Methods for prediction of peptide binding to MHC molecules:
a comparative study. Mol. Med., 8, 137148.
Table 1. PCC and AUC values of predicted 8mer and 10mer affinities
using the approximation method
The total number of 8mer and 10mer data, is given as #8mer data and #10mer
data, respectively. The number of binding peptides ( Kd5500nM) in the 8mer and
10mer datasets are given as #8mer binders and #10mer binders, respectively. NA:
Value not calculated due to lack of data.
Fig. 1. (A) Illustration of one 8mer to five 9mer conversion.
(B) Illustration of one 10mer to six 9mer conversion. (C) Calculation
of six pseudo 9mer predictions back to a 10mer prediction.
1398
C.Lundegaard et al.
http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xlshttp://www.cbs.dtu.dk/services/NetMHC-http://www.cbs.dtu.dk/services/NetMHC-http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xls