Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen 2008

7/27/2019 Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen

1/2

Vol. 24 no. 11 2008, pages 13971398

BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btn128

Sequence analysis

Accurate approximation method for prediction of class I MHC

affinities for peptides of length 8, 10 and 11 using prediction

tools trained on 9mersClaus Lundegaard*, Ole Lund and Morten NielsenCenter for Biological Sequence Analysis CBS, Department of Systems Biology, The Technical University of

Denmark DTU, Kemitorvet Build. 208, 2800 Lyngby, Denmark

Received on February 8, 2008; revised and accepted on April 4, 2008

Advance Access publication April 14, 2008

Associate Editor: Burkhard Rost

ABSTRACT

Summary: Several accurate prediction systems have been devel-

oped for prediction of class I major histocompatibility complex

(MHC):peptide binding. Most of these are trained on binding affinity

data of primarily 9mer peptides. Here, we show how prediction

methods trained on 9mer data can be used for accurate binding

affinity prediction of peptides of length 8, 10 and 11. The method

gives the opportunity to predict peptides with a different length than

nine for MHC alleles where no such peptides have been measured.

As validation, the performance of this approach is compared to

predictors trained on peptides of the peptide length in question. In

this validation, the approximation method has an accuracy that is

comparable to or better than methods trained on a peptide length

identical to the predicted peptides.

Availablility: The algorithm has been implemented in the web-

accessible servers NetMHC-3.0: http://www.cbs.dtu.dk/services/

NetMHC-3.0, and NetMHCpan-1.1: http://www.cbs.dtu.dk/services/

NetMHCpan-1.1

Contact: [email protected]

Supplementary information: Supplementary data are available at

Bioinformatics online

1 INTRODUCTION

Determination of peptide binding to MHC class I is an

important step in cytotoxic T cell lymphocyte (CTL) epitope

discovery methods for class I MHC peptide binding. These

methods have become increasingly accurate (Lundegaard et al.,

2007; Moutaftsi et al., 2006; Peters et al., 2006), limiting the

effort significantly. Most MHCs, however, prefer peptides of

the length 9, making available binding data of 9mer peptides

significantly more abundant than data for other lengths such as8, 10 and 11mers as these more rarely binds to the MHCs. Since

the amount of available data is crucial for the developing of

accurate predictions (Yu et al., 2002), the number of accurate

predictors of these other lengths is limited. Ligand motifs have

been elucidated for several MHCs and groups of MHCs (Lund

et al., 2004; Rammensee et al., 1999; Sette and Sidney, 1998).

According to these motifs, the most important (anchor) residues

are in general the positions 2, 3 and the C-terminal, disregarding

the peptide length. However, for a limited number of non-

human MHCs other peptide positions might be the primary

anchors (see Supplementary Material). Using this knowledge,

we exploited the possibility of generating pseudo 9mers frompeptides of other length by fixating these positions and inserting

or deleting residues at other positions. This resulted in a simple

though remarkably accurate method to overcome the length

problem using affinity predictions by 9mer predictors of such

pseudo 9mers. This method will in principle work with any type

of existing MHC 9mer binding prediction methods.

2 METHODS

2.1 9mer predictions

Here, we used predictions generated by NetMHC-3.0 (http://www.

cbs.dtu.dk/services/NetMHC-3.0) (Buus et al., 2003; Nielsen et al.,

2003). However, any 9mer:MHC binding prediction algorithm thataccepts unknown amino acids (i.e. X) can be used.

2.2 Prediction of 8mer affinities

In 8mer peptides (e.g. EIGHTMER) an X is inserted repeatedly

at either position 4, 5, 6, 7 or 8, resulting in five new pseudo

peptides of length 9; EIGXHTMER, EIGHXTMER, EIGHTXMER,

EIGHTMXER and EIGHTMEXR (Fig. 1A). The final predicted

affinity is calculated as the geometrical mean of the five predicted

affinities in nano Molar units.

2.3 Prediction of 10 and 11mer affinities

The longermers (e.g. TENELEVENS) are converted into 9mers by

deleting 1 (10mers) or 2 (11mers) residues at positions 4, 5, 6, 7, 8 or 9,

resulting in six new pseudo-peptides; TENLEVENS, TENEEVENS,TENELVENS, TENELEENS, TENELEVNS and TENELEVES

(Fig. 1B). The final predicted affinity is calculated as the geometrical

mean of the six predicted affinities in nano Molar units (Fig. 1C).

2.4 Evaluation

The method was evaluated using peptide IC50 and Kd affinity data

extracted from the web site of the Immune Epitope Database

and Analysis resource (IEDB) (Sette et al., 2005). This resulted

in the 8mer, 9mer and 10mer evaluation data available at*To whom correspondence should be addressed.

The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 1397
http://www.cbs.dtu.dk/services/http://www.cbs.dtu.dk/services/http://www/http://www/http://www.cbs.dtu.dk/services/http://www.cbs.dtu.dk/services/

7/27/2019 Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen

2/2

http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xls, http://

www.cbs.dtu.dk/services/NetMHC-3.0/evalset_10mers_all.xls and http://

www.cbs.dtu.dk/services/NetMHC-3.0/evalset_11mers.xls, respectively.

8mer data: 1975 measurements distributed on 35 MHC alleles. 10mer

data: 13 507 measurements distributed on 31 MHC alleles. 11mer data:

181 measurements, distributed on 25 MHC alleles. We evaluated the

accuracy by Pearson correlation coefficients (PCC), and area under

receiver operating characteristic (ROC) curves (AUC) using a binding

cutoff of 500 nM.

3 RESULTS

We predicted affinities for 8mer, 10mer and 11mer data using

the approximation method described in Methods. The overall

PCCs for all predictions within each dataset were 0.69, 0.73 and

0.74, respectively. The overall AUCs were 0.86, 0.87 and 0.89,

respectively. To calculate an AUC value, we needed both

negative (IC50 or Kd500 nM) and positive (IC50 or Kd 5

500 nM) affinity data, thus we removed alleles having only

binders or non-binders from the evaluation. This lead to 27

compared alleles on 8mer peptides. The resulting PCC values

had a mean of 0.72. Acceptable AUC values (above 0.7) were

obtained for 25 of the 27 covered alleles (Table 1). To evaluate

the 10mer approximation, we calculated PCC and AUC values

for 27 alleles. Using a 500 nM threshold, 26 of 27 alleles had

AUCs above 0.7 (Table 1).

To compare the approximation method with specifically

trained methods, we used artificial neural networks (ANNs)

previously trained as described in (Nielsen et al., 2003) on

10mer data. For 10mers, 2037 new data points covering 16

alleles had become available since training of 10mer specific

ANNs, available at http://www.cbs.dtu.dk/services/NetMHC-

3.0/evalset_10mers.xls. AUC values were calculated for eachallele using either ANNs trained on 10mers or the approxima-

tion method described here (Supplementary Fig. 1). For 12 of

the 16 alleles the approximation method performed better than

the 10mer trained ANNs (P50.01).

For the currently small number of alleles for which the

primary peptide anchor position(s) are in positions 48 the

approximation method will not work well. Examples of such

alleles and how to identify these are described in the

Supplementary Material.

Conflict of Interest: none declared.

REFERENCES

Buus,S. et al. (2003) Sensitive quantitative predictions of peptide-MHC binding

by a Query by Committee artificial neural network approach. Tissue

Antigens, 62, 378384.

Lund,O. et al. (2004) Definition of supertypes for HLA molecules using clustering

of specificity matrices. Immunogenetics, 55, 797810.

Lundegaard,C. et al. (2007) Modeling the adaptive immune system: predictions

and simulations. Bioinformatics, 23, 32653275.

Moutaftsi,M. et al. (2006) A consensus epitope prediction approach identifies the

breadth of murine T(CD8)-cell responses to vaccinia virus. Nat. Biotechnol.,

24, 817819.

Nielsen,M. et al. (2003) Reliable prediction of T-cell epitopes using neural

networks with novel sequence representations. Protein Sci., 12, 10071017.

Peters,B. et al. (2006) A community resource benchmarking predictions of peptide

binding to MHC-I molecules. PLoS Comput. Biol., 2, e65.

Rammensee,H. et al. (1999) SYFPEITHI: database for MHC ligands and peptide

motifs. Immunogenetics, 50, 213219.

Sette,A. et al. (2005) A roadmap for the immunomics of category A-C pathogens.

Immunity, 22, 155161.

Sette,A. and Sidney,J. (1998) HLA supertypes and supermotifs: a functional

perspective on HLA polymorphism. Curr. Opin. Immunol., 10, 478482.

Yu,K. et al. (2002) Methods for prediction of peptide binding to MHC molecules:

a comparative study. Mol. Med., 8, 137148.

Table 1. PCC and AUC values of predicted 8mer and 10mer affinities

using the approximation method

The total number of 8mer and 10mer data, is given as #8mer data and #10mer

data, respectively. The number of binding peptides ( Kd5500nM) in the 8mer and

10mer datasets are given as #8mer binders and #10mer binders, respectively. NA:

Value not calculated due to lack of data.

Fig. 1. (A) Illustration of one 8mer to five 9mer conversion.

(B) Illustration of one 10mer to six 9mer conversion. (C) Calculation

of six pseudo 9mer predictions back to a 10mer prediction.

1398

C.Lundegaard et al.
http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xlshttp://www.cbs.dtu.dk/services/NetMHC-http://www.cbs.dtu.dk/services/NetMHC-http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xls

Documents

Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen 2008