Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen 2008

Embed Size (px)

Citation preview

  • 7/27/2019 Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen

    1/2

    Vol. 24 no. 11 2008, pages 13971398

    BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btn128

    Sequence analysis

    Accurate approximation method for prediction of class I MHC

    affinities for peptides of length 8, 10 and 11 using prediction

    tools trained on 9mersClaus Lundegaard*, Ole Lund and Morten NielsenCenter for Biological Sequence Analysis CBS, Department of Systems Biology, The Technical University of

    Denmark DTU, Kemitorvet Build. 208, 2800 Lyngby, Denmark

    Received on February 8, 2008; revised and accepted on April 4, 2008

    Advance Access publication April 14, 2008

    Associate Editor: Burkhard Rost

    ABSTRACT

    Summary: Several accurate prediction systems have been devel-

    oped for prediction of class I major histocompatibility complex

    (MHC):peptide binding. Most of these are trained on binding affinity

    data of primarily 9mer peptides. Here, we show how prediction

    methods trained on 9mer data can be used for accurate binding

    affinity prediction of peptides of length 8, 10 and 11. The method

    gives the opportunity to predict peptides with a different length than

    nine for MHC alleles where no such peptides have been measured.

    As validation, the performance of this approach is compared to

    predictors trained on peptides of the peptide length in question. In

    this validation, the approximation method has an accuracy that is

    comparable to or better than methods trained on a peptide length

    identical to the predicted peptides.

    Availablility: The algorithm has been implemented in the web-

    accessible servers NetMHC-3.0: http://www.cbs.dtu.dk/services/

    NetMHC-3.0, and NetMHCpan-1.1: http://www.cbs.dtu.dk/services/

    NetMHCpan-1.1

    Contact: [email protected]

    Supplementary information: Supplementary data are available at

    Bioinformatics online

    1 INTRODUCTION

    Determination of peptide binding to MHC class I is an

    important step in cytotoxic T cell lymphocyte (CTL) epitope

    discovery methods for class I MHC peptide binding. These

    methods have become increasingly accurate (Lundegaard et al.,

    2007; Moutaftsi et al., 2006; Peters et al., 2006), limiting the

    effort significantly. Most MHCs, however, prefer peptides of

    the length 9, making available binding data of 9mer peptides

    significantly more abundant than data for other lengths such as8, 10 and 11mers as these more rarely binds to the MHCs. Since

    the amount of available data is crucial for the developing of

    accurate predictions (Yu et al., 2002), the number of accurate

    predictors of these other lengths is limited. Ligand motifs have

    been elucidated for several MHCs and groups of MHCs (Lund

    et al., 2004; Rammensee et al., 1999; Sette and Sidney, 1998).

    According to these motifs, the most important (anchor) residues

    are in general the positions 2, 3 and the C-terminal, disregarding

    the peptide length. However, for a limited number of non-

    human MHCs other peptide positions might be the primary

    anchors (see Supplementary Material). Using this knowledge,

    we exploited the possibility of generating pseudo 9mers frompeptides of other length by fixating these positions and inserting

    or deleting residues at other positions. This resulted in a simple

    though remarkably accurate method to overcome the length

    problem using affinity predictions by 9mer predictors of such

    pseudo 9mers. This method will in principle work with any type

    of existing MHC 9mer binding prediction methods.

    2 METHODS

    2.1 9mer predictions

    Here, we used predictions generated by NetMHC-3.0 (http://www.

    cbs.dtu.dk/services/NetMHC-3.0) (Buus et al., 2003; Nielsen et al.,

    2003). However, any 9mer:MHC binding prediction algorithm thataccepts unknown amino acids (i.e. X) can be used.

    2.2 Prediction of 8mer affinities

    In 8mer peptides (e.g. EIGHTMER) an X is inserted repeatedly

    at either position 4, 5, 6, 7 or 8, resulting in five new pseudo

    peptides of length 9; EIGXHTMER, EIGHXTMER, EIGHTXMER,

    EIGHTMXER and EIGHTMEXR (Fig. 1A). The final predicted

    affinity is calculated as the geometrical mean of the five predicted

    affinities in nano Molar units.

    2.3 Prediction of 10 and 11mer affinities

    The longermers (e.g. TENELEVENS) are converted into 9mers by

    deleting 1 (10mers) or 2 (11mers) residues at positions 4, 5, 6, 7, 8 or 9,

    resulting in six new pseudo-peptides; TENLEVENS, TENEEVENS,TENELVENS, TENELEENS, TENELEVNS and TENELEVES

    (Fig. 1B). The final predicted affinity is calculated as the geometrical

    mean of the six predicted affinities in nano Molar units (Fig. 1C).

    2.4 Evaluation

    The method was evaluated using peptide IC50 and Kd affinity data

    extracted from the web site of the Immune Epitope Database

    and Analysis resource (IEDB) (Sette et al., 2005). This resulted

    in the 8mer, 9mer and 10mer evaluation data available at*To whom correspondence should be addressed.

    The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 1397

    http://www.cbs.dtu.dk/services/http://www.cbs.dtu.dk/services/http://www/http://www/http://www.cbs.dtu.dk/services/http://www.cbs.dtu.dk/services/
  • 7/27/2019 Accurate Approximation Method for Prediction of Class I MHC Affinities for Peptides - Lundegaard, Lund & Nielsen

    2/2

    http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xls, http://

    www.cbs.dtu.dk/services/NetMHC-3.0/evalset_10mers_all.xls and http://

    www.cbs.dtu.dk/services/NetMHC-3.0/evalset_11mers.xls, respectively.

    8mer data: 1975 measurements distributed on 35 MHC alleles. 10mer

    data: 13 507 measurements distributed on 31 MHC alleles. 11mer data:

    181 measurements, distributed on 25 MHC alleles. We evaluated the

    accuracy by Pearson correlation coefficients (PCC), and area under

    receiver operating characteristic (ROC) curves (AUC) using a binding

    cutoff of 500 nM.

    3 RESULTS

    We predicted affinities for 8mer, 10mer and 11mer data using

    the approximation method described in Methods. The overall

    PCCs for all predictions within each dataset were 0.69, 0.73 and

    0.74, respectively. The overall AUCs were 0.86, 0.87 and 0.89,

    respectively. To calculate an AUC value, we needed both

    negative (IC50 or Kd500 nM) and positive (IC50 or Kd 5

    500 nM) affinity data, thus we removed alleles having only

    binders or non-binders from the evaluation. This lead to 27

    compared alleles on 8mer peptides. The resulting PCC values

    had a mean of 0.72. Acceptable AUC values (above 0.7) were

    obtained for 25 of the 27 covered alleles (Table 1). To evaluate

    the 10mer approximation, we calculated PCC and AUC values

    for 27 alleles. Using a 500 nM threshold, 26 of 27 alleles had

    AUCs above 0.7 (Table 1).

    To compare the approximation method with specifically

    trained methods, we used artificial neural networks (ANNs)

    previously trained as described in (Nielsen et al., 2003) on

    10mer data. For 10mers, 2037 new data points covering 16

    alleles had become available since training of 10mer specific

    ANNs, available at http://www.cbs.dtu.dk/services/NetMHC-

    3.0/evalset_10mers.xls. AUC values were calculated for eachallele using either ANNs trained on 10mers or the approxima-

    tion method described here (Supplementary Fig. 1). For 12 of

    the 16 alleles the approximation method performed better than

    the 10mer trained ANNs (P50.01).

    For the currently small number of alleles for which the

    primary peptide anchor position(s) are in positions 48 the

    approximation method will not work well. Examples of such

    alleles and how to identify these are described in the

    Supplementary Material.

    Conflict of Interest: none declared.

    REFERENCES

    Buus,S. et al. (2003) Sensitive quantitative predictions of peptide-MHC binding

    by a Query by Committee artificial neural network approach. Tissue

    Antigens, 62, 378384.

    Lund,O. et al. (2004) Definition of supertypes for HLA molecules using clustering

    of specificity matrices. Immunogenetics, 55, 797810.

    Lundegaard,C. et al. (2007) Modeling the adaptive immune system: predictions

    and simulations. Bioinformatics, 23, 32653275.

    Moutaftsi,M. et al. (2006) A consensus epitope prediction approach identifies the

    breadth of murine T(CD8)-cell responses to vaccinia virus. Nat. Biotechnol.,

    24, 817819.

    Nielsen,M. et al. (2003) Reliable prediction of T-cell epitopes using neural

    networks with novel sequence representations. Protein Sci., 12, 10071017.

    Peters,B. et al. (2006) A community resource benchmarking predictions of peptide

    binding to MHC-I molecules. PLoS Comput. Biol., 2, e65.

    Rammensee,H. et al. (1999) SYFPEITHI: database for MHC ligands and peptide

    motifs. Immunogenetics, 50, 213219.

    Sette,A. et al. (2005) A roadmap for the immunomics of category A-C pathogens.

    Immunity, 22, 155161.

    Sette,A. and Sidney,J. (1998) HLA supertypes and supermotifs: a functional

    perspective on HLA polymorphism. Curr. Opin. Immunol., 10, 478482.

    Yu,K. et al. (2002) Methods for prediction of peptide binding to MHC molecules:

    a comparative study. Mol. Med., 8, 137148.

    Table 1. PCC and AUC values of predicted 8mer and 10mer affinities

    using the approximation method

    The total number of 8mer and 10mer data, is given as #8mer data and #10mer

    data, respectively. The number of binding peptides ( Kd5500nM) in the 8mer and

    10mer datasets are given as #8mer binders and #10mer binders, respectively. NA:

    Value not calculated due to lack of data.

    Fig. 1. (A) Illustration of one 8mer to five 9mer conversion.

    (B) Illustration of one 10mer to six 9mer conversion. (C) Calculation

    of six pseudo 9mer predictions back to a 10mer prediction.

    1398

    C.Lundegaard et al.

    http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xlshttp://www.cbs.dtu.dk/services/NetMHC-http://www.cbs.dtu.dk/services/NetMHC-http://www.cbs.dtu.dk/services/NetMHC-3.0/evalset_8mers.xls