Training and applying hidden Markov models and support vector machines for prediction of T-cell epitopes
Van Hai Van, Cao Thi Ngoc Phuong, Tran Linh Thuoc
Faculty of Biology, University of Natural Sciences, VNU-HCMC, Vietnam
Sixth International Conference on Bioinformatics
InCoB2007
Epitope prediction
“Epitope is the portion of an antigen that is recognized by the antigen receptor on lymphocytes”
Molecular Biology
Epitope prediction:
Computational methods aid the development of epitope-based vaccines against various human pathogens for which no vaccines currently exist
http://www.scripps.edu/newsandviews/e_20050228/hiv.html
T-cell epitope prediction
• T-cell epitopes are a subset of MHC-binding peptides, so predicting which peptides bind to MHC is essential for the design of peptide-based vaccines
• HLA-A0201
Sequence
Binding motifs
Quantitative matrices
Decision tree
Artificial neural networks
Hidden Markov models
Support vector machines
HMMs & SVMs
HMMs (Hidden Markov Models): statistical models that can capture complex relationships in data sets.
SVMs (Support Vector Machines): learning machines that can find the optimal separating hyperplane.
Epitope prediction for dengue virus
Tropical disease
• Dengue fever
• Dengue hemorrhagic fever
• Dengue shock syndrome
Hypothesis of pathogenesis
• Antibody-dependent enhancement
• Virus virulence
No dengue vaccine is available
In our research:
• Develop a procedure for automatically building T-cell epitope prediction models
• Find candidates in silico for multivalent vaccines against the 4 types of dengue virus
Building models for predicting T-cell epitopes & applying these models on dengue virus
Building effective prediction models?
The predictive ability of HMM and SVM models depends on:
• The experimentally determined peptides that bind to MHC molecules
• The partition of the peptides into a training set and a testing set
• The encoding method
The system should easily and quickly find the best prediction model when the type of MHC molecule or the quantity of binding peptides changes
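The encoding method determines how a peptide is turned into numbers for the SVM. As an illustration, here is a minimal sketch of binary encoding, one of the encodings used in the experiments, assuming the common scheme in which each residue becomes a 20-position 0/1 vector. The slides do not specify the exact scheme, so the function name `binary_encode` and the residue order are illustrative assumptions.

```python
# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(peptide):
    """Return a flat 0/1 vector with 20 positions per residue."""
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a non-standard residue.
        one_hot[AMINO_ACIDS.index(residue)] = 1
        vector.extend(one_hot)
    return vector


encoded = binary_encode("KTVWFVPSI")
print(len(encoded), sum(encoded))  # 180 9
```

A 9-mer therefore becomes a 180-dimensional vector with exactly nine ones. A Blosum-62 encoding would replace each 0/1 block with the residue's row of the substitution matrix.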
Experiment 1
Databases: MHCBN, MHCPEP
Homology: 7-amino acid
No. homologous groups: binding seq.: 11, non-binding seq.: 3

Method                        HMMs                        SVMs
Kind of peptide               Binding     Non-binding     Binding     Non-binding
No. peptides, training set    623         -               25          20
No. peptides, testing set     80          30              678         30
Training times                200                         200
Parameters                    E-value = 0 ÷ 10            Linear kernel, c = 0
                                                          Encoding: binary, Blosum-62, physical-chemical method
Result of the training by HMMs
HMM.7.136: AROC=0.914
Parameters chosen from HMM.7.136:
At point E=3.4, S=-8.5: SE=0.91, SP=0.86, AROC=0.885
[Figure: AROC (axis 0 to 1) versus E-value (axis 0 to 9.6)]
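The SE (sensitivity), SP (specificity), and AROC (area under the ROC curve) values reported in these results can be computed from model scores on the testing set. The following is a minimal sketch, not the authors' implementation. It assumes that a higher score means "more likely to bind" and that labels are 1 for binding and 0 for non-binding; the function names and toy data are illustrative.

```python
def sensitivity_specificity(scores, labels, threshold):
    """SE = TP / (TP + FN), SP = TN / (TN + FP) at a given score threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def aroc(scores, labels):
    """Area under the ROC curve as the fraction of (binder, non-binder)
    pairs in which the binder scores higher; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three binders followed by three non-binders.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
se, sp = sensitivity_specificity(scores, labels, threshold=0.5)
print(round(se, 3), round(sp, 3), round(aroc(scores, labels), 3))
```

SE and SP depend on the chosen cut-off (such as the score S quoted in the slides), whereas AROC summarizes performance over all cut-offs.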
Result of the training by SVMs
Binary encoding: AROC=0.42÷0.77
Blosum-62 encoding: AROC=0.47÷0.87
Physical-chemical encoding: AROC=0.41÷0.71
With Blosum-62 encoding, data set SVM.7.blo62.46:
SE=0.83, SP=0.90, AROC=0.87
Experiment 2
Databases: MHCBN, MHCPEP, IEDB
Homology: 7-amino acid, 6-amino acid, 5-amino acid

Method            HMMs                  SVMs
Training times    200                   100
Parameters        E-value = 40 ÷ 80     Linear kernel, c = 0
                                        Encoding: binary, Blosum-62, binary-Blosum-62 method
Result of the training by HMMs
Homology                              5-amino acid    6-amino acid    7-amino acid
Kind of peptide                       Binding         Binding         Binding
No. homologous groups                 82              139             84
No. sequences in homologous groups    1232            551             374
Total peptides, training set          1189            1165            1188
Total peptides, testing set           632             656             633
AROC                                  0.832÷0.877     0.835÷0.883     0.828÷0.876
The best HMM profile: HMM.6.78
Training in 6-amino acid homologous groups
[Figure: AROC values (axis 0.8 to 0.9) versus the training time (1 to 191)]
Parameters of HMM.6.78:
At point: E=42, S=-9.2,
SE=0.91, SP= 0.84, AROC=0.875
HMM.6.78: AROC=0.883
[Figure: AROC values (axis 0.8 to 0.9) versus E value (40 to 80)]
Result of the training by SVMs
Homology                               5-amino acid              6-amino acid              7-amino acid
Kind of peptide                        Binding   Non-binding     Binding   Non-binding     Binding   Non-binding
Total homologous groups                82        176             139       45              84        21
Sequences in homologous groups         1232      540             551       116             374       60
Total sequences, training set          1189      1282            1165      1365            1188      1367
Total sequences, testing set           632       557             656       474             633       472
AROC, binary encoding (1)              0.847÷0.884               0.845÷0.880               0.838÷0.882
AROC, Blosum-62 encoding (2)           0.843÷0.884               0.846÷0.883               0.838÷0.894
AROC, binary-Blosum-62 encoding (3)    0.849÷0.879               0.847÷0.889               0.850÷0.891
Chosen set: SVM.blo62.7.85
Training in 7-amino acid homologous groups
At SVM.2.7.85:
SE=0.93, SP=0.86, AROC=0.894
[Figure: AROC values (axis 0.8 to 0.9) versus the training time (1 to 96) for binary, Blosum-62, and binary-Blosum-62 encodings]
Epitope predicting procedure for dengue virus
1. Do multiple sequence alignment
2. Extract consensus sequences of at least 9 amino acids
3. Create overlapping 9-mer sequences
4. Predict peptides binding to MHC with the HMM profile or the SVM model
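Step 3 of the procedure above can be sketched as a sliding window over each consensus sequence. This is a minimal illustration, not the authors' code; the function name `overlapping_kmers` and the 0-based start positions are assumptions. The input is one of the epitope sequences listed in the Experiment 1 table.

```python
def overlapping_kmers(sequence, k=9):
    """Return (start, peptide) pairs for every window of length k.

    Start positions are 0-based. A sequence shorter than k yields
    an empty list.
    """
    return [(start, sequence[start:start + k])
            for start in range(len(sequence) - k + 1)]


for start, peptide in overlapping_kmers("LMRRGDLPVWL"):
    print(start, peptide)
# 0 LMRRGDLPV
# 1 MRRGDLPVW
# 2 RRGDLPVWL
```

Each 9-mer would then be scored by the HMM profile or the SVM model in step 4.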
Experiment 1
Proteins (1,2,3,4) Epitope sequences Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS3 LMRRGDLPVWL HMMs, SVMs
763NS5, 764NS5, 515NS5, 765NS5 LMYFHRRDLRL HMMs, SVMs
358NS3, 357NS3, 2HELICc, 357NS3 KTVWFVPSI SVMs
658NS5, 659NS5, 410NS5, 660NS5 AISGDDCVV SVMs
472NS5, 473NS5, 223NS5, 473NS5 AIWYMWLGA SVMs
101E, 99E, 99glycoprot, 99E RGWGNGCGL SVMs
194NS1, 194NS1, 193NS1, 194NS1 VHADMGYWI SVMs
352NS5, 353NS5, 103NS5, 353NS5 RVFKEKVDT SVMs
13NS1, 13NS1, 12NS1, 13NS1 LKCGSGIFV SVMs
26NS1, 26NS1, 25NS1, 26NS1 HTWTEQYKF SVMs
230NS1, 230NS1, 229NS1, 230NS1 TLWSNGVLES SVMs
327NS1, 327NS1, 326NS1, 327NS1 DGCWYGMEIRP SVMs
148NS3, 148NS3, 142Pep_S7, 148NS3 GLYGNGVVT SVMs
256NS3, 255NS3, 67DEXHc, 255NS3 EIVDLMCHA SVMs
297NS3, 296NS3, 108DEXHc, 296NS3 ARGYISTRV SVMs
410NS3, 409NS3, 54HELICc, 409NS3 DISEMGANF SVMs
36NS4B, 35NS4B, 35NS4B, 32NS4B ASAWTLYAV SVMs
118NS4B, 117NS4B, 117NS4B, 114NS4B HYAIIGPGLQA SVMs
142NS4B, 141NS4B, 141NS4B, 138NS4B IMKNPTVDGI SVMs
224NS4B, 223NS4B, 223NS4B, 220NS4B NIFRGSYLAGA SVMs
81NS5, 81NS5, 27FtsJ, 81NS5 GCGRGGWSY SVMs
529NS5, 530NS5, 280NS5, 530NS5 MYADDTAGW SVMs
602NS5, 603NS5, 353NS5, 603NS5 QVGTYGLNT SVMs
606NS5, 607NS5, 357NS5, 607NS5 YGLNTFTNM SVMs
682NS5, 683NS5, 434NS5, 684NS5 DMGKVRKDI SVMs
745NS5, 746NS5, 497NS5, 747NS5 WSLRETACLG SVMs
788NS5, 789NS5, 540NS5, 790NS5 PTSRTTWSI SVMs
Proteins (1,2,3,4) Epitope sequences Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS5 LMRRGDLPV HMMs
763NS5, 764NS5, 515NS5, 765NS5 LMYFHRRDLRL HMMs
358NS3, 357NS3, 2HELICc, 357NS3 KTVWFVPSI HMMs
658NS5, 659NS5, 410NS5, 660NS5 AISGDDCVV HMMs
469NS5, 470NS5, 220NS5, 470NS5 GSRAIWYMWLGAR HMMs
103E, 101E, 101DV3_gp1, 101E WGNGCGLFG SVMs
193NS1, 193NS1, 192NS1, 193NS1 AVHADMGYWIES SVMs
348NS5, 349NS5, 99NS5, 349NS5 FGQQRVFKE SVMs
568NS5, 569NS5, 319NS5, 569NS5 FKLTYQNKV HMMs
Experiment 2
Result of epitope prediction (prediction of peptides binding to HLA-A0201):
Overlapping 9-amino-acid peptides predicted to bind HLA-A0201 molecules are joined
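Joining overlapping predicted 9-mers, as described above, can be sketched as interval merging. This is a hedged illustration only. It assumes predictions are given as 0-based start positions within one sequence and that two windows are joined only when they share at least one residue; the slides do not state the exact rule. The toy sequence and start positions are hypothetical.

```python
def join_overlapping(sequence, starts, k=9):
    """Merge predicted k-mer windows that overlap and return the joined peptides."""
    merged = []  # list of [start, end) regions, end exclusive
    for start in sorted(starts):
        end = start + k
        if merged and start < merged[-1][1]:
            # Window shares residues with the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [sequence[s:e] for s, e in merged]


# Hypothetical example: predicted binders start at positions 0, 2 and 11.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY"
print(join_overlapping(toy_sequence, [0, 2, 11]))
# ['ACDEFGHIKLM', 'NPQRSTVWY']
```

Joined regions longer than 9 residues would correspond to entries such as the 11-residue and 13-residue sequences in the result tables.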
Result of prediction
• The HMM profile is stable, and its predictive ability increases when additional data sets are available.
• The SVM model performs well, but its predictive ability decreases as the amount of training data increases.
Conclusion
• Successfully built a system for training hidden Markov models and support vector machines
• Generating training and testing data by separating the data set into homologous groups gives good results
• Consensus epitopes for the 4 types of dengue virus could be predicted based on data of peptides binding to HLA-A0201
Future plans
• Try other kernels in the SVM method
• Survey other encoding methods for sequences of flexible length
• Survey other methods for classifying MHC data into homologous groups
• Automate the procedure for collecting and updating peptide-MHC binding data from databases