stephen-blaskowski
Genetic predictors of antibody-specific neutralization of HIV
Machine learning for HIV vaccine development
The Problem
Identify binding site (epitope) of an antibody that is capable of neutralizing HIV
Image: Wikipedia.com
Image: Burton et al., 2012
The Observation
• Virus strains with variable genetic sequences are neutralized/not neutralized by specific antibodies to varying degrees
The Assumption
• Genetic variation is causative of this observed variation in neutralization
HIV Genetic & Functional Variation
Image: bnaber.org HIV Strain
The Approach
1) Model neutralization/non-neutralization as a function of genetic features (classification)
2) Perform feature selection to identify the most predictive genetic features
3) Plug selected features into secondary predictive model to validate selection
4) Test hypothesis against a) existing literature b) laboratory test methods
Feature Vectorization
• Position/residue pairs
  – Ex: 789=K
• Potential N-Linked Glycosylation Sites
  – Regex (N[^P][ST])
  – Ex: 197=PNGS
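The two feature types above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's actual code: the function name `vectorize` and the `start` parameter are hypothetical, and it assumes an ungapped amino-acid sequence whose first character sits at reference position `start`. The regex is the one given on the slide, wrapped in a lookahead so that overlapping sites are each reported.

```python
import re

# Slide regex N[^P][ST], wrapped in a lookahead so overlapping sites
# (e.g. "NNSS") are each found. The lookahead is an addition, not from the slides.
PNGS_PATTERN = re.compile(r"(?=N[^P][ST])")


def vectorize(sequence, start=1):
    """Return the set of feature strings for one amino-acid sequence.

    `start` is the assumed reference position of the first character;
    the sequence is assumed to contain no alignment gaps.
    """
    features = set()
    # Position/residue pairs, e.g. "789=K".
    for offset, residue in enumerate(sequence):
        features.add(f"{start + offset}={residue}")
    # Potential N-linked glycosylation sites, keyed by the position of the N.
    for match in PNGS_PATTERN.finditer(sequence):
        features.add(f"{start + match.start()}=PNGS")
    return features


if __name__ == "__main__":
    # Toy fragment, not real env sequence data.
    print(sorted(vectorize("ANGTK", start=196)))
```

Representing each strain as a set of feature strings keeps the encoding sparse; a binary feature matrix can be built later by checking membership for each feature of interest.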
Naïve Bayes (… first swing & a miss...)
ROC AUC: 0.887, Log Loss: 3.77
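A high ROC AUC alongside a high log loss is one plausible reading of why this counted as a miss: ROC AUC depends only on how samples are ranked, while log loss heavily penalizes confident wrong probabilities, which naïve Bayes tends to produce when features are correlated. The sketch below computes both metrics from scratch on toy numbers (not the deck's data) to show the effect; the function names are illustrative.

```python
import math


def roc_auc(labels, scores):
    """Chance that a random positive scores above a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


# Toy example: both score lists rank every positive above every negative,
# but the first is extremely confident about one negative sample.
labels = [1, 1, 0, 0]
overconfident = [0.999999, 0.999998, 0.999997, 0.0]
calibrated = [0.9, 0.8, 0.3, 0.1]

if __name__ == "__main__":
    for name, probs in [("overconfident", overconfident), ("calibrated", calibrated)]:
        print(name, roc_auc(labels, probs), log_loss(labels, probs))
```

Both score lists give the same ROC AUC, yet the overconfident one has a far larger log loss, so ranking quality alone does not show whether the predicted probabilities can be trusted.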
Expected predictive features
Feature Selection
• Trimming data set
• Decision tree
• Random forest
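As a rough illustration of tree-based selection, the sketch below ranks binary features by the Gini impurity decrease of a single split, which is the quantity a decision tree evaluates when choosing its first split. It is a simplified stand-in, not the deck's implementation: a full decision tree or random forest repeats this recursively (and, for forests, over resampled data). All names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def impurity_decrease(feature_sets, labels, feature):
    """Gini decrease from splitting samples on presence/absence of one feature."""
    present = [y for fs, y in zip(feature_sets, labels) if feature in fs]
    absent = [y for fs, y in zip(feature_sets, labels) if feature not in fs]
    weighted = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
    return gini(labels) - weighted


def rank_features(feature_sets, labels):
    """Return (score, feature) pairs for every observed feature, best first."""
    all_features = set().union(*feature_sets)
    scored = [(impurity_decrease(feature_sets, labels, f), f) for f in all_features]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy data: 1 = neutralized, 0 = not neutralized (labels are made up).
    samples = [
        {"789=K", "100=A"},
        {"789=K", "100=G"},
        {"789=E", "100=A"},
        {"789=E", "100=G"},
    ]
    outcome = [1, 1, 0, 0]
    for score, feature in rank_features(samples, outcome):
        print(f"{feature}: {score:.2f}")
```

Features with zero decrease carry no information about the outcome in this toy set, which is one simple criterion for trimming a feature set before fitting a larger model.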
Decision tree w/ ROC AUC
Validation with Logistic Regression
Analysis Across 4 Antibodies
Antibody | Most predictive features supported by literature | Model MCC | Literature MCC (Gnanakaran et al.)
2F5 | 1) 789=K 2) 791=A | 0.83 | 0.81
PG9 | 1) 197=PNGS 198=V vs. 198=1 | 0.53 | 0.43
VRC01 | 1) 561=R 2) 564=G 3) 587=E 4) 359=N | 0.51 | N/A
2G12 | 1) 363=PNGS 2) 408=N, 411=S * 3) 479=PNGS | 0.66 | N/A
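For reference, MCC here is the Matthews correlation coefficient, which summarizes a binary confusion matrix as a value between -1 and 1 and stays informative when the classes are imbalanced. The sketch below is a from-scratch version on toy labels, not the code used to produce the numbers above; returning 0 when the denominator vanishes is a common convention and an assumption here.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Toy labels: one of the two positives is missed.
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```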
Future Directions
• More sophisticated feature vectorization
  – Chemically similar amino acids
  – Pairwise features
  – Small chunks of sequence (n-grams)
  – Structural modeling
• Better feature selection
  – Minimum Redundancy Max Relevance (mRMR)
  – Correct for cross-clade correlations
• Regression model
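Two of the vectorization ideas above, chemically similar amino acids and n-grams, could look roughly like the following. This is a speculative sketch: the five chemical classes are one common coarse grouping chosen for illustration, and the function names, the `start` parameter, and the ungapped-sequence assumption are all hypothetical.

```python
# One common coarse grouping of the 20 amino acids (an assumption, not from the slides).
CHEMICAL_GROUPS = {
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "polar": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}
RESIDUE_TO_GROUP = {
    residue: group for group, residues in CHEMICAL_GROUPS.items() for residue in residues
}


def group_features(sequence, start=1):
    """Position/chemical-class pairs, e.g. '789=positive' for K, R, or H at 789."""
    return {
        f"{start + i}={RESIDUE_TO_GROUP[residue]}"
        for i, residue in enumerate(sequence)
        if residue in RESIDUE_TO_GROUP
    }


def ngram_features(sequence, n=3, start=1):
    """Position-anchored chunks of n consecutive residues, e.g. '197=NGT'."""
    return {f"{start + i}={sequence[i:i + n]}" for i in range(len(sequence) - n + 1)}


if __name__ == "__main__":
    # Toy fragments, not real env sequence data.
    print(sorted(group_features("KR", start=789)))
    print(sorted(ngram_features("NGTK", n=3, start=197)))
```

Grouping residues shrinks the feature space and lets the model pool evidence across substitutions that may be functionally equivalent, at the cost of hiding differences within a class.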
Sources
• Bnaber database: http://www.bnaber.org/
• Burton, Dennis R., et al. "Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses." Science 337.6091 (2012): 183-186.
• Chuang, Gwo-Yu, et al. "Residue-level prediction of HIV-1 antibody epitopes based on neutralization of diverse viral strains." Journal of virology 87.18 (2013): 10047-10058.
• Gnanakaran, S., et al. "Genetic signatures in the envelope glycoproteins of HIV-1 that associate with broadly neutralizing antibodies." PLoS Comput Biol 6.10 (2010): e1000955.
• Hepler, N. Lance, et al. "IDEPI: Rapid Prediction of HIV-1 Antibody Epitopes and Other Phenotypic Features from Sequence Data Using a Flexible Machine Learning Platform." PLOS Comput Biol 10.9 (2014): e1003842.
• LANL HIV database CATNAP tool: http://www.hiv.lanl.gov/components/sequence/HIV/neutralization/user.comp
• Libbrecht, Maxwell W., and William Stafford Noble. "Machine learning applications in genetics and genomics." Nature Reviews Genetics 16.6 (2015): 321-332.
• Pillai, Satish K., et al. "Semen-specific genetic characteristics of human immunodeficiency virus type 1 env." Journal of virology 79.3 (2005): 1734-1742.
• West, Anthony P., et al. "Computational analysis of anti–HIV-1 antibody neutralization panel data to identify potential functional epitope residues." Proceedings of the National Academy of Sciences 110.26 (2013): 10598-10603.