Genetic predictors of antibody-specific neutralization of HIV: Machine learning for HIV vaccine development

Dat DC 11 HIV Machine Learning




The Problem

Identify the binding site (epitope) of an antibody that is capable of neutralizing HIV


Image: Wikipedia.com


Image: Burton et al., 2012


The Observation

• Virus strains with variable genetic sequences are neutralized/not neutralized by specific antibodies to varying degrees

The Assumption

• Genetic variation is causative of this observed variation in neutralization


HIV Genetic & Functional Variation

Image: bnaber.org HIV Strain


The Approach

1) Model neutralization/non-neutralization as a function of genetic features (classification)

2) Perform feature selection to identify the most predictive genetic features

3) Plug selected features into secondary predictive model to validate selection

4) Test hypothesis against a) existing literature b) laboratory test methods


Feature Vectorization

• Position/residue pairs
– Ex: 789=K

• Potential N-Linked Glycosylation Sites
– Regex (N[^P][ST])
– Ex: 197=PNGS
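The two feature types above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's actual code: the function names `vectorize` and `to_matrix` are hypothetical, and it assumes an ungapped amino-acid string numbered from 1, whereas a real pipeline would map positions onto a reference numbering after alignment.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTT") are all reported.
PNGS_PATTERN = re.compile(r"(?=N[^P][ST])")


def vectorize(sequence, offset=1):
    """Return the set of binary feature names present in one sequence.

    Assumption: `sequence` is an ungapped amino-acid string and positions
    are simply counted from `offset`.
    """
    features = set()
    # Position/residue pairs, e.g. "789=K".
    for index, residue in enumerate(sequence):
        features.add(f"{index + offset}={residue}")
    # Potential N-linked glycosylation sites, keyed by the position of the N.
    for match in PNGS_PATTERN.finditer(sequence):
        features.add(f"{match.start() + offset}=PNGS")
    return features


def to_matrix(sequences):
    """Turn sequences into a sorted feature list and rows of 0/1 values."""
    per_sequence = [vectorize(s) for s in sequences]
    names = sorted(set().union(*per_sequence))
    rows = [[int(name in feats) for name in names] for feats in per_sequence]
    return names, rows
```

Calling `to_matrix` on a list of strain sequences yields one binary row per strain, which is the shape a classifier such as naive Bayes or a decision tree expects.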


Naïve Bayes (… first swing & a miss...)

ROC AUC: 0.887
Log Loss: 3.77
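A high ROC AUC next to a poor log loss is what one would expect from a model that ranks strains well but reports overconfident probabilities, a tendency often noted for naive Bayes. The slide does not state this diagnosis, so treat it as one plausible reading. The sketch below implements both metrics from scratch on made-up toy data (not the deck's data) to show that the two can diverge.

```python
import math


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def log_loss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Made-up toy data: both score lists rank the examples identically, so they
# share the same AUC, but the second is far more confident on its mistakes.
labels = [1, 1, 1, 0, 0, 0]
moderate = [0.8, 0.7, 0.4, 0.6, 0.3, 0.2]
overconfident = [0.999999, 0.99999, 0.0001, 0.999, 0.00001, 0.000001]

print(roc_auc(labels, moderate), log_loss(labels, moderate))
print(roc_auc(labels, overconfident), log_loss(labels, overconfident))
```

Because AUC depends only on the ordering of scores while log loss penalizes confident errors, the second score list keeps its AUC but pays a much larger log loss.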


Expected predictive features


Feature Selection

• Trimming data set
• Decision tree
• Random forest
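Tree-based selection ranks features by how much a split on them reduces class impurity. The sketch below scores each binary feature by the Gini impurity decrease of a single split, which is what a depth-1 decision tree would evaluate. It is a simplified stand-in for illustration, not the deck's decision tree or random forest, and the feature names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_gain(column, labels):
    """Impurity decrease from splitting on one binary feature column."""
    present = [y for x, y in zip(column, labels) if x == 1]
    absent = [y for x, y in zip(column, labels) if x == 0]
    n = len(labels)
    weighted = (len(present) / n) * gini(present) + (len(absent) / n) * gini(absent)
    return gini(labels) - weighted


def rank_features(names, rows, labels, top=5):
    """Return up to `top` (name, gain) pairs, best split first."""
    scored = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scored.append((name, gini_gain(column, labels)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Hypothetical toy data: the first feature separates the classes perfectly.
names = ["10=K", "11=A", "12=PNGS"]
rows = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0]]
labels = [1, 1, 0, 0]
ranked = rank_features(names, rows, labels)
```

A random forest repeats this kind of scoring over many trees grown on resampled data and averages the impurity decreases, which tends to give more stable rankings than a single tree.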



Feature Selection

Decision tree w/ ROC AUC


Validation with Logistic Regression


Analysis Across 4 Antibodies

Antibody | Most predictive features supported by literature | Model MCC | Literature MCC (Gnanakaran et al., 2010)
2F5 | 1) 789=K 2) 791=A | 0.83 | 0.81
PG9 | 1) 197=PNGS; 198=V vs. 198=1 | 0.53 | 0.43
VRC01 | 1) 561=R 2) 564=G 3) 587=E 4) 359=N | 0.51 | N/A
2G12 | 1) 363=PNGS 2) 408=N, 411=S * 3) 479=PNGS | 0.66 | N/A
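The MCC values above are Matthews correlation coefficients, which summarize a binary confusion matrix as a single number between -1 (total disagreement) and 1 (perfect prediction), with 0 meaning no better than chance. A minimal sketch of the standard formula follows; the function name mirrors a common library convention, but this is a standalone illustration rather than the code used for the table.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary 0/1 labels; returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: undefined cases (e.g. all predictions in one class) map to 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike plain accuracy, MCC stays informative when neutralized and non-neutralized strains are unevenly represented, which is one reason it is convenient for comparing results across antibodies.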


Future Directions

• More sophisticated feature vectorization
– Chemically similar amino acids
– Pairwise features
– Small chunks of sequence (n-grams)
– Structural modeling

• Better feature selection
– Minimum Redundancy Max Relevance (mRMR)
– Correct for cross-clade correlations

• Regression model
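Two of the vectorization ideas above, grouping chemically similar amino acids and using short n-grams, can be sketched as follows. The grouping is purely illustrative: several reduced alphabets exist and the deck does not choose one. The function names and the position-anchored feature format are likewise assumptions.

```python
# Illustrative grouping only; the deck does not specify a reduced alphabet.
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQC",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
RESIDUE_TO_GROUP = {aa: group for group, members in GROUPS.items() for aa in members}


def grouped_features(sequence, offset=1):
    """Position/group pairs, e.g. "3=positive", so K and R at a site share a feature."""
    return {
        f"{i + offset}={RESIDUE_TO_GROUP.get(aa, 'other')}"
        for i, aa in enumerate(sequence)
    }


def ngram_features(sequence, n=3, offset=1):
    """Position-anchored n-grams, e.g. "2:NGT" for the 3-mer starting at position 2."""
    return {
        f"{i + offset}:{sequence[i:i + n]}"
        for i in range(len(sequence) - n + 1)
    }
```

Grouping reduces the number of distinct features per position, while n-grams capture local context that single position/residue pairs miss; both trade sparsity against specificity.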


Sources

• Bnaber database: http://www.bnaber.org/

• Burton, Dennis R., et al. "Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses." Science 337.6091 (2012): 183-186.

• Chuang, Gwo-Yu, et al. "Residue-level prediction of HIV-1 antibody epitopes based on neutralization of diverse viral strains." Journal of virology 87.18 (2013): 10047-10058.

• Gnanakaran, S., et al. "Genetic signatures in the envelope glycoproteins of HIV-1 that associate with broadly neutralizing antibodies." PLoS Comput Biol 6.10 (2010): e1000955.

• Hepler, N. Lance, et al. "IDEPI: Rapid Prediction of HIV-1 Antibody Epitopes and Other Phenotypic Features from Sequence Data Using a Flexible Machine Learning Platform." PLOS Comput Biol 10.9 (2014): e1003842.

• LANL HIV database CATNAP tool: http://www.hiv.lanl.gov/components/sequence/HIV/neutralization/user.comp

• Libbrecht, Maxwell W., and William Stafford Noble. "Machine learning applications in genetics and genomics." Nature Reviews Genetics 16.6 (2015): 321-332.

• Pillai, Satish K., et al. "Semen-specific genetic characteristics of human immunodeficiency virus type 1 env." Journal of virology 79.3 (2005): 1734-1742.

• West, Anthony P., et al. "Computational analysis of anti–HIV-1 antibody neutralization panel data to identify potential functional epitope residues." Proceedings of the National Academy of Sciences 110.26 (2013): 10598-10603.