Dat DC 11 HIV Machine Learning


Genetic predictors of antibody-specific neutralization of HIV

Machine learning for HIV vaccine development

The Problem

Identify binding site (epitope) of an antibody that is capable of neutralizing HIV

Image: Wikipedia.com

Image: Burton et al., 2012

The Observation

• Virus strains with variable genetic sequences are neutralized/not neutralized by specific antibodies to varying degrees

The Assumption

• Genetic variation is causative of this observed variation in neutralization

HIV Genetic & Functional Variation

Image: bnaber.org HIV Strain

The Approach

1) Model neutralization/non-neutralization as a function of genetic features (classification)

2) Perform feature selection to identify the most predictive genetic features

3) Plug selected features into secondary predictive model to validate selection

4) Test hypothesis against a) existing literature b) laboratory test methods

Feature Vectorization

• Position/residue pairs
– Ex: 789=K

• Potential N-Linked Glycosylation Sites (PNGS)
– Regex: N[^P][ST]
– Ex: 197=PNGS
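A minimal sketch of how the two feature types above might be extracted, assuming each strain is given as an aligned amino-acid string. The function names `vectorize` and `to_matrix` are hypothetical, and 1-based string positions stand in for whatever reference numbering the original analysis used.

```python
import re

# The slide's regex, wrapped in a lookahead so overlapping sequons
# (e.g. "NNTT") are all reported. Note that [^P] also matches a gap
# character; gaps are not treated specially here (a simplification).
PNGS_PATTERN = re.compile(r"(?=N[^P][ST])")


def vectorize(sequence):
    """Return the set of feature labels for one aligned sequence.

    Positions are 1-based indices into the string (an assumption).
    """
    features = set()
    for i, residue in enumerate(sequence, start=1):
        if residue != "-":  # skip alignment gaps
            features.add(f"{i}={residue}")
    for match in PNGS_PATTERN.finditer(sequence):
        features.add(f"{match.start() + 1}=PNGS")
    return features


def to_matrix(sequences):
    """Build a binary strain-by-feature matrix plus its column labels."""
    per_strain = [vectorize(s) for s in sequences]
    columns = sorted(set().union(*per_strain))
    rows = [[1 if c in feats else 0 for c in columns] for feats in per_strain]
    return columns, rows
```

Each row of the resulting matrix is one strain and each column one binary feature such as `789=K` or `197=PNGS`, which is the shape a classifier would expect.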

Naïve Bayes (… first swing & a miss...)

ROC AUC: 0.887 Log Loss: 3.77
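One plausible reading of a good ROC AUC alongside a poor log loss is that the model ranks strains well but assigns overconfident probabilities. The sketch below, with invented toy scores (not the deck's data), shows that the two metrics can diverge in this way. The helper names `roc_auc` and `log_loss` are illustrative.

```python
import math


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood (natural log), with clipping for stability."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Toy labels and two score sets with the SAME ranking (made up for illustration).
y = [1, 1, 1, 0, 0, 0]
moderate = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
overconfident = [0.999999, 0.99999, 0.000002, 0.99, 0.000001, 0.0000001]

print(roc_auc(y, moderate), roc_auc(y, overconfident))    # identical AUC
print(log_loss(y, moderate), log_loss(y, overconfident))  # second is much larger
```

Because AUC depends only on the ordering of scores, it cannot penalize confident mistakes; log loss does.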

Expected predictive features

Feature Selection

• Trimming data set
• Decision tree
• Random forest
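As a rough illustration of how a decision tree ranks features, the sketch below scores each binary feature by information gain, one common split criterion. The slides do not say which criterion was used, so this is an assumption; the helper names and the toy matrix are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_column, labels):
    """Entropy reduction from splitting the labels on one feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature_column):
        subset = [y for x, y in zip(feature_column, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def rank_features(columns, rows, labels):
    """Return (gain, feature name) pairs, most informative first."""
    scores = []
    for j, name in enumerate(columns):
        column = [row[j] for row in rows]
        scores.append((information_gain(column, labels), name))
    return sorted(scores, reverse=True)


# Toy data: the first feature separates the classes, the second does not.
columns = ["featA", "featB"]
rows = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]
ranked = rank_features(columns, rows, labels)
```

A random forest repeats this kind of scoring over many trees built on resampled data, which tends to give more stable rankings than a single tree.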

Decision tree w/ ROC AUC

Validation with Logistic Regression
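A sketch of the validation idea: refit a second, simpler model on only the selected features and check that it still predicts well. The logistic regression below is a bare-bones gradient-descent version written for illustration, not the implementation used in the deck; the learning rate, epoch count, and toy data are arbitrary assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = sigmoid(z) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(weights, bias, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy data: only the first selected feature carries signal.
rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]
weights, bias = fit_logistic(rows, labels)
predictions = [predict(weights, bias, x) for x in rows]
```

In practice the fit would be scored on held-out strains; the coefficients also give a quick read on the direction of each feature's effect.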

Analysis Across 4 Antibodies

Antibody | Most predictive features supported by literature | Model MCC | Literature MCC (Gnanakaran et al.)
2F5 | 1) 789=K 2) 791=A | 0.83 | 0.81
PG9 | 1) 197=PNGS 198=V vs. 198=1 | 0.53 | 0.43
VRC01 | 1) 561=R 2) 564=G 3) 587=E 4) 359=N | 0.51 | N/A
2G12 | 1) 363=PNGS 2) 408=N, 411=S * 3) 479=PNGS | 0.66 | N/A
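For reference, MCC (Matthews correlation coefficient) summarizes a binary confusion matrix in a single value between -1 and 1. A small sketch of the standard formula follows; the function name `mcc` and the toy labels are illustrative and unrelated to the values reported above.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Toy example: one false negative and one false positive out of six strains.
score = mcc([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

Unlike accuracy, MCC stays informative when neutralized and non-neutralized strains are unevenly represented.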

Future Directions

• More sophisticated feature vectorization
– Chemically similar amino acids
– Pairwise features
– Small chunks of sequence (n-grams)
– Structural modeling

• Better feature selection
– Minimum Redundancy Max Relevance (mRMR)
– Correct for cross-clade correlations

• Regression model
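Two of the vectorization ideas above can be sketched briefly: collapsing residues into chemical classes and emitting short sequence chunks (n-grams). The grouping below is one coarse scheme chosen for illustration (an assumption; several alternatives exist), and the function names are hypothetical.

```python
# One coarse chemical grouping of the 20 amino acids (illustrative only).
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
RESIDUE_CLASS = {aa: name for name, members in GROUPS.items() for aa in members}


def class_features(sequence):
    """Position/class pairs, e.g. '1=positive'; gaps and unknowns are skipped."""
    return {
        f"{i}={RESIDUE_CLASS[r]}"
        for i, r in enumerate(sequence, start=1)
        if r in RESIDUE_CLASS
    }


def ngram_features(sequence, n=3):
    """Position-anchored n-grams, e.g. '1=NKT' for the chunk starting at site 1."""
    return {
        f"{i}={sequence[i - 1:i - 1 + n]}"
        for i in range(1, len(sequence) - n + 2)
    }
```

Class features let the model treat, say, K and R at the same site as equivalent, while n-grams capture short local motifs that single positions miss.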

Sources

• Bnaber database: http://www.bnaber.org/

• Burton, Dennis R., et al. "Broadly neutralizing antibodies present new prospects to counter highly antigenically diverse viruses." Science 337.6091 (2012): 183-186.

• Chuang, Gwo-Yu, et al. "Residue-level prediction of HIV-1 antibody epitopes based on neutralization of diverse viral strains." Journal of virology 87.18 (2013): 10047-10058.

• Gnanakaran, S., et al. "Genetic signatures in the envelope glycoproteins of HIV-1 that associate with broadly neutralizing antibodies." PLoS Comput Biol 6.10 (2010): e1000955.

• Hepler, N. Lance, et al. "IDEPI: Rapid Prediction of HIV-1 Antibody Epitopes and Other Phenotypic Features from Sequence Data Using a Flexible Machine Learning Platform." PLOS Comput Biol 10.9 (2014): e1003842.

• LANL HIV database CATNAP tool: http://www.hiv.lanl.gov/components/sequence/HIV/neutralization/user.comp

• Libbrecht, Maxwell W., and William Stafford Noble. "Machine learning applications in genetics and genomics." Nature Reviews Genetics 16.6 (2015): 321-332.

• Pillai, Satish K., et al. "Semen-specific genetic characteristics of human immunodeficiency virus type 1 env." Journal of virology 79.3 (2005): 1734-1742.

• West, Anthony P., et al. "Computational analysis of anti–HIV-1 antibody neutralization panel data to identify potential functional epitope residues." Proceedings of the National Academy of Sciences 110.26 (2013): 10598-10603.
