What can your library do for you? Rajarshi Guha, Dac-Trung Nguyen, Alexey Zhakarov, Ajit Jadhav NIH NCATS ACS Fall Meeting 2016, Philadelphia August 21, 2016

Page 1: What can your library do for you?

What can your library do for you?

Rajarshi Guha, Dac-Trung Nguyen, Alexey Zhakarov, Ajit Jadhav

NIH NCATS

ACS Fall Meeting 2016, Philadelphia

August 21, 2016

Page 2: What can your library do for you?

Library Design

• Historical collections and assay data provide information on how a set of compounds has fared
• Use (dis)similarity and machine learning to construct new collections that show similar behavior
• Plus various constraints
• Libraries can be designed for certain target families or specific screening paradigms

If sufficiently annotated, compound behavior may be correlated to assay and biology characteristics


Page 4: What can your library do for you?

Two Questions

How likely is it that compounds associated with a given annotation are identified as active?

Given a new set of compounds, in what sets of assay conditions (as implied by the annotations) will they be active?

Page 5: What can your library do for you?

BAO 2.0

Page 6: What can your library do for you?

Assay Modeling

Page 7: What can your library do for you?

Prior Work

• BAO annotated datasets
  • de Souza et al, 2014; Vempati et al, 2012
• Analyzing HTS datasets using BAO
  • Zander-Balderud et al, 2015; Schurer et al, 2011
• Semi-automated annotation of assay descriptions using the BAO
  • Clark et al, 2014

Page 8: What can your library do for you?

Workflow

• Extract the unique BAO terms and, for each term, identify the assays annotated with it
• Extract the active compounds from this set of assays
• Compute the fingerprint bit distribution of those actives
• Use these conditional bit distributions to identify the BAO terms that describe the assays in which a compound is likely to be active
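The first three workflow steps can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `term_bit_frequencies`, the dictionary inputs, and the representation of fingerprints as lists of 0/1 values are all assumptions made for the example.

```python
from collections import defaultdict

def term_bit_frequencies(assay_terms, assay_actives, n_bits):
    """Per-term fraction of active compounds that have each fingerprint bit set.

    assay_terms:   dict mapping assay id -> set of BAO term ids (hypothetical input)
    assay_actives: dict mapping assay id -> list of fingerprints (lists of 0/1)
    A compound active in several assays of the same term is counted once per assay.
    """
    counts = defaultdict(lambda: [0] * n_bits)
    totals = defaultdict(int)
    for assay_id, terms in assay_terms.items():
        for fp in assay_actives.get(assay_id, []):
            for term in terms:
                totals[term] += 1
                row = counts[term]
                for i, bit in enumerate(fp):
                    row[i] += bit
    # Convert raw counts into conditional bit frequencies per term.
    return {term: [c / totals[term] for c in counts[term]] for term in counts}

# Toy usage with 3-bit fingerprints (the deck uses 881-bit PubChem keys).
assay_terms = {"A1": {"T1"}, "A2": {"T1", "T2"}}
assay_actives = {"A1": [[1, 0, 1]], "A2": [[1, 1, 0], [0, 1, 0]]}
freqs = term_bit_frequencies(assay_terms, assay_actives, n_bits=3)
```

Terms whose assays have no actives simply do not appear in the result, which keeps the later scoring step from dividing by zero.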

Page 9: What can your library do for you?

Dataset Overview

• Extracted 4010 PubChem AIDs from BARD
• Primary, confirmation, counterscreening assays
• 154M outcomes
• 740K compounds
• PubChem 881-bit keys computed using the CDK and NCGC implementations

I 192 unique BAO terms

Page 10: What can your library do for you?

Dataset Overview

[Figure, four panels: (1) number of outcomes (log scale, 1e+02 to 1e+08) by outcome type: Active, Inactive, Inconclusive, Probe, Unspecified; (2) density of assays vs. log number of compounds, by outcome (Active, Inactive); (3) number of terms vs. term depth (2 to 13); (4) number of assays vs. number of BAO terms per assay (1 to 13).]

Page 11: What can your library do for you?

Class Imbalances

Imbalanced classes are problematic, and some of the terms with near-balanced classes are not very specific (e.g., imaging method)

[Figure: class ratio (0 to 9) vs. BAO ID. Labeled terms: VICTOR X2 Multilabel Plate Reader, imaging method, radiometry method, SpectraMax 190 Microplate Reader, reporter protein, thermal shift.]

Page 12: What can your library do for you?

Problem formulation

For a given library of compounds $X$, we would like to calculate a ranked list $T$ of relevant BAO terms that are most likely associated with $X$. Let $x_j \in X$ be a compound and $t_i$ a BAO term. The list $T$ is ordered according to

$$\arg\max_i \sum_j p(t_i \mid x_j), \tag{1}$$

where $p(t_i \mid x_j)$ is the probability that BAO term $t_i$ is associated with compound $x_j$. From Bayes' rule, we have

$$p(t_i \mid x_j) = \frac{p(x_j \mid t_i)\, p(t_i)}{p(x_j)} \quad \text{or} \quad p(t_i \mid x_j) \propto p(x_j \mid t_i)\, p(t_i).$$

Given that BAO terms are annotated at the assay level, we instead have

$$p(t_i \mid x_j) \propto p(t_i) \sum_k p(x_j \mid a_k)\, p(a_k \mid t_i), \tag{2}$$

where $a_k$ is a BAO-annotated assay.
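The step from Bayes' rule to Eq. (2) is a marginalization over assays. One way to spell it out, under the assumption (not stated on the slide) that a compound's fingerprint is conditionally independent of the term once the assay is known:

```latex
\begin{align*}
p(x_j \mid t_i)
  &= \sum_k p(x_j, a_k \mid t_i) \\
  &= \sum_k p(x_j \mid a_k, t_i)\, p(a_k \mid t_i) \\
  &\approx \sum_k p(x_j \mid a_k)\, p(a_k \mid t_i)
  && \text{(assumed: } x_j \perp t_i \mid a_k\text{)} \\
\Rightarrow\quad
p(t_i \mid x_j)
  &\propto p(t_i) \sum_k p(x_j \mid a_k)\, p(a_k \mid t_i).
\end{align*}
```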

Page 13: What can your library do for you?

A Bayesian Approach for Ranking

Note that $p(x_j \mid a_k)$ is the sampling function specified over only the active compounds in assay $a_k$. In our model, the bits of $x_j$ are modeled as independent Bernoulli variables with parameter $\theta$, i.e.,

$$p(x_j \mid a_k) = \prod_i \theta_i^{x_{ji}} (1 - \theta_i)^{1 - x_{ji}},$$

where $x_{ji} \in \{0, 1\}$ is the $i$-th bit of the PubChem substructural fingerprint.

Learning BAO terms for a library of compounds amounts to estimating $\theta$, $p(t_i)$, and $p(a_k \mid t_i)$.
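A minimal sketch of how these three quantities could be estimated and combined into the ranking of Eq. (1) is given below. Several choices here are assumptions made for illustration and are not taken from the slides: Laplace smoothing with `alpha`, a term prior proportional to the number of annotated assays, a uniform $p(a_k \mid t_i)$ over a term's assays, and normalizing over terms as if they were mutually exclusive. All function names are hypothetical.

```python
import math

def estimate_theta(active_fps, n_bits, alpha=1.0):
    """Per-bit Bernoulli parameters from one assay's active fingerprints.
    alpha is an assumed Laplace smoothing constant keeping theta off 0 and 1."""
    n = len(active_fps)
    return [(sum(fp[i] for fp in active_fps) + alpha) / (n + 2.0 * alpha)
            for i in range(n_bits)]

def log_bernoulli(fp, theta):
    """log p(x | a) = sum_i [x_i log(theta_i) + (1 - x_i) log(1 - theta_i)]."""
    return sum(math.log(t) if bit else math.log(1.0 - t)
               for bit, t in zip(fp, theta))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def term_posteriors(fp, assays_by_term, theta_by_assay):
    """p(t | x) for every term, following Eq. (2) and normalizing over terms.
    Assumed: p(t) proportional to the term's assay count, p(a | t) uniform."""
    n_total = sum(len(assays) for assays in assays_by_term.values())
    log_scores = {}
    for term, assays in assays_by_term.items():
        log_prior = math.log(len(assays) / n_total)
        log_w = -math.log(len(assays))
        log_scores[term] = log_prior + logsumexp(
            [log_bernoulli(fp, theta_by_assay[a]) + log_w for a in assays])
    norm = logsumexp(list(log_scores.values()))
    return {term: math.exp(s - norm) for term, s in log_scores.items()}

def rank_terms(library, assays_by_term, theta_by_assay, k):
    """Top-k terms by the sum over compounds of p(t | x), as in Eq. (1)."""
    totals = {term: 0.0 for term in assays_by_term}
    for fp in library:
        for term, p in term_posteriors(fp, assays_by_term, theta_by_assay).items():
            totals[term] += p
    return sorted(totals, key=totals.get, reverse=True)[:k]
```

Working in log space matters here: with 881 bits, the raw product of Bernoulli terms would underflow long before the scores could be compared.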

Page 14: What can your library do for you?

Per-Term Activity Classifier

• For a given ontology term $T_i$, predict whether a compound will be active or not
• Model this using Naïve Bayes, where we extract the sets of actives and inactives from assays annotated with $T_i$
• Results in a set of models $\{M_1, M_2, \cdots, M_N\}$
• For a new compound in the library, obtain the probability of being active for term $T_i$ for all $i$ and take the top $k$ terms
• Aggregate the top $k$ terms from all compounds in the library
  • Represents the set of ontology terms defining an assay in which these compounds would likely be active

Page 15: What can your library do for you?

Test Libraries

• Considered several libraries to test out the approach
• MIPE (1912 compounds): approved and investigational drugs, constructed for functional diversity
• LOPAC (1280 compounds): diverse library, designed for enrichment of bioactivity
• Natural Products (5000 compounds)
• 1000-member subset of the ChEMBL GPCR collection
• 1000-member subset of the ChEMBL Kinase collection

Page 16: What can your library do for you?

Test Libraries

In the PubChem fingerprint space, the libraries are not very different

[Figure: normalized frequency (0.00 to 1.00) vs. bit position, one panel per library: gpcr1, gpcr2, kinase1, kinase2, lopac, mipe, np.]

Page 17: What can your library do for you?

Test Libraries - Distance Matrix

[Figure: heat map of pairwise Euclidean distance (Euc. Dist., scale 0 to 4) between the libraries lopac, mipe, np, kinase1, kinase2, gpcr1, gpcr2.]
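A distance matrix of this kind can be sketched as below. It assumes that each library is summarized by its per-bit frequency vector (the fraction of compounds with each bit set) and that the distance is the Euclidean distance between those vectors; the slides do not spell out the exact construction, and the function names are hypothetical.

```python
import math

def bit_frequencies(fps):
    """Fraction of compounds in a library that have each fingerprint bit set."""
    n = len(fps)
    return [sum(column) / n for column in zip(*fps)]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(libraries):
    """Pairwise distances between libraries, keyed by (name_a, name_b)."""
    freqs = {name: bit_frequencies(fps) for name, fps in libraries.items()}
    return {(a, b): euclidean(freqs[a], freqs[b])
            for a in freqs for b in freqs}

# Toy usage with 2-bit fingerprints standing in for the 881-bit keys.
toy = {"lib_a": [[1, 0], [1, 0]], "lib_b": [[0, 0], [0, 1]]}
dist = distance_matrix(toy)
```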

Page 18: What can your library do for you?

Prediction Workflow

Bayesian Ranking

• Compute the likelihood of all terms for each compound
• Aggregate across the library (mean likelihood) and take the top k

Activity Models

• For molecules predicted active, collect the corresponding terms
• Retain the top k most frequent terms across the library

We take the top k terms as the set of annotations describing an assay in which the library will show activity
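The two aggregation rules differ only in how per-compound scores are pooled. A small sketch, assuming the per-compound scores are already available as dictionaries keyed by term id; the function names and the 0.5 activity threshold are assumptions, not values from the slides.

```python
from collections import Counter

def top_k_by_mean(scores_by_compound, k):
    """Bayesian ranking: average each term's score over the library, keep top k.
    scores_by_compound is a list of dicts mapping term id -> (log-)likelihood."""
    terms = scores_by_compound[0].keys()
    n = len(scores_by_compound)
    mean = {t: sum(s[t] for s in scores_by_compound) / n for t in terms}
    return sorted(mean, key=mean.get, reverse=True)[:k]

def top_k_by_frequency(prob_active_by_compound, k, threshold=0.5):
    """Activity models: count the terms for which each compound is predicted
    active (probability at or above the assumed threshold), keep the k most frequent."""
    counts = Counter()
    for probs in prob_active_by_compound:
        counts.update(t for t, p in probs.items() if p >= threshold)
    return [t for t, _ in counts.most_common(k)]
```

The mean-based rule uses every compound's score for every term, while the frequency-based rule discards everything below the threshold, so the two can disagree on the same library.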

Page 19: What can your library do for you?

Result - Bayesian Ranking

[Figure: heat map of average probability (Avg Prob, color scale from −5e+09 to −1e+09) for each BAO term (rows, BAO_0000001 through BAO_0003069) across the libraries lopac, mipe, and NP (columns).]

Page 20: What can your library do for you?

Result - Per Term Activity Classifier

[Figure: rank (1 to 5) of the top terms for each library (lopac, mipe, np, kinase1, kinase2, gpcr1, gpcr2). Terms shown: bioluminescence; molecular redistribution determination method; direct enzyme activity measurement method; whole cell lysate format; BAO_0000722.]

Page 21: What can your library do for you?

What’s Different Between Libraries?


Page 23: What can your library do for you?

Term Depth for the 'Differential' Terms

[Figure, two scatter plots of term depth (3 to 8) for the differential terms: Term Depth (MIPE) vs. Term Depth (LOPAC), and Term Depth (MIPE) vs. Term Depth (NP).]

Page 24: What can your library do for you?

Pitfalls

If sufficiently annotated, compound behavior may be correlated to assay and biology characteristics

• A very abstract, possibly lossy, view of the effect of compounds on biology
• Depends on correct and meaningful annotations
• Annotation terms are context dependent, but this may not be considered when annotating a dataset
• BAO terms exhibit hierarchical relationships, and ignoring them is simplistic

Page 25: What can your library do for you?

Acknowledgements

• Qiong Cheng (U. Miami)
• Stephan Schurer (U. Miami)

Source code and slides