UNMCMD (University of New Mexico Center for Molecular Discovery)

Biomolecular Screening Informatics

Chem-, Bio- and Screening Informatics

Jeremy J Yang
PhD Student, IU Cheminformatics
Mgr, Systems & Programming, UNM Biocomputing

Indiana University School of Informatics and Computing - I571, Intro to Cheminformatics - Fall 2011



Abstract

Biomolecular screening is key to drug and probe discovery and requires strong informatics, particularly for high-throughput screening (HTS). Screening informatics involves cheminformatics, bioinformatics, and additional specialized functionality. In this talk we discuss the UNM Center for Molecular Discovery screening informatics system from a cheminformatics viewpoint, while also noting that its primary functions and context are biological, not chemical, and some consequences of this central fact.

Definition: Screening Informatics

• Informatics in support of screening for biomolecular discovery, usually pharma discovery.
• Acquisition, processing and storage of bioassay data for use during projects and for retrospective analyses.
• Searching over molecules, assays, activities, targets, etc. I/O & integration in conformance with contractual, legal/regulatory, business, and scientific requirements.
• Applications and interfaces suited to trans-disciplinary audience (biology, chemistry, etc.).

Why do we care about Screening Informatics?

One of the primary motivations for cheminformatics has been drug discovery and biomedical research. This motivation includes scientific and economic components. (E.g. there are careers and rewards for biomedical advances.)

More on role of cheminformatics

“…strengthened ties to related fields… and we need to find our place in relation to our sister field of bioinformatics.”
DJ Wild, 2009

Proposed analogy:

Cheminformatics : Bioinformatics ~ Math : Science

i.e., cheminformatics provides rigor, quantification and formalism.
e.g., UCSF SEA: cheminfo probe/metric of bioinfo.

Outline

1. Intro to biomolecular screening
2. NIH Roadmap Molecular Libraries Program
3. UNM Center for Molecular Discovery
4. Intro to flow cytometry at UNMCMD
5. Evolution and development of a screening informatics system
6. Current system technical description

It starts with a biological question, hypothesis, and experimental design

• A biological assay is an experiment.
• A substance is tested for interaction with a biological entity: organism, tissue, cell, or molecule.
• Substance may be a known compound, mixture, or unknown.
• High Throughput Screening (HTS) normally means automated, robotic, many substances per experiment and per day.

KEY POINT: Substance may be a known compound, mixture, or unknown...

SID:4253253, CID:252101

… and “known” structures may be corrected or changed.
… and purity can be an issue – “known” is relative.

http://www.ted.com/talks/bonnie_bassler_on_how_bacteria_communicate.html
An incredible story and perspective; see also NIH Microbiome Project.

Example: Bacterial quorum sensing assay

Bioassay Design

• “Endpoints” measured/inferred in many ways
• Fluorescence and fluorophores often used to indicate an endpoint such as protein-ligand binding, presence or absence of a metabolic process/protein
• Experimental “artifacts” are produced by the method, not the system under study, and can lead to false conclusions (e.g. fluorescent compounds, aggregators)
• Bioassay design a huge challenge in itself (1 compound, 1 test, 1 time)


Slide from talk: “Cheminformatics and the evolving relationship between data in the public domain & pharma”, by Dana Vanderwall, BMS, 2010, http://www.scivee.tv/node/26112

Slide c/o Tudor Oprea, UNM Division of Biocomputing

For history, see http://www.cyto.purdue.edu/cdroms/cyto10a/researchcenters/unm.html

Slide c/o Tudor Oprea, UNM Division of Biocomputing

Intro to Flow Cytometry at UNMCMD

Multiplexed!

Flow Cytometry Shifting Gears, Genetic Eng & Biotech News, Nov 15, 2011 (Vol. 31, No. 20) , http://www.genengnews.com/gen-articles/flow-cytometry-shifting-gears/3913.

Introduce and Manipulate Gates

c/o Anna Waller, UNMCMD HyperViewSession_20110603

HyperView: app to view, analyze, process raw HyperCyt fluorescence data

Binning data: correlating data with wells

c/o Anna Waller, UNMCMD HyperViewSession_20110603
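The binning step ("correlating data with wells") can be illustrated with a minimal sketch. It assumes that events arrive as a single time-ordered stream in which the samples from successive wells are separated by quiet gaps, and that the plate was sampled in a known well order. The function names, the gap threshold, and the example data are all hypothetical; this is not the HyperView algorithm.

```python
def bin_events_by_gap(times, min_gap):
    """Split event times into clusters wherever two consecutive
    events are separated by more than min_gap (same time units)."""
    clusters = []
    current = []
    for t in sorted(times):
        if current and t - current[-1] > min_gap:
            clusters.append(current)  # a long pause closes the current cluster
            current = []
        current.append(t)
    if current:
        clusters.append(current)
    return clusters


def assign_wells(clusters, well_order):
    """Pair each time cluster with a well, in sampling order.
    A count mismatch is reported rather than silently guessed at."""
    if len(clusters) != len(well_order):
        raise ValueError(
            f"{len(clusters)} clusters found for {len(well_order)} wells"
        )
    return dict(zip(well_order, clusters))


# Hypothetical event times (seconds) for three wells.
times = [0.1, 0.2, 0.3, 1.5, 1.6, 3.0, 3.1, 3.2]
clusters = bin_events_by_gap(times, min_gap=0.5)
by_well = assign_wells(clusters, ["A01", "A02", "A03"])
for well, events in by_well.items():
    print(well, len(events))
```

Raising an error on a cluster/well count mismatch is a deliberate choice in this sketch: an empty well or a merged pair of samples would otherwise shift every later assignment by one well.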

Evolution and development of a screening informatics system

Dates            Milestones/phases
2004 - 2008      MLSCN pilot phase
2004 - 2006      ActivityBase (IDBS) used somewhat
2006 - 2008      RoadRunner, in-house system developed and used
2008 - 2014      MLPCN production phase
2008             Evaluation of Symyx and Accelrys systems, selection of Accelrys
2009             Installation, customization and configuration of AEI/PP system (~50 telecons)
2009 - present   Accelrys Accord Enterprise (AEI), Pipeline Pilot
2010 - present   In-house tools developed to work with AEI & PP
2006 - present   Excel and CSV files used a lot too!

As a process, sub-optimal, but as a collaborative journey, highly instructive.

Roadrunner system [2006-2008]

• Main developer: Steve Mathias
• Perl/PHP
• Why was it discontinued? The answer is complex but instructive.

Current UNMCMD screening informatics system: Major Components

• HyperSip, HyperView
• Pipeline Pilot (Scitegic Enterprise Server) for assay uploads
• Oracle/AEI chem/bio database
• Custom Perl for assay result analysis
• Prism for dose-response analysis
• …

• Many other cheminfo tools for SAR, lead optimization, secondary assay guidance
• Still plenty of Excel
• PubChem (e.g. CIDs)

Microsoft Excel, not going away soon

Excel remains an important tool for scientific data processing, analysis and visualization, at UNMCMD and elsewhere. But it has fundamental limitations and drawbacks, esp. data and code access and version control.

E.g. Bcl-2 assay analysis worksheets, UNMCMD, 2007 (PubChem AID=1693).

Pipeline Pilot, custom upload protocol

Loading raw HTS data into AEI database

AEI schema

Note: No cheminformatics in this SQL!

SELECT
    hts_ap_archive.runset_number AS "RUN",
    hts_ap_archive.ap_alias AS "APlateName",
    hts_plate.plate_id,
    hts_plate.alternate_id AS "IPlateName",
    hts_well.well_no,
    hts_sample.alternate_id,
    hts_result_detail.value_char AS Target,
    hts_result_type.type_desc AS Result_Type,
    hts_assay_result.concentration || hts_conc_unit.unit_value AS CONC,
    hts_assay_result.dilution,
    hts_assay_result.result_value AS Value
FROM
    hts_well,
    hts_plate,
    hts_sample,
    hts_conc_unit,
    ddi_container_master,
    hts_ap_archive,
    hts_assay_result,
    hts_result_type,
    hts_result_detail
WHERE
    hts_ap_archive.ap_alias = '213_20110712_135454-1'
    AND hts_assay_result.sample_id = hts_well.sample_id
    AND hts_assay_result.sample_id = hts_sample.sample_id
    AND hts_well.plate_id = hts_plate.plate_id
    AND hts_assay_result.plate_id = hts_ap_archive.ap_number
    AND hts_plate.alternate_id = ddi_container_master.container_name
    AND hts_well.sample_id = hts_sample.sample_id
    AND hts_assay_result.sample_plate = ddi_container_master.container_id
    AND hts_result_type.result_type = hts_assay_result.result_type
    AND hts_assay_result.result_id = hts_result_detail.result_id
    AND hts_assay_result.conc_unit = hts_conc_unit.unit_id
ORDER BY
    hts_well.well_no,
    Target,
    Result_Type

AEI schema

Example of chemical exact match search

SELECT DISTINCT
    hts_compound.compound_id "CorpID",
    hts_sample.alternate_id "UNMID",
    hts_sample.alias_no "SID",
    hts_sample.alias_id "SupplierID",
    hts_sample.sample_id "SampleID",
    hts_sample.text1 "Library_Comment",
    hts_sample.text2 "Library_Type",
    hts_sample.text3 "Library_Date",
    hts_assay.assay_name,
    hts_assay_protocol.version_no "Protocol",
    hts_plate.alternate_id "IPlateName",
    hts_ap_archive.ap_alias "APlateName",
    hts_well.well_no
FROM
    structures,
    hts_sample,
    hts_compound,
    hts_compound_lot,
    hts_assay,
    hts_assay_protocol,
    hts_assay_result,
    hts_plate,
    hts_ap_archive,
    hts_well
WHERE
    ChemistryMatches(structures.chemistry, SSSConst.MatchExact,
        'COCCSc1ccccc1C(=O)Nc1ccc(Cl)cn1', 'SMILES') > 0
    AND structures.substance_id = hts_compound.compound_id
    AND hts_compound_lot.compound_id = hts_compound.compound_id
    AND hts_sample.sample_id = hts_compound_lot.sample_id
    AND hts_sample.sample_id = hts_well.sample_id
    AND hts_sample.sample_id = hts_assay_result.sample_id
    AND hts_assay_result.alt_assay_id = hts_assay_protocol.alt_assay_id
    AND hts_assay_protocol.assay_id = hts_assay.assay_id
    AND hts_well.plate_id = hts_plate.plate_id
    AND hts_plate.plate_status = 13
    AND hts_assay_result.plate_id = hts_ap_archive.ap_number;

Analyzing data with aeva.cgi


Secondary assays

• Primary HTS typically yields a few hundred hits per 100k compounds
• Next steps: Confirm “good” hits, discard “bad” hits
• “Good” and “bad” can mean many things
• Good hits = leads
• SAR to understand and suggest additional compounds for synthesis or purchase (analog by catalog)
• “Cherry-pick” plates, confirmatory and secondary assay for lead optimization
• Repeat till lead good enough for next phase

SAR: Clustering

SAR patterns: ChemTattoo™

Mesa Analytics & Computing

Filtering & profiling by SMARTS, a.k.a. structural alerts


Dose-response curves

Hill Function, logistic 4-parameter model
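As a minimal sketch of the four-parameter logistic (Hill) model named above, one common parameterization is response = bottom + (top - bottom) / (1 + (EC50 / c)^n), where c is concentration and n is the Hill slope. The parameter names and the example values below are illustrative assumptions, not taken from any UNMCMD assay or from Prism's own implementation.

```python
def hill(conc, bottom, top, ec50, slope):
    """Four-parameter logistic (Hill) response at concentration conc.

    conc and ec50 must be positive and in the same units; slope is the
    Hill coefficient. This form rises from bottom to top as conc grows.
    """
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** slope)


# Hypothetical parameters, chosen only to show the curve shape.
params = dict(bottom=0.0, top=100.0, ec50=1.0e-6, slope=1.0)

for conc in (1e-8, 1e-7, 1e-6, 1e-5, 1e-4):
    print(f"{conc:.0e} M -> {hill(conc, **params):6.2f}")
```

At conc equal to ec50 the response is halfway between bottom and top, which is what makes EC50 (or IC50, for an inhibition curve) a convenient single-number summary of potency. Fitting the four parameters to measured data would need a nonlinear least-squares routine, which this sketch leaves out.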

Deliverables: data, probes, papers

Acknowledgements

Thanks to UNMCMD colleagues for education, support, and slides:

• Tudor Oprea
• Cristian Bologa
• Oleg Ursu
• Steve Mathias
• Gergely Zahoransky-Kohalmi
• Jerry Abear
• Jarrett Hines
• Anna Waller
• Annette Evangelisti
• Bruce Edwards
• Larry Sklar

The End

Feel free to contact me directly with questions or ideas!

Jeremy J Yang, [email protected]