UNMCMD (University of New Mexico Center for Molecular Discovery)
Biomolecular Screening Informatics
Chem-, Bio- and Screening Informatics
Jeremy J Yang
PhD Student, IU Cheminformatics
Mgr, Systems & Programming, UNM Biocomputing
Indiana University School of Informatics and Computing - I571, Intro to Cheminformatics - Fall 2011
Abstract
Biomolecular screening is key to drug and probe discovery, requiring strong informatics, in particular for high-throughput screening (HTS). Screening informatics involves cheminformatics and bioinformatics and additional specialized functionality. In this talk we discuss the UNM Center for Molecular Discovery screening informatics system from a cheminformatics viewpoint, but also noting how the primary functions and context are biological, not chemical, and some consequences of this central fact.
Definition: Screening Informatics
• Informatics in support of screening for biomolecular discovery, usually pharma discovery.
• Acquisition, processing and storage of bioassay data for use during projects and for retrospective analyses.
• Searching over molecules, assays, activities, targets, etc. I/O & integration in conformance with contractual, legal/regulatory, business, and scientific requirements.
• Applications and interfaces suited to trans-disciplinary audience (biology, chemistry, etc.).
Why do we care about Screening Informatics?
One of the primary motivations for cheminformatics has been drug discovery and biomedical research. This motivation includes scientific and economic components. (E.g. there are careers and rewards for biomedical advances.)
More on role of cheminformatics
“…strengthened ties to related fields… and we need to find our place in relation to our sister field of bioinformatics.”
DJ Wild, 2009
Proposed analogy:
Cheminformatics : Bioinformatics ~ Math : Science
i.e., cheminformatics provides rigor, quantification and formalism.
e.g., UCSF SEA: cheminfo probe/metric of bioinfo.
Outline
1. Intro to biomolecular screening
2. NIH Roadmap Molecular Libraries Program
3. UNM Center for Molecular Discovery
4. Intro to flow cytometry at UNMCMD
5. Evolution and development of a screening informatics system
6. Current system technical description
It starts with a biological question, hypothesis, and experimental design
• A biological assay is an experiment.
• A substance is tested for interaction with a biological entity: organism, tissue, cell, or molecule.
• Substance may be a known compound, mixture, or unknown.
• High Throughput Screening (HTS) normally means automated, robotic, many substances per experiment and per day.
KEY POINT: Substance may be a known compound, mixture, or unknown...
SID:4253253, CID:252101
… and “known” structures may be corrected or changed.
… and purity can be an issue – “known” is relative.
http://www.ted.com/talks/bonnie_bassler_on_how_bacteria_communicate.html
An incredible story and perspective; see also NIH Microbiome Project.
Example: Bacterial quorum sensing assay
Bioassay Design
• “Endpoints” measured/inferred in many ways
• Fluorescence and fluorophores often used to indicate an endpoint such as protein-ligand binding, presence or absence of a metabolic process/protein
• Experimental “artifacts” are produced by the method, not the system under study, and can lead to false conclusions (e.g. fluorescent compounds, aggregators)
• Bioassay design a huge challenge in itself (1 compound, 1 test, 1 time)
Slide from talk: “Cheminformatics and the evolving relationship between data in the public domain & pharma”, by Dana Vanderwall, BMS, 2010, http://www.scivee.tv/node/26112
Slide c/o Tudor Oprea, UNM Division of Biocomputing
Intro to Flow Cytometry at UNMCMD
Multiplexed!
Flow Cytometry Shifting Gears, Genetic Eng & Biotech News, Nov 15, 2011 (Vol. 31, No. 20) , http://www.genengnews.com/gen-articles/flow-cytometry-shifting-gears/3913.
Introduce and Manipulate Gates
c/o Anna Waller, UNMCMD HyperViewSession_20110603
HyperView: app to view, analyze, process raw HyperCyt fluorescence data
Binning data: correlating data with wells
HyperView: app to view, analyze, process raw HyperCyt fluorescence data
c/o Anna Waller, UNMCMD HyperViewSession_20110603
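The binning step named on the slide above (correlating a continuous stream of events with plate wells) can be illustrated with a small sketch. This is not HyperView's algorithm. It assumes that events arrive as sorted timestamps, that consecutive wells are separated by a quiet gap in the stream, and that the wells were sampled in a known order. The function names, the gap threshold, and the well labels are all hypothetical.

```python
from typing import Dict, List, Sequence


def bin_events_by_gap(times: Sequence[float], min_gap: float) -> List[List[float]]:
    """Split sorted event timestamps into bins.

    A new bin starts whenever the time since the previous event is at
    least min_gap (assumed here to mark the boundary between two wells).
    """
    bins: List[List[float]] = []
    for t in times:
        if bins and t - bins[-1][-1] < min_gap:
            bins[-1].append(t)   # still inside the current sample
        else:
            bins.append([t])     # first event, or a gap: start a new bin
    return bins


def assign_bins_to_wells(
    bins: List[List[float]], well_order: Sequence[str]
) -> Dict[str, List[float]]:
    """Pair bins with wells in sampling order; refuse to guess on a mismatch."""
    if len(bins) != len(well_order):
        raise ValueError(
            f"found {len(bins)} bins but expected {len(well_order)} wells"
        )
    return dict(zip(well_order, bins))


# Hypothetical event times (seconds) for three wells sampled in order.
event_times = [0.1, 0.2, 0.3, 1.5, 1.6, 3.0, 3.1, 3.2]
wells = assign_bins_to_wells(
    bin_events_by_gap(event_times, min_gap=1.0), ["A01", "A02", "A03"]
)
```

Raising an error when the bin count and well count disagree is a deliberate choice in this sketch: a missed or split bin shifts every later well assignment, so the mismatch is better caught than silently absorbed.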
Evolution and development of a screening informatics system
Dates            Milestones/phases
2004 - 2008      MLSCN pilot phase
2004 - 2006      ActivityBase (IDBS) used somewhat
2006 - 2008      RoadRunner, in-house system developed and used
2008 - 2014      MLPCN production phase
2008             Evaluation of Symyx and Accelrys systems, selection of Accelrys
2009             Installation, customization and configuration of AEI/PP system (~50 telecons)
2009 - present   Accelrys Accord Enterprise (AEI), Pipeline Pilot
2010 - present   In-house tools developed to work with AEI & PP
2006 - present   Excel and CSV files used a lot too!
As a process, sub-optimal, but as a collaborative journey, highly instructive.
Roadrunner system [2006-2008]
• Main developer: Steve Mathias
• Perl/PHP
• Why was it discontinued? The answer is complex but instructive.
Current UNMCMD screening informatics system: Major Components
• HyperSip, HyperView
• Pipeline Pilot (Scitegic Enterprise Server) for assay uploads
• Oracle/AEI chem/bio database
• Custom Perl for assay result analysis
• Prism for dose-response analysis…
Current UNMCMD screening informatics system: Major Components
• Many other cheminfo tools for SAR, lead optimization, secondary assay guidance
• Still plenty of Excel
• PubChem (e.g. CIDs)
Microsoft Excel, not going away soon
Excel remains an important tool for scientific data processing, analysis and visualization, at UNMCMD and elsewhere. But it has fundamental limitations and drawbacks, esp. data and code access and version control.
E.g. Bcl-2 assay analysis worksheets, UNMCMD, 2007 (PubChem AID=1693).
AEI schema
Note: No cheminformatics in this SQL!
SELECT
    hts_ap_archive.runset_number AS "RUN",
    hts_ap_archive.ap_alias AS "APlateName",
    hts_plate.plate_id,
    hts_plate.alternate_id AS "IPlateName",
    hts_well.well_no,
    hts_sample.alternate_id,
    hts_result_detail.value_char AS Target,
    hts_result_type.type_desc AS Result_Type,
    hts_assay_result.concentration || hts_conc_unit.unit_value AS CONC,
    hts_assay_result.dilution,
    hts_assay_result.result_value AS Value
FROM
    hts_well,
    hts_plate,
    hts_sample,
    hts_conc_unit,
    ddi_container_master,
    hts_ap_archive,
    hts_assay_result,
    hts_result_type,
    hts_result_detail
WHERE
    hts_ap_archive.ap_alias = '213_20110712_135454-1'
    AND hts_assay_result.sample_id = hts_well.sample_id
    AND hts_assay_result.sample_id = hts_sample.sample_id
    AND hts_well.plate_id = hts_plate.plate_id
    AND hts_assay_result.plate_id = hts_ap_archive.ap_number
    AND hts_plate.alternate_id = ddi_container_master.container_name
    AND hts_well.plate_id = hts_plate.plate_id
    AND hts_well.sample_id = hts_sample.sample_id
    AND hts_assay_result.sample_plate = ddi_container_master.container_id
    AND hts_result_type.result_type = hts_assay_result.result_type
    AND hts_assay_result.result_id = hts_result_detail.result_id
    AND hts_assay_result.conc_unit = hts_conc_unit.unit_id
ORDER BY
    hts_well.well_no, Target, Result_Type
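The join pattern in the query above (wells linked to plates, samples, and results by their keys) can be tried out on a toy database. The sketch below uses Python's built-in sqlite3 module, not Oracle, and a drastically cut-down schema: the table and column names are borrowed from the query, but the table definitions, the plate names, the sample IDs, and the result values are invented for illustration and do not reflect the real AEI schema. It also uses explicit JOIN ... ON syntax in place of the comma-separated FROM list.

```python
import sqlite3

# Toy schema and data: invented for illustration, NOT the real AEI schema.
TOY_SCHEMA = """
CREATE TABLE hts_plate (plate_id INTEGER PRIMARY KEY, alternate_id TEXT);
CREATE TABLE hts_sample (sample_id INTEGER PRIMARY KEY, alternate_id TEXT);
CREATE TABLE hts_well (plate_id INTEGER, well_no INTEGER, sample_id INTEGER);
CREATE TABLE hts_assay_result (sample_id INTEGER, result_value REAL);

INSERT INTO hts_plate VALUES (1, 'PLATE_A'), (2, 'PLATE_B');
INSERT INTO hts_sample VALUES (10, 'UNM-0001'), (11, 'UNM-0002'), (12, 'UNM-0003');
INSERT INTO hts_well VALUES (1, 1, 10), (1, 2, 11), (2, 1, 12);
INSERT INTO hts_assay_result VALUES (10, 12.5), (11, 87.0), (12, 3.0);
"""

# Same idea as the slide's query: follow keys from well to plate, sample, result.
QUERY = """
SELECT p.alternate_id, w.well_no, s.alternate_id, r.result_value
FROM hts_well w
JOIN hts_plate p ON w.plate_id = p.plate_id
JOIN hts_sample s ON w.sample_id = s.sample_id
JOIN hts_assay_result r ON r.sample_id = s.sample_id
WHERE p.alternate_id = ?
ORDER BY w.well_no
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database holding the toy tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_SCHEMA)
    return conn


def results_for_plate(conn: sqlite3.Connection, plate_name: str) -> list:
    """Return (plate, well, sample, value) rows for one plate name."""
    # The plate name is passed as a bound parameter, not pasted into the SQL.
    return conn.execute(QUERY, (plate_name,)).fetchall()


rows = results_for_plate(build_toy_db(), "PLATE_A")
```

As in the original query, nothing here touches chemical structure: the whole retrieval is relational bookkeeping over plates, wells, and samples.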
AEI schema
Example of chemical exact match search
SELECT DISTINCT
    hts_compound.compound_id "CorpID",
    hts_sample.alternate_id "UNMID",
    hts_sample.alias_no "SID",
    hts_sample.alias_id "SupplierID",
    hts_sample.sample_id "SampleID",
    hts_sample.text1 "Library_Comment",
    hts_sample.text2 "Library_Type",
    hts_sample.text3 "Library_Date",
    hts_assay.assay_name,
    hts_assay_protocol.version_no "Protocol",
    hts_plate.alternate_id "IPlateName",
    hts_ap_archive.ap_alias "APlateName",
    hts_well.well_no
FROM
    structures,
    hts_sample,
    hts_compound,
    hts_compound_lot,
    hts_assay,
    hts_assay_protocol,
    hts_assay_result,
    hts_plate,
    hts_ap_archive,
    hts_well
WHERE
    ChemistryMatches(structures.chemistry, SSSConst.MatchExact,
        'COCCSc1ccccc1C(=O)Nc1ccc(Cl)cn1', 'SMILES') > 0
    AND structures.substance_id = hts_compound.compound_id
    AND hts_compound_lot.compound_id = hts_compound.compound_id
    AND hts_sample.sample_id = hts_compound_lot.sample_id
    AND hts_sample.sample_id = hts_well.sample_id
    AND hts_sample.sample_id = hts_assay_result.sample_id
    AND hts_assay_result.alt_assay_id = hts_assay_protocol.alt_assay_id
    AND hts_assay_protocol.assay_id = hts_assay.assay_id
    AND hts_well.plate_id = hts_plate.plate_id
    AND hts_plate.plate_status = 13
    AND hts_assay_result.plate_id = hts_ap_archive.ap_number;
Secondary assays
• Primary HTS typically few 100s hits per 100k
• Next steps: Confirm “good” hits, discard “bad” hits
• “Good” and “bad” can mean many things
• Good hits = leads
• SAR to understand and suggest additional compounds for synthesis or purchase (analog by catalog)
• “Cherry-pick” plates, confirmatory and secondary assay for lead optimization
• Repeat till lead good enough for next phase
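As a rough illustration of the first step above, primary hits might be selected with a simple activity cutoff before any confirmation. The sketch below is one hypothetical way to do that and to check the resulting hit rate against the "few 100s per 100k" rule of thumb. The cutoff value, the percent-activity scale, and the sample IDs are assumptions, and real hit selection (as the slide notes, "good" can mean many things) usually involves more than a single threshold.

```python
from typing import Dict, List


def call_hits(activity_by_sample: Dict[str, float], cutoff: float) -> List[str]:
    """Return sample IDs whose activity is at or above the cutoff, sorted by ID."""
    return sorted(
        sample for sample, activity in activity_by_sample.items()
        if activity >= cutoff
    )


def hit_rate(n_hits: int, n_tested: int) -> float:
    """Fraction of tested samples that were called as hits."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_hits / n_tested


# Hypothetical percent-activity values for four samples.
primary = {"S1": 5.0, "S2": 62.0, "S3": 48.0, "S4": 91.0}
hits = call_hits(primary, cutoff=50.0)

# 300 hits from a 100k screen corresponds to a 0.3% hit rate.
example_rate = hit_rate(300, 100_000)
```

The samples returned by `call_hits` are the ones that would go onto a cherry-pick plate for confirmatory testing.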
Acknowledgements
Thanks to UNMCMD colleagues for education, support, and slides:
• Tudor Oprea
• Cristian Bologa
• Oleg Ursu
• Steve Mathias
• Gergely Zahoransky-Kohalmi
• Jerry Abear
• Jarrett Hines
• Anna Waller
• Annette Evangelisti
• Bruce Edwards
• Larry Sklar