Babel-fish in Cheminformatics
W3C Workshop on
Semantic Web for Life Sciences
Cambridge MA, 27th-28th October 2004
www.denovopharma.com
The babel fish – universal language
Cheminformatics – infinite data formats
Data Type | Data format(s) | Data Domain | Supplier | Supported Databases
Protein Structure | PDB, mmCIF | Target Validation and Lead Identification | RCSB | PDB
Sequence | FASTA, Swiss-Prot, PIR, EMBL, GenBank, GCGdata, GCG, ClustalW, MSF | Target Identification and Target Validation | GENBANK, EMBL, NBRF, Stanford, SIB, EBI | SWISS-PROT, PIR, EMBL, TrEMBL, GenBank, Wisconsin, SeqStore, GenoMax, PROSITE, Merops, SRS
Small Molecule | Mol, mol2, SMILES, SD, .msi, .skc, .chm, .cpd, sdf | Lead Identification and Lead Optimisation | MDL, CambridgeSoft, Accelrys, Tripos, Daylight, IDBS | ISISBase, DayCart, Unity, Catalyst, Chemfinder
Image | Jpeg, gif, tiff, eps, ps, pict | Any | Northplains systems, SciMagix | SIMS, Telescope
HTS/Assay | .xls, txt, delimited ASCII | Lead Identification and Lead Optimisation | IDBS, MDL | ActivityBase, AssayExplorer
Text | Doc, txt, pdf | Any | Multiple | Documentum, Lotus, Verity, RetrievalWare, Muscat
Xray | CSSR, .csd, .fdat, .dat, cif, mif | Target Validation and Lead Identification | CCDC, SERC | IsoStar, CSD
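A minimal sketch of why this heterogeneity is awkward in practice: the table above can be read as a lookup from format to data type. The dictionary below copies a few rows, and treating the format names as file extensions is an assumption made for illustration. The function name `candidate_data_types` is hypothetical. Note that `txt` appears in both the HTS/Assay and Text rows, so an extension alone cannot always identify the data type.

```python
from pathlib import PurePath

# Rows copied from the table above. Reading each format name as a file
# extension is an assumption for illustration, not an authoritative registry.
TABLE_ROWS = {
    "Protein Structure": ["pdb", "mmcif"],
    "Small Molecule": ["mol", "mol2", "sd", "sdf", "skc", "msi", "chm", "cpd"],
    "Image": ["jpeg", "gif", "tiff", "eps", "ps", "pict"],
    "HTS/Assay": ["xls", "txt"],
    "Text": ["doc", "txt", "pdf"],
    "Xray": ["cssr", "csd", "fdat", "dat", "cif", "mif"],
}


def candidate_data_types(filename):
    """Return every data type whose row lists the file's extension."""
    ext = PurePath(filename).suffix.lstrip(".").lower()
    return sorted(dt for dt, exts in TABLE_ROWS.items() if ext in exts)


if __name__ == "__main__":
    for name in ["ligands.SDF", "report.txt", "notes.xyz"]:
        print(name, candidate_data_types(name))
```

An empty result means the extension is not in the copied rows, and more than one result means the file content would have to be inspected to decide.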
Cheminformatics – molecule formats
Mol, mol2, sd, SMILES, SMIRKS, SMARTS, skc, sdf, rlx, ptr, sph, pdb, molen, molin, Shelx, FDAT, CSSR, Charmm, CADPAC, Chem3D, Xed, Spartan, MM2, MM3, Gromos, Gaussian, GAMESS (various), GSTAT, Boogie, Cacao, GROMOS, Hyperchem, Tinker, Diagnostics, BGF, Dock, CAChe, Mopac (various), Maccs, etc.
Impact of heterogeneity
TIME and MONEY
A very small initiative
…compound collections
Use Case – Compound Collections
Thinking about the bigger picture…
Return on investment
Save FTE time
Real-time updates
Accurate availabilities and amounts
Automated ordering
Compound brokerage
Save money
Fewer delays
You know what you’re getting!
So why is no one interested?
Vendors
“Manually normalised” collections
£11,000 per copy per annum
Tie-in with vendor application
Repeat fee for file format change
Always out of date
Time-consuming updates
Vertical mindset for vertical sales
And they’re all at it…so more formats!
How do we change this?
Vendors respond to $’s
A group of big customers weighs more than a 600lb gorilla
The vendor who builds a customer-requested standard gets a bigger market share
So where’s my Babel-fish?
Data first, fish second
Standardisation groups and initiatives
W3C, I3C, OMG LSR
CSAR, LSID, Compound Collections?
WORKING GROUP CHAIRS at LSR
Contacts:
Juan Esteva
Steve Chervitz
Richard Scott
Charles Troup
Martin Senger
Tokio Kano
LSR WANTS