Open Data, Open Source, and Open Standards in Drug Discovery, Metabolomics, Toxicology

Page 1

ODOSOS in Life Sciences

Egon Willighagen <http://chem-bla-ics.blogspot.com/>

Prof. Peter Murray-Rust Group

Unilever Center for Molecular Informatics

University of Cambridge

2010-12-13

Page 2

Setting

Problems

Building Blocks

Chemometrics

Conclusion

ODOSOS in Life Sciences

Open Data

Open Source

Open Standards (Specifications)

Drug Discovery (pharmaceutical biosciences)

Metabolomics

Predictive Toxicology

ODOSOS in chemometrics!

2010-12-13 University of Cambridge - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Page 3

Knowledge...

Solanum lycopersicum...

We model our world, but ...

Life is not a Latin name

Transformations are needed

Knowledge is hidden in PDFs

Methods are hidden in proprietary software

Information Loss!

Page 4

Knowledge Representation: Information Loss

Page 5

Not paying attention?

EL Willighagen, J. Chem. Inf. Model. 2006, 46:487-494

Page 6

... Molecular reality...

1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

... and that is just the chemical graphs ...

Page 7

Uncertainty

Metabolomics

92.0938 m/z, glycerol?
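One way to see the uncertainty in an assignment such as "92.0938, glycerol?" is to compare the number with masses computed from a candidate formula. The sketch below is an illustration, not the method used in the talk: the atomic masses are rounded reference values, and the helper names `formula_mass` and `ppm_error` are hypothetical. Under those assumptions, 92.0938 agrees with the average molecular mass of glycerol (C3H8O3), while the monoisotopic mass of the neutral molecule is about 92.047, so the same number can mean different things once its context is lost.

```python
# Rounded atomic masses in unified atomic mass units (assumed reference values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
AVERAGE = {"C": 12.0107, "H": 1.00794, "O": 15.9994}


def formula_mass(composition, table):
    """Sum the atomic masses for a composition such as {"C": 3, "H": 8, "O": 3}."""
    return sum(table[element] * count for element, count in composition.items())


def ppm_error(observed, expected):
    """Relative deviation of an observed value from an expected one, in ppm."""
    return (observed - expected) / expected * 1e6


glycerol = {"C": 3, "H": 8, "O": 3}
observed = 92.0938  # the value shown on the slide

mono = formula_mass(glycerol, MONOISOTOPIC)
avg = formula_mass(glycerol, AVERAGE)

print(f"monoisotopic: {mono:.4f}  deviation: {ppm_error(observed, mono):.1f} ppm")
print(f"average:      {avg:.4f}  deviation: {ppm_error(observed, avg):.1f} ppm")
```

A real identification would also have to consider adducts, charge state, and every other formula that falls inside the instrument's mass tolerance.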

Page 8

Underlying problems...

scientists are sloppy

context is lost, causing confusion

Page 9

Building Blocks

Statistics

Molecular Representation (cheminformatics / semantics)

HPC / eScience

Page 10

Reproducibility needs ODOSOS

Open Data

No Intellectual Monopoly

Open Source

algorithms are complex

implementations even more

strong interaction with representation

Open Standards

Semantic Web

formats

unique identifiers

http://en.wikipedia.org/wiki/Glyn_Moody

Page 11

Open Data? Linking Data?

http://rdf.openmolecules.net/

Page 12

W3C Health Care and Life Sciences Working Group

M. Samwald, Linked Open Drug Data for Pharmaceutical Research and Development, submitted.

Page 13

But what about similarity?

identity: owl:sameAs

stereochemistry: rdfs:seeAlso ?

similar molecules: rdfs:seeAlso, chem:hasHighTanimoto ?

has spectrum like ?
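The question of when two molecules are similar enough to link can be made concrete with the Tanimoto coefficient on fingerprints. The following is a minimal sketch, not the approach of the paper cited on this slide: fingerprints are modelled as plain sets of on-bit positions, the `ex:` identifiers, the bit sets, and the 0.7 threshold are invented for illustration, and `chem:hasHighTanimoto` is the tentative predicate raised on the slide rather than an established vocabulary term.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def similarity_links(fingerprints, threshold=0.7):
    """Yield (subject, predicate, object) triples for pairs at or above the threshold."""
    names = sorted(fingerprints)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if tanimoto(fingerprints[first], fingerprints[second]) >= threshold:
                yield (first, "chem:hasHighTanimoto", second)


# Hypothetical molecules with invented fingerprint bits.
fingerprints = {
    "ex:mol1": {1, 4, 7, 9, 12},
    "ex:mol2": {1, 4, 7, 9, 12, 13},
    "ex:mol3": {2, 5, 20},
}

links = list(similarity_links(fingerprints))
for subject, predicate, obj in links:
    print(subject, predicate, obj, ".")
```

Unlike owl:sameAs, a thresholded similarity link is not transitive and depends on the fingerprint and cut-off used, so both would need to be recorded alongside the link.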

E.L. Willighagen, et al. Linking the Resource Description Framework to Cheminformatics and Proteochemometrics, J. Biomed. Sem., in print.

Page 14

Open Source: The Chemistry Development Kit

A Family of Projects

CDK-Taverna (chemoinformatics workflows)

JChemPaint (semantic 2D editor)

ChemoJava (GPL-ed extension)

Goals

library of cheminformatics algorithms

educational

Usage

CDK: 100+ times cited in scientific literature

Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003

C. Steinbeck et al., Curr. Pharm. Design, 2006

Page 15

Bioclipse

O. Spjuth et al., BMC Bioinformatics 2007, 8:59

Page 16

QSAR Wizards

Page 17

Substructure Mining

A. Andersson, M.Sc. Report

Page 18

OpenTox

E.L. Willighagen, in preparation

Page 19

Chemical Translation Service

G. Wohlgemuth, et al., Bioinformatics. 2010 Oct 15;26(20):2647-8

Page 20

Reference Databases?

A. Williams, Community Views and Trust in Public Domain Chemistry Resources, 2010.

Page 21

How do we use these in Chemometrics?

Clearer where the sources of error are

We can validate with far more data

We can aggregate new data to make better models

Page 22

Visualization: Self-Organizing Maps

[Figure: self-organizing map, with map cells labelled by class number (1 to 12)]

Non-Linear Mapping

Similar objects are grouped together

Similar classes are grouped together

EL Willighagen, Crystal Growth & Design 2007, 7, 1738-1745.
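The grouping behaviour named on this slide follows from how a self-organizing map is trained: each sample pulls its best-matching unit, and to a lesser extent that unit's grid neighbours, towards itself. The sketch below is a generic, minimal SOM in pure Python, not the implementation behind the cited paper. The 3 x 3 grid, the linear decay schedules, the function names, and the toy two-cluster data are all assumptions made for illustration.

```python
import math
import random


def best_unit(weights, sample):
    """Return the grid position whose weight vector is closest to the sample."""
    def squared_distance(pos):
        return sum((w - x) ** 2 for w, x in zip(weights[pos], sample))
    return min(weights, key=squared_distance)


def train_som(data, rows, cols, steps=500, seed=0):
    """Train a small rectangular SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for step in range(steps):
        remaining = 1.0 - step / steps
        rate = 0.5 * remaining                             # learning rate decays linearly
        radius = 0.01 + 0.5 * max(rows, cols) * remaining  # neighbourhood shrinks
        sample = rng.choice(data)
        best_row, best_col = best_unit(weights, sample)
        for (r, c), vector in weights.items():
            grid_dist2 = (r - best_row) ** 2 + (c - best_col) ** 2
            influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
            for i in range(dim):
                vector[i] += rate * influence * (sample[i] - vector[i])
    return weights


# Toy data: two well-separated clusters in two dimensions.
cluster_a = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05)]
cluster_b = [(0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]

weights = train_som(cluster_a + cluster_b, rows=3, cols=3)
units_a = {best_unit(weights, point) for point in cluster_a}
units_b = {best_unit(weights, point) for point in cluster_b}
print("cluster A maps to units:", sorted(units_a))
print("cluster B maps to units:", sorted(units_b))
```

Labelling each unit with the class of the samples it wins gives a map of the kind shown in the figure, where nearby cells tend to carry the same class number.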

Page 23

What about the n spaces you showed earlier?

Page 24

Bayesian Statistics

E.L. Willighagen, et al. Linking the Resource Description Framework to Cheminformatics and Proteochemometrics, J. Biomed. Sem., in print.

Page 25

Blue Obelisk

R. Guha et al., J. Chem. Inf. Model., 2006

Page 26

Changes are coming!

Page 27

The Details

http://www.citeulike.org/user/egonw/tag/papers

http://chem-bla-ics.blogspot.com

http://egonw.github.com
