Data standards in pathology informatics and experimental pathology Experimental Biology 2004 April 17, 2004 Association for Pathology Informatics Data mining session Jules J. Berman, Ph.D., M.D. Program Director, Pathology Informatics Cancer Diagnosis Program, DCTD, NCI, NIH [email protected]

Data standards in pathology informatics and experimental pathology

Experimental Biology 2004

April 17, 2004

Association for Pathology Informatics

Data mining session

Jules J. Berman, Ph.D., M.D.

Program Director, Pathology Informatics

Cancer Diagnosis Program, DCTD, NCI, NIH

[email protected]

Standards issues

Standard ways of obtaining medical research data (confidentiality/security methods)

Standard ways of organizing data (nomenclatures, data structures)

Standards in professional behavior (submitting primary data with research)

UFO Abductees

Lots of them

They often say about the same thing (independent confirmations)

All walks of life

Mostly honest and rational people

Minority are a little crazy

One problem: no evidence

Researchers who don’t publish their primary data

Lots of them

They often say about the same thing (independent confirmations)

All walks of life

Mostly honest and rational people

Minority are a little crazy

One problem: no evidence

This is why we need to share data

After your research data reaches a certain size, the data becomes the publication, and the journal articles become tiny editorials that describe or interpret the data

Think of the relationship between the earth and the sun.

Terra-centrics did not want to think that their planet was not the center of the universe.

But actually, earth is a tiny fraction of the size of the sun, and people eventually switched to a heliocentric vision of reality.

Research papers are mere editorials that revolve around a central large BLOB of data.

The database is the publication. Everything else is peripheral.

Examples where the data is the central research object:

Human Genome Project

Gene Expression Arrays

Tissue Microarrays (a thousand cores of tissue)

Proteomics

Data standards are all about data sharing:

NIH Statement on Data Sharing: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html

National Research Council UPSIDE (Uniform Principle for Sharing Integral Data and Materials Expeditiously): http://books.nap.edu/books/0309088593/html/R1.html

NIH Funding for data sharing!!!

TOOLS FOR COLLABORATIONS THAT INVOLVE DATA SHARING: http://grants1.nih.gov/grants/guide/pa-files/PAR-03-134.html

INFRASTRUCTURE FOR DATA SHARING AND ARCHIVING: http://grants.nih.gov/grants/guide/rfa-files/RFA-HD-03-032.html

What is a standard?

Approval through a Standards Organization (ISO, IEC)

Standards Organization route is less and less appealing for scientists:

Takes too long (obsolete when complete)

Enormous bureaucratic overhead

Doesn’t guarantee acceptance (DICOM visible light)

Doesn’t guarantee conformance (most ANSI computer languages)

Doesn’t guarantee availability (MUMPS)

Don’t despair: alternatives are available

OR: Find a field where progress is impeded because of lack of standards

Gather stakeholders

Have a few open meetings

Create a product using standard methods

Open the drafts to public scrutiny

Publish the product as an open specification (not a standard)

Publish implementations of the specification

W3C (World Wide Web Consortium) is a good example of an organization that publishes open specifications without going through the approval process of any standards organization

Consortium of interested companies

They promote activities related to the WWW and the internet.

They publish reports

IT SEEMS TO WORK

Real world example: The Tissue Microarray Data Exchange Specification

The greatest value of TMAs is the ability to link TMA data with data from other TMAs and from other databases that inform on the data contained in the TMA database.

That value is essentially untapped because there has been no way to publish, exchange, merge and link TMA datasets in a manner that everyone can use and understand.
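
To make "merge and link" concrete, here is a minimal sketch, assuming two groups export core-level records that share a specimen identifier. The field names (`specimen_id`, `stain`, `score`, `gleason`) and all values are hypothetical and are not drawn from any real TMA dataset.

```python
def link_by_key(primary, secondary, key):
    """Join two lists of dicts on a shared key; unmatched primary rows are kept."""
    index = {row[key]: row for row in secondary}
    linked = []
    for row in primary:
        merged = dict(row)                        # copy so the input is not mutated
        merged.update(index.get(row[key], {}))    # add linked fields when present
        linked.append(merged)
    return linked

# Hypothetical staining results from one TMA dataset
stains = [
    {"specimen_id": "S1", "stain": "p53", "score": 2},
    {"specimen_id": "S2", "stain": "p53", "score": 0},
]
# Hypothetical clinical annotations from a separate database
clinical = [{"specimen_id": "S1", "gleason": 7}]

for row in link_by_key(stains, clinical, "specimen_id"):
    print(row)
```

The join only works when both sources agree on what the key is called and what it means, which is the gap a shared exchange specification is meant to close.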

The basic properties of the TMA specification:

1. Self-describing

2. Made from commonly understood data structures

3. Extremely simple (most of our stakeholders are not sophisticated bioinformaticians, computer scientists, or metadata experts)

4. Infinitely scalable (can be endlessly combined with other data sources)
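
As a rough illustration of properties 1 and 2, the sketch below parses a small self-describing XML record with Python's standard library. The element names (`tma`, `block`, `core`, `organ`, `diagnosis`) are invented for this example and are not taken from the actual TMA Data Exchange Specification. The point is only that each value carries its own label, so a reader needs no external column key.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names are illustrative only,
# not the data elements defined by the TMA specification.
RECORD = """
<tma>
  <block id="B1">
    <core id="C1"><organ>prostate</organ><diagnosis>adenocarcinoma</diagnosis></core>
    <core id="C2"><organ>prostate</organ><diagnosis>benign</diagnosis></core>
  </block>
</tma>
"""

def cores_as_dicts(xml_text):
    """Return one dict per <core>, using the tag names found in the data as keys."""
    root = ET.fromstring(xml_text)
    rows = []
    for core in root.iter("core"):
        row = {"core_id": core.get("id")}
        for field in core:
            row[field.tag] = (field.text or "").strip()
        rows.append(row)
    return rows

for row in cores_as_dicts(RECORD):
    print(row)
```

Because the parser takes its keys from the tags in the file, a new labeled field can be added to a core without changing the code that reads it.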

Four API meetings to discuss the TMA specification

May 30, 2001. Ann Arbor, Michigan. Chair of speaker session: Mark A Rubin. Speakers: David Rimm, Steve Bova, Matt Van de Rijn, Jules Berman

Oct. 6, 2001. Pittsburgh, PA and co-sponsored by The National Cancer Institute. Chair, Mary Edgerton. Speakers: Olli Kallioniemi, Chris Chute, Richard Lieberman, Paul Spellman. Chair of Data Exchange Workshop: Mary Edgerton.

May 22, 2002. Ann Arbor, Michigan and co-sponsored by the National Cancer Institute. Chair of Speaker session: Mark A. Rubin. Speakers: James Bacus, Angelo de Marzo, Peggy Porter, David Rimm and Guido Sauter. Chair of Data Exchange Workshop: Dr. Mary Edgerton.

October 4, 2002. Held in conjunction with Advancing Pathology Informatics, Imaging and the Internet, Pittsburgh, PA. Chair of speaker session: Mary Edgerton. Speakers: Steve Hewitt, Ulysses Balis. Chair of Data Exchange Workshop: Mary Edgerton.

In brief:

The TMA Specification is an open access document that can be used without any restriction.

Its development was sponsored by the NCI and by the Association for Pathology Informatics

All the documents and software that you might need to obtain, understand and implement the specification are available in two recently published open access manuscripts.

Basics of the specification:

Jules J Berman, Mary Edgerton and Bruce Friedman. The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak. 2003 May 23;3:5.

Real-world implementation example:

Jules J Berman, Milton Datta, Andre Kajdacsy-Balla, Jonathan Melamed, Jan Orenstein, Kevin Dobbin, Ashok Patel, Rajiv Dhir, Michael J Becich. The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics. 2004 Feb 27;5:19.

What’s next for the Association for Pathology Informatics?

Pathology image specification (part of API Laboratory Digital Imaging Project)

1. DICOM provides a visible light image standard

2. Nobody likes it (zero implementations of the visible light standard since its release, about 5 years ago)

3. Isn’t designed to encapsulate multiplexed images, non-pictorial image data (spectral data, tiled images, histopathological and clinical annotation, data security protocols)

4. Let’s make a specification in XML that pathologists can actually implement

5. Let’s make tools that can port back and forth to DICOM

6. Let’s get community participation along the way
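
Points 4 and 5 could look something like the following sketch, which converts a small XML image description to a dictionary keyed by DICOM attribute keywords and back again. `Rows`, `Columns` and `PhotometricInterpretation` are real DICOM attribute keywords; the XML tags (`image`, `height`, `width`, `color_model`) are assumptions made up for this example, since the proposed image specification is not described here. A real converter would need a DICOM library and the full data dictionary, and all values are kept as plain strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML tag -> DICOM attribute keyword.
# The keywords are real DICOM attributes; the XML tags are invented here.
XML_TO_DICOM = {
    "height": "Rows",
    "width": "Columns",
    "color_model": "PhotometricInterpretation",
}
DICOM_TO_XML = {v: k for k, v in XML_TO_DICOM.items()}

def xml_to_dicom_dict(xml_text):
    """Map recognized child elements of the root to DICOM-keyword keys."""
    root = ET.fromstring(xml_text)
    return {XML_TO_DICOM[el.tag]: el.text for el in root if el.tag in XML_TO_DICOM}

def dicom_dict_to_xml(attrs, root_tag="image"):
    """Build an XML string from a DICOM-keyword dict, skipping unmapped keys."""
    root = ET.Element(root_tag)
    for keyword, value in attrs.items():
        if keyword in DICOM_TO_XML:
            ET.SubElement(root, DICOM_TO_XML[keyword]).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_in = ("<image><height>1024</height><width>768</width>"
          "<color_model>RGB</color_model></image>")
attrs = xml_to_dicom_dict(xml_in)
print(attrs)
print(dicom_dict_to_xml(attrs))
```

Keeping the mapping in one table means the same data drives both directions of the conversion, so the XML side can stay simple for pathologists while still porting to DICOM.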

end