Data standards in pathology informatics and experimental pathology
Experimental Biology 2004
April 17, 2004
Association for Pathology Informatics
Data mining session
Jules J. Berman, Ph.D., M.D.
Program Director, Pathology Informatics
Cancer Diagnosis Program, DCTD, NCI
Standards issues
Standard ways of obtaining medical research data (confidentiality/security methods)
Standard ways of organizing data (nomenclatures, data structures)
Standards in professional behavior (submitting primary data with research)
UFO Abductees
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Mostly honest and rational people
Minority are a little crazy
One problem: no evidence
Researchers who don’t publish their primary data
Lots of them
They often say about the same thing (independent confirmations)
All walks of life
Mostly honest and rational people
Minority are a little crazy
One problem: no evidence
This is why we need to share data
After your research data reaches a certain size, the data becomes the publication, and the journal articles become tiny editorials that describe or interpret the data
Think of the relationship between the earth and the sun.
Terra-centrics did not want to think that their planet was not the center of the universe.
But actually, earth is a tiny fraction of the size of the sun, and people eventually switched to a heliocentric vision of reality.
Research papers are mere editorials that revolve around a central large BLOB of data.
The database is the publication. Everything else is peripheral.
Examples where the data is the central research object:
Human Genome Project
Gene Expression Arrays
Tissue Microarrays (a thousand cores of tissue)
Proteomics
Data standards are all about data sharing:
NIH Statement on Data Sharing: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
National Research Council UPSIDE (Uniform Principle for Sharing Integral Data and Materials Expeditiously): http://books.nap.edu/books/0309088593/html/R1.html
NIH Funding for data sharing!!!
TOOLS FOR COLLABORATIONS THAT INVOLVE DATA SHARING: http://grants1.nih.gov/grants/guide/pa-files/PAR-03-134.html
INFRASTRUCTURE FOR DATA SHARING AND ARCHIVING: http://grants.nih.gov/grants/guide/rfa-files/RFA-HD-03-032.html
What is a standard?
Approval through a Standards Organization (ISO, IEC)
Standards Organization route is less and less appealing for scientists:
Takes too long (obsolete when complete)
Enormous bureaucratic overhead
Doesn’t guarantee acceptance (DICOM visible light)
Doesn’t guarantee conformance (most ANSI computer languages)
Doesn’t guarantee availability (MUMPS)
Don’t despair: alternatives are available
OR: Find a field where progress is impeded because of lack of standards
Gather stakeholders
Have a few open meetings
Create a product using standard methods
Open the drafts to public scrutiny
Publish the product as an open specification (not a standard)
Publish implementations of the specification
W3C (World Wide Web Consortium) is a good example of a group whose open specifications did not go through any approval process of a standards organization
Consortium of interested companies
They promote activities related to the WWW and the internet.
They publish reports
IT SEEMS TO WORK
Real world example: The Tissue Microarray Data Exchange Specification
The greatest value of TMAs is the ability to link TMA data with data from other TMAs and from other databases that inform on the data contained in the TMA database.
That value is essentially untapped because there has been no way to publish, exchange, merge and link TMA datasets in a manner that everyone can use and understand.
The basic properties of the TMA specification:
1. Self-describing
2. Made from commonly understood data structures
3. Extremely simple (most of our stakeholders are not sophisticated bioinformaticians, computer scientists, or metadata experts)
4. Infinitely scalable (can be endlessly combined with other data sources)
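A minimal Python sketch of what "self-describing" and "endlessly combined" can mean in practice, assuming an XML layout. The element names below (tma_dataset, core, diagnosis, stain, gleason_score) and the sample values are hypothetical illustrations, not the tags or data of the published TMA specification. Because every value carries its own tag, two datasets with different fields can be merged without a shared schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and values, for illustration only;
# they are NOT taken from the published TMA specification.
DATASET_A = """<tma_dataset id="A">
  <core id="A-1"><diagnosis>prostate adenocarcinoma</diagnosis><stain>H&amp;E</stain></core>
  <core id="A-2"><diagnosis>benign prostate</diagnosis><stain>H&amp;E</stain></core>
</tma_dataset>"""

DATASET_B = """<tma_dataset id="B">
  <core id="B-1"><diagnosis>prostate adenocarcinoma</diagnosis><gleason_score>7</gleason_score></core>
</tma_dataset>"""


def read_cores(xml_text):
    """Return one dict per core; each child tag name doubles as the field name."""
    root = ET.fromstring(xml_text)
    records = []
    for core in root.findall("core"):
        record = {"dataset": root.get("id"), "core": core.get("id")}
        for field in core:
            record[field.tag] = field.text
        records.append(record)
    return records


def merge(*xml_texts):
    """Concatenate the core records of any number of datasets."""
    merged = []
    for xml_text in xml_texts:
        merged.extend(read_cores(xml_text))
    return merged


merged = merge(DATASET_A, DATASET_B)

# Query across both datasets by a field they happen to share.
cancer_cores = [r["core"] for r in merged
                if r.get("diagnosis") == "prostate adenocarcinoma"]
print(cancer_cores)
```

The reader never needs a separate data dictionary to load the file: fields missing from one dataset (here, stain or gleason_score) are simply absent from that record rather than breaking the merge.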
Four API meetings to discuss the TMA specification
May 30, 2001. Ann Arbor, Michigan. Chair of speaker session: Mark A Rubin. Speakers: David Rimm, Steve Bova, Matt Van de Rijn, Jules Berman
October 6, 2001. Pittsburgh, PA; co-sponsored by the National Cancer Institute. Chair of speaker session: Mary Edgerton. Speakers: Olli Kallioniemi, Chris Chute, Richard Lieberman, Paul Spellman. Chair of Data Exchange Workshop: Mary Edgerton.
May 22, 2002. Ann Arbor, Michigan; co-sponsored by the National Cancer Institute. Chair of speaker session: Mark A. Rubin. Speakers: James Bacus, Angelo de Marzo, Peggy Porter, David Rimm and Guido Sauter. Chair of Data Exchange Workshop: Mary Edgerton.
October 4, 2002. Held in conjunction with Advancing Pathology Informatics, Imaging and the Internet, Pittsburgh, PA. Chair of speaker session: Mary Edgerton. Speakers: Steve Hewitt, Ulysses Balis. Chair of Data Exchange Workshop: Mary Edgerton.
In brief:
The TMA Specification is an open access document that can be used without any restriction.
Its development was sponsored by the NCI and by the Association for Pathology Informatics
All the documents and software that you might need to obtain, understand and implement the specification are available in two recently published open access manuscripts.
Basics of the specification:
Jules J Berman, Mary Edgerton and Bruce Friedman. The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak. 2003 May 23;3:5.
Real-world implementation example:
Jules J Berman, Milton Datta, Andre Kajdacsy-Balla, Jonathan Melamed, Jan Orenstein, Kevin Dobbin, Ashok Patel, Rajiv Dhir, Michael J Becich. The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics. 2004 Feb 27;5:19.
What’s next for the Association for Pathology Informatics?
Pathology image specification (part of API Laboratory Digital Imaging Project)
1. DICOM provides a visible light image standard
2. Nobody likes it (zero implementations of the visible light standard since its release, about 5 years ago)
3. Isn’t designed to encapsulate multiplexed images, non-pictorial image data (spectral data, tiled images, histopathological and clinical annotation, data security protocols)
4. Let’s make a specification in XML that pathologists can actually implement
5. Let’s make tools that can port back and forth to DICOM
6. Let’s get community participation along the way
end