Ulysses J. Balis, MD, Professor of Pathology, University of Michigan
Slide 2
* A brief history of digital imaging, digital annotation, markup and metadata encoding
* Primer on image data encoding concepts and requirements
* Primer on image storage data formats
* Primer on DICOM
* Primer on image metadata concepts
* Exploration of OME as a representative image metadata lexicon
* Case studies in image metadata challenges, as created by a contemporary lack of a single common framework
Slide 3
Corona Satellite Image Program (1959-1972): film based, but with
digital assistance in its latest phase. The challenge of image search,
as experienced with this project, provided the first insights into
the difficulty of this type of computational problem.
Slide 4
Modern Remote Sensing Era (1972-present)
Slide 5
Taken together, this was, and continues to be, a big mess.
* With the advent of massive remote sensing repositories, it became necessary to have both image file formats and standard conventions for storing the accompanying image markup or annotation data.
* Initially, images were stored as raw information, with separate associated metadata files.
* Over time, it became obvious that there was an organizational and ontological advantage in blending image data with its respective descriptive information.
* This blending leads directly to the contemporary reality: there is no one right way to blend such data, and many ways are in use.
* Some are public knowledge
* Some are proprietary
* Some are blends of open and proprietary formats
Slide 6
Big Data: Cloud-Based Repository
User classes: Investigator, Collaborating Scientist, Health Management Team
Query modes:
* By image itself
* By ROI
* By free text
* By metadata tag
Challenge: to allow these disparate user classes and groups to make effective use of this data, it will be necessary to leverage a common set of definitions, such that concept-based retrieval is always consistent and complete.
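A minimal sketch of why a common set of definitions matters for retrieval: local terms are normalized to one canonical concept before matching, so different user groups get the same result set. The vocabulary, concept identifiers and records are all hypothetical and chosen only for illustration.

```python
# Hypothetical shared vocabulary: every local spelling maps to one canonical concept.
CANONICAL = {
    "h&e": "stain:hematoxylin_eosin",
    "he": "stain:hematoxylin_eosin",
    "hematoxylin and eosin": "stain:hematoxylin_eosin",
    "ihc": "stain:immunohistochemistry",
}


def normalize(term):
    """Map a free-text tag to its canonical concept (or leave it unchanged)."""
    key = term.strip().lower()
    return CANONICAL.get(key, key)


def search(records, term):
    """Return the ids of all records tagged with the same concept as `term`."""
    wanted = normalize(term)
    return [r["id"] for r in records
            if wanted in {normalize(t) for t in r["tags"]}]


# Hypothetical repository entries, tagged by different groups in different ways.
records = [
    {"id": "img-001", "tags": ["H&E"]},
    {"id": "img-002", "tags": ["Hematoxylin and Eosin"]},
    {"id": "img-003", "tags": ["IHC"]},
]

print(search(records, "HE"))
```

Without the shared table, a literal string match on "HE" would find none of these records, which is the inconsistency the challenge statement describes.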
Slide 7
Slide 8
* Image information must be encoded into files in some standard format in order to be correctly decoded
* A key challenge arises from the contemporary reality that there are literally hundreds of image file formats, and even more formats of embedded metadata
* This reality is greatly compounded by the frequent presence of proprietary data elements, whose meaning and syntax are difficult, if not impossible, to elucidate without proprietary documentation
Slide 9
* Key modules of a typical image file will include the following:
* Image Descriptive header: used to describe the type, geometry, color order and bit depth of the image
* Additional metadata header: image creation date, capture conditions, proprietary data elements
* Image Payload: the image itself (may be more than one image)
* Image Appended Data: any type of image data or metadata that needs to be added following image creation (e.g. chain of custody information)
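The four modules listed above can be sketched as a simple in-memory model. This is only an illustration under assumed names: the class and field names are hypothetical and do not correspond to any particular file format.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptiveHeader:
    """Type, geometry, color order and bit depth of the image."""
    image_type: str
    width: int
    height: int
    color_order: str   # e.g. "RGB"; one character per channel (assumption)
    bit_depth: int     # bits per channel sample


@dataclass
class ImageFile:
    header: DescriptiveHeader
    metadata: dict = field(default_factory=dict)   # creation date, capture conditions, vendor elements
    payload: list = field(default_factory=list)    # one bytes object per image
    appended: list = field(default_factory=list)   # data added after image creation

    def expected_payload_size(self):
        """Bytes needed for one uncompressed image described by the header."""
        channels = len(self.header.color_order)
        bits = self.header.width * self.header.height * channels * self.header.bit_depth
        return bits // 8


image = ImageFile(
    header=DescriptiveHeader("raster", width=4, height=2, color_order="RGB", bit_depth=8),
    metadata={"created": "2014-01-01", "capture": "brightfield"},
)
image.payload.append(bytes(image.expected_payload_size()))
image.appended.append({"event": "received by archive"})
print(image.expected_payload_size())
```

The point of the sketch is that the payload cannot be interpreted without the header: the same bytes decode differently if the geometry, color order or bit depth is read incorrectly.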
Slide 10
File layout, from file start: Header Data, General and Proprietary
Metadata, Image Payload, Optional Appended / Journal Data
Slide 11
Header Data is typically utilized to represent the following top-level concepts:
* Image Type
* Image Dimensions
* Image Bit Depth
* Encoding Sequence
* Number of planes
* Alpha Channel detail
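As a rough illustration of how such header concepts end up as bytes, the sketch below packs and unpacks a made-up fixed-layout header with Python's struct module. The magic value, field order, field widths and the meaning of the encoding code are all assumptions for the example, not taken from any real format.

```python
import struct
from collections import namedtuple

# Hypothetical layout, little-endian with no padding:
# 4-byte magic, image type, width, height, bit depth, encoding code, planes, alpha flag.
HEADER_FORMAT = "<4sBIIBBBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

Header = namedtuple(
    "Header",
    "magic image_type width height bit_depth encoding planes has_alpha",
)


def pack_header(header):
    """Serialize a Header tuple into its fixed-size binary form."""
    return struct.pack(HEADER_FORMAT, *header)


def unpack_header(data):
    """Parse the fixed-size header found at the start of `data`."""
    header = Header(*struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE]))
    if header.magic != b"DEMO":
        raise ValueError("unexpected magic value; not a DEMO file")
    return header


original = Header(b"DEMO", 1, 1024, 768, 8, 0, 3, 1)
raw = pack_header(original)
decoded = unpack_header(raw + b"...payload bytes would follow...")
print(HEADER_SIZE, decoded.width, decoded.height)
```

A reader that does not know this exact layout has no reliable way to recover the dimensions or bit depth, which is why undocumented headers are such an obstacle.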
Slide 12
Image Metadata (the General and Proprietary Metadata section) can include anything beyond the usual image-level descriptors:
* Acquisition conditions
* Experimental conditions (in vivo / in vitro)
* Reagents and lot numbers
* Observations / diagnoses
* Categorical data
* SNPs
* Variants
* Small molecules
* Computational findings
* Anything else
Slide 13
The Image Payload is typically utilized to house the actual image data itself, including:
* Individual color/fluorescence channels
* Alpha channels
* Annotation channels
* Collaborative annotation channels
* Spatially gated image markup (global markup is usually in the Metadata section)
* Mask data
Slide 14
The optional Appended / Journal Data section can house:
* Chain-of-custody events
* Post-acquisition image transformation and normalization operations
* Logged data sharing events
* Post-capture analytic information, such as that derived from multi-parametric analyses and machine learning operations
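One way to picture an appended journal is as an append-only list of events. The slides do not prescribe a mechanism; the sketch below assumes hash chaining (each entry stores a SHA-256 digest over the previous digest plus its own content) as one common way to make later tampering detectable. Event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder digest used before the first entry (assumption)


def _digest(previous, event):
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((previous + body).encode("utf-8")).hexdigest()


def append_event(journal, event):
    """Add an event; its digest covers the previous entry's digest."""
    previous = journal[-1]["digest"] if journal else GENESIS
    journal.append({"event": event, "prev": previous,
                    "digest": _digest(previous, event)})


def verify(journal):
    """Recompute the chain and report whether every entry is intact."""
    previous = GENESIS
    for entry in journal:
        if entry["prev"] != previous or entry["digest"] != _digest(previous, entry["event"]):
            return False
        previous = entry["digest"]
    return True


journal = []
append_event(journal, {"type": "custody", "actor": "scanner-01", "action": "acquired"})
append_event(journal, {"type": "transform", "actor": "pipeline", "action": "color normalization"})
intact_before = verify(journal)

journal[0]["event"]["actor"] = "someone-else"   # simulate an after-the-fact edit
intact_after = verify(journal)
print(intact_before, intact_after)
```

Because entries are only ever added at the end, such a journal fits naturally after the payload and does not require rewriting the header or image data.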
Slide 15
Slide 16
First Part of header
Slide 17
Second Part of header
Slide 18
Slide 19
Slide 20
From http://bigtiff.org
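The header figures on the preceding slides did not survive as text. As a stand-in, here is a small sketch of how the first bytes of a classic TIFF and a BigTIFF file differ: both begin with a byte-order mark ("II" or "MM"), followed by version 42 with a 4-byte first-IFD offset (classic) or version 43 with an offset-size field of 8, a zero field and an 8-byte first-IFD offset (BigTIFF). The function name and returned keys are illustrative choices.

```python
import struct


def read_tiff_header(data):
    """Identify classic TIFF vs BigTIFF and return the first IFD offset."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF header")
    if data[:2] == b"II":
        endian = "<"            # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"            # big-endian ("Motorola")
    else:
        raise ValueError("missing TIFF byte-order mark")

    (version,) = struct.unpack(endian + "H", data[2:4])
    if version == 42:
        (offset,) = struct.unpack(endian + "I", data[4:8])
        return {"variant": "TIFF", "endian": endian, "first_ifd_offset": offset}
    if version == 43:
        if len(data) < 16:
            raise ValueError("too short to be a BigTIFF header")
        offset_size, reserved, offset = struct.unpack(endian + "HHQ", data[4:16])
        if offset_size != 8 or reserved != 0:
            raise ValueError("malformed BigTIFF header")
        return {"variant": "BigTIFF", "endian": endian, "first_ifd_offset": offset}
    raise ValueError("unknown TIFF version number")


# Synthetic headers built in memory, so no image file is needed to try this.
classic = b"II" + struct.pack("<HI", 42, 8)
big = b"MM" + struct.pack(">HHHQ", 43, 8, 0, 16)
print(read_tiff_header(classic)["variant"], read_tiff_header(big)["variant"])
```

The 8-byte offsets are what lift the roughly 4 GB ceiling of classic TIFF, which matters for whole slide images.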
Slide 21
Slide 22
Slide 23
* All with distinct Header, Metadata and Payload formats * Many
have proprietary data elements * Tools to cross-walk the images
exist but a universal metadata translator is not yet
available.
Slide 24
Slide 25
Slide 26
Slide 27
* Formal ISO name: ISO standard 12052:2006, "Health informatics -- Digital imaging and communication in medicine (DICOM) including workflow and data management"
* Created as a direct consequence of the proprietary manufacturer standards that arose in the late 1970s as a result of new native-digital radiography modalities (CT, MRI, US, etc.)
* A standard for handling, storing, printing, and transmitting information in medical imaging
* A result of a longstanding partnership between the American College of Radiology and the National Electrical Manufacturers Association (ACR/NEMA)
* Now in its third major version release, PS3 (recognizing that versions 1 and 2 were unsuccessful)
* Essentially a TCP/IP protocol (but more recently, also a storage format and even a media storage specification)
Slide 28
* The standard is a technical specification for both the encoding of image (and waveform) data and the encoding of any additional metadata that supports the image data
* Metadata is divided into three classes:
* Mandatory: the element must be included by the encoding instrument, in a format that is constrained to the normative specification
* User: the element is optional and conforms to a vendor-provided specification that may or may not be proprietary. The format need not be constrained to any particular normative specification, other than the global requirement that it fit in the allotted allocation space.
* Conditional: the element may be required for inclusion if certain image acquisition conditions apply. If included, the element must be encoded in a format that is constrained to the normative specification
* From the above list, it can be understood that DICOM did not solve the problem of proprietary data elements; rather, it provided a compromise by which manufacturers could still encode some data elements in proprietary format, if they agreed to encode a minimum essential set of data into a commonly agreed upon framework and ontology. As such, DICOM is a hybrid open/closed standard
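The three classes above can be read as a simple validation rule set. The sketch below is an illustration only: it uses the class names as given on this slide rather than the standard's formal attribute type codes, and the element names and the condition are hypothetical.

```python
# Hypothetical rule table: element name -> requirement class (and condition, if any).
RULES = {
    "PatientID": {"class": "mandatory"},
    "Modality": {"class": "mandatory"},
    "ContrastAgent": {
        "class": "conditional",
        # Required only when the acquisition used contrast (assumed condition).
        "when": lambda meta: meta.get("ContrastUsed") is True,
    },
    "VendorCalibrationBlob": {"class": "user"},   # optional, possibly proprietary
}


def missing_elements(meta):
    """List the elements that the rules require but `meta` does not contain."""
    missing = []
    for name, rule in RULES.items():
        if name in meta:
            continue
        if rule["class"] == "mandatory":
            missing.append(name)
        elif rule["class"] == "conditional" and rule["when"](meta):
            missing.append(name)
        # "user" elements are never required, so their absence is not an error.
    return missing


print(missing_elements({"PatientID": "X-123", "ContrastUsed": True}))
print(missing_elements({"PatientID": "X-123", "Modality": "SM"}))
```

Note that the validator can say nothing about the content of the "user" element: it may be present and well formed yet still unreadable without vendor documentation, which is the open/closed compromise described above.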
Slide 29
* Parts of the Standard
* PS 3.1: Introduction and Overview
* PS 3.2: Conformance
* PS 3.3: Information Object
* PS 3.4: Service Class Specifications
* PS 3.5: Data Structure and Encoding
* PS 3.6: Data Dictionary
* PS 3.7: Message Exchange
* PS 3.8: Network Communication Support for Message Exchange
* PS 3.9: Retired (formerly Point-to-Point Communication Support for Message Exchange)
* PS 3.10: Media Storage and File Format for Media Interchange
* PS 3.11: Media Storage Application Profiles
* PS 3.12: Media Formats and Physical Media for Media Interchange
* PS 3.13: Retired (formerly Print Management Point-to-Point Communication Support)
* PS 3.14: Grayscale Standard Display Function
* PS 3.15: Security and System Management Profiles
* PS 3.16: Content Mapping Resource
* PS 3.17: Explanatory Information
* PS 3.18: Web Access to DICOM Persistent Objects (WADO)
* PS 3.19: Application Hosting
* PS 3.20: Transformation of DICOM to and from HL7 Standards
Slide 30
Slide 31
* Specification Document Structure
* IODs: Information Object Definitions
* The core of DICOM
* Defines the storage specification for each distinct storage modality
* CT: Computed Tomography
* MR: Magnetic Resonance Imaging
* US: Ultrasound
* VL: Visible Light (used for endoscopy, ophthalmology and single-field microscopy)
Slide 32
Slide 33
* Specification Document Structure * Data Structure and
Encoding / Service Classes * The core specification of DICOM
interoperability * Extremely technical and detailed * Learning
curve is steep
Slide 34
Slide 35
* Specification Document Structure
* Data Dictionary
* Conceived in a pre-XML time
* Fixed-length binary words that map to terms
* Meshes with the overall philosophy of DICOM as a fixed-field data format
* Will need significant updating to be compatible with modern ontological framework concepts
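To make the "binary word to term" idea concrete, here is a minimal sketch of decoding one DICOM data element. It assumes the Explicit VR Little Endian encoding and handles only value representations that use the short form (2-byte length); long-form VRs such as OB, OW, SQ and UN are deliberately left out. The three dictionary entries are a tiny excerpt, and odd group numbers are treated as private (vendor) elements.

```python
import struct

# Tiny excerpt of a tag dictionary: (group, element) -> (VR, name).
DICTIONARY = {
    (0x0008, 0x0060): ("CS", "Modality"),
    (0x0010, 0x0020): ("LO", "PatientID"),
    (0x0028, 0x0010): ("US", "Rows"),
}


def encode_element(group, element, vr, value):
    """Build one short-form element: tag (4 bytes), VR (2), length (2), value."""
    if len(value) % 2:
        value += b" "   # values are padded to even length (space for text VRs)
    return struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value)) + value


def decode_element(data, offset=0):
    """Decode one short-form element; return (fields, offset of next element)."""
    group, element, vr, length = struct.unpack_from("<HH2sH", data, offset)
    start = offset + 8
    if (group, element) in DICTIONARY:
        name = DICTIONARY[(group, element)][1]
    else:
        name = "Private" if group % 2 else "Unknown"
    fields = {"tag": (group, element), "vr": vr.decode("ascii"),
              "name": name, "value": data[start:start + length]}
    return fields, start + length


raw = (encode_element(0x0008, 0x0060, "CS", b"SM")
       + encode_element(0x0009, 0x0010, "LO", b"ACME"))
first, pos = decode_element(raw)
second, end = decode_element(raw, pos)
print(first["name"], first["value"], second["name"])
```

The private element decodes structurally, but its name and meaning are not in the public dictionary, so only the vendor can say what the value represents.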
Slide 36
Slide 37
* Metadata is, in essence, data about data. There are two general classes:
* Structural Metadata: the design and specification of data structures (i.e. data about the containers of data)
* Descriptive Metadata: specification of individual instances of application data (i.e. the data content itself)
* Term coined in 1968 by Philip Bagley, in the text "Extension of Programming Language Concepts"
Slide 38
* Key Concepts
* Create an encoding system that is both human and machine readable
* Whenever possible, constrain concepts and terms to a normative namespace
* Unfortunately, namespaces are either non-existent or duplicated
* Moreover, there is a plurality of ways of representing namespaces
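A small sketch of the first two key concepts together: the metadata record is JSON (readable by people and by programs), and every term carries a namespace prefix that is checked against a normative list. The namespaces, their contents and the "prefix:term" convention are hypothetical choices for the example.

```python
import json

# Hypothetical normative namespaces and the terms each one permits.
NAMESPACES = {
    "stain": {"hematoxylin_eosin", "immunohistochemistry"},
    "organ": {"liver", "kidney"},
}


def check_terms(record_json):
    """Return (term, problem) pairs for terms outside the normative namespaces."""
    record = json.loads(record_json)
    problems = []
    for term in record["terms"]:
        prefix, _, name = term.partition(":")
        if prefix not in NAMESPACES:
            problems.append((term, "unknown namespace"))
        elif name not in NAMESPACES[prefix]:
            problems.append((term, "term not in namespace"))
    return problems


record_json = json.dumps({
    "image": "slide-0001",
    "terms": ["stain:hematoxylin_eosin", "organ:lung", "site:left"],
})
print(check_terms(record_json))
```

The two failure modes correspond to the two problems named above: a term with no namespace to constrain it, and a term that a given namespace simply does not cover.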
Slide 39
* Select an ontology representation model
* Select the plurality of namespaces to reference for images, from a vast field of candidates
* Identify domains where a suitable namespace does not exist and build it, consortially, from the ground up
* Curate the construct, in perpetuity, recognizing that standards are dynamic constructs
All this is hard work.
Slide 40
Slide 41
Slide 42
* Another possible starting point for ontology development
Slide 43
Slide 44
* An investigator seeks to automate the laser capture microdissection (LCM) process by converting a manual workflow to a computer-aided workflow driven by region-of-interest selection
* Upon developing the image segmentation algorithms, the investigator determines that the laser cut maps used by the Arcturus XT instrument are derived from proprietary coordinate maps buried within the multiple field-captured JPEG images used by the platform.
* Effective turnkey integration will require reverse engineering of the file format.
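A first step in inspecting such files is usually to list the marker segments that precede the compressed image data, since application-specific (APPn) and comment segments are where extra metadata commonly sits. The sketch below walks a JPEG byte stream up to the start-of-scan marker. It is simplified (no handling of fill bytes or standalone markers), the sample bytes are synthetic, and nothing here reflects the actual layout used by any vendor.

```python
import struct


def make_segment(marker, payload):
    """Build one marker segment; the 2-byte length counts itself plus the payload."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


def list_jpeg_segments(data):
    """Return (marker, payload) pairs for every segment before start-of-scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG start-of-image marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xDA:          # start of scan: compressed image data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segments.append((marker, data[pos + 4:pos + 2 + length]))
        pos += 2 + length
    return segments


# Synthetic stream: a JFIF-style APP0 segment, a made-up APP9 segment, then scan data.
sample = (b"\xff\xd8"
          + make_segment(0xE0, b"JFIF\x00")
          + make_segment(0xE9, b"VENDOR\x00x=10;y=20")
          + b"\xff\xda\x00\x02" + b"\x00" * 8)

segments = list_jpeg_segments(sample)
app_segments = [(m, p) for m, p in segments if 0xE0 <= m <= 0xEF]
print([hex(m) for m, _ in app_segments])
```

Finding the segment is the easy part; working out what an undocumented payload means is the reverse engineering the case study refers to.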
Slide 45
Slide 46
Slide 47
* What are possible mitigating strategies for this type of hidden metadata?
* How would an open ontology solve this problem?
* Is it reasonable for vendors of such platforms to contractually set in place stipulations banning reverse engineering of proprietary file formats and, if so, what remedies remain available to the investigator?
Slide 48
* A whole slide imaging vendor makes a new high-throughput scanner available at a substantial discount to a community surgical pathology department, with the proviso (one of many, actually) that said department will exclusively make use of that vendor's image/case viewer application.
* The department agrees to the terms of the contract and soon discovers that there is no programmatic pathway by which non-proprietary images can be exported out of the system for consultative review by outside locations.
* When approached, the vendor indicates that an image file format conversion software package is available, but the licensing model is per-image, with no discount for volume.
Slide 49
* What measures could have prevented this interoperability challenge in the first place?
* Are vendors legally able to restrict the use of data that comes off their systems in proprietary format? Who owns the data, anyway?
* Are contractual limitations on reverse engineering legally binding / enforceable?
Slide 50
* A histomorphology investigative team seeks to create a
consortial network of investigative partners that will use a common
image viewing / image analysis framework for distributed case
review. * A review of the contemporary offerings reveals that no
software package offers what is needed.
Slide 51
* What are possible mitigating actions the investigative team
can carry out to address this interoperability need? * What
standards can be brought to bear immediately to help address the
need? * What interoperability needs will remain, after the
deployment of a partial image format solution?