
Semantic Specification of Data Type Information in the Deep …tw.rpi.edu/media/2016/02/27/e96f/AGU2015_XMa_poster.pdf · An Example in The DCO Data Portal Semantic Specification


An Example in The DCO Data Portal

Semantic Specification of Data Type Information in the Deep Carbon Observatory Data Portal

Xiaogang (Marshall) Ma ([email protected]), John Erickson, Patrick West, Stephan Zednik, Peter Fox

Tetherless World Constellation, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, USA

(Image credit: Ainsley Seago, PLoS Biology)

Background

• Data types are often treated only as the syntax of variables, such as integer, float, boolean, character, and string. Such declarations do not give the data types any domain-specific meaning.

• Our intention is to let a data type carry more meaning, such as who created the data type, the source standard that the data type derives from, the operations that can be performed on datasets of that data type, and the typical scientific domains, software programs, and/or instruments that use the data type.

• Initial results have already been achieved in the Deep Carbon Observatory (DCO) Data Portal (http://info.deepcarbon.net).
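The richer description proposed in the second bullet above can be pictured as a small record attached to each data type. The following is a minimal sketch only, not the DCO schema: the class name `DataTypeDescription`, its field names, the helper `supports`, and all example values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataTypeDescription:
    """Hypothetical record of the extra meaning attached to a data type."""

    name: str
    creator: str            # who created the data type
    source_standard: str    # standard the data type derives from
    operations: List[str] = field(default_factory=list)  # operations valid on datasets of this type
    domains: List[str] = field(default_factory=list)     # typical scientific domains
    tools: List[str] = field(default_factory=list)       # software programs and/or instruments


def supports(data_type: DataTypeDescription, operation: str) -> bool:
    """Return True if the data type declares the given operation."""
    return operation in data_type.operations


# Illustrative values only; none of these come from the DCO Data Portal.
example = DataTypeDescription(
    name="Example spectrum",
    creator="A. Researcher",
    source_standard="Example standard v1",
    operations=["plot", "baseline-correct"],
    domains=["geochemistry"],
    tools=["example-spectrometer"],
)

print(supports(example, "plot"))
```

With such a record, a person or a program can check which operations apply to a dataset by inspecting its data type rather than the dataset itself.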

Nature of Efforts

• A registered DCO dataset is asserted as an instance of a BASIC DATA TYPE, such as Dataset, Image, Video, or Audio.

• It is possible to further annotate a registered dataset with the SPECIFIC DATA TYPES defined by the DCO community members.

Our Aim

Any human or machine encountering a data type can quickly understand the dataset, or at least be in a position to process its details, without even downloading it.

Initial Results

Updates to the DCO Ontology:

• A new class, dco:DataType. Each specific data type is an instance of it.

• An object property, dco:hasDataType, linking a dataset and a data type

• A collection of other classes and properties associated with dco:DataType
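The ontology updates above can be illustrated with a handful of subject-predicate-object statements. This is a minimal sketch using plain Python tuples rather than an RDF library. Only dco:DataType and dco:hasDataType are taken from the poster; the `ex:` identifiers, the `rdf:type` shorthand string, and both helper functions are assumptions made for illustration.

```python
# Triples are (subject, predicate, object) tuples.
# Only dco:DataType and dco:hasDataType come from the poster;
# every "ex:" identifier below is hypothetical.
RDF_TYPE = "rdf:type"

triples = {
    # A specific data type is an instance of the class dco:DataType.
    ("ex:MassSpectrum", RDF_TYPE, "dco:DataType"),
    # A registered dataset is an instance of a basic data type.
    ("ex:dataset-001", RDF_TYPE, "ex:Dataset"),
    ("ex:dataset-002", RDF_TYPE, "ex:Image"),
    # dco:hasDataType links a dataset to a specific data type.
    ("ex:dataset-001", "dco:hasDataType", "ex:MassSpectrum"),
}


def specific_data_types(graph):
    """List every resource declared as an instance of dco:DataType."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == "dco:DataType")


def datasets_with_type(graph, data_type):
    """List every dataset annotated with the given specific data type."""
    return sorted(s for s, p, o in graph if p == "dco:hasDataType" and o == data_type)


print(specific_data_types(triples))
print(datasets_with_type(triples, "ex:MassSpectrum"))
```

The second helper shows the kind of lookup the annotation is meant to enable: finding datasets by their specific data type without opening the datasets themselves.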


Each registered object, such as a dataset or a data type, has a unique identifier called a DCO ID, which is similar to the DOI for a journal paper.

(Image credits: deepcarbon.net and X. Ma)

Geospatial/geotemporal: country, latitude, longitude, elevation

Geologic context: rock types/mineralogy, age, structure/tectonic, depth

Field Geochemical: P, T, fluid comp. (inorganic, organic), pH, Eh, EC, biomarkers, gases, isotopes, sampling protocols, sample storage, sample archiving and tracking, time series results

Analytical: measurement type, sample preparation, instrument type, instrument conditions, accuracy, precision, error propagation

Bench Geochemical: P, T, fluid comp. (inorganic, organic), pH, Eh, EC, biomarkers, gases, sampling protocols, sample storage, sample archiving, isotopes

Biochemical: microbial inventory, DNA sequencing [data links to DL], substrates

Monitoring: time series, sensor data recovery, resolution (signal/noise) – link to R&F

Modeling: empirical, canned codes (e.g. EQ3/EQ6; Chiller, GWB), MD

Kinetics: dynamics of chemical deep carbon processes; field-based versus laboratory-based

Thermodynamics: equation of state of carbon-bearing systems; link to robust data sets identified in EPC

Surface and interface science, catalysis: solid-fluid interactions under extreme conditions

… …

Future Work

• More use case analyses relevant to data types in the DCO community

• Refine the schema for the annotation and provenance of specific data types

• Interoperability between DCO specific data types and data types registered in other communities

• A separate ontology for data type?

WHY Should You Care?

• Data types make aspects of data more visible

• Data types group data sets with similar characteristics

• Data types will help you find data sets matching your needs

• Data types enable machines to find tools and algorithms for specific datasets

• More features in an ‘inter-linked world’…