DATA SYSTEMS FOR SAMPLE-BASED OBSERVATIONS
Kerstin Lehnert
Slide 2
Slide 3
Data from Samples
Distributed data acquisition: different labs/researchers analyze the same sample or subsamples of it.
Distributed data publication: different data for the same sample are published in different papers.
Distributed data archiving: data for the same sample are kept in different data systems.
Integrated data access is required to maximize utility.
Slide 4
Geochemical Data
Diverse:
  hundreds of parameters
  thousands of materials
  vary with space and time over a range of more than ten orders of magnitude
Complex:
  mostly sample-based, with complex relations among samples & subsamples
  distributed data acquisition (one sample analyzed in different labs by different researchers at different times)
  idiosyncratic data acquisition methods
Slide 5
Geoinformatics for Geochemistry
DATABASES
  thematic geochemical databases (PetDB, SedDB, VentDB)
DATA REPOSITORY
  Geochemical Resource Library
REGISTRIES
  System for Earth Sample Registration (SESAR)
  IEDA Data Publication Agent of the STD-DOI system (DataCite)
  GeoPass: single sign-on authentication system
DATA ACCESS & ANALYSIS TOOLS
  GfG user interfaces
  EarthChem Data Engine (Portal)
Slide 6
GfG Architecture
[Architecture diagram. Labels: EarthChem XML DB; Metadata catalog; datasets (original data & derived products); GCDM DB; USGS; NAVDAT; GEOROC; EarthChem Portal; GfG Data Entry; User Submission; External Databases; Topical Data Collections; Geochemical Resource Library]
Slide 7
GeoChemical Data Model
[Data model diagram. Labels: observed value; publication; data source; method/DQ; sample; feature of interest; collection, geospatial; analysis; material; preparation, obs. point]
Slide 8
Metadata
Geospatial
  Geographical coordinates
  Geographical names
Collection
  Sampling technique
  Field program
Description & Age
  Classification
  Texture
  Alteration
  Age
Data Quality
  Technique
  Instrument
  Laboratory
  Precision
  Reference material measurements
  Correction procedures
Slide 9
Slide 10
Slide 11
Slide 12
Slide 13
Slide 14
Slide 15
Slide 16
Standards for Data Access & Integration
WMS, WFS
  For visualization tools
OAI-PMH
  For joint data inventories
EarthChemML
  For integration across geochemical data systems
  For interoperability with other systems
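As a rough illustration of how OAI-PMH could support a joint data inventory: a harvester sends plain HTTP requests that carry a `verb` parameter, and the repository answers with XML metadata records. The sketch below only builds such request URLs. The base URL and the helper name `oai_request_url` are placeholders invented for this example, not actual IEDA or EarthChem endpoints.

```python
from urllib.parse import urlencode

# Placeholder endpoint, assumed for illustration only.
BASE_URL = "https://repository.example.org/oai"


def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Full harvest as Dublin Core; oai_dc is the metadata format that
# every OAI-PMH repository must support.
list_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")

# Incremental harvest: only records changed since a given date.
# "from" is a Python keyword, so it is passed through a dict.
incremental_url = oai_request_url(
    "ListRecords", metadataPrefix="oai_dc", **{"from": "2011-01-01"}
)

print(list_url)
print(incremental_url)
```

An inventory service would fetch these URLs on a schedule and merge the returned records into its own catalogue.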
Slide 17
Slide 18
IEDA System-wide Inventory
[Diagram. Labels: Inventory (Expedition Metadata, Reference Metadata, Dataset Metadata, Geospatial Metadata); RSS feed; MGDS; SESAR; EarthChem; GRL; Geochem DBs; Object Registration; Object Metadata; Chemical Data; Cruise Info; DOI Registration]
Slide 19
EarthChem Portal
[Diagram. Labels: PetDB; USGS; GEOROC; NAVDAT; Others; XML; EarthChem Data Engine Database; EarthChem Data Engine Search & Visualization]
Partner databases encode their data & metadata in XML and send them to the EarthChem Portal database in Kansas.
Queries submitted at the EarthChem Portal search the contents of the EarthChem Portal Database.
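The data flow described on this slide can be sketched in a few lines: each partner serializes its records to XML, the portal ingests them into one central store, and queries run against that store rather than against the partners. Everything named here is hypothetical: the element and attribute names are invented and are not the EarthChemML schema, the sample names and numeric values are made up, and `PortalDatabase` is only a stand-in for the real portal database.

```python
import xml.etree.ElementTree as ET


def encode_sample(source_db, sample_name, values):
    """Partner side: serialize one sample and its measured values to XML."""
    sample = ET.Element("sample", {"source": source_db, "name": sample_name})
    for parameter, (value, unit) in values.items():
        item = ET.SubElement(sample, "value", {"parameter": parameter, "unit": unit})
        item.text = str(value)
    return ET.tostring(sample, encoding="unicode")


class PortalDatabase:
    """Portal side: a central store filled from the partners' XML."""

    def __init__(self):
        self.records = []

    def ingest(self, xml_text):
        sample = ET.fromstring(xml_text)
        for item in sample.findall("value"):
            self.records.append({
                "source": sample.get("source"),
                "sample": sample.get("name"),
                "parameter": item.get("parameter"),
                "value": float(item.text),
                "unit": item.get("unit"),
            })

    def search(self, parameter):
        """Queries run against the portal's own copy, not the partners."""
        return [r for r in self.records if r["parameter"] == parameter]


portal = PortalDatabase()
# Made-up sample names and values, for illustration only.
portal.ingest(encode_sample("PetDB", "sample-A", {"SiO2": (49.8, "wt%")}))
portal.ingest(encode_sample("GEOROC", "sample-B", {"SiO2": (51.2, "wt%"), "MgO": (7.9, "wt%")}))
hits = portal.search("SiO2")
print([(h["source"], h["sample"], h["value"]) for h in hits])
```

Keeping a central copy makes cross-database queries simple, at the cost of having to re-ingest whenever a partner database changes.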
Slide 20
Slide 21
Access Levels
Slide 22
EarthChemML
Slide 23
EarthChem Repository: user submission
need tools that are easy to use and support the data flow from lab to publication
  ideally, represent pipelines for data capture early in the data acquisition process
  tools need to include data validation and DQC procedures
offer citable data publication
need data policies
Slide 24
IEDA data publication service
Slide 25
STD-DOIs
The STD-DOI metadata are mainly Dublin Core elements, plus data-specific elements.
The metadata are transmitted to the National Library via web service (HTTP/SOAP) and incorporated into the library catalogue.
The metadata may contain references to other objects (DOI, IGSN, ...):
  Element isCited, isParent, isChild, isDuplicate, ...
Slide 26
STD-DOIs
The element can be used to point to other electronic objects:
  Point to the literature where the data set is interpreted.
  Point to samples from which the data were derived.
  Point to other datasets that belong to the same collection of datasets.
These links can be used by machines (e.g. data portals) to make search suggestions and thus aid discovery of data, literature and samples, or to provide other added-value services.
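One way a data portal might use such links for search suggestions is to propose other datasets that point to the same sample or paper. This is a minimal sketch under stated assumptions: the record layout, the DOIs, and the relation label `hasSample` are invented for illustration (only `isCited` appears on the slide), and the real STD-DOI metadata schema may structure these links differently.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    identifier: str  # dataset DOI (hypothetical values below)
    title: str
    # Each relation is a (relation type, target identifier) pair.
    relations: list = field(default_factory=list)


def suggestions(record, catalogue):
    """Return identifiers of other records sharing at least one linked object."""
    targets = {target for _, target in record.relations}
    related = []
    for other in catalogue:
        if other.identifier == record.identifier:
            continue
        if targets & {target for _, target in other.relations}:
            related.append(other.identifier)
    return related


# "hasSample" is a made-up placeholder relation type.
a = DatasetRecord("doi:10.0000/dataset-a", "Major elements",
                  [("isCited", "doi:10.0000/paper-1"), ("hasSample", "IGSN:SIO8JH3M4")])
b = DatasetRecord("doi:10.0000/dataset-b", "Isotope ratios",
                  [("hasSample", "IGSN:SIO8JH3M4")])
c = DatasetRecord("doi:10.0000/dataset-c", "Unrelated dataset",
                  [("hasSample", "IGSN:XXX000120")])

related_to_a = suggestions(a, [a, b, c])
print(related_to_a)
```

Here dataset b is suggested for dataset a because both reference the same IGSN, while dataset c is not.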
Slide 27
STD-DOI System Architecture
Slide 28
Data DOIs
Slide 29
Information Discovery
Link to publication
Citation of data
IGSN points to sample
Slide 30
The International GeoSample Number
Slide 31
Ambiguous Sample Naming
Examples from the PetDB Database
Sample names are duplicated.
Sample names are modified or changed.
Slide 32
Provides & manages unique identifiers for samples
  IGSN - International Geo Sample Number
  Assigned upon registration of sample metadata
Catalogs & archives sample metadata
  Access to sample metadata via web site & web services
  Long-term preservation of metadata
  Link to sample archives
Facilitates links to data
  IGSN will be incorporated into persistent resolvable GUIDs
Slide 33
IGSN: International GeoSample Number
A Globally Unique Identifier for Earth Samples
Example: IGSN:SIO8JH3M4
Strict syntax (9 characters, alphanumeric)
  First three characters are a unique user code (registered with SESAR), which serves as the name space
  Last 6 characters are random numbers + letters
  Allows 2,176,782,336 (36^6) sample identifiers per registrant
Does not replace personal or institutional names.
Applied to samples & sub-samples; the system tracks relations.
www.geosamples.org
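The syntax rules above are simple enough to check mechanically. The sketch below splits an IGSN into its user code and sample code and reproduces the per-registrant capacity. It assumes that all nine characters are upper-case letters or digits and that the `IGSN:` prefix is optional. The function name `parse_igsn` is invented for this example and is not part of any SESAR service.

```python
import re

# Assumption: all nine characters are upper-case letters or digits.
IGSN_PATTERN = re.compile(r"[A-Z0-9]{9}")


def parse_igsn(text):
    """Split an IGSN into (user_code, sample_code); raise ValueError if malformed."""
    code = text.strip().upper()
    if code.startswith("IGSN:"):
        code = code[len("IGSN:"):]
    if not IGSN_PATTERN.fullmatch(code):
        raise ValueError(f"not a 9-character alphanumeric IGSN: {text!r}")
    return code[:3], code[3:]


# 36 possible characters (26 letters + 10 digits) in each of 6 positions.
IDENTIFIERS_PER_REGISTRANT = 36 ** 6

print(parse_igsn("IGSN:SIO8JH3M4"))  # ('SIO', '8JH3M4')
print(IDENTIFIERS_PER_REGISTRANT)    # 2176782336
```

The second printed value matches the 2,176,782,336 identifiers per registrant quoted on the slide.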
Slide 34
Geoinformatics for Geochemistry
[Diagram: hierarchy of a core, its core sections, samples, and sub-samples, linked by parent/child relations, with IGSNs attached to the objects. Labels: Core; Core Section 1, Core Section 2, Core Section 3; Sample 1, Sample 2, Sample 3 (under the sections); Rock powder; Mineral conc.; Leachate; Fossil separate; Microprobe mount; Parent; Child. IGSNs shown: IGSN:XXX000120, IGSN:XXX0065B3, IGSN:XXX9K23G6, IGSN:XXX07ST4K, IGSN:XYZ0G693M, IGSN:ABC0L98SW, IGSN:ABC0L53NW, IGSN:ABC0L653X, IGSN:ABC078HGB]
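The parent/child relations in this diagram can be represented as a simple tree in which every object carries its own IGSN. This is an illustrative sketch, not the SESAR data model: the `Sample` class is invented here, and the pairing of IGSNs with particular objects is assumed for the example, since the diagram layout is not recoverable from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    igsn: str
    kind: str
    # The parent link is excluded from repr/compare to avoid recursion.
    parent: Optional["Sample"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add_child(self, igsn, kind):
        """Register a sub-sample derived from this object."""
        child = Sample(igsn, kind, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """IGSNs from this object up to the root sampling feature."""
        node, chain = self, []
        while node is not None:
            chain.append(node.igsn)
            node = node.parent
        return chain


# IGSN-to-object pairing is assumed for illustration.
core = Sample("XXX000120", "core")
section = core.add_child("XXX0065B3", "core section")
sample = section.add_child("XXX9K23G6", "sample")
powder = sample.add_child("XXX07ST4K", "rock powder")

print(powder.lineage())
```

Walking the parent links lets a data system trace any analyzed sub-sample, such as a rock powder, back to the core it came from.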
Slide 35
Sample Types
Sampling events such as holes, cores, dredges, stratigraphic sections
Individual samples (specimens): rocks, minerals, fossils, fluid samples, precipitates, synthetic material, etc.
Sub-samples of any of the above: processed samples such as mineral or fossil separates, leachates, thin sections, etc.
Slide 36
Sample Registration
Spreadsheet forms for batch loading
Interoperability (web services)
SESAR Web Site
Slide 37
Implementation Challenges
Diversity of users
  Large sampling campaigns (IODP, ICDP, ECS)
  Repositories
  Data systems
  Individual investigators
Diversity of sample types
Integration into existing policies, procedures, data systems
International scope
Connectivity in the field
Slide 38
Solutions
Schema improvements
Web-service based registration from client data systems
Distributed system of registration nodes (Trusted Agents)
Handle service for IGSNs (persistent, resolvable)
  http://dx.doi.org/18.2539/IGSN.SIO001234
Tools to facilitate registration
  iSESAR (registration via iPhone)
  eCollections (personal sample management)
  webCollections (hosting services for repositories)
IGSN International Consortium