A Brief History in Time Series Data at Environment Canada
James Doyle, Project Manager
& Christopher Thorne, Geomatics Data Analyst
Environment Canada’s Data Management Program (2011–Present)
Projects:
1. Data Governance and Architecture (Data Stewardship Model & Standards)
2. Data Catalogue (supporting Open Data and Federal Geospatial Platform)
3. Data Access and Sharing
4. Data Consolidation
5. Data Integration
EC Subject Area Model
Hunting for a Standard - XML Architecture
North American Profile of ISO 19115 (ISO/TS 19139)
Geography Markup Language 3.2 (ISO 19136)
Observations and Measurements 2.0 (ISO 19156)
SWE Common Data Model 2.0
WaterML 2.0 Part 1 - Timeseries
TimeSeriesML
• WMO/NOAA and EC want WaterML 2.0 Part 1 rebranded
• IMD is participating in the OGC TimeSeriesML SWG
COMP Logical Data Model
Provides a simple, stable, logical layer used for:
User interfaces
Data resource modularization
Common Observation and Measurement Profile
A common XML exchange profile for time series data that is 100% compliant with the OGC international standards:
wml2: WaterML 2.0 Part 1 – Timeseries
om: Observations & Measurements
swe: Sensor Web Enablement Common Data Model
gml: Geography Markup Language
What does the standard look like?
The Anatomy of COMP
What does COMP offer EC and its partners?
XML Data Exchange
COMP Viewer
When you open an online COMP XML file in your browser, the Viewer tracks down all the external references and presents you with a complete picture of the metadata and data as an HTML report in the official language of your choice, with outlining for easy navigation.
COMP Data Point Utilities
Extract data values into tabular formats for consumption by your analytical software.
Value Added Tools
GIS Mapping
Data Visualization
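What a data point utility does can be sketched in a few lines of Python. This is an illustration only, not the COMP utilities themselves: it assumes the time series is encoded as WaterML 2.0 `wml2:MeasurementTVP` time-value pairs, and the sample document, function names, and values are made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# WaterML 2.0 namespace as published by the OGC.
NS = {"wml2": "http://www.opengis.net/waterml/2.0"}

# Made-up sample; a real COMP file carries far more metadata around the points.
SAMPLE = """<wml2:MeasurementTimeseries xmlns:wml2="http://www.opengis.net/waterml/2.0">
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2010-01-01T00:00:00Z</wml2:time>
      <wml2:value>1.25</wml2:value>
    </wml2:MeasurementTVP>
  </wml2:point>
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2010-01-02T00:00:00Z</wml2:time>
      <wml2:value>0.75</wml2:value>
    </wml2:MeasurementTVP>
  </wml2:point>
</wml2:MeasurementTimeseries>"""


def extract_points(xml_text):
    """Return (time, value) tuples for every time-value pair in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for tvp in root.findall(".//wml2:MeasurementTVP", NS):
        time = tvp.findtext("wml2:time", namespaces=NS)
        value = tvp.findtext("wml2:value", namespaces=NS)
        rows.append((time, float(value)))
    return rows


def to_csv(rows):
    """Serialize the extracted points as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["time", "value"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_points(SAMPLE)))
```

The CSV output can be loaded directly into a spreadsheet or statistics package.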
A common XML exchange profile for time series data that is 100% compliant with OGC international standards – no local extensions
COMP XML
This XML fragment references a name and a unit of measure in SKOS taxonomies.
SKOS Taxonomies
Define these terms in English and French.
COMP Viewer
Looks up these SKOS references and resolves them in English (en-CA) and French (fr-CA).
Simple Knowledge Organization System
EC ISO-NAP NAtChem (Air Quality) Substance Unit of Measure WaterML2 Species Bio-organism Water Quality Water Quantity Meteorology Ice Service Wild Life Service ?
Example of COMP Use of SKOS Taxonomies
COMP in Action
http://www.ec.gc.ca/data_donnees/comp
Components: COMP XML files, SKOS taxonomies, COMP Viewer XSLT, data point extraction scripts, Download Service
1. The browser uses the COMP XSLT.
2. When the user clicks on a COMP XML file, the file asks the browser to render the XML using the COMP Viewer instead of its own default XSLT script (see 2nd line of syntax).
3. Selecting a download option invokes the Download Service.
4. The Download Service pipes the output back to the browser, invoking its standard file download facilities.
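The "2nd line of syntax" mechanism is the standard `xml-stylesheet` processing instruction, which tells a browser which XSLT to apply. The sketch below reads that instruction with Python's standard library. The sample document and the stylesheet name `comp-viewer.xsl` are placeholders, not the actual COMP file layout.

```python
from xml.dom import minidom

# Placeholder document: the second line is the xml-stylesheet instruction
# a browser would follow. The href value is a made-up name.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="comp-viewer.xsl"?>
<root/>"""


def stylesheet_instructions(xml_text):
    """Return the data of every top-level xml-stylesheet processing instruction."""
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


if __name__ == "__main__":
    for data in stylesheet_instructions(SAMPLE):
        print(data)
```

A check like this could be used to confirm that published files point at the viewer stylesheet before they go online.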
Setting up Pilot Project
Which EC Monitoring Program will be our guinea pig?
Weather Monitoring
Water Quality & Availability Monitoring
Air Quality Monitoring
Emissions Sources (Air, Water, Land)
Species & Habitat Monitoring
…etc.
Pick me
Pick me!
Selecting Program Observations (Input Dataset)
Data Input:
The National Atmospheric Chemistry Database (NatChem) NARSTO Quality Science Center of the U.S. Oak Ridge Laboratory.
Accessory COMP-specific XLS data entry templates, for data not found or not easily accessible within the source data.
Output Data:
OGC WaterML 2.0 Timeseries (XML): observations data linking to reference master data.
Reference data: monitoring sites, instrument procedures, parameters (data types), bilingual terms & lookup lists.
Who is going to migrate the data?
“No problem, Chris will do it!”
(Correction: Chris + FME )
What does the Input Data Look Like?
NatChem holds 100s of these NARSTO files, organized by study or monitoring network across Canada (100+ sites)
~35 years of data at each location/region
~500 instrument and sampling measurement procedures
Logged time series data can be in days, hours, or minutes
NARSTO files are text (TXT/CSV)
With some (incomplete) accessory Program reference data (CSV, XLS)
All stored on a file share drive.
Input file – header info*
NARSTO variables
Contacts
File description/name
File abstract / versioning info
…n
File begins
Input file – Monitoring site information
Site location(s)
Table schema/metadata (uom)
Table info
…n
Input file – Observation data & metadata
Observation table name & notes
Table schema/metadata
Column metadata: instrument/sampling procedures
Time series by site observations (data point records)
Project Planning
ETL
Data Request: by time, by location, by substance…
Web Services (controlled, user-driven quality data products)
(centralization & cleaning within DB)
Master Data Recast
(conversion & migration transactions)
Reporting: data profiling, QA/QC, internal business needs, data process logs
Resource Intensity (time & resource)
Quality of Data
Task Breakdown
1. Parse NARSTO-formatted CSV sources and load them into an MS SQL database.
2. Reference Data
i. Develop data profiling & reporting methods to QA/QC the reference data across submitted observation files.
ii. Centralize the Program's master reference data: bilingual definitions, contacts, sites, variables (procedures), and observations.
iii. Map reference data to OGC WaterML 2.0: convert, store, and publish.
3. Time Series Data
i. Create a physical data model within MSSQL for storage and for the TimeSeries XML output.
ii. Join/link reference data to 34 years of observations (semantic web relationships).
iii. Produce, validate & publish to the online COMP Viewer.
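The reference-data QA/QC and the join of observations to reference data can be sketched as two SQL queries. The example below uses Python's built-in `sqlite3` as a stand-in for MS SQL. The table names, columns, and rows are all invented for illustration and are not the project's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with made-up reference and observation rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE site (site_id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE variable (variable_id TEXT PRIMARY KEY, label_en TEXT, label_fr TEXT);
        CREATE TABLE observation (site_id TEXT, variable_id TEXT, obs_time TEXT, value REAL);

        INSERT INTO site VALUES ('S01', 'Demo Site A');
        INSERT INTO variable VALUES ('VAR1', 'Demo variable', 'Variable de demonstration');

        INSERT INTO observation VALUES ('S01', 'VAR1', '2010-01-01', 1.25);
        INSERT INTO observation VALUES ('S01', 'VAR2', '2010-01-02', 0.75);  -- unknown variable
        INSERT INTO observation VALUES ('S99', 'VAR1', '2010-01-03', 2.0);   -- unknown site
    """)
    return con


def orphan_observations(con):
    """QA/QC check: observations whose site or variable has no reference row."""
    return con.execute("""
        SELECT o.site_id, o.variable_id, o.obs_time
        FROM observation AS o
        LEFT JOIN site AS s ON s.site_id = o.site_id
        LEFT JOIN variable AS v ON v.variable_id = o.variable_id
        WHERE s.site_id IS NULL OR v.variable_id IS NULL
        ORDER BY o.obs_time
    """).fetchall()


def joined_observations(con):
    """Observations linked to their reference data, ready for XML mapping."""
    return con.execute("""
        SELECT s.name, v.label_en, o.obs_time, o.value
        FROM observation AS o
        JOIN site AS s ON s.site_id = o.site_id
        JOIN variable AS v ON v.variable_id = o.variable_id
        ORDER BY o.obs_time
    """).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print("Orphans:", orphan_observations(con))
    print("Joined:", joined_observations(con))
```

Running the orphan check before the join shows which submitted files need their reference data corrected first.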
Data Publication System Architecture
ETL
Data Request: COMP Viewer & Conversion
Web Services (controlled, user-driven quality data products)
(prepping & cleaning within DB)
Master Data Recast
(conversion & migration transactions)
Reporting: data profiling, QA/QC, internal business needs, data process logs
COMP WaterML 2.0 XML
Data Sources …n
XLS COMP Templates (data entry) + SME (NatChem)
NARSTO Parser using FME
Reader: TEXTLINE (line by line)
Transformers:
StringSearcher, StringReplacer, AttributeSplitter, ListExploder, ListSearcher, AttributeTrimmer, AttributeRemover, NARSTOFileMetadata (custom)
Writer: MSSQL tables ->
File header, observation, site, and lookup tables of NARSTO information
The FME workbench was HUGE!
Mostly due to the complexity of the custom NARSTO structure.
Lists were my friend.
Able to perform batch imports on folders!
Once run, able to query and validate across files within MSSQL.
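For readers without FME, the line-by-line approach can be approximated in Python. This is a rough sketch, not the FME workbench: the keyword markers (`*TABLE NAME`, `*TABLE COLUMN NAME`, `*TABLE BEGINS`, `*TABLE ENDS`) and the sample file are simplified assumptions for illustration, and real NARSTO files are considerably more complex.

```python
import csv
import io

# Simplified, made-up sample. The keyword markers are assumptions used to
# illustrate the parsing pattern, not a faithful copy of the NARSTO layout.
SAMPLE = """*FILE NAME,demo.csv
*CONTACT,Jane Doe
*TABLE NAME,Observations
*TABLE COLUMN NAME,Site ID,Date,Value
*TABLE BEGINS
S01,2010-01-01,1.25
S01,2010-01-02,0.75
*TABLE ENDS
"""


def parse_file(text):
    """Split a file into header key/value pairs and a list of tables."""
    header = {}
    tables = []
    current = None      # table currently being described or filled
    in_table = False    # True between the begin and end markers
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        key = row[0].strip()
        if key == "*TABLE NAME":
            current = {"name": row[1].strip(), "columns": [], "rows": []}
            tables.append(current)
        elif key == "*TABLE COLUMN NAME" and current is not None:
            current["columns"] = [c.strip() for c in row[1:]]
        elif key == "*TABLE BEGINS":
            in_table = True
        elif key == "*TABLE ENDS":
            in_table = False
            current = None
        elif in_table and current is not None:
            # Data record: pair each value with its column name.
            values = [c.strip() for c in row]
            current["rows"].append(dict(zip(current["columns"], values)))
        elif key.startswith("*"):
            # Any other starred line is treated as file-level header metadata.
            header[key.lstrip("*")] = row[1].strip() if len(row) > 1 else ""
    return header, tables


if __name__ == "__main__":
    header, tables = parse_file(SAMPLE)
    print(header)
    print(tables[0]["name"], tables[0]["rows"])
```

Batch import over a folder would then be a loop over the files, writing each header and table to the database.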
Database System View
START
1. NARSTO files (CSV)
A. Custom file parser & batch file importer
Create tables: sites, file header, variable name, lookup tables, & observations
2. Query data content across imported TXT files
B. List values parsing / table schema extraction
3. Create tables: sites & observations
C. Create QA/QC tables (reports)
4. Data consolidation & assessment
D. Data value consolidation & assessment
E. Reference data creation
5a. Clean reference values
5b. Upload reference COMP templates (terms, contacts)
6a. Join reference values
6b. Join reference files
F. Build & map XML
7. Store XML
G. Publish XML to website
8. COMP Viewer (XML)
FINISH
Data quality feedback loop …n
ETL
Data Request: COMP Viewer (internal)
Web Services (controlled, user-driven quality data products)
(prepping & cleaning within DB)
Master Data Recast
(conversion & migration transactions)
Reporting: data profiling, QA/QC, internal business needs
COMP WaterML 2.0 XML
Data Sources …n
XLS COMP Templates (data entry) + SME
Program QA/QC
Data quality improvement process feedback loop…
Remember This?
COMP Logical Model (WaterML 2.0)
Mapping tables to WaterML and storing them.
…n
Semantic Web Data Uniform Resource Identifiers (URIs):
<om:name
xlink:href="../def/natchem/1-0/natchem-skos.rdf#ObservationType"
xlink:title="Category Parameter"
owns="false"
xlink:type="simple"
/>
Links to semantic values:
</skos:Concept>
<skos:Concept rdf:about="http://intranet.ec.gc.ca/donnees-data/comp/def/natchem/1-0/natchem-skos.rdf#ObservationType">
<skos:prefLabel xml:lang="en-CA">Observation type</skos:prefLabel>
<skos:prefLabel xml:lang="fr-CA">Type d'observation</skos:prefLabel>
<skos:inScheme rdf:resource="http://intranet.ec.gc.ca/donnees-data/comp/def/natchem/1-0/natchem-skos.rdf" />
</skos:Concept>
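How a viewer could resolve such a reference to a bilingual label can be sketched with Python's standard library. This is an illustration, not the COMP Viewer's XSLT: the concept is the one shown above, wrapped in a minimal `rdf:RDF` root added for the example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# ElementTree exposes namespaced attributes in {uri}name form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

CONCEPT_URI = (
    "http://intranet.ec.gc.ca/donnees-data/comp/def/natchem/1-0/"
    "natchem-skos.rdf#ObservationType"
)

# The concept from the slide, wrapped in a minimal rdf:RDF root for parsing.
SKOS_DOC = f"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="{CONCEPT_URI}">
    <skos:prefLabel xml:lang="en-CA">Observation type</skos:prefLabel>
    <skos:prefLabel xml:lang="fr-CA">Type d'observation</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_label(skos_xml, concept_uri, lang):
    """Return the preferred label of a concept in the given language, or None."""
    root = ET.fromstring(skos_xml)
    for concept in root.findall("skos:Concept", NS):
        if concept.get(RDF_ABOUT) != concept_uri:
            continue
        for label in concept.findall("skos:prefLabel", NS):
            if label.get(XML_LANG) == lang:
                return label.text
    return None


if __name__ == "__main__":
    print(pref_label(SKOS_DOC, CONCEPT_URI, "en-CA"))
    print(pref_label(SKOS_DOC, CONCEPT_URI, "fr-CA"))
```

Because the label lives in the taxonomy rather than in the data file, a term can be corrected or translated once and every file that references it picks up the change.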
Unexpected Challenges: Converting Tabular Values to Semantic Web Data
Due to the complexity of the source data and the huge volume of descriptive reference data, the transformations required:
Lots of StringSearchers & StringReplacers to swap tabular values for the URI of the reference on the web.
Lots of FeatureMergers (>100), due to source data complexity.
With semantic web values, you have to deal with relative vs. absolute URI paths.
Where do all these values go within the WaterML 2.0 logical components? XMLTemplater was a big help!
Spread across many workbenches (~20 .fmw files).
Overall, lots of time and effort reworking the data and transformations and working with the Program to ensure quality: ~6 months of effort.
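The relative vs. absolute URI issue can be illustrated with Python's `urllib.parse.urljoin`, which resolves a relative reference against the location of the document that contains it. The relative `href` below is the one from the COMP XML fragment shown earlier. The document location and the lookup table are hypothetical, chosen only so the example resolves.

```python
from urllib.parse import urljoin

# Hypothetical location of a published COMP XML file (an assumption for this sketch).
DOCUMENT_URL = "http://intranet.ec.gc.ca/donnees-data/comp/data/example.xml"

# Hypothetical lookup from a tabular value to the relative URI of its SKOS concept.
TERM_TO_HREF = {
    "Observation type": "../def/natchem/1-0/natchem-skos.rdf#ObservationType",
}


def to_absolute(term, document_url=DOCUMENT_URL):
    """Replace a tabular term with the absolute URI of its taxonomy concept."""
    return urljoin(document_url, TERM_TO_HREF[term])


if __name__ == "__main__":
    print(to_absolute("Observation type"))
```

Relative references keep files portable between intranet and public servers, but every consumer then has to resolve them against the file's own location, as this sketch does.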
Benefits of Using FME
The FME workspace transformation diagram helps communicate required areas of improvement back to data owners.
Similar to a data model diagram, it can demonstrate data transformation complexities and issues.
Once workbenches are set up, Programs can run them as new or updated data arrives.
Improved overall data quality management and reporting.
Supports all data consumers' needs for air quality data, now and in the future.
Next Steps…
API: WFS Service
Query / Response: COMP XML payload
Audience: EC, GOC, International
Built-in functionality: COMP Viewer, data point downloads
Data Warehouse: XML-relational hybrid
Query dimensions: temporal extent, spatial extent, sites, variables, techniques
Indexed SQL tables pointing to XML CLOBs
Query-specific collections of COMP components are assembled on the fly for the API
FME Server
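One way to picture the XML-relational hybrid is an indexed table of query dimensions whose rows point at stored XML fragments. The sketch below is speculative: it uses `sqlite3` in place of the actual data warehouse, and the schema, the `<collection>` wrapper, and the sample rows are invented for the example.

```python
import sqlite3


def build_index():
    """Create a made-up index table pointing to XML fragments stored as text (CLOBs)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE comp_component (
            component_id INTEGER PRIMARY KEY,
            xml_clob TEXT NOT NULL
        );
        CREATE TABLE comp_index (
            site TEXT, variable TEXT, t_start TEXT, t_end TEXT,
            component_id INTEGER REFERENCES comp_component (component_id)
        );
        CREATE INDEX idx_comp ON comp_index (site, variable, t_start, t_end);

        INSERT INTO comp_component VALUES (1, '<c id="a"/>');
        INSERT INTO comp_component VALUES (2, '<c id="b"/>');
        INSERT INTO comp_component VALUES (3, '<c id="c"/>');

        INSERT INTO comp_index VALUES ('S01', 'VAR1', '2010-01-01', '2010-12-31', 1);
        INSERT INTO comp_index VALUES ('S01', 'VAR1', '2011-01-01', '2011-12-31', 2);
        INSERT INTO comp_index VALUES ('S02', 'VAR1', '2010-01-01', '2010-12-31', 3);
    """)
    return con


def assemble(con, site, variable, t_from, t_to):
    """Collect the XML fragments whose time extent overlaps the requested window."""
    rows = con.execute(
        """
        SELECT c.xml_clob
        FROM comp_index AS i
        JOIN comp_component AS c ON c.component_id = i.component_id
        WHERE i.site = ? AND i.variable = ?
          AND i.t_start <= ? AND i.t_end >= ?
        ORDER BY i.t_start
        """,
        (site, variable, t_to, t_from),
    ).fetchall()
    # The wrapper element is a placeholder for whatever payload the API returns.
    return "<collection>" + "".join(r[0] for r in rows) + "</collection>"


if __name__ == "__main__":
    con = build_index()
    print(assemble(con, "S01", "VAR1", "2010-06-01", "2011-06-01"))
```

The relational index answers the query quickly, while the XML fragments stay intact and valid, so no XML has to be regenerated per request.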
Thank You!
Questions?
For more information:
James Doyle - [email protected]
Christopher Thorne – [email protected]