Toward Semantic Sensor Data Archives on the Web
Jean-Paul Calbimonte – Karl Aberer
LSIR EPFL
MEPDAW, ESWC
Heraklion, Greece. June 2016
@jpcik
Sensor Data on the Web
http://mesowest.utah.edu/
http://earthquake.usgs.gov/earthquakes/feed/v1.0/
http://swiss-experiment.ch

• Monitoring
• Alerts
• Notifications
• Hourly/daily updates

• Myriad of formats
• Ad-hoc access points
• Informal description
• Convention-semantics
• Uneven use of standards
• Manual exploration
Sensor Archives: Challenges
Discoverability:
• Subject of sensing identified and searchable
• Explicit semantics on the sensor metadata
• Common understanding of the objects of sensing
• Agreed models, e.g. ontologies
Storage:
• Persistence not always required
• Sensor data is (sometimes) consumed live
• Aggregations stored permanently
• Different archival options available
• Reduce volume as much as possible, using compressed formats
• Querying and transactional requirements often less critical
• Silos of sensor data in the form of compressed files
• Replication or backup
Sensor Archives: Challenges
Reusability:
• Reusing the data for other purposes
• Compare data from other locations
• Use for calibration purposes
• Finding correlations
• Historical and batch analysis
• Benchmarking
• Training datasets for mining algorithms
• Feed numerical models
Accessibility:
• Data access through APIs
• Consumption by people and software applications
• De-referenceable URIs
• Simple but effective retrieval of sensor data
• SPARQL for selecting relevant parts of the data
• Complex queries not always required
• Simple time intervals and filters are often enough
Interoperability & Standardization:
• RDF/SPARQL: building blocks for publishing data
• Specific ontologies and vocabularies, such as the SSN ontology
• Represent both sensor metadata and observations
Sensor Data & Linked Data
Zip Files
Number of Triples
Example: Nevada dataset
• 7.86 GB in N-Triples format
• 248 MB zipped
An example: Linked Sensor Data
http://wiki.knoesis.org/index.php/LinkedSensorData
Sensor Data & Linked Data
<http://knoesis.wright.edu/ssw/MeasureData_Precipitation_4UT01_2003_3_31_5_10_00> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#MeasureData> .
<http://knoesis.wright.edu/ssw/MeasureData_Precipitation_4UT01_2003_3_31_5_10_00> <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#floatValue> "30.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://knoesis.wright.edu/ssw/MeasureData_Precipitation_4UT01_2003_3_31_5_10_00> <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#uom> <http://knoesis.wright.edu/ssw/ont/weather.owl#centimeters> .
<http://knoesis.wright.edu/ssw/Observation_Precipitation_4UT01_2003_3_31_5_10_00> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://knoesis.wright.edu/ssw/ont/weather.owl#PrecipitationObservation> .
<http://knoesis.wright.edu/ssw/Observation_Precipitation_4UT01_2003_3_31_5_10_00> <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#observedProperty> <http://knoesis.wright.edu/ssw/ont/weather.owl#_Precipitation> .
<http://knoesis.wright.edu/ssw/Observation_Precipitation_4UT01_2003_3_31_5_10_00> <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#procedure> <http://knoesis.wright.edu/ssw/System_4UT01> .
<http://knoesis.wright.edu/ssw/Observation_Precipitation_4UT01_2003_3_31_5_10_00> <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#samplingTime> <http://knoesis.wright.edu/ssw/Instant_2003_3_31_5_10_00> .
<http://knoesis.wright.edu/ssw/Instant_2003_3_31_5_10_00> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2006/time#Instant> .
<http://knoesis.wright.edu/ssw/Instant_2003_3_31_5_10_00> <http://www.w3.org/2006/time#inXSDDateTime> "2003-03-31T05:10:00-07:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
What do we get in these datasets?
Nice triples
Do we care about all the rest?
What is measured?
Measurement
Unit
Sensor
When is it measured?
Semantic Sensor Data Archives
How to address these challenges?
Discoverability
Reusability
Accessibility
Interoperability & Standardization
Storage
How to use existing Semantic Web technologies appropriately?
Need for new standards and techniques?
Localization: GNSS fused with odometry
GPRS
• packet parser
• system logging
• database server
• GPS interpolation
• advanced filtering
• fault detection
• system health monitor
• automatic reporting
10 buses in Lausanne
CO, NO2, O3, CO2, UFP, temperature, humidity
OpenSense2 @ Lausanne
Reference station
Crowd sensing
Public transportation
Raw Data Acquisition
Air Pollutants Time Series
Temporal and Spatial Aggregations
Pollution Maps
Pollution Models
Air Quality Recommendations
Health Studies
Air Quality Products & Applications
From Sensing to Actionable Data
Running example for discussing a Semantic Sensor Data Archive
An Architecture for a Sensor Archive
Disclaimer: Work in Progress
• RDF for sensor and catalog metadata
• Native format for sensor observations (time series)
• CSV archive for sensor observations
• RDF-unpack of CSV archived data
• Mappings for live transformation from native format to RDF
Data characteristics
Sensor data characteristics
Sensor data regularity
• Raw sensor data typically collected as time series
• Very regular structure
• Patterns can be exploited
E.g. mobile NO2 sensor readings
29-02-2016T16:41:24,47,369,46.52104,6.63579
29-02-2016T16:41:34,47,358,46.52344,6.63595
29-02-2016T16:41:44,47,354,46.52632,6.63634
29-02-2016T16:41:54,47,355,46.52684,6.63729
...
Sensor data order
• Order of sensor data is crucial
• Time is the key attribute for establishing an order among the data items
• Important for indexing
• Enables efficient time-based selection, filtering and windowing
Columns: Timestamp, Sensor, Observed Value, Coordinates
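The point about time-based selection can be illustrated with a minimal Python sketch. It assumes the rows are already sorted by timestamp and uses the four sample rows above; the function names `parse_row` and `select_window` are hypothetical, not part of the OpenSense system. A binary search over the timestamps finds the window boundaries without scanning every row.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Sample rows in the layout shown above: timestamp, sensor, value, two coordinates.
ROWS = """29-02-2016T16:41:24,47,369,46.52104,6.63579
29-02-2016T16:41:34,47,358,46.52344,6.63595
29-02-2016T16:41:44,47,354,46.52632,6.63634
29-02-2016T16:41:54,47,355,46.52684,6.63729"""

TIME_FORMAT = "%d-%m-%YT%H:%M:%S"  # day-month-year, as in the sample


def parse_row(line):
    """Split one CSV line into typed fields (hypothetical helper)."""
    ts, sensor, value, c1, c2 = line.split(",")
    return (datetime.strptime(ts, TIME_FORMAT), int(sensor),
            float(value), float(c1), float(c2))


def select_window(times, observations, start, end):
    """Return observations with start <= timestamp <= end.

    Assumes `times` is sorted and aligned index-by-index with `observations`.
    """
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    return observations[lo:hi]


observations = [parse_row(line) for line in ROWS.splitlines()]
times = [obs[0] for obs in observations]  # time index, built once

window = select_window(times, observations,
                       datetime(2016, 2, 29, 16, 41, 30),
                       datetime(2016, 2, 29, 16, 41, 50))
for obs in window:
    print(obs)
```

Because the data arrives in time order, the index comes for free; an archive that stores observations out of order would have to sort or index them first.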
Sensor Dataset Metadata
:sensorCatalog a dcat:Catalog ;
    dct:title "OpenSense data catalog" ;
    dct:language iso639-1:en ;
    dct:publisher :LSIR-EPFL ;
    foaf:homepage <http://opensense.epfl.ch/data/> ;
    dcat:dataset :geo-osanm, :geo-osfpm, :geo-oso3m .

:geo-osanm-csv a dcat:Distribution ;
    dcat:downloadURL <http://opensense.epfl.ch/data/api/sensors/geo_osanm> ;
    dct:title "CSV distribution of NO2 measurements" ;
    dcat:mediaType "text/csv" ;
    dcat:byteSize "5534530"^^xsd:decimal .
• Dataset distribution: different accessible formats
• Multiple distributions for the same dataset
Using DCAT
• W3C Recommendation
• Organizing the sensor archive in datasets
Sensor Dataset Metadata
:geo-osanm a dcat:Dataset ;
    dct:title "OpenSense NO2 measurements" ;
    dcat:theme :NO2 ;
    dct:issued "2015-12-05"^^xsd:date ;
    dct:temporal g-interval:1977-11-01T12:22:45/P1Y ;
    dct:spatial <http://www.geonames.org/6695072> ;
    dct:publisher :LSIR-EPFL ;
    dct:accrualPeriodicity sdmx:freq-W ;
    ssn:isProducedBy :NO2VsensorBox ;
    dcat:distribution :geo-osanm-csv .

:NO2VsensorBox a ssn:Sensor ;
    rdfs:label "NO2 Virtual Sensor Lausanne" ;
    ssn:observes :NO2 ;
    ssn:hasMeasurementCapability [
        a ssn:Accuracy ;
        ssn:forProperty :NO2 ;
        ssn:inCondition ... ;
        ssn:hasValue ...
    ] .
Using DCAT + SSN
• W3C Recommendation
• Dataset description
• Sensor description
• Observed property
• Feature of interest
• Accuracy
• Measurement capabilities
• Location, extension, context
Semantic Sensor Network Ontology
[Diagram: overview of the SSN ontology and related vocabularies]
Classes: ssn:Sensor, ssn:SensingDevice, ssn:Platform, ssn:Deployment, ssn:FeatureOfInterest, ssn:Property, ssn:MeasurementCapability, ssn:MeasurementProperty, ssn:Accuracy, ssn:Latency
Properties: ssn:observes, ssn:onPlatform, ssn:inDeployment, ssn:ofFeature, ssn:hasMeasurementProperty, dul:hasLocation
Location: dul:Place, geo:lat, geo:lng (xsd:double)
Domain terms: aws:TemperatureSensor, aws:Thermistor, aws:CapacitiveBead, dim:Temperature, dim:VelocityOrSpeed, qu:QuantityKind, cf-prop:air_temperature, cf-prop:soil_temperature, cf-prop:wind_speed, cf-prop:rainfall_rate, cf-feat:Wind, cf-feat:Surface, cf-feat:Medium, cf-feat:air, cf-feat:soil
Sensor Observations
:no2obs1 a :NO2Observation ;
    ssn:observedProperty :NO2 ;
    ssn:featureOfInterest aq:AirMedium ;
    ssn:observedBy :NO2SensorBox ;
    ssn:observationResult :no2obs1result ;
    ssn:observationResultTime :instant_20160331232000 .

:no2obs1result a :NO2ObservationValue ;
    qu:numericalValue "345.00"^^xsd:float ;
    qu:unit :ppm .

:instant_20160331232000 a time:Instant ;
    time:inXSDDateTime "2016-03-31T23:20:00"^^xsd:dateTime .
Type of Measurement
Sensor
Observed Value
Unit
Generated only on demand through mappings
R2RML Mappings
:ObsValueMap
    rr:subjectMap [
        rr:template "http://opensense.epfl.ch/data/ObsResult_NO2_{sensor}_{time}"
    ] ;
    rr:predicateObjectMap [
        rr:predicate qu:numericalValue ;
        rr:objectMap [ rr:column "no2" ; rr:datatype xsd:float ]
    ] ;
    rr:predicateObjectMap [
        rr:predicate obs:uom ;
        rr:objectMap [ rr:parentTriplesMap :UnitMap ]
    ] .

:ObservationMap
    rr:subjectMap [
        rr:template "http://opensense.epfl.ch/data/Obs_NO2_{sensor}_{time}"
    ] ;
    rr:predicateObjectMap [
        rr:predicate ssn:observedProperty ;
        rr:objectMap [ rr:constant opensense:NO2 ]
    ] ;
URI of subject
URI of predicate
Object: column name
Column names in a template
Can be used for mapping both databases and CSVs
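To make the on-demand generation concrete, here is a minimal Python sketch of what applying the two mappings to one CSV row could look like. It is a hand-rolled illustration, not an R2RML processor: the templates, column names (`sensor`, `time`, `no2`) and predicates come from the mappings above, while the sample row, the function names and the use of prefixed names instead of full IRIs are assumptions made for brevity.

```python
import csv
import io

# Subject templates copied from the mappings above.
RESULT_TEMPLATE = "http://opensense.epfl.ch/data/ObsResult_NO2_{sensor}_{time}"
OBS_TEMPLATE = "http://opensense.epfl.ch/data/Obs_NO2_{sensor}_{time}"

# Hypothetical CSV content using the column names the mappings refer to.
CSV_DATA = "time,sensor,no2\n20160331232000,47,345.00\n"


def row_to_triples(row):
    """Yield Turtle-style triples (with prefixed names) for one CSV row."""
    result = "<" + RESULT_TEMPLATE.format(**row) + ">"
    obs = "<" + OBS_TEMPLATE.format(**row) + ">"
    # rr:column "no2" with rr:datatype xsd:float -> typed literal
    yield f'{result} qu:numericalValue "{row["no2"]}"^^xsd:float .'
    # rr:constant opensense:NO2 -> same object for every row
    yield f"{obs} ssn:observedProperty opensense:NO2 ."


def unpack(csv_text):
    """Expand every row of a CSV document into triples, lazily."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield from row_to_triples(row)


triples = list(unpack(CSV_DATA))
for t in triples:
    print(t)
```

Since `unpack` is a generator, triples are produced only when a consumer asks for them, which matches the idea of keeping the compact CSV as the stored form. A conforming R2RML engine would additionally percent-encode template values and resolve `rr:parentTriplesMap` joins, both of which this sketch leaves out.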
Discussion: Preliminary Experimentation
E.g. comparing with ERI (RDF data compression): what is the resulting size, and how long does it take?
Live filtering: how long do we wait to get the data?
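One way such a size and time comparison could be set up is sketched below in Python. Everything here is an assumption for illustration: the rows are synthetic (not OpenSense data), the `http://example.org/` URIs are placeholders, and gzip stands in for a generic compressor; it is not ERI. The sketch reports no results, it only shows the measurement harness.

```python
import gzip
import time


def synthetic_csv(n):
    """Generate n synthetic rows: epoch timestamp, sensor id, value."""
    lines = [f"{1456764084 + 10 * i},47,{350 + i % 20}" for i in range(n)]
    return "\n".join(lines) + "\n"


def to_triples(csv_text):
    """Expand each CSV row into three N-Triples-like lines (placeholder URIs)."""
    out = []
    for line in csv_text.splitlines():
        ts, sensor, value = line.split(",")
        subject = f"<http://example.org/obs/{sensor}_{ts}>"
        out.append(f'{subject} <http://example.org/value> "{value}" .')
        out.append(f'{subject} <http://example.org/time> "{ts}" .')
        out.append(f"{subject} <http://example.org/sensor> "
                   f"<http://example.org/sensor/{sensor}> .")
    return "\n".join(out) + "\n"


def measure(text):
    """Return raw size, gzip size and compression time for a text payload."""
    raw = text.encode("utf-8")
    start = time.perf_counter()
    packed = gzip.compress(raw)
    elapsed = time.perf_counter() - start
    return {"raw_bytes": len(raw), "gzip_bytes": len(packed), "seconds": elapsed}


csv_text = synthetic_csv(1000)
csv_stats = measure(csv_text)
rdf_stats = measure(to_triples(csv_text))
print("CSV:", csv_stats)
print("RDF:", rdf_stats)
```

Running the same harness over real archive files and over an RDF-specific compressor would give the numbers the discussion asks for.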
CSV on the Web Standards
{
  "@context": ["http://www.w3.org/ns/csvw", ... ],
  "tableSchema": {
    "columns": [
      {
        "name": "no2",
        "titles": "NO2 concentration",
        "aboutUrl": "ObsResult_NO2_{sensor}_{time}",
        "propertyUrl": "qu:numericalValue"
      },
      {
        "name": "sensor",
        "titles": "Bus sensor",
        "aboutUrl": "Obs_NO2_{sensor}_{time}",
        "propertyUrl": "ssn:observedBy",
        "valueUrl": "Sensor_{sensor}"
      },
      {
        "name": "obsProperty",
        "virtual": true,
        "aboutUrl": "Obs_NO2_{sensor}_{time}",
        "propertyUrl": "ssn:observedProperty",
        "valueUrl": "opensense:NO2"
      }
    ]
  }
}
http://www.w3.org/TR/csv2rdf/
URI of subject
Predicate
URI Value
Convenient alternative to R2RML mappings?
Constant URI
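The roles of `aboutUrl`, `propertyUrl` and `valueUrl` in the metadata above can be sketched in a few lines of Python. This is a simplification under stated assumptions: the CSVW specification uses URI Templates and resolves relative URLs against the table URL, whereas this sketch uses plain `str.format` and leaves the URLs relative. The sample row, including its `time` field, and the function name `expand_row` are hypothetical.

```python
# Column descriptions copied from the metadata above ("titles" omitted).
COLUMNS = [
    {"name": "no2",
     "aboutUrl": "ObsResult_NO2_{sensor}_{time}",
     "propertyUrl": "qu:numericalValue"},
    {"name": "sensor",
     "aboutUrl": "Obs_NO2_{sensor}_{time}",
     "propertyUrl": "ssn:observedBy",
     "valueUrl": "Sensor_{sensor}"},
    {"name": "obsProperty", "virtual": True,
     "aboutUrl": "Obs_NO2_{sensor}_{time}",
     "propertyUrl": "ssn:observedProperty",
     "valueUrl": "opensense:NO2"},
]


def expand_row(row, columns=COLUMNS):
    """Return (subject, predicate, object) tuples for one CSV row.

    A column with a valueUrl yields a URI object; otherwise the cell value
    is used as a literal. Virtual columns have no cell, so they only make
    sense with a valueUrl (a constant URI in this example).
    """
    triples = []
    for col in columns:
        subject = col["aboutUrl"].format(**row)
        if "valueUrl" in col:
            obj = col["valueUrl"].format(**row)
        else:
            obj = row[col["name"]]
        triples.append((subject, col["propertyUrl"], obj))
    return triples


# Hypothetical row; the metadata references {time}, so a time field is assumed.
row = {"time": "20160331232000", "sensor": "47", "no2": "345.00"}
for triple in expand_row(row):
    print(triple)
```

Compared with the R2RML mapping, the per-column layout keeps subject, predicate and object templates next to the column they describe, which may be why it reads as a convenient alternative for CSV-only archives.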