15/03/2013 Research Skills, MSc ITEC Rui M. Vieira 1
Linked Open Data
State of the art, challenges and applications
Part of the Linking Open Data (LOD) Project Cloud Diagram
Open Data
● What is Open Data?
– Non-proprietary and standard format
– Machine computable
– Non-discriminatory
What is Linked Open Data?
● Open Data distributed over the network
● Standardised format
– Resource Description Framework (RDF) as data model
– SPARQL for queries
● RDF "triples" serialised as
– RDF/XML, N3 or many other standards
● Semantics and ontologies
● Architecture similar to the WWW
– Hypertext Transfer Protocol (HTTP)
– Domain Name System (DNS)
– Uniform Resource Identifiers (URIs)
N3 RDF example:
@prefix a: <http://example.com/2013/02/A#> .
@prefix b: <http://example.net/2013/02/B#> .
@prefix eg: <http://example.com/> .
eg:sharedContent a:externalContent <http://somewhere/RDF/ont.owl> .
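To make the triple model concrete, here is a minimal Python sketch that holds the statement from the N3 example as a (subject, predicate, object) tuple and matches it against a pattern, roughly what a SPARQL basic graph pattern does. The `match` helper and the in-memory set are illustrative assumptions, not part of any RDF library.

```python
# Hypothetical in-memory triple store; all names here are illustrative.
A = "http://example.com/2013/02/A#"
EG = "http://example.com/"

# The single statement from the N3 example, as (subject, predicate, object).
triples = {
    (EG + "sharedContent", A + "externalContent", "http://somewhere/RDF/ont.owl"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly: SELECT ?o WHERE { eg:sharedContent a:externalContent ?o }
objects = [
    o for (_, _, o) in match(triples, s=EG + "sharedContent", p=A + "externalContent")
]
```

A real deployment would use an RDF parser and a SPARQL engine; the sketch only shows that a query is a pattern with some positions left open.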
Linked Open Data
(Diagram: a data consumer retrieves data over HTTP from three linked providers)
● Newcastle City Council
– Pollution data: City: Newcastle, Country: UK, Values: …, Units, Measurements: …, Compound: …
● UK government
– Government data: City: Newcastle, Population: …, Lat: …, Long: …
● World Health Organisation
– Exposure levels: Compound, Max Levels: … (more data)
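The linking idea in this diagram can be sketched in a few lines of Python: three providers publish triples separately, and a consumer merges them and follows a shared identifier from one dataset into another. The `ex:` names, the property names and the `"..."` placeholder values are assumptions standing in for the figures elided on the slide.

```python
# Placeholder values ("...") stand in for figures elided on the slide.
# All identifiers (ex:Newcastle, ex:CompoundX, ...) are hypothetical.
council = {
    ("ex:Newcastle", "ex:country", "UK"),
    ("ex:Newcastle", "ex:measuredCompound", "ex:CompoundX"),
}
government = {
    ("ex:Newcastle", "ex:population", "..."),
}
who = {
    ("ex:CompoundX", "ex:maxLevel", "..."),
}


def merge(*graphs):
    """Union the triple sets published by several providers."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect predicate -> object for one subject (one value per predicate)."""
    return {p: o for (s, p, o) in graph if s == subject}


linked = merge(council, government, who)
city = describe(linked, "ex:Newcastle")
# Follow the link from the council's data into the WHO dataset.
limit = describe(linked, city["ex:measuredCompound"])
```

Because the providers reuse the same identifiers, the merge is a plain set union and no provider needs to know about the others.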
Issues
● Data quality
– Syntax errors, invalid semantics
● Scalability
– Unavailability, unresponsiveness
● Security
– Trust, encryption
● Currency
– Temporal context, versioning
● Aggregation
– Co-referencing, source
Data Quality
● Tim Berners-Lee's LOD principles
● Syntactically valid
– Several validators
  ● RDFAlerts, Vapour
– Frameworks to build your own tools
  ● Redland RDF (C library to build RDF parsers)
– Validation allows metrics on conformance to be gathered
  ● Dataset scoring
(Diagram: data → validator → metrics)
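As an illustration of the data → validator → metrics flow, the following Python sketch checks lines against a deliberately simplified N-Triples-like pattern and reports a conformance score. The regular expression covers only a small subset of the real grammar, and the sample lines and the `conformance` function are assumptions made for this example; tools such as those named above do far more.

```python
import re

# Simplified pattern: <subject> <predicate> (<object> | "literal") .
# Assumption: a small N-Triples-like subset, not the full grammar.
TRIPLE = re.compile(r'^<[^<>\s]+>\s+<[^<>\s]+>\s+(<[^<>\s]+>|"[^"]*")\s*\.$')


def conformance(lines):
    """Return (valid_count, total_count, score) over the non-empty lines."""
    statements = [ln.strip() for ln in lines if ln.strip()]
    valid = sum(1 for ln in statements if TRIPLE.match(ln))
    total = len(statements)
    score = valid / total if total else 1.0
    return valid, total, score


sample = [
    '<http://example.com/a> <http://example.com/p> <http://example.com/b> .',
    '<http://example.com/a> <http://example.com/name> "A" .',
    '<http://example.com/a> <http://example.com/p> .',    # missing object
    'http://example.com/a <http://example.com/p> "x" .',  # subject not in <>
]
result = conformance(sample)
```

The score could feed a dataset ranking; how to weight different kinds of error is left open here.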
Data Quality
● Semantically valid
– Use standard vocabularies and ontologies when possible
– Use domain-specific vocabularies:
  ● e.g. SensorML for data-in-motion
  ● Many available for medicine, bioinformatics, etc.
  ● Linked Open Vocabularies database
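One simple semantic check, sketched below in Python, is to flag predicates that do not come from a known vocabulary. The namespace list is a small hand-picked sample (RDF, RDFS, FOAF, Dublin Core Terms), and the sample data and the `unknown_predicates` helper are assumptions for illustration; a fuller check would consult a vocabulary registry.

```python
# A few widely used vocabulary namespaces; the selection is illustrative.
KNOWN_VOCABULARIES = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://xmlns.com/foaf/0.1/",
    "http://purl.org/dc/terms/",
)


def unknown_predicates(triples):
    """Return the sorted predicates that fall outside the known vocabularies."""
    return sorted(
        {p for (_, p, _) in triples if not p.startswith(KNOWN_VOCABULARIES)}
    )


# Hypothetical sample data.
data = [
    ("http://example.com/People/A", "http://xmlns.com/foaf/0.1/name", "A"),
    ("http://example.com/People/A", "http://example.com/vocab/nick", "a"),
]
flagged = unknown_predicates(data)
```

A flagged predicate is not necessarily wrong; it only signals a term that consumers are less likely to understand.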
Scalability
● Unavailable or unresponsive providers
– Non-dereferenceable resources mean missing data
– LOD uses HTTP and web services (WS)
  ● HTTP status codes to convey more information
– Redirect to other resources
  ● Caching
  ● Dataset dumps
– Use analytics to provide useful information
  ● Profile servers
  ● Tune servers
  ● Plan scalability
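The status-code, redirect and caching points can be sketched as a small dereferencing routine in Python. To keep it runnable without a network, the HTTP call is passed in as a `fetch` function returning `(status, headers, body)`; that interface, the `fake_fetch` stand-in and its URIs are assumptions made for this example.

```python
def dereference(uri, fetch, cache, max_redirects=5):
    """Resolve uri to a body, or None when the resource is unavailable."""
    if uri in cache:
        return cache[uri]  # served locally, no load on the provider
    current = uri
    for _ in range(max_redirects + 1):
        status, headers, body = fetch(current)
        if status == 200:
            cache[uri] = body
            return body
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            current = headers["Location"]  # follow the redirect
            continue
        return None  # e.g. 404 or 503: treat as missing data
    return None  # too many redirects


calls = []


def fake_fetch(uri):
    """Stand-in for an HTTP client; records each request it receives."""
    calls.append(uri)
    pages = {
        "http://example.com/People/A": (303, {"Location": "http://example.com/data/A"}, ""),
        "http://example.com/data/A": (200, {}, "data about A"),
    }
    return pages.get(uri, (404, {}, ""))


cache = {}
first = dereference("http://example.com/People/A", fake_fetch, cache)
second = dereference("http://example.com/People/A", fake_fetch, cache)
missing = dereference("http://example.com/People/B", fake_fetch, cache)
```

The second lookup is answered from the cache, so only three requests reach the provider in total.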
Security
● Trust
– Community scoring of datasets and providers
– Provenance
  ● Watermarking
– All attributes can be in meta-data
● Encrypted channels
– SSL, certificates to ensure provenance
(Diagram: a consumer connected to one trusted provider and to two providers marked "Trusted?")
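Community scoring combined with provenance meta-data could look like the Python sketch below: each statement carries the name of its provider, and the consumer keeps only statements from providers whose average rating passes a threshold. The ratings, the threshold and both function names are hypothetical values chosen for illustration.

```python
from statistics import mean


def trust_score(ratings):
    """Average community rating in [0, 1]; unrated providers score 0."""
    return mean(ratings) if ratings else 0.0


def trusted_statements(statements, ratings_by_provider, threshold=0.5):
    """Keep triples whose provider (the provenance) meets the threshold."""
    kept = []
    for triple, provider in statements:
        if trust_score(ratings_by_provider.get(provider, [])) >= threshold:
            kept.append(triple)
    return kept


# Hypothetical ratings and statements, each tagged with its provider.
ratings = {"example.com": [1.0, 0.8], "example.net": [0.2, 0.4]}
statements = [
    (("ex:A", "ex:name", "A"), "example.com"),
    (("ex:A", "ex:name", "B"), "example.net"),
    (("ex:A", "ex:name", "C"), "example.org"),  # no ratings at all
]
kept = trusted_statements(statements, ratings)
```

This only addresses trust; an encrypted channel with certificate checks would still be needed to confirm that the data really came from the named provider.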
Currency
● Temporal context for data
– Facebook hits/day: average? Specific day?
● Is data versioned?
– Are we mixing old and new data?
● Solutions:
– Currency meta-data
  ● At the statement level, not just dataset level
– Specific ontologies for currency (OntoCurrency)
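Statement-level currency meta-data can be sketched in Python by attaching a date to every statement, then either picking the newest value per (subject, predicate) or listing the stale ones. The dates, values and property names are placeholders, and the functions do not reflect any particular currency ontology.

```python
from datetime import date

# (subject, predicate, object, valid_on); all values are placeholders.
statements = [
    ("ex:site", "ex:hitsPerDay", "old-value", date(2012, 3, 15)),
    ("ex:site", "ex:hitsPerDay", "new-value", date(2013, 3, 15)),
    ("ex:site", "ex:name", "Example", date(2011, 1, 1)),
]


def latest(statements):
    """Keep the most recent object for each (subject, predicate) pair."""
    newest = {}
    for s, p, o, when in statements:
        key = (s, p)
        if key not in newest or when > newest[key][1]:
            newest[key] = (o, when)
    return {key: value[0] for key, value in newest.items()}


def stale(statements, today, max_age_days):
    """List the statements older than max_age_days."""
    return [
        (s, p, o)
        for s, p, o, when in statements
        if (today - when).days > max_age_days
    ]


current = latest(statements)
old = stale(statements, today=date(2013, 3, 15), max_age_days=180)
```

With only a dataset-level date, the old and new `ex:hitsPerDay` values could not be told apart, which is the mixing problem raised above.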
Aggregation
● Co-reference
– Discovering co-referents
– Resolution of co-referents
● Multiple sources
– Discovery
– SPARQL over remote datasets
● Software solutions
– Semantic Web Client Library
  ● Handles dereferencing and aggregation
– DARQ
  ● Multi-source SPARQL queries
(Diagram: a consumer asks "Person A?"; example.com identifies the person as http://example.com/People/A and example.net as http://example.net/2013/Staff/1203; node labels: 1203, A, P)
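Once co-referents have been discovered, resolving them amounts to grouping URIs that are declared equivalent and rewriting statements to one canonical URI. The Python sketch below does this with a small union-find; the two person URIs are taken from the diagram, while the `ex:` properties, the choice of the lexicographically smaller URI as canonical, and the function names are assumptions.

```python
def canonicalise(same_as_links):
    """Map every URI to one canonical member of its equivalence group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: keep the lexicographically smaller URI.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {uri: find(uri) for uri in parent}


def aggregate(triples, canonical):
    """Rewrite subjects and objects to their canonical URI."""
    return {(canonical.get(s, s), p, canonical.get(o, o)) for (s, p, o) in triples}


links = [("http://example.com/People/A", "http://example.net/2013/Staff/1203")]
canonical = canonicalise(links)

# Hypothetical statements from the two sources.
triples = {
    ("http://example.com/People/A", "ex:name", "A"),
    ("http://example.net/2013/Staff/1203", "ex:staffId", "1203"),
}
merged = aggregate(triples, canonical)
subjects = {s for (s, _, _) in merged}
```

After aggregation both statements describe the same subject, so a single query returns the name and the staff identifier together.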
Applications
● Open Government
● The Web of Things
– Massive real-time structured sensor data
● The DataHub
● Data.gov
● PublicData.eu
● Association mining
● Intelligent recommendation systems
Conclusion
● LOD is becoming a preferred solution for data providers
● Obstacles remain to the global, machine-computable, linked Semantic Web
● No integrated solution dealing with all the issues
● But solutions exist
– To deal with specific areas
– To build new tools
Questions?
(Diagram: an RDF graph with the labels slide:question, en:noun, ens:part, slide:enquiry, owl:sameAs, speech:property, rdfs:type)