Data Intensive Techniques to Boost the Real-time Performance of Global Agricultural Data Infrastructures
SEMAGROW
USING A POWDER TRIPLE STORE FOR BOOSTING THE REAL-TIME PERFORMANCE OF GLOBAL AGRICULTURAL DATA INFRASTRUCTURES
KREAM 2013, 5 June 2013
Pythagoras Karampiperis
National Centre for Scientific Research “Demokritos”
Outline
Introduction / Problem Statement
The SemaGrow Solution
The POWDER W3C Recommendation
SemaGrow Architecture
The SemaGrow Stack
SemaGrow Maintenance Components

Moving Forward with “Old” Technologies
[Figure: two infrastructure designs side by side.
NOW (2012) CASE OF AGRICULTURAL INFRASTRUCTURES. Components shown: OAI-PMH Service Providers #1 to #n, each with its own schema (Schema #1 to Schema #n; AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...), HARVESTER, Aggregated XML Repository, INDEXER, Web Portals (Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...).
2015 (AgINFRA) CASE OF AGRICULTURAL INFRASTRUCTURES. Components shown: SPARQL endpoints of Data Sources #1 to #n, RDF Triple Store, Common Schema, INDEXER, SPARQL endpoint, Web Portals.
Callouts: How Many? BigData Problem! Is it feasible?]
What Semantic Web can bring into the picture
[Figure: overview of the SemaGrow architecture. A Client sends a query and reactivity parameters to the SemaGrow SPARQL endpoint; the diagram shows Query Decomposition, Resource Discovery, a data summaries SPARQL endpoint (POWDER Inference Layer, P-Store), and the Federated endpoint Wrapper, which sends SPARQL queries to the SPARQL endpoints of Data Sources #1 to #n and returns merged query results.]
Going beyond existing Distributed Triple Store Implementations
· Link Heterogeneous but Semantically Connected Data
· Index Extremely Large Information Volumes (petabyte scale)
· Improve Information Retrieval response
Data (+Metadata) physically stored at the Data Provider
· No need for harvesting
Vocabularies / Thesauri / Ontologies of the Data Provider's choice
· No need for aligning according to common schemas
One Data Access Point for the entire Data Cloud
· Enabling Service-Data level agreements with Data Providers
Application-level Vocabularies / Thesauri / Ontologies
· Enabling different application facets for different communities of users over the SAME data pool
The SemaGrow Solution
Use POWDER to mass-annotate large subspaces
· Exploit naming convention regularities to compress the indexes used by the system
Partition triple patterns in the original query
Annotate each fragment with an ordered list of data sources most likely to contain relevant data
Distribute and transform the query fragments
Collect and align the results
The POWDER W3C Recommendation
Exploits natural groupings of URIs to annotate all resources in a subset of the URI space
Regular expression based grouping
Allows properties and their values to be associated with an arbitrary number of subjects within a fully-defined semantic framework
POWDER Description Resources: http://www.w3.org/TR/powder-dr/
POWDER Formal Semantics: http://www.w3.org/TR/powder-formal/
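The idea of annotating a whole subset of the URI space at once can be illustrated with a minimal Python sketch. This is not POWDER syntax (POWDER documents are written in XML); the `DescriptionResource` class, the example URIs, and the property names are hypothetical, and only the regular-expression grouping idea is taken from the slide.

```python
import re
from dataclasses import dataclass


@dataclass
class DescriptionResource:
    """Hypothetical stand-in for a POWDER description resource: a regular
    expression that delimits a set of URIs, plus the properties asserted
    for every URI in that set."""
    include_regex: str
    properties: dict

    def applies_to(self, uri: str) -> bool:
        return re.search(self.include_regex, uri) is not None


def describe(uri, description_resources):
    """Merge the properties of every description resource whose URI set contains `uri`."""
    merged = {}
    for dr in description_resources:
        if dr.applies_to(uri):
            merged.update(dr.properties)
    return merged


# Assumed example data: two nested URI groups under an invented namespace.
DRS = [
    DescriptionResource(r"^http://data\.example\.org/crops/",
                        {"subject": "crop science", "source": "source-1"}),
    DescriptionResource(r"^http://data\.example\.org/crops/rice/",
                        {"topic": "rice"}),
]

print(describe("http://data.example.org/crops/rice/12345", DRS))
```

Two statements describe every URI under the namespace, so the index grows with the number of naming conventions rather than with the number of resources.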
The SemaGrow Stack
Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources
Targets the federation of independently provided data sources
SemaGrow Architecture
[Figure: SemaGrow architecture diagram. Four main components are highlighted: Query Decomposition, Resource Discovery, Data Summaries Endpoint, Federated Endpoint Wrapper.
Labelled elements: Client (query, reactivity parameters, query results); SemaGrow SPARQL endpoint; Query Decomposer and Data Source(s) Selector (query patterns; query fragment with target source); Query Pattern Discovery Service and Resource Selector (equivalent patterns; candidate source(s) list with instance statistics, load info, semantic proximity); SPARQL endpoint over the POWDER Inference Layer and P-Store (instance statistics, data summaries); Query Manager, Query Transformation Service with Schema Mappings (transformed query, transformed schema), and Query Results Merger; SPARQL endpoints of Data Sources #1 to #n (query requests #1 to #n, query results).]
Query Decomposition
Analyses SPARQL queries
Decides on the optimal way to create query fragments to be dispatched to sources’ endpoints
Components
· Query Decomposition: Suggestions of possible decompositions
· Selector: Evaluates these suggestions based on information and predictions from the Resource Discovery Component
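The split between suggesting decompositions and selecting one can be sketched as follows. This is only an illustration under stated assumptions: triple patterns are plain tuples, the `best_source` and `predicted_rows` dictionaries stand in for what Resource Discovery would report, and the cost model (a fixed overhead per request plus the smallest predicted result size in the fragment) is invented for the example, not taken from SemaGrow.

```python
def suggest_decompositions(patterns, best_source):
    """Return candidate decompositions, each a list of (source, [patterns]) fragments."""
    # Suggestion 1: one fragment per triple pattern.
    single = [(best_source[p], [p]) for p in patterns]
    # Suggestion 2: merge patterns aimed at the same source into one fragment.
    grouped = {}
    for p in patterns:
        grouped.setdefault(best_source[p], []).append(p)
    merged = list(grouped.items())
    return [single, merged]


def estimated_cost(decomposition, predicted_rows, per_request_overhead=100):
    """Hypothetical cost model: each request costs a fixed overhead plus the
    smallest predicted result size among the patterns it carries."""
    return sum(per_request_overhead + min(predicted_rows[p] for p in fragment)
               for _source, fragment in decomposition)


def select(decompositions, predicted_rows):
    """Pick the suggestion with the lowest estimated cost."""
    return min(decompositions, key=lambda d: estimated_cost(d, predicted_rows))


# Assumed example query and predictions.
P_TITLE = ("?doc", "dc:title", "?title")
P_SUBJECT = ("?doc", "dc:subject", "?subject")
P_LABEL = ("?subject", "skos:prefLabel", "?label")
best_source = {P_TITLE: "source-1", P_SUBJECT: "source-1", P_LABEL: "source-2"}
predicted_rows = {P_TITLE: 5000, P_SUBJECT: 800, P_LABEL: 50}

suggestions = suggest_decompositions([P_TITLE, P_SUBJECT, P_LABEL], best_source)
chosen = select(suggestions, predicted_rows)
print(chosen)
```

With these numbers the merged suggestion wins, because it sends two requests instead of three and lets source-1 join its two patterns locally.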
Resource Discovery
Provides an annotated list of candidate data sources that (possibly) hold triples matching a query pattern
Sources are annotated with additional information
· Schema-level metadata
· Instance-level metadata
· Predicted Response Volume
· Run-time information about current source load
· Semantic proximity of source and query schemas
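An annotated candidate list of this kind could look like the sketch below. The field names, the value ranges, and the equal weighting of load and semantic proximity are assumptions made for the example; the slides do not say how the annotations are combined.

```python
from dataclasses import dataclass


@dataclass
class CandidateSource:
    endpoint: str
    predicted_volume: int      # predicted number of matching triples
    current_load: float        # 0.0 (idle) to 1.0 (saturated)
    semantic_proximity: float  # 0.0 (unrelated schema) to 1.0 (same schema)


def score(candidate, w_load=0.5, w_proximity=0.5):
    """Hypothetical score: prefer lightly loaded sources with a close schema."""
    return (w_proximity * candidate.semantic_proximity
            + w_load * (1.0 - candidate.current_load))


def rank(candidates):
    """Drop sources predicted to hold nothing, then order the rest best-first."""
    useful = [c for c in candidates if c.predicted_volume > 0]
    return sorted(useful, key=score, reverse=True)


# Assumed example candidates for one query pattern.
candidates = [
    CandidateSource("http://a.example.org/sparql", 1200, 0.9, 1.0),
    CandidateSource("http://b.example.org/sparql", 300, 0.1, 0.8),
    CandidateSource("http://c.example.org/sparql", 0, 0.0, 1.0),
]

for c in rank(candidates):
    print(c.endpoint, round(score(c), 2))
```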
Data Summaries Endpoint
Serves metadata about the schema and instances of the various federated data stores
Receives entity URIs
Returns the repositories where these entities are located (either at the schema or instance level)
Returns ontology alignment knowledge regarding entity equivalence between different sources
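The request/response contract described above can be mimicked with a small in-memory index. The `DataSummaries` class, its method names, and the example URIs are hypothetical; the real endpoint is served over SPARQL and backed by the POWDER store rather than by Python dictionaries.

```python
from collections import defaultdict


class DataSummaries:
    """Toy index: where an entity URI occurs, and which URIs are equivalent to it."""

    def __init__(self):
        self._locations = defaultdict(set)   # URI -> {(repository, level)}
        self._equivalent = defaultdict(set)  # URI -> equivalent URIs

    def add_location(self, uri, repository, level):
        if level not in ("schema", "instance"):
            raise ValueError("level must be 'schema' or 'instance'")
        self._locations[uri].add((repository, level))

    def add_equivalence(self, uri_a, uri_b):
        # Equivalence is recorded in both directions.
        self._equivalent[uri_a].add(uri_b)
        self._equivalent[uri_b].add(uri_a)

    def lookup(self, uri):
        """Return the repositories holding `uri` and its known equivalents."""
        return {
            "repositories": sorted(self._locations.get(uri, ())),
            "equivalent": sorted(self._equivalent.get(uri, ())),
        }


# Assumed example content.
summaries = DataSummaries()
summaries.add_location("http://vocab-a.example.org/Rice", "source-1", "schema")
summaries.add_location("http://vocab-a.example.org/Rice", "source-2", "instance")
summaries.add_equivalence("http://vocab-a.example.org/Rice",
                          "http://vocab-b.example.org/Oryza")

print(summaries.lookup("http://vocab-a.example.org/Rice"))
```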
Federated Endpoint Wrapper
Manages the communication with external data sources federated by the SemaGrow Stack
Query Manager
· Calls the Query Transformation Service when necessary
· Forwards query fragments to the Query Results Merger
· Collects and forwards run-time statistics to the Resource Discovery Component
Query Results Merger
· Pay-as-you-go behaviour
· Provides first approximations and iteratively refines them if more computational resources are warranted by the reactivity parameters
Query Transformation Service
· Accesses the Schema Mappings Repository
· Rewrites query fragments from the original query schema to that of the data source that will be used for the fragment
· Rewrites query results from the source schema to the query schema
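The two rewriting directions and the pay-as-you-go merging can be sketched together. Everything here is an assumption for illustration: schema mappings are reduced to a predicate-to-predicate dictionary, the prefixed names are invented, and the reactivity parameters are reduced to a single `max_batches` budget.

```python
def rewrite_fragment(fragment, mapping):
    """Rewrite predicates from the query schema to the source schema."""
    return [(s, mapping.get(p, p), o) for s, p, o in fragment]


def rewrite_results(triples, mapping):
    """Rewrite predicates in returned triples back to the query schema."""
    back = {target: source for source, target in mapping.items()}
    return [(s, back.get(p, p), o) for s, p, o in triples]


def merge_pay_as_you_go(batches, max_batches):
    """Yield a growing, duplicate-free result after each batch, stopping
    once the (hypothetical) budget of `max_batches` is spent."""
    merged = []
    for index, batch in enumerate(batches):
        if index >= max_batches:
            break
        for row in batch:
            if row not in merged:
                merged.append(row)
        yield list(merged)


# Assumed mapping between a query schema and one source schema.
MAPPING = {"dc:title": "lom:title"}

fragment = [("?doc", "dc:title", "?title"), ("?doc", "rdf:type", "?type")]
sent = rewrite_fragment(fragment, MAPPING)

answers = rewrite_results([("ex:doc1", "lom:title", "Rice farming")], MAPPING)
later = [("ex:doc2", "dc:title", "Maize")]
approximations = list(merge_pay_as_you_go([answers, answers, later], max_batches=2))

print(sent)
print(approximations)
```

Because the merger is a generator, a caller can stop consuming it as soon as the current approximation is good enough, which is the behaviour the reactivity parameters are meant to control.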
Maintenance Components
Authoring Tool
· Visual tool for assisting data providers
· Construction of POWDER statements
· Provenance and cataloguing metadata
Ontology Alignment Tool
· Semi-automatic (human intervention) alignment of Semantic Vocabularies used by data providers and consumers
Content Classification and Ontology Evolution
· Refine coarsely annotated data to a level of detail where they can be more accurately aligned with other schemas within the federation
Project info
SemaGrow: Data intensive techniques to boost the real-time performance of global agricultural data infrastructures
FP7-ICT-2011.4.4 (Intelligent Information Management)
No. Name Country
1 Universidad de Alcalá
2 NCSR “Demokritos”
3 Università degli Studi di Roma Tor Vergata
4 Semantic Web Company
5 Institut Za Fiziku
6 Stichting Dienst Landbouwkundig Onderzoek
7 Food and Agriculture Organization of the UN
8 Agroknow Technologies
Thank You!
Dr. Pythagoras P. Karampiperis
Institute of Informatics & Telecommunications (IIT),
NCSR “Demokritos” (NCSR)
www.semagrow.eu