Ian Horrocks
Information Systems Group
Department of Computer Science
University of Oxford
What is Big Data?
“a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications” (Wikipedia)
Case Study: Energy Services
Service centres responsible for remote monitoring and diagnostics of 1,000s of gas/steam turbines
Engineers use a variety of data for visualization, diagnostics and trend detection:
• several TB of time-stamped sensor data
• several GB of event data
• data grows at 30GB per day
Service Requests
• 1,000 requests per centre per year
• 80% of time used on data gathering
• Potential saving: €50,000,000/year
Diagnostic Functionality
• 2–6 person-months to add a new function
• New diagnostics → better exploitation of data
• Potential saving: incalculable
Case Study: Exploration
Develop stratigraphic models of unexplored areas
Geologists & geophysicists use data from previous operations in nearby locations:
• 1,000 TB of relational data
• using diverse schemata
• spread over 1,000s of tables and multiple databases
Data Access
• 900 geologists & geophysicists
• 30-70% of time on data gathering
• 4-day turnaround for new queries
• Potential saving: €70,000,000/year
Data Exploitation
• Better use of experts' time
• Data analysis “most important factor” for drilling success
• Potential value: > €10bn/project
Data Access Problem
Solution: OBDA
Objectives
• Provide a semantic end-to-end connection between users and data sources
• Enable users to rapidly formulate intuitive queries using familiar vocabularies and conceptualisations
• Return timely answers from large-scale and heterogeneous data sources
Solution
Query rewriting:
• uses ontology & mappings
• computationally hard
• ontology & mappings small
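To make the rewriting step concrete, here is a minimal Python sketch under strong simplifying assumptions: the ontology is only a hierarchy of atomic classes (subclass axioms), a query asks for the instances of a single class, and each mapping links a class to an SQL query. All class, table and column names are hypothetical. Rewriting expands the query with every subclass entailed by the ontology; unfolding then replaces each class by its mapped SQL. Rewriting for full OBDA ontology languages (e.g. OWL 2 QL) also handles conjunctive queries and existential axioms, and is considerably more involved.

```python
# Toy OBDA rewriting + unfolding sketch (hypothetical names throughout).
from typing import Dict, List, Set, Tuple

# Ontology: subclass axioms (sub, super), i.e. sub ⊑ super.
ONTOLOGY: Set[Tuple[str, str]] = {
    ("GasTurbine", "Turbine"),
    ("SteamTurbine", "Turbine"),
    ("Turbine", "Equipment"),
}

# Mappings: ontology class -> SQL query returning the ids of its instances.
MAPPINGS: Dict[str, str] = {
    "GasTurbine": "SELECT id FROM gas_turbines",
    "SteamTurbine": "SELECT tid AS id FROM steam_units",
}


def subclasses(cls: str, ontology: Set[Tuple[str, str]]) -> Set[str]:
    """Return every class entailed to be a subclass of cls (including cls)."""
    result = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in ontology:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def rewrite(query_class: str, ontology: Set[Tuple[str, str]]) -> List[str]:
    """Rewrite q(x) <- query_class(x) into a union of atomic queries (sorted)."""
    return sorted(subclasses(query_class, ontology))


def unfold(classes: List[str], mappings: Dict[str, str]) -> str:
    """Replace each class by its mapped SQL; unmapped classes add no answers."""
    return "\nUNION\n".join(mappings[c] for c in classes if c in mappings)
```

For example, `rewrite("Turbine", ONTOLOGY)` yields `["GasTurbine", "SteamTurbine", "Turbine"]`, and unfolding it gives a two-branch SQL `UNION`, because `Turbine` itself has no mapping in this toy setup. Note that only the small ontology and mappings are touched here, never the data.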
Query evaluation:
• independent of ontology & mappings
• computationally tractable
• data sets very large
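Once rewriting and unfolding are done, evaluation is ordinary SQL over the sources, which is why it does not depend on the ontology or mappings and can be handed to a mature database engine. The sketch below runs such an unfolded query with Python's built-in `sqlite3` module against an in-memory database; the tables, columns and rows are hypothetical stand-ins for real sources.

```python
import sqlite3
from typing import List

# An unfolded query is plain SQL: evaluating it needs only the data sources.
# Table and column names are hypothetical.
SQL = """
SELECT id FROM gas_turbines
UNION
SELECT tid AS id FROM steam_units
"""


def evaluate(sql: str, conn: sqlite3.Connection) -> List[str]:
    """Run the unfolded SQL and return the answers in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


def demo_db() -> sqlite3.Connection:
    """Build a small in-memory database standing in for the real sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gas_turbines (id TEXT)")
    conn.execute("CREATE TABLE steam_units (tid TEXT, site TEXT)")
    conn.executemany("INSERT INTO gas_turbines VALUES (?)", [("gt1",), ("gt2",)])
    conn.executemany(
        "INSERT INTO steam_units VALUES (?, ?)", [("st1", "A"), ("st2", "B")]
    )
    return conn


if __name__ == "__main__":
    print(evaluate(SQL, demo_db()))
```

Here `evaluate(SQL, demo_db())` returns `['gt1', 'gt2', 'st1', 'st2']`; SQL `UNION` would also remove duplicate answers contributed by different sources.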
Other features:
support for query formulation
Query Formulation
Other features:
“Bootstrapping” of ontology & mappings
Bootstrapping:
Figure: bootstrapping pipeline (components shown: Direct Mapping Extractor, Direct Mappings, OWL Vocabulary, Metadata propagator, SOTA Ontology, Ontology Alignment, OWL Ontology, Extended OWL Ontology)
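The direct-mapping part of bootstrapping can be sketched as follows, loosely following the naming conventions of the W3C Direct Mapping: each table becomes a class and each non-key column a data property, each paired with an SQL mapping. This is an illustrative assumption, not the bootstrapper shown in the figure; the base IRI, the `Table` type and the mapping format are hypothetical, and foreign keys (which would yield object properties) are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BASE = "http://example.org/"  # hypothetical base IRI


@dataclass
class Table:
    """Minimal relational schema description (hypothetical format)."""
    name: str
    primary_key: str
    columns: List[str]


def direct_mappings(
    tables: List[Table], base: str = BASE
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Derive a class/property vocabulary and SQL mappings directly from a schema.

    Each table becomes a class <base><table>; each non-key column becomes a
    data property <base><table>#<column>.
    """
    vocabulary: Dict[str, List[str]] = {"classes": [], "data_properties": []}
    mappings: Dict[str, str] = {}
    for t in tables:
        cls = f"{base}{t.name}"
        vocabulary["classes"].append(cls)
        mappings[cls] = f"SELECT {t.primary_key} FROM {t.name}"
        for col in t.columns:
            if col == t.primary_key:
                continue  # skip the key here; it is used to identify individuals
            prop = f"{base}{t.name}#{col}"
            vocabulary["data_properties"].append(prop)
            mappings[prop] = f"SELECT {t.primary_key}, {col} FROM {t.name}"
    return vocabulary, mappings
```

A vocabulary produced this way mirrors the database schema rather than the users' conceptualisation, which is why such direct output would typically be aligned with, or extended by, a richer domain ontology.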
Other features:
IT expert oversees ontology & mapping (O&M) management
Other features:
adapter to support streaming data
Stream Adapter
Goal: support for
• data generated by sensors
• historical data
Challenges: time-aware OBDA
• queries
• ontologies
• mappings
• data
STARQL query language (a temporalised SPARQL)
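Time-aware queries over sensor streams are typically evaluated over windows of time-stamped readings. The sketch below is not STARQL syntax or semantics; it is a plain Python illustration, assuming readings arrive as hypothetical `(timestamp, value)` pairs, of a sliding window that flags windows in which a sensor's value strictly increases (one plausible diagnostic pattern).

```python
from typing import Iterator, List, Tuple

Reading = Tuple[int, float]  # hypothetical (timestamp in seconds, value) pair


def sliding_windows(
    readings: List[Reading], width: int, slide: int, start: int, end: int
) -> Iterator[Tuple[int, List[Reading]]]:
    """Yield (t, readings with t - width < timestamp <= t) for t = start, start+slide, ..."""
    t = start
    while t <= end:
        yield t, [(ts, v) for ts, v in readings if t - width < ts <= t]
        t += slide


def monotonic_increase(window: List[Reading]) -> bool:
    """True if the window has >= 2 readings whose values strictly increase over time."""
    values = [v for _, v in sorted(window)]
    return len(values) >= 2 and all(a < b for a, b in zip(values, values[1:]))


def alerts(
    readings: List[Reading], width: int, slide: int, start: int, end: int
) -> List[int]:
    """Window end times at which the monotonic-increase pattern holds."""
    return [
        t
        for t, window in sliding_windows(readings, width, slide, start, end)
        if monotonic_increase(window)
    ]
```

For readings `[(0, 70.0), (1, 71.0), (2, 72.5), (3, 72.0), (4, 73.0), (5, 74.0)]`, `alerts(readings, width=3, slide=1, start=2, end=5)` returns `[2, 5]`: the windows ending at 3 and 4 contain the drop from 72.5 to 72.0.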
Other features:
distributed query execution
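Distributed execution splits the work across nodes holding different partitions of the data. As a minimal sketch, assuming a decomposable aggregate (here a per-sensor maximum) and using Python threads as stand-ins for cluster nodes, each partition computes a partial result and a final step merges them. The partition format and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Partition = List[Tuple[str, float]]  # hypothetical (sensor id, value) rows


def local_max(partition: Partition) -> Dict[str, float]:
    """Partial aggregate computed where the partition lives: max value per sensor."""
    result: Dict[str, float] = {}
    for sensor, value in partition:
        if sensor not in result or value > result[sensor]:
            result[sensor] = value
    return result


def merge(partials: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Combine partial maxima into the global per-sensor maximum."""
    merged: Dict[str, float] = {}
    for part in partials:
        for sensor, value in part.items():
            if sensor not in merged or value > merged[sensor]:
                merged[sensor] = value
    return merged


def distributed_max(partitions: List[Partition], workers: int = 4) -> Dict[str, float]:
    """Evaluate local_max on each partition in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_max, partitions))
    return merge(partials)
```

Because MAX is decomposable, merging partial maxima gives the same answer as one global pass; a non-decomposable operation such as a median would need a different plan.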
Thank you for listening
Any questions?