Tutorial at the EarthBiAs 2014 Summer School: Dealing with Semantic Heterogeneity in Real-Time Information. Part I: Large Scale Open Environments. Part II: Computational Paradigms. Part III: RDF Event Processing. Part IV: Theory of Event Exchange. Part V: Approaches to Semantic Decoupling. Part VI: Example Application: Linked Energy Intelligence.
EarthBiAs2014, Global NEST, University of the Aegean
Dealing with Semantic Heterogeneity in Real-Time Information
Dr. Edward Curry, Insight Centre for Data Analytics, National University of Ireland Galway
Tuesday 8th July 2014
7-11 July 2014, Rhodes, Greece, EarthBiAs2014
Talk Overview
Part I: Large Scale Open Environments
Part II: Computational Paradigms
Part III: RDF Event Processing
Part IV: Theory of Event Exchange
Part V: Approaches to Semantic Decoupling
Part VI: Example Application: Linked Energy Intelligence
About Me
PhD in Computer Science (NUI Galway)
Leader of the Green and Sustainable IT Research Group, DERI/Insight, NUI Galway
Researcher in both Computer Science and Information Systems
Overall Objective: WATERNOMICS will provide personalised and actionable information about water consumption and water availability to individual households, companies and cities in an intuitive and effective manner, at a time-scale relevant for decision making.
Project-Sense: self-configuring smart energy management systems for small commercial buildings
Non-Technical Users: targets occupants of the building; non-technical office workers; no experience in energy management.
Low-Cost Installation: self-configuration; collaborative system configuration; crowdsourced contextual data from building occupants; imports relevant enterprise data via Excel; semantic event matching reduces configuration costs.
Decision Support: sensor and data fusion; multi-level decision support model; identifies energy saving opportunities; leverages open data and predictive analytics.
User Experience: from awareness to engagement; Transtheoretical Model; gamification; user personalisation; simple non-technical user interfaces.
BIG: Big Data Public Private Forum (BIG 318062), European Data Forum 2014
The BIG Project
BIG aims to promote a well-developed EU industrial landscape in Big Data by:
- providing a clear picture of existing technology trends and their maturity;
- acquiring a sharp understanding of how Big Data can be applied to concrete environments and use cases;
- pushing European Big Data research and innovation to contribute to increasing European competitiveness;
- building a self-sustainable, industry-led initiative.
Overall Objective: work at technical, business and policy levels, shaping the future through the positioning of IIM and Big Data specifically in Horizon 2020, and bring the necessary stakeholders into a self-sustainable, industry-led initiative that will greatly contribute to enhancing EU competitiveness by taking full advantage of Big Data technologies.
@BYTE_EU www.byte-project.eu
BYTE: Big data roadmap and cross-disciplinarY community for addressing socieTal Externalities
Externalities are the effects of a decision by stakeholders (e.g., governments, industry, scientists, policy-makers) that have an impact on a third party (especially members of the public). They may be positive or negative.
Economic: boost to the economy; innovation; increased efficiency; smaller actors left behind; shrinking economies.
Legal: privacy; data protection; data ownership; copyright; risks associated with inclusion and exclusion.
Social & Ethical: transparency; discrimination; methodological difficulties; spurious relationships; consumer manipulation.
Political: reliance on US services; services have become utilities; legal issues become trade issues.
PART I: LARGE SCALE OPEN ENVIRONMENTS
Emerging Environments: Smart City, Energy, Smart Building, Water Management
From Internet of Things to Internet of Everything
Lots of Data
"90% of the data in the world today has been created in the last two years alone." (IBM)
"The bringing together of a vast amount of data from public and private sources is what Big Data is all about." (IDC)
"Over the next few years we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing." (O'Reilly)
"Big Data represents a number of developments in technology that have been brewing for years and are coming to a boil. They include an explosion of data and new kinds of data, like from the Web and sensor streams; [...]." (IDC)
From Rigid Schemas to Schema-less
Circa 2000: 10s-100s of attributes. Circa 2014: 1,000s-1,000,000s of attributes.
Heterogeneous, complex and large-scale data; very large and dynamic schemas.
Open environments: distributed, decoupled data sources; anonymous users; multi-domain; lack of a global order of information flow.
Fundamental Decentralization: multiple perspectives (conceptualizations) of reality, leading to ambiguity, vagueness and inconsistency.
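To make "multiple perspectives of reality" concrete, the sketch below shows two independent sources describing the same kind of happening with different vocabularies. A subscription written against one vocabulary silently misses events from the other under exact matching. The event contents, the `exact_match` function and the hand-written `SYNONYMS` table are all hypothetical illustrations, not part of the tutorial; a fixed synonym table is only the simplest possible workaround and does not scale to open environments.

```python
from typing import Any, Dict

Event = Dict[str, Any]

# Hypothetical events from two independent sources describing the same kind of
# happening (a power reading) with different attribute names and type labels.
event_a: Event = {"type": "PowerReading", "room": "Lab-1", "watts": 1200}
event_b: Event = {"type": "EnergyConsumption", "location": "Lab-1", "power_w": 1200}

# A subscription written using source A's vocabulary only.
subscription: Event = {"type": "PowerReading", "room": "Lab-1"}

def exact_match(sub: Event, event: Event) -> bool:
    """True only if every subscription attribute appears in the event with an identical value."""
    return all(event.get(key) == value for key, value in sub.items())

# Hypothetical hand-written correspondences from source B's terms to source A's terms.
SYNONYMS = {"EnergyConsumption": "PowerReading", "location": "room", "power_w": "watts"}

def normalise(event: Event) -> Event:
    """Rewrite attribute names and string values into source A's vocabulary."""
    out: Event = {}
    for key, value in event.items():
        key = SYNONYMS.get(key, key)
        if isinstance(value, str):
            value = SYNONYMS.get(value, value)
        out[key] = value
    return out

print(exact_match(subscription, event_a))             # True
print(exact_match(subscription, event_b))             # False: same meaning, different terms
print(exact_match(subscription, normalise(event_b)))  # True once the vocabularies are aligned
```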
Current Trends

Aspect | Small scale, controlled environments | Large scale, open environments
Information sources | 10s to 100s | 1000s to millions
Data heterogeneity | Small number of schemas | High number of schemas
Users | Small number; know the environment | Large number; do not fully know the environment
Users' organization | Users know each other; top-down hierarchies (e.g. enterprises) | Decoupled and distributed
Dynamism | Low | High (sources and users join and leave often)
Domain | Domain specific | Users' interests range from domain specific to domain agnostic
PART II: COMPUTATIONAL PARADIGMS
Information Flow Processing (IFP)
Users need to collect information produced by multiple distributed sources and process it in a timely way, so that knowledge can be extracted as soon as possible.
Example domains: financial continuous analytics, RFID inventory management, environmental monitoring.
Information Flow Processing (IFP)
Information is processed as it flows, with no intermediate storage: new information is produced and the raw information can be discarded.
Components: producers, an Information Flow Processing Engine, consumers, and rule managers.
(CUGOLA, G. AND MARGARA, A., 2011. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys.)
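The IFP model above (producers, an engine that applies rules deployed by rule managers, and consumers, with no intermediate storage) can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions, not the architecture described by Cugola and Margara: rules here are plain functions over a single item, and the names `IFPEngine`, `add_rule`, `subscribe`, `publish` and `high_temperature` are hypothetical.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Rule = Callable[[Event], List[Event]]  # one incoming item -> zero or more derived items
Consumer = Callable[[Event], None]

class IFPEngine:
    """Hypothetical toy engine: each item is processed on arrival and never stored."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []
        self.consumers: List[Consumer] = []

    def add_rule(self, rule: Rule) -> None:
        """Used by a rule manager to deploy a processing rule."""
        self.rules.append(rule)

    def subscribe(self, consumer: Consumer) -> None:
        """Used by a consumer to receive derived information."""
        self.consumers.append(consumer)

    def publish(self, event: Event) -> None:
        """Used by a producer; every rule sees the event and derived items go to consumers."""
        for rule in self.rules:
            for derived in rule(event):
                for consumer in self.consumers:
                    consumer(derived)
        # Nothing is retained: once publish() returns, the raw event can be discarded.

def high_temperature(event: Event) -> List[Event]:
    """Hypothetical rule: derive an alert from any temperature reading above 30."""
    if event.get("type") == "temperature" and event["value"] > 30:
        return [{"type": "alert", "sensor": event["sensor"], "value": event["value"]}]
    return []

received: List[Event] = []
engine = IFPEngine()
engine.add_rule(high_temperature)
engine.subscribe(received.append)
engine.publish({"type": "temperature", "sensor": "s1", "value": 25})
engine.publish({"type": "temperature", "sensor": "s2", "value": 35})
print(received)  # [{'type': 'alert', 'sensor': 's2', 'value': 35}]
```

Real IFP engines add rule languages, windows over multiple items, and distribution across many nodes to meet the scalability requirement; the sketch only shows the flow-through, store-nothing shape of the model.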
Information Flow Processing (IFP) Requirements
Real-time or near real-time processing
An expressive language for rules
Scalability to a large number of producers and consumers
Computational Paradigms
Event Processing. Event: an object representing a happening. Deals with events and the relations between events (e.g. inter-event sequencing, causality).
Stream Processing. Stream: a homogeneous and totally ordered set of data items. Deals with streams and operations on streams (e.g. joins).
An event cloud may contain streams of events as well as partially ordered sets of events. (Cugola & Margara, 2012)
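The difference between the two paradigms can be illustrated side by side. The stream view applies an operator to a homogeneous, totally ordered sequence of values (here a count-based sliding average); the event view asks about relations between heterogeneous events (here a sequence: one kind of event followed by another within a time bound). Both functions and their parameters are hypothetical illustrations. The sequence detector assumes events arrive in timestamp order and uses one particular consumption policy (a matching `then` event consumes all pending candidates); real event processing languages offer several such policies.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

def sliding_average(readings: Iterable[float], size: int) -> Iterator[float]:
    """Stream view: average over the last `size` items of a totally ordered stream."""
    window: Deque[float] = deque(maxlen=size)  # oldest item drops out automatically
    for value in readings:
        window.append(value)
        if len(window) == size:
            yield sum(window) / size

def detect_sequence(events: Iterable[Tuple[int, str]], first: str, then: str,
                    max_gap: int) -> List[Tuple[int, int]]:
    """Event view: report (t_first, t_then) pairs where `first` is followed by `then`
    within `max_gap` time units. Assumes events arrive in timestamp order."""
    matches: List[Tuple[int, int]] = []
    pending: List[int] = []  # timestamps of `first` events still waiting for a match
    for ts, kind in events:
        pending = [t for t in pending if ts - t <= max_gap]  # drop expired candidates
        if kind == then:
            matches.extend((t, ts) for t in pending)
            pending = []  # consumption policy: a match consumes all pending candidates
        if kind == first:
            pending.append(ts)
    return matches

print(list(sliding_average([1.0, 2.0, 3.0, 4.0], size=2)))  # [1.5, 2.5, 3.5]
events = [(0, "door_open"), (3, "motion"), (5, "alarm"), (20, "door_open"), (40, "alarm")]
print(detect_sequence(events, "door_open", "alarm", max_gap=10))  # [(0, 5)]
```

The second `door_open` at time 20 finds no match because the next `alarm` arrives 20 time units later, outside the 10-unit bound.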
Event Processing is Decoupled for Scalability
Event sources and event consumers are decoupled in space, time and synchronization.
(Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114-131.)
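A minimal topic-based publish/subscribe broker shows where each form of decoupling comes from. The `Broker` class and its methods are hypothetical names for this sketch, built on Python's standard `queue.Queue`. Space decoupling: the producer names a topic, never a consumer. Synchronization decoupling: publishing onto an unbounded queue does not block the producer. Time decoupling is only partial here: a consumer must have subscribed before publication, but may read afterwards; a durable broker that stores events for later subscribers would be needed for full time decoupling.

```python
import queue
from collections import defaultdict
from typing import Any, DefaultDict, List

class Broker:
    """Hypothetical toy topic-based publish/subscribe broker."""

    def __init__(self) -> None:
        self._inboxes: DefaultDict[str, List["queue.Queue[Any]"]] = defaultdict(list)

    def subscribe(self, topic: str) -> "queue.Queue[Any]":
        """Register interest in a topic; the consumer reads from the returned inbox."""
        inbox: "queue.Queue[Any]" = queue.Queue()  # unbounded, so put() never blocks
        self._inboxes[topic].append(inbox)
        return inbox

    def publish(self, topic: str, event: Any) -> None:
        """Space decoupling: the producer addresses a topic, not specific consumers.
        Synchronization decoupling: the producer returns without waiting for anyone."""
        for inbox in self._inboxes[topic]:
            inbox.put(event)

broker = Broker()
inbox = broker.subscribe("energy")
broker.publish("energy", {"meter": "m1", "kwh": 4.2})
broker.publish("water", {"meter": "w7", "litres": 30})  # no subscribers: dropped

# Partial time decoupling: the consumer reads later, after the producer has finished.
received = []
while not inbox.empty():
    received.append(inbox.get_nowait())
print(received)  # [{'meter': 'm1', 'kwh': 4.2}]
```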
Active Databases
Traditional database systems are passive: they store data and wait for user interaction, so reactive behaviour has to be implemented in the application layer.
DAYAL, U., BLAUSTEIN, B., BUCHMANN, A., CHAKRAVARTHY, U., HSU, M., LEDIN, R., MCCARTHY, D., ROSENTHAL, A., SARIN, S., CAREY, M. J., LIVNY, M., AND JAUHARI, R. 1988. The HiPAC project: Combining active databases and timing constraints. SIGMOD Rec. 17, 1, 51-70.
LIEUWEN, D. F., GEHANI, N. H., AND ARLEIN, R. M. 1996. The Ode active database: T