Tutorial at the EarthBiAs 2014 Summer School on Dealing with Semantic Heterogeneity in Real-Time Information. Part I: Large Scale Open Environments. Part II: Computational Paradigms. Part III: RDF Event Processing. Part IV: Theory of Event Exchange. Part V: Approaches to Semantic Decoupling. Part VI: Example Application: Linked Energy Intelligence.
EarthBiAs2014, Global NEST, University of the Aegean. Dealing with Semantic Heterogeneity in Real-Time Information. Dr. Edward Curry, Insight Centre for Data Analytics, National University of Ireland Galway. Tuesday 8th July 2014. 7-11 July 2014, Rhodes, Greece, EarthBiAs2014
Talk Overview. Part I: Large Scale Open Environments. Part II: Computational Paradigms. Part III: RDF Event Processing. Part IV: Theory of Event Exchange. Part V: Approaches to Semantic Decoupling. Part VI: Example Application: Linked Energy Intelligence.
About Me: PhD in Computer Science (NUI Galway); Green and Sustainable IT Research Group Leader in DERI/Insight NUI Galway; Researcher in both Computer Science and Information Systems
Overall Objective WATERNOMICS will provide personalised and actionable information about water consumption and water availability to individual households, companies and cities in an intuitive and effective manner at a time-scale relevant for decision making.
Project-Sense: Self-configuring smart energy management systems for small commercial buildings. Non-Technical Users: targets occupants of the building; non-technical office workers; no experience in energy management. Low cost installation: self-configuration; collaborative system configuration; crowdsourced contextual data from building occupants; imports relevant enterprise data via Excel; semantic event matching reduces configuration costs. Decision Support: sensor and data fusion; multi-level decision support model; identifies energy saving opportunities; leverages Open Data and predictive analytics. User Experience: from awareness to engagement; Transtheoretical Model; gamification; user personalisation; simple non-technical user interfaces.
The BIG Project (BIG: Big Data Public Private Forum, BIG 318062; European Data Forum 2014). BIG aims to promote a well-developed EU industrial landscape in Big Data: providing a clear picture of existing technology trends and their maturity; acquiring a sharp understanding of how Big Data can be applied to concrete environments and use cases; pushing European Big Data research and innovation to contribute to increasing European competitiveness; building a self-sustainable, industry-led initiative. Overall Objective: work at technical, business and policy levels, shaping the future through the positioning of IIM and Big Data specifically in Horizon 2020; bring the necessary stakeholders into a self-sustainable, industry-led initiative, which will greatly contribute to enhancing EU competitiveness by taking full advantage of Big Data technologies.
@BYTE_EU www.byte-project.eu BYTE: Big data roadmap and cross-disciplinarY community for addressing socieTal Externalities. Externalities: the effects of a decision by stakeholders (e.g., governments, industry, scientists, policy-makers) that have an impact on a third party (especially members of the public). May be positive or negative. Economic: boost to the economy; innovation; increase efficiency; smaller actors left behind; shrink economies. Legal: privacy; data protection; data ownership; copyright; risks associated with inclusion and exclusion. Social & Ethical: transparency; discrimination; methodological difficulties; spurious relationships; consumer manipulation. Political: reliance on US services; services have become utilities; legal issues become trade issues.
PART I: LARGE SCALE OPEN ENVIRONMENTS
Emerging Environments Smart City Energy Smart Building Water Management
From Internet of Things to Internet of Everything
Lots of Data. "90% of the data in the world today has been created in the last two years alone" (IBM). "The bringing together of a vast amount of data from public and private sources is what Big Data is all about" (IDC). "Over the next few years we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing." (O'Reilly). "Big Data represents a number of developments in technology that have been brewing for years and are coming to a boil. They include an explosion of data and new kinds of data, like from the Web and sensor streams; [...]." (IDC)
From Rigid Schemas to Schema-less: heterogeneous, complex and large-scale data; very large and dynamic schemas. Open environments: distributed, decoupled data sources, anonymous users, multi-domain, lack of global order of information flow. Circa 2000: 10s-100s of attributes. Circa 2014: 1,000s-1,000,000s of attributes.
Fundamental Decentralization: multiple perspectives (conceptualizations) of the reality. Ambiguity, vagueness, inconsistency.
Current Trends (small scale, controlled environments vs. large scale, open environments)
Information sources: 10s to 100s vs. 1000s to millions
Data heterogeneity: small number of schemas vs. high number of schemas
Users: small number, know the environment vs. large number, do not quite know the environment
Users' organization: users know each other, top-down hierarchies (e.g. enterprises) vs. decoupled and distributed
Dynamism: low vs. high (sources and users join and leave often)
Domain: domain specific vs. users' interests range from domain specific to domain agnostic
PART II: COMPUTATIONAL PARADIGMS
Information Flow Processing (IFP): users need to collect information produced by multiple distributed sources, process it in a timely way, and extract knowledge as soon as possible. Examples: financial continuous analytics, RFID inventory management, environmental monitoring.
Information Flow Processing (IFP): information is processed as it flows; no intermediate storage; new information is produced; raw information can be discarded. Components: Information Flow Processing Engine, producers, consumers, rule managers. CUGOLA, G. AND MARGARA, A. 2012. Processing flows of information: From data stream to complex event processing. ACM Computing Surveys.
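The engine on this slide can be illustrated with a minimal sketch, assuming that rules are plain Python functions and that information items are dictionaries. The names `ifp_engine`, `high_temperature` and `producer`, and the threshold of 30, are hypothetical and chosen only for illustration. Each item is processed as it flows through, derived information is emitted, and the raw item is not stored.

```python
from typing import Callable, Iterable, Iterator, List, Optional

# A rule maps one incoming item to a derived item, or to None if it does not apply.
Rule = Callable[[dict], Optional[dict]]

def ifp_engine(flow: Iterable[dict], rules: List[Rule]) -> Iterator[dict]:
    """Apply every deployed rule to each item as it arrives; store nothing."""
    for item in flow:
        for rule in rules:
            derived = rule(item)
            if derived is not None:
                yield derived  # new information for the consumers
        # the raw item is simply dropped once all rules have seen it

def high_temperature(item: dict) -> Optional[dict]:
    """Hypothetical rule: raise an alert for temperature readings above 30."""
    if item.get("type") == "temperature" and item["value"] > 30:
        return {"type": "alert", "sensor": item["sensor"], "value": item["value"]}
    return None

def producer() -> Iterator[dict]:
    """Hypothetical producer emitting a short flow of readings."""
    yield {"type": "temperature", "sensor": "s1", "value": 22}
    yield {"type": "temperature", "sensor": "s2", "value": 35}
    yield {"type": "humidity", "sensor": "s1", "value": 60}

alerts = list(ifp_engine(producer(), [high_temperature]))
print(alerts)
```

A generator is used here so that the consumer pulls derived items one at a time, which mirrors the "no intermediate storage" point; a real engine would typically push results to consumers instead.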
Information Flow Processing (IFP) Requirements: real-time or near real-time processing; expressive language for rules; scalability to a large number of producers and consumers
Computational Paradigm. Event Processing: an event is an object representing a happening; deals with events and relations between events (e.g. inter-event sequencing, causality, etc.). Stream Processing: a stream is a homogeneous and totally ordered set of data items; deals with streams and operations on streams (e.g. joins). An event cloud may contain streams of events as well as partially ordered sets of events. (Cugola & Margara, 2012)
Event Processing is Decoupled for Scalability: the event source and the event consumer are decoupled in space, time and synchronization. Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114-131.
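The three decoupling dimensions can be hinted at with a small sketch, assuming an in-process `queue.Queue` stands in for the event service. The functions `produce` and `consume` are hypothetical names. The source never references the consumer (space), it publishes before any consumer runs (time), and it does not block while publishing (synchronization).

```python
import queue

# The queue plays the role of the event service between the two parties.
channel: "queue.Queue[int]" = queue.Queue()  # unbounded, so put_nowait never fails

def produce(ch: "queue.Queue[int]", readings: list) -> None:
    """Event source: publishes without knowing who, if anyone, will consume."""
    for reading in readings:
        ch.put_nowait(reading)  # returns immediately, no waiting on a consumer

def consume(ch: "queue.Queue[int]") -> list:
    """Event consumer: drains whatever has been published so far."""
    received = []
    while True:
        try:
            received.append(ch.get_nowait())
        except queue.Empty:
            return received

produce(channel, [1, 2, 3])   # no consumer exists yet
received = consume(channel)   # the consumer arrives later and still gets the events
print(received)
```

In a distributed deployment the queue would be replaced by a broker or overlay network, and consumers would usually be notified through asynchronous callbacks rather than by polling.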
Active Databases. Traditional database systems are passive: they store data and wait for user interaction; reactive behaviour sits in the application layer. DAYAL, U., BLAUSTEIN, B., BUCHMANN, A., CHAKRAVARTHY, U., HSU, M., LEDIN, R., MCCARTHY, D., ROSENTHAL, A., SARIN, S., CAREY, M. J., LIVNY, M., AND JAUHARI, R. 1988. The HiPAC project: Combining active databases and timing constraints. SIGMOD Rec. 17, 1, 51-70. LIEUWEN, D. F., GEHANI, N. H., AND ARLEIN, R. M. 1996. The Ode active database: Trigger semantics and implementation. In Proceedings of the 12th International Conference on Data Engineering (ICDE'96). IEEE Computer Society, Los Alamitos, CA, 412-420. GATZIU, S. AND DITTRICH, K. 1993. Events in an active object-oriented database system. In Proceedings of the International Workshop on Rules in Database Systems (RIDS), N. Paton and H. Williams, Eds. Workshops in Computing, Springer-Verlag, Edinburgh, U.K. CHAKRAVARTHY, S. AND ADAIKKALAVAN, R. 2008. Events and streams: Harnessing and unleashing their synergy! In Proceedings of the 2nd International Conference on Distributed Event-Based Systems (DEBS'08). ACM, New York, NY, 1-12.
Active Databases: reactive behaviour moves to the database layer through Event-Condition-Action (ECA) rules. Event: the source, e.g. tuple inserted. Condition: checked after the event, e.g. inserted.value > 5. Action: what to do, e.g. modify the DB. Cons: persistent storage model; suitable only when updates are infrequent and rules are few
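An SQL trigger is one familiar way to write an ECA rule. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the table names `readings` and `alerts` and the trigger name `high_value` are assumptions made for illustration, while the condition `value > 5` follows the slide's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL);
CREATE TABLE alerts (reading_id INTEGER, value REAL);

-- Event: a tuple is inserted into readings
CREATE TRIGGER high_value
AFTER INSERT ON readings
FOR EACH ROW
-- Condition: evaluated against the inserted tuple
WHEN NEW.value > 5
BEGIN
    -- Action: modify the database
    INSERT INTO alerts (reading_id, value) VALUES (NEW.id, NEW.value);
END;
""")

# Only the application-side inserts are issued here; the reaction happens in the DB layer.
conn.executemany("INSERT INTO readings (value) VALUES (?)", [(3,), (7,), (9,)])
alerts = conn.execute(
    "SELECT reading_id, value FROM alerts ORDER BY reading_id"
).fetchall()
print(alerts)
```

Every insert pays for the rule evaluation and the persistent write, which is in line with the slide's point that this model suits workloads with infrequent updates and few rules.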
Data Stream Management Systems: streams are unbounded (unlike tables); no arrival order assumptions; typically no storage; use continuous, or standing, queries; reactive in nature. CHANDRASEKARAN, S., COOPER, O., DESHPANDE, A., FRANKLIN, M. J., HELLERSTEIN, J. M., HONG, W., KRISHNAMURTHY, S., MADDEN, S. R., REISS, F., AND SHAH, M. A. 2003. TelegraphCQ: Continuous dataflow processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). ACM, New York, NY, 668-668. CHEN, J., DEWITT, D. J., TIAN, F., AND WANG, Y. 2000. NiagaraCQ: A scalable continuous query system for Internet databases. SIGMOD Rec. 29, 2, 379-390. LIU, L., PU, C., AND TANG, W. 1999. Continual queries for internet scale event-driven information delivery. IEEE Trans. Knowl. Data Eng. 11, 4, 610-628. ARASU, A., BABU, S., AND WIDOM, J. 2006. The CQL continuous query language: Semantic foundations and query execution. VLDB J. 15, 2, 121-142.
Data Stream Management Systems: continuous query semantics. Answer: an append-only stream or updates to a store; exact or approximate. Cons: the atomic item is the stream; not possible to detect sequencing or causal patterns
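A continuous (standing) query can be sketched as a generator that stays registered over a stream and appends one answer per arriving item. This is a minimal sketch, assuming a count-based sliding window; the function name `continuous_average` and the window size are hypothetical. Only the bounded window is buffered, never the whole stream.

```python
from collections import deque
from typing import Iterable, Iterator

def continuous_average(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Standing query: average over the last `window_size` items of the stream."""
    window = deque(maxlen=window_size)  # oldest item is evicted automatically
    for value in stream:
        window.append(value)
        # each arrival appends one new result to the answer stream
        yield sum(window) / len(window)

results = list(continuous_average([10, 20, 30, 40], window_size=2))
print(results)  # [10.0, 15.0, 25.0, 35.0]
```

The query aggregates over the stream as a whole, so it cannot express a pattern such as "A followed by B", which is the limitation the slide lists under cons.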
Publish/Subscribe Systems: information items are notifications; indirect addressing-based communication scheme. Ancestors: message passing, Remote Procedure Call (RPC), shared spaces, message queueing. EUGSTER, P.T., FELBER, P.A., GUERRAOUI, R. AND KERMARREC, A.M., 2003. The many faces of publish/subscribe. ACM Computing Surveys (CSUR), 35(2), pp. 114-131. MÜHL, G., FIEGE, L., AND PIETZUCH, P. 2006. Distributed Event-Based Systems. Springer.
Publish/Subscribe Systems: a one-to-many and many-to-many distribution mechanism that allows a single producer to send a message to one user or potentially hundreds of thousands of consumers. E. Curry, "Message-Oriented Middleware," in Middleware for Communications, Q. H. Mahmoud, Ed. Chichester, England: John Wiley and Sons, 2004, pp. 1-28.
Publish/Subscribe Systems. Topic-based pub/sub: topics are groups or channels; events of a topic are sent to the topic's subscribers. ALTHERR, M., ERZBERGER, M., AND MAFFEIS, S. 1999. iBus: a software bus middleware for the Java platform. In Proceedings of the International Workshop on Reliable Middleware Systems. 43-53. Content-based pub/sub: matching by message filters; publishers' and subscribers' channels are defined by the content and the subscriptions. David S. Rosenblum and Alexander L. Wolf. 1997. A design framework for Internet-scale event observation and notification. SIGSOFT Softw. Eng. Notes 22, 6 (November 1997), 344-360. DOI=10.1145/267896.267920 http://doi.acm.org/10.1145/267896.267920 Type-based pub/sub: matching on type hierarchy. EUGSTER, P. AND GUERRAOUI, R. 2001. Content based publish/subscribe with structural reflection. In Proceedings of the 6th Usenix Conference on Object-Oriented Technologies and Systems (COOTS'01)...
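The three matching schemes can be compared side by side in a toy in-process broker. This is a sketch under stated assumptions, not the design of any of the cited systems: the `Broker` class, its method names, the event classes and the topic strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class SensorEvent:
    sensor: str
    value: float

@dataclass
class TemperatureEvent(SensorEvent):
    """Subtype used to show matching on the type hierarchy."""

Callback = Callable[[SensorEvent], None]

class Broker:
    def __init__(self) -> None:
        self._by_topic: Dict[str, List[Callback]] = {}
        self._by_content: List[Tuple[Callable[[SensorEvent], bool], Callback]] = []
        self._by_type: List[Tuple[type, Callback]] = []

    def subscribe_topic(self, topic: str, callback: Callback) -> None:
        self._by_topic.setdefault(topic, []).append(callback)

    def subscribe_content(self, message_filter: Callable[[SensorEvent], bool],
                          callback: Callback) -> None:
        self._by_content.append((message_filter, callback))

    def subscribe_type(self, event_type: type, callback: Callback) -> None:
        self._by_type.append((event_type, callback))

    def publish(self, topic: str, event: SensorEvent) -> None:
        # Topic-based: deliver to the subscribers of this named channel.
        for callback in self._by_topic.get(topic, []):
            callback(event)
        # Content-based: deliver when the subscriber's filter accepts the event.
        for message_filter, callback in self._by_content:
            if message_filter(event):
                callback(event)
        # Type-based: deliver when the event is an instance of the subscribed type.
        for event_type, callback in self._by_type:
            if isinstance(event, event_type):
                callback(event)

broker = Broker()
log: List[Tuple[str, str]] = []
broker.subscribe_topic("building/energy", lambda e: log.append(("topic", e.sensor)))
broker.subscribe_content(lambda e: e.value > 30, lambda e: log.append(("content", e.sensor)))
broker.subscribe_type(SensorEvent, lambda e: log.append(("type", e.sensor)))

broker.publish("building/energy", TemperatureEvent("s1", 35.0))
broker.publish("building/water", SensorEvent("s2", 12.0))
print(log)
```

The second event reaches only the type-based subscriber: nobody subscribed to its topic and its value fails the content filter, while the first event matches all three schemes because `TemperatureEvent` is a subtype of `SensorEvent`.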