At eGov Innovation Day 2014, themed "Données de l'administration, une mine (qui) d'or(t)" (a French pun: government data as a gold mine that lies dormant), Philippe Cudré-Mauroux presents Big Data and eGovernment.
Big Data & eGovernment
Prof. Dr. Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg Switzerland
eGov Innovation Day November 28, 2014 Fribourg – Switzerland
Instant Quiz
• Big Data? • 3 Vs? • CAP? • Hadoop? • Spark?
Demystifying the Big Data Guru
• General intro to Big Data • 3 use-cases in eGov/Smart Cities
– Feel free to interrupt – Franglais as main language
eXascale Infolab (www.exascale.info)
• New lab @ U. of Fribourg, Switzerland • Big (non-relational) Data management (… mostly)
Main Sponsors
Main Research Partners
Exascale Data Deluge
• Web companies – Google – Ebay – Yahoo
• Science
– Biology – Astronomy
– Remote Sensing
• Financial services, retail companies, governments, etc.
© Wired 2009
➡ New data formats
➡ New machines
➡ Peta & exa-scale datasets
➡ Obsolescence of traditional information infrastructures
The Web as the Main Driver
© Qmee
Big Data Buzz
Between now and 2015, the firm expects big data to create some 4.4 million IT jobs globally; of those, 1.9 million will be in the U.S. Applying an economic multiplier to that estimate, Gartner expects each new big-data-related IT job to create work for three more people outside the tech industry, for a total of almost 6 million more U.S. jobs.
Growth in the Asia Pacific Big Data market is expected to accelerate rapidly in two to three years time, from a mere US$258.5 million last year to in excess of $1.76 billion in 2016, with highest growth in the storage segment.
Big Data as a New Class of Asset
• The Age of Big Data (NYTimes Feb. 11, 2012) “The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.”
• Hype => fact (deal with it) • Problem => opportunity
Big Data Central Theorem
Data + Technology ➡ Actionable Insight ➡ $$
Reporting, Monitoring, Root Cause Analysis, (User) Modeling, Prediction
10 ways big data changes everything
• Some concrete examples – http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/
1. Can gigabytes predict the next Lady Gaga?
2. How big data can curb the world’s energy consumption
3. Big data is now your company’s virtual assistant
4. The future of Foursquare is data-fueled recommendations
5. How Twitter data-tracked cholera in Haiti
6. Revolutionizing Web publishing with big data
7. Can cell phone data cure society’s ills?
8. How data can help predict and create video hits
9. The new face of data visualization
10. One hospital’s embrace of big data
The 3-Vs of Big Data
• Volume – amount of data
• Velocity – speed of data in and out
• Variety – range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"
What can you do with the data
• Reporting – Post Hoc – Real time
• Monitoring (fine-grained) • Exploration • Finding Patterns • Root Cause Analysis • Closed-loop Control • Model construction • Prediction • …
© Mike Franklin
Typical Big Data Success Story
• Modeling users through Big Data
– Online ad sales / placement [e.g., Facebook]
– Personalized coupons [e.g., Target]
– Product placement [e.g., Walmart]
– Content generation [e.g., Netflix]
– Personalized learning [e.g., Duolingo]
– HR recruiting [e.g., Gild]
More Data => Better Answers?
• Not that easy…
• More Rows: Algorithmic complexity kicks in
• More Columns: Exponentially more hypotheses
• Another formulation of the problem:
– Given an inferential goal and a fixed computational budget, provide a guarantee that the quality of inference will increase monotonically as data accrue (without bound)
• In other words: => Data should be a resource, not a load
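To make the "More Columns" point concrete, here is a minimal Python sketch. It rests on assumptions that are not in the slides: a hypothesis is taken to be any non-empty subset of columns, and each one is tested at a significance level `alpha` of 0.05. The function names are hypothetical.

```python
from math import comb
from typing import Optional


def hypotheses_count(num_columns: int, max_order: Optional[int] = None) -> int:
    """Count non-empty column subsets, optionally capped at max_order columns each."""
    if max_order is None:
        # Every column is either in or out of a subset; drop the empty subset.
        return 2 ** num_columns - 1
    return sum(comb(num_columns, k) for k in range(1, max_order + 1))


def expected_false_positives(num_hypotheses: int, alpha: float = 0.05) -> float:
    """Expected number of spurious findings if no hypothesis is actually true."""
    return num_hypotheses * alpha


if __name__ == "__main__":
    for d in (5, 10, 20, 40):
        n = hypotheses_count(d)
        print(f"{d} columns: {n} candidate hypotheses, "
              f"about {expected_false_positives(n):.0f} expected by chance alone")
```

Doubling the number of columns roughly squares the number of candidate hypotheses, so without a correction for multiple testing, wider data mostly yields more chance findings.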
© Mike Jordan
2 Key Ingredients
CAP Theorem
Big Data Infrastructures
3 eGov / SmartCities Examples from XI
• Three Big Data examples from the eXascale Infolab: – Volume: energy provisioning – Velocity: detecting anomalies in smart-cities – Variety: integrating information
Volume: Energy Provisioning
• Wide adoption of smart-meter technology – Individuals / neighborhood / city / country
More data => better energy provisioning?
Not that Easy…
• Very difficult to analyze energy signals in a database!
• Solution: new encoding system, new database
Results
• 250x faster than current solutions • Error on prediction reduced by 100x • Paper presented at BigData 2014, Washington DC
Velocity: Real-Time Data Management for Smarter Cities
• Detecting leaks / pipe bursts / contamination in real-time for water distribution networks
Sensors installed in the water pipes!
• Spatial + temporal statistical processing (mini-Lisas) • Stream processing (Storm) + Array processing (SciDB)
[Figure: Stream Processing Flow. Labels in the diagram: sensors (1053, 1054), base stations (17, 29, 42), Peer Information Management overlay, Array Data Management System, HYRISE (OLTP / OLAP). Flow steps: Missing Data? (Data Gap Event), Sliding-Window Average, Mini-Lisa Computations, Anomaly Detected? (Anomaly Event, Alert), Delta Compression, Fluctuation? (Publish Value Event, otherwise Alive Event).]
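The per-sensor flow in the figure can be sketched roughly as follows. This is one plausible reading of the diagram labels, not the deployed system. A plain z-score against the sliding window stands in for the mini-Lisa spatial statistics, and a threshold on the change in the smoothed value stands in for delta compression. The class name, parameters, and thresholds are all assumptions, and no Storm or SciDB API is used.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Optional


class SensorStream:
    """Hypothetical per-sensor pipeline: gap check, anomaly check, delta filter."""

    def __init__(self, window: int = 5, z_threshold: float = 3.0,
                 delta: float = 0.5, min_sigma: float = 0.1) -> None:
        self.window = deque(maxlen=window)  # recent non-anomalous readings
        self.z_threshold = z_threshold      # assumed anomaly cut-off
        self.delta = delta                  # assumed minimum change worth publishing
        self.min_sigma = min_sigma          # floor to avoid dividing by ~0
        self.last_published: Optional[float] = None

    def process(self, value: Optional[float]) -> str:
        # Missing Data? -> Data Gap Event
        if value is None:
            return "DataGapEvent"
        # Anomaly Detected? (z-score stand-in for the mini-Lisa computations)
        if len(self.window) >= 3:
            mu = mean(self.window)
            sigma = max(pstdev(self.window), self.min_sigma)
            if abs(value - mu) / sigma > self.z_threshold:
                return "AnomalyEvent"  # anomalous readings are kept out of the window
        # Sliding-Window Average
        self.window.append(value)
        smoothed = mean(self.window)
        # Delta compression: publish only when the smoothed value moved enough
        if self.last_published is None or abs(smoothed - self.last_published) > self.delta:
            self.last_published = smoothed
            return "ValueEvent"
        return "AliveEvent"


if __name__ == "__main__":
    stream = SensorStream()
    for reading in [10.0, 10.1, 9.9, 10.0, None, 25.0, 10.05]:
        print(reading, stream.process(reading))
```

Keeping anomalous readings out of the window is a design choice in this sketch, so that a single burst does not shift the baseline used to judge later readings.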
Variety: Integrating eGov Data
• Integration: still the biggest IT problem (Gartner)
• 2 inherently difficult problems – Integrating various data formats / text – Automated integration
Paradigm Change
• Use of Semantic / Knowledge Graphs to store, trace (provenance) and integrate information – RDF, Linked Data – Excel, Word, CSV, XML, Relational
ZenCrowd
• Uses sets of algorithmic matchers to match entities to rich knowledge graphs
• Uses human intelligence at scale through crowdsourcing
• Combines both algorithmic and human matchers using probabilistic networks
[Figure: ZenCrowd architecture. Input: HTML pages; output: HTML + RDFa pages. Components shown: Entity Extractors, Algorithmic Matchers, LOD Index (Get Entity) over the LOD Open Data Cloud, Probabilistic Network, Decision Engine, Micro-Task Manager, Micro Matching Tasks, Crowdsourcing Platform, Workers' Decisions.]
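The idea of combining algorithmic and human matchers probabilistically can be illustrated with a small Python sketch. The slides only say "probabilistic networks", so this naive Bayes update over independent votes is a simplified stand-in, not ZenCrowd's actual model. The `Vote` type, the reliability values, and the prior are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vote:
    source: str         # e.g. "string-matcher" or "worker-17" (illustrative names)
    says_match: bool    # does this source think the entity link is correct?
    reliability: float  # assumed P(source is right), strictly between 0 and 1


def match_posterior(prior: float, votes: List[Vote]) -> float:
    """Posterior probability that a candidate link is correct.

    Assumes votes are independent given the truth, and that each source is
    equally reliable on matches and non-matches.
    """
    odds = prior / (1.0 - prior)
    for vote in votes:
        likelihood_ratio = vote.reliability / (1.0 - vote.reliability)
        odds *= likelihood_ratio if vote.says_match else 1.0 / likelihood_ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    votes = [
        Vote("string-matcher", True, 0.7),
        Vote("worker-17", True, 0.9),
        Vote("worker-42", False, 0.6),
    ]
    print(f"P(link is correct) = {match_posterior(0.5, votes):.3f}")
```

A decision engine in this spirit would accept a link once the posterior clears a chosen threshold, and could send borderline candidates back to the crowd for more votes.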
http://exascale.info