Big Data and eGovernment


On the occasion of eGov Innovation Day 2014 - DONNÉES DE L’ADMINISTRATION, UNE MINE (qui) D’OR(t) - Philippe Cudré-Mauroux presents Big Data and eGovernment.


Big Data & eGovernment

Prof. Dr. Philippe Cudré-Mauroux

eXascale Infolab, University of Fribourg Switzerland

eGov Innovation Day November 28, 2014 Fribourg – Switzerland

Instant Quiz

•  Big Data?
•  3 Vs?
•  CAP?
•  Hadoop?
•  Spark?


Demystifying the Big Data Guru

•  General intro to Big Data
•  3 use-cases in eGov/Smart Cities

–  Feel free to interrupt
–  Franglais as main language

eXascale Infolab (www.exascale.info)

•  New lab @ U. of Fribourg, Switzerland
•  Big (non-relational) Data management (… mostly)

Main Sponsors

Main Research Partners

Exascale Data Deluge

•  Web companies
–  Google
–  Ebay
–  Yahoo
•  Science
–  Biology
–  Astronomy
–  Remote Sensing
•  Financial services, retail companies, governments, etc.

© Wired 2009

➡  New data formats
➡  New machines
➡  Peta & exa-scale datasets
➡  Obsolescence of traditional information infrastructures

The Web as the Main Driver


© Qmee

Big Data Buzz


Between now and 2015, the firm expects big data to create some 4.4 million IT jobs globally; of those, 1.9 million will be in the U.S. Applying an economic multiplier to that estimate, Gartner expects each new big-data-related IT job to create work for three more people outside the tech industry, for a total of almost 6 million more U.S. jobs.

Growth in the Asia Pacific Big Data market is expected to accelerate rapidly in two to three years time, from a mere US$258.5 million last year to in excess of $1.76 billion in 2016, with highest growth in the storage segment.

Big Data as a New Class of Asset

•  The Age of Big Data (NYTimes Feb. 11, 2012) “The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.”

•  Hype => fact (deal with it)
•  Problem => opportunity

Big Data Central Theorem

Data + Technology → Actionable Insight → $$

Reporting, Monitoring, Root Cause Analysis, (User) Modeling, Prediction


10 ways big data changes everything

•  Some concrete examples
–  http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/

1.  Can gigabytes predict the next Lady Gaga?
2.  How big data can curb the world’s energy consumption
3.  Big data is now your company’s virtual assistant
4.  The future of Foursquare is data-fueled recommendations
5.  How Twitter data-tracked cholera in Haiti
6.  Revolutionizing Web publishing with big data
7.  Can cell phone data cure society’s ills?
8.  How data can help predict and create video hits
9.  The new face of data visualization
10. One hospital’s embrace of big data

The 3-Vs of Big Data

•  Volume
–  amount of data
•  Velocity
–  speed of data in and out
•  Variety
–  range of data types and sources
•  [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"

What can you do with the data

•  Reporting
–  Post Hoc
–  Real time
•  Monitoring (fine-grained)
•  Exploration
•  Finding Patterns
•  Root Cause Analysis
•  Closed-loop Control
•  Model construction
•  Prediction
•  …

© Mike Franklin

Typical Big Data Success Story

•  Modeling users through Big Data
–  Online ads sale / placement [e.g., Facebook]
–  Personalized coupons [e.g., Target]
–  Product placement [e.g., Walmart]
–  Content generation [e.g., Netflix]
–  Personalized learning [e.g., Duolingo]
–  HR recruiting [e.g., Gild]

More Data => Better Answers?

•  Not that easy…
•  More Rows: Algorithmic complexity kicks in
•  More Columns: Exponentially more hypotheses
•  Another formulation of the problem:
–  Given an inferential goal and a fixed computational budget, provide a guarantee that the quality of inference will increase monotonically as data accrue (without bound)
•  In other words: => Data should be a resource, not a load
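A rough illustration of the two bullets on rows and columns, as a minimal sketch. The function names are hypothetical, and the "all pairs" and "all column subsets" cost models are assumptions chosen for simplicity, not figures from the talk.

```python
from math import comb


def pairwise_comparisons(n_rows: int) -> int:
    """Pairs examined by a naive all-pairs algorithm (n choose 2)."""
    return comb(n_rows, 2)


def column_subsets(n_columns: int) -> int:
    """Non-empty column subsets, each a candidate hypothesis to test."""
    return 2 ** n_columns - 1


# More rows: quadratic growth for an all-pairs method.
for n in (1_000, 1_000_000):
    print(f"{n:,} rows -> {pairwise_comparisons(n):,} pairs")

# More columns: exponential growth in the hypotheses one could test.
for d in (10, 20, 40):
    print(f"{d} columns -> {column_subsets(d):,} subsets")
```

With a fixed computational budget, neither count can be enumerated exhaustively at scale, which is why more data can become a load rather than a resource.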

© Mike Jordan

2 Key Ingredients

CAP Theorem

Big Data Infrastructures


3 eGov / SmartCities Examples from XI

•  Three Big Data examples from the eXascale Infolab:
–  Volume: energy provisioning
–  Velocity: detecting anomalies in smart-cities
–  Variety: integrating information

Volume: Energy Provisioning

•  Wide adoption of smart-meter technology
–  Individuals / neighborhood / city / country

More data => better energy provisioning?

Not that Easy…

•  Very difficult to analyze energy signals in a database!

•  Solution: new encoding system, new database

Results

•  250x faster than current solutions
•  Error on prediction reduced by 100x
•  Paper presented at BigData 2014, Washington DC

Velocity: Real-Time Data Management for Smarter Cities

•  Detecting leaks / pipe bursts / contamination in real-time for water distribution networks


Sensors installed in the water pipes!

•  Spatial + temporal statistical processing (mini-Lisas)
•  Stream processing (Storm) + Array processing (SciDB)

[Figure: Stream Processing Flow. Sensors (e.g., sensor 1053, sensor 1054) report to base stations (17, 29, 42) linked by a Peer Information Management overlay; each base station hosts an Array Data Management System (OLTP / HYRISE / OLAP). Processing steps shown: Sliding-Window Average, Mini-Lisa Computations, Delta Compression, Anomaly Detection. Decision nodes: Missing Data? (yes: Data Gap Event), Anomaly Detected? (yes: Anomaly Event, Alert), Fluctuation? (yes: publish Value Event; no: Alive Event).]
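The decision steps named in the stream processing flow (missing data, sliding-window average, anomaly check, delta compression) can be sketched roughly as follows. This is a single-sensor illustration in plain Python, not the Storm/SciDB pipeline from the talk. The function name, the thresholds, and the use of an absolute deviation from the window average in place of the mini-Lisa statistics are all assumptions.

```python
from collections import deque
from statistics import mean


def process_stream(readings, window=3, max_deviation=2.0, min_delta=0.5):
    """Yield (event_name, index, value) for each reading of one sensor.

    None stands for a missing reading. All thresholds are illustrative.
    """
    recent = deque(maxlen=window)   # sliding window of accepted values
    last_published = None           # last value sent downstream
    for i, value in enumerate(readings):
        if value is None:
            yield ("DataGapEvent", i, None)
            continue
        # Anomaly check: compare against the sliding-window average.
        if recent and abs(value - mean(recent)) > max_deviation:
            yield ("AnomalyEvent", i, value)
            continue                # keep outliers out of the window
        recent.append(value)
        # Delta compression: publish only if the value moved enough.
        if last_published is None or abs(value - last_published) >= min_delta:
            last_published = value
            yield ("ValueEvent", i, value)
        else:
            yield ("AliveEvent", i, None)


readings = [10.0, 10.1, 10.0, None, 10.2, 10.1, 25.0, 10.9]
events = list(process_stream(readings))
for event in events:
    print(event)
```

Delta compression keeps traffic low: a sensor whose value barely changes emits only a lightweight alive event, while gaps and anomalies are always reported.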

Variety: Integrating eGov Data

•  Integration: still the biggest IT problem (Gartner)

•  2 inherently difficult problems
–  Integrating various data formats / text
–  Automated integration

Paradigm Change

•  Use of Semantic / Knowledge Graphs to store, trace (provenance) and integrate information
–  RDF, Linked Data
–  Excel, Word, CSV, XML, Relational

•  Combines both algorithmic and human matchers using probabilistic networks

ZenCrowd

•  Uses sets of algorithmic matchers to match entities to rich knowledge graphs
•  Uses human intelligence at scale through crowdsourcing
•  Combines both algorithmic and human matchers using probabilistic networks

[Figure: ZenCrowd architecture. Input: HTML pages; output: HTML + RDFa pages. Components: Entity Extractors, Algorithmic Matchers, LOD Index (Get Entity) over the LOD Open Data Cloud, Micro-Task Manager (Micro Matching Tasks sent to the Crowdsourcing Platform), Workers Decisions, Probabilistic Network, Decision Engine.]
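One simple way to combine an algorithmic matcher's confidence with crowd votes is a naive Bayes update in log-odds space, sketched below. This is an illustration only and not ZenCrowd's actual probabilistic network. The function names, the assumption that workers vote independently, and the per-worker reliability values are all hypothetical.

```python
from math import exp, log


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return log(p / (1 - p))


def match_probability(prior: float, votes) -> float:
    """Posterior probability that a candidate entity link is correct.

    prior: confidence from the algorithmic matchers, in (0, 1).
    votes: iterable of (says_match, reliability) pairs, where reliability
           is the assumed probability that the worker answers correctly.
    """
    score = logit(prior)
    for says_match, reliability in votes:
        weight = logit(reliability)  # log-likelihood ratio of one vote
        score += weight if says_match else -weight
    return 1 / (1 + exp(-score))


# Matcher is 60% confident; two workers agree, one less reliable disagrees.
votes = [(True, 0.9), (True, 0.7), (False, 0.6)]
p = match_probability(0.6, votes)
print(f"posterior = {p:.3f}, accept link = {p > 0.5}")
```

A decision engine could then accept a link only when the posterior clears a chosen threshold, so that reliable workers outweigh unreliable ones.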

http://exascale.info
