The growing revolution in Data: A presentation to the Social Business Summit. Nicholas Gruen, Chairman, Kaggle; Chairman, Government 2.0 Taskforce. E [email protected] T @nicholasgruen. Singapore, 6th April, 2011

2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data


Page 1: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

The growing revolution in Data: A presentation to the Social Business Summit

Nicholas Gruen
Chairman, Kaggle
Chairman, Government 2.0 Taskforce

E [email protected]

T @nicholasgruen

Singapore, 6th April, 2011

Page 2: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Outline

• The changing landscape – and what’s behind it
• The ecology of data
• Finding the people to find the value in your data
– Kaggle
• From data inside to data outside
• Data and gamification
– The Gruen Tender

Page 3: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Data can turn things upside down

• Insurance
• Retail
• Banking
• Telecommunications
• Accommodation
• Aviation and transport
– From stand by to advance purchase
– Load optimisation, price discrimination and risk sharing

• Medicine

Page 4: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 5: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

All That Data…

10 Retailers to monitor

10 data points

750 Stores per retailer to monitor

10 x 750 = 7,500 data points

50 products per store to monitor

10 x 750 x 50 = 375,000 data points

52 weeks of data per year for trend analysis

10 x 750 x 50 x 52 = 19,500,000 data points

3 years of historical data for comparison

10 x 750 x 50 x 52 x 3 = 58,500,000 data points

7 types of data to monitor (POS, Inventory, Marketing, Syndicated, etc)

10 x 750 x 50 x 52 x 3 x 7 = 409,500,000 data points

4 regions to segregate the data

10 x 750 x 50 x 52 x 3 x 7 x 4 = 1,638,000,000 data points

50 states to segregate the data

10 x 750 x 50 x 52 x 3 x 7 x 4 x 50 = 81,900,000,000 data points

8 categories to aggregate the data

10 x 750 x 50 x 52 x 3 x 7 x 4 x 50 x 8 = 655,200,000,000 data points

655 Billion+ data points involved with managing the retail sales channel

Source: Marilyn and Terence Craig @ Strataconf
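
The arithmetic on this slide is a running product: each new dimension multiplies the count so far. A minimal sketch in Python reproduces the figures; the variable names (`dimensions`, `running_totals`) are illustrative only and do not come from the original presentation.

```python
from itertools import accumulate
from operator import mul

# Dimension sizes as listed on the slide, in the order they compound.
dimensions = [
    ("retailers", 10),
    ("stores per retailer", 750),
    ("products per store", 50),
    ("weeks per year", 52),
    ("years of history", 3),
    ("types of data", 7),
    ("regions", 4),
    ("states", 50),
    ("categories", 8),
]

# Running product: the data-point count after each dimension is added.
running_totals = list(accumulate((size for _, size in dimensions), mul))

for (name, size), total in zip(dimensions, running_totals):
    print(f"{size:>4} {name:<22} {total:>18,} data points")
```

The final running total is the "655 Billion+" figure quoted on the slide.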

Page 6: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

http://www.dlib.org/dlib/may09/mestl/05mestl.html

Page 7: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density in any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation. 

Thomas Jefferson to Isaac McPherson, August, 1813

Jefferson’s enlightenment dream

Page 8: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Public goods – goods that no-one will supply if the government doesn’t

Public goods . . . present serious problems in human organisation.

Vincent and Elinor Ostrom - 1977

Page 9: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

The Wealth of Nations (1776)

• Private Goods

The Theory of Moral Sentiments (1759)

• The social preconditions of markets (Public Goods)

Language

Adam Smith

Page 10: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Public Goods

Private Goods

[The public good of] Justice . . . is the main pillar that upholds the whole edifice. If it is removed, the great, the immense fabric of human society . . . must in a moment crumble into atoms.

Adam Smith

Page 11: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

From potential to actual public good

Page 12: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Web 2.0: explosion of emergent public goods

Web 2.0 platforms are public goods:

Google (1998)

Wikipedia (2001)

Blogs (early 2000s)

Facebook (2004)

Twitter (2006)

Government didn’t build any of them

These platforms generate data
– By creating a context in which it means something
– And so inducing us to produce it

Page 13: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

The economics of abundance: a new birth of ‘free’dom

Public goods . . . present serious problems in human organisation.

Vincent and Elinor Ostrom - 1977

The freedom of ideas is the liberation of our species

Public goods as a problem

Public goods as an opportunity

Page 14: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

The ecology of data

Data

Schema or Context

Information

An example from Web 2.0 …

Page 15: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Private goods => Public Goods: Software

Private Goods

• Meeting private needs

Public Goods

• Many eyeballs
• Free code

Page 16: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Making sense of data

• Release and the sense makers will come
• Make sense of the data for your community
– And you may be able to monetise it
– Whole businesses being built on data exhaust
• Find the people to analyse your data
– Kaggle

Page 17: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 18: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 19: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 20: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 21: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 22: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 23: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 24: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 25: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 26: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

FlightCaster

• Predicts flight delays.
• We use an advanced algorithm that scours data on every domestic flight for the past 10 years and matches it to real-time conditions. We help you evaluate alternative options and help connect you to the right person to make the change.
• FlightCaster uses data from:
– Bureau of Transportation Statistics
– FAA Air Traffic Control System Command Center
– FlightStats
– National Weather Service

Page 27: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Private goods => Public Goods: Data

Private Goods

• Meeting private needs
• Linking to other websites

Public Goods

• Google uses this information to rank sites

• Everyone benefits

Google monetises with ads

Page 28: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Private Goods

• Platform for recording data

Public Goods

• PLM aggregates data and shares it back as public and private goods

Sales of data

Page 29: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Data exhaust

Page 30: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Where’s Wally?

Page 31: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Global Competitions

State of the art 70%

1½ weeks 70.8%

Competition closes 77%

Predicting HIV viral load

Accuracy of Prediction (1 – 100%)

• Revenue or sales forecasts
• Traffic forecasting
• Energy demand
• Predicting crime
• Tax/social security fraud
• Hospital casualty demand
• Identifying great
– Teachers
– Schools
– Hospitals
– and their best practices

US$500

Page 32: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

We could not be happier with the result.  The Kaggle approach has set a new benchmark in Government for the development of successful predictive models, delivered quickly and very cost effectively. 

In particular, the flexibility of the winning predictive model will enable its application to other major transport routes to the CBD and allow for the addition of other factors such as weather and incident.

Susan Calvert
Director, Strategy and Project Delivery Unit

Page 33: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 34: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Forecast Eurovision Voting

Dr. Derek Gatherer, UK

Take on the Quants 1 & 2

John Blatz, Baltimore

Edmund & Adrian, London & USA

Jason Trigg, Pennsylvania

Chih-Li Sung & Roy Tseng, Penghu & Taipei

Jure Zbontar, Ljubljana

Thomas Mahony, Canberra

Emir Delic, Australia

Glen Maher, Canberra

Predict HIV

Chris Raimondi, Baltimore

Claudio Perlich, USA

Gzegorz Swiszcz, Gera

Edmund & Adrian, London & USA

Tourism Forecasting Part 1

Rajstennaj Barrabas, USA

Jason Trigg, Pennsylvania

Felipe Maia, Uppsala University

Lee Baker, Las Cruces, New Mexico

INFORMS

Cole Harris, Texas

Nan Zhou, Pittsburgh

Chess Ratings

Uri Blass, Tel-Aviv

Giuseppe Ragusa, Rome

Robert Warsaw

Tourism Forecasting Part 2

R Package Recommendation Engine

Ivan, Russian Federation

The top 3 competitors for:

Philipp Emanuel Widmann, Heidelberg, DE

Dr. Christopher Hefele, New York

Chris DuBois, Portland

Where’s Wally? Where’s Jeremy?

Chris Raimondi, Baltimore

Felipe Maia, Uppsala University

Jeremy Howard

Page 35: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Where’s Wally from?

Page 36: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

What are Wally’s qualifications?

Page 37: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Rebuilding an info-structure

• Global CrisisCommons
• Within 2 hours of #CCearthquake
• Global volunteers parse 300,000 tweets.
• “Shell 58 Barrack Rd out of petrol – only diesel”.
• Agencies fussed, helped and obstructed.
• Kaggle comp to triage tweets

Page 38: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 39: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

New routines to generate data

Real estate or other sales

                              Agent 1     Agent 2     Agent 3
Prognosis                     $420,000    $415,000    $450,000
Accuracy of past prognoses    +5%         -2%         -15%
Expected price                $441,000    $406,700    $382,500

Indicated Service provider
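
The adjustment in this table can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own method: it assumes the "accuracy of past prognoses" row is a signed percentage applied multiplicatively to each agent's prognosis, which reproduces the expected prices shown. The names `agents`, `expected_price`, `best_raw` and `best_adjusted` are hypothetical.

```python
# Each agent's price prognosis and the signed correction implied by the
# accuracy of their past prognoses (figures taken from the slide).
agents = {
    "Agent 1": (420_000, 0.05),
    "Agent 2": (415_000, -0.02),
    "Agent 3": (450_000, -0.15),
}


def expected_price(prognosis: float, correction: float) -> int:
    """Apply the track-record correction (assumed multiplicative)."""
    return round(prognosis * (1 + correction))


expected = {name: expected_price(p, c) for name, (p, c) in agents.items()}

# The provider with the highest raw prognosis versus the highest
# track-record-adjusted expectation.
best_raw = max(agents, key=lambda name: agents[name][0])
best_adjusted = max(expected, key=expected.get)

print(expected)
print("Highest raw prognosis:", best_raw)
print("Highest expected price:", best_adjusted)
```

Under this assumption the ranking changes once track records are taken into account: the agent with the highest raw prognosis is not the one with the highest expected price.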

Page 40: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Medical procedure

                                             Hospital A    Hospital B    Hospital C
Raw Prognosis                                2.0%          4.0%          1.5%
Correction for accuracy of past prognoses    -30%          25%           30%
Expected chance of adverse event             1.40%         5.00%         1.95%

Indicated Service provider

Page 41: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Litigation

                                             Provider A    Provider B    Provider C
Raw Prognosis                                85.0%         91.0%         95.0%
Correction for accuracy of past prognoses    13%           0%            -5%
Expected chance of success                   96.00%        91.00%        90.00%

Indicated Service provider

Page 42: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

Gruen Tenders

• Forward looking data
• Tailored to the specific case at hand
• Enables innovation and data capture
• Generate a mass of new data
• Compares like with like
• Minimises perverse incentives

Page 43: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

E [email protected]

T @nicholasgruen

Page 44: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

The Public Goods of Web 2.0

SEO

Google

Page Rank

Page 45: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data

The Public Goods of Web 3.0

Ontologies followed - with tagging, for same reasons as SEO

Ontologies created

Page 46: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 47: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data
Page 48: 2011 SBS Singapore | Nicholas Gruen, The Coming Revolution in Data