The growing revolution in Data: A presentation to the Social
Business Summit
Nicholas Gruen
Chairman, Kaggle
Chairman, Government 2.0 Taskforce
T @nicholasgruen
Singapore, 6th April, 2011
Outline
• The changing landscape – and what’s behind it
• The ecology of data
• Finding the people to find the value in your data
– Kaggle
• From data inside to data outside
• Data and gamification
– The Gruen Tender
Data can turn things upside down
• Insurance
• Retail
• Banking
• Telecommunications
• Accommodation
• Aviation and transport
– From stand by to advance purchase
– Load optimisation, price discrimination and risk sharing
• Medicine
All That Data…
10 Retailers to monitor
10 data points
750 Stores per retailer to monitor
10 x 750 = 7,500 data points
50 products per store to monitor
10 x 750 x 50 = 375,000 data points
52 weeks of data per year for trend analysis
10 x 750 x 50 x 52 = 19,500,000 data points
3 years of historical data for comparison
10 x 750 x 50 x 52 x 3 = 58,500,000 data points
7 types of data to monitor (POS, Inventory, Marketing, Syndicated, etc)
10 x 750 x 50 x 52 x 3 x 7 = 409,500,000 data points
4 regions to segregate the data
10 x 750 x 50 x 52 x 3 x 7 x 4 = 1,638,000,000 data points
50 states to segregate the data
10 x 750 x 50 x 52 x 3 x 7 x 4 x 50 = 81,900,000,000 data points
8 categories to aggregate the data
10 x 750 x 50 x 52 x 3 x 7 x 4 x 50 x 8 = 655,200,000,000 data points
655 Billion+ data points involved with managing the retail sales channel
Source: Marilyn and Terence Craig @ Strataconf
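The multiplication on this slide can be reproduced as a running product. The following is a minimal sketch, not part of the original talk: the dimension sizes come from the slide, while the names `DIMENSIONS` and `running_totals` are illustrative assumptions.

```python
from itertools import accumulate
from operator import mul

# (label, size) pairs taken from the slide, in the order they are multiplied.
DIMENSIONS = [
    ("retailers", 10),
    ("stores per retailer", 750),
    ("products per store", 50),
    ("weeks per year", 52),
    ("years of history", 3),
    ("types of data", 7),
    ("regions", 4),
    ("states", 50),
    ("categories", 8),
]


def running_totals(dimensions):
    """Return (label, cumulative product) after each dimension is applied."""
    totals = accumulate((size for _, size in dimensions), mul)
    return [(label, total) for (label, _), total in zip(dimensions, totals)]


if __name__ == "__main__":
    for label, total in running_totals(DIMENSIONS):
        print(f"{label:<20} {total:>18,}")
```

Each added dimension multiplies the count of data points, which is why nine modest-looking factors reach 655,200,000,000.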
http://www.dlib.org/dlib/may09/mestl/05mestl.html
He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density in any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation.
Thomas Jefferson to Isaac McPherson, August, 1813
Jefferson’s enlightenment dream
Public goods – goods that no-one will supply if the government doesn’t
Public goods
Public goods . . . present serious problems in human organisation.
Vincent and Elinor Ostrom - 1977
The Wealth of Nations (1776)
• Private Goods
The Theory of Moral Sentiments (1759)
• The social preconditions of markets (Public Goods)
Language
Adam Smith
Public Goods
Private Goods
[The public good of] Justice . . . is the main pillar that upholds the whole edifice. If it is removed, the great, the immense fabric of human society . . . must in a moment crumble into atoms.
Adam Smith
From potential to actual public good
Web 2.0: explosion of emergent public goods
Web 2.0 platforms are public goods:
Google (1998)
Wikipedia (2001)
Blogs (early 2000s)
Facebook (2004)
Twitter (2006)
Government didn’t build any of them
These platforms generate data
– By creating a context in which it means something
– And so inducing us to produce it
The economics of abundance: a new birth of ‘free’dom
Public goods . . . present serious problems in human organisation.
Vincent and Elinor Ostrom - 1977
The freedom of ideas is the liberation of our species
Public goods as a problem
Public goods as an opportunity
The ecology of data
Data
Schema or Context
Information
An example from Web 2.0 …
Private goods => Public Goods: Software
Private Goods
• Meeting private needs
Public Goods
• Many eyeballs
• Free code
Making sense of data
• Release and the sense makers will come
• Make sense of the data for your community
– And you may be able to monetise it
– Whole businesses being built on data exhaust
• Find the people to analyse your data
– Kaggle
FlightCaster
• Predicts flight delays.
• We use an advanced algorithm that scours data on every domestic flight for the past 10 years and matches it to real-time conditions. We help you evaluate alternative options and help connect you to the right person to make the change.
• FlightCaster uses data from:
– Bureau of Transportation Statistics
– FAA Air Traffic Control System Command Center
– FlightStats
– National Weather Service
Private goods => Public Goods: Data
Private Goods
• Meeting private needs
• Linking to other websites
Public Goods
• Google uses this information to rank sites
• Everyone benefits
Google monetises with ads
Private Goods
• Platform for recording data
Public Goods
• PLM aggregates data and shares it back as public and private goods
Sales of data
Data exhaust
Where’s Wally?
Global Competitions
Predicting HIV viral load
Accuracy of Prediction (1 – 100%)
• State of the art: 70%
• 1½ weeks: 70.8%
• Competition closes: 77%
• Revenue or sales forecasts
• Traffic forecasting
• Energy demand
• Predicting crime
• Tax/social security fraud
• Hospital casualty demand
• Identifying great
– Teachers
– Schools
– Hospitals
– and their best practices
US$500
We could not be happier with the result. The Kaggle approach has set a new benchmark in Government for the development of successful predictive models, delivered quickly and very cost effectively.
In particular, the flexibility of the winning predictive model will enable its application to other major transport routes to the CBD and allow for the addition of other factors such as weather and incident.
Susan Calvert
Director, Strategy and Project Delivery Unit
Forecast Eurovision Voting
Dr. Derek Gatherer, UK
Take on the Quants 1 & 2
John Blatz, Baltimore
Edmund & Adrian, London & USA
Jason Trigg, Pennsylvania
Chih-Li Sung & Roy Tseng, Penghu & Taipei
Jure Zbontar, Ljubljana
Thomas Mahony, Canberra
Emir Delic, Australia
Glen Maher, Canberra
Predict HIV
Chris Raimondi, Baltimore
Claudio Perlich, USA
Gzegorz Swiszcz, Gera
Edmund & Adrian, London & USA
Tourism Forecasting Part 1
Rajstennaj Barrabas, USA
Jason Trigg, Pennsylvania
Felipe Maia, Uppsala University
Lee Baker, Las Cruces, New Mexico
INFORMS
Cole Harris, Texas
Nan Zhou, Pittsburgh
Chess Ratings
Uri Blass, Tel-Aviv
Giuseppe Ragusa, Rome
Robert Warsaw
Tourism Forecasting Part 2
R Package Recommendation Engine
Ivan, Russian Federation
The top 3 competitors for:
Philipp Emanuel Widmann, Heidelberg, DE
Dr. Christopher Hefele, New York
Chris DuBois, Portland
Where’s Wally? Where’s Jeremy?
Chris Raimondi, Baltimore
Felipe Maia, Uppsala University
Jeremy Howard
Where’s Wally from?
What are Wally’s qualifications?
Rebuilding an info-structure
• Global CrisisCommons
• Within 2 hours of #CCearthquake
• Global volunteers parse 300,000 tweets.
• “Shell 58 Barrack Rd out of petrol – only diesel”.
• Agencies fussed, helped and obstructed.
• Kaggle comp to triage tweets
New routines to generate data
Real estate or other sales
                                 Agent 1     Agent 2     Agent 3
Prognosis                        $420,000    $415,000    $450,000
Accuracy of past prognoses       5%          -2%         -15%
Expected price                   $441,000    $406,700    $382,500
Indicated Service provider
Medical procedure
                                             Hospital A   Hospital B   Hospital C
Raw Prognosis                                2.0%         4.0%         1.5%
Correction for accuracy of past prognoses    -30%         25%          30%
Expected chance of adverse event             1.40%        5.00%        1.95%
Indicated Service provider
Litigation
                                             Provider A   Provider B   Provider C
Raw Prognosis                                85.0%        91.0%        95.0%
Correction for accuracy of past prognoses    13%          0%           -5%
Expected chance of success                   96.00%       91.00%       90.00%
Indicated Service provider
Gruen Tenders
• Forward looking data
• Tailored to the specific case at hand
• Enables innovation and data capture
• Generate a mass of new data
• Compares like with like
• Minimises perverse incentives
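The adjustment in the real estate and medical tables appears to be a simple scaling: each provider's prognosis is multiplied by one plus its historical error. The sketch below reproduces those figures under that reading. The function names `adjust` and `rank`, the fractional sign convention, and the rule that the "indicated" provider is the one with the best adjusted figure are all assumptions made for illustration, not something the slides spell out.

```python
def adjust(raw, past_error):
    """Scale a raw prognosis by the provider's historical error.

    past_error is a fraction: 0.05 is read here as "past outcomes came in
    5% above this provider's prognoses" (an assumed interpretation).
    """
    return raw * (1 + past_error)


def rank(bids, higher_is_better=True):
    """Return (assumed indicated provider, adjusted figure per provider)."""
    adjusted = {name: adjust(raw, err) for name, (raw, err) in bids.items()}
    chooser = max if higher_is_better else min
    best = chooser(adjusted, key=adjusted.get)
    return best, adjusted


# Figures copied from the slides: (prognosis, accuracy of past prognoses).
REAL_ESTATE = {
    "Agent 1": (420_000, 0.05),
    "Agent 2": (415_000, -0.02),
    "Agent 3": (450_000, -0.15),
}
MEDICAL = {
    "Hospital A": (0.020, -0.30),
    "Hospital B": (0.040, 0.25),
    "Hospital C": (0.015, 0.30),
}

if __name__ == "__main__":
    best, prices = rank(REAL_ESTATE)
    for name, price in prices.items():
        print(f"{name}: ${price:,.0f}")
    print("Highest expected price:", best)

    best, risks = rank(MEDICAL, higher_is_better=False)
    for name, risk in risks.items():
        print(f"{name}: {risk:.2%}")
    print("Lowest expected chance of adverse event:", best)
```

In the real estate case the highest raw prognosis (Agent 3) drops to the lowest expected price once past accuracy is applied, which illustrates the "minimises perverse incentives" point: overpromising is penalised by the provider's own track record.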
T @nicholasgruen
The Public Goods of Web 2.0
SEO
Google
PageRank
The Public Goods of Web 3.0
Ontologies followed - with tagging, for same reasons as SEO
Ontologies created