Managing Technical Talent: How to Find the Right Analyst for Your Problem
Presentation to the Wolfram Data Summit
Washington DC, Friday, Sept 09, 2011
[email protected] @nicholasgruen
• genetic algorithms
• random forest
• Monte Carlo methods
• principal component analysis
• Kalman filter
• evolutionary fuzzy modelling
• neural networks
• logistic regression
• support vector machine
• decision trees
• ensemble methods
• AdaBoost
• Bayesian networks
Different users - different techniques.
“A discovery is ... an accident meeting a prepared mind.”
Albert Szent-Gyorgyi, 1937 Nobel Prize for Medicine
‣ Is the crown pure gold?
‣ We know its weight.
‣ How to measure its volume?
Eureka!
Finding the world’s most perfectly prepared mind
Our User Base
Competition Mechanics
Competitions are judged on objective criteria
1. Users create predictive models,
2. submit these to Kaggle,
3. and are scored on their accuracy.
How Kaggle Works
Competitions are judged based on predictive accuracy
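The scoring step can be illustrated with a minimal sketch. It assumes a classification task scored by plain accuracy against a held-out answer key; the entrant names, the data, and the `accuracy` and `leaderboard` helpers are hypothetical, and real competitions may use other metrics.

```python
from typing import Dict, List, Tuple


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of predictions that match the held-out answers."""
    if len(predicted) != len(actual):
        raise ValueError("submission and answer key differ in length")
    if not actual:
        raise ValueError("answer key is empty")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def leaderboard(
    submissions: Dict[str, List[int]], actual: List[int]
) -> List[Tuple[str, float]]:
    """Rank entrants from most to least accurate."""
    scores = {name: accuracy(preds, actual) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical answer key and entries, invented for illustration only.
answer_key = [1, 0, 1, 1, 0, 0, 1, 0]
entries = {
    "entrant_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "entrant_b": [1, 1, 1, 0, 0, 1, 1, 0],
    "entrant_c": [1, 0, 1, 1, 0, 0, 1, 0],
}

ranking = leaderboard(entries, answer_key)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.1%}")
```

Because the answer key is held back from entrants, the ranking depends only on how well each model predicts unseen cases, which is what makes the criterion objective.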
Which HIV patients will be sicker next week?
Genetic marker 1 + Genetic marker 2 + Genetic marker 3 + Genetic marker 4
Scouring the world for the best analysts for a problem.
HIV Load, Stock Prices, Chess Ratings, Traffic flow, Grant Forecasting
Dr. Derek Gatherer, UK
John Blatz, Baltimore
Edmund & Adrian, London & USA
Jason Trigg, Pennsylvania
Chih-Li Sung & Roy Tseng, Penghu & Taipei
Jure Zbontar, Ljubljana
Thomas Mahony, Canberra
Emir Delic, Australia
Glen Maher, Canberra
Chris Raimondi, Baltimore
Claudio Perlich, USA
Gzegorz Swiszcz, Gera
Rajstennaj Barrabas, USA
Lee Baker, Las Cruces, NM
Cole Harris, Texas
Nan Zhou, Pittsburgh
Uri Blass, Tel-Aviv
Giuseppe Ragusa, Rome
Robert, Warsaw
Ivan, Russian Federation
Chris DuBois, Portland
Philipp Emanuel Widmann, Heidelberg, DE
Dr. Christopher Hefele, New York
Jeremy Howard
Tim Salimans, Erasmus U
Global competitions
Predicting HIV progression
US$500
State of the art: 70%
After 1½ weeks: 70.8%
Competition closes: 77%
Where’s Wally? Scouring the world for the best analysts for a problem.
Tim Salimans Erasmus U R’dam
Martin O’Leary
“In less than a week … a PhD student in glaciology outperformed the state-of-the-art algorithms”
We could not be happier with the result. The Kaggle approach has set a new benchmark in Government for the development of successful predictive models, delivered quickly and very cost effectively.
In particular, the flexibility of the winning predictive model will enable its application to other major transport routes to the CBD and allow for the addition of other factors such as weather and incident.
Susan Calvert
Director, Strategy and Project Delivery Unit, Department of Premier and Cabinet
A Few Kaggle Projects
Take historical medical claims and predict who will go to hospital. This competition has a $3 million prize.
Predict which editors will stop contributing
New algorithm for chess ratings. Has wide gaming and ranking significance
Detect driver drowsiness
Predict the likelihood of claims given different vehicle models
Predict successful grant applications
Predict shoppers’ next visit to supermarket
User base: 14,107 registered data scientists
Forecast error (MASE) over time: Aug 9, 2 weeks later, 1 month later, competition end. Benchmark: combination of world’s best models.
This competition (to forecast tourism demand) used one of the most heavily studied sets of time series data. It had previously been modeled using the leading commercial software and academic algorithms. Competitors quickly surpassed world’s best practice and found the frontier of what’s possible.
Frontier reached after all information is extracted from the dataset
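For readers unfamiliar with the error measure on this chart, here is a minimal sketch of MASE (mean absolute scaled error). It uses the non-seasonal form, which scales the forecast's mean absolute error by the in-sample error of a naive "repeat the previous value" forecast; whether the tourism competition used this form or a seasonal variant is not stated in the slides, and the numbers below are invented for illustration.

```python
from typing import Sequence


def mase(
    train: Sequence[float], actual: Sequence[float], forecast: Sequence[float]
) -> float:
    """Mean absolute scaled error, non-seasonal form.

    Values below 1 mean the forecast beats the naive one-step
    forecast (as measured on the training series); above 1 means worse.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if len(train) < 2:
        raise ValueError("need at least two training points for the naive scale")

    # In-sample error of the naive forecast: each value predicted by the previous one.
    naive_mae = sum(
        abs(train[i] - train[i - 1]) for i in range(1, len(train))
    ) / (len(train) - 1)
    if naive_mae == 0:
        raise ValueError("training series is constant; MASE is undefined")

    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


# Hypothetical series, invented for illustration only.
history = [100.0, 110.0, 105.0, 115.0, 120.0]
observed = [125.0, 130.0]
model_forecast = [122.0, 136.0]
naive_forecast = [120.0, 120.0]  # repeats the last historical value

print(f"model MASE: {mase(history, observed, model_forecast):.2f}")
print(f"naive MASE: {mase(history, observed, naive_forecast):.2f}")
```

Because the error is scaled by a naive baseline, MASE can be compared across series with different units or magnitudes, which suits a competition built on many time series.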
Kaggle Competition Results
From generating value => Making money
1. Open Comps: Unleashing the power of Crowdsourcing
$ Commission, consulting and performance fees
2. Consulting partnerships
$ revenue share
3. The platform as marketplace for technical talent
$ revenue share
Our market
Business analytics = $107 billion market
Outsourced business analytics = $38 billion [IDC]
Public and third sector
• Revenue forecasts
• Traffic forecasting
• Energy demand
• Predicting crime
• Tax/social security fraud
• Hospital casualty demand
• Identifying great teachers and hospitals
Private sector
• Sales forecasts
• Credit scoring
• Stock picking
• Risk modelling and pricing
• Identifying fraud
• Identifying best practice
• Production management
• Inventory management
• Logistic optimisation
First mover advantages of internet platforms
Clients Analysts
Kaggle not for profit
Kaggle public good competitions
“I keep saying the sexy job in the next ten years will be statisticians.”
Hal Varian, Google Chief Economist, 2009
“No matter who you are, most of the smartest people work for someone else.”
Bill Joy, Founder, Sun Microsystems, 2009
Transforming the inefficient market for technical talent into the world’s largest meritocracy.
[email protected] @nicholasgruen
Wally Photos by William Murphy (Flickr: infomatique)
Who We Are
Anthony Goldbloom
CEO / Founder
• Econometrician @ the Australian Treasury & Reserve Bank of Australia
• Journalism Intern @ The Economist
Nicholas Gruen
Chairman
• Chairman of the Australian Gov. 2.0 taskforce
Jeff Moser
CTO
• Developer @ Raytheon and widely read blogger
Jeremy Howard
Chief Scientist
• McKinsey and A.T. Kearney alumnus
• Founder of 2 successful startups: FastMail (exit to Opera) and Optimal Decisions Group (exit to Choicepoint)