extract-data-conference
Kaggle
The home of data science
GE Flight Quest 2: Optimize flight routes based on weather & traffic
$250,000 | 122 teams
Hewlett Foundation: Automated Essay Scoring: Develop an automated scoring algorithm for student-written essays
$100,000 | 155 teams
Allstate Purchase Prediction Challenge: Predict a purchased policy based on transaction history
$50,000 | 1,570 teams
Merck Molecular Activity Challenge: Help develop safe and effective medicines by predicting molecular activity
$40,000 | 236 teams
Higgs Boson Machine Learning Challenge: Use the ATLAS experiment to identify the Higgs boson
$13,000 | 1,302 teams
The Kaggle Approach
Training Data
Age   Income    Default
58    $95,824   True
73    $20,708   False
59    $82,152   False
66    $25,334   True
Test Data (Default column left blank)
Age   Income    Default
73    $53,445
61    $36,679
47    $90,422
44    $79,040
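The two tables can be read as a workflow: fit a model on the labelled training rows, then predict the withheld Default column for the test rows. The following is a minimal sketch of that idea, not Kaggle's own tooling. The rows are copied from the slide; the model (a 1-nearest-neighbour rule on standardized features) and the helper names fit_scaler, scale, and predict_1nn are assumptions chosen only to keep the example short.

```python
# Training rows from the slide: (age, income, default).
train = [
    (58, 95824, True),
    (73, 20708, False),
    (59, 82152, False),
    (66, 25334, True),
]
# Test rows from the slide: the Default label is withheld.
test = [(73, 53445), (61, 36679), (47, 90422), (44, 79040)]


def fit_scaler(rows):
    """Return (mean, std) per column so age and income share a scale."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        stats.append((mean, var ** 0.5 or 1.0))  # guard against zero std
    return stats


def scale(row, stats):
    """Standardize one row with the statistics learned on the training set."""
    return [(x - m) / s for x, (m, s) in zip(row, stats)]


def predict_1nn(train_rows, test_rows):
    """Hypothetical model: copy the label of the closest training row."""
    features = [r[:2] for r in train_rows]
    labels = [r[2] for r in train_rows]
    stats = fit_scaler(features)
    scaled_train = [scale(f, stats) for f in features]
    preds = []
    for row in test_rows:
        q = scale(row, stats)
        dists = [sum((a - b) ** 2 for a, b in zip(q, t)) for t in scaled_train]
        preds.append(labels[dists.index(min(dists))])
    return preds


predictions = predict_1nn(train, test)
for row, pred in zip(test, predictions):
    print(row, "->", pred)
```

With only four training rows the predictions carry no real signal. The point is the split itself: the scaler and the model see only the training table, and the test table is used solely to produce predictions that someone holding the true labels could score.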
Mapping Dark Matter
[Chart: Competition Progress. Y-axis: Accuracy (lower is better), marked at .0150 and .0170. X-axis: Week 1, Week 3, Week 5, Week 7, End.]
Martin O’Leary, PhD student in Glaciology, Cambridge U
“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”
“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”
Marius Cobzarenco, grad student in computer vision, UC London
Ali Haissaine & Eu Jin Loc, signature verification, Qatar U & grad student @ Deloitte
Other
deepZot (David Kirkby & Daniel Margala), particle physicist & cosmologist
We’ve worked with many of the world’s largest companies
Healthcare & Pharma
Consumer Internet
Finance
Industrial
Consumer Marketing
Oil & Gas
$50b+ Beverage Co.
Global Bank
Top Credit Card Issuer
Top 5 E&P
Top 20 E&P
Kaggle competitors submit over 100K machine learning models per month
[Chart: Monthly Submissions to Kaggle Competitions. X-axis: May-10 to May-15. Y-axis: 0 to 160,000.]
There’s a cookbook for winning competitions on structured data.
1. Explore the data
2. Create and select features
3. Parameter tuning and ensembling
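The three cookbook steps can be sketched end to end on toy data. This is an illustrative sketch only, not a winning recipe: the data is randomly generated under an invented rule, the "model" is a one-feature threshold rule (a decision stump), and names such as FEATURES, stump_predict, tune, and ensemble_predict are hypothetical.

```python
import random

random.seed(0)

# Hypothetical toy data, invented for illustration: (age, income, default).
rows = []
for _ in range(200):
    age = random.randint(20, 80)
    income = random.randint(15_000, 120_000)
    # Assumed rule: a low income-to-age ratio (plus noise) means default.
    rows.append((age, income, income / age < 1_000 + random.gauss(0, 300)))
train, valid = rows[:150], rows[150:]

# 1. Explore the data: per-column ranges and the base rate of the target.
for name, idx in (("age", 0), ("income", 1)):
    col = [r[idx] for r in train]
    print(f"{name}: min={min(col)} mean={sum(col) / len(col):.0f} max={max(col)}")
print("default rate:", sum(r[2] for r in train) / len(train))

# 2. Create and select features: add a ratio feature next to the raw columns.
FEATURES = {
    "age": lambda age, income: age,
    "income": lambda age, income: income,
    "income_per_age": lambda age, income: income / age,
}


def stump_predict(name, threshold, age, income):
    """One-feature rule: predict default when the feature is below threshold."""
    return FEATURES[name](age, income) < threshold


def accuracy(predict, data):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(a, i) == d for a, i, d in data) / len(data)


# 3a. Parameter tuning: choose each stump's threshold on the training rows.
def tune(name):
    candidates = sorted({FEATURES[name](a, i) for a, i, _ in train})
    return max(
        candidates,
        key=lambda t: accuracy(lambda a, i: stump_predict(name, t, a, i), train),
    )


thresholds = {name: tune(name) for name in FEATURES}


# 3b. Ensembling: majority vote over the three tuned stumps.
def ensemble_predict(age, income):
    votes = sum(stump_predict(n, t, age, income) for n, t in thresholds.items())
    return votes >= 2


# Held-out accuracy per feature is the selection signal; the ensemble is
# scored the same way so the two can be compared.
scores = {
    n: accuracy(lambda a, i, n=n, t=t: stump_predict(n, t, a, i), valid)
    for n, t in thresholds.items()
}
scores["ensemble"] = accuracy(ensemble_predict, valid)
for name, score in scores.items():
    print(f"{name}: validation accuracy {score:.2f}")
```

Thresholds are tuned on the training rows and every score is computed on held-out rows, which mirrors the train/test separation a competition enforces. In practice the stumps would be replaced by stronger models, but the order of work stays the same.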
A second cookbook is emerging on computer vision and speech problems. It involves using convolutional neural networks.
When CNNs are applied, the vast majority of the time is spent training the models.
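For readers unfamiliar with CNNs, the core operation is a small filter slid across an image. The sketch below shows a single filter pass in plain Python, assuming no padding and a stride of one. The function names conv2d and relu, the 4x4 image, and the edge-detecting kernel are illustrative choices, not taken from the talk.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation, as in a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # Multiply the kernel with the image patch under it and sum.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Elementwise max(0, x), the usual nonlinearity after a convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 4x4 image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
# In a real CNN these weights are learned during training, not written by hand.
kernel = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d(image, kernel))
for row in feature_map:
    print(row)  # each row is [0.0, 2.0, 0.0]: the edge sits in the middle column
```

Even this tiny case takes 36 multiplications for one filter on one image. A real network applies many learned filters per layer to large images, repeatedly over the whole training set, which gives some intuition for why training dominates the time budget.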
Then there are the problems that land in the middle…
Anthony [email protected] 283 9781