Lukas Biewald
The Effect of Better Algorithms
[Bar chart: classifier error rate (axis 0% to 25%) for Naïve Bayes, Maximum Entropy, and SVM]
Active Semi-Supervised Learning for Improving Word Alignment (Vamshi, ACL ’10)
Real World Data
The Effect of Better Features
[Bar chart: classifier error rate (axis 0% to 30%) for Unigrams, Bigrams, and Unigrams+Bigrams]
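A minimal sketch of the three feature sets compared on this slide, assuming whitespace tokenization and lowercasing. The function name `ngram_features` and its parameters are hypothetical and are not taken from the talk or the cited paper.

```python
from collections import Counter


def ngram_features(text, use_unigrams=True, use_bigrams=True):
    """Count unigram and/or bigram features for one document.

    Assumption: tokens are lowercased and split on whitespace.
    """
    tokens = text.lower().split()
    feats = Counter()
    if use_unigrams:
        # One feature per individual token.
        feats.update(tokens)
    if use_bigrams:
        # One feature per pair of adjacent tokens.
        feats.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return feats


unigrams = ngram_features("the cat sat", use_bigrams=False)
bigrams = ngram_features("the cat sat", use_unigrams=False)
both = ngram_features("the cat sat")
```

Here `both` holds the union of the unigram and bigram counts, which corresponds to the "Unigrams+Bigrams" bar.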
The Effect of More Data
Active Semi-Supervised Learning for Improving Word Alignment (Vamshi, ACL ’10)
Real World Data
[Bar chart: classifier error rate (axis 0% to 14%) for data sizes N, 2N, and 4N]
The Effect of Cleaner Data
[Bar chart: classifier error rate (axis 0% to 14%) for 90% Accurate Data, 95% Accurate Data, and 100% Accurate Data]
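One way to read "90% Accurate Data" is that roughly one training label in ten is wrong. The sketch below simulates that by flipping labels at random. It is an assumption for illustration, not the procedure used to produce the chart, and `corrupt_labels` is a hypothetical helper.

```python
import random


def corrupt_labels(labels, accuracy, classes, seed=0):
    """Return a copy of labels in which each label stays correct with
    probability `accuracy` and is otherwise replaced by a different class.

    Assumption: label errors are independent and uniform over wrong classes.
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < accuracy:
            noisy.append(y)  # keep the correct label
        else:
            wrong = [c for c in classes if c != y]
            noisy.append(rng.choice(wrong))  # substitute a wrong label
    return noisy


clean = ["pos", "neg", "pos", "neg"] * 25
noisy_90 = corrupt_labels(clean, accuracy=0.90, classes=["pos", "neg"])
```

Training the same classifier on `clean` and on `noisy_90` and comparing held-out error is one way to reproduce the kind of comparison the slide shows.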
Where Do Data Scientists Spend Their Time?
Source: CrowdFlower Data Science Report 2015
CrowdFlower Data Enrichment Platform
Color Data
Apple Watch
Collecting the Same Data Over and Over
Open Data
Make Your Data Public Setting
Data for Everyone
Data For Everyone Library
Open Data API
URL Categorization
Categorize URLs
Record Data
Extracting Names and Titles
Summarization
Is an Image Funny?
Classifying Medical Images
Attributes of People
396 Scripts