Lukas Biewald

Lukas Biewald, CrowdFlower // Enriching Your Data

The Effect of Better Algorithms

[Chart: classifier error rate (0% to 25%) for Naïve Bayes, Maximum Entropy, and SVM]

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)

Real World Data

The Effect of Better Features

[Chart: classifier error rate (0% to 30%) for unigrams, bigrams, and unigrams+bigrams]

The Effect of More Data

Active Semi-Supervised Learning for Improving Word Alignment (Vamshi ACL ’10)

Real World Data

[Chart: classifier error rate (0% to 14%) for data sizes N, 2N, and 4N]

The Effect of Cleaner Data

[Chart: classifier error rate (0% to 14%) for 90%, 95%, and 100% accurate data]

Where Do Data Scientists Spend Their Time?

Source: CrowdFlower Data Science Report 2015

CrowdFlower Data Enrichment Platform

Color Data

Apple Watch

Collecting the Same Data Over and Over

Open Data

Make Your Data Public Setting

Data for Everyone

Data for Everyone Library

Open Data API

URL Categorization

Categorize URLs

Record Data

Extracting Names and Titles

Summarization

Is an Image Funny?

Classifying Medical Images

Attributes of People

396 Scripts

Lukas [email protected] @L2K

Thank You