ALFRED: Crowd Assisted Data Extraction
Valter Crescenzi, Paolo Merialdo, Disheng Qiu
Dipartimento di Ingegneria, Università degli Studi Roma Tre, Via della Vasca Navale, 79, Rome
[email protected]

Page 1: ALFRED demo -

ALFRED: Crowd Assisted Data Extraction

Valter Crescenzi, Paolo Merialdo, Disheng Qiu

Dipartimento di Ingegneria
Università degli Studi Roma Tre
Via della Vasca Navale, 79, Rome

[email protected]

Page 4: ALFRED demo -

Extracting data

2M pages from IMDB, and we want to extract titles, directors, etc.

[Diagram: Inference algorithm, Wrapper, DB]

1/7
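
To make the term concrete, a wrapper can be thought of as a set of extraction rules, one per attribute, applied to every page of a site. The sketch below is illustrative only: the page markup, the rule paths, and the `apply_wrapper` helper are assumptions made for this example and are not taken from ALFRED's actual rule language.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-ins for two movie pages (not real IMDB markup).
PAGES = [
    "<html><body><h1>Inception</h1><div class='director'>Christopher Nolan</div></body></html>",
    "<html><body><h1>Oblivion</h1><div class='director'>Joseph Kosinski</div></body></html>",
]

# A wrapper modeled as: attribute name -> path expression
# (restricted to the XPath subset that ElementTree supports).
WRAPPER = {
    "title": "./body/h1",
    "director": "./body/div[@class='director']",
}

def apply_wrapper(wrapper, page_source):
    """Apply every rule to one page; a rule that matches nothing yields None."""
    root = ET.fromstring(page_source)
    record = {}
    for attribute, path in wrapper.items():
        node = root.find(path)
        record[attribute] = node.text if node is not None else None
    return record

# One record per page, ready to be loaded into a database table.
records = [apply_wrapper(WRAPPER, page) for page in PAGES]
```

The point of wrapper inference is that such rules are not written by hand for each site but generated and selected automatically.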

Page 10: ALFRED demo -

Scaling Wrapper Inference

Scaling the number of workers with crowdsourcing platforms opens new challenges:

Issue: Non-expert workers
Contributions:
• Simple interactions to reduce the worker error rate
• Membership Query (yes/no answer)

Issue: Costs
Contributions:
• Active Learning to carefully select queries

Issue: Quality
Contributions:
• Bayesian Model to evaluate the expected wrapper quality
• Sampling algorithms
• Tolerant to inaccurate workers

2/7
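
One way to picture how yes/no answers from inaccurate workers can still be used is a Bayesian update over candidate extraction rules. This is a minimal sketch, assuming a uniform prior, a single known worker error rate, and exactly one correct rule among the candidates; the names `posterior_after_answer` and `error_rate` are hypothetical, and the model used by ALFRED may differ.

```python
def posterior_after_answer(prior, predictions, answer, error_rate):
    """Update P(rule is correct) after one yes/no membership query.

    prior:       dict rule -> probability (sums to 1)
    predictions: dict rule -> True if the rule extracts the queried value
    answer:      the worker's yes/no reply as a bool
    error_rate:  assumed probability that the worker answers wrongly
    """
    unnormalized = {}
    for rule, p in prior.items():
        # If this rule were the correct one, a truthful worker would
        # answer exactly what the rule predicts.
        agrees = predictions[rule] == answer
        likelihood = (1.0 - error_rate) if agrees else error_rate
        unnormalized[rule] = p * likelihood
    total = sum(unnormalized.values())
    return {rule: weight / total for rule, weight in unnormalized.items()}

# Three hypothetical candidate rules with a uniform prior.
prior = {"r1": 1 / 3, "r2": 1 / 3, "r3": 1 / 3}
# The queried value is extracted by r1 and r3 but not by r2.
predictions = {"r1": True, "r2": False, "r3": True}
# The worker says "yes"; we assume they are wrong 10% of the time.
posterior = posterior_after_answer(prior, predictions, answer=True, error_rate=0.1)
```

Because the likelihood of a contradicted rule is `error_rate` rather than zero, a single wrong answer lowers a rule's probability without eliminating it, which is what makes the update tolerant to mistakes.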

Page 11: ALFRED demo -

Architecture

ALFRED is a wrapper inference system supervised by workers from a crowdsourcing platform.

* Research Track: A Framework for Learning Web Wrappers from the Crowd, WWW 2013

3/7

Page 12: ALFRED demo -

Input and Rules Generation

4/7

Page 13: ALFRED demo -

Sample Set and Extracted Values

5/7

Page 14: ALFRED demo -

Sample Set and Extracted Values

      page0        page1          page2
r1    Inception    City of God    Oblivion
r2    Inception    City of God    null
r3    Inception    null           Oblivion

6/7
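
The table above suggests how a membership query can be chosen: a value on which the candidate rules disagree tells us more than one they all extract. The sketch below encodes the table and picks the query whose "yes" probability mass is closest to an even split. That criterion is a common active-learning heuristic assumed here for illustration, not necessarily ALFRED's selection strategy, and all function names are hypothetical.

```python
# Extracted values from the table: rule -> value per sample page (None = null).
EXTRACTED = {
    "r1": ["Inception", "City of God", "Oblivion"],
    "r2": ["Inception", "City of God", None],
    "r3": ["Inception", None, "Oblivion"],
}

def candidate_queries(extracted):
    """Enumerate (page index, value) pairs that at least one rule extracts."""
    queries = set()
    for values in extracted.values():
        for page, value in enumerate(values):
            if value is not None:
                queries.add((page, value))
    return sorted(queries)

def yes_mass(extracted, probabilities, query):
    """Total probability of the rules that extract the queried value."""
    page, value = query
    return sum(p for rule, p in probabilities.items()
               if extracted[rule][page] == value)

def most_informative_query(extracted, probabilities):
    """Pick the query whose 'yes' mass is closest to 0.5 (the most even split)."""
    return min(candidate_queries(extracted),
               key=lambda q: abs(yes_mass(extracted, probabilities, q) - 0.5))

uniform = {rule: 1 / 3 for rule in EXTRACTED}
best = most_informative_query(EXTRACTED, uniform)
```

With a uniform prior, asking about "Inception" on page0 is useless because all three rules agree on it, so the selected query concerns page1 or page2, where one rule returns null.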

Page 16: ALFRED demo -

Probability and Noise

7/7