The Palm-tree Index: Indexing with the Crowd
Ahmed R Mahmood*, Walid G. Aref*, Eduard Dragut*, Saleh Basalamah**
*Purdue University, **Umm AlQura University


Outline

• Motivation
• Taxonomy for Crowd-based Indexing
• Problem Definition
• The Palm-tree Index Structure
• Traversal Algorithms
• Preliminary Experimental Results
• Conclusions and Future Work

Motivation

[Figure: example items with values 500, 1000, 200, 100, then in sorted order 100, 200, 500, 1000]


Taxonomy


Problem Definition

• Let S be a set of N keys (e.g., images or videos) and q be a query
• A B+-tree-like index is constructed over S
• Study how to use human workers to search the index
• Workers perform subjective comparisons between the query image and tree keys, and make subjective decisions, e.g.,
– Less than, greater than, almost the same
– Better, worse, almost the same
– Cheaper, more expensive, almost the same
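To make the three-way decision concrete, here is a minimal simulation sketch of a single comparison task. It is not the authors' task design: real workers look at images rather than numbers, and the function name `subjective_compare`, the hidden numeric values, and the `tolerance` and `noise` parameters are all assumptions introduced for illustration.

```python
import random


def subjective_compare(query_value, key_value, tolerance=0.1, noise=0.0, rng=None):
    """Simulate one worker's three-way judgment of a query against a tree key.

    Assumption: each image has a hidden numeric value, and the worker perceives
    the query value with multiplicative Gaussian noise (hypothetical model).
    """
    rng = rng or random.Random()
    perceived = query_value * (1.0 + rng.gauss(0.0, noise)) if noise else query_value
    # "Almost the same" when the perceived gap is within a fraction of the key value.
    if abs(perceived - key_value) <= tolerance * key_value:
        return "almost the same"
    return "less than" if perceived < key_value else "greater than"
```

With `noise=0.0` the simulated worker is always right; raising `noise` gives a rough way to study how wrong answers propagate down the tree.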


Index Structure
• Why B+-tree?
• What are the tree order and height?
• How to construct the tree?
• What are the performance metrics?

Index Structure

• Why B+-tree?
– To obtain a predictable query cost
– Cost is reduced with more keys per node
• How are the tree order and height determined?
– Set by the number of keys a worker can process at once
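The trade-off between order and height can be sketched with a small helper. This is a simplification that assumes every node is full and counts one comparison level per tree level; the function name `tree_height` is hypothetical.

```python
def tree_height(num_keys, fanout):
    """Smallest number of levels h such that fanout**h >= num_keys.

    Assumption: all nodes are completely full, which a real B+-tree does not
    guarantee. Integer arithmetic avoids floating-point log rounding.
    """
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    height, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        height += 1
    return max(height, 1)
```

For 16 keys, a fanout of 2 gives 4 levels and a fanout of 4 gives 2 levels: a larger order means fewer tasks per traversal, but each task shows the worker more keys.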

[Figures: error vs. index height at fixed order; error vs. index order at fixed height; error as order increases and height decreases at fixed dataset size]

Index Construction: How to grow a palm tree?

• Keys associated with some "Quantitative Value"
– Keys have a subjective property and an associated quantitative value
– Index constructed based on the quantitative value
– Example: damaged car images with repair cost
  • Key: car image
  • Subjective property: car damage
  • Quantitative value: repair cost
[Figure: example keys with values 500, 1000, 200, 100, then in sorted order 100, 200, 500, 1000]
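When every key carries a quantitative value, the index can be built without the crowd. The sketch below sorts the keys by value and groups them bottom-up into nested lists; it is a simplified stand-in for a B+-tree bulk load (no separator keys, no node-fill rules), and the name `bulk_build` and the sample data are assumptions.

```python
def bulk_build(items, fanout):
    """Build a nested-list tree from (key, value) pairs sorted by value.

    Leaves hold up to `fanout` items; each upper node holds up to `fanout`
    child nodes. Returns the root node (an empty list for no items).
    """
    if not items:
        return []
    ordered = sorted(items, key=lambda kv: kv[1])
    # Leaf level: consecutive runs of the sorted items.
    nodes = [ordered[i:i + fanout] for i in range(0, len(ordered), fanout)]
    # Group nodes level by level until a single root remains.
    while len(nodes) > 1:
        nodes = [nodes[i:i + fanout] for i in range(0, len(nodes), fanout)]
    return nodes[0]


# Hypothetical car images with repair costs, as in the figure's values.
cars = [("car_a", 500), ("car_b", 1000), ("car_c", 200), ("car_d", 100)]
tree = bulk_build(cars, fanout=2)
```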

Index Construction: How to grow a palm tree? (Cont’d)

• Keys associated with some "Qualitative Property"
– Keys have a subjective property only
– Index constructed by successive insertions
– Example: images of butterflies to be ordered based on beauty
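Without a quantitative value, each insertion has to be driven by comparisons. The sketch below inserts a key into an ordered list using only a comparison callback, which in practice would be a crowd task. It uses a flat list rather than tree nodes to stay short, and the names `insert_by_comparison` and `compare` are hypothetical.

```python
def insert_by_comparison(ordered, new_key, compare):
    """Insert new_key into `ordered` using only pairwise comparisons.

    `compare(a, b)` returns a negative number if a ranks below b, and zero or
    a positive number otherwise. Assumption: answers are consistent; noisy
    answers would need replication and aggregation on top of this.
    """
    lo, hi = 0, len(ordered)
    while lo < hi:
        mid = (lo + hi) // 2
        if compare(new_key, ordered[mid]) < 0:
            hi = mid
        else:
            lo = mid + 1
    ordered.insert(lo, new_key)
    return ordered


# Stand-in for a crowd judgment: compare hidden numeric "beauty" scores.
def by_score(a, b):
    return (a > b) - (a < b)


ranking = []
for score in [3, 1, 2]:
    insert_by_comparison(ranking, score, by_score)
```

Each insertion costs about log2(n) comparison tasks, which is why the number of keys directly affects the crowd budget.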

Performance Metrics

• What are the performance metrics?
– Error: distance between the ground truth and the selected result
– Cost: total number of tasks needed to complete a job
[Figure: plot with axes Cost and Error]
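One way to compute the two metrics is sketched below. The slides do not say how distance is measured, so the sketch assumes it is the gap in rank positions within the sorted key order; the function names are hypothetical.

```python
def rank_error(sorted_keys, truth_key, selected_key):
    """Error as the number of rank positions between truth and selection.

    Assumption: distance is measured in positions of the sorted key order.
    """
    return abs(sorted_keys.index(truth_key) - sorted_keys.index(selected_key))


def job_cost(tasks_per_query):
    """Cost as the total number of crowd tasks issued across all queries."""
    return sum(tasks_per_query)
```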


Traversal Algorithms

• How to descend the tree?
– Leaf-only aggregation
– All-levels aggregation
– All-levels aggregation with backtracking

Leaf-Only Aggregation

[Figure: tree over keys 1 to 16 (root 9; then 5, 13; then 3, 7, 11, 15; then 2, 4, ..., 16; leaves 1 to 16), with workers w1, w2, w3 marked on the tree]
• Budget: 12
• Tasks per worker: 4, 4, 4
• Even budget distribution
– Number of workers = Budget / Tree Height
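A minimal sketch of leaf-only aggregation under stated assumptions: the tree is binary over a sorted key list, each worker independently descends from root to leaf, and the workers' leaf positions are combined with a low median. The slides do not specify the aggregation rule, so the median and the names `descend`, `leaf_only`, and `answer` are assumptions.

```python
from statistics import median_low


def descend(keys, query, answer):
    """One worker's root-to-leaf walk over a sorted key list.

    `answer(query, key)` is True when the worker judges query <= key.
    Returns the index of the leaf the worker ends at.
    """
    lo, hi = 0, len(keys)  # candidate leaves are keys[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if answer(query, keys[mid - 1]):  # query fits in the left half
            hi = mid
        else:
            lo = mid
    return lo


def leaf_only(keys, query, answer, budget, height):
    """Aggregate only the final leaf choices of budget // height workers."""
    workers = budget // height
    leaves = [descend(keys, query, answer) for _ in range(workers)]
    return median_low(leaves)
```

With 16 keys, a budget of 12, and a height of 4, three workers each perform 4 tasks, matching the figure's numbers.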

All-Levels Aggregation

[Figure: the same tree over keys 1 to 16, with workers w1, w2, w3 marked at the tree levels]
• Budget: 12
• Tasks per level: 3, 3, 3, 3
• Even budget distribution
– Replication per level = Budget / Tree Height
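A minimal sketch of all-levels aggregation with an even budget, assuming a binary tree over a sorted key list and a majority vote at each level. The voting rule and the names `all_levels` and `answer` are assumptions; the slides only give the replication formula.

```python
def all_levels(keys, query, answer, budget, height):
    """Descend once, taking a majority vote of replicated tasks per level.

    `answer(query, key)` is True when a worker judges query <= key.
    Each level receives budget // height replicated tasks.
    """
    per_level = budget // height
    lo, hi = 0, len(keys)  # candidate leaves are keys[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        votes_left = sum(bool(answer(query, keys[mid - 1])) for _ in range(per_level))
        if 2 * votes_left > per_level:  # strict majority says "go left"
            hi = mid
        else:
            lo = mid
    return lo
```

Because the vote is taken before descending, a single wrong answer at a level is outvoted there instead of sending one worker down the wrong subtree.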

All-Levels Aggregation

[Figure: the same tree over keys 1 to 16]
• Budget: 12
• Uneven budget distribution based on
– Probability of a distance-d error at level l: P_dl
– Expected Distance Error (EDE) per level
• EDE per level: 3, 1.5, 1, 0.5
• Tasks per level: 6, 3, 2, 1
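The figure's numbers are consistent with splitting the budget in proportion to each level's EDE, although the slides do not state the exact rule. The sketch below assumes that proportional rule, takes EDE as the probability-weighted sum of distances, and rounds with the largest-remainder method; all three choices and both function names are assumptions.

```python
def expected_distance_error(prob_by_distance):
    """EDE for one level: sum of d * P(d) over error distances d (assumed form)."""
    return sum(d * p for d, p in prob_by_distance.items())


def allocate_budget(ede_per_level, budget):
    """Split `budget` tasks across levels in proportion to their EDE."""
    total = sum(ede_per_level)
    raw = [budget * e / total for e in ede_per_level]
    tasks = [int(r) for r in raw]
    # Hand out tasks lost to rounding, largest fractional part first.
    leftover = budget - sum(tasks)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - tasks[i], reverse=True)
    for i in order[:leftover]:
        tasks[i] += 1
    return tasks


tasks = allocate_budget([3, 1.5, 1, 0.5], budget=12)  # -> [6, 3, 2, 1]
```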

Algorithms: Crowd-Search Backtracking All-Levels Aggregation

[Figure: the same tree over keys 1 to 16, with four nodes labeled Node A, Node B, Node C, and Node D]


Preliminary Experimental Results: Experimental Setup
• Squares dataset
– Generated 200 images of squares with different sizes
• Cars dataset
– 1,300 images of used cars associated with desired selling prices
– Collected using a custom crawler from the Craigslist website
• Crowd
– Students in the DB Group at Purdue (and their spouses)
– (IRB approval)

Preliminary Experimental Results: Sample Task
[Figures: sample tasks]

Preliminary Experimental Results

• Mean error while changing the tree fanout and the number of workers (replications)
– Higher error on the cars dataset
– Error increases as the fanout increases
– Error decreases as the number of replications increases
– All-levels aggregation has less error than leaf-only aggregation

Preliminary Experimental Results

• Mean cost while changing the tree fanout and the number of workers (replications)
– The taller the tree, the higher the cost
– Higher cost on the cars dataset (it has more keys)
– More replications incur higher cost
[Figure: error as order increases and height decreases at fixed dataset size]


Conclusions and Future Work

• Conclusions
– The Palm-tree index allows employing humans to perform index operations on keys that cannot be indexed by computer
• Future Work
– More extensive experimental evaluation
– Mathematical analysis
– Multi-dimensional indexing

Questions?