Mayank Kejriwal
Information Sciences Institute/USC
kejriwal@isi.edu
http://usc-isi-i2.github.io/kejriwal/

kejriwalresearch.azurewebsites.net/pdf/adaptive-candidate-generatio…


Given one or more attribute-rich graphs and a training set of linked node pairs, how do we avoid evaluating all node pairs, which costs O(|V|^2)?


Idea: Candidate Generation via blocking

• Dataset 1 and Dataset 2: ‘Exhaustive’ set of 4 × 6 = 24 pairs

• Apply blocking key, e.g. Tokens(LastName), to group records into blocks (Blocks 1 to 5 in the figure)

• Generate candidate set (7 pairs), apply similarity function on each pair
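
The blocking idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the slides: the records, the record ids, and the helper names `tokens_last_name` and `candidate_pairs` are all invented for the example. The two toy datasets have 4 and 6 records like the slide's figure, but because the records are made up, the resulting candidate count differs from the slide's 7.

```python
from collections import defaultdict
from itertools import product


def tokens_last_name(record):
    """Blocking key Tokens(LastName): one blocking key value per token."""
    return set(record["LastName"].lower().split())


def candidate_pairs(dataset1, dataset2, blocking_key):
    """Pair up records from the two datasets that share at least one block."""
    # Each block holds the ids from dataset 1 and the ids from dataset 2.
    blocks = defaultdict(lambda: (set(), set()))
    for rid, rec in dataset1.items():
        for value in blocking_key(rec):
            blocks[value][0].add(rid)
    for rid, rec in dataset2.items():
        for value in blocking_key(rec):
            blocks[value][1].add(rid)
    pairs = set()
    for left, right in blocks.values():
        pairs.update(product(left, right))  # only cross-dataset pairs
    return pairs


# Hypothetical toy records (not taken from the slides).
dataset1 = {
    "a1": {"LastName": "Smith"},
    "a2": {"LastName": "Van Dyke"},
    "a3": {"LastName": "Jones"},
    "a4": {"LastName": "Lee"},
}
dataset2 = {
    "b1": {"LastName": "Smith"},
    "b2": {"LastName": "Smith"},
    "b3": {"LastName": "Dyke"},
    "b4": {"LastName": "Jones"},
    "b5": {"LastName": "Van Buren"},
    "b6": {"LastName": "Kim"},
}

exhaustive = len(dataset1) * len(dataset2)
pairs = candidate_pairs(dataset1, dataset2, tokens_last_name)
print(f"{len(pairs)} candidate pairs instead of {exhaustive}")
```

Only the pairs in `pairs` would then be passed to the (more expensive) similarity function.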


Even better… learn the candidate generation function

• Doing it efficiently without losing (much) expressive power: Disjunctive Normal Form (DNF) blocking keys

• Example: CharTriGrams(Last_Name) U (Numbers(Address) X Last4Chars(SSN))

• Use functional elements like CharTriGrams to construct complex blocking keys

• Optimal search is NP-Complete, use greedy approximation with guarantees
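
To make the DNF idea concrete, here is a hedged Python sketch. It builds the slide's example key from small functional elements and then picks terms with a plain greedy set-cover heuristic. The heuristic, its thresholds (`max_neg_frac`, `min_recall`), the helper names, and all records are assumptions made for illustration; the slides do not specify the actual learning procedure or its guarantees.

```python
def char_trigrams(value):
    s = value.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def numbers(value):
    return {tok for tok in value.split() if tok.isdigit()}


def last4_chars(value):
    return {value[-4:]} if value else set()


def tokens(value):
    return set(value.lower().split())


def shares_key(func, field):
    """A pair satisfies the predicate if both records share a key value."""
    return lambda r1, r2: bool(func(r1[field]) & func(r2[field]))


def conjunction(*preds):
    return lambda r1, r2: all(p(r1, r2) for p in preds)


def greedy_dnf(terms, positives, negatives, max_neg_frac=0.25, min_recall=1.0):
    """Greedily choose terms (set-cover style) for a disjunctive blocking key."""
    # Discard terms that let through too many known non-matches.
    usable = {name: t for name, t in terms.items()
              if sum(t(a, b) for a, b in negatives) <= max_neg_frac * len(negatives)}
    uncovered = set(range(len(positives)))
    chosen = []
    while usable and len(uncovered) > (1 - min_recall) * len(positives):
        def gain(name):
            return sum(1 for i in uncovered if usable[name](*positives[i]))
        best = max(usable, key=gain)
        if gain(best) == 0:
            break  # no remaining term covers any uncovered match
        chosen.append(best)
        uncovered = {i for i in uncovered if not usable[best](*positives[i])}
        del usable[best]
    return chosen


def rec(last, addr, ssn):
    return {"Last_Name": last, "Address": addr, "SSN": ssn}


# Hypothetical training pairs: matches (positives) and non-matches (negatives).
positives = [
    (rec("Johnson", "12 Oak St", "111-22-3333"), rec("Jonson", "12 Oak Street", "111-22-3333")),
    (rec("Lee", "7 Elm Rd", "222-33-4444"), rec("Li", "7 Elm Road", "222-33-4444")),
    (rec("Garcia", "45 Pine Ave", "333-44-5555"), rec("Garcia", "Pine Avenue", "333-44-5555")),
]
negatives = [
    (rec("Johnson", "99 Lake Dr", "444-55-6666"), rec("Brown", "15 Hill St", "777-88-9999")),
    (rec("Smith", "3 Bay St", "555-66-7777"), rec("Nguyen", "8 Bay St", "888-99-0000")),
]

terms = {
    "CharTriGrams(Last_Name)": shares_key(char_trigrams, "Last_Name"),
    "Numbers(Address) X Last4Chars(SSN)": conjunction(
        shares_key(numbers, "Address"), shares_key(last4_chars, "SSN")),
    "Tokens(Address)": shares_key(tokens, "Address"),
}

chosen = greedy_dnf(terms, positives, negatives)


def dnf_key(r1, r2):
    """The learned key: a disjunction (U) of the chosen terms."""
    return any(terms[name](r1, r2) for name in chosen)


print(" U ".join(chosen))
```

On this toy data the noisy Tokens(Address) term is filtered out because it covers one of the two non-matches, and the two remaining terms together cover every match, which mirrors the shape of the slide's example key.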


Some results

                DNF blocking for RDF            Attribute Clustering (AC)
Name            Recall  Reduction  FMeasure     Recall  Reduction  FMeasure
Persons 1       100     99.75      99.88        100     98.86      99.43
Persons 2       99.00   99.79      99.39        99.75   99.02      99.38
Restaurants     100     99.73      99.87        100     95.57      99.79
Eprints-Rexa    98.16   99.28      98.72        99.60   99.37      99.48
IM-Similarity   100     98.14      99.06        100     62.79      77.14
IIMB-059        99.76   93.35      96.45        97.33   73.09      83.49
IIMB-062        47.73   98.11      64.22        77.27   90.80      83.49
Libraries       97.96   99.99      98.96        99.99   99.87      99.93
Parks           95.96   94.41      95.18        99.07   88.27      93.36
Video Game      98.73   99.96      99.34        99.72   99.85      99.79
Average         93.73   98.25      95.11        97.27   91.15      93.53
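
The slides do not define the three metrics. The sketch below assumes the usual blocking conventions: Reduction is the percentage of the exhaustive pair set that blocking avoids, and FMeasure is the harmonic mean of Recall and Reduction. The function names are hypothetical. Under that assumption, the Persons 2 row for DNF blocking is reproduced to two decimals, and the earlier blocking example (7 candidates out of 24 pairs) would have a reduction of about 70.83.

```python
def reduction_ratio(n_candidates, n_exhaustive):
    """Percentage of the exhaustive pair set that blocking avoids (assumed definition)."""
    return 100.0 * (1 - n_candidates / n_exhaustive)


def f_measure(recall, reduction):
    """Harmonic mean of recall and reduction (assumed definition)."""
    return 2 * recall * reduction / (recall + reduction)


example_reduction = reduction_ratio(7, 24)   # blocking example: 7 of 24 pairs kept
persons2_f = f_measure(99.00, 99.79)         # Persons 2, DNF blocking columns

print(f"Reduction for 7 of 24 pairs: {example_reduction:.2f}")
print(f"FMeasure for Persons 2 (DNF): {persons2_f:.2f}")
```

Because the table's inputs are themselves rounded, recomputed values can differ from the reported FMeasure in the last digit for some rows.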