Arvind Arasu, Surajit Chaudhuri, and Raghav Kaushik Presented by Bryan Wilhelm

Problem Description

A single entity may be referenced in separate records in textually dissimilar ways. For example, “Robert” and “Bob”.

Traditional text similarity functions such as edit distance and the Jaccard coefficient cannot handle these cases.

Current research is looking at string transformation databases.

These databases can be extremely large.
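
To illustrate why edit distance struggles with pairs like “Robert” and “Bob”, here is a minimal sketch of the standard Levenshtein distance. The comparison with “Rupert” is a hypothetical example that is not in the slides; it only shows that an unrelated name can score closer than a nickname.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# The nickname is farther from "robert" than an unrelated name is.
print(edit_distance("robert", "bob"))     # 4
print(edit_distance("robert", "rupert"))  # 2
```

A purely textual measure would therefore rank “Rupert” as a better match for “Robert” than “Bob”, which is the kind of case the slide describes.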

Solution: Definitions

Rule Application

Example: {Olathe→Olathe, 7, 4}

Alignment

Rule applications cannot overlap

Order does not matter

Coverage
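
The definitions above can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: a rule application is modeled as a rule plus a 1-based token position in each string, and coverage is assumed to be the number of tokens of the first string covered by the alignment (the slide does not spell out either detail). The names `RuleApplication`, `is_alignment`, and `coverage`, and the sample records, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RuleApplication:
    lhs: Tuple[str, ...]  # tokens matched in the first string
    rhs: Tuple[str, ...]  # tokens matched in the second string
    i: int                # assumed: 1-based token position in the first string
    j: int                # assumed: 1-based token position in the second string


def _occurs(tokens: Sequence[str], part: Tuple[str, ...], start: int) -> bool:
    """True if `part` appears in `tokens` starting at 1-based position `start`."""
    if start < 1:
        return False
    return tuple(tokens[start - 1:start - 1 + len(part)]) == part


def is_alignment(x: Sequence[str], y: Sequence[str], apps) -> bool:
    """A set of rule applications is an alignment if none of them overlap."""
    used_x, used_y = set(), set()
    for app in apps:
        if not (_occurs(x, app.lhs, app.i) and _occurs(y, app.rhs, app.j)):
            return False
        xs = set(range(app.i, app.i + len(app.lhs)))
        ys = set(range(app.j, app.j + len(app.rhs)))
        if xs & used_x or ys & used_y:
            return False  # overlaps an earlier rule application
        used_x |= xs
        used_y |= ys
    return True


def coverage(apps) -> int:
    """Assumed definition: tokens of the first string covered by the alignment."""
    return sum(len(app.lhs) for app in apps)


x = "Bob Smith 12 Main St".split()
y = "Robert Smith 12 Main Street".split()
a1 = RuleApplication(("Bob",), ("Robert",), 1, 1)
a2 = RuleApplication(("St",), ("Street",), 5, 5)

print(is_alignment(x, y, [a1, a2]))  # True
print(is_alignment(x, y, [a2, a1]))  # True: order does not matter
print(is_alignment(x, y, [a1, a1]))  # False: applications overlap
print(coverage([a1, a2]))            # 2
```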

Solution: Algorithm

Record Matching Application

Generating Example Pairs

Traditional text matching methods are used (such as the Jaccard coefficient).

Input from domain experts could also be considered, but this is expensive.

A few incorrect pairs will not affect the end result.
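
As a rough sketch of how example pairs might be generated with the Jaccard coefficient: the code below compares the word sets of two records and keeps pairs above a threshold. The function names, the sample records, and the 0.5 threshold are assumptions for illustration; the slides do not specify tokenization or a cutoff.

```python
from itertools import product


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient of the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty records are treated as identical
    return len(sa & sb) / len(sa | sb)


def candidate_pairs(left, right, threshold=0.5):
    """Keep cross-table pairs whose similarity meets the (assumed) threshold."""
    return [(x, y) for x, y in product(left, right)
            if jaccard(x, y) >= threshold]


left = ["Bob Smith 12 Main St Olathe KS"]
right = ["Robert Smith 12 Main Street Olathe KS",
         "Alice Jones 9 Oak Ave Topeka KS"]

pairs = candidate_pairs(left, right)
for x, y in pairs:
    print(x, "<->", y)
```

Here the first pair shares 5 of 9 distinct tokens and is kept, so it could later expose candidate transformations such as Bob→Robert and St→Street, while the unrelated record shares only one token and is dropped.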

Validation of Transformations

All approaches involve confirmation by a domain expert.

Analysis
