Diversifying Query Results on Semi-Structured Data

Md. Mahbub Hasan, University of California, Riverside

XML Document

Bib
  PhDThesis
    School: UToronto
    Author
      First Name: Michalis
      Last Name: Faloutsos
  PhDThesis
    Author
      First Name: Christos
      Last Name: Faloutsos
    School: UToronto
  Paper
    Author
      First Name: Michalis
      Last Name: Faloutsos
    Title: Networking

Query

Find all bibliography records related to Faloutsos.

Twig Pattern: Bib with descendant Faloutsos

XPath Expression: //Bib//Faloutsos
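The query can be tried out on a small encoding of the example document. The following is a minimal sketch, assuming element names such as `FirstName` and `LastName` (the slides only show the tree labels) and assuming that "related to Faloutsos" means a record contains `Faloutsos` as a text value. The helper `matching_records` is hypothetical and not part of the presented system.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of the example tree; the tag names are assumptions.
DOC = """
<Bib>
  <PhDThesis>
    <School>UToronto</School>
    <Author><FirstName>Michalis</FirstName><LastName>Faloutsos</LastName></Author>
  </PhDThesis>
  <PhDThesis>
    <Author><FirstName>Christos</FirstName><LastName>Faloutsos</LastName></Author>
    <School>UToronto</School>
  </PhDThesis>
  <Paper>
    <Author><FirstName>Michalis</FirstName><LastName>Faloutsos</LastName></Author>
    <Title>Networking</Title>
  </Paper>
</Bib>
"""


def matching_records(root, keyword):
    """Return the children of `root` that contain `keyword` as a text value."""
    return [
        record
        for record in root
        if any((node.text or "").strip() == keyword for node in record.iter())
    ]


bib = ET.fromstring(DOC)
results = matching_records(bib, "Faloutsos")

# Summarize each result by its record type and the author's first name.
result_summary = [(r.tag, r.findtext("Author/FirstName")) for r in results]
print(result_summary)
```

All three records match, which is why a choice has to be made once only k results can be shown.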

Results

(Figure: the XML document tree from the previous slide. Three records under Bib contain Faloutsos: the PhDThesis by Michalis Faloutsos, the PhDThesis by Christos Faloutsos, and the Paper by Michalis Faloutsos.)

Problem

Suppose we can return only two results to the user (k = 2). Which two results should we return?

Which Two Results Should We Return?

(Figure: the same XML document tree with its three candidate results.)

Solution

Suppose we can return only two results to the user (k = 2). Which two results should we return?

Return the results that are most diverse from each other. The idea is to help the user better understand and explore the result set.

Diversity Problem

Can be divided into two subproblems:

How to compute the distance between two results?

How to find the k most diverse results efficiently from the set of candidate answers?

How to Compute the Distance between Two Results?

Two types of differences between results:

Structural difference

Content difference

Structural Differences

Result 1:
Bib
  PhDThesis
    School: UToronto
    Author
      First Name: Michalis
      Last Name: Faloutsos

Result 2:
Bib
  PhDThesis
    Author
      First Name: Christos
      Last Name: Faloutsos
    School: UToronto

Result 3:
Bib
  Paper
    Author
      First Name: Michalis
      Last Name: Faloutsos
    Title: Networking
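One simple way to put a number on structural difference is to compare the sets of label paths in two results. The sketch below is an illustration only: it uses a Jaccard distance over root-to-node label paths, which is an assumption made here and not the distance measure proposed in the talk. The tag names and the helpers `label_paths` and `structural_distance` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def label_paths(elem, prefix=""):
    """Collect every root-to-node label path, ignoring text values."""
    path = f"{prefix}/{elem.tag}"
    paths = {path}
    for child in elem:
        paths |= label_paths(child, path)
    return paths


def structural_distance(a, b):
    """Jaccard distance between the label-path sets of two results."""
    pa, pb = label_paths(a), label_paths(b)
    return 1.0 - len(pa & pb) / len(pa | pb)


# The three example results, each rooted at Bib (tag names are assumptions).
AUTHOR = "<Author><FirstName>{}</FirstName><LastName>Faloutsos</LastName></Author>"
r1 = ET.fromstring("<Bib><PhDThesis><School>UToronto</School>"
                   + AUTHOR.format("Michalis") + "</PhDThesis></Bib>")
r2 = ET.fromstring("<Bib><PhDThesis>" + AUTHOR.format("Christos")
                   + "<School>UToronto</School></PhDThesis></Bib>")
r3 = ET.fromstring("<Bib><Paper>" + AUTHOR.format("Michalis")
                   + "<Title>Networking</Title></Paper></Bib>")

d12 = structural_distance(r1, r2)  # two theses: identical label paths
d13 = structural_distance(r1, r3)  # thesis vs. paper: only /Bib is shared
print(d12, d13)
```

Under this toy measure the two theses are structurally identical, while a thesis and the paper share only the root label. Sibling order is ignored, which may or may not be desirable for a real measure.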

Content Differences

Result 1 (PhDThesis): School = UToronto; First Name = Michalis; Last Name = Faloutsos

Result 2 (PhDThesis): First Name = Christos; Last Name = Faloutsos; School = UToronto

Result 3 (Paper): First Name = Michalis; Last Name = Faloutsos; Title = Networking
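Content difference can be illustrated in the same spirit by comparing the text values that two results carry. This is a minimal sketch under the assumption that content is compared as a plain set of values with a Jaccard distance; the talk does not spell out its content measure on these slides, and `text_values` and `content_distance` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def text_values(elem):
    """Collect the non-empty text values stored anywhere in a result."""
    return {
        node.text.strip()
        for node in elem.iter()
        if node.text and node.text.strip()
    }


def content_distance(a, b):
    """Jaccard distance between the value sets of two results."""
    va, vb = text_values(a), text_values(b)
    return 1.0 - len(va & vb) / len(va | vb)


# The three example results (tag names are assumptions).
AUTHOR = "<Author><FirstName>{}</FirstName><LastName>Faloutsos</LastName></Author>"
r1 = ET.fromstring("<Bib><PhDThesis><School>UToronto</School>"
                   + AUTHOR.format("Michalis") + "</PhDThesis></Bib>")
r2 = ET.fromstring("<Bib><PhDThesis>" + AUTHOR.format("Christos")
                   + "<School>UToronto</School></PhDThesis></Bib>")
r3 = ET.fromstring("<Bib><Paper>" + AUTHOR.format("Michalis")
                   + "<Title>Networking</Title></Paper></Bib>")

c12 = content_distance(r1, r2)  # share UToronto and Faloutsos
c23 = content_distance(r2, r3)  # share only Faloutsos
print(c12, c23)
```

Here the two theses, which were structurally identical, differ in content (Michalis vs. Christos), so a measure that combines both kinds of difference can separate results that either one alone would treat as the same.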

Finding Diverse Results

Naïve Approach

Compute all pair-wise distances of the results

Find the k-result subset with maximum diversity

Challenges to improve the naïve approach

Reduce the number of distance computations

Prune a large fraction of the k-result subsets
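The naïve approach above can be sketched in a few lines. The slides do not define how the diversity of a subset is scored, so this sketch assumes it is the sum of pair-wise distances within the subset; the function name `naive_diverse_subset` and the toy numeric data are hypothetical.

```python
from itertools import combinations


def naive_diverse_subset(items, k, distance):
    """Exhaustively pick the k items with the largest total pair-wise distance."""
    indices = range(len(items))

    # Step 1: compute all pair-wise distances once, O(n^2) distance calls.
    pair_dist = {
        (i, j): distance(items[i], items[j])
        for i, j in combinations(indices, 2)
    }

    # Step 2: score every k-subset (assumed score: sum of pair-wise distances).
    def diversity(subset):
        return sum(pair_dist[pair] for pair in combinations(subset, 2))

    # max() raises ValueError if k is larger than the number of items.
    best = max(combinations(indices, k), key=diversity)
    return [items[i] for i in best]


# Toy example: numbers on a line, with absolute difference as the distance.
points = [0, 1, 5, 9]
chosen = naive_diverse_subset(points, 2, lambda a, b: abs(a - b))
print(chosen)  # [0, 9]
```

With n results this evaluates n(n-1)/2 distances and every one of the C(n, k) subsets, which is what motivates the two challenges listed above: fewer distance computations and pruning of subsets.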

Conclusion

Distance Measure for Structural Query Results

Novel and efficient

Considers both structural and content information

Diversification Algorithm

Heuristic approach to improve the naïve algorithm

Future Work

Consider approximate matches

Approximation in structure

Approximation in value

Thank You!