Diversifying Query Results on Semi-Structured Data

Page 1: Diversifying Query Results on Semi-Structured Data

Diversifying Query Results on Semi-Structured Data

Md. Mahbub Hasan
University of California, Riverside

Page 2: Diversifying Query Results on Semi-Structured Data

XML Document

Bib
  PhDThesis
    School: UToronto
    Author
      First Name: Michalis
      Last Name: Faloutsos
  PhDThesis
    Author
      First Name: Christos
      Last Name: Faloutsos
    School: UToronto
  Paper
    Author
      First Name: Michalis
      Last Name: Faloutsos
    Title: Networking

Page 3: Diversifying Query Results on Semi-Structured Data

Query

Find all Bibliography records related to Faloutsos

Twig Pattern: Bib with descendant Faloutsos

XPath Expression: //Bib//Faloutsos
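As a rough illustration of what this query returns, the sketch below rebuilds the example document and collects every record under Bib that has a descendant with the text Faloutsos. The tag spellings `FirstName` and `LastName` and the helper `matching_records` are assumptions made for this example (XML names cannot contain spaces). The slides do not say how the query is actually evaluated.

```python
import xml.etree.ElementTree as ET

# Example document from the slides. FirstName/LastName are assumed
# spellings, since XML element names cannot contain spaces.
DOC = """
<Bib>
  <PhDThesis>
    <School>UToronto</School>
    <Author><FirstName>Michalis</FirstName><LastName>Faloutsos</LastName></Author>
  </PhDThesis>
  <PhDThesis>
    <Author><FirstName>Christos</FirstName><LastName>Faloutsos</LastName></Author>
    <School>UToronto</School>
  </PhDThesis>
  <Paper>
    <Author><FirstName>Michalis</FirstName><LastName>Faloutsos</LastName></Author>
    <Title>Networking</Title>
  </Paper>
</Bib>
"""

def matching_records(root, keyword):
    """Return the children of root that have a descendant whose text equals keyword.

    Hypothetical helper: a plain scan, not an optimized twig-matching algorithm.
    """
    hits = []
    for record in root:
        if any((node.text or "").strip() == keyword for node in record.iter()):
            hits.append(record)
    return hits

bib = ET.fromstring(DOC)
results = matching_records(bib, "Faloutsos")
print([r.tag for r in results])
```

All three records match, which is why the next slides ask which of them to show when only k results fit.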

Page 4: Diversifying Query Results on Semi-Structured Data

Results

Bib
  Result 1: PhDThesis
    School: UToronto
    Author
      First Name: Michalis
      Last Name: Faloutsos
  Result 2: PhDThesis
    Author
      First Name: Christos
      Last Name: Faloutsos
    School: UToronto
  Result 3: Paper
    Author
      First Name: Michalis
      Last Name: Faloutsos
    Title: Networking

Page 5: Diversifying Query Results on Semi-Structured Data

Problem

Suppose we can return only two results to the user (k = 2).

Which two results should we return?

Page 6: Diversifying Query Results on Semi-Structured Data

Which Two Results Should We Return?

Bib
  Result 1: PhDThesis
    School: UToronto
    Author
      First Name: Michalis
      Last Name: Faloutsos
  Result 2: PhDThesis
    Author
      First Name: Christos
      Last Name: Faloutsos
    School: UToronto
  Result 3: Paper
    Author
      First Name: Michalis
      Last Name: Faloutsos
    Title: Networking

Page 7: Diversifying Query Results on Semi-Structured Data

Solution

Suppose we can return only two results to the user (k = 2). Which two results should we return?

Return the results that are most diverse from each other

The idea is to help the user better understand and explore the result set

Page 8: Diversifying Query Results on Semi-Structured Data

Diversity Problem

Can be divided into two subproblems

  How to compute the distance between two results?

  How to find the k most diverse results efficiently from the set of candidate answers?

Page 9: Diversifying Query Results on Semi-Structured Data

How to Compute the Distance between Two Results?

Two types of differences between results

  Structural difference

  Content difference

Page 10: Diversifying Query Results on Semi-Structured Data

Structural Differences

Result 1
  Bib
    PhDThesis
      School: UToronto
      Author
        First Name: Michalis
        Last Name: Faloutsos

Result 2
  Bib
    PhDThesis
      Author
        First Name: Christos
        Last Name: Faloutsos
      School: UToronto

Result 3
  Bib
    Paper
      Author
        First Name: Michalis
        Last Name: Faloutsos
      Title: Networking

Page 11: Diversifying Query Results on Semi-Structured Data

Content Differences

Result 1
  Bib
    PhDThesis
      School: UToronto
      Author
        First Name: Michalis
        Last Name: Faloutsos

Result 2
  Bib
    PhDThesis
      Author
        First Name: Christos
        Last Name: Faloutsos
      School: UToronto

Result 3
  Bib
    Paper
      Author
        First Name: Michalis
        Last Name: Faloutsos
      Title: Networking
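The slides name the two kinds of difference but do not define the distance measure itself. The sketch below is one simple stand-in, not the author's measure. It assumes each result is stored as a mapping from root-to-leaf label path to text value, uses Jaccard distance over the path sets for structure and over the value sets for content, and mixes the two with a hypothetical weight `alpha`.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# The three example results as {label path: value}.
# This representation is an assumption made for the sketch.
R1 = {
    "Bib/PhDThesis/School": "UToronto",
    "Bib/PhDThesis/Author/FirstName": "Michalis",
    "Bib/PhDThesis/Author/LastName": "Faloutsos",
}
R2 = {
    "Bib/PhDThesis/School": "UToronto",
    "Bib/PhDThesis/Author/FirstName": "Christos",
    "Bib/PhDThesis/Author/LastName": "Faloutsos",
}
R3 = {
    "Bib/Paper/Author/FirstName": "Michalis",
    "Bib/Paper/Author/LastName": "Faloutsos",
    "Bib/Paper/Title": "Networking",
}

def structural_distance(r1, r2):
    # Compare only the label paths, ignoring the values.
    return jaccard_distance(set(r1), set(r2))

def content_distance(r1, r2):
    # Compare only the values, ignoring where they occur.
    return jaccard_distance(set(r1.values()), set(r2.values()))

def distance(r1, r2, alpha=0.5):
    # alpha is a hypothetical weight between structure and content.
    return alpha * structural_distance(r1, r2) + (1 - alpha) * content_distance(r1, r2)

for name, (a, b) in {"R1-R2": (R1, R2), "R1-R3": (R1, R3), "R2-R3": (R2, R3)}.items():
    print(name, round(distance(a, b), 3))
```

Under these assumptions the two PhDThesis results have identical structure and differ only in content, while either thesis and the paper differ in structure as well, so the pair of Result 2 and Result 3 comes out farthest apart.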

Page 12: Diversifying Query Results on Semi-Structured Data

Finding Diverse Results

Naïve Approach

  Compute all pair-wise distances of the results

  Find the k-result subset with maximum diversity

Challenges to improve the naïve approach

  Reduce the number of distance computations

  Prune a large fraction of the k-result subsets
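The naïve approach can be sketched as a brute-force search. The slides do not say how the diversity of a subset is scored, so this example assumes it is the sum of pairwise distances. The function name `most_diverse_subset` and the distance values in `TOY` are illustrative placeholders, not figures from the talk.

```python
from itertools import combinations

def most_diverse_subset(items, k, dist):
    """Brute force: score every k-subset by its summed pairwise distance."""
    # Step 1: compute all pairwise distances once, n*(n-1)/2 calls to dist.
    pair = {frozenset(p): dist(*p) for p in combinations(items, 2)}

    def diversity(subset):
        # Assumed objective: sum of distances over all pairs in the subset.
        return sum(pair[frozenset(p)] for p in combinations(subset, 2))

    # Step 2: enumerate all C(n, k) subsets and keep the best one.
    return max(combinations(items, k), key=diversity)

# Illustrative placeholder distances between three results.
TOY = {
    frozenset({"r1", "r2"}): 0.25,
    frozenset({"r1", "r3"}): 0.75,
    frozenset({"r2", "r3"}): 0.90,
}

best = most_diverse_subset(["r1", "r2", "r3"], 2, lambda a, b: TOY[frozenset((a, b))])
print(best)
```

With n results this needs n(n-1)/2 distance computations and examines C(n, k) subsets, which is the cost the two challenges above aim to reduce.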

Page 13: Diversifying Query Results on Semi-Structured Data

Conclusion

Distance Measure for Structural Query Results

  Novel and Efficient

  Considers both Structural and Content Information

Diversification Algorithm

  Heuristic approach to improve the naïve algorithm

Future Work

  Consider approximate matches

    Approximation in structure

    Approximation in value

Page 14: Diversifying Query Results on Semi-Structured Data

Thank You!