Beyond Basic Faceted Search

Ori Ben-Yitzhak, …(10 people)
IBM Research Lab & Yahoo! Research
WSDM 2008 (ACM International Conference on Web Search and Data Mining)

February 9, 2010
Presented by Hyo-jin Song


Contents

Introduction

Basic Faceted Search
– Lucene
– Data Model and Document Ingestion

Extending Multifaceted Search
– Business Intelligence
– Dynamic Facets

Correlated Facets

Conclusion

SNU CSE Homepage Practice


Introduction (1/3)

Faceted Search Overview
– Used in search applications
– Improves the precision of search results
– Supports multidimensional (vertical) browsing

Definition (Wikipedia): a technique for accessing a collection of information represented using a faceted classification.

※ Example: Nobel Prize Winners Search (Flamenco) http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/nobel/Flamenco

Introduction (2/3)

The two paradigms of web search
– Navigational search
  Uses a hierarchical structure (taxonomy).
  Users browse the information space by iteratively narrowing the scope of their quest (e.g. Yahoo! Directory, DMOZ).
– Direct search
  Users write their queries as a bag of words in a text box.
  Popularized by Web search engines such as Google and Yahoo! Search.
– Recently a new approach has emerged that combines both paradigms: faceted search.
– Faceted search lets users explore a multi-dimensional information space by combining text search with a progressive narrowing of choices in each dimension.

※ Source: SIGIR 2006 Workshop on Faceted Search website

Introduction (3/3)

Facets
– A facet comprises clearly defined, mutually exclusive, and collectively exhaustive aspects or properties of a class.
– E.g. in a collection of books: author, subject, and date facets.

Beyond Basic Faceted Search
– Extends traditional faceted search to support richer information discovery tasks over more complex data models.
– Enables users to gain insight into their data.
– A faceted search engine that supports correlated facets: a more complex information model in which the values associated with a document across multiple facets are not independent.


Basic Faceted Search: Lucene (1/3)

Apache Lucene Overview
– A popular open-source search library
– High-performance, full-featured text search; faceted search
– Written entirely in Java
– Websites powered by Lucene: Apple, Disney, Eclipse, IBM, MIT DSpace, etc.

Apache Solr Overview
– A popular, fast open-source enterprise search platform from the Apache Lucene project
– Features powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (Word, PDF) handling

Basic Faceted Search: Lucene (2/3)

Apache Lucene structure (1)
– Lucene maintains an index that holds, for each term (word), a postings list: a list of document identifiers and the word offsets within those documents at which the term occurs.
– During search, Lucene uses these postings lists to quickly iterate over the documents that match the query.
– ※ Example of a postings list (source: lecture slides from an IR class)
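The postings-list structure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Lucene's implementation: the function names `build_index` and `search`, the whitespace tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a postings list of (doc_id, [word offsets])."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        positions = defaultdict(list)
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for offset, term in enumerate(docs[doc_id].lower().split()):
            positions[term].append(offset)
        for term, offsets in positions.items():
            index[term].append((doc_id, offsets))
    return dict(index)

def search(index, term):
    """Return the ids of documents containing the term, in doc-id order."""
    return [doc_id for doc_id, _ in index.get(term.lower(), [])]

# Illustrative documents, not taken from the slides.
docs = {
    1: "faceted search with lucene",
    2: "lucene is a search library",
    3: "faceted browsing",
}
index = build_index(docs)
```

Looking up a term only touches that term's postings list, which is why iterating over the matching documents is fast.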

Basic Faceted Search: Lucene (3/3)

Apache Lucene structure (2)
– Enabling faceted search requires additional processing for each matching document: adding its contribution to its associated facets.
– Lucene makes it easy to plug such functionality into its iteration over the hits, since it can call a hit collector for each hit.
– ※ Figure: the Lucene stack
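A minimal sketch of the hit-collector idea follows: a callback that is invoked once per matching document and adds that document's contribution to its facets. The class and method names (`FacetCountingCollector`, `collect`, `run_query`) are hypothetical and are not Lucene's API; the facet paths are invented for illustration.

```python
from collections import Counter

class FacetCountingCollector:
    """Hypothetical collector, called once for every matching document."""

    def __init__(self, doc_facets):
        self.doc_facets = doc_facets  # doc_id -> list of facet paths
        self.counts = Counter()

    def collect(self, doc_id):
        # Add this hit's contribution to each facet it is associated with.
        for path in self.doc_facets.get(doc_id, []):
            self.counts[path] += 1

def run_query(matching_doc_ids, collector):
    """Stand-in for the engine's iteration over the hits."""
    for doc_id in matching_doc_ids:
        collector.collect(doc_id)

# Illustrative facet assignments, not taken from the slides.
doc_facets = {
    1: ["author/Knuth", "subject/algorithms"],
    2: ["author/Knuth", "subject/typesetting"],
    3: ["author/Lamport", "subject/typesetting"],
}
collector = FacetCountingCollector(doc_facets)
run_query([1, 2], collector)  # assume documents 1 and 2 matched the query
```

After the run, `collector.counts` holds one count per facet path seen among the hits, which is the information a faceted result page displays next to each refinement.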

Basic Faceted Search: Data Model and Document Ingestion (1/3)

Taxonomy vs. Folksonomy in Faceted Search

The taxonomy is a directed acyclic graph (DAG)
– Nodes represent facets.
– Directed edges denote the refinement relations between nodes.

Two approaches to ingesting documents into faceted collections
– 1. The full taxonomy is given before indexing.
  Documents must specify the taxonomy nodes to which they belong.
– 2. The taxonomy is not given in advance and must be learned, while indexing, from the ingested documents.
  Documents specify the taxonomy paths to which they correspond.

Basic Faceted Search: Data Model and Document Ingestion (2/3)

The process of the second approach
– The application must add taxonomy nodes or facet paths to each document prior to adding it to the index.
– The facet hierarchy is inferred from the plurality of paths encountered when indexing the individual documents.
– All encountered facet paths are collected into a forest-like graph.
– New facet paths can be introduced seamlessly, without any administrative action, as new documents are ingested.
– The inferred taxonomy automatically expands to accommodate the new data.
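The inference step above can be sketched as follows, assuming facet paths are written as slash-separated strings and the forest is kept as nested dictionaries. The helper names `add_path` and `ingest` and the sample paths are assumptions for this illustration, not the representation used by the authors.

```python
def add_path(forest, path):
    """Insert one facet path such as 'date/2008/02' into a nested-dict forest."""
    node = forest
    for label in path.split("/"):
        # Create the child node the first time this label is seen.
        node = node.setdefault(label, {})

def ingest(forest, doc_paths):
    """Merge all facet paths of one document into the inferred taxonomy."""
    for path in doc_paths:
        add_path(forest, path)

# Illustrative documents, not taken from the slides.
forest = {}
ingest(forest, ["author/Twain", "date/2008/02"])
ingest(forest, ["date/2008/03", "date/2010"])
```

The second document introduces `date/2008/03` and `date/2010` without any prior declaration, so the taxonomy grows simply because new paths were encountered.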

Basic Faceted Search: Data Model and Document Ingestion (3/3)

The process of the second approach (continued)
– Figure 1 shows the output of the indexing process after ingesting two documents, doc1 and doc2.
– Each document is associated with the facet paths shown in Table 2.
– The resulting taxonomy is a forest of trees rather than a general DAG.
– The facet forest is maintained by the taxonomy index.


Extending Multifaceted Search

Business Intelligence
– A broad category of applications and technologies for gathering, providing access to, and analyzing data to help enterprise users make better business decisions.
– Figure 2 shows an example of aggregations when searching for “world wide web” over a subset of Amazon’s book catalog.

Multifaceted search model
– A faceted query may specify any number of aggregation expressions to be calculated per target facet.
– The values of these aggregations are returned for each path of the corresponding facet set in the result set.
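As a rough sketch of per-facet aggregation, the function below computes a count and an average price for every path under one target facet. The `aggregate` function, the `price` attribute, and the sample hits are assumptions chosen for illustration; the slides do not give the query syntax or the catalog data.

```python
from collections import defaultdict

def aggregate(hits, facet_root):
    """For each path under facet_root, compute the count and average price."""
    groups = defaultdict(list)
    for doc in hits:
        for path in doc["facets"]:
            if path.startswith(facet_root + "/"):
                groups[path].append(doc["price"])
    return {
        path: {"count": len(prices), "avg_price": sum(prices) / len(prices)}
        for path, prices in groups.items()
    }

# Illustrative result set, not real catalog data.
hits = [
    {"facets": ["subject/web", "format/paperback"], "price": 20.0},
    {"facets": ["subject/web", "format/hardcover"], "price": 40.0},
    {"facets": ["subject/networks", "format/paperback"], "price": 30.0},
]
result = aggregate(hits, "format")
```

Counting is just one aggregation among many; the same grouping supports sums, averages, minima, and so on over any numeric attribute of the matching documents.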


Dynamic Facets

– Typical faceted search operates over a set of predetermined indexed facets, i.e. the facets and attributes associated with each document must be known at indexing time.
  One such attribute might be the date of a document.
– Dynamic facet search supports dynamic time-based facets.
  For example, “Sum { qtime - doc.time < 60*60*24*7 }” counts the results dated within one week (60*60*24*7 seconds) of the query time.
  In a similar fashion, one can categorize search results into spatial dynamic facets, e.g. count the number of results within certain radii around a location that is specified by the query.
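The time-based expression above can be read as a sum of a boolean over the hits. A minimal sketch, assuming each hit carries a `time` attribute in seconds and that the query supplies `qtime` (the function name `count_recent` and the sample timestamps are invented for this example):

```python
WEEK_SECONDS = 60 * 60 * 24 * 7

def count_recent(hits, qtime, window=WEEK_SECONDS):
    """Sum the boolean (qtime - doc.time < window) over all hits."""
    return sum(1 for doc in hits if qtime - doc["time"] < window)

qtime = 1_265_673_600  # illustrative query time, seconds since the epoch
hits = [
    {"id": 1, "time": qtime - 3_600},       # one hour old
    {"id": 2, "time": qtime - 6 * 86_400},  # six days old
    {"id": 3, "time": qtime - 8 * 86_400},  # eight days old
]
```

Because the window is evaluated against the query time rather than stored in the index, the same documents fall into different buckets as time passes, with no re-indexing.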


Correlated Search

– The standard faceted search data model
  Each document has a certain set of facet values.
  E.g. a product (represented by a document) has a certain color, size, and price.
  A product tagged with several colors and sizes is treated as available in all combinations of them, i.e. in the cross-product {color/red, color/blue} x {size/small, size/medium}.
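The limitation of the cross-product model can be illustrated with a small sketch. The slides do not show how correlated facets are stored, so the `variants` tuple set below is only an assumed representation, and the product data is invented.

```python
def matches_flat(doc, color, size):
    """Standard model: color and size are stored as independent value sets."""
    return color in doc["colors"] and size in doc["sizes"]

def matches_correlated(doc, color, size):
    """Correlated model (illustrative): values are stored as linked tuples."""
    return (color, size) in doc["variants"]

shirt = {
    "colors": {"red", "blue"},
    "sizes": {"small", "medium"},
    # Assumption for this example: only these two combinations exist.
    "variants": {("red", "small"), ("blue", "medium")},
}
```

Under the flat model a query for a red, medium shirt matches even though no such variant exists; keeping the values linked avoids that false positive.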


Conclusion

Faceted Search
– The more pervasive the web becomes, the greater the need for faceted search
– Applies to many complicated domains
– Combines many technologies

Future work (some of which has already started)
– Aggregate cross products of dimensions (as in OLAP cubes)
– Update facet values and numeric attributes of documents without requiring re-indexing of the document
– Faceted search across a distributed index


SNU CSE Homepage Practice

Basic Web Documents Search
– Crawl all documents on the CSE homepage and all laboratory homepages
– Construct the inverted list
– Serves as the core source for faceted search and other features

Extension Search
– Graph visualization
– Faceted search
– Domains: Department, Professor, Laboratory, Course
– Web 2.0 technology: AJAX, Reverse AJAX, Comet, etc.

Thank you! Any questions or comments?