Beyond Basic Faceted Search
Ori Ben-Yitzhak, … (10 people)
IBM Research Lab & Yahoo! Research
WSDM 2008 (ACM International Conference on Web Search and Data Mining)
February 9, 2010
Presented by Hyo-jin Song
Contents
Introduction
Basic Faceted Search
– Lucene
– Data Model and Document Ingestion
Extending Multifaceted Search
– Business Intelligence
– Dynamic Facets
Correlated Facets
Conclusion
SNU CSE Homepage Practice
Introduction (1/3)
Faceted Search Overview
– Faceted search: a technique for accessing a collection of information represented using a faceted classification (Wikipedia)
– Used in search applications
– Improves the precision of the search results
– Supports multidimensional or vertical browsing
※ Example: Nobel Prize Winners Search (Flamenco) http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/nobel/Flamenco
Introduction (2/3)
The two paradigms of web search
– Navigational search
  – Uses a hierarchical structure (taxonomy)
  – Users browse the information space by iteratively narrowing the scope of their quest (e.g. Yahoo! Directory, DMOZ)
– Direct search
  – Users write their queries as a bag of words in a text box
  – Popularized by web search engines such as Google and Yahoo! Search
– Recently a new approach has emerged that combines both paradigms: faceted search
– It lets users explore a multi-dimensional information space by combining text search with a progressive narrowing of choices in each dimension
※ Source: SIGIR’2006 Workshop on Faceted Search website
Introduction (3/3)
Facets
– A facet comprises clearly defined, mutually exclusive, and collectively exhaustive aspects or properties of a class
– E.g. in a collection of books: author, subject, and date facets
Beyond Basic Faceted Search
– Extends traditional faceted search to support richer information discovery tasks over more complex data models
– Enables users to gain insight into their data
– A faceted search engine that supports correlated facets
  – A more complex information model in which the values associated with a document across multiple facets are not independent
Basic Faceted Search – Lucene (1/3)
Apache Lucene Overview
– A popular open-source search library
– High-performance, full-featured text search; faceted search
– Written entirely in Java
– Websites powered by Lucene: Apple, Disney, Eclipse, IBM, MIT DSpace, etc.
Apache Solr Overview
– A popular, fast open-source enterprise search platform from the Apache Lucene project
– Features powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (Word, PDF) handling
Basic Faceted Search – Lucene (2/3)
Apache Lucene structure (1)
– Lucene maintains an index that holds, for each term (word), a postings list: a list of the identifiers of the documents in which the term occurs, together with the word offsets within those documents
– During search, Lucene uses these postings lists to quickly iterate over the documents that match the query
※ Example of a postings list (source: lecture slides from an IR class)
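A minimal sketch of the postings-list idea, in plain Python rather than Lucene itself. The function names `build_postings` and `search` and the three sample documents are assumptions made for illustration; Lucene's actual index format is considerably more elaborate.

```python
from collections import defaultdict

def build_postings(docs):
    """Map each term to {doc_id: [word offsets]} (a toy positional index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for offset, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(offset)
    return index

def search(index, term):
    """Return the sorted ids of the documents that contain the term."""
    return sorted(index.get(term.lower(), {}))

# Hypothetical sample documents.
docs = {
    1: "faceted search with Lucene",
    2: "Lucene builds an inverted index",
    3: "search the inverted index",
}
index = build_postings(docs)
print(index["lucene"])          # postings list for the term "lucene"
print(search(index, "search"))  # ids of the documents containing "search"
```

Answering a one-term query only touches that term's postings list, which is why iteration over the matching documents is fast.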
Basic Faceted Search – Lucene (3/3)
Apache Lucene structure (2)
– Enabling faceted search requires some additional processing for each matching document: adding its contribution to its associated facets
– Lucene makes it easy to plug such functionality into its iteration over the hits, since it can call a hit collector for each one
-> The Lucene Stack
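The per-hit processing can be sketched as a callback that is invoked once for each matching document. This is a toy stand-in written in Python, not Lucene's collector API; `FacetCountingCollector`, `run_query`, and the sample facet paths are hypothetical names chosen for illustration.

```python
from collections import Counter

class FacetCountingCollector:
    """Toy hit collector: called once per matching document."""

    def __init__(self, doc_facets):
        self.doc_facets = doc_facets  # doc_id -> list of facet paths
        self.hits = []
        self.facet_counts = Counter()

    def collect(self, doc_id):
        self.hits.append(doc_id)
        # Add this document's contribution to each of its facets.
        for path in self.doc_facets.get(doc_id, []):
            self.facet_counts[path] += 1

def run_query(matching_doc_ids, collector):
    """Stand-in for the engine's iteration over the hits."""
    for doc_id in matching_doc_ids:
        collector.collect(doc_id)
    return collector

# Hypothetical facet assignments for three documents.
doc_facets = {
    1: ["author/Knuth", "subject/algorithms"],
    2: ["author/Knuth", "subject/typesetting"],
    3: ["author/Lamport", "subject/typesetting"],
}
collector = run_query([1, 2], FacetCountingCollector(doc_facets))
print(dict(collector.facet_counts))
```

Because counting happens inside the same pass that gathers the hits, facet counts need no second traversal of the result set.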
Basic Faceted Search – Data Model and Document Ingestion (1/3)
Taxonomy vs. Folksonomy in Faceted Search
Directed acyclic graph
– Nodes represent facets
– Directed edges denote the refinement relations between nodes
The two approaches to ingesting documents into faceted collections
– 1. The full taxonomy is given before indexing
  – Documents must specify their taxonomy nodes
– 2. The taxonomy is not given and has to be learned, while indexing, from the ingested documents
  – Documents specify the taxonomy paths to which they correspond
Basic Faceted Search – Data Model and Document Ingestion (2/3)
The process of the second approach
– The application must add taxonomy nodes or facet paths to each document prior to adding it to the index
– The facet hierarchy is inferred from the plurality of paths encountered when indexing the individual documents
– All encountered facet paths are collected into a forest-like graph
– New facet paths may be seamlessly introduced, without any administrative action, as new documents are ingested
– The inferred taxonomy automatically expands to accommodate the new data
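The inference step can be sketched as merging every facet path into a nested dictionary, so that shared prefixes become shared tree nodes. This is an illustration only, assuming paths are written as `/`-separated strings; `add_path`, `ingest`, and the sample paths are hypothetical and do not reproduce the paper's taxonomy index.

```python
def add_path(forest, path):
    """Insert one facet path such as 'date/2008/02' into a nested-dict forest."""
    node = forest
    for label in path.split("/"):
        # Reuse the child if it already exists, otherwise create it.
        node = node.setdefault(label, {})

def ingest(forest, doc_paths):
    """Merge all facet paths carried by one document into the forest."""
    for path in doc_paths:
        add_path(forest, path)

forest = {}
ingest(forest, ["author/Ben-Yitzhak", "date/2008/02"])  # first document
ingest(forest, ["date/2008/03", "venue/WSDM"])          # second document
print(forest)
```

The second document introduces a new root (`venue`) and a new leaf under `date/2008` without any separate taxonomy update, which is the "automatic expansion" described above.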
Basic Faceted Search – Data Model and Document Ingestion (3/3)
The process of the second approach
– Figure 1 shows the output of the indexing process after ingesting two documents, doc1 and doc2
– The resulting taxonomy is a forest of trees rather than a general DAG
– Each document is associated with the facet paths shown in Table 2
– The facet forest is maintained by the taxonomy index
Extending Multifaceted Search
Business Intelligence
– A broad category of applications and technologies for gathering, providing access to, and analyzing data in order to help enterprise users make better business decisions
– Figure 2 shows an example of aggregations when searching for “world wide web” over a subset of Amazon’s book catalog
Multifaceted search model
– A faceted query may specify any number of aggregation expressions to be calculated per target facet
– The values of these aggregations are returned for each path of the corresponding facet set in the result set
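A rough sketch of per-facet aggregation, assuming each hit carries a numeric `price` attribute and a list of values for each facet. The `aggregate` function, the expression names, and the sample prices are invented for illustration; they are not the paper's query syntax or data.

```python
from collections import defaultdict

def aggregate(hits, facet, expressions):
    """For each value of `facet`, evaluate every named aggregation over the hits."""
    groups = defaultdict(list)
    for doc in hits:
        for value in doc["facets"].get(facet, []):
            groups[value].append(doc)
    return {
        value: {name: fn(docs) for name, fn in expressions.items()}
        for value, docs in groups.items()
    }

# Any number of aggregation expressions can be requested for a target facet.
expressions = {
    "count": len,
    "avg_price": lambda docs: sum(d["price"] for d in docs) / len(docs),
}

# Hypothetical result set.
hits = [
    {"price": 30.0, "facets": {"subject": ["web"]}},
    {"price": 50.0, "facets": {"subject": ["web", "networks"]}},
    {"price": 20.0, "facets": {"subject": ["networks"]}},
]
result = aggregate(hits, "subject", expressions)
print(result)
```

Plain facet counting is the special case in which the only expression is `count`.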
Dynamic Facets
– Typical faceted search operates over a set of predetermined, indexed facets
  – i.e. the facets and attributes associated with each document must be known at indexing time
  – One such attribute might be the date of a document
– Dynamic faceted search supports dynamic, time-based facets
  – For example, the expression “Sum { qtime - doc.time < 60*60*24*7 }” counts the results whose time lies within 60*60*24*7 seconds (one week) of the query time
  – In a similar fashion, one can categorize search results into spatial dynamic facets, e.g. by counting the number of results within certain radii around a location that is specified by the query
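The time-based expression can be sketched as a count evaluated at query time, assuming both `qtime` and each document's `time` are expressed in seconds. The function `count_recent` and the sample timestamps are hypothetical.

```python
WEEK_SECONDS = 60 * 60 * 24 * 7

def count_recent(hits, qtime, window=WEEK_SECONDS):
    """Sum the boolean (qtime - doc.time < window) over all matching documents."""
    return sum(1 for doc in hits if qtime - doc["time"] < window)

qtime = 1_265_673_600  # hypothetical query time, in seconds
hits = [
    {"id": 1, "time": qtime - 3_600},            # one hour old
    {"id": 2, "time": qtime - 3 * 24 * 3_600},   # three days old
    {"id": 3, "time": qtime - 30 * 24 * 3_600},  # thirty days old
]
recent = count_recent(hits, qtime)
print(recent)
```

The "last week" bucket depends on `qtime`, so it cannot be precomputed at indexing time; only the document's own `time` attribute is stored.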
Correlated Search
– The standard faceted search data model
  – Each document has a certain set of facet values
  – E.g. a product (represented by a document) will have a certain color, size, and price
  – A product with several colors and several sizes is treated as available in all combinations of these colors and sizes
  – i.e. the cross-product {color/red, color/blue} x {size/small, size/medium}
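A small sketch of why the cross-product assumption can mislead, using a hypothetical product whose blue variant is not sold in small. Keeping the (color, size) pairs together is just one simple way to illustrate correlation; the slides do not describe how the paper's engine implements it.

```python
from itertools import product

# Hypothetical product: red comes in small and medium, blue only in medium.
variants = [("red", "small"), ("red", "medium"), ("blue", "medium")]

# Standard model: the document keeps two independent sets of facet values.
colors = {color for color, _ in variants}
sizes = {size for _, size in variants}
implied = set(product(colors, sizes))

# Combinations the flat model implies but the product does not offer.
spurious = implied - set(variants)

def matches_flat(color, size):
    """Flat model: each facet is checked independently."""
    return color in colors and size in sizes

def matches_correlated(color, size):
    """Correlated model (illustrative): the pair itself must exist."""
    return (color, size) in set(variants)

print(spurious)
print(matches_flat("blue", "small"), matches_correlated("blue", "small"))
```

A query for blue and small matches under the flat model even though no such variant exists, which is the case correlated facets are meant to handle.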
Conclusion
Faceted Search
– The more ambient the web becomes, the greater the need for faceted search
– Many complicated domains
– A combination of many technologies
Future work (some of which has already started)
– Aggregate cross-products of dimensions (as in OLAP cubes)
– Update facet values and numeric attributes of documents without requiring re-indexing of the document
– Faceted search across a distributed index
SNU CSE Homepage Practice
Basic Web Documents Search
– Crawl all documents on the CSE homepage and all laboratory homepages
– Construct the inverted list
– The core source for faceted search and other features
Extension Search
– Graph visualization
– Faceted search
– Domains: Department, Professor, Laboratory, Course
– Web 2.0 technology: AJAX, Reverse AJAX, Comet, etc.