31
Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar IDB Tagging Team, School of CSE, SNU Presented by Kangpyo Lee

Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

Embed Size (px)

Citation preview

Page 1: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

Social Search and Discovery Using a Unified Approach

Einat Amitay et al.IBM Research Lab in Haifa, IsraelHT’09

18 March 2011Presentation @ IDB Lab Seminar

IDB Tagging Team, School of CSE, SNUPresented by Kangpyo Lee

Page 2: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

2

A Variety of Web Search Types

Social SearchPersonalized

Search

Unified SearchUniversal Search

Multi-entity Search

Faceted SearchMulti-faceted

Search

Exploratory Search

Vertical Search

Page 3: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

3

Outline Introduction Related Work Implementation Social Search within the Enterprise User Study Summary

Page 4: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

4

Introduction Recent Web 2.0 applications (e.g., web logs, collaborative

bookmarking systems, and social networks) introduce new entities & relations in addition to regular web pages

Web 2.0 entities relate to each other in several ways– Documents may relate to other documents by referencing each other– A user may relate to a document through authorship relation, as a

tagger, as an author, or as mentioned in the page’s content – A user may relate to other users through social relations – A tag relates to the bookmark it is associated with, and also to the

tagger

These entities & relations may prove valuable in enhancing the search experience – By serving as potential search results– By influencing ranking algorithms

Page 5: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

5

Introduction We present and evaluate novel methods for leverag-

ing social information to enhance search results and discover relations between Web 2.0 applications

Our approach leverages a unified representation of the entities and their relations

We then use this intricate heterogeneous collection to establish an all-encompassing social search solution

Page 6: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

6

Introduction Social search solution

– Allows users to query for specific entities and retrieve results of all relevant types

– The system returns, in addition to standard search results, users related to the query, as well as tags that are associ-ated with relevant documents

– These tags can be further used to categorize the search re-sults and to better refine the searcher’s information need

We use the term social search engine to describe this multi-entity search system based on “social” data

Our social search system is the only one that pro-vides a unified approach for searching and retrieving entities of all types

Page 7: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

7

Introduction

- Unified Approach

Our social data include records of users’ public activ-ity with documents – such as bookmarking, tagging, rating, or comments made to

other public Web 2.0 entities

Our system allows the search for any object type (e.g., documents, person, or tag) and the retrieval of all entity types

The system supports – Standard textual queries – Entity queries – Any combination of the two

Page 8: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

8

Introduction

- Unified Approach

The social search engine is based on the unified search approach

Unified search – A.k.a. heterogeneous interrelated entity search – An emerging paradigm within IR – The search space is expanded to represent heterogeneous

information about objects that may relate to each other in several ways

Direct relations Indirect relations

The system must be scalable, responsive, and reflect the rapid update patterns typical in Web 2.0 systems

Page 9: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

9

Introduction

- Unified Approach

We present a novel realization of unified search para-digm based on multifaceted search – Represents each of the system’s entities by a retrievable

document– Direct relations between entities are represented by marking

one of the elements as a “facet” of its counterpart

– The strength of the relationship between the two objects is represented by the strength of document-facet relationship

A BDirect Relation

- A is one of B’s facets- B is one of A’s facets

Page 10: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

10

Introduction

- Unified Approach

An efficient mechanism for updating relations be-tween objects as well as efficient search over the heterogeneous data – Only direct relations between objects need to be updated

when new entities are added – Indirect relations are dynamically induced from the direct re-

lations and computed on-the-fly during query execution time

– Directly-related objects are retrieved and scored during run-time using the search engine’s regular scoring mechanisms

– Indirectly-related entities are retrieved and scored using an implementation of faceted search

Page 11: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

11

Outline Introduction Related Work Implementation Social Search within the Enterprise User Study Summary

Page 12: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

12

Related Work Social search

– The set of annotations provided by the public can be used to enrich the page content

– The # of annotations of a web page can be used as additional evidence of document quality for improved ranking of search results

– Social data enables users to search for other people with whom thy maintain relationships in the network

Social ranking – Ranking all entities retrieved by the social search engine – FolkRank and SocialPageRank – Applying PageRank-like computation depends heavily on the

graph size and is expected to be very slow – Different entity types provide different retrieval values for the

searcher, hence they should be ranked according to their own characteristics

Page 13: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

13

Related Work

- Multi-Entity Search

Multi-entity search – Extending basic search functionality by answering user

queries with many types of entities – Usually based on analysis of the relationship between enti-

ties and documents relevant to the query

Searching over a multi-entity graph – Nodes are entities (terms, documents, persons, annotations) – Edges are the relations between the entities

SimFusion uses a Unified Relationship Matrix (URM) to represent the multi-entity graph

Page 14: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

14

Related Work

- Multi-Entity Search

Unified Relationship Matrix (URM) – Relations between two object types are represented via a re-

lationship matrix Mij

– The (k, l) entry of matrix Mij represents the strength of the re-lation between the object pairs (ok, ol) of types Oi and Oj re-spectively

– The URM matrix U Encapsulates all matrices to provide a unified representation of

the unified search space Provides relationship strength between any two directly related

entities, along with a theoretically elegant way to calculate indi-rect relations through matrix multiplication

Page 15: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

15

Outline Introduction Related Work Implementation Social Search within the Enterprise User Study Summary

Page 16: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

16

Implementation Our solution to unified search represents each object

in the system in two ways – (1) as a retrievable document – (2) as a facet (category) of all the objects to which it relates

A unified representation of a collaborative bookmark-ing system – Three object types – web pages, users, and tags– Each object type is associated with a corresponding docu-

ment – a web page document, a user document, and a tag document

– Three relationship types A user-type facet between a user & the tagged web page A tag-type facet between a tag & the associated web page A user-type facet between a user & a tag used for bookmarking

Page 17: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

17

Implementation

- Scoring Indirectly Related Objects

The strength of the indirect relation between object o1 & o2

– U(o, o’) – the corresponding entry in the URM matrix – Equivalent to squaring the URM matrix

Provides the relationship strength of order two between any two objects

Eq. 1 can be generalized to score objects based on their indirect relations with any query – The score vector s0(q) provides the direct scores of all N ob-

jects in the system to the query

– The score vector s1(q) provides the indirect scores of all ob-jects

Page 18: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

18

Implementation

- Scoring Indirectly Related Objects

In addition, objects can be scored according to their relative popularity, or authority – FolkRank or SocialPageRank can be used – Inverse entity frequency (ief) score

N – the # of all objects in the system No – the # of objects directly related to o Penalizes objects that are related to many objects in general

The final score of object o for a query q

Page 19: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

19

Implementation

- Multifaceted Search

Multifaceted search aims to combine the two main search approaches:– Direct search – Navigational search – offering navigational refinement on the

results by categorizing the search results into predefined facets along with the counts of results per facet

Multifaceted search has become the prevailing user interaction mechanism in e-commerce sites – Now being extended to deal with semi-structured data, con-

tinuous dimensions, and folksonomies

Page 20: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

20

Implementation

- Multifaceted Search

The scores of directly related objects are equivalent to the scores as represented by s0(q)

The score of an indirectly related object, o, is com-puted by aggregating its relationship strength with all matching documents, multiplied by their direct score

– w(o, oi) – the relationship strength between the document oi & its facet o

– Equivalent to Eq. 2 since w(o, oi) = U(o, oi)

Indirectly related objects are represented by accumu-lating all facets of the same type

Page 21: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

21

Implementation

- Efficiency Factors

Two issues regarding use of the URM matrix for social search – 1) the need for efficient computation of indirect relations – 2) efficient dynamic updates

The universal query (q = ‘*’) that retrieves all the ob-jects, indexed by the system as well as all objects re-lated to them, has a query runtime of less than four seconds

Dynamic updates are handled by a mechanism that is implemented by storing the changes in an external databases

Page 22: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

22

Outline Introduction Related Work Implementation Social Search within the Enterprise User Study Summary

Page 23: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

23

Social Search within the Enterprise

Textual Query Entity Query

Page 24: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

24

Social Search within the Enterprise

- Social Data & Social Search Application

Web 2.0 services of IBM – Dogear – a collaborative bookmarking service (373,821

bookmarks, 234,856 web pages)– BlogCentral – a central blog service (77,930 blog threads) – BluePages – the enterprise directory and employee profile

application (15,779 IBMers)– About 700,000 unique entities – Cow Search – the social search application available to all

users of IBM’s intranet

Page 25: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

25

Outline Introduction Related Work Implementation Social Search within the Enterprise User Study Summary

Page 26: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

26

User Study Our goal was to measure both the quality of the re-

turned document set and the related users and tags – The evaluation methodologies for documents are well known

and have standard measures– There are no standard ways of measuring the quality of re-

lated users of tags

A user study was thus used– The retrieved documents were examined and marked with

three relevance levels (0-not relevant, 1-marginally relevant, 2-highly relevant)

– The quality of search results was measured by the normal-ized discount cumulative gain (NDCG) measure

– To evaluate the effectiveness of the related people, we emailed and asked the 612 random users to rate on a Likert scale of 1 to 5

Page 27: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

27

User Study

- Results

Social data contribution to enterprise search – We measure the quality of search results using manual as-

sessments of the top-k search results for the 50 chosen queries

Page 28: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

28

User Study

- Results

Related users

Related tags

Page 29: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

29

Outline Introduction Related Work Implementation Social Search within the Enterprise User Study Summary

Page 30: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

30

Summary Social data is valuable

– 1. The high precision of top retrieved documents demon-strate that user feedback identifies high quality content in the corpus

– 2. User comments and tags are highly beneficial in general and augment the description of system entities, while provid-ing additional evidence for object popularity

Future research – Exploiting personal social networks for search personaliza-

tion – Documents or tags recommendations – Quantifying the contribution of social objects to the effec-

tiveness of the search system

Page 31: Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 Presentation @ IDB Lab Seminar

Thank You!Any Questions or Comments?