Transcript

Supporting Exploration and Serendipity in Information Retrieval

Nattiya Kanhabua

Department of Computer and Information Science, Norwegian University of Science and Technology

24 February 2012

Motivation

• Typical search engines
  – Lookup-based paradigm
  – Known-item search

[Diagram: documents from the World Wide Web are indexed into a document index; a query is issued against the index and a list of results is returned.]

Does this paradigm satisfy all types of information needs?

Beyond the lookup-based paradigm

Two tasks when searching for the unknown:

1. Exploratory search
   – Users actively perform information seeking
     • E.g., collection browsing or visualization
   – Driven by human-computer interaction

2. Serendipitous IR
   – Systems predict/suggest interesting information
     • E.g., recommender systems
   – Works in an asynchronous manner

The next generation of search

(Illustration: the movie Minority Report, 2002.)

PART I – EXPLORATORY SEARCH

Exploratory search

• An information-seeking task [Marchionini 2006, White 2006a]
  – Searching for the unknown, or an open-ended problem
  – Complex information needs
  – No prior knowledge about the contents

[Diagram: the query-results loop against a document index, annotated with question marks.]

Exploratory search activities

G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.

Features of exploratory search

– Query (re)formulation in real time
– Exploiting search context
– Facet-based and metadata result filtering
– Learning and understanding support
– Result visualization

Query (re)formulation

• Help users formulate their information needs at an early stage [Manning 2008]

• Query suggestion
  – Supported by major search engines
  – Based on query-log analysis (see the sketch below)

• Query-by-example
  – Search using example documents
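To make the query-log idea concrete, here is a minimal sketch of frequency-based prefix suggestion. The toy log, the suggest() helper, and the ranking by raw frequency are illustrative assumptions only; production query suggestion mines massive logs and combines many more signals.

    from collections import Counter

    # Toy query log; in practice this would be mined from millions of real queries.
    query_log = [
        "norway weather", "norway weather oslo", "norway fjords",
        "norwegian university of science and technology", "norway weather",
    ]

    freq = Counter(query_log)

    def suggest(prefix, k=3):
        """Return the k most frequent logged queries that start with the prefix."""
        candidates = [(q, n) for q, n in freq.items() if q.startswith(prefix)]
        return [q for q, _ in sorted(candidates, key=lambda x: -x[1])[:k]]

    print(suggest("norway w"))  # ['norway weather', 'norway weather oslo']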

Leveraging search context

• Effective systems must adapt to contextual constraints [Ingwersen 2005]
  – Time, place, history of interaction, task at hand, etc.

• Types of context
  1. Explicitly provided feedback
     • E.g., selecting relevant documents
  2. Implicitly obtained user information
     • E.g., mining users' interaction behaviors [Dumais 2004, Kelly 2004]

Facet-based result filtering

• Facets are properties of a document [Tunkelang 2009]
  – Usually obtained from metadata

• Faceted search provides the ability to:
  – Explore results via properties
  – Expand or refine the search

• No metadata?
  – Categorization
  – Clustering

Result visualization

• Provide overviews of the collection and the search results
  – To support understanding and analysis

• Applications
  – Many Eyes [Viégas 2007]
  – Stuff I've Seen [Dumais 2003]
  – TimeExplorer [Matthews 2010]

Support learning and understanding

• Provide facilities for deriving meaning from search results

• Examples
  – Wikify!: linking documents to encyclopedic knowledge [Mihalcea 2007]
  – Learning to link with Wikipedia [Milne 2008]
  – Generating links to background knowledge [He 2011]

Evaluation of exploratory search

• Evaluation metrics for exploratory search [White 2006b]
  1. Engagement and enjoyment
     • The degree to which users are engaged in and enjoy the search experience
  2. Information novelty
     • The amount of new information encountered
  3. Task success
  4. Task time
     • Time spent to reach a state of task completeness
  5. Learning and cognition
     • The number of topics covered and the number of insights users acquire

Future direction

• Collaborative and social search
  – Support task division and knowledge sharing
  – Allow the team to move rapidly toward task completion
  – Surface information already encountered by team members

PART II – SERENDIPITOUS IR

Serendipitous IR

• Serendipity [Andel 1994]
  – The act of encountering relevant information unexpectedly

• Task: predict and suggest relevant information
  – E.g., recommender systems

Recommender systems

• Motivation [Adomavicius 2005, Jannach 2010]
  – Ease information overload
  – Business intelligence
    • Increase the number of products sold
    • Sell products from the long tail
    • Improve the user experience

• Real-world applications
  – Books: Amazon.com
  – Movies: Netflix, IMDb
  – News: Yahoo!, New York Times
  – Video & music: YouTube, Last.fm

Problem statement

• Given:
  – A set of items (e.g., products, movies, or news)
  – User information (e.g., ratings or user preferences)

• Goal:
  – Predict a relevance score for each item
  – Recommend the top-k items based on the scores

[Diagram: an item collection feeds the recommender system, which outputs a scored list (I1: 0.8, I2: 0.6, I3: 0.5). Without user information the recommendation is non-personalized; adding user information makes it personalized.]

Personalized recommendation

• Two main approaches
  – Content-based
  – Collaborative filtering

[Diagram: an item collection and user information feed the recommender system, which outputs a scored list (I1: 0.8, I2: 0.6, I3: 0.5). Content-based recommendation additionally uses product features (title, genre, actor, …); collaborative filtering additionally uses community data.]

Content-based recommendation

• Basic idea
  – Give me "more like this"
  – Exploit item descriptions (contents) and user preferences
  – No rating data is needed

• Approach
  1. Represent the information as bags of words
  2. Compute the similarity between the user's preferences and an unseen item,
     e.g., with the Dice coefficient or the cosine similarity [Manning 2008]
     (see the sketch below)

• Example: user profiles vs. an unseen item

  User profiles:
  – Title: The Twilight Saga: Eclipse | Genre: Adventure, Drama, Fantasy | Director: David Slade | Writers: Melissa Rosenberg, Stephenie Meyer | Stars: Kristen Stewart, Robert Pattinson
  – Title: Harry Potter and the Deathly Hallows: Part 1 | Genre: Adventure, Drama, Fantasy | Director: David Yates | Writers: Steve Kloves, J.K. Rowling | Stars: Daniel Radcliffe, Emma Watson

  Unseen item:
  – Title: The Lord of the Rings: The Return of the King | Genre: Action, Adventure, Drama | Director: Peter Jackson | Writers: J.R.R. Tolkien, Fran Walsh | Stars: Elijah Wood, Viggo Mortensen
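As an illustration of step 2, here is a minimal sketch of the cosine similarity between bag-of-words representations, using the movie attributes from the slide. The whitespace tokenization, the equal weighting of all fields, and the helper names are simplifying assumptions for illustration, not a prescribed implementation.

    import math
    from collections import Counter

    def bag_of_words(text):
        """Lowercased term-frequency vector (very simplified: whitespace tokens)."""
        return Counter(text.lower().replace(",", " ").split())

    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in u if t in v)
        norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    # User profile built from the two liked movies on the slide (all fields concatenated).
    profile = bag_of_words(
        "The Twilight Saga: Eclipse Adventure Drama Fantasy David Slade "
        "Melissa Rosenberg Stephenie Meyer Kristen Stewart Robert Pattinson "
        "Harry Potter and the Deathly Hallows: Part 1 Adventure Drama Fantasy David Yates "
        "Steve Kloves J.K. Rowling Daniel Radcliffe Emma Watson"
    )

    # Unseen item from the slide.
    item = bag_of_words(
        "The Lord of the Rings: The Return of the King Action Adventure Drama "
        "Peter Jackson J.R.R. Tolkien Fran Walsh Elijah Wood Viggo Mortensen"
    )

    print(round(cosine(profile, item), 3))  # overlap mainly on 'adventure', 'drama', 'the'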

Collaborative filtering (CF)

• Basic idea [Balabanovic 1997]
  – Give me "popular items among my friends"
  – Users with similar tastes in the past tend to have similar tastes in the future

• Basic approach
  – Use a matrix of user-item ratings (explicit or implicit)
    • Implicit ratings: clicks, page views, time spent on a page
  – Predict a rating for each unseen item

User-based nearest-neighbor CF

• Given: the active user and a matrix of user-item ratings
• Goal: predict a rating for an unseen item by
  1. Finding a set of users (neighbors) with similar ratings
  2. Estimating John's rating of Item5 from the neighbors' ratings
  3. Repeating for all unseen items and recommending the top-N items

          Item1  Item2  Item3  Item4  Item5
  John      5      3      4      4      ?
  User1     3      1      2      3      3
  User2     4      3      4      3      5
  User3     1      5      5      2      1

Find neighbors

• Measure user similarity, e.g., with the Pearson correlation:

  sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}

  – a, b: users
  – r_{a,p}: rating of user a for item p; \bar{r}_a, \bar{r}_b: the users' average ratings
  – P: the set of items rated by both a and b

          Item1  Item2  Item3  Item4  Item5   sim(John, ·)
  John      5      3      4      4      ?
  User1     3      1      2      3      3        0.85
  User2     4      3      4      3      5        0.70
  User3     1      5      5      2      1       -0.79
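A minimal Python sketch of this step, assuming the small rating matrix above; the dictionary layout and helper name are illustrative. Taking the means over the co-rated items reproduces the similarity values on the slide (up to rounding).

    import math

    ratings = {
        "John":  {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
        "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
        "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
        "User3": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
    }

    def pearson(a, b):
        """Pearson correlation over the items P rated by both users a and b."""
        common = set(ratings[a]) & set(ratings[b])
        mean_a = sum(ratings[a][p] for p in common) / len(common)
        mean_b = sum(ratings[b][p] for p in common) / len(common)
        num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
        den = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)) * \
              math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
        return num / den if den else 0.0

    for user in ("User1", "User2", "User3"):
        print(user, round(pearson("John", user), 2))  # 0.85, 0.71, -0.79 (slide: 0.85, 0.70, -0.79)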

Estimate a rating

• Prediction function (see the sketch below)
  – Combine the neighbors' rating differences from their averages
  – Use the user similarity as the weight

          Item1  Item2  Item3  Item4  Item5
  John      5      3      4      4     4.87
  User1     3      1      2      3      3     (sim = 0.85)
  User2     4      3      4      3      5     (sim = 0.70)
  User3     1      5      5      2      1
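Continuing the sketch (this reuses the ratings dictionary and pearson() from the previous block), the standard deviation-from-mean prediction formula reproduces the 4.87 shown above. Using the two positively correlated users as the neighborhood is the choice made on the slide.

    def predict(active, item, neighbors):
        """Weighted deviation-from-mean prediction:
        pred(a,p) = mean(a) + sum_b sim(a,b) * (r_{b,p} - mean(b)) / sum_b sim(a,b)"""
        mean_active = sum(ratings[active].values()) / len(ratings[active])
        num, den = 0.0, 0.0
        for b in neighbors:
            sim = pearson(active, b)
            mean_b = sum(ratings[b].values()) / len(ratings[b])
            num += sim * (ratings[b][item] - mean_b)
            den += sim
        return mean_active + num / den

    # Use the two most similar users as neighbors, as on the slide.
    print(round(predict("John", "Item5", ["User1", "User2"]), 2))  # 4.87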

Item-based nearest-neighbor CF

• Basic idea
  – Use the similarity between items (instead of users)
  – Item-item similarities can be computed offline

• Example (see the sketch below)
  – Look for items that are similar to Item5 (its neighbors)
  – Predict the rating of Item5 from John's ratings of those neighbors

          Item1  Item2  Item3  Item4  Item5
  John      5      3      4      4      ?
  User1     3      1      2      3      3
  User2     4      3      4      3      5
  User3     1      5      5      2      1
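A rough sketch of the item-based variant on the same toy data (it reuses the ratings dictionary defined in the user-based sketch). Plain cosine over co-rating users is used here only for brevity; in practice item-based CF usually relies on the adjusted cosine, and the resulting numbers are not from the slide.

    import math

    def item_cosine(i, j):
        """Plain cosine between the rating vectors of items i and j, over users who rated both."""
        common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
        dot = sum(ratings[u][i] * ratings[u][j] for u in common)
        ni = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
        nj = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
        return dot / (ni * nj) if ni and nj else 0.0

    def predict_item_based(user, item, k=2):
        """Weighted average of the user's own ratings on the k items most similar to `item`."""
        sims = sorted(((item_cosine(item, j), j) for j in ratings[user]), reverse=True)[:k]
        return sum(s * ratings[user][j] for s, j in sims) / sum(s for s, _ in sims)

    print(round(predict_item_based("John", "Item5"), 2))  # ≈ 4.5 with this toy data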

Problems of CF

• Sparse data
  – Users do not rate many items

• Cold start
  – No ratings for new users or new items

• Scaling problem
  – Millions of users and thousands of items; let m = #users and n = #items
  – User-based CF
    • Space complexity O(m²) when the similarities are pre-computed
    • Time complexity O(m²n) for computing all Pearson correlations
  – Item-based CF
    • Space complexity is reduced to O(n²)

Possible solutions

• How to solve the sparse-data problem?
  – Ask users to rate a set of items
  – Use other methods in the beginning
    • E.g., content-based or non-personalized recommendation

• How to solve the scaling problem?
  – Apply dimensionality reduction
    • E.g., matrix factorization

Matrix factorization

• Basic idea [Koren 2008]
  – Determine latent factors from the ratings
    • E.g., types of movies (drama or action)
  – Recommend items based on the determined factors

• Approach (see the sketch below)
  – Apply dimensionality reduction
    • E.g., singular value decomposition (SVD) [Deerwester 1990]
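A toy sketch of the dimensionality-reduction idea on the rating matrix used earlier. Note that this naive "impute, then truncated SVD" pipeline is only meant to illustrate latent factors; factorization models in the style of Koren (2008) fit the observed ratings directly with regularization, which this sketch does not do.

    import numpy as np

    # Toy user-item matrix from the earlier slides; 0 marks the unknown John/Item5 cell.
    R = np.array([
        [5, 3, 4, 4, 0],
        [3, 1, 2, 3, 3],
        [4, 3, 4, 3, 5],
        [1, 5, 5, 2, 1],
    ], dtype=float)

    # Very naive imputation of the missing entry with the user's mean rating.
    R_filled = R.copy()
    R_filled[0, 4] = R[0, :4].mean()

    # Rank-2 truncated SVD: R ≈ U_k Σ_k V_k^T captures two latent "taste" factors.
    U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
    k = 2
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print(round(R_hat[0, 4], 2))  # reconstructed (predicted) rating for John on Item5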

Hybrid recommendation

• Basic idea
  – Different approaches have their own shortcomings
  – Hybrid: combine different approaches

• Approach
  1. Pipelined hybridization
     • Use content-based prediction to fill up the rating matrix, then apply CF [Melville 2002]
       (see the sketch below)
  2. Parallel hybridization
     • Feature combination: ratings, user preferences, and constraints
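A very rough sketch of the pipelined idea under made-up data: a crude genre-based scorer stands in for the content-based learner (Melville et al. actually train a naive Bayes text classifier per user), and its pseudo-ratings densify the matrix before any CF method is applied. All names and values here are illustrative assumptions.

    import math

    # Toy item features and per-user ratings on a 1-5 scale.
    item_genres = {
        "I1": {"fantasy": 1, "adventure": 1},
        "I2": {"drama": 1},
        "I3": {"fantasy": 1, "drama": 1},
    }
    ratings = {
        "Ann": {"I1": 5, "I2": 2},           # Ann has not rated I3
        "Bob": {"I1": 4, "I2": 1, "I3": 5},
    }

    def content_pseudo_rating(user, item):
        """Step 1 (content-based): cosine between a genre profile built from the user's
        liked items (rating >= 4) and the item's genres, mapped onto the 1-5 scale."""
        profile = {}
        for rated, r in ratings[user].items():
            if r >= 4:
                for g in item_genres[rated]:
                    profile[g] = profile.get(g, 0) + 1
        target = item_genres[item]
        dot = sum(profile.get(g, 0) * w for g, w in target.items())
        norm = math.sqrt(sum(v * v for v in profile.values())) * math.sqrt(sum(v * v for v in target.values()))
        cos = dot / norm if norm else 0.0
        return 1 + 4 * cos

    # Step 2 (pipelined hybrid): densify the matrix with pseudo-ratings, then hand it to any CF method.
    dense = {u: {i: (ratings[u][i] if i in ratings[u] else content_pseudo_rating(u, i))
                 for i in item_genres} for u in ratings}
    print(dense["Ann"]["I3"])  # Ann's pseudo-rating for I3 (3.0 with these toy numbers)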

Future directions

• Temporal dynamics of recommender systems
  – Items have short lifetimes, i.e., a dynamic set of items
  – User behavior depends on moods and time periods
  – Attention to breaking news stories decays over time
  – Challenge: how to capture/model temporal dynamics?
    • TimeSVD++ [Koren 2009]
    • Tensor factorization [Xiong 2010]
    • Temporal diversity [Lathia 2010]

Future directions (cont.)

• Group recommendations [McCarthy 2006]
  – Recommendations for a group of users or friends
  – Challenge: how to model the group preference?

• Context-aware recommendations [Adomavicius 2011]
  – Context, e.g., demographics, interests, time and place, moods, weather, and so on
  – Challenge: how to combine different contexts?

Conclusions

1. Exploratory search
   – Users actively perform information seeking
     • E.g., collection browsing or visualization
   – Driven by human-computer interaction

2. Serendipitous IR
   – Systems predict/suggest interesting information
     • E.g., recommender systems
   – Works in an asynchronous manner

References

• [Dumais 2003] S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. In Proceedings of SIGIR, pp. 72-79, 2003.
• [Dumais 2004] S. T. Dumais, E. Cutrell, R. Sarin and E. Horvitz. Implicit queries (IQ) for contextualized search. In Proceedings of SIGIR, p. 594, 2004.
• [Ingwersen 2005] P. Ingwersen and K. Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. The Information Retrieval Series, Springer-Verlag, New York, 2005.
• [He 2011] J. He, M. de Rijke, M. Sevenster, R. C. van Ommering and Y. Qian. Generating links to background knowledge: a case study using narrative radiology reports. In Proceedings of CIKM, pp. 1867-1876, 2011.
• [Kelly 2004] D. Kelly and N. J. Belkin. Display time as implicit feedback: understanding task effects. In Proceedings of SIGIR, pp. 377-384, 2004.
• [Manning 2008] C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.
• [Matthews 2010] M. Matthews, P. Tolchinsky, P. Mika, R. Blanco and H. Zaragoza. Searching through time in the New York Times. In HCIR Workshop, 2010.
• [Marchionini 2006] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41-46, 2006.
• [Mihalcea 2007] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of CIKM, pp. 233-242, 2007.
• [Milne 2008] D. Milne and I. H. Witten. Learning to link with Wikipedia. In Proceedings of CIKM, pp. 509-518, 2008.
• [Tunkelang 2009] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.
• [Viégas 2007] F. B. Viégas, M. Wattenberg, F. van Ham, J. Kriss and M. M. McKeon. Many Eyes: A site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1121-1128, 2007.
• [White 2006a] R. W. White, B. Kules, S. M. Drucker and m. c. schraefel. Supporting exploratory search: Introduction to special section. Communications of the ACM, 49(4), pp. 36-39, 2006.
• [White 2006b] R. W. White, G. Muresan and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating exploratory search systems. SIGIR Forum, 40(2), pp. 52-60, 2006.
• [White 2009] R. W. White and R. A. Roth. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool Publishers, 2009.

References (cont.)

• [Agarwal 2010] D. Agarwal and B.-C. Chen. Recommender Systems Tutorial. In ACM SIGKDD, 2010.
• [Adomavicius 2005] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), pp. 734-749, 2005.
• [Adomavicius 2011] G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In Recommender Systems Handbook, pp. 217-253, 2011.
• [Andel 1994] P. V. Andel. Anatomy of the unsought finding. Serendipity: origin, history, domains, traditions, appearances, patterns and programmability. The British Journal for the Philosophy of Science, 45(2), pp. 631-648, 1994.
• [Balabanovic 1997] M. Balabanovic and Y. Shoham. Content-based, collaborative recommendation. Communications of the ACM, 40(3), pp. 66-72, 1997.
• [Deerwester 1990] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6), pp. 391-407, 1990.
• [Jannach 2010] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
• [Koren 2008] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of KDD, pp. 426-434, 2008.
• [Koren 2009] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of KDD, pp. 447-456, 2009.
• [Lathia 2010] N. Lathia, S. Hailes, L. Capra and X. Amatriain. Temporal diversity in recommender systems. In Proceedings of SIGIR, pp. 210-217, 2010.
• [McCarthy 2006] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth and P. Nixon. Group recommender systems: a critiquing based approach. In Proceedings of IUI, pp. 267-269, 2006.
• [Melville 2002] P. Melville, R. J. Mooney and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of AAAI, pp. 187-192, 2002.
• [Xiong 2010] L. Xiong, X. Chen, T. K. Huang, J. G. Schneider and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of SDM, pp. 211-222, 2010.