Transcript

Supporting Exploration and Serendipity in Information Retrieval

Nattiya Kanhabua

Department of Computer and Information Science, Norwegian University of Science and Technology

24 February 2012

Motivation

• Typical search engines
  – Lookup-based paradigm
  – Known-item search

[Diagram: documents from the World Wide Web are indexed into a document index; a query is issued against the index and a list of results is returned.]

Does this paradigm satisfy all types of information needs?

Beyond the lookup-based paradigm

Two tasks when searching for the unknown:

1. Exploratory search
   – Users actively perform information seeking
     • E.g., collection browsing or visualization
   – Driven by human-computer interaction

2. Serendipitous IR
   – Systems predict/suggest interesting information
     • E.g., recommender systems
   – Works in an asynchronous manner

The next generation of search

(Illustration: the movie Minority Report, 2002.)

PART I – EXPLORATORY SEARCH

Exploratory search

• An information-seeking task [Marchionini 2006, White 2006a]
  – Searching for the unknown, or an open-ended problem
  – Complex information needs
  – No prior knowledge about the contents

[Diagram: the query-results loop against a document index, annotated with question marks.]

Exploratory search activities

G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.

Features of exploratory search

– Query (re)formulation in real time
– Exploiting search context
– Facet-based and metadata result filtering
– Learning and understanding support
– Result visualization

Query (re)formulation

• Help users formulate their information needs at an early stage [Manning 2008]

• Query suggestion
  – Supported by major search engines
  – Based on query-log analysis (see the sketch below)

• Query-by-example
  – Search using example documents
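To make the query-log idea concrete, here is a minimal sketch of frequency-based prefix suggestion. The toy log, the suggest() helper, and the ranking by raw frequency are illustrative assumptions only; production query suggestion mines massive logs and combines many more signals.

    from collections import Counter

    # Toy query log; in practice this would be mined from millions of real queries.
    query_log = [
        "norway weather", "norway weather oslo", "norway fjords",
        "norwegian university of science and technology", "norway weather",
    ]

    freq = Counter(query_log)

    def suggest(prefix, k=3):
        """Return the k most frequent logged queries that start with the prefix."""
        candidates = [(q, n) for q, n in freq.items() if q.startswith(prefix)]
        return [q for q, _ in sorted(candidates, key=lambda x: -x[1])[:k]]

    print(suggest("norway w"))  # ['norway weather', 'norway weather oslo']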

Leveraging search context

• Effective systems must adapt to contextual constraints [Ingwersen 2005]
  – Time, place, history of interaction, task at hand, etc.

• Types of context
  1. Explicitly provided feedback
     • E.g., selecting relevant documents
  2. Implicitly obtained user information
     • E.g., mining users' interaction behaviors [Dumais 2004, Kelly 2004]

Facet-based result filtering

• Facets are properties of a document [Tunkelang 2009]
  – Usually obtained from metadata

• Faceted search provides the ability to:
  – Explore results via properties
  – Expand or refine the search

• No metadata?
  – Categorization
  – Clustering

Result visualization

• Provide overviews of the collection and the search results
  – To support understanding and analysis

• Applications
  – Many Eyes [Viégas 2007]
  – Stuff I've Seen [Dumais 2003]
  – TimeExplorer [Matthews 2010]

Support learning and understanding

• Provide facilities for deriving meaning from search results

• Examples
  – Wikify!: linking documents to encyclopedic knowledge [Mihalcea 2007]
  – Learning to link with Wikipedia [Milne 2008]
  – Generating links to background knowledge [He 2011]

Evaluation of exploratory search

• Evaluation metrics for exploratory search [White 2006b]
  1. Engagement and enjoyment
     • The degree to which users are engaged in and enjoy the search experience
  2. Information novelty
     • The amount of new information encountered
  3. Task success
  4. Task time
     • Time spent to reach a state of task completeness
  5. Learning and cognition
     • The number of topics covered and the number of insights users acquire

Future direction

• Collaborative and social search
  – Support task division and knowledge sharing
  – Allow the team to move rapidly toward task completion
  – Surface information already encountered by team members

PART II – SERENDIPITOUS IR

Serendipitous IR

• Serendipity [Andel 1994]
  – The act of encountering relevant information unexpectedly

• Task: predict and suggest relevant information
  – E.g., recommender systems

Recommender systems

• Motivation [Adomavicius 2005, Jannach 2010]
  – Ease information overload
  – Business intelligence
    • Increase the number of products sold
    • Sell products from the long tail
    • Improve the user experience

• Real-world applications
  – Books: Amazon.com
  – Movies: Netflix, IMDb
  – News: Yahoo!, New York Times
  – Video & music: YouTube, Last.fm

Problem statement

• Given:
  – A set of items (e.g., products, movies, or news)
  – User information (e.g., ratings or user preferences)

• Goal:
  – Predict a relevance score for each item
  – Recommend the top-k items based on the scores

[Diagram: an item collection feeds the recommender system, which outputs a scored list (I1: 0.8, I2: 0.6, I3: 0.5). Without user information the recommendation is non-personalized; adding user information makes it personalized.]

Personalized recommendation

• Two main approaches
  – Content-based
  – Collaborative filtering

[Diagram: an item collection and user information feed the recommender system, which outputs a scored list (I1: 0.8, I2: 0.6, I3: 0.5). Content-based recommendation additionally uses product features (title, genre, actor, …); collaborative filtering additionally uses community data.]

Content-based recommendation

• Basic idea
  – Give me "more like this"
  – Exploit item descriptions (contents) and user preferences
  – No rating data is needed

• Approach
  1. Represent the information as bags of words
  2. Compute the similarity between the user's preferences and an unseen item,
     e.g., with the Dice coefficient or the cosine similarity [Manning 2008]
     (see the sketch below)

• Example: user profiles vs. an unseen item

  User profiles:
  – Title: The Twilight Saga: Eclipse | Genre: Adventure, Drama, Fantasy | Director: David Slade | Writers: Melissa Rosenberg, Stephenie Meyer | Stars: Kristen Stewart, Robert Pattinson
  – Title: Harry Potter and the Deathly Hallows: Part 1 | Genre: Adventure, Drama, Fantasy | Director: David Yates | Writers: Steve Kloves, J.K. Rowling | Stars: Daniel Radcliffe, Emma Watson

  Unseen item:
  – Title: The Lord of the Rings: The Return of the King | Genre: Action, Adventure, Drama | Director: Peter Jackson | Writers: J.R.R. Tolkien, Fran Walsh | Stars: Elijah Wood, Viggo Mortensen
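As an illustration of step 2, here is a minimal sketch of the cosine similarity between bag-of-words representations, using the movie attributes from the slide. The whitespace tokenization, the equal weighting of all fields, and the helper names are simplifying assumptions for illustration, not a prescribed implementation.

    import math
    from collections import Counter

    def bag_of_words(text):
        """Lowercased term-frequency vector (very simplified: whitespace tokens)."""
        return Counter(text.lower().replace(",", " ").split())

    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in u if t in v)
        norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    # User profile built from the two liked movies on the slide (all fields concatenated).
    profile = bag_of_words(
        "The Twilight Saga: Eclipse Adventure Drama Fantasy David Slade "
        "Melissa Rosenberg Stephenie Meyer Kristen Stewart Robert Pattinson "
        "Harry Potter and the Deathly Hallows: Part 1 Adventure Drama Fantasy David Yates "
        "Steve Kloves J.K. Rowling Daniel Radcliffe Emma Watson"
    )

    # Unseen item from the slide.
    item = bag_of_words(
        "The Lord of the Rings: The Return of the King Action Adventure Drama "
        "Peter Jackson J.R.R. Tolkien Fran Walsh Elijah Wood Viggo Mortensen"
    )

    print(round(cosine(profile, item), 3))  # overlap mainly on 'adventure', 'drama', 'the'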

Collaborative filtering (CF)

• Basic idea [Balabanovic 1997]
  – Give me "popular items among my friends"
  – Users with similar tastes in the past tend to have similar tastes in the future

• Basic approach
  – Use a matrix of user-item ratings (explicit or implicit)
    • Implicit ratings: clicks, page views, time spent on a page
  – Predict a rating for each unseen item

User-based nearest-neighbor CF

• Given: the active user and a matrix of user-item ratings
• Goal: predict a rating for an unseen item by
  1. Finding a set of users (neighbors) with similar ratings
  2. Estimating John's rating of Item5 from the neighbors' ratings
  3. Repeating for all unseen items and recommending the top-N items

          Item1  Item2  Item3  Item4  Item5
  John      5      3      4      4      ?
  User1     3      1      2      3      3
  User2     4      3      4      3      5
  User3     1      5      5      2      1

Find neighbors

• Measure user similarity, e.g., with the Pearson correlation:

  sim(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}

  – a, b: users
  – r_{a,p}: rating of user a for item p; \bar{r}_a, \bar{r}_b: the users' average ratings
  – P: the set of items rated by both a and b

          Item1  Item2  Item3  Item4  Item5   sim(John, ·)
  John      5      3      4      4      ?
  User1     3      1      2      3      3        0.85
  User2     4      3      4      3      5        0.70
  User3     1      5      5      2      1       -0.79
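A minimal Python sketch of this step, assuming the small rating matrix above; the dictionary layout and helper name are illustrative. Taking the means over the co-rated items reproduces the similarity values on the slide (up to rounding).

    import math

    ratings = {
        "John":  {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
        "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
        "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
        "User3": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
    }

    def pearson(a, b):
        """Pearson correlation over the items P rated by both users a and b."""
        common = set(ratings[a]) & set(ratings[b])
        mean_a = sum(ratings[a][p] for p in common) / len(common)
        mean_b = sum(ratings[b][p] for p in common) / len(common)
        num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in common)
        den = math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in common)) * \
              math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in common))
        return num / den if den else 0.0

    for user in ("User1", "User2", "User3"):
        print(user, round(pearson("John", user), 2))  # 0.85, 0.71, -0.79 (slide: 0.85, 0.70, -0.79)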

Estimate a rating

• Prediction function (see the sketch below)
  – Combine the neighbors' rating differences from their averages
  – Use the user similarity as the weight

          Item1  Item2  Item3  Item4  Item5
  John      5      3      4      4     4.87
  User1     3      1      2      3      3     (sim = 0.85)
  User2     4      3      4      3      5     (sim = 0.70)
  User3     1      5      5      2      1
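Continuing the sketch (this reuses the ratings dictionary and pearson() from the previous block), the standard deviation-from-mean prediction formula reproduces the 4.87 shown above. Using the two positively correlated users as the neighborhood is the choice made on the slide.

    def predict(active, item, neighbors):
        """Weighted deviation-from-mean prediction:
        pred(a,p) = mean(a) + sum_b sim(a,b) * (r_{b,p} - mean(b)) / sum_b sim(a,b)"""
        mean_active = sum(ratings[active].values()) / len(ratings[active])
        num, den = 0.0, 0.0
        for b in neighbors:
            sim = pearson(active, b)
            mean_b = sum(ratings[b].values()) / len(ratings[b])
            num += sim * (ratings[b][item] - mean_b)
            den += sim
        return mean_active + num / den

    # Use the two most similar users as neighbors, as on the slide.
    print(round(predict("John", "Item5", ["User1", "User2"]), 2))  # 4.87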

Item-based nearest-neighbor CF

• Basic idea
  – Use the similarity between items (instead of users)
  – Item-item similarities can be computed offline

• Example (see the sketch below)
  – Look for items that are similar to Item5 (its neighbors)
  – Predict the rating of Item5 from John's ratings of those neighbors

          Item1  Item2  Item3  Item4  Item5
  John      5      3      4      4      ?
  User1     3      1      2      3      3
  User2     4      3      4      3      5
  User3     1      5      5      2      1
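A rough sketch of the item-based variant on the same toy data (it reuses the ratings dictionary defined in the user-based sketch). Plain cosine over co-rating users is used here only for brevity; in practice item-based CF usually relies on the adjusted cosine, and the resulting numbers are not from the slide.

    import math

    def item_cosine(i, j):
        """Plain cosine between the rating vectors of items i and j, over users who rated both."""
        common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
        dot = sum(ratings[u][i] * ratings[u][j] for u in common)
        ni = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
        nj = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
        return dot / (ni * nj) if ni and nj else 0.0

    def predict_item_based(user, item, k=2):
        """Weighted average of the user's own ratings on the k items most similar to `item`."""
        sims = sorted(((item_cosine(item, j), j) for j in ratings[user]), reverse=True)[:k]
        return sum(s * ratings[user][j] for s, j in sims) / sum(s for s, _ in sims)

    print(round(predict_item_based("John", "Item5"), 2))  # ≈ 4.5 with this toy data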

Problems of CF

• Sparse data
  – Users do not rate many items

• Cold start
  – No ratings for new users or new items

• Scaling problem
  – Millions of users and thousands of items; let m = #users and n = #items
  – User-based CF
    • Space complexity O(m²) when the similarities are pre-computed
    • Time complexity O(m²n) for computing all Pearson correlations
  – Item-based CF
    • Space complexity is reduced to O(n²)

Possible solutions

• How to solve the sparse-data problem?
  – Ask users to rate a set of items
  – Use other methods in the beginning
    • E.g., content-based or non-personalized recommendation

• How to solve the scaling problem?
  – Apply dimensionality reduction
    • E.g., matrix factorization

Matrix factorization

• Basic idea [Koren 2008]
  – Determine latent factors from the ratings
    • E.g., types of movies (drama or action)
  – Recommend items based on the determined factors

• Approach (see the sketch below)
  – Apply dimensionality reduction
    • E.g., singular value decomposition (SVD) [Deerwester 1990]
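A toy sketch of the dimensionality-reduction idea on the rating matrix used earlier. Note that this naive "impute, then truncated SVD" pipeline is only meant to illustrate latent factors; factorization models in the style of Koren (2008) fit the observed ratings directly with regularization, which this sketch does not do.

    import numpy as np

    # Toy user-item matrix from the earlier slides; 0 marks the unknown John/Item5 cell.
    R = np.array([
        [5, 3, 4, 4, 0],
        [3, 1, 2, 3, 3],
        [4, 3, 4, 3, 5],
        [1, 5, 5, 2, 1],
    ], dtype=float)

    # Very naive imputation of the missing entry with the user's mean rating.
    R_filled = R.copy()
    R_filled[0, 4] = R[0, :4].mean()

    # Rank-2 truncated SVD: R ≈ U_k Σ_k V_k^T captures two latent "taste" factors.
    U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
    k = 2
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    print(round(R_hat[0, 4], 2))  # reconstructed (predicted) rating for John on Item5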

Hybrid recommendation

• Basic idea
  – Different approaches have their own shortcomings
  – Hybrid: combine different approaches

• Approach
  1. Pipelined hybridization
     • Use content-based prediction to fill up the rating matrix, then apply CF [Melville 2002]
       (see the sketch below)
  2. Parallel hybridization
     • Feature combination: ratings, user preferences, and constraints
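A very rough sketch of the pipelined idea under made-up data: a crude genre-based scorer stands in for the content-based learner (Melville et al. actually train a naive Bayes text classifier per user), and its pseudo-ratings densify the matrix before any CF method is applied. All names and values here are illustrative assumptions.

    import math

    # Toy item features and per-user ratings on a 1-5 scale.
    item_genres = {
        "I1": {"fantasy": 1, "adventure": 1},
        "I2": {"drama": 1},
        "I3": {"fantasy": 1, "drama": 1},
    }
    ratings = {
        "Ann": {"I1": 5, "I2": 2},           # Ann has not rated I3
        "Bob": {"I1": 4, "I2": 1, "I3": 5},
    }

    def content_pseudo_rating(user, item):
        """Step 1 (content-based): cosine between a genre profile built from the user's
        liked items (rating >= 4) and the item's genres, mapped onto the 1-5 scale."""
        profile = {}
        for rated, r in ratings[user].items():
            if r >= 4:
                for g in item_genres[rated]:
                    profile[g] = profile.get(g, 0) + 1
        target = item_genres[item]
        dot = sum(profile.get(g, 0) * w for g, w in target.items())
        norm = math.sqrt(sum(v * v for v in profile.values())) * math.sqrt(sum(v * v for v in target.values()))
        cos = dot / norm if norm else 0.0
        return 1 + 4 * cos

    # Step 2 (pipelined hybrid): densify the matrix with pseudo-ratings, then hand it to any CF method.
    dense = {u: {i: (ratings[u][i] if i in ratings[u] else content_pseudo_rating(u, i))
                 for i in item_genres} for u in ratings}
    print(dense["Ann"]["I3"])  # Ann's pseudo-rating for I3 (3.0 with these toy numbers)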

Future directions

• Temporal dynamics of recommender systems
  – Items have short lifetimes, i.e., a dynamic set of items
  – User behavior depends on moods and time periods
  – Attention to breaking news stories decays over time
  – Challenge: how to capture/model temporal dynamics?
    • TimeSVD++ [Koren 2009]
    • Tensor factorization [Xiong 2010]
    • Temporal diversity [Lathia 2010]

Future directions (cont.)

• Group recommendations [McCarthy 2006]
  – Recommendations for a group of users or friends
  – Challenge: how to model the group preference?

• Context-aware recommendations [Adomavicius 2011]
  – Context, e.g., demographics, interests, time and place, moods, weather, and so on
  – Challenge: how to combine different contexts?

Conclusions

1. Exploratory search
   – Users actively perform information seeking
     • E.g., collection browsing or visualization
   – Driven by human-computer interaction

2. Serendipitous IR
   – Systems predict/suggest interesting information
     • E.g., recommender systems
   – Works in an asynchronous manner

References

• [Dumais 2003] S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. In Proceedings of SIGIR, pp. 72-79, 2003.
• [Dumais 2004] S. T. Dumais, E. Cutrell, R. Sarin and E. Horvitz. Implicit queries (IQ) for contextualized search. In Proceedings of SIGIR, p. 594, 2004.
• [Ingwersen 2005] P. Ingwersen and K. Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. The Information Retrieval Series, Springer-Verlag, New York, 2005.
• [He 2011] J. He, M. de Rijke, M. Sevenster, R. C. van Ommering and Y. Qian. Generating links to background knowledge: a case study using narrative radiology reports. In Proceedings of CIKM, pp. 1867-1876, 2011.
• [Kelly 2004] D. Kelly and N. J. Belkin. Display time as implicit feedback: understanding task effects. In Proceedings of SIGIR, pp. 377-384, 2004.
• [Manning 2008] C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.
• [Matthews 2010] M. Matthews, P. Tolchinsky, P. Mika, R. Blanco and H. Zaragoza. Searching through time in the New York Times. In HCIR Workshop, 2010.
• [Marchionini 2006] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41-46, 2006.
• [Mihalcea 2007] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of CIKM, pp. 233-242, 2007.
• [Milne 2008] D. Milne and I. H. Witten. Learning to link with Wikipedia. In Proceedings of CIKM, pp. 509-518, 2008.
• [Tunkelang 2009] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.
• [Viégas 2007] F. B. Viégas, M. Wattenberg, F. van Ham, J. Kriss and M. M. McKeon. Many Eyes: A site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1121-1128, 2007.
• [White 2006a] R. W. White, B. Kules, S. M. Drucker and m. c. schraefel. Supporting exploratory search: Introduction to special section. Communications of the ACM, 49(4), pp. 36-39, 2006.
• [White 2006b] R. W. White, G. Muresan and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating exploratory search systems. SIGIR Forum, 40(2), pp. 52-60, 2006.
• [White 2009] R. W. White and R. A. Roth. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool Publishers, 2009.

References (cont.)

• [Agarwal 2010] D. Agarwal and B.-C. Chen. Recommender Systems Tutorial. In ACM SIGKDD, 2010.
• [Adomavicius 2005] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), pp. 734-749, 2005.
• [Adomavicius 2011] G. Adomavicius and A. Tuzhilin. Context-aware recommender systems. In Recommender Systems Handbook, pp. 217-253, 2011.
• [Andel 1994] P. V. Andel. Anatomy of the unsought finding. Serendipity: origin, history, domains, traditions, appearances, patterns and programmability. The British Journal for the Philosophy of Science, 45(2), pp. 631-648, 1994.
• [Balabanovic 1997] M. Balabanovic and Y. Shoham. Content-based, collaborative recommendation. Communications of the ACM, 40(3), pp. 66-72, 1997.
• [Deerwester 1990] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6), pp. 391-407, 1990.
• [Jannach 2010] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
• [Koren 2008] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of KDD, pp. 426-434, 2008.
• [Koren 2009] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of KDD, pp. 447-456, 2009.
• [Lathia 2010] N. Lathia, S. Hailes, L. Capra and X. Amatriain. Temporal diversity in recommender systems. In Proceedings of SIGIR, pp. 210-217, 2010.
• [McCarthy 2006] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth and P. Nixon. Group recommender systems: a critiquing based approach. In Proceedings of IUI, pp. 267-269, 2006.
• [Melville 2002] P. Melville, R. J. Mooney and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of AAAI, pp. 187-192, 2002.
• [Xiong 2010] L. Xiong, X. Chen, T. K. Huang, J. G. Schneider and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of SDM, pp. 211-222, 2010.