
Supporting Exploration and Serendipity in Information Retrieval

Nattiya Kanhabua

Department of Computer and Information Science Norwegian University of Science and Technology

24 February 2012


• Typical search engines – Lookup-based paradigm – Known-item search

Motivation

[Diagram: a query is submitted against a document index built from the World Wide Web, and results are returned.]

Does this paradigm satisfy all types of information needs?


Two tasks when searching for the unknown:

1. Exploratory Search
   – Users perform information seeking
     • E.g., collection browsing or visualization
   – Human-computer interaction

2. Serendipitous IR
   – Systems predict/suggest interesting information
     • E.g., recommender systems
   – Asynchronous manner

Beyond the lookup-based paradigm


The next generation of search

The movie: Minority Report (2002).

PART I – EXPLORATORY SEARCH


• Information-seeking task [Marchionini 2006, White 2006a]
   – Searching for the unknown, or an open-ended problem
   – Complex information needs
   – No prior knowledge about the contents

Exploratory search

[Diagram: queries are issued against a document index and results returned, but the information need itself remains uncertain.]


Exploratory search activities

G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41–46, 2006.


Features of exploratory search

Query (re)formulation in real-time

Exploiting search context

Facet-based and metadata result filtering

Learning and understanding support

Result visualization


• Help users formulate their information needs at an early stage [Manning 2008]

• Query suggestion
   – Supported by major search engines
   – Based on query log analysis (a toy sketch follows below)

• Query-by-example
   – Search using example documents

Query (re)formulation
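
To make the query-log idea concrete, here is a minimal, hypothetical sketch: a toy log of past queries with frequencies, and a prefix lookup that ranks completions by how often they were issued. Real engines combine many more signals (sessions, clicks, recency); the log entries and function names below are illustrative only.

    from collections import Counter

    # Toy query log: past queries and how often they were issued (illustrative data).
    query_log = Counter({
        "norwegian university": 120,
        "norwegian weather": 80,
        "norwegian wood": 60,
        "information retrieval": 150,
    })

    def suggest(prefix, k=3):
        """Return up to k logged queries starting with the prefix, most frequent first."""
        matches = [(q, n) for q, n in query_log.items() if q.startswith(prefix)]
        return [q for q, _ in sorted(matches, key=lambda x: -x[1])[:k]]

    print(suggest("norweg"))  # ['norwegian university', 'norwegian weather', 'norwegian wood']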


• Effective systems must adapt to contextual constraints [Ingwersen 2005]

– Time, place, history of interaction, task at hand, etc.

• Types of context
   1. Explicitly provided feedback
      • E.g., selecting relevant documents
   2. Implicitly obtained user information
      • E.g., mining users' interaction behaviors [Dumais 2004, Kelly 2004]

Leveraging search context


Facet-based result filtering

• Facets are properties of a document [Tunkelang 2009]
   – Usually obtained from metadata

• Faceted search provides the ability to:
   – Explore results via properties (a toy filtering sketch follows below)
   – Expand or refine the search

• No metadata? – Categorization – Clustering
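
A minimal sketch of what facet-based refinement looks like over a result list; the documents and facet names are made up for illustration.

    # Toy result list with metadata facets (illustrative documents).
    results = [
        {"title": "Exploratory search survey", "year": 2006, "type": "article"},
        {"title": "Faceted Search",            "year": 2009, "type": "book"},
        {"title": "Stuff I've Seen",           "year": 2003, "type": "article"},
    ]

    def refine(docs, **facets):
        """Keep only documents whose metadata matches every requested facet value."""
        return [d for d in docs if all(d.get(f) == v for f, v in facets.items())]

    print([d["title"] for d in refine(results, type="article")])
    # ['Exploratory search survey', "Stuff I've Seen"]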


• Provide overviews of the collection and search results
   – To support understanding and analysis

• Applications – manyEyes [Viégas 2007] – Stuff I’ve seen [Dumais 2003] – TimeExplorer [Matthews 2010]

Result visualization


• Provide facilities for deriving meaning from search results • Examples

– Wikify!: linking documents to encyclopedic knowledge [Mihalcea 2007]

– Learning to link with Wikipedia [Milne 2008] – Generating links to background knowledge [He 2011]

Support learning and understanding


• Evaluation metrics for exploratory search [White 2006b]

   1. Engagement and enjoyment
      • The degree to which users are engaged and enjoy the experience
   2. Information novelty
      • The amount of new information encountered
   3. Task success
   4. Task time
      • Time spent to reach a state of task completeness
   5. Learning and cognition
      • The number of topics covered and the number of insights users acquire

Evaluation of exploratory search


• Collaborative and social search
   – Support task division and knowledge sharing
   – Allow the team to move rapidly toward task completion
   – Share already-encountered information within the team

Future direction

PART II – SERENDIPITOUS IR


• Serendipity [Andel 1994] – The act of encountering relevant information unexpectedly

• Task: Predict and suggest relevant information – E.g., recommender systems

Serendipitous IR


• Motivation [Adomavicius 2005, Jannach 2010] – Ease information overload – Business intelligence

• Increase the number of products sold
   • Sell products from the long tail
   • Improve users' experience

• Real-world applications

– Book: Amazon.com – Movie: Netflix, IMDb – News: Yahoo, New York Times – Video & music: YouTube, Last.fm

Recommender systems


• Given:
   – A set of items (e.g., products, movies, or news)
   – User information (e.g., ratings or user preferences)

• Goal:
   – Predict the relevance score of each item
   – Recommend k items based on the scores

Problem statements

[Diagram: the recommender system takes the item collection as input (plus user information in the personalized setting) and outputs a relevance score per item, e.g., I1 0.8, I2 0.6, I3 0.5; the two settings shown are non-personalized and personalized recommendation.]


• Two main approaches – Content-based – Collaborative filtering

Personalized recommendation

[Diagram: content-based recommendation scores items in the collection by matching product features (title, genre, actor, ...) against the user information.]


[Diagram: collaborative filtering recommendation scores items in the collection using the user information together with community data (other users' ratings).]


• Basic idea

– Give me “more like this” – Exploit item descriptions (contents) and user preferences

• No rating data is needed

• Approach
   1. Represent the information as bag-of-words vectors
   2. Compute the similarity between the preferences and an unseen item, e.g., with the Dice coefficient or the cosine similarity [Manning 2008] (a minimal sketch follows after the example below)

Content-based recommendation

User profile (contents):
   Title: The Twilight Saga: Eclipse | Genre: Adventure, Drama, Fantasy | Director: David Slade | Writers: Melissa Rosenberg, Stephenie Meyer | Stars: Kristen Stewart, Robert Pattinson
   Title: Harry Potter and the Deathly Hallows: Part 1 | Genre: Adventure, Drama, Fantasy | Director: David Yates | Writers: Steve Kloves, J.K. Rowling | Stars: Daniel Radcliffe, Emma Watson

Unseen item:
   Title: The Lord of the Rings: The Return of the King | Genre: Action, Adventure, Drama | Director: Peter Jackson | Writers: J.R.R. Tolkien, Fran Walsh | Stars: Elijah Wood, Viggo Mortensen
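
As a rough illustration of step 2, the sketch below builds tiny bag-of-words vectors from metadata like the above and scores the unseen item with the cosine similarity; the terms and weights are made up, and a real system would use proper tokenization and TF-IDF weighting.

    from math import sqrt

    # Toy bag-of-words vectors built from item metadata; weights are illustrative only.
    user_profile = {"adventure": 2, "drama": 2, "fantasy": 2, "stewart": 1, "radcliffe": 1}
    unseen_item  = {"action": 1, "adventure": 1, "drama": 1, "jackson": 1, "wood": 1}

    def cosine(a, b):
        """Cosine similarity between two sparse term-weight vectors."""
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        norm_a = sqrt(sum(w * w for w in a.values()))
        norm_b = sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    print(round(cosine(user_profile, unseen_item), 2))  # similarity of the unseen item to the profile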


• Basic idea [Balabanovic 1997]
   – Give me "popular items among my friends"
   – Users who had similar tastes in the past tend to have similar tastes in the future

• Basic approach
   – Use a matrix of user-item ratings (explicit, or implicit such as clicks, page views, or time spent on a page)
   – Predict a rating for an unseen item

Collaborative filtering (CF)


• Given: the active user and a matrix of user-item ratings
• Goal: predict a rating for an unseen item by
   1. Finding a set of users (neighbors) with ratings similar to the active user's
   2. Estimating John's rating of Item5 from the neighbors' ratings
   3. Repeating for all unseen items and recommending the top-N items

User-based nearest-neighbor CF

         Item1  Item2  Item3  Item4  Item5
John       5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      1      5      5      2      1


• Measure user similarity, e.g., with the Pearson correlation:

sim(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}

   – a, b: users
   – r_{a,p}: rating of user a for item p; \bar{r}_a, \bar{r}_b: the users' average ratings
   – P: set of items rated by both a and b

(A small computation sketch follows after the table below.)

Find neighbors

         Item1  Item2  Item3  Item4  Item5   sim(John, ·)
John       5      3      4      4      ?
User1      3      1      2      3      3        0.85
User2      4      3      4      3      5        0.70
User3      1      5      5      2      1       -0.79
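
A minimal sketch of the neighbor-finding step on the matrix above; it computes the Pearson correlation over the items both users have rated and reproduces the similarities shown (0.85, 0.70, -0.79, up to rounding).

    # Ratings from the slide's matrix; None marks the unseen item.
    ratings = {
        "John":  [5, 3, 4, 4, None],
        "User1": [3, 1, 2, 3, 3],
        "User2": [4, 3, 4, 3, 5],
        "User3": [1, 5, 5, 2, 1],
    }

    def pearson(a, b):
        """Pearson correlation computed over the items rated by both users."""
        common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
        mean_a = sum(a[i] for i in common) / len(common)
        mean_b = sum(b[i] for i in common) / len(common)
        num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
        den = (sum((a[i] - mean_a) ** 2 for i in common) ** 0.5
               * sum((b[i] - mean_b) ** 2 for i in common) ** 0.5)
        return num / den if den else 0.0

    for user in ("User1", "User2", "User3"):
        print(user, round(pearson(ratings["John"], ratings[user]), 2))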


• Prediction function (see the formula after the table below)
   – Combine the neighbors' rating differences from their averages
   – Use the user similarity as a weight

Estimate a rating

         Item1  Item2  Item3  Item4  Item5   sim(John, ·)
John       5      3      4      4     4.87
User1      3      1      2      3      3        0.85
User2      4      3      4      3      5        0.70
User3      1      5      5      2      1
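
The slide's equation is not reproduced in the text, but the standard weighted-deviation prediction function used in user-based CF (e.g., Jannach 2010) matches the description and the value shown; with User1 and User2 as the neighborhood N it yields John's predicted rating for Item5:

pred(a, p) = \bar{r}_a + \frac{\sum_{b \in N} sim(a, b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} |sim(a, b)|}

Plugging in: 4 + (0.85 * (3 - 2.4) + 0.70 * (5 - 3.8)) / (0.85 + 0.70) = 4 + 1.35 / 1.55 ≈ 4.87.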


• Basic idea
   – Use the similarity between items (instead of users)
   – Item-item similarities can be computed offline

• Example (a minimal sketch follows after the table below)
   – Look for items that are similar to Item5 (its neighbors)
   – Predict the rating of Item5 from John's ratings of those neighbors

Item-based nearest-neighbor CF

         Item1  Item2  Item3  Item4  Item5
John       5      3      4      4      ?
User1      3      1      2      3      3
User2      4      3      4      3      5
User3      1      5      5      2      1
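
A rough sketch of the item-based variant on the same matrix, under simplifying assumptions: item-item similarity is plain cosine over the ratings of users who rated both items (adjusted cosine is more common in practice), and the prediction is a similarity-weighted average of John's ratings of the two most similar items.

    # Ratings per item from the other users (John's unknown Item5 rating is excluded).
    items = {
        "Item1": [3, 4, 1],
        "Item2": [1, 3, 5],
        "Item3": [2, 4, 5],
        "Item4": [3, 3, 2],
        "Item5": [3, 5, 1],
    }
    john = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}

    def cosine(x, y):
        """Plain cosine similarity between two rating vectors."""
        dot = sum(a * b for a, b in zip(x, y))
        return dot / ((sum(a * a for a in x) ** 0.5) * (sum(b * b for b in y) ** 0.5))

    # Two items most similar to Item5, then a similarity-weighted average of John's ratings.
    sims = {i: cosine(items[i], items["Item5"]) for i in john}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:2]
    pred = sum(sims[i] * john[i] for i in neighbors) / sum(sims[i] for i in neighbors)
    print(neighbors, round(pred, 2))  # e.g., ['Item1', 'Item4'] and a rating around 4.5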


• Sparse data
   – Users rate only a few items

• Cold start
   – No ratings for new users or new items

• Scaling problem
   – Millions of users and thousands of items
   – m = #users and n = #items
   – User-based CF
      • Space complexity O(m²) when similarities are pre-computed
      • Time complexity for computing all Pearson correlations O(m²n)
   – Item-based CF
      • Space complexity is reduced to O(n²)

Problems of CF


• How to solve the sparse data problem?
   – Ask users to rate a set of items
   – Use other methods in the beginning
      • E.g., content-based or non-personalized recommendation

• How to solve the scaling problem?
   – Apply dimensionality reduction
      • E.g., matrix factorization

Possible solutions


• Basic idea [Koren 2008]
   – Determine latent factors from the ratings
      • E.g., types of movies (drama or action)
   – Recommend items based on the determined factors

• Approach (a minimal sketch follows below)
   – Apply dimensionality reduction
      • E.g., singular value decomposition (SVD) [Deerwester 1990]

Matrix factorization
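
A minimal numpy sketch of the idea, under a simplifying assumption: unrated cells are filled with the user's mean so that a plain truncated SVD can be applied; systems such as the one in [Koren 2008] instead learn the factors by regularized optimization over the observed ratings only.

    import numpy as np

    # Toy user-item rating matrix (rows: users, columns: items); 0 marks "unrated".
    R = np.array([
        [5, 3, 4, 4, 0],
        [3, 1, 2, 3, 3],
        [4, 3, 4, 3, 5],
        [1, 5, 5, 2, 1],
    ], dtype=float)

    # Fill unrated cells with the user's mean rating so the matrix is complete.
    filled = R.copy()
    for u in range(R.shape[0]):
        rated = R[u] > 0
        filled[u, ~rated] = R[u, rated].mean()

    # Keep k latent factors and reconstruct the matrix; the reconstructed cell is the prediction.
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    k = 2
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    print(round(R_hat[0, 4], 2))  # predicted rating of the first user for the fifth item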


• Basic idea
   – Different approaches have their own shortcomings
   – Hybrid: combine different approaches

• Approach
   1. Pipelined hybridization
      • Use content-based prediction to fill in missing rating entries, then apply CF [Melville 2002]
   2. Parallel hybridization
      • Feature combination: ratings, user preferences and constraints

Hybrid recommendation


• Temporal dynamics of recommender systems
   – Items have short lifetimes, i.e., a dynamic set of items
   – User behaviors depend on moods or time periods
   – Attention to breaking news stories decays over time
   – Challenge: how to capture/model temporal dynamics?
      • TimeSVD++ [Koren 2009]
      • Tensor factorization [Xiong 2010]
      • Temporal diversity [Lathia 2010]

Future directions


• Group recommendations [McCarthy 2006]
   – Recommendations for a group of users or friends
   – Challenge: how to model group preferences?

• Context-aware recommendations [Adomavicius 2011]
   – Context, e.g., demographics, interests, time and place, moods, weather, and so on
   – Challenge: how to combine different types of context?

Future directions (cont'd)


1. Exploratory Search
   – Users perform information seeking
     • E.g., collection browsing or visualization
   – Human-computer interaction

2. Serendipitous IR
   – Systems predict/suggest interesting information
     • E.g., recommender systems
   – Asynchronous manner

Conclusions


• [Dumais 2003] S. T. Dumais, E. Cutrell, J. J. Cadiz, G. Jancke, R. Sarin and D. C. Robbins. Stuff I’ve seen: A system for personal information retrieval and re-use. In Proceedings of SIGIR, pp. 72-79, 2003.

• [Dumais 2004] S. T. Dumais, E. Cutrell, R. Sarin and E. Horvitz. Implicit queries (IQ) for contextualized search. In Proceedings of SIGIR, p. 594, 2004.

• [Ingwersen 2005] P. Ingwersen and K. Järvelin. The Turn: Integration of Information Seeking and Retrieval in Context. The Information Retrieval Series, Springer-Verlag, New York, 2005.

• [He 2011] J. He, M. de Rijke, M. Sevenster, R. C. van Ommering and Y. Qian. Generating links to background knowledge: a case study using narrative radiology reports. In Proceedings of CIKM, pp. 1867-1876, 2011.

• [Kelly 2004] D. Kelly, and N. J. Belkin. Display time as implicit feedback: understanding task effects. In Proceedings of SIGIR, pp. 377-384, 2004.

• [Manning 2008] C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

• [Matthews 2010] M. Matthews, P. Tolchinsky, P. Mika, R. Blanco and H. Zaragoza. Searching through time in the New York Times. In HCIR Workshop, 2010.

• [Marchionini 2006] G. Marchionini. Exploratory search: From finding to understanding. Communications of the ACM, 49(4), pp. 41-46, 2006.

• [Mihalcea 2007] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of CIKM, pp. 233-242, 2007.

• [Milne 2008] D. Milne and I. H. Witten. Learning to link with Wikipedia. In Proceedings of CIKM, pp. 509-518, 2008.
• [Tunkelang 2009] D. Tunkelang. Faceted Search. Morgan & Claypool Publishers, 2009.
• [Viégas 2007] F. B. Viégas, M. Wattenberg, F. van Ham, J. Kriss and M. M. McKeon. Many Eyes: A site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), pp. 1121-1128, 2007.
• [White 2006a] R. W. White, B. Kules, S. M. Drucker and m. c. schraefel. Supporting exploratory search: Introduction to special section. Communications of the ACM, 49(4), pp. 36-39, 2006.
• [White 2006b] R. W. White, G. Muresan and G. Marchionini. Report on ACM SIGIR 2006 workshop on evaluating exploratory search systems. SIGIR Forum, 40(2), pp. 52-60, 2006.
• [White 2009] R. W. White and R. A. Roth. Exploratory Search: Beyond the Query-Response Paradigm. Morgan & Claypool Publishers, 2009.

References


• [Agarwal 2010] D. Agarwal and B.-C. Chen. Recommender Systems Tutorial. In ACM SIGKDD, 2010.
• [Adomavicius 2005] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng., 17(6), pp. 734-749, 2005.
• [Adomavicius 2011] G. Adomavicius and A. Tuzhilin. Context-Aware Recommender Systems. In Recommender Systems Handbook, pp. 217-253, 2011.
• [Andel 1994] P. V. Andel. Anatomy of the Unsought Finding. Serendipity: Origin, history, domains, traditions, appearances, patterns and programmability. The British Journal for the Philosophy of Science, 45(2), pp. 631-648, 1994.
• [Balabanovic 1997] M. Balabanovic and Y. Shoham. Content-based, collaborative recommendation. Communications of the ACM, 40(3), pp. 66-72, 1997.
• [Deerwester 1990] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas and R. A. Harshman. Indexing by Latent Semantic Analysis. JASIS, 41(6), pp. 391-407, 1990.
• [Jannach 2010] D. Jannach, M. Zanker, A. Felfernig and G. Friedrich. Recommender Systems: An Introduction. Cambridge University Press, 2010.
• [Koren 2008] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of KDD, pp. 426-434, 2008.
• [Koren 2009] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of KDD, pp. 447-456, 2009.
• [Lathia 2010] N. Lathia, S. Hailes, L. Capra and X. Amatriain. Temporal Diversity in Recommender Systems. In Proceedings of SIGIR, pp. 210-217, 2010.
• [McCarthy 2006] K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth and P. Nixon. Group recommender systems: a critiquing based approach. In Proceedings of IUI, pp. 267-269, 2006.
• [Melville 2002] P. Melville, R. J. Mooney and R. Nagarajan. Content-Boosted Collaborative Filtering for Improved Recommendations. In Proceedings of AAAI, pp. 187-192, 2002.
• [Xiong 2010] L. Xiong, X. Chen, T. K. Huang, J. G. Schneider and J. G. Carbonell. Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. In Proceedings of SDM, pp. 211-222, 2010.

References (cont'd)