An Overview of Information Retrieval Nov. 10, 2009 Maryam Karimzadehgan mkarimz2@illinois.edu Department of Computer Science University of Illinois, Urbana-Champaign


An Overview of

Information Retrieval

Nov. 10, 2009

Maryam Karimzadehgan

mkarimz2@illinois.edu

Department of Computer Science

University of Illinois, Urbana-Champaign


Outline

• Limitations of Database systems (Motivation for IR systems)

• Information Retrieval

– Indexing

– Similarity Measures

– Evaluation

– Other IR applications

• Web Search – PageRank Algorithm

• News Recommender system on Facebook


A (Simple) Database Example

Student Table

Student ID | Last Name | First Name    | Department ID | email
1          | Maryam    | Karimzadehgan | CS            | (redacted)
2          | Peters    | Jordan        | EE            | (redacted)
3          | Smith     | Chris         | CE            | (redacted)
4          | Smith     | John          | CLIS          | (redacted)

Department Table

Department ID | Department
EE            | Electrical Engineering
CE            | Computer Engineering
CLIS          | Information Studies

Course Table

Course ID | Course Name
lbsc690   | Information Technology
ee750     | Communication
ce098     | Computer Architecture

Enrollment Table

Student ID | Course ID | Grade
1          | lbsc690   | 90
1          | ee750     | 95
2          | lbsc690   | 95
2          | hist405   | 80
3          | hist405   | 90
4          | lbsc690   | 98


Databases vs. IR

• Format of data:

– DB: Structured data. Clear semantics based on a formal model.

– IR: Mostly unstructured. Free text.

• Queries:

– DB: Formal (like SQL)

– IR: Often expressed in natural language (keyword search)

• Result:

– DB: Exact result

– IR: Sometimes relevant, often not


Short vs. Long Term Info Need

• Short-term information need (Ad hoc retrieval)

– “Temporary need”

– Information source is relatively static

– User “pulls” information

– Application example: library search, Web search

• Long-term information need (Filtering)

– “Stable need”, e.g., new data mining algorithms

– Information source is dynamic

– System “pushes” information to user

– Applications: news filter


What is Information Retrieval?

• Goal: Find the documents most relevant to a certain query (information need)

• Dealing with notions of:

– Collection of documents

– Query (User’s information need)

– Notion of Relevancy


What Types of Information?

• Text (Documents)

• XML and structured documents

• Images

• Audio (sound effects, songs, etc.)

• Video

• Source code

• Applications/Web services


The Information Retrieval Cycle

[Figure: cycle with the stages Source Selection, Query Formulation, Search, and Selection. Labels on the flow between stages: Resource, Query, Ranked List, Documents, result. Feedback loop: query reformulation, relevance feedback]

Slide is from Jimmy Lin’s tutorial


The IR Black Box

[Figure: a black box that takes a Query and Documents as input and produces Results]

Slide is from Jimmy Lin’s tutorial


Inside The IR Black Box

[Figure: the Query and the Documents each pass through a Representation step, giving a Query Representation and a Document Representation (the Index). A Comparison Function matches the two and produces the Results]

Slide is from Jimmy Lin’s tutorial


Typical IR System Architecture

[Figure: components User, Tokenizer, Indexer, Index, Scorer, and Feedback. Data flowing between them: query, docs, Query Rep, Doc Rep (Index), results, judgments]

Slide is from ChengXiang Zhai’s CS410


1) Indexing

• Making it easier to match a query with a document

• Query and document should be represented using the same units/terms

DOCUMENT: This is a document in information retrieval

INDEX: document, information, retrieval, is, this

Bag-of-Words Representation
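To make the bag-of-words idea concrete, here is a minimal Python sketch. It assumes a very simple tokenizer (lowercasing and splitting on non-alphanumeric characters); the function name `bag_of_words` and the example query are illustrative and not taken from the slides.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)

# The document from the slide and a hypothetical query.
doc = bag_of_words("This is a document in information retrieval")
query = bag_of_words("Information Retrieval")

# Because both sides use the same units (lowercased terms), matching
# reduces to looking for shared keys.
shared = sorted(set(doc) & set(query))
print(shared)
```

Word order is discarded on purpose: only the terms and their counts are kept, which is what lets the query and the document be compared term by term.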


What is a good indexing term?

• Specific (phrases) or general (single word)?

• Words with middle frequency are most useful

– Not too specific (low utility, but still useful!)

– Not too general (lack of discrimination, stop words)

– Stop word removal is common, but rare words are kept in modern search engines

• Stop words are words such as:

– a, about, above, according, across, after, afterwards, again, against, albeit, all, almost, alone, already, also, although, always, among, as, at


Inverted Index

Doc 1: This is a sample document with one sample sentence

Doc 2: This is another sample document

Dictionary

Term    | # docs | Total freq
This    | 2      | 2
is      | 2      | 2
sample  | 2      | 3
another | 1      | 1
…       | …      | …

Postings

Term    | Doc id | Freq
This    | 1      | 1
This    | 2      | 1
is      | 1      | 1
is      | 2      | 1
sample  | 1      | 2
sample  | 2      | 1
another | 2      | 1
…       | …      | …
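The dictionary and postings for the two sample documents can be reproduced with a short Python sketch. It assumes whitespace tokenization and lowercasing; the helper names `build_inverted_index` and `dictionary_entry` are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to its postings: a sorted list of (doc_id, frequency) pairs."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token][doc_id] = postings[token].get(doc_id, 0) + 1
    return {term: sorted(p.items()) for term, p in postings.items()}

def dictionary_entry(index, term):
    """Return (# docs, total frequency) for a term, as in the dictionary table."""
    plist = index.get(term, [])
    return len(plist), sum(freq for _, freq in plist)

docs = {
    1: "This is a sample document with one sample sentence",
    2: "This is another sample document",
}
index = build_inverted_index(docs)

print(index["sample"])                    # postings for "sample"
print(dictionary_entry(index, "sample"))  # (# docs, total freq)
```

Answering a query then only requires reading the postings of the query terms rather than scanning every document.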


2) Tokenization/Stemming

• Stemming: Mapping all inflectional forms of words to the same root form, e.g.

– computer -> compute

– computation -> compute

– computing -> compute

• Porter’s Stemmer is popular for English
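As a rough illustration of stemming, the Python sketch below strips a handful of suffixes. It is a toy, not Porter's algorithm: the suffix list and the minimum stem length are assumptions chosen only so that the three example words collapse to one form.

```python
# Illustrative suffix list (an assumption, checked in this order).
SUFFIXES = ("ation", "ing", "er", "e")

def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = {w: toy_stem(w) for w in ["computer", "computation", "computing", "compute"]}
print(stems)
```

The shared stem does not have to be a dictionary word; it only has to be the same string for every inflected form, so that query and document terms match.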


3) Relevance Feedback

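[Figure: the User sends a Query to the Retrieval Engine, which searches the Document collection and returns scored Results (d1 3.5, d2 2.4, …, dk 0.5, ...). The user's Judgments (d1 +, d2 -, d3 +, …, dk -, ...) go to the Feedback component, which produces an Updated query for the engine]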


Slide is from ChengXiang Zhai’s CS410


4) Scorer/Similarity Methods

1) Boolean model

2) Vector-space model

3) Probabilistic model

4) Language model


Boolean Model

• Each index term is either present or absent

• Documents are either Relevant or Not Relevant (no ranking)

• Advantages

– Simple

• Disadvantages

– No notion of ranking (exact matching only)

– All index terms have equal weight
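A minimal Python sketch of Boolean retrieval follows. The three documents and the function names are made up for illustration; each term maps to the set of documents that contain it, and queries become set operations.

```python
def build_term_sets(docs):
    """Map each term to the set of document ids that contain it."""
    term_docs = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            term_docs.setdefault(term, set()).add(doc_id)
    return term_docs

def boolean_and(term_docs, terms):
    """Documents containing every query term."""
    sets = [term_docs.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(term_docs, terms):
    """Documents containing at least one query term."""
    return set().union(*(term_docs.get(t.lower(), set()) for t in terms))

# Hypothetical collection.
docs = {
    1: "information retrieval overview",
    2: "database systems overview",
    3: "information systems",
}
term_docs = build_term_sets(docs)
print(boolean_and(term_docs, ["information", "retrieval"]))
print(boolean_or(term_docs, ["retrieval", "database"]))
```

The result is an unordered set: a document either matches or it does not, which is exactly the "no ranking" limitation listed above.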


Vector Space Model

• Query and documents are represented as vectors of index terms

• Similarity calculated using COSINE similarity between two vectors

– Ranked according to similarity
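The ranking step can be sketched in Python as below, assuming raw term counts as vector weights and whitespace tokenization. The documents, the query, and the function names are illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (score, doc_id) pairs sorted from most to least similar."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(text.lower().split())), doc_id)
              for doc_id, text in docs.items()]
    return sorted(scored, reverse=True)

# Hypothetical collection and query.
docs = {"d1": "information retrieval systems", "d2": "database systems"}
ranking = rank("information retrieval", docs)
print(ranking)
```

Unlike the Boolean model, every document gets a score, so partial matches can still be ranked.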


TF-IDF in Vector Space model

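$$\mathrm{tfidf}_{i,k} = f_{i,k} \cdot \log \frac{N}{df_i}$$

TF part: $f_{i,k}$, the frequency of term $i$ in document $k$

IDF part: $\log \frac{N}{df_i}$, where $N$ is the number of documents and $df_i$ is the number of documents that contain term $i$

IDF: A term is more discriminative if it occurs in fewer documents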
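The weighting can be tried out with a short Python sketch. It assumes the weight is the raw term frequency times log(N / df), with whitespace tokenization, and reuses the two sample documents from the inverted index slide; `tfidf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each term in each document by f * log(N / df)."""
    counts = {k: Counter(text.lower().split()) for k, text in docs.items()}
    n_docs = len(counts)
    # df[term] = number of documents that contain the term at least once.
    df = Counter(term for c in counts.values() for term in c)
    return {
        k: {term: f * math.log(n_docs / df[term]) for term, f in c.items()}
        for k, c in counts.items()
    }

docs = {
    1: "this is a sample document with one sample sentence",
    2: "this is another sample document",
}
weights = tfidf_vectors(docs)
print(weights[1]["sample"], weights[1]["sentence"])
```

In this tiny collection "sample" occurs in both documents, so its IDF is log(2/2) = 0 and it contributes nothing, while "sentence" occurs in only one document and keeps a positive weight.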


Language Models for Retrieval

[Figure: each Document is paired with its own Language Model, whose word probabilities are unknown]

Text mining paper: … text ?, mining ?, association ?, clustering ?, … food ?, …

Food nutrition paper: … food ?, nutrition ?, healthy ?, diet ?, …

Query = “data mining algorithms”

Which model would most likely have generated this query?

Slide is from ChengXiang Zhai’s CS410


Retrieval as Language Model Estimation

• Document ranking based on query likelihood

$$\log p(q|d) = \sum_{i=1}^{n} \log p(w_i|d), \quad \text{where } q = w_1 w_2 \ldots w_n$$

Here $p(w_i|d)$ is the document language model.

• Retrieval problem ≈ Estimation of p(wi|d)

• Smoothing is an important issue, and distinguishes different approaches
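Query-likelihood scoring, log p(q|d) as the sum of log p(w_i|d) over the query words, can be sketched in Python as follows. The smoothing choice here is an assumption: the document model is linearly interpolated with a collection model (often called Jelinek-Mercer smoothing), and the weight `lam`, the two toy papers, and the function name are all hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, lam=0.5):
    """Sum of log p(w|d) over query words, with p(w|d) smoothed by the collection."""
    d = Counter(doc.lower().split())
    c = Counter(collection.lower().split())
    d_len, c_len = sum(d.values()), sum(c.values())
    score = 0.0
    for w in query.lower().split():
        # Interpolate the document estimate with the collection estimate.
        p = (1 - lam) * d[w] / d_len + lam * c[w] / c_len
        if p == 0.0:
            return float("-inf")  # word unseen even in the collection
        score += math.log(p)
    return score

# Hypothetical documents standing in for the two papers in the figure.
papers = {
    "text mining paper": "text mining and data clustering algorithms",
    "food nutrition paper": "food nutrition healthy diet food",
}
collection = " ".join(papers.values())
scores = {name: query_log_likelihood("data mining algorithms", text, collection)
          for name, text in papers.items()}
print(scores)
```

Without smoothing, the food nutrition paper would assign probability zero to every query word and its score would be undefined; interpolation with the collection keeps the score finite while still ranking it below the text mining paper.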

Slide is from ChengXiang Zhai’s CS410


Information Retrieval Evaluation

– Coverage of information

– Form of presentation

– Effort required/ease of Use

– Time and space efficiency

– Recall: proportion of relevant material actually retrieved

– Precision: proportion of retrieved material actually relevant


Precision vs. Recall

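[Figure: Venn diagram of the Relevant and Retrieved sets inside All docs]

$$\text{Recall} = \frac{|\text{RelRetrieved}|}{|\text{Rel in Collection}|}$$

$$\text{Precision} = \frac{|\text{RelRetrieved}|}{|\text{Retrieved}|}$$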
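Both measures are simple set computations, as this Python sketch shows. The document ids are invented for the example, and returning 0.0 for an empty set is an assumption about how to handle that edge case.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set and a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    rel_retrieved = retrieved & relevant
    precision = len(rel_retrieved) / len(retrieved) if retrieved else 0.0
    recall = len(rel_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical run: 4 documents retrieved, 5 relevant documents in the collection.
precision, recall = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d3", "d7", "d8", "d9"],
)
print(precision, recall)
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).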


Web Search – Google PageRank Algorithm


Characteristics of Web Information

• “Infinite” size

– Static HTML pages

– Dynamically generated HTML pages (DB)

• Semi-structured

– Structured = HTML tags, hyperlinks, etc.

– Unstructured = Text

• Different formats (pdf, word, ps, …)

• Multi-media (text, audio, images, …)

• High variance in quality (a lot of junk)


Exploiting Inter-Document Links

[Figure: documents connected by hyperlinks, with Hub and Authority pages marked]

• Description (“anchor text”): “extra text”/summary for a doc

• Links indicate the utility of a doc

Slide is from ChengXiang Zhai’s CS410


PageRank: Capturing Page “Popularity”

[Page & Brin 98]

• Intuitions

– Links are like citations in literature

– A page that is cited often can be expected to be more useful in general

– Consider “indirect citations” (being cited by a highly cited paper counts a lot…)

– Smoothing of citations (every page is assumed to have a non-zero citation count)


The PageRank Algorithm (Page et al. 98)

[Figure: link graph over four pages d1, d2, d3, d4]

“Transition matrix” M, where $M_{ij}$ = probability of going from $d_i$ to $d_j$:

$$M = \begin{pmatrix} 0 & 0 & 1/2 & 1/2 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \end{pmatrix}$$

N = # pages

p(di): PageRank score (average probability of visiting page di)

Random surfing model: at any page,

– with prob. $\alpha$, randomly jump to a page

– with prob. $(1-\alpha)$, randomly pick a link to follow

$I_{ij} = 1/N$

Initial value p(d) = 1/N; iterate until convergence

Slide is from ChengXiang Zhai’s CS410


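PageRank: Example

[Figure: link graph over four pages d1, d2, d3, d4]

Initial value p(d) = 1/N; iterate until convergence

$$p(d_j) = \sum_{i=1}^{N} \left[ \frac{1}{N}\alpha + (1-\alpha) M_{ij} \right] p(d_i)$$

$$\vec{p} = (\alpha I + (1-\alpha) M)^T \vec{p}$$

With $\alpha = 0.2$ (the probability of a random jump):

$$A = (1-0.2)\,M + 0.2\,I = 0.8 \begin{pmatrix} 0 & 0 & 1/2 & 1/2 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \end{pmatrix} + 0.2 \begin{pmatrix} 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{pmatrix}$$

$$\begin{pmatrix} p_{n+1}(d_1) \\ p_{n+1}(d_2) \\ p_{n+1}(d_3) \\ p_{n+1}(d_4) \end{pmatrix} = A^T \begin{pmatrix} p_n(d_1) \\ p_n(d_2) \\ p_n(d_3) \\ p_n(d_4) \end{pmatrix} = \begin{pmatrix} 0.05 & 0.85 & 0.05 & 0.45 \\ 0.05 & 0.05 & 0.85 & 0.45 \\ 0.45 & 0.05 & 0.05 & 0.05 \\ 0.45 & 0.05 & 0.05 & 0.05 \end{pmatrix} \begin{pmatrix} p_n(d_1) \\ p_n(d_2) \\ p_n(d_3) \\ p_n(d_4) \end{pmatrix}$$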
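The iteration for this four-page example can be run with a small pure-Python sketch. It assumes the update rule p_{n+1}(d_j) = Σ_i [α/N + (1-α) M_ij] p_n(d_i), with α = 0.2 as the random-jump probability and M as the transition matrix of the example; the convergence tolerance and iteration cap are illustrative choices.

```python
def pagerank(M, alpha=0.2, tol=1e-10, max_iter=1000):
    """Power iteration: p_j <- sum_i [alpha/N + (1 - alpha) * M[i][j]] * p_i."""
    n = len(M)
    p = [1.0 / n] * n  # initial value p(d) = 1/N
    for _ in range(max_iter):
        new_p = [
            sum((alpha / n + (1 - alpha) * M[i][j]) * p[i] for i in range(n))
            for j in range(n)
        ]
        # Stop once no score moves by more than the tolerance.
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p

# M[i][j] = probability of going from page d(i+1) to page d(j+1).
M = [
    [0, 0, 0.5, 0.5],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0.5, 0.5, 0, 0],
]
scores = pagerank(M)
print([round(s, 4) for s in scores])
```

The scores stay a probability distribution at every step because each row of the combined matrix sums to 1. Pages d3 and d4 end up with equal scores, since both receive links only from d1.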


Beyond Just Search – Information Retrieval Applications


Examples of Text Management Applications

• Search

– Web search engines (Google, Yahoo, …)

– Library systems

– …

• Filtering

– News filter

– Spam email filter

– Literature/movie recommender

• Categorization

– Automatically sorting emails

– …

• Mining/Extraction

– Discovering major complaints from email in customer service

– Business intelligence

– Bioinformatics

– …

Sample Applications

• 1) Text Categorization

• 2) Document/Term Clustering

• 3) Text Summarization

• 4) Filtering


1) Text Categorization

• Pre-given categories and labeled document examples (Categories may form hierarchy)

• Classify new documents

• A standard supervised learning problem

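[Figure: a Categorization System assigns new documents to the categories Sports, Business, Education, Science, …, using labeled example documents for Sports, Business, Education, …]

Slide is from ChengXiang Zhai’s CS410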


K-Nearest Neighbor Classifier

• Keep all training examples

• Find k examples that are most similar to the new document (“neighbor” documents)

• Assign the category that is most common in these neighbor documents (neighbors vote for the category)

• Can be improved by considering the distance of a neighbor (a closer neighbor has more influence)

• Technical elements (“retrieval techniques”)

– Document representation

– Document distance measure
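The steps above can be sketched in Python. The sketch assumes bag-of-words vectors with cosine similarity as the (inverse) distance measure; the training examples, category names, and the `weighted` flag are illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(text, training, k=3, weighted=False):
    """Vote among the k training documents most similar to the new text."""
    query = Counter(text.lower().split())
    scored = sorted(
        ((cosine(query, Counter(doc.lower().split())), label)
         for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k]:
        # Plain vote, or a vote weighted by similarity (closer counts more).
        votes[label] += sim if weighted else 1
    return votes.most_common(1)[0][0]

# Hypothetical labeled examples.
training = [
    ("the team won the game", "Sports"),
    ("the player scored a goal in the game", "Sports"),
    ("stock market prices fell", "Business"),
    ("the company reported profit", "Business"),
]
print(knn_classify("the game ended with a late goal", training, k=3))
```

There is no training phase beyond storing the examples; all the work happens at classification time, which is why the document representation and the distance measure are the key design decisions.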

Slide is from ChengXiang Zhai’s CS410


Example of K-NN Classifier

[Figure: classifying a new document using its nearest neighbors, shown for k=1 and for k=4]


Slide is from ChengXiang Zhai’s CS410


Examples of Text Categorization

• News article classification

• Meta-data annotation

• Automatic Email sorting

• Web page classification


2) The Clustering Problem

• Group similar objects together

• Objects can be documents, terms, or passages

• Example


Slide is from ChengXiang Zhai’s CS410


Similarity-based Clustering

• Define a similarity function to measure similarity between two objects

• Gradually group similar objects together in a bottom-up fashion

• Stop when some stopping criterion is met
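A bottom-up clustering loop of this kind can be sketched in Python. Several choices here are assumptions rather than part of the slide: cosine similarity over term counts as the similarity function, single-link similarity between clusters, and a similarity threshold as the stopping criterion. The texts are made up.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_similarity(c1, c2, vectors):
    """Single-link: similarity of the closest pair across the two clusters."""
    return max(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

def agglomerative(texts, threshold=0.3):
    """Repeatedly merge the two most similar clusters until none is similar enough."""
    vectors = [Counter(t.lower().split()) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # start with one cluster per object
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cluster_similarity(clusters[a], clusters[b], vectors)
                if sim > best_sim:
                    best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:  # stopping criterion
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

texts = [
    "text mining algorithms",
    "data mining algorithms",
    "healthy food diet",
    "food nutrition diet",
]
clusters = agglomerative(texts)
print(clusters)
```

Lowering the threshold lets the merging continue further; with a threshold of 0 everything ends up in a single cluster.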


Examples of Doc/Term Clustering

• Clustering of retrieval results

• Clustering of documents in the whole collection

• Term clustering to define “concept” or “theme”


3) Summarization - Simple Discourse Analysis

[Figure: a document is divided into consecutive pieces, each represented as a vector (vector 1, vector 2, vector 3, …, vector n-1, vector n), with similarity computed between vectors]

Slide is from ChengXiang Zhai’s CS410

A Simple Summarization Method

[Figure: the document is divided into segments; in each segment, the sentence (sentence 1, sentence 2, sentence 3, …) most similar to the Doc vector is selected for the summary]

Slide is from ChengXiang Zhai’s CS410

Examples of Summarization

• News summary

• Summarize retrieval results

– Single doc summary

– Multi-doc summary


4) Filtering

• Content-based filtering (adaptive filtering)

• Collaborative filtering (recommender systems)


Examples of Information Filtering

• News filtering

• Email filtering

• Movie/book recommenders such as Amazon.com

• Literature recommenders


Content-based Filtering vs. Collaborative Filtering

• Basic filtering question: Will user U like item X?

• Two different ways of answering it

– Look at what U likes => characterize X => content-based filtering

– Look at who likes X => characterize U => collaborative filtering

• Can be combined

Collaborative filtering is also called “Recommender Systems”

Slide is from ChengXiang Zhai’s CS410


Adaptive Information Filtering

• Stable & long term interest

• System must make a delivery decision immediately as a document “arrives”

[Figure: a Filtering System checks each arriving document against the user's profile (“my interest: …”)]


Slide is from ChengXiang Zhai’s CS410


Collaborative Filtering

• Making filtering decisions for an individual user based on the judgments of other users

• Inferring an individual’s interests/preferences from those of other similar users

• General idea

– Given a user u, find similar users {u1, …, um}

– Predict u’s preferences based on the preferences of u1, …, um
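The general idea can be sketched in Python as a user-based predictor. The details are assumptions made for illustration: user similarity is cosine similarity over the items both users rated, and the prediction is a similarity-weighted average over the m most similar users who rated the item. The ratings and names are invented.

```python
import math

def user_similarity(r1, r2):
    """Cosine similarity over the items that both users have rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[o] * r2[o] for o in common)
    n1 = math.sqrt(sum(r1[o] ** 2 for o in common))
    n2 = math.sqrt(sum(r2[o] ** 2 for o in common))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def predict(ratings, user, item, m=2):
    """Similarity-weighted average rating of the m users most similar to `user`."""
    neighbors = sorted(
        ((user_similarity(ratings[user], r), other)
         for other, r in ratings.items()
         if other != user and item in r),
        key=lambda pair: pair[0],
        reverse=True,
    )[:m]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no similar user has rated the item
    return sum(sim * ratings[other][item] for sim, other in neighbors) / total

# Hypothetical ratings: user "u" has not rated object "o3".
ratings = {
    "u":  {"o1": 5, "o2": 1},
    "u1": {"o1": 5, "o2": 1, "o3": 4},
    "u2": {"o1": 1, "o2": 5, "o3": 2},
    "u3": {"o1": 4, "o2": 2, "o3": 5},
}
print(predict(ratings, "u", "o3", m=1))
print(predict(ratings, "u", "o3", m=2))
```

With m=1 the prediction simply copies the rating of the single most similar user (u1); with m=2 it blends the ratings of u1 and u3, leaving out u2, whose tastes are the opposite of u's.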


Collaborative Filtering: Assumptions

• Users with a common interest will have similar preferences

• Users with similar preferences probably share the same interest

• Examples

– “interest is IR” => “favor SIGIR papers”

– “favor SIGIR papers” => “interest is IR”

• A sufficiently large number of user preferences is available


Collaborative Filtering: Intuitions

• User similarity (user X and Y)

– If X liked the movie, Y will like the movie

• Item similarity

– Since 90% of those who liked Star Wars also liked Independence Day, and you liked Star Wars

– You may also like Independence Day


A Formal Framework for Rating

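[Figure: rating matrix with one row per user in U = {u1, u2, …, ui, …, um} and one column per object in O = {o1, o2, …, oj, …, on}. A few cells hold known ratings (3, 1.5, 2, 2, 1, 3); the remaining cells are unknown, Xij = f(ui, oj) = ?]

The task

Unknown function f: U x O → R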

• Assume known f values for some (u,o)’s

• Predict f values for other (u,o)’s

• Essentially function approximation, like other learning problems


Slide is from ChengXiang Zhai’s CS410


News Recommendation on Facebook

http://sifaka.cs.uiuc.edu/ir/proj/rec/


Motivation

[Figure labels: Newsletter Organizer, www]

Facebook as a medium for recommendations

• Provides a great platform with built-in social networks.

• More than 120 million users log on to Facebook at least once each day.

• More than 95% of the users have used at least one application built on the Facebook Platform.

• Possible to make applications that deeply integrate into a user's Facebook experience.

– FBML (Facebook Markup language)

– FBJS (Facebook Javascript)

– FQL (Facebook Query Language)

– Facebook API

System Architecture

[Figure: system components: Crawler, Indexer, MetaData (RDBMS), Date-wise Index, Query Index, Clustering, Newsletter (RDBMS), Register Community, Facebook Application]

Application Main page

Collaborative User Feedback


• Three kinds of user feedback captured

– Clickthroughs

– Explicit Ratings

– Inter-person recommendations

• They are linearly combined as follows:

where Fij aggregates all kinds of feedback for article aj from user ui


Demo

• News Recommender on Facebook


Application Information

• For more information about the application:

– http://sifaka.cs.uiuc.edu/ir/proj/rec/

– http://apps.facebook.com/news_letters/


• We are looking for motivated students to work on this application.

• Requirements:

– Database knowledge

– PHP

– Perl

• Contact me if you are interested:

mkarimz2@illinois.edu


Thanks
