Learning Techniques for Information Retrieval


Page 1: Learning Techniques for Information Retrieval

Learning Techniques for Information Retrieval

We cover:
1. Perceptron algorithm
2. Least mean square algorithm
3. Chapter 5.2 User relevance feedback (pp. 118-123)
4. Chapter 5.3.1 Query expansion through local clustering (pp. 124-126)

Page 2: Learning Techniques for Information Retrieval

Adaptive linear model
• Let $X_1, X_2, \ldots, X_n$ be $n$ vectors (of $n$ documents).
• $D_1 \cup D_2 = \{X_1, X_2, \ldots, X_n\}$, where $D_1$ is the set of relevant documents and $D_2$ is the set of non-relevant documents.
• $D_1$ and $D_2$ are obtained from user feedback.
• Question: find a vector $W = (W_1, \ldots, W_m)$ such that
  $\sum_{i=1}^{m} W_i X_{ij} + 1 > 0$ for each $X_j \in D_1$, and
  $\sum_{i=1}^{m} W_i X_{ij} + 1 < 0$ for each $X_j \in D_2$.

Page 3: Learning Techniques for Information Retrieval

[Figure: the document space split into the region of $D_1$ (relevant documents) and the region of $D_2$ (non-relevant documents)]

Page 4: Learning Techniques for Information Retrieval

[Figure: a linear threshold unit. Inputs $X_0, X_1, X_2, X_3, \ldots, X_n$ are weighted by $W_0, W_1, W_2, W_3, \ldots, W_n$ and summed; a threshold maps the sum $y$ to Output = sign($y$), i.e. $+1$ or $-1$]

Page 5: Learning Techniques for Information Retrieval

Remarks:
• W is the new vector for the query.
• W is computed based on the feedback, i.e., $D_1$ and $D_2$.
• The following is a hyperplane: $\sum_{i=1}^{m} w_i X_i + d = 0$, where $W = (w_1, w_2, \ldots, w_m)$.
• The hyperplane cuts the whole space into two parts; hopefully one part contains the relevant documents and the other contains the non-relevant documents.

Page 6: Learning Techniques for Information Retrieval

Perceptron Algorithm

(1) For each $X \in D_1$, if $X \cdot W + d < 0$ then increase the weight vector at the next iteration: $W = W_{old} + CX$, $d = d + C$.

(2) For each $X \in D_2$, if $X \cdot W + d > 0$ then decrease the weight vector at the next iteration: $W = W_{old} - CX$, $d = d - C$.

C is a constant. Repeat until $X \cdot W + d > 0$ for each $X \in D_1$ and $X \cdot W + d < 0$ for each $X \in D_2$.
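
A minimal Python sketch of this procedure (the function name, data layout, and the iteration cap are illustrative assumptions, not fixed by the slides):

```python
def perceptron(D1, D2, C=1.0, w=None, d=0.0, max_iters=1000):
    """Find (w, d) with w.x + d > 0 for x in D1 and w.x + d < 0 for x in D2."""
    m = len((D1 + D2)[0])
    w = list(w) if w is not None else [0.0] * m
    for _ in range(max_iters):
        mistakes = 0
        for x in D1:                                       # relevant documents
            if sum(wi * xi for wi, xi in zip(w, x)) + d <= 0:
                w = [wi + C * xi for wi, xi in zip(w, x)]  # W = Wold + C*X
                d += C
                mistakes += 1
        for x in D2:                                       # non-relevant documents
            if sum(wi * xi for wi, xi in zip(w, x)) + d >= 0:
                w = [wi - C * xi for wi, xi in zip(w, x)]  # W = Wold - C*X
                d -= C
                mistakes += 1
        if mistakes == 0:                                  # all documents separated
            return w, d
    return w, d  # may not separate the sets if they are not linearly separable
```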

Page 7: Learning Techniques for Information Retrieval

Perceptron Convergence Theorem

• The perceptron algorithm finds a W in finite iterations if the training set {X1, X2, …, Xn} is linearly separable.

• References:
• Wong, S.K.M., Yao, Y.Y., Salton, G., and Buckley, C., Evaluation of an adaptive linear model, Journal of the American Society for Information Science, Vol. 42, No. 10, pp. 723-730, 1991.

• Wong, S.K.M. and Yao, Y.Y., Query formulation in linear retrieval models, Journal of the American Society for Information Science, Vol. 41, No. 5, pp. 334-341, 1990.

Page 8: Learning Techniques for Information Retrieval

An Example of the Perceptron Algorithm
$X_1 = (2, 0)$, $X_2 = (2, 2)$, $X_3 = (-1, 2)$, $X_4 = (-2, 1)$, $X_5 = (-1, -1)$, $X_6 = (1/2, -3/4)$
$D_1 = \{X_1, X_2, X_3\}$, $D_2 = \{X_4, X_5, X_6\}$, $W = (-1, 0)$. Set $d = 0.5$.

[Figure: the points $X_1, \ldots, X_6$ in the plane with the initial weight vector $W = (-1, 0)$]

$W \cdot X_1 + 0.5 = -1.5 < 0$, so $W = W_{old} + X_1 = (1, 0)$.

Page 9: Learning Techniques for Information Retrieval

[Figure: the points $X_1, \ldots, X_6$ shown with the weight vector after each update]

$W \cdot X_2 + 0.5 = 2.5 > 0$, so $W$ is unchanged.
$W \cdot X_3 + 0.5 = -0.5 < 0$, so $W = W_{old} + X_3 = (0, 2)$.
$W \cdot X_4 + 0.5 = 2.5 > 0$ with $X_4 \in D_2$, so $W = W_{old} - X_4 = (2, 1)$.

Page 10: Learning Techniques for Information Retrieval

$W \cdot X_5 + 0.5 = -2.5 < 0$, so $W$ is unchanged.
$W \cdot X_6 + 0.5 = 3/4 > 0$ with $X_6 \in D_2$, so $W = W_{old} - X_6 = (3/2, 7/4)$.

Checking all points with $W = (3/2, 7/4)$:
$W \cdot X_1 + 0.5 = 3.5$, $W \cdot X_2 + 0.5 = 7$, $W \cdot X_3 + 0.5 = 2.5$,
$W \cdot X_4 + 0.5 = -3/4$, $W \cdot X_5 + 0.5 = -11/4$, $W \cdot X_6 + 0.5 = -1/16$.

The algorithm stops here.

[Figure: the points $X_1, \ldots, X_6$ separated by the final weight vector $W = (3/2, 7/4)$]
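
As a check, a small self-contained Python snippet that replays this trace; it keeps $d$ fixed at 0.5 and processes $X_1, \ldots, X_6$ in order with $C = 1$, both assumptions read off the example above:

```python
D1 = [(2, 0), (2, 2), (-1, 2)]              # relevant: X1, X2, X3
D2 = [(-2, 1), (-1, -1), (0.5, -0.75)]      # non-relevant: X4, X5, X6
w, d = [-1.0, 0.0], 0.5                     # initial weight vector and threshold

changed = True
while changed:
    changed = False
    for x in D1:                            # want w.x + d > 0
        if w[0] * x[0] + w[1] * x[1] + d < 0:
            w = [w[0] + x[0], w[1] + x[1]]  # W = Wold + X  (C = 1)
            changed = True
    for x in D2:                            # want w.x + d < 0
        if w[0] * x[0] + w[1] * x[1] + d > 0:
            w = [w[0] - x[0], w[1] - x[1]]  # W = Wold - X
            changed = True

print(w)  # [1.5, 1.75], i.e. W = (3/2, 7/4) as above
```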

Page 11: Learning Techniques for Information Retrieval

LMS Learning Algorithm

Given a set of input vectors $\{x_1, x_2, \ldots, x_L\}$, each with its own desired output $d_k$, for $k = 1, 2, \ldots, L$, find a vector $w$ such that $\sum_{k=1}^{L} (d_k - w \cdot x_k)^2$ is minimized.

For IR, $d_k$ is just the order the user gives.

From "Neural Networks: Algorithms, Applications, and Programming Techniques", by James A. Freeman and David M. Skapura, Addison-Wesley, 1991.

Page 12: Learning Techniques for Information Retrieval

The algorithm
1. Choose a vector $w(1) = (1, 1, \ldots, 1)$.
2. For each $x_k$, compute
3. $\varepsilon_k^2(t) = (d_k - w(t) \cdot x_k)^2$
4. $w(t+1) = w(t) + 2\mu\,\varepsilon_k\,x_k$.
5. Repeat steps 2-4 until the error is reduced to an acceptable level.

$\mu$ is a parameter. If $\mu$ is too large, the algorithm will never converge. If $\mu$ is too small, the speed is slow. Choose a number between 1.0 and 0.1 in practice. You can choose a bigger number at the beginning and reduce it gradually.
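
A minimal sketch of these steps in Python, assuming the learning-rate parameter is called mu and the training pairs are given as plain lists:

```python
def lms(xs, ds, mu=0.1, tol=1e-6, max_epochs=1000):
    """Widrow-Hoff / LMS updates: w(t+1) = w(t) + 2*mu*eps_k*x_k."""
    w = [1.0] * len(xs[0])                                   # step 1: w(1) = (1, 1, ..., 1)
    for _ in range(max_epochs):
        total_sq_error = 0.0
        for x, d in zip(xs, ds):
            eps = d - sum(wi * xi for wi, xi in zip(w, x))   # eps_k(t) = d_k - w(t).x_k
            w = [wi + 2 * mu * eps * xi for wi, xi in zip(w, x)]
            total_sq_error += eps * eps
        if total_sq_error < tol:                             # error acceptable: stop
            break
    return w
```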

Page 14: Learning Techniques for Information Retrieval

Query Expansion and Term Reweighting for the Vector Model
• $D_r$: set of relevant documents, as identified by the user, among the retrieved documents;
• $D_n$: set of non-relevant documents among the retrieved documents;
• $C_r$: set of relevant documents among all documents in the collection;
• $|D_r|$, $|D_n|$, $|C_r|$: number of documents in the sets $D_r$, $D_n$, $C_r$, respectively;
• $\alpha$, $\beta$, $\gamma$: tuning constants.

If the complete set $C_r$ were known in advance, the optimal query would be

$\vec{q}_{opt} = \frac{1}{|C_r|} \sum_{\vec{d}_j \in C_r} \vec{d}_j - \frac{1}{N - |C_r|} \sum_{\vec{d}_j \notin C_r} \vec{d}_j$

where $N$ is the total number of documents in the collection.

Page 15: Learning Techniques for Information Retrieval

Query Expansion and Term Reweighting for the Vector Model

Standard_Rocchio: $\vec{q}_m = \alpha \vec{q} + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j$

Ide_Regular: $\vec{q}_m = \alpha \vec{q} + \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \sum_{\vec{d}_j \in D_n} \vec{d}_j$

Ide_Dec_Hi: $\vec{q}_m = \alpha \vec{q} + \beta \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \max_{non\text{-}relevant}(\vec{d}_j)$

where $\max_{non\text{-}relevant}(\vec{d}_j)$ is a reference to the highest ranked non-relevant document.
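
A sketch of Standard_Rocchio over term-weight vectors represented as Python dictionaries (the representation and the default values of $\alpha$, $\beta$, $\gamma$ are illustrative assumptions):

```python
def rocchio(q, Dr, Dn, alpha=1.0, beta=0.75, gamma=0.15):
    """Standard Rocchio: q_m = alpha*q + (beta/|Dr|)*sum(Dr) - (gamma/|Dn|)*sum(Dn).
    q is a dict mapping term -> weight; Dr and Dn are lists of such dicts."""
    qm = {t: alpha * w for t, w in q.items()}
    for dj in Dr:                                  # add the centroid of the relevant docs
        for t, w in dj.items():
            qm[t] = qm.get(t, 0.0) + beta * w / len(Dr)
    for dj in Dn:                                  # subtract the centroid of the non-relevant docs
        for t, w in dj.items():
            qm[t] = qm.get(t, 0.0) - gamma * w / len(Dn)
    return {t: w for t, w in qm.items() if w > 0}  # negative weights are usually dropped
```

Ide_Regular drops the $1/|D_r|$ and $1/|D_n|$ factors, and Ide_Dec_Hi subtracts only the highest ranked non-relevant document instead of summing over $D_n$.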

Page 16: Learning Techniques for Information Retrieval

Evaluation of Relevance Feedback Strategies (Chapter 5)
• Simple way: use the new query to search the database and recalculate the results.
• Problem: the documents used for feedback are retrieved again, so the evaluation is not fair.
• Better way: consider only the documents that were not used in the feedback (the residual collection).
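
For illustration, a hypothetical helper that scores a feedback run only on the residual collection (the names and the cutoff $k$ are assumptions):

```python
def residual_precision(ranked, relevant, seen, k=10):
    """Precision at k over the residual collection: documents the user never
    saw during feedback, so the reformulated query gets no unfair credit."""
    residual = [doc for doc in ranked if doc not in seen]
    top = residual[:k]
    return sum(1 for doc in top if doc in relevant) / len(top) if top else 0.0
```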

Page 17: Learning Techniques for Information Retrieval

Query Expansion Through Local Clustering

• Definition: Let $V(s)$ be a non-empty subset of words which are grammatical variants of each other. A canonical form $s$ of $V(s)$ is called a stem. For instance, if $V(s) = \{\text{polish}, \text{polishing}, \text{polished}\}$ then $s = \text{polish}$.

• Definition: For a given query $q$, the set $D_l$ of documents retrieved is called the local document set. Further, the set $V_l$ of all distinct words in the local document set is called the local vocabulary. The set of all distinct stems derived from the set $V_l$ is referred to as $S_l$.
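
A small sketch of building $V_l$ and $S_l$ from the retrieved documents $D_l$; the suffix-stripping stemmer is only a stand-in (a real system would use a proper stemmer such as Porter's):

```python
def local_sets(retrieved_docs):
    """retrieved_docs: list of token lists, one per document in the local set D_l."""
    def stem(word):                                   # crude placeholder stemmer
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word
    V_l = {w for doc in retrieved_docs for w in doc}  # local vocabulary
    S_l = {stem(w) for w in V_l}                      # local stems
    return V_l, S_l

# local_sets([["polish", "polishing"], ["polished", "cars"]])
# -> ({'polish', 'polishing', 'polished', 'cars'}, {'polish', 'car'})
```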

Page 18: Learning Techniques for Information Retrieval

Association Clusters

• Definition: The frequency of a stem $s_i$ in a document $d_j$, $d_j \in D_l$, is referred to as $f_{s_i,j}$. Let $m = (m_{ij})$ be an association matrix with $|S_l|$ rows and $|D_l|$ columns, where $m_{ij} = f_{s_i,j}$. Let $m^t$ be the transpose of $m$. The matrix $s = m\,m^t$ is a local stem-stem association matrix. Each element $s_{u,v}$ in $s$ expresses a correlation $c_{u,v}$ between the stems $s_u$ and $s_v$, namely

$c_{u,v} = \sum_{d_j \in D_l} f_{s_u,j} \times f_{s_v,j}$   (5.5)

$s_{u,v} = c_{u,v}$   (5.6)
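
A sketch of equation (5.5) in Python, accumulating the correlations $c_{u,v}$ directly from the stem frequencies rather than forming $m\,m^t$ explicitly (the input layout is an assumption):

```python
from collections import Counter
from itertools import combinations_with_replacement

def association_counts(local_docs_stems):
    """local_docs_stems: list of stem lists, one per document in D_l.
    Returns c with c[(su, sv)] = sum over d_j of f_{su,j} * f_{sv,j}  (eq. 5.5)."""
    c = Counter()
    for doc in local_docs_stems:
        f = Counter(doc)                              # f_{s,j}: stem frequencies in d_j
        for su, sv in combinations_with_replacement(sorted(f), 2):
            c[(su, sv)] += f[su] * f[sv]
            if su != sv:
                c[(sv, su)] += f[su] * f[sv]          # keep the matrix symmetric
    return c
```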

Page 19: Learning Techniques for Information Retrieval

Association Clusters

• Normalize:

$s_{u,v} = \frac{c_{u,v}}{c_{u,u} + c_{v,v} - c_{u,v}}$   (5.7)

• Definition: Consider the $u$-th row in the association matrix $s$ (i.e., the row with all the associations for the stem $s_u$). Let $S_u(n)$ be a function which takes the $u$-th row and returns the set of $n$ largest values $s_{u,v}$, where $v$ varies over the set of local stems and $v \neq u$. Then $S_u(n)$ defines a local association cluster around the stem $s_u$. If $s_{u,v}$ is given by equation (5.6), the association cluster is said to be unnormalized. If $s_{u,v}$ is given by equation (5.7), the association cluster is said to be normalized.
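
A sketch of the normalization (5.7) and of $S_u(n)$, taking the correlation counts c from the previous sketch as input (function names are illustrative):

```python
def normalized(c, su, sv):
    """s_{u,v} = c_{u,v} / (c_{u,u} + c_{v,v} - c_{u,v})   (eq. 5.7)"""
    return c[(su, sv)] / (c[(su, su)] + c[(sv, sv)] - c[(su, sv)])

def cluster(c, su, n, use_normalized=True):
    """S_u(n): the n stems sv (sv != su) most strongly associated with su."""
    stems = {s for (s, _) in c}
    score = (lambda sv: normalized(c, su, sv)) if use_normalized else (lambda sv: c[(su, sv)])
    return sorted((sv for sv in stems if sv != su), key=score, reverse=True)[:n]
```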

Page 20: Learning Techniques for Information Retrieval

Interactive Search Formulation

• A stem $s_u$ that belongs to a cluster associated with another stem $s_v$ is said to be a neighbor of $s_v$.

• Reformulation of the query: for each stem $s_v$ in the query, select $m$ neighbor stems from the cluster $S_v(n)$ and add them to the query, as sketched below.
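
A sketch of this expansion step, assuming a precomputed mapping from each query stem to its association cluster (e.g. the output of the hypothetical cluster() sketch above):

```python
def expand_query(query_stems, neighbors, m=2):
    """For each stem s_v in the query, add its m top neighbor stems from S_v(n)."""
    expanded = list(query_stems)
    for sv in query_stems:
        for su in neighbors.get(sv, [])[:m]:
            if su not in expanded:
                expanded.append(su)
    return expanded

# expand_query(["polish", "car"], {"polish": ["wax", "shine"], "car": ["automobile"]})
# -> ["polish", "car", "wax", "shine", "automobile"]
```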