27
1 CAASL2 2007 July 21-22 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud Masoud Rahgozar Rahgozar School of Electrical and Computer Engineering School of Electrical and Computer Engineering University of Tehran University of Tehran Farhad Oroumchian Farhad Oroumchian University of Wollongong in Dubai University of Wollongong in Dubai

CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

Embed Size (px)

Citation preview

Page 1: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

1

CAASL2 2007July 21-22

Using OWA Fuzzy Operator to Merge Retrieval System Results

Tehran University

Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud RahgozarMasoud RahgozarSchool of Electrical and Computer EngineeringSchool of Electrical and Computer Engineering

University of TehranUniversity of Tehran

Farhad OroumchianFarhad OroumchianUniversity of Wollongong in DubaiUniversity of Wollongong in Dubai

Page 2: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

2 University of Tehran - Database Research Group

OutlineThe Persian Language

Used MethodsVector Space Model

Language Modeling

OWA Operator

The test collections

Experiment results

Conclusion

Page 3: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

3 University of Tehran - Database Research Group

Outline

The Persian Language

Used MethodsVector Space Model

Language Modeling

OWA Operator

The test collections

Experiment results

Conclusion

Page 4: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

4 University of Tehran - Database Research Group

The Persian LanguageIt is Spoken in countries like Iran, Tajikistan and Afghanistan

It has Arabic like script for writing and consists of 32 characters that are written continuously from right to left

It’s morphological analyzers need to deal with many forms of words that are not actually Farsi

Example• The word “کافر” (singular) “کفار” (plural)

• Or “عادت” that has two plural forms in Farsi: – Farsi form“ ها ”عادت– Arabic form“عادات”

Page 5: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

5 University of Tehran - Database Research Group

Outline

The Persian Language

Used MethodsVector Space Model

Language Modeling

OWA Operator

The test collections

Experimental results

Conclusion

Page 6: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

6 University of Tehran - Database Research Group

Name Weighting

tf.idf tf*log(N/n) / ((tf2) * (qtf2))

lnc.ltc (1+log(tf))*(1+log(qtf))*log((1+N)/n) / ((tf2) * (qtf2))

nxx.bpx (0.5+0.5*tf/max tf)+log((N-n)/n)

tfc.nfc tf*log(N/n)*(0.5+0.5*qtf/max qtf)*log(N/n) / ((tf2) * (qtf2))

tfc.nfx1 tf* log(N/n)*(0.5+0.5*qtf/max qtf) *log(N/n) / ((tf * log(N/n))2)

tfc.nfx2 tf*log(N/n)*(0.5+0.5*qtf/max qtf)*log(N/n) / ((tf2))

Lnu.ltu ((1+log(tf))*(1+log(qtf))*log((1+N)/n))/((1+log(average tf)) * ((1-s) + s * N.U.W/ average N.U.W)2)

List of Weights that produced the best resultsVector Space Model

Best

We used Lnu.ltu and Lnc.btc weighting schemasWe used Lnu.ltu and Lnc.btc weighting schemas

Page 7: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

7 University of Tehran - Database Research Group

Outline

The Persian Language

Used MethodsVector Space Model

Language Modeling

OWA Operator

The test collections

Experimental results

Conclusion

Page 8: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

8 University of Tehran - Database Research Group

Language Modeling

Hiemstra (2001) proposed four ways to specify the rank of document d against query qConsidering P(D=d) as the prior probability of relevance of the document d to the query q with query terms t1,...,tn.Lambda (λ ) is a smoothing parameter and is equal for each query term. Hiemstra (2002) emphasizes if there is no previous relevance information available for a query, each query term will be considered equally important.

Page 9: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

9 University of Tehran - Database Research Group

Language Modeling- Cont.

n

i t

t

dttfticf

tcfdtitfdLM

1

))),(()()1(

))((),(1log()(1

n

i t

t

dttftidf

tdfdtitfdLM

1

))),(()()1(

))((),(1log()(2

n

i t

tt dttfticf

tcfdtitfdttfdLM

1

))),(()()1(

))((),(1log()),(log()(3

n

i t

tt dttftidf

tdfdtitfdttfdLM

1

))),(()()1(

))((),(1log()),(log()(4

Page 10: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

10 University of Tehran - Database Research Group

Outline

The Persian Language

Used MethodsVector Space Model

Language Modeling

OWA Operator

The test collections

Experimental results

Conclusion

Page 11: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

11 University of Tehran - Database Research Group

OWA Operator- Cont.We used OWA operator as the merge operator. The OWA operator with n dimensions is a nonlinear aggregation operator OWA: [0, 1]n[0, 1] with a weighting vector W=[w1,w2,…,wn] such that Sigma(Wi)=1 with wi in [0, 1]. The OWA weight of each document d is defined as:

where xi indicated the score of document d in the ith list. Each score xi is assigned by Ri (ith search engine) to document d. If d is not present in the ith list then xi=0.

.

),...,,()( 21 nxxxOWAdOWA

Page 12: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

12 University of Tehran - Database Research Group

OWA Operator- Cont.

The OWA weight of each document is computed by this Equation:

in which WT is the transpose vector of W that defines the semantics of associated with theOWA operator and B=[b1,b2,..,bn] is the vectorX=[ x1, x2,…, xn] reordered so that bj=Minj(x1, x2,…, xn), that is the jth smallest element of all the x1, x2,…, xn. we used a simple function to bring the scores ({xi, i=1,…,n}) into a same scale

n

i

Tin biwBxxxOWAdOWA

1

T21 *.W),..,,()(

Page 13: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

13 University of Tehran - Database Research Group

OWA Operator- Weighting Method

Quantifier Based Weighting

Degree of Importance Based Weighting

Page 14: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

14 University of Tehran - Database Research Group

Quantifier Based Weighting

linguistic quantifiers All, Mostn, Fewn, and At-Least-One as the weighting schemas, All: consider documents appearing in all retrieval engines’ lists. This quantifier is suitable when the user is looking for precise answer Most: a fuzzy majority operator that assumes the retrieval by the most of the engines to be sufficient for inclusion in the fused list. Few: is a weaker weighting schemas in which it is enough for a document to be retrieved by a few number of retrieval engines. At-Least-One: is the weakest weighting schemas in which it is enough for a document to appear in only one retrieval engine’s list to be included in the fused list.

Hence the All quantifier has an AND semantic and the At-Least-Onequantifier has an OR semantic. The Most and Few quantifiers have thesemantics in between an AND and an OR operators.

Page 15: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

15 University of Tehran - Database Research Group

Degree of Importance Based Weighting

As the second weighting schema we use the position of the documents in the retrieved lists to produce the weighting vector W= [w1, w2,…, wn]. The weight of each document d in the Li,q is defined by

in which Ni is the number of elements in the ith list, Li,q, and POSi is the position of document d in Li,q.

.

i

iii N

POSNw

1

Page 16: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

16 University of Tehran - Database Research Group

Outline

The Persian Language

Used MethodsVector Space Model

Language Modeling

OWA Operator

The test collections

Experimental results

Conclusion

Page 17: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

17 University of Tehran - Database Research Group

Test Collections1. Qvanin Collection

Documents: Iranian Law Collection • 177089 passages • 41 queries and Relevance Judgments

2. Hamshari CollectionDocuments: 600+ MB News from Hamshari Newspaper

• 160000+ news articles• 60 queries and Relevance Judgments

3. BijanKhan Tagged CollectionDocuments: 100+ MB from different sources

• A tag set of 41 tags• 2590000+ tagged words

Page 18: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

18 University of Tehran - Database Research Group

Hamshahri CollectionWe used HAMSHAHRI (a test collection for Persian text

prepared and distributed by DBRG (IR team) of University of Tehran)

The 3rd version:

– contains about 160000+ distinct textual news

articles in Farsi

– 60 queries and relevance judgments for top 20 relevant documents for each query

Page 19: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

19 University of Tehran - Database Research Group

OutlineThe Persian Language

Used MethodsPivoted normalization

N-Gram approach

Local Context Analysis

Our test collections

Experimental results

Conclusion

Page 20: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

20 University of Tehran - Database Research Group

Experiment results

0.32

0.37

0.42

0.47

0.52

0.57

0.62

5 10 15 20

Document Cut-Off

Pre

cis

ion

Lnc.btc Lnu.ltu .025 LM1 LM2 LM3 LM4

Page 21: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

21 University of Tehran - Database Research Group

Quantifier Based OWA Weighting

Method Weighting Vector Orness Degree

All W=[1,0,0,0,0,0] 0.00

Most2 W=[0,.5,.5,0,0,0] 0.30

Most3 W=[0,.33,.33,.33,0,0] 0.40

Most4 W=[0,.25,.25,.25,.25,0] 0.50

Few3 W=[0,0,.33,.33,.33,0] 0.59

Few2 W=[0,0,0,.5,.5,0] 0.70

At-Least-One W=[0,0,0,0,0,1]1.00

Page 22: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

22 University of Tehran - Database Research Group

Experiment results

0.500

0.520

0.540

0.560

0.580

0.600

0.620

5 10 15 20

Most2 Most3 Most4 Few2Few3 DOI

Page 23: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

23 University of Tehran - Database Research Group

Experiment results

0.500

0.520

0.540

0.560

0.580

0.600

0.620

5 10 15 20

Most3 Most4 LM4 Lnu.ltu .025

Page 24: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

24 University of Tehran - Database Research Group

Statistical significance tests

Wilcoxon Signed Rank

T-Test

Method Most3 Most4 LM4

LM4 0.861 0.894 ---

Lnu.ltu 0.25 0.081 0.077 0.444

Method Most3 Most4 LM4

LM4 0.456 0.383 --

Lnu.ltu 0.25 0.028 0.027 0.147

Page 25: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

25 University of Tehran - Database Research Group

Statistical significance tests

Based on T Test, both Mos3 and Most4 methods are significantly better than LM4 method which is a confirmation of The Wilconxon Signed Rank test.

However, with the T-Test we can not confirm the significance of the Mos3 and Most4 methods over the Lnu.ltu with slope of 0.25 method.

Page 26: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

26 University of Tehran - Database Research Group

Conclusion

We used two weighting namely quantifier based and degree-of-importance based weighting methods The experimental results show that the best OWA operator, Most3 and Most4 (quantifier based OWA operators), only marginally improve over the best retrieval method on Persian text the LM4 methods. However seems they produce better ranking since they push the relevant documents to higher ranks.The significant tests we conducted seem to confirm that Most3 and Most4 are significantly better than all other methods but Lnu.ltu with slope of 0.25.However, the superiority over the Lnu.ltu with slope of 0.25 was not confirmed by T-Test.

Page 27: CAASL2 2007 July 21-22 1 Using OWA Fuzzy Operator to Merge Retrieval System Results Tehran University Hadi Amiri, Abolfazl AleAhmad, Caro Lucas, Masoud

27 University of Tehran - Database Research Group

Thanks, Questions

?http://ece.ut.ac.ir/dbrg