45
CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Embed Size (px)

Citation preview

Page 1: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

CMPS 561Fuzzy Set Retrieval

Ryan Benton

September 1, 2010

Page 2: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Agenda

• Fuzzy Sets• Fuzzy Operations• Fuzzy Set Retrieval• Processing Fuzzy Queries• Set Issues• -level Sets

Page 3: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy Sets

Page 4: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Issue

• Problem– Boolean assumes documents

• Exist in 1 of 2 states• States are Non-Overlapping

– CRISP!

– Other crisp states may exist• Grades: A, B, C, D, F

– Can be in one and not the other.

– Often, exact boundaries are not known• Hot, Warm, Luke-warm, Cool, Chilly, Cold

– When does Hot become Warm?

Page 5: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Issue

• Fuzzy Sets– Informal Definition

• Fuzzy Set is a class with fuzzy boundaries.• Alternatively, fuzzy set is a class with non-crisp

boundaries.

Page 6: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy Sets – Slightly More Formal

• Regular (Crisp) set– Let X = {xi} be the universe of objects of interest.

• Fuzzy Subset of X A.– A is a set of order pairs

• A = { < x, (x)> }

– (x) Grade of membership of x in A

– : X [0,1]

Page 7: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Example 1

• Assume Crisp Set is– X = {x1, x2, x3, x4}

• Fuzzy Subsets– A = {<x1,1.0>,<x2,0.5>< x3,0.8>}

– B = {<x2,0.8>}

Page 8: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Example 2Membership Functions

4’6” 6’

short medium tall

1

0

1

0

.75

.2

4’

tall4’medium(4’)=0.2, short(4’)=.75

Crisp

Fuzzy

Page 9: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy Operations

Page 10: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Operations 1

• Let– A = { < x, A(x)> }

– B = { < x, B(x)> }

• Equality– A = B iff

• A(x) = B(x), for all x X.

Page 11: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Operations 2

• Containment– A B iff

• A(x) B(x), for all x X.

• Complement– Ā = { < x, Ā(x)> } iff

• Ā(x) = 1 - A(x), for all x X.

Page 12: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Operations 3

• Union– A B = {< x, AB(x)> } iff

• AB = max(A(x), B(x)), for all x X.

• Intersection– A B = {< x, AB(x)> } iff

• AB = min(A(x), B(x)), for all x X.

Page 13: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy Set Retrieval Model

Page 14: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Document Representation Boolean

• Relation– D = { < d, ti, D(d, ti)> }

– D: D x T {0,1}

– D(d, ti)

• 1, if d contains ti

• 0, otherwise

• Dt = {d D | D(d, t) = 1}

• d Dd = {ti T | D(d, ti) = 1}

Page 15: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Document RepresentationFuzzy

• D = { < d, ti, D(d, ti)> }

• D: D x T [0,1]

• D(d, ti)

– The degree of membership of ti to d

• Dt = {<d, D(d, t)> | d D >}

• Dd = {<ti, D(d, ti)> | ti T>}

Page 16: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Processing Boolean Query (Method 2)

• Dt1t2 = {d D | D(d, t1) D(d, t2) = 1}

• Dt1t2 = {d D | D(d, t1) D(d, t2) = 1}

• Dt = set of Documents containing term t

• T = {a,b,c,d,e}

• Da, Db, Dc, Dd, De,

Page 17: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Processing Fuzzy Query (Method 2)

• Dt1t2 = {<d , D(d, t1) D(d, t2) > | d D }

• Dt1t2 = {<d , D(d, t1) D(d, t2) > | d D }

• Dt = set of <Document, membership value> pairs containing term t

• T = {a, b, c, d, e}

• Da, Db, Dc, Dd, De

Page 18: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Retrieval FunctionBoolean

• RSV F

• RSVt(d) = D(d, t)

• RSVe(d) = 1 - RSVe(d)

• RSVe1e2(d) = RSVe1(d) RSVe2(d)

• RSVe1e2(d) = RSVe1(d) RSVe2(d)

Page 19: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Retrieval FunctionFuzzy

• RSVt(d) = D(d, t)

• RSVe(d) = 1 - RSVe(d)

• RSVe1e2(d) = min(RSVe1(d),RSVe2(d))

• RSVe1e2(d) = max(RSVe1(d),RSVe2(d))

Page 20: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Processing Fuzzy Queries

Page 21: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Term-Document Incidence Matrix

d1 d2 d3 d4 d5 d6 d7 d8

t1 0.1 0.7 0.6 0.4 0.8 0.6 0.3 0.6

t2 0.3 0.8 0.9 0.8 0.6 0.5 0.7 0.2

t3 0.8 1 0.2 0.6 0.8 0.2 0.8 0.5

t4 0.4 0.6 0.7 0.5 0.7 0.3 0.1 0.9

t5 0.7 0.2 0.8 0.4 0.6 0.2 0.8 0.4

Page 22: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4

Page 23: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example(Method 1)

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.10.50.6

0.70.30.2

0.70.50.6

Page 24: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Inverted File Processing(Method 2)

• Dt = {<d, Dt(d)> | Dt(d) > 0}

• De = {<d, 1-RSVe(d)> | 1-RSVe(d) > 0}

• De1e2 = {<d, RSVe1e2(d)> | RSVe1e2(d) > 0}

• De1e2 = {<d, RSVe1e2(d)> | RSVe1e2(d) > 0}

Page 25: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Processing Fuzzy Query (Method 2)

Output

• Dt

• De1

• De2

• De1 De2

• De1 De2

• D\De1

Query• t• e1• e2• e1e2• e1e2• e1

Page 26: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Set Issues

Page 27: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Set Identities

• A B = B A

• (A B) C = A (B C)

• A B C) =

(A B) C)

• A = A

• A A’ = S

• A B = B A

• (A B) C = A (B C)

• A B C) =

(A B) C)

• A S = A

• A A’ =

Page 28: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Definitions of Union and Intersection

Name Union Intersection

max, min max, product ·

probabilistic sum, product ·

Einstein

bold

Hamacher

Yager

Schweizer-Sklard ⊥p ⊤p

+ ·

+

v

v

Page 29: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Operations – Union & Intersection

• a · b = ab• a b = a + b – ab• a b = (a + b) / (1 + ab)• a b = (ab) / ( 1 + (1-a)(1-b)• a b = min(a + b, 1)• a b = max(0, a + b - 1)

+

Page 30: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Operations – Union & Intersection

• a b = (a + b – (1 – )ab) / (1- ab– ∞)

• a b = (ab) / (a+b– ∞)

• a b = min(1, (av + bv)(1/v))– v∞)

• a b = 1 - min(1, [(1-a)v + (1-b)v](1/v))– v∞)

+

·

v

v

Page 31: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Operations – Union & Intersection

1 – ((1-a)-p + (1-b)-p -1)(1/p) p > 0

a b p = 0

1 – ((1-a)-p + (1-b)-p -1)(1/p) p < 0,(1-a)-p + (1-b)-p >1

1 p < 0,(1-a)-p +(1-b)-p ≤ 1

aT

p b =

Page 32: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Operations – Union & Intersection

(a-p + b-p -1)(1/p) p > 0

ab p = 0

(a-p + b-p -1)(1/p) p < 0,a-p + b-p >1

0 p < 0,a-p +b-p ≤ 1

a Tp b =

Page 33: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Example of Set Operations

• Assume – Union = Probabilistic Sum

• = a b = a + b – ab

– Intersection = Product• = a · b = ab

Page 34: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Term-Document Incidence Matrix

d1 d2 d3 d4 d5 d6 d7 d8

t1 0.1 0.7 0.6 0.4 0.8 0.6 0.3 0.6

t2 0.3 0.8 0.9 0.8 0.6 0.5 0.7 0.2

t3 0.8 1 0.2 0.6 0.8 0.2 0.8 0.5

t4 0.4 0.6 0.7 0.5 0.7 0.3 0.1 0.9

t5 0.7 0.2 0.8 0.4 0.6 0.2 0.8 0.4

Page 35: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4

Page 36: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example (Method 1)Max, Min

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.10.50.6

0.70.30.2

0.70.50.6

Page 37: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example (Method 1)probabilistic sum and product

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.060.30.48

0.5040.120.09

0.53160.3840.4998

Page 38: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Differences

Document Max, Min Prob. Sum, Product

d3 0.7 0.5316

d6 0.5 0.384

d8 0.6 0.4998

Page 39: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

-level Sets

Page 40: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

-level Sets - Definitions

• -level Set– D() = {<d,t> | D (d, t) ≥ }– Dt() = { d | <d, t> D ()}

• Then meaning of Dt() of descriptor tTT* – Dt() = {<d, Dt()

(d) = Dt()(d) > | d Dt()}

Note: T* has same meaning as E used in Boolean IR

• If t’T*, then meaning of Dt() of descriptor t =t’– Dt() = {<d, Dt()

(d)=1-Dt’()(d)> |

1-Dt’()(d) > ,d Dt’() }

Page 41: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

-level Sets - Definitions

• If t’, t’’T*, then meaning of Dt() of descriptor t = t’ t’’– Dt() = {<d, Dt()

(d)=min(Dt’()(d), Dt’’()

(d))> |d Dt’() Dt’’()

• If t’, t’’T*, then meaning of Dt() of descriptor t = t’ t’’– Dt() = {<d, Dt()

(d)=max(Dt’()(d), Dt’’()

(d))> |d Dt’() Dt’’()

Page 42: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4

Page 43: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example Max, Min

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.10.50.6

0.70.30.2

0.70.50.6

Page 44: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Fuzzy example Max, Min, *=0.3

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2 (0)

0.90.50.0

0.2 (0)0.2 (0)0.5

0.70.30.9

0.1 (0)0.51.0

1.01.00.5

0.0 0.50.6

0.70.30.0

0.70.50.6

Page 45: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Questions?