Intelligent Database Systems Lab
N.Y.U.S.T. I.M.
Supporting personalized ranking over categorical attributes
Presenter: Lin, Shu-Han
Authors: Gae-won You, Seung-won Hwang, Hwanjo Yu
Information Sciences 178 (2008)
Outline
Motivation, Objective, Methodology, Experiments, Conclusion, Comments
Motivation
The problem: personalized ranking over categorical attributes in information retrieval.
Categorical attributes do not have an inherent ordering, so it is unclear how to rank relevant data by a categorical attribute.
For example, how can we find older females who prefer soda drinks?
Name   Age  Gender  FavoriteDrink  Buy
Jane   30   Female  Coke           Coke, Milk
Mary   25   Female  Pepsi          Coke, Pepsi
Tom    21   Male    Water          Milk, Water
Denny  26   Male    Coke           Milk, Juice
Tina   11   Female  Pepsi          Red Wine, Pepsi
Objectives
Enable uniform ranked retrieval over a combination of categorical and numerical attributes.
Support ranking over the binary representation of categorical attributes, which has two properties:
  Binary encoding
  Sparsity
Name   Female
Jane   1
Mary   1
Tom    0
Denny  0
Tina   1

Single-valued attribute:
Name   Coke  Pepsi  Water
Jane   1     0      0
Mary   0     1      0
Tom    0     0      1

Multi-valued attribute with bounded cardinality (item set, bc=2):
Name   Coke  Pepsi  Water
Jane   1     0      1
Mary   1     1      0
Tom    1     0      1
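The binary encoding above can be illustrated with a minimal Python sketch. The helper names one_hot and multi_hot, and the four-value domain taken from the Buy column of the motivation table, are assumptions made for this illustration and do not come from the paper.

```python
def one_hot(value, domain):
    """Encode a single-valued attribute: exactly one bit is set."""
    return [1 if value == d else 0 for d in domain]


def multi_hot(values, domain, bc):
    """Encode a multi-valued attribute whose item set has at most bc items."""
    if len(set(values)) > bc:
        raise ValueError("item set exceeds the bounded cardinality")
    return [1 if d in values else 0 for d in domain]


# FavoriteDrink column (single-valued), rows taken from the motivation table.
drink_domain = ["Coke", "Pepsi", "Water"]
favorite = {"Jane": "Coke", "Mary": "Pepsi", "Tom": "Water"}
single = {name: one_hot(v, drink_domain) for name, v in favorite.items()}

# Buy column (multi-valued, bc=2); the domain ordering is a hypothetical choice.
buy_domain = ["Coke", "Milk", "Pepsi", "Water"]
buys = {"Jane": ["Coke", "Milk"], "Mary": ["Coke", "Pepsi"], "Tom": ["Milk", "Water"]}
multi = {name: multi_hot(v, buy_domain, bc=2) for name, v in buys.items()}

print(single)
print(multi)
```

Each single-valued row has exactly one 1 and each multi-valued row has at most bc ones, which is the sparsity that the later slides rely on.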
Rank processing (TA)
A simple example query: find older females who prefer soda drinks.
Transform it into the ranking function
F = age + female
1. Candidate identification
   1. Sorted access on age and female.
   2. Find the top-k of sa(age) and sa(female), e.g., k=1: sa(age)={O1}; sa(female)={O2}.
2. Candidate reduction
   1. O1 = 30 + 0
   2. O2 = 25 + 1
   3. O1 has the highest F score.
3. Termination
   1. F(O1) = 30 is not greater than F(30, 1) = 31 (the upper-bound score), so processing cannot stop yet.
   2. Another round of sorted access considers more candidates, e.g., sa(age)={O4}; sa(female)={O3}.
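The three steps above can be sketched in Python as a simplified threshold algorithm (TA). This is only an illustration under assumptions: the object values are hypothetical, chosen so that O1 = 30 + 0 and O2 = 25 + 1 as on the slide, and the stop test uses "greater than or equal to the upper bound", which is the usual TA condition.

```python
import heapq


def threshold_algorithm(objects, k):
    """objects maps an id to (age, female); F is the sum of both values."""
    ids = list(objects)
    # One list per attribute, sorted in descending order (ties keep input order).
    sorted_lists = [sorted(ids, key=lambda o, i=i: -objects[o][i]) for i in range(2)]
    scores = {}
    top, rounds = [], 0
    for depth in range(len(ids)):
        rounds += 1
        last_seen = []
        for i, lst in enumerate(sorted_lists):
            oid = lst[depth]                  # sorted access
            last_seen.append(objects[oid][i])
            scores[oid] = sum(objects[oid])   # random access for the full F score
        threshold = sum(last_seen)            # upper bound for any unseen object
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            break                             # no unseen object can beat the top-k
    return top, rounds


# Hypothetical data: (age, female)
objects = {"O1": (30, 0), "O2": (25, 1), "O3": (21, 1), "O4": (26, 0), "O5": (11, 1)}
top, rounds = threshold_algorithm(objects, k=1)
print(top, rounds)
```

With this data the first round sees O1 and O2 against an upper bound of 31, so it continues; the second round sees O4 and O3, the bound drops to 27, and O1 is returned.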
Bitmap – binary encoding
F = v1 + v2 + v3 + v4, k = 2
1) K = {}, C = {1111} (Initialization)
2) OID = execute(C)
3) OID = {O4}, |OID| > 0, K = {[O4, 4]}
4) C = {0111 / 1011 / 1101 / 1110} (Expansion)
5) K.count < k, back to 2)
6) …

     v1  v2  v3  v4
O1   1   0   1   1
O2   0   1   0   0
O3   0   1   1   1
O4   1   1   1   1
O5   1   0   1   1
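The initialization and expansion steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function name bitmap_topk is hypothetical, execute(C) is assumed to return the objects whose bit pattern equals a candidate exactly, and ties at the same score are broken arbitrarily.

```python
def bitmap_topk(rows, k):
    """rows maps an object id to its bit string; F is the number of 1 bits."""
    n = len(next(iter(rows.values())))
    K = []                          # results as (oid, score)
    candidates = {"1" * n}          # Initialization: the best possible pattern
    seen = set(candidates)
    while candidates and len(K) < k:
        next_candidates = set()
        for c in sorted(candidates, reverse=True):
            # execute(C): objects matching this pattern exactly (assumption)
            for oid, bits in rows.items():
                if bits == c:
                    K.append((oid, c.count("1")))
            # Expansion: turn one 1 into 0 to get the next-best patterns
            for i, b in enumerate(c):
                if b == "1":
                    child = c[:i] + "0" + c[i + 1:]
                    if child not in seen:
                        seen.add(child)
                        next_candidates.add(child)
        candidates = next_candidates
    return K[:k]


rows = {"O1": "1011", "O2": "0100", "O3": "0111", "O4": "1111", "O5": "1011"}
result = bitmap_topk(rows, k=2)
print(result)
```

On the table above, pattern 1111 retrieves O4 with score 4, and the expanded patterns retrieve O1, O5, and O3 with score 3, of which one is kept to reach k = 2.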
Bitmap – sparsity
Single-valued attribute
F = w1v1 + w2v2 + … + w6v6
Ranked weights: w1 ≧ w2 ≧ w3; w4 ≧ w5 ≧ w6. For simplicity, all w = 1, k = 2.
1) K = {}, C = {100.100.100} (Initialization)
2) OID = execute(C)
3) OID = {O4}, |OID| > 0, K = OID = {[O4, 2]}
4) C = {010.100.100 / 100.010.100 / 100.100.010} (Expansion)
5) K.count < k, back to 2)
6) …

     Attribute1    Attribute2    Attribute2
     v1  v2  v3    v4  v5  v6    v4  v5  v6
O1   1   0   0     0   0   1     0   1   0
O2   0   1   0     0   1   0     1   0   0
O3   0   1   0     1   0   0     0   1   0
O4   1   0   0     1   0   0     1   0   0
O5   0   0   1     0   0   1     0   1   0
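Because a single-valued attribute sets exactly one bit, only one-hot patterns per attribute need to be tried, and expansion moves the 1 to the next-ranked value of one attribute. The Python sketch below illustrates this with a best-first search. It rests on assumptions: the function name sparse_topk is hypothetical, the weights 3, 2, 1 are invented so that scores differ (the slide uses all w = 1), and only the first two attribute groups (v1 to v6) of the table are used, stored as the index of the set bit.

```python
import heapq


def sparse_topk(rows, weights, k):
    """rows maps an id to a tuple holding the chosen value index per attribute."""
    def score(pattern):
        return sum(w[i] for w, i in zip(weights, pattern))

    start = tuple(0 for _ in weights)        # Initialization: 100.100
    heap = [(-score(start), start)]
    seen = {start}
    result = []
    while heap and len(result) < k:
        neg, pattern = heapq.heappop(heap)   # best remaining one-hot pattern
        for oid, choice in rows.items():     # execute(C) as exact match (assumption)
            if choice == pattern:
                result.append((oid, -neg))
        # Expansion: shift the 1 of one attribute to its next-ranked value
        for a in range(len(pattern)):
            if pattern[a] + 1 < len(weights[a]):
                child = pattern[:a] + (pattern[a] + 1,) + pattern[a + 1:]
                if child not in seen:
                    seen.add(child)
                    heapq.heappush(heap, (-score(child), child))
    return result[:k]


# Index of the set bit in Attribute1 (v1..v3) and Attribute2 (v4..v6)
rows = {"O1": (0, 2), "O2": (1, 1), "O3": (1, 0), "O4": (0, 0), "O5": (2, 2)}
weights = [[3, 2, 1], [3, 2, 1]]             # hypothetical, ranked in descending order
result = sparse_topk(rows, weights, k=2)
print(result)
```

Since the weights are ranked within each attribute, every expanded pattern scores no higher than its parent, so patterns are visited in non-increasing score order.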
Bitmap – sparsity
Multi-valued attribute with bounded cardinality
     Attribute1                Attribute2
     v1-1  v1-2  v1-3  v1-4    v2-1  v2-2  v2-3
O1   1     0     0     1       1     0     1
O2   0     1     0     1       1     1     0
O3   0     1     1     0       1     1     0
O4   1     0     0     1       1     0     1
O5   0     0     1     1       0     1     1
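To give a sense of how much bounded cardinality narrows the search, the short Python sketch below counts the bit patterns that remain possible when each attribute has at most bc ones. The function names are hypothetical, and reading "bounded" as "at most bc" (rather than "exactly bc") is an assumption.

```python
from math import comb


def count_patterns(sizes, bc):
    """Number of bit patterns with at most bc ones in each attribute."""
    total = 1
    for n in sizes:
        total *= sum(comb(n, j) for j in range(min(bc, n) + 1))
    return total


def within_bound(bits, sizes, bc):
    """Check that a row has at most bc ones inside every attribute group."""
    pos = 0
    for n in sizes:
        if sum(bits[pos:pos + n]) > bc:
            return False
        pos += n
    return True


sizes = [4, 3]                                  # Attribute1 has 4 values, Attribute2 has 3
table = {
    "O1": [1, 0, 0, 1, 1, 0, 1],
    "O2": [0, 1, 0, 1, 1, 1, 0],
    "O3": [0, 1, 1, 0, 1, 1, 0],
    "O4": [1, 0, 0, 1, 1, 0, 1],
    "O5": [0, 0, 1, 1, 0, 1, 1],
}
bounded = count_patterns(sizes, bc=2)           # patterns allowed when bc=2
unbounded = count_patterns(sizes, bc=max(sizes))  # all 2**7 patterns
print(bounded, unbounded)
```

For the table above, every row respects bc = 2, and the bound leaves 77 of the 128 possible seven-bit patterns.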
Experiments
• Sparsity of indicator variables in the UCI datasets
• 22% of the datasets consist only of categorical attributes.
• 56% combine numerical and categorical attributes.
Conclusions
This paper studies how to support rank formulation and processing over data with categorical attributes.
Instead of adopting existing numerical algorithms, it develops a bitmap-based approach built on:
  Binary encoding
  Sparsity: single-valued attributes, and multi-valued attributes with bounded cardinality