Leveraging Collaborative Tagging for Web Item Design. Mahashweta Das, Gautam Das, Vagelis Hristidis. 07/03/2022. Presenter: Ajith C Ajjarani [1000-727269]


DESCRIPTION

Presentation for Data Exploration


Leveraging Collaborative Tagging for Web Item Design

Mahashweta Das, Gautam Das, Vagelis Hristidis

04/12/2023

Presenter: Ajith C Ajjarani [1000-727269]


Outline: Organization of the Presentation

• Motivation & Problem Definition
• Tag Maximization: NP-Complete (Naïve Bayes Classifier)
• Approximation Algorithm (larger instances)
• Exact Two-Tier Top-K Algorithm (moderate instances)
• Experiments & Result Tabulation


Motivation

Can I design a new camera that attracts and maximizes tags?

Let's define this opportunity as a problem!


Problem Construction: attributes are the product definition; tags are user-defined.

Given a subset of subjective "desirable" tags, predict a new item (a combination of attribute values). Extend this to a "top-k" version that returns the k potential items with the highest expected number of desirable tags.

Training Data


Problem Statement

• Given a database of tagged products, the task is to design k new products (attribute values) that are likely to attract the maximum number of desirable tags.
  – Tag desirability is just one aspect of product design.
• Applications
  – Electronics, autos, apparel
  – Musical artists, bloggers

Design questions for a camera: Resolution? Zoom? Flash? Shooting mode? Light sensitivity?


Tag Maximization

Technically challenging, because complex dependencies exist between tags and items: it is difficult to determine a combination of attribute values that maximizes the expected number of desirable tags.

A Naïve Bayes classifier is used for tag prediction. Even for this classifier (with its simplistic conditional-independence assumption), the tag maximization problem is NP-complete. The researchers did not resort to heuristics; instead, they developed principled algorithms.


Proposed Solution

• Exact two-tier top-k algorithm (ETT)
  – Performs significantly better than the naïve brute-force algorithm (no need to compute all possible products)
  – Applies the Rank-Join and TA top-k algorithms in a two-tier architecture
  – In the worst case, may have exponential running time
• Approximation algorithm (polynomial-time approximation scheme, PTAS) with provable error bounds, for large datasets
  – The algorithm's overall running time is exponential only in the (constant) size of the tag groups, so it can be reduced to polynomial time complexity.


Problem Framework
• D = {o1, o2, ..., on}: the items
• A = {A1, A2, ..., Am}: the Boolean attributes
• T = {T1, T2, ..., Tr}: the tags

Each item is thus a Boolean vector of size (m + r). (Example: Boolean dataset)

• Such a dataset is used as a training set to build Naive Bayes Classifiers (NBC) and compute P(Tag | Attributes).

Suren: talk on the Naive Bayes Classifier

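Derived Results

The probability that a new item o is annotated by the tag Tj (Naïve Bayes, where o.Ai is the value of attribute Ai in o):

$$\Pr(T_j \mid o) = \frac{\Pr(T_j)\prod_{i=1}^{m}\Pr(o.A_i \mid T_j)}{\Pr(o)}$$

The probability Pr(Tj' | o) that an item o does not have tag Tj:

$$\Pr(T_j' \mid o) = \frac{\Pr(T_j')\prod_{i=1}^{m}\Pr(o.A_i \mid T_j')}{\Pr(o)}$$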


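Derived Results (continued)

Expected number of desirable tags Td = {T1, ..., Tz} ⊆ T that a new item o is annotated with:

$$E(o, T_d) = \sum_{j=1}^{z} \Pr(T_j \mid o) = \sum_{j=1}^{z} \frac{1}{1 + R_j}$$

where Rj is introduced for convenience (it follows from Pr(Tj | o) + Pr(Tj' | o) = 1):

$$R_j = \frac{\Pr(T_j')\prod_{i=1}^{m}\Pr(o.A_i \mid T_j')}{\Pr(T_j)\prod_{i=1}^{m}\Pr(o.A_i \mid T_j)}$$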
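To make these formulas concrete, here is a minimal Python sketch that estimates Rj from a Boolean training set and sums 1 / (1 + Rj) over the desirable tags. The function names, the (attrs, tags) tuple layout, and the add-alpha (Laplace) smoothing are assumptions for illustration; the slides do not say how the probabilities are estimated.

```python
from math import prod


def tag_ratio(data, j, item, alpha=1.0):
    """R_j(o) = Pr(Tj') * prod_i Pr(o.Ai | Tj') / (Pr(Tj) * prod_i Pr(o.Ai | Tj)).

    data: list of (attrs, tags) pairs of 0/1 tuples (the Boolean training set).
    alpha: add-alpha smoothing constant (an assumption, not from the slides).
    """
    pos = [a for a, t in data if t[j] == 1]  # items annotated with Tj
    neg = [a for a, t in data if t[j] == 0]  # items without Tj
    p_t = (len(pos) + alpha) / (len(data) + 2 * alpha)

    def likelihood(rows):
        # prod_i Pr(o.Ai | class), each factor smoothed
        return prod(
            (sum(r[i] == v for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for i, v in enumerate(item)
        )

    return ((1 - p_t) * likelihood(neg)) / (p_t * likelihood(pos))


def expected_desirable_tags(data, desirable, item):
    """E(o, Td) = sum over Tj in Td of Pr(Tj | o) = 1 / (1 + R_j(o))."""
    return sum(1.0 / (1.0 + tag_ratio(data, j, item)) for j in desirable)


# Toy training set: ((A1, A2), (T1, T2)); T1 co-occurs with A1, T2 with A2
data = [((1, 0), (1, 0)), ((1, 1), (1, 1)), ((0, 1), (0, 1)), ((0, 0), (0, 0))]
print(expected_desirable_tags(data, [0], (1, 0)))
```

On this toy set, the item with A1 = 1, A2 = 0 gets Pr(T1 | o) ≈ 0.75, while the item with A1 = 0, A2 = 1 gets about 0.25.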


Exact Algorithm

• Naïve brute force
  – Consider all 2^m possible products and compute the expected number of desirable tags for each
  – Exponential complexity
• Exact two-tier top-k (ETT)
  – Applies the Rank-Join and TA top-k algorithms in a two-tier architecture
  – Does not need to compute all possible products
  – Performs significantly better than naïve brute force
  – Works well for moderate data instances, but does not scale to larger data
  – In the worst case, may have exponential running time
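For comparison, a minimal Python sketch of the naïve brute-force baseline: enumerate all 2^m Boolean items and keep the k best under a scoring function. Here `score` stands for any function returning an item's expected number of desirable tags; the toy linear score in the example is hypothetical. With m = 50 attributes this loop would have to score 2^50 (about 1.1 × 10^15) items, which is why ETT and the approximation algorithm avoid full enumeration.

```python
import heapq
from itertools import product


def brute_force_top_k(m, k, score):
    """Naive exact top-k: enumerate and score all 2**m Boolean items."""
    all_items = product((0, 1), repeat=m)  # every attribute combination
    return heapq.nlargest(k, all_items, key=score)


# Hypothetical toy score: a weighted count of attributes set to 1
weights = (3, 2, 1)


def toy_score(item):
    return sum(w * a for w, a in zip(weights, item))


print(brute_force_top_k(3, 2, toy_score))
```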


ETT: Two-Tier Architecture

• z desirable tags; the m attributes are split into l groups of m' = m / l attributes each.
• Tier 1: determine the "best" items for each tag (T1, T2, ..., Tz).
• Tier 2: match these items to compute the global best product across all tags.
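Tier 1 uses Rank-Join so that each GetNext() on a tag returns that tag's next-best full product. A minimal Python sketch, assuming two attribute groups (l = 2) and that a full product's score is the product of its positive partial scores: it performs a best-first search over index pairs, which yields pairs in the same descending order a rank join would here. It is not the HRJN operator or the authors' implementation, and the function name and data layout are hypothetical.

```python
import heapq


def rank_join(left, right):
    """Yield (product, score) for every pair of partial products, best first.

    left, right: lists of (bits, score) sorted by descending score, with
    positive scores. The combined score left_score * right_score is monotone
    in both inputs, so best-first search emits non-increasing scores.
    """
    if not left or not right:
        return
    heap = [(-left[0][1] * right[0][1], 0, 0)]  # max-heap via negated scores
    seen = {(0, 0)}
    while heap:
        neg_score, i, j = heapq.heappop(heap)
        yield left[i][0] + right[j][0], -neg_score
        # Push the popped pair's two successors onto the frontier
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-left[ni][1] * right[nj][1], ni, nj))


# T1's two sorted lists from the ETT example (group {A1, A2} and group {A3, A4})
l11 = [("10", 1.97), ("00", 0.84), ("11", 0.84), ("01", 0.36)]
l12 = [("10", 1.97), ("00", 0.84), ("11", 0.84), ("01", 0.36)]
stream = rank_join(l11, l12)
print(next(stream))  # first GetNext() for T1
```

Because the generator is lazy, each `next()` call does only the work needed for one more product, which mirrors the GetNext() interface.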


ETT Algorithm (Example)

• Database: attributes {A1, A2, A3, A4}, tags {T1, T2}, top-1
  – Partition the attributes into 2 groups, {A1, A2} and {A3, A4}, to form 2 lists of partial products.
  – Each list has 2^m' = 2^2 = 4 entries (partial products).
  – Run the NBC to compute the score of each partial product for each tag, and sort each list in descending order.
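A minimal Python sketch of this list-building step for one tag. It assumes each attribute contributes a positive factor `ratio[i][v]` (for example, Pr(Ai = v | Tj) / Pr(Ai = v | Tj')), so a partial product's score is the product of its attributes' factors and a larger score means Tj is more likely. The slides do not give the exact per-list score, and the names here are hypothetical.

```python
from itertools import product


def tier1_lists(ratio, groups):
    """Build one descending-sorted list of partial products per attribute group.

    ratio[i][v]: hypothetical positive factor for attribute i taking value v.
    groups: attribute-index lists, e.g. [[0, 1], [2, 3]] for {A1, A2}, {A3, A4}.
    """
    lists = []
    for group in groups:
        entries = []
        for values in product((0, 1), repeat=len(group)):  # 2**m' partial products
            score = 1.0
            for i, v in zip(group, values):
                score *= ratio[i][v]
            entries.append(("".join(map(str, values)), score))
        entries.sort(key=lambda e: e[1], reverse=True)  # descending, as on the slide
        lists.append(entries)
    return lists


# Hypothetical factors for A1..A4 as (value 0, value 1)
ratio = [(0.5, 2.0), (1.0, 0.8), (1.2, 0.9), (0.7, 1.5)]
lists = tier1_lists(ratio, [[0, 1], [2, 3]])
for lst in lists:
    print(lst)
```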


Tier 1: sorted lists of partial products (partial product, score); GetNext() is called on each tag.

T1, L11 (A1 A2): 10, 1.97 | 00, 0.84 | 11, 0.84 | 01, 0.36
T1, L12 (A3 A4): 10, 1.97 | 00, 0.84 | 11, 0.84 | 01, 0.36
T2, L21 (A1 A2): 11, 2.76 | 01, 1.18 | 10, 1.18 | 00, 0.51
T2, L22 (A3 A4): 11, 4.57 | 10, 2.53 | 01, 0.91 | 00, 0.51

Tier 2: one join table per tag (Join, Product, Partial Score, MPFS) and a Buffer Top-K table (Product, Complete Score), all empty at the start.

MUS: sum of the last-seen scores from all GetNext() calls

MPFS:

Complete score: the actual score of a product


Iteration 1

Tier 1 (RankJoin over L11, L12 for T1 and L21, L22 for T2): GetNext() on T1 returns 1010; GetNext() on T2 returns 1111.

Tier 2 join table for T1 (Join, Product, Partial Score, MPFS):
  1, 1010, 0.95, 0.95

Tier 2 join table for T2 (Join, Product, Partial Score, MPFS):
  1, 1111, 0.93, 0.93

Buffer Top-K (Product, Complete Score):
  1111, 1.75
  1010, 1.70

MinK (1.75) <= MUS (1.88): return to Tier 1.


Iteration 2

Tier 1 (RankJoin): GetNext() on T1 returns 1011; GetNext() on T2 returns 1110.

Tier 2 join table for T1 (Join, Product, Partial Score, MPFS):
  1, 1010, 0.95, 0.95
  2, 1011, 0.92, 0.92

Tier 2 join table for T2 (Join, Product, Partial Score, MPFS):
  1, 1111, 0.93, 0.93
  2, 1110, 0.88, 0.88

Buffer Top-K (Product, Complete Score):
  1110, 1.77
  1011, 1.76

MinK (1.77) <= MUS (1.79): return to Tier 1.


Iteration 3

Tier 1 (RankJoin): GetNext() on T1 returns 0010; GetNext() on T2 returns 0111.

Tier 2 join table for T1 (Join, Product, Partial Score, MPFS):
  1, 1010, 0.95, 0.95
  2, 1011, 0.92, 0.92
  3, 0010, 0.89, 0.89

Tier 2 join table for T2 (Join, Product, Partial Score, MPFS):
  1, 1111, 0.93, 0.93
  2, 1110, 0.88, 0.88
  3, 0111, 0.84, 0.84

Buffer Top-K (Product, Complete Score):
  0111, 1.77
  0010, 1.76

MinK (1.77) >= MUS (1.74): ETT terminates.

Thus, ETT returns the best item (0111 or 1110) after just 6 item lookups.
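Tier 2 is described as an application of the TA top-k algorithm. The following Python sketch replays the three iterations above: round-robin GetNext() calls on each tag's stream, a buffer of complete scores, and the stopping rule MinK >= MUS. It assumes each stream yields (product, partial score) in descending order and eventually lists every product, and that MUS bounds the complete score of any unseen product; the function name and argument layout are hypothetical, and the example data are only the first three entries per tag copied from the walk-through.

```python
import heapq


def ta_tier2(streams, complete_score, k):
    """TA-style Tier 2: round-robin GetNext() over per-tag streams.

    streams: one iterable per desirable tag, yielding (product, partial_score)
             in descending score order; each is assumed to list every product.
    complete_score: returns a product's actual/complete score over all tags.
    Stops once the k-th best complete score (MinK) >= MUS, the sum of the
    last-seen scores from all streams.
    """
    iters = [iter(s) for s in streams]
    last_seen = [0.0] * len(iters)
    buffer = {}  # product -> complete score
    while True:
        exhausted = False
        for j, it in enumerate(iters):
            nxt = next(it, None)
            if nxt is None:
                exhausted = True  # under the assumption above, all products are seen
                continue
            product, score = nxt
            last_seen[j] = score
            if product not in buffer:
                buffer[product] = complete_score(product)
        top_k = heapq.nlargest(k, buffer.items(), key=lambda kv: kv[1])
        mus = sum(last_seen)
        if exhausted or (len(top_k) == k and top_k[-1][1] >= mus):
            return top_k


# Partial and complete scores copied from the ETT walk-through
t1 = [("1010", 0.95), ("1011", 0.92), ("0010", 0.89)]
t2 = [("1111", 0.93), ("1110", 0.88), ("0111", 0.84)]
complete = {"1010": 1.70, "1111": 1.75, "1011": 1.76,
            "1110": 1.77, "0010": 1.76, "0111": 1.77}
result = ta_tier2([t1, t2], complete.get, k=1)
print(result)
```

The loop stops after the third round with 1110 at 1.77 (tied with 0111), matching the slides' result after 6 lookups.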


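Approximation Algorithm

• The z desirable tags are divided into z/z' subgroups of z' tags each (e.g., T1, T2, ..., Tz' in one subgroup).
• The top-k items are computed for each subgroup (O11, O12, ..., O1k for the first subgroup; O21, O22, ..., O2k for the second; and so on).
• These are combined into the overall top-k items O1, O2, ..., Ok.
• Each subgroup is solved using a PTAS that runs in polynomial time for a given approximation factor ε, where ε = 2σm and σ is the compression factor.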


PTAS Algorithm Design

Consider k = 1 and a single subgroup, i.e., z = z' desirable tags T1, T2, ..., Tz', and fix ε > 0.
• Oa: the item returned by the PTAS (the top-1 item for this subgroup)
• Og: the optimal item

The PTAS should run in polynomial time and satisfy the invariant ExactScore(Oa) >= (1 − ε) · ExactScore(Og).


PTAS Algorithm Design

A simple exponential-time exact top-1 algorithm is first designed for the sub-problem and then converted into a PTAS.

• Given m Boolean attributes and z' tags, the exponential-time algorithm makes m iterations.
• Initial step: it produces the set S0 consisting of the single item 0^m, along with its z' scores, one for each tag.
• First iteration: it produces the set of two items S1 = {0^m, 10^(m−1)}, each accompanied by its z' scores, one for each tag.
• i-th iteration: it produces the set of items Si = {0, 1}^i × 0^(m−i), along with their z' scores, one for each tag.
• The final set Sm contains all 2^m items along with their exact scores, from which the top-1 item can be returned.
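A minimal Python sketch of this exponential-time exact top-1 algorithm. It assumes the z' per-tag scores are the NBC ratios Rj (lower means the tag is more likely) and that each attribute contributes one positive multiplicative factor to Rj, so switching attribute i to 1 updates every Rj with a single multiplication. The inputs `prior` and `factor` are hypothetical, not names from the slides.

```python
from math import prod


def exact_top1(m, prior, factor):
    """Exponential-time exact top-1 by building S_0, S_1, ..., S_m.

    prior[j]: constant part of R_j for tag j (hypothetical input).
    factor[j][i][v]: positive contribution of attribute i = v to R_j
                     (hypothetical input).
    """
    z = len(prior)  # plays the role of z' for one subgroup
    # Initial step: S_0 = {0^m}, stored with one R_j value per tag
    s = {(0,) * m: [prior[j] * prod(factor[j][i][0] for i in range(m))
                    for j in range(z)]}
    for i in range(m):
        # Build S_{i+1}: copy every item of S_i with attribute i switched to 1,
        # updating each R_j incrementally instead of recomputing it
        added = {}
        for item, r in s.items():
            flipped = item[:i] + (1,) + item[i + 1:]
            added[flipped] = [r[j] * factor[j][i][1] / factor[j][i][0]
                              for j in range(z)]
        s.update(added)
    # S_m holds all 2**m items; return the one maximizing sum_j 1 / (1 + R_j)
    return max(s, key=lambda item: sum(1.0 / (1.0 + rj) for rj in s[item]))


# Two tags, three attributes; factors given as (value 0, value 1)
factor = [
    [(1.0, 0.5), (1.0, 2.0), (1.0, 0.25)],
    [(2.0, 1.0), (1.0, 3.0), (1.0, 0.5)],
]
print(exact_top1(3, [1.0, 0.5], factor))
```

Since S_i doubles at every step, this exact version still touches all 2^m items; the PTAS keeps the sets small by clustering similar items after each iteration.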


PTAS Algorithm Design

Consider this table: z = z' = 2, σ = 0.5, m = 4, ε = 2σm = 4.

Og = 1110, exact score 1.77 = 0.89 + 0.88

Oa = 1111, exact score 1.75 = 0.82 + 0.93
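As a quick arithmetic check of the invariant on this table, the ratio actually obtained between the PTAS item and the optimal item is:

```latex
\frac{\text{ExactScore}(O_a)}{\text{ExactScore}(O_g)}
  = \frac{0.82 + 0.93}{0.89 + 0.88}
  = \frac{1.75}{1.77} \approx 0.989
```

So Oa is within about 1.1% of the optimum, and ExactScore(Oa) >= (1 − ε) · ExactScore(Og) holds for any ε >= 0.02 / 1.77 ≈ 0.011.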


PTAS Algorithm Design

When similar items are clustered and all but one are deleted, the exact underlying score of the item kept for a cluster should be close to the exact score of each deleted item.
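The slides state this clustering requirement but not the exact rule. One standard way to satisfy it, borrowed from the trimming step of the classic subset-sum approximation scheme, is to bucket each per-tag score on a geometric grid with ratio (1 + σ) and keep one item per bucket. This Python sketch is an assumption, not necessarily the paper's clustering; the function name is hypothetical.

```python
from math import floor, log


def trim(items, sigma):
    """Keep one representative per cluster of items whose per-tag scores are
    all within a factor (1 + sigma) of each other.

    items: dict mapping item -> list of positive per-tag scores.
    sigma: compression factor, sigma > 0.
    """
    kept = {}
    for item, scores in items.items():
        # Bucket index of each tag score on a geometric grid with ratio (1 + sigma)
        key = tuple(floor(log(s) / log(1.0 + sigma)) for s in scores)
        kept.setdefault(key, (item, scores))  # first item seen represents the cluster
    return dict(kept.values())


items = {"a": [1.0, 1.0], "b": [1.05, 1.02], "c": [3.0, 1.0]}
print(trim(items, 0.5))
```

Two scores in the same bucket satisfy (1 + σ)^b <= s < (1 + σ)^(b+1), so each deleted item's per-tag score is within a factor (1 + σ) of the kept item's. Applying such a step to each Si after every iteration bounds the set sizes.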


Experiment

Synthetic and real datasets are used for quantitative and qualitative analysis of the proposed algorithms.

Quantitative performance indicators: the efficiency of the proposed exact and approximation algorithms, and the approximation factor actually obtained by the approximation algorithm.

Qualitative results: an Amazon Mechanical Turk user study assesses the results of the algorithms.


Experiment

Real camera dataset: a real dataset of 100 cameras listed on Amazon was crawled.
• The listed cameras contain technical details (attributes) and the tags that customers associate with each camera.
• The tags are sanitized to remove synonyms and unintelligible or undesirable tags such as "Nikon coolpix", "quali", "bad", etc.

Synthetic dataset: a Boolean matrix of dimension 10,000 (items) × 100 (50 attributes + 50 tags).
• The 50 independently distributed attributes are divided into 4 groups, in which a value is set to 1 with probability 0.75, 0.15, 0.10, and 0.05, respectively.
• For the 50 tags, relations are predefined by randomly picking a set of attributes that each tag is correlated with.
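A Python sketch of how a synthetic Boolean matrix with this shape might be generated. The group sizes (13, 12, 13, 12), the number of attributes tied to each tag, and the majority rule that turns them into a tag value are hypothetical choices, since the slides do not specify them.

```python
import random


def synthetic_dataset(n_items=10_000, n_attrs=50, n_tags=50,
                      attrs_per_tag=3, seed=0):
    """Rows of n_attrs attribute bits followed by n_tags tag bits."""
    rng = random.Random(seed)
    probs = (0.75, 0.15, 0.10, 0.05)  # Pr(value = 1) for each attribute group
    # Hypothetical: split the attributes into 4 near-equal groups (13, 12, 13, 12)
    group_of = [i * len(probs) // n_attrs for i in range(n_attrs)]
    # Hypothetical: each tag is tied to a random set of correlated attributes
    tag_attrs = [rng.sample(range(n_attrs), attrs_per_tag) for _ in range(n_tags)]
    rows = []
    for _ in range(n_items):
        attrs = [int(rng.random() < probs[group_of[i]]) for i in range(n_attrs)]
        # Hypothetical rule: a tag is set when most of its linked attributes are 1
        tags = [int(2 * sum(attrs[i] for i in linked) > len(linked))
                for linked in tag_attrs]
        rows.append(attrs + tags)
    return rows


rows = synthetic_dataset(n_items=200)
print(len(rows), len(rows[0]))
```

The fixed seed makes the matrix reproducible across runs, which helps when comparing the exact and approximation algorithms on the same data.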


Quantitative: Performance

Exact algorithm:
• Synthetic dataset with 1000 items, 16 attributes, and 8 tags (naïve vs. ETT)


Quantitative: Performance

The figure shows that ETT becomes extremely slow beyond m = 16 attributes.

PA, with approximation factor 0.5, continues to return results with guaranteed quality in reasonable time as the number of attributes m increases.


Quantitative: Performance

Execution time and obtained approximation factor on a synthetic dataset with 1000 items, 20 attributes, and 8 tags; the top-1 item is considered.


Qualitative: User Study

First part of the user study:
• The PA algorithm with approximation factor ε = 0.5 was run on the tag sets corresponding to compact cameras and SLR cameras, respectively.
• 4 new cameras built by the PA algorithm (2 digital compact, 2 digital SLR) were compared against 4 existing popular cameras.
• 65% of users chose the new cameras.


Qualitative: User Study

Second part of the study:
• Built 6 new cameras designed for three groups: (1) young students, (2) older retirees, (3) professional photographers, with 2 potential new cameras for each group.
• When users were asked to assign at least five tags, the majority of users correctly classified the six cameras into the three groups.


Conclusion

• Defined the tag maximization problem and investigated its computational complexity.
• Proposed 2 novel algorithms and showed their practicality.
• This work is a preliminary look at a novel area of research and promises exciting directions for future research.
• Future work: use decision tree, SVM, and regression tree classifiers and conduct the experiments with them.


References

http://crystal.uta.edu/~gdas/Courses/websitepages/fall10DBIR.html


Questions?

Thank You