Presentation for Data Exploration
1
Leveraging Collaborative Tagging for Web Item Design
Mahashweta Das, Gautam Das, Vagelis Hristidis
04/12/2023
Presenter: Ajith C Ajjarani [1000-727269]
Outline: Organization of the Presentation
• Motivation & problem definition
• Tag Maximization: NP-complete (under the Naïve Bayes classifier)
• Exact two-tier top-k algorithm (moderate instances)
• Approximation algorithm (larger instances)
• Experiments & result tabulation
Motivation
Can I design a new camera that attracts and maximizes desirable tags?
Let's define this opportunity as a problem!
Problem Construction
Attributes are the product definition; tags are user-defined.
Given a subset of subjective "desired" tags, predict a new item (a combination of attribute values). Extend this to a "top-k" version: the k potential items with the highest expected number of desirable tags.

Training Data
• Given a database of tagged products, the task is to design k new products (attribute-value combinations) that are likely to attract the maximum number of desirable tags. Tag desirability is just one aspect of product design consideration.
• Applications: electronics, autos, apparel; musical artists, bloggers
Problem Statement
Example camera attributes: Resolution? Zoom? Flash? Shooting mode? Light sensitivity?
Tag Maximization
Technically challenging, as complex dependencies exist between tags and items: it is difficult to determine the combination of attribute values that maximizes the expected number of desirable tags.
A Naïve Bayes classifier is used for tag prediction. Even for this classifier, with its simplistic conditional-independence assumption, the tag maximization problem is NP-complete. The authors did not resort to heuristics; instead, they developed principled algorithms.
Proposed Solution
• Exact two-tier top-k algorithm (ETT): performs significantly better than the naïve brute-force algorithm, since it need not compute all possible products. It applies rank-join and the TA top-k algorithm in a two-tier architecture; in the worst case it may still have exponential running time. Suited for moderate instances.
• Approximation algorithm: a polynomial-time approximation scheme (PTAS) with provable error bounds. Its running time is exponential only in the (constant) size of the attribute groups, and can be reduced to polynomial time. Suited for large datasets.
Problem Framework
• D = {o1, o2, ..., on}: the database of items
• A = {A1, A2, ..., Am}: Boolean attributes
• T = {T1, T2, ..., Tr}: tags
Each item is thus a Boolean vector of size (m + r).
• Such a dataset is used as a training set to build Naive Bayes Classifiers (NBC) and compute Pr(Tag | Attributes).
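As a concrete sketch of the training step, the per-tag NBC statistics can be estimated by simple counting over the Boolean dataset. This is a minimal illustration, not the paper's implementation; the Laplace smoothing is an added assumption:

```python
def train_nbc(items, m, r):
    """Estimate Naive Bayes statistics from a Boolean dataset.

    items: list of 0/1 vectors of length m + r (m attributes, then r tags).
    Returns, per tag j: priors[j] = Pr(Tj), cond[j][i] = Pr(Ai=1 | Tj),
    and cond_not[j][i] = Pr(Ai=1 | not Tj), Laplace-smoothed.
    """
    n = len(items)
    priors, cond, cond_not = [], [], []
    for j in range(r):
        pos = [o for o in items if o[m + j] == 1]   # items carrying tag Tj
        neg = [o for o in items if o[m + j] == 0]   # items without tag Tj
        priors.append(len(pos) / n)
        cond.append([(sum(o[i] for o in pos) + 1) / (len(pos) + 2)
                     for i in range(m)])
        cond_not.append([(sum(o[i] for o in neg) + 1) / (len(neg) + 2)
                         for i in range(m)])
    return priors, cond, cond_not
```

With these tables, Pr(Tag | Attributes) for any candidate item follows from the conditional-independence assumption.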
Derived Results
The probability that a new item o is annotated by tag Tj (writing Rj for the Naive Bayes odds ratio, introduced for convenience):

  Rj = [Pr(Tj) · ∏i Pr(o.Ai | Tj)] / [Pr(T̄j) · ∏i Pr(o.Ai | T̄j)]
  Pr(Tj | o) = Rj / (1 + Rj)

The probability Pr(T̄j | o) of item o not having tag Tj:

  Pr(T̄j | o) = 1 − Pr(Tj | o)

Given the desirable tags Td = {T1, ..., Tz} ⊆ T, the expected number of desirable tags a new item o is annotated with:

  E(o) = Σ_{j=1..z} Pr(Tj | o)
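These formulas can be evaluated directly; a minimal sketch (the probability values below are made-up toy numbers, not from any real dataset):

```python
def tag_prob(o, prior, p_pos, p_neg):
    """Pr(Tj | o) for a Boolean item o via the odds ratio Rj.

    prior = Pr(Tj); p_pos[i] = Pr(Ai=1 | Tj); p_neg[i] = Pr(Ai=1 | not Tj).
    """
    num, den = prior, 1.0 - prior
    for i, v in enumerate(o):
        num *= p_pos[i] if v else 1.0 - p_pos[i]
        den *= p_neg[i] if v else 1.0 - p_neg[i]
    r = num / den                     # the odds ratio Rj
    return r / (1.0 + r)              # Pr(Tj | o) = Rj / (1 + Rj)

def expected_desirable_tags(o, tag_params):
    """E(o) = sum over desirable tags Tj of Pr(Tj | o)."""
    return sum(tag_prob(o, *params) for params in tag_params)

# Toy parameters for a single desirable tag (hypothetical numbers):
params = [(0.5, [0.75, 0.5], [0.25, 0.5])]
```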
Exact Algorithm
• Naïve brute force: consider all 2^m possible products and compute the expected score for each possible product. Exponential complexity.
• Exact two-tier top-k (ETT): applies rank-join and the TA top-k algorithm in a two-tier architecture; does not need to compute all possible products.
• ETT performs significantly better than naïve brute force. It works well for moderate data instances but does not scale to larger data, and in the worst case may have exponential running time.
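The brute-force baseline is essentially a one-liner over all 2^m bit vectors; this sketch uses a pluggable score function standing in for the NBC expected-tag score:

```python
import heapq
from itertools import product

def brute_force_top_k(m, k, score):
    """Enumerate all 2^m candidate products (exponential!) and return
    the k with the highest expected number of desirable tags."""
    return heapq.nlargest(k, product((0, 1), repeat=m), key=score)
```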
ETT: Two-Tier Architecture
z desirable tags; the m attributes are split into l groups of size m′ = m / l.
Tier 1: determine the "best" item for each tag (T1, T2, ..., Tz).
Tier 2: match these items to compute the global best product across all tags.
ETT Algorithm (Example)
• Database: attributes {A1, A2, A3, A4}, tags {T1, T2}; a top-1 query.
• Partition the attributes into 2 groups, {A1, A2} and {A3, A4}, to form 2 lists of partial products per tag.
• Each list has 2^m′ = 2^2 = 4 entries (partial products).
• Run the NBC to compute a score for each partial product under each tag, and sort each list in descending order.
Tier-1 sorted lists of partial products (value, score), one pair of lists per tag:

  T1:  L11 (A1 A2): 10 → 1.97, 00 → 0.84, 11 → 0.84, 01 → 0.36
       L12 (A3 A4): 10 → 1.97, 00 → 0.84, 11 → 0.84, 01 → 0.36
  T2:  L21 (A1 A2): 11 → 2.76, 01 → 1.18, 10 → 1.18, 00 → 0.51
       L22 (A3 A4): 11 → 4.57, 10 → 2.53, 01 → 0.91, 00 → 0.51

Definitions:
  MPFS: maximum possible future score of a tag's rank-join
  MUS: sum of the last seen scores from all GetNext() streams
  Buffer: top-k products with their actual (complete) scores

Iteration 1:
  GetNext(T1) = 1010 (rank-join partial score 0.95, MPFS 0.95)
  GetNext(T2) = 1111 (rank-join partial score 0.93, MPFS 0.93)
  Tier-2 buffer (complete scores): 1111 → 1.75, 1010 → 1.70
  MinK (1.75) <= MUS (1.88), so return to tier 1.
Iteration 2 (tier-1 lists unchanged):
  GetNext(T1) = 1011 (partial score 0.92, MPFS 0.92)
  GetNext(T2) = 1110 (partial score 0.88, MPFS 0.88)
  Tier-2 buffer: 1110 → 1.77, 1011 → 1.76
  MinK (1.77) <= MUS (1.79), so return to tier 1.
Iteration 3 (tier-1 lists unchanged):
  GetNext(T1) = 0010 (partial score 0.89, MPFS 0.89)
  GetNext(T2) = 0111 (partial score 0.84, MPFS 0.84)
  Tier-2 buffer: 0111 → 1.77, 0010 → 1.76
  MinK (1.77) >= MUS (1.74): ETT terminates.
Thus, ETT returns the best item (0111 or 1110) after just 6 item look-ups.
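The tier-2 threshold loop can be sketched as follows. Here the tier-1 rank-joins are replaced by pre-materialized streams carrying the example's numbers, and complete scores come from a lookup table rather than the NBC; both are simplifications for illustration (k = 1):

```python
def ett_top1(streams, complete_score):
    """Tier-2 loop of ETT for k = 1.

    streams[j]: (product, partial_score) pairs in descending score order,
    standing in for tag Tj's tier-1 rank-join GetNext() stream.
    complete_score(p): the exact expected-desirable-tags score of p.
    Assumes termination before the streams exhaust, as in the example.
    """
    its = [iter(s) for s in streams]
    best = (float("-inf"), None)
    while True:
        mus = 0.0
        for it in its:
            p, sj = next(it)              # one GetNext() per tag stream
            mus += sj                     # MUS: sum of last seen scores
            c = complete_score(p)
            if c > best[0]:
                best = (c, p)             # update the top-1 buffer
        if best[0] >= mus:                # MinK >= MUS: safe to terminate
            return best

# The slide's example: two tags, streams and complete scores as shown above.
streams = [
    [("1010", 0.95), ("1011", 0.92), ("0010", 0.89)],   # T1 rank-join
    [("1111", 0.93), ("1110", 0.88), ("0111", 0.84)],   # T2 rank-join
]
scores = {"1010": 1.70, "1111": 1.75, "1011": 1.76,
          "1110": 1.77, "0010": 1.76, "0111": 1.77}
```

Running `ett_top1(streams, scores.get)` terminates in the third iteration, matching the 6 item look-ups above.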
Approximation Algorithm
• Partition the z desirable tags into z/z′ subgroups of z′ tags each ({T1, T2, ..., Tz′}, {T3, T4, ..., Tz′}, ...).
• For each subgroup, compute its top-k items: O11, O12, ..., O1k for the first subgroup, O21, O22, ..., O2k for the second, and so on.
• Combine these per-subgroup results into the overall top-k items O1, O2, ..., Ok.
Each subgroup is solved using a PTAS in polynomial time, defined for approximation factor ε = 2σm, where σ is the compression factor.
PTAS Algorithm Design
Consider a single subgroup with z = z′ desirable tags (T1, T2, ..., Tz′), top-k with k = 1, and ε > 0.
Let Oa be the item returned by the PTAS and Og the optimal item.
The PTAS must run in polynomial time and satisfy the invariant:

  ExactScore(Oa) >= (1 − ε) · ExactScore(Og)
PTAS Algorithm Design
A simple exponential-time exact top-1 algorithm for the sub-problem is constructed first, and then reduced to a PTAS. Given m Boolean attributes and z′ tags, the exponential-time algorithm makes m iterations:
• Initial step: produce the set S0 consisting of the single item {0^m}, along with its z′ scores, one for each tag.
• First iteration: produce the two-item set S1 = {0^m, 10^(m−1)}, each item accompanied by its z′ scores.
• i-th iteration: produce the set Si = {0,1}^i × {0^(m−i)}, along with the z′ scores of each item.
• The final set Sm contains all 2^m items with their exact scores, from which the top-1 item is returned.
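The set-growing loop above can be sketched directly (the score function is a stand-in for the per-tag NBC scores summed over the subgroup's tags):

```python
def exact_top1(m, score):
    """Exponential-time exact top-1: grow Si = {0,1}^i x {0^(m-i)} one
    attribute at a time; the final set Sm holds all 2^m items."""
    S = {(0,) * m}                                    # S0 = {0^m}
    for i in range(m):                                # i-th iteration
        S |= {o[:i] + (1,) + o[i + 1:] for o in S}    # free up bit i
    return max(S, key=score)                          # scan Sm for the top-1
```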
PTAS Algorithm Design
Consider the earlier example table with z = z′ = 2, σ = 0.5, m = 4, so ε = 2σm = 4:
  Og = {1110}, score 1.77 = 0.89 + 0.88
  Oa = {1111}, score 1.75 = 0.82 + 0.93
PTAS Algorithm Design
Compression step: items are clustered by score and only one representative per cluster is kept; the cluster's surviving item's exact score must be close to any deleted item's exact score.
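One plausible reading of that compression step, sketched with a single aggregate score instead of the paper's per-tag score vectors (the exact bucketing scheme here is an assumption):

```python
import math

def compress(items, score, sigma):
    """Keep one representative per multiplicative score bucket: every
    deleted item has a survivor whose score is within a (1 + sigma)
    factor, so the error introduced per compression stays bounded."""
    reps = {}
    for o in items:
        s = score(o)
        key = math.floor(math.log(s, 1.0 + sigma)) if s > 0 else -1
        reps.setdefault(key, o)     # first item seen represents its bucket
    return list(reps.values())
```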
Experiment
Synthetic and real datasets are used for quantitative and qualitative analysis of the proposed algorithms.
Quantitative performance indicators:
• Efficiency of the proposed exact and approximation algorithms.
• The approximation factor actually achieved by the approximation algorithm.
Qualitative evaluation: an Amazon Mechanical Turk user study to assess the results of the algorithms.
Experiment
Real camera dataset: a crawled dataset of 100 cameras listed on Amazon. The listed cameras contain technical details (attributes) and the tags customers associate with each camera. The tags are sanitized to remove synonyms and unintelligible or undesirable tags such as "Nikon coolpix", "quali", "bad", etc.
Synthetic dataset: a Boolean matrix of dimension 10,000 (items) × 100 (50 attributes + 50 tags).
• The 50 independently distributed attributes fall into 4 groups, where a value is set to 1 with probability 0.75, 0.15, 0.10, or 0.05 respectively.
• The 50 tags follow predefined relations, built by randomly picking a set of correlated attributes for each tag.
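The synthetic dataset can be generated along these lines; the exact group sizes and the tag-attribute correlation scheme are not fully specified above, so both (roughly equal groups, tag = AND of its picked attributes) are assumptions:

```python
import random

def make_synthetic(n=10_000, seed=0):
    """Boolean matrix of n items x (50 attributes + 50 tags)."""
    rng = random.Random(seed)
    # 4 groups of attributes with Pr(value = 1) = 0.75, 0.15, 0.10, 0.05
    probs = [0.75] * 13 + [0.15] * 13 + [0.10] * 12 + [0.05] * 12
    # each tag correlated with a small random set of attributes
    deps = [rng.sample(range(50), rng.randint(1, 3)) for _ in range(50)]
    rows = []
    for _ in range(n):
        attrs = [1 if rng.random() < p else 0 for p in probs]
        tags = [int(all(attrs[i] for i in d)) for d in deps]
        rows.append(attrs + tags)
    return rows
```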
Quantitative: Performance
Exact algorithm: synthetic dataset with 1,000 items, 16 attributes, and 8 tags (naïve brute force vs. ETT).
Quantitative: Performance
ETT becomes extremely slow beyond m = 16 attributes, while the approximation algorithm (PA) with approximation factor 0.5 continues to return guaranteed results in reasonable time as the number of attributes m increases.
Quantitative: Performance
Execution time and obtained approximation factor on the synthetic dataset (1,000 items, 20 attributes, 8 tags); the top-1 item is considered.
Qualitative: User Study
First part: the PA algorithm with approximation factor 0.5 was run on tag sets corresponding to compact cameras and SLR cameras respectively. Four new cameras were built (2 digital compact and 2 digital SLR) with PA (ε = 0.5) and pitted against 4 existing popular cameras. 65% of users chose the new cameras.
Qualitative: User Study
Second part: 6 new cameras were designed for three user groups (1. young students, 2. old/retired, 3. professional photographers), with 2 potential new cameras per group. When users were asked to assign at least five tags each, the majority correctly classified the six cameras into the three groups.
Conclusion
• Defined the Tag Maximization problem and investigated its computational complexity.
• Proposed 2 novel algorithms and showed their practicability.
• This work is a preliminary look at a very novel area of research and promises exciting directions for future research.
• Future work: use decision tree, SVM, and regression tree classifiers and repeat the experiments.
Questions?
Thank You