EPG content recommendation in large scale: a case study on interactive TV platform D. Zibriczky, Z. Petres, M. Waszlavik, D. Tikk ICMLA 2013 - Machine Learning with Multimedia Data 7th December 2013. Miami. United States

Page 1: EPG content recommendation in large scale: a case study on interactive TV platform

EPG content recommendation in large scale: a case study on interactive TV platform

D. Zibriczky, Z. Petres, M. Waszlavik, D. Tikk

ICMLA 2013 - Machine Learning with Multimedia Data

7th December 2013. Miami. United States

Page 2: EPG content recommendation in large scale: a case study on interactive TV platform

Outline

• Introduction

• Problem

• Solution

• Offline results

• Online results

• Conclusion


Page 3: EPG content recommendation in large scale: a case study on interactive TV platform

Introduction / Consumption trends


Page 4: EPG content recommendation in large scale: a case study on interactive TV platform

Introduction / Electronic Program Guide


Page 5: EPG content recommendation in large scale: a case study on interactive TV platform

Introduction / Goal

• SaskTel

• Finding relevant content with minimal effort

• Time-shifting

• Multiple devices per household

• Graphical User Interface

• Increasing content consumption / watching length

• Increasing click-through rate (CTR) using Gravity’s GUI

Page 6: EPG content recommendation in large scale: a case study on interactive TV platform

Problem / Recommendation concept

• User: Device
   o Users cannot be distinguished explicitly
   o More than one device per household

• Item: Scheduled content (time, program id, channel id)
   o Typically series or programs without episodes
   o Metadata: Information about the items

• Event: Implicit feedback based on the remote controller / set-top box
   o Switching channel, set to record, rewind, replay, stop, pause
   o Next schedule, watching duration

• Recommendable items
   o Set of series or programs that are broadcast at the moment of the recommendation request or later ("on now" and "on later" scenarios)

• Recommendation
   o Sorting recommendable items by prediction values
   o Other recommendation logic (randomization, mixing, etc.)
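The recommendation concept above can be illustrated with a minimal sketch: select the items that are on now or start soon, then sort them by prediction value. The `ScheduledItem` type, the minute-based time unit, the `horizon` parameter and the `scores` dictionary are all assumptions made for this illustration, not part of the platform described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledItem:
    program_id: str
    channel_id: str
    start: int  # schedule start in minutes since midnight (assumed unit)
    end: int    # schedule end in minutes since midnight (assumed unit)


def recommendable(schedule, now, horizon):
    """Items on air at `now` ("on now") or starting within `horizon` minutes ("on later")."""
    return [s for s in schedule if s.end > now and s.start <= now + horizon]


def recommend(schedule, scores, now, horizon=120, top_n=15):
    """Sort recommendable items by prediction value, highest first."""
    candidates = recommendable(schedule, now, horizon)
    ranked = sorted(candidates, key=lambda s: scores.get(s.program_id, 0.0), reverse=True)
    return ranked[:top_n]


# Hypothetical schedule and prediction values.
schedule = [
    ScheduledItem("news", "c1", 500, 560),      # already finished at 600
    ScheduledItem("simpsons", "c2", 570, 630),  # on now
    ScheduledItem("futurama", "c3", 660, 690),  # on later
    ScheduledItem("late_show", "c4", 900, 960), # outside the horizon
]
scores = {"simpsons": 0.4, "futurama": 0.9}
top = [s.program_id for s in recommend(schedule, scores, now=600)]
```

Additional logic such as randomization or mixing would be applied to the ranked list after this step.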


Page 7: EPG content recommendation in large scale: a case study on interactive TV platform

Problem / Difficulties

• Implicit feedback only (no explicit data)

• Huge but noisy data set (zapping, leave-on, irrelevant events, …)

• Cold start problem (new items, short lifetime)

• Small recommendable set at a time

• Context dependency (time, multiple users per household)

• Difference between offline and online optimization


Page 8: EPG content recommendation in large scale: a case study on interactive TV platform

Solution / Baselines

• Most popular channels

• Most popular contents (series or programs)

• Users’ favourite channels

• Users’ favourite contents (series or programs)
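The popularity and favourite baselines can be sketched as simple counting over (user, item) events, where an item is either a channel id or a series/program id. The event format and function names below are assumptions for illustration; the slides do not specify how events are counted or weighted.

```python
from collections import Counter, defaultdict


def most_popular(events, top_n=15):
    """Globally most frequent items across all users."""
    counts = Counter(item for _user, item in events)
    return [item for item, _count in counts.most_common(top_n)]


def user_favourites(events, top_n=15):
    """Most frequent items per user."""
    per_user = defaultdict(Counter)
    for user, item in events:
        per_user[user][item] += 1
    return {user: [item for item, _count in counts.most_common(top_n)]
            for user, counts in per_user.items()}


# Hypothetical (user, channel) events.
events = [
    ("u1", "sports1"), ("u1", "sports1"), ("u1", "news1"),
    ("u2", "news1"), ("u2", "news1"), ("u2", "news1"),
]
popular = most_popular(events, top_n=1)
favourites = user_favourites(events, top_n=1)
```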

Enter date in master8 ICMLA 2013 - Machine Learning with Multimedia Data. 2013. Miami. United States

Page 9: EPG content recommendation in large scale: a case study on interactive TV platform

Solution / Content-based filtering

• Cosine Similarity

• User model: Weighted average of meta vectors

• Prediction: Cosine similarity of vectors

• Improvement: TF-IDF (term frequency-inverse document frequency) based weighting
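A minimal sketch of this content-based approach, using the metadata of the three example shows from the slide: the user model is a weighted average of item metadata vectors and the prediction is the cosine similarity between the user model and an item vector. The smoothed IDF formula (`log(n / df) + 1`) and the watch weights are assumptions; the slides do not state the exact weighting scheme.

```python
import math
from collections import Counter, defaultdict

ITEM_META = {
    "The Simpsons": {"genre:animation", "genre:comedy",
                     "director:Matt Groening", "actor:Dan Castellaneta"},
    "How I Met Your Mother": {"genre:comedy", "director:Carter Bays"},
    "Futurama": {"genre:animation", "genre:comedy",
                 "director:Matt Groening", "actor:Billy West"},
}


def idf(item_meta):
    """Smoothed inverse document frequency per metadata term (assumed variant)."""
    n = len(item_meta)
    df = Counter(term for terms in item_meta.values() for term in terms)
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def item_vector(terms, weights):
    return {term: weights[term] for term in terms}


def user_model(watched, item_meta, weights):
    """Weighted average of the metadata vectors of the watched items."""
    total = sum(watched.values())
    model = defaultdict(float)
    for item, weight in watched.items():
        for term, value in item_vector(item_meta[item], weights).items():
            model[term] += weight * value / total
    return dict(model)


def cosine(a, b):
    num = sum(value * b.get(term, 0.0) for term, value in a.items())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0


weights = idf(ITEM_META)
model = user_model({"The Simpsons": 1.0}, ITEM_META, weights)  # hypothetical watch weight
scores = {item: cosine(model, item_vector(terms, weights))
          for item, terms in ITEM_META.items()}
```

Because the model depends only on metadata, it can score items that have no events yet, which is consistent with the CosineSim results for new items reported later in the deck.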


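Metadata matrix M (rows: metadata terms, columns: items):

| M | The Simpsons | How I Met Your Mother | Futurama | … |
| --- | --- | --- | --- | --- |
| Genre = Animation | 1 | 0 | 1 | … |
| Genre = Comedy | 1 | 1 | 1 | … |
| … | … | … | … | … |
| Director = Matt Groening | 1 | 0 | 1 | … |
| Director = Carter Bays | 0 | 1 | 0 | … |
| Actor = Dan Castellaneta | 1 | 0 | 0 | … |
| Actor = Billy West | 0 | 0 | 1 | … |
| … | … | … | … | … |

User 1 model vector, as listed on the slide: 0.53, 0.81, 0.18, 0.00, 0.18, 0.00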

Page 10: EPG content recommendation in large scale: a case study on interactive TV platform

Solution / Collaborative Filtering

• Matrix Factorization

• User model: User factors

• Prediction: Dot product of latent factors

• Solver: Alternating Least Squares with Coordinate Descent (IALS1)
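The following is a minimal sketch of implicit-feedback matrix factorization with coordinate-wise updates, where the prediction is the dot product of user and item factors. It is a plain dense coordinate descent on a confidence-weighted squared loss and is not the IALS1 solver itself; the confidence weighting `1 + alpha * r`, the hyperparameters and the toy matrix `R` are assumptions for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_loss(R, P, Q, alpha, lam):
    """Confidence-weighted squared error plus L2 regularization."""
    err = 0.0
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            err += (1.0 + alpha * r) * (r - dot(P[u], Q[i])) ** 2
    reg = sum(x * x for vec in P + Q for x in vec)
    return err + lam * reg


def sweep(get, A, B, alpha, lam):
    """Update every coordinate of A once while B stays fixed.

    get(a, b) returns the feedback value for row a of A and row b of B.
    """
    for a, vec in enumerate(A):
        for k in range(len(vec)):
            num, den = 0.0, lam
            for b, other in enumerate(B):
                r = get(a, b)
                c = 1.0 + alpha * r
                # Residual with the contribution of coordinate k removed.
                residual = r - dot(vec, other) + vec[k] * other[k]
                num += c * other[k] * residual
                den += c * other[k] ** 2
            vec[k] = num / den


def factorize(R, n_factors=2, alpha=10.0, lam=0.1, sweeps=30, seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in R]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in R[0]]
    for _ in range(sweeps):
        sweep(lambda u, i: R[u][i], P, Q, alpha, lam)  # user factors
        sweep(lambda i, u: R[u][i], Q, P, alpha, lam)  # item factors
    return P, Q


# Toy user x item matrix (1 = observed implicit event, 0 = no event).
R = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
P, Q = factorize(R)
u2_i3 = dot(P[1], Q[2])  # prediction for the second user and the third item
```

Each coordinate update is the exact minimizer of the loss for that single factor value, so the loss never increases from one update to the next.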


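Matrix R (rows: users, columns: items); the empty cell for User 2 and Futurama is predicted as u2*i3:

| R | The Simpsons | How I Met Your Mother | Futurama | … |
| --- | --- | --- | --- | --- |
| User 1 | 1 | | | … |
| User 2 | 1 | 1 | u2*i3 | … |
| User 3 | | 1 | | … |
| … | … | … | … | … |

User factors (one row per user):
u11 u12
u21 u22
u31 u32
… …

Item factors (one column per item):
i11 i21 i31 …
i12 i22 i32 …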

Page 11: EPG content recommendation in large scale: a case study on interactive TV platform

Solution / Hybrid filtering


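• Hybrid IALS1

Extended matrix R* (user rows followed by metadata rows, columns: items):

| R* | The Simpsons | How I Met Your Mother | Futurama | … |
| --- | --- | --- | --- | --- |
| User 1 | 1 | 0 | 0 | … |
| User 2 | 1 | 1 | 0 | … |
| User 3 | 0 | 1 | 0 | … |
| … | … | … | … | … |
| Genre = Animation | 1 | 0 | 1 | … |
| Genre = Comedy | 1 | 1 | 1 | … |
| … | … | … | … | … |
| Director = Matt Groening | 1 | 0 | 1 | … |
| Director = Carter Bays | 0 | 1 | 0 | … |
| Actor = Dan Castellaneta | 1 | 0 | 0 | … |
| Actor = Billy West | 0 | 0 | 1 | … |
| … | … | … | … | … |

User factors (one row per user):
u11 u12
u21 u22
u31 u32
… …

Factors of the metadata rows (pu, one row per metadata term):
pu11 pu12
pu21 pu22
… …

Item factors (one column per item):
i11 i21 i31 …
i12 i22 i32 …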
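One way to read the hybrid figure is that the metadata rows are appended to the user x item matrix, so that each metadata term behaves like an additional row that gets its own factor vector while sharing the item factors with the real users. The sketch below only builds such an extended matrix from the example values on the slides; this reading and the function name are assumptions, and the factorization step itself is left to whatever solver is used.

```python
def stack_hybrid(R, M):
    """Append metadata rows M below the user rows R (both indexed by the same items)."""
    n_items = len(R[0])
    if any(len(row) != n_items for row in R + M):
        raise ValueError("all rows must have one column per item")
    return [list(row) for row in R] + [list(row) for row in M]


# Columns: The Simpsons, How I Met Your Mother, Futurama
R = [
    [1, 0, 0],  # User 1
    [1, 1, 0],  # User 2
    [0, 1, 0],  # User 3
]
M = [
    [1, 0, 1],  # Genre = Animation
    [1, 1, 1],  # Genre = Comedy
    [1, 0, 1],  # Director = Matt Groening
    [0, 1, 0],  # Director = Carter Bays
    [1, 0, 0],  # Actor = Dan Castellaneta
    [0, 0, 1],  # Actor = Billy West
]
R_star = stack_hybrid(R, M)
```

Under this reading, an item without user events still has non-empty metadata rows, so its item factors are not left untrained.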

Page 12: EPG content recommendation in large scale: a case study on interactive TV platform

Solution / Channel recommendation

• Tensor factorization (ITALS1)

• Prediction: Sum of the Hadamard (element-wise) product of the user, item and time-period (TP) latent factors

• Improvement: Watching duration based weighting
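The prediction step of such a three-way model can be sketched as follows: the element-wise product of the user, channel and time-period factor vectors is summed over the latent dimensions. The three time periods follow the slices shown on the slide (4:00-12:00, 12:00-20:00, 20:00-4:00); the function names and the factor values are hypothetical.

```python
def time_period(hour):
    """Map an hour of day (0-23) to one of the three time periods on the slide."""
    if 4 <= hour < 12:
        return 0  # 4:00-12:00
    if 12 <= hour < 20:
        return 1  # 12:00-20:00
    return 2      # 20:00-4:00


def predict(user_f, item_f, time_f):
    """Sum of the element-wise product of the three factor vectors."""
    return sum(u * i * t for u, i, t in zip(user_f, item_f, time_f))


# Hypothetical factor vectors with two latent dimensions.
user_factors = {"User 3": [0.5, 1.0]}
channel_factors = {"Sports 2": [2.0, 0.5]}
tp_factors = [[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]]

score = predict(user_factors["User 3"],
                channel_factors["Sports 2"],
                tp_factors[time_period(22)])
```

With the time-period factors fixed to all ones, the expression reduces to the ordinary dot product used in matrix factorization.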


Figure: The user × channel matrix R is split into three time-period slices: R (4:00-12:00), R (12:00-20:00) and R (20:00-4:00). Rows are users (User 1, User 2, User 3, …) and columns are channels (Sports 1, Sports 2, News 1, …); observed cells contain 1. The empty cell for User 3, the second channel and the third time period is predicted as u3∘i2∘t3.

User factors (one row per user):
u11 u12
u21 u22
u31 u32
… …

Item factors (one column per item):
i11 i21 i31 …
i12 i22 i32 …

TP factors (one column per time period):
t11 t21 t31
t12 t22 t32

Page 13: EPG content recommendation in large scale: a case study on interactive TV platform

Solution / Item grouping


Page 14: EPG content recommendation in large scale: a case study on interactive TV platform

Solution / Preprocessing

Data set size after each preprocessing step:

• Original set: 308M

• Step 1, event type based filtering by significance: 82M

• Step 2, filtering by leave-on and short duration: 23M

• Step 3, splitting by time: train set 22M, test set 676K
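The three preprocessing steps can be sketched as a small filtering pipeline. The `Event` fields, the set of significant event types and the duration thresholds are hypothetical; the slides name the steps but do not give the concrete rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    item: str
    kind: str       # event type, e.g. "watch" or "pause"
    timestamp: int  # seconds since some epoch
    duration: int   # watching duration in seconds


SIGNIFICANT_KINDS = {"watch", "record"}  # assumed set of significant event types
MIN_DURATION = 60                        # assumed lower bound against zapping
MAX_DURATION = 4 * 3600                  # assumed upper bound against leave-on


def preprocess(events, split_time):
    # Step 1: event type based filtering by significance.
    kept = [e for e in events if e.kind in SIGNIFICANT_KINDS]
    # Step 2: drop very short events and suspected leave-on events.
    kept = [e for e in kept if MIN_DURATION <= e.duration <= MAX_DURATION]
    # Step 3: split by time, so that the test set follows the train set.
    train = [e for e in kept if e.timestamp < split_time]
    test = [e for e in kept if e.timestamp >= split_time]
    return train, test


events = [
    Event("u1", "simpsons", "watch", 100, 1200),
    Event("u1", "news", "watch", 200, 5),         # zapping
    Event("u2", "sports", "watch", 300, 30000),   # leave-on
    Event("u2", "futurama", "pause", 400, 600),   # not significant
    Event("u3", "futurama", "watch", 900, 1500),
]
train, test = preprocess(events, split_time=500)
```

A time-based split keeps the evaluation closer to the online setting than a random split, because items first seen in the test period remain unseen during training.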

Page 15: EPG content recommendation in large scale: a case study on interactive TV platform

Offline results / Measurement

• Metrics:
   o Recall@N
   o Mean Reciprocal Rank (MRR)

• Item splits:
   o Having events in the training set or not: old items, new items
   o Popularity 20-80 split: popular items, tail items
   o Episode of a series or not: series, non-series
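For reference, the two offline metrics can be computed as in the sketch below, assuming one relevant item per test event and one ranked recommendation list per test event. This per-event formulation is an assumption; the slides do not spell out how the metrics are averaged.

```python
def recall_at_n(ranked_lists, relevant, n):
    """Share of test events whose relevant item appears in the top n."""
    hits = sum(1 for ranked, target in zip(ranked_lists, relevant)
               if target in ranked[:n])
    return hits / len(relevant)


def mean_reciprocal_rank(ranked_lists, relevant):
    """Average of 1 / rank of the relevant item (0 if it is not in the list)."""
    total = 0.0
    for ranked, target in zip(ranked_lists, relevant):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(relevant)


# Two hypothetical test events.
ranked_lists = [["a", "b", "c"], ["b", "c", "a"]]
relevant = ["b", "a"]
recall_2 = recall_at_n(ranked_lists, relevant, n=2)
mrr = mean_reciprocal_rank(ranked_lists, relevant)
```

Recall@N only checks whether the item is in the top N, while MRR also rewards placing it near the top of the list.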

Page 16: EPG content recommendation in large scale: a case study on interactive TV platform

Offline results / Comparison

• Recall@15

| Algorithm | Type | All items | Old items | New items | Popular items | Tail items | Series | Non-series |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Most popular channels | BL | 17.45% | 17.81% | 10.59% | 19.95% | 3.90% | 18.58% | 2.91% |
| Most popular series | BL | 25.68% | 27.06% | 0.00% | 30.35% | 0.44% | 27.69% | 0.00% |
| Favourite channels | BL | 30.98% | 31.61% | 19.20% | 32.68% | 21.80% | 32.52% | 11.28% |
| Favourite series / programs | BL | 48.58% | 51.13% | 0.00% | 53.55% | 21.83% | 52.34% | 1.09% |
| CosineSim | CBF | 52.02% | 52.92% | 34.94% | 53.58% | 43.65% | 53.69% | 30.93% |
| IALS1* | CF | 46.75% | 49.26% | 0.00% | 52.30% | 16.78% | 50.28% | 1.65% |
| ITALS1** | CF | 41.68% | 42.60% | 24.48% | 44.53% | 26.26% | 43.84% | 14.06% |
| Hybrid IALS1* | HF | 51.08% | 53.82% | 6.78% | 56.46% | 22.01% | 54.95% | 1.63% |
| Blend*** | | 55.48% | 56.98% | 26.15% | 57.64% | 43.61% | 57.91% | 24.41% |
| Blend*** (MRR) | | 0.1038 | 0.1070 | 0.0405 | 0.1097 | 0.0712 | 0.1094 | 0.0322 |

Type: BL = baseline, CBF = content-based filtering, CF = collaborative filtering, HF = hybrid filtering

* Items are grouped by series ids or program ids

** Items are grouped by channel ids

*** Blend: Combination of CosineSim, IALS1, ITALS1, HybridIALS1 and favourite programs/series


Page 18: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / User Interface


Page 19: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / Measurement

• Metrics:
   o Click-through rate (CTR)
   o Watching Length Ratio (WR): The average watching length of the content that was watched for at least 1 minute by the user.
   o Completed Watching Ratio (CWR): The ratio of events in which the content was watched for at least 90% of its remaining length.

• Methods:
   o EPG-Z: Standard consumption method (EPG and channel zapping)
   o R4U: Recommended 4 U
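A rough sketch of how these online metrics could be computed from logged data. Several points are assumptions: CTR is taken as clicks per recommendation request, WR is normalized by the remaining length of the content (the results are reported as percentages), and each view is a hypothetical `(watched_seconds, remaining_seconds)` pair.

```python
def click_through_rate(requests, clicks):
    """Clicks per recommendation request (assumed definition)."""
    return clicks / requests if requests else 0.0


def watching_length_ratio(views, min_seconds=60):
    """Average watched share of the remaining length, for views of at least 1 minute."""
    ratios = [watched / remaining for watched, remaining in views
              if watched >= min_seconds and remaining > 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def completed_watching_ratio(views, threshold=0.9):
    """Share of views in which at least 90% of the remaining length was watched."""
    valid = [(watched, remaining) for watched, remaining in views if remaining > 0]
    completed = sum(1 for watched, remaining in valid
                    if watched / remaining >= threshold)
    return completed / len(valid) if valid else 0.0


# Hypothetical views: (watched_seconds, remaining_seconds at the start of watching).
views = [(30, 600), (300, 600), (570, 600)]
ctr = click_through_rate(requests=200, clicks=66)
wr = watching_length_ratio(views)
cwr = completed_watching_ratio(views)
```

Using the remaining length rather than the full length matters for linear TV, because a viewer who tunes in halfway through a program cannot watch the part that has already aired.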

EPG-Z vs. R4U?

Page 21: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / Clicks

Figure: Distribution of clicks by position (x-axis: list position, 1 to 15; y-axis: share of clicks, 0% to 40%)

• Users like to click on the first item.

• 80% of the clicks come from one of the top 5 positions.

• More clicks in the 15th position (2.2%) than in the 14th (1.3%).

Page 22: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / CTR

Click-through rate by different item splits

| Item split | CTR |
| --- | --- |
| All items | 33.20% |
| Non-series | 35.30% |
| Series | 33.12% |
| Tail items | 26.49% |
| Popular items | 39.05% |
| New items | 52.07% |
| Old items | 33.16% |

Page 23: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / CTR by usage

Figure: Average CTR vs. number of recommendation requests since the first use of R4U (x-axis: number of requests, logarithmic scale, 1 to 100; y-axis: average CTR, 0% to 80%)

Page 24: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / Watching behavior

WR = Watching Length Ratio, CWR = Completed Watching Ratio

| Item splits | WR (EPG-Z) | WR (R4U) | CWR (EPG-Z) | CWR (R4U) |
| --- | --- | --- | --- | --- |
| Old items | 30.02% | 42.04% | 16.02% | 31.03% |
| New items | 21.11% | 35.51% | 8.01% | 23.12% |
| Popular items | 30.81% | 44.19% | 16.30% | 32.27% |
| Long-tail items | 28.01% | 38.43% | 15.11% | 27.66% |
| Series | 31.04% | 43.00% | 16.92% | 31.51% |
| Non-series | 17.94% | 15.26% | 5.31% | 7.22% |
| All items | 29.90% | 42.02% | 15.91% | 30.53% |

Page 25: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / Watching behavior

• Content selected via R4U is watched about 40% longer (WR 42.02% vs. 29.90%) and is almost twice as likely to be watched to completion (CWR 30.53% vs. 15.91%) as content selected in the standard way (EPG-Z).
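The two headline figures follow directly from the "All items" row of the watching behavior table; the short calculation below reproduces them. The helper name is hypothetical.

```python
def relative_increase(baseline, value):
    """Relative change of value compared to baseline (0.4 means 40% higher)."""
    return value / baseline - 1.0


# "All items" row: EPG-Z vs. R4U.
wr_epgz, wr_r4u = 29.90, 42.02
cwr_epgz, cwr_r4u = 15.91, 30.53

wr_gain = relative_increase(wr_epgz, wr_r4u)  # about 0.405, i.e. roughly 40% longer
cwr_factor = cwr_r4u / cwr_epgz               # about 1.92, i.e. almost twice
```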



Page 26: EPG content recommendation in large scale: a case study on interactive TV platform

Online results / Offline vs. Online metrics

• High correlation between Recall/MRR and Completed Watching Ratio

| Item splits | Recall@15 (offline) | MRR (offline) | CWR (online) |
| --- | --- | --- | --- |
| Old items | 56.98% | 0.1070 | 31.03% |
| New items | 26.15% | 0.0405 | 23.12% |
| Popular items | 57.64% | 0.1097 | 32.27% |
| Long-tail items | 43.61% | 0.0712 | 27.66% |
| Series | 57.91% | 0.1094 | 31.51% |
| Non-series | 24.41% | 0.0322 | 7.22% |
| All items | 55.48% | 0.1038 | 30.53% |
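The correlation claim can be checked on the seven rows of the table with a plain Pearson coefficient. This is only a rough indication: there are just seven data points and the item splits overlap. The slides do not say which correlation measure the authors used, so Pearson is an assumption here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Rows: old, new, popular, long-tail, series, non-series, all items.
recall_at_15 = [56.98, 26.15, 57.64, 43.61, 57.91, 24.41, 55.48]
mrr = [0.1070, 0.0405, 0.1097, 0.0712, 0.1094, 0.0322, 0.1038]
cwr = [31.03, 23.12, 32.27, 27.66, 31.51, 7.22, 30.53]

r_recall_cwr = pearson(recall_at_15, cwr)
r_mrr_cwr = pearson(mrr, cwr)
```

Both coefficients come out well above 0.8 on these values, which is in line with the statement on the slide.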

Page 27: EPG content recommendation in large scale: a case study on interactive TV platform

Conclusion

• Recommendation for linear TV poses specific difficulties.

• Metadata-based item modeling (CBF) is quite effective; combining it with CF gives additional improvement.

• Users prefer the first items; they do not make much effort.

• High click-through rate, especially for new items.

• R4U affects user behavior and satisfaction.

• Content selected via R4U is watched about 40% longer and is almost twice as likely to be watched to completion as content selected in the standard way.

• High correlation between the proposed offline and online metrics.
