Predicting Download Directories for Web Resources

George Valkanas, Dimitrios Gunopulos

Dept. of Informatics & Telecommunications, University of Athens, Greece

4th International Conference on Web Intelligence, Mining and Semantics, June 3, 2014

Online User Activities

Activity                 ABS Survey   StatCan Survey   Infoplease Survey
Emailing                 91%          93%              92%
General Web Browsing     87%          > 70%            83%
Online Purchases         45%          > 50%            62%
Download Content         37%          ~30%             42%

Facilitating Downloads

Save Link In Folder

Problems:
• Predefined Directories
• Blunt approach / No learning
• UI Clutter
• Tedious user management

A principled solution

Associate the navigation through the hierarchy with a cost function

One possible cost function (c.f.): Hierarchical Navigation Cost (HNC), i.e., the number of clicks needed to navigate from one directory to another

HNC(imgs/, docs/) = 2
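A minimal Python sketch of HNC, assuming that each click moves exactly one level in the tree (up to the parent or down into a child), so the cost is the number of edges on the path between the two directories through their lowest common ancestor. Under that assumption the siblings imgs/ and docs/ are 2 clicks apart, which matches the example above. The function name `hnc` and the relative-path input format are illustrative choices, not taken from the original plugin.

```python
def hnc(source: str, target: str) -> int:
    """Clicks to go from `source` to `target`, assuming one click per level
    moved up (to a parent) or down (into a child)."""
    s = [p for p in source.strip("/").split("/") if p]
    t = [p for p in target.strip("/").split("/") if p]
    # Length of the shared prefix = depth of the lowest common ancestor.
    common = 0
    for a, b in zip(s, t):
        if a != b:
            break
        common += 1
    # Climb from source to the common ancestor, then descend to target.
    return (len(s) - common) + (len(t) - common)

print(hnc("imgs/", "docs/"))          # 2
print(hnc("docs/papers/", "docs/"))   # 1
print(hnc("docs/", "docs/"))          # 0
```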

Problem Definition

Given
• The hierarchical structure
• A target directory T, where the resource will be saved

Goal
• Suggest a directory S that minimizes the cost function cf(S, T)
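Written as an optimization problem, the goal reads as follows; the candidate set $\mathcal{D}$ (all directories in the hierarchy) is notation introduced here for illustration, not the slides' own.

```latex
S^{*} = \arg\min_{S \in \mathcal{D}} \; cf(S, T)
```

With cf = HNC, the minimum is reached at S = T, since HNC(T, T) = 0; knowing T in advance would therefore make the problem trivial.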

• But if I know T, why not suggest T directly? (0 cost)

In this setting, we don’t know T until it’s too late!

Casting to a classification framework
• Directories are potential class values
• T is the true target class
• S is the output of a classification process
• Web resource properties → classification features

• Recommend S that best matches T
• Use directories from past saves as candidate classes
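The casting can be sketched in Python under a few assumptions: each past save becomes a labelled instance (its web-resource properties as features, the chosen directory as the class), and a pluggable distance function compares a new resource with past ones. `SaveRecord`, `candidate_classes`, `suggest` and the toy `same_domain` distance are hypothetical names; DiDoCtor itself is a JavaScript 1-NN classifier, and this sketch only mirrors the idea. Returning the k nearest distinct directories is one plausible way to support suggesting several directories; with k=1 it reduces to 1-NN.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

Features = Dict[str, Any]

@dataclass
class SaveRecord:
    """One past save: web-resource properties (features) and chosen directory (class)."""
    features: Features
    directory: str

def candidate_classes(history: Sequence[SaveRecord]) -> List[str]:
    """Distinct directories from past saves, in first-seen order."""
    return list(dict.fromkeys(r.directory for r in history))

def suggest(query: Features, history: Sequence[SaveRecord],
            distance: Callable[[Features, Features], float],
            k: int = 1) -> List[str]:
    """Up to k distinct directories, nearest past saves first (k=1 is 1-NN)."""
    ranked = sorted(history, key=lambda r: distance(query, r.features))
    suggestions: List[str] = []
    for record in ranked:
        if len(suggestions) >= k:
            break
        if record.directory not in suggestions:
            suggestions.append(record.directory)
    return suggestions

def same_domain(a: Features, b: Features) -> float:
    """Toy distance: 0 for the same domain, 1 otherwise."""
    return 0.0 if a["domain"] == b["domain"] else 1.0

history = [
    SaveRecord({"domain": "arxiv.org"}, "docs/papers/"),
    SaveRecord({"domain": "flickr.com"}, "imgs/"),
    SaveRecord({"domain": "arxiv.org"}, "docs/papers/"),
]
print(candidate_classes(history))                                    # ['docs/papers/', 'imgs/']
print(suggest({"domain": "flickr.com"}, history, same_domain))       # ['imgs/']
print(suggest({"domain": "flickr.com"}, history, same_domain, k=2))  # ['imgs/', 'docs/papers/']
```

The toy distance only looks at the domain; a realistic setup would combine several per-feature distances into one score.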

Features & Distances

Feature                                    Distance
Timestamp                                  Exponential decay
Domain (current / referrer)                Equality
Path, filename (current / referrer page)   Tokenize & Jaccard
Title                                      Tokenize & Jaccard
Filename                                   Tokenize & Jaccard
Extension                                  Covariance matrix
Keywords                                   Jaccard
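Several rows of the distance column can be sketched in Python as below. The half-life of the exponential decay, the tokenizer (lower-casing and splitting on non-alphanumeric characters) and the weighted-sum combination in `combined_distance` are assumptions made for illustration; the slides do not give these details. The extension covariance matrix and the keyword feature are left out because the slides do not say how they are computed.

```python
import math
import re
from typing import Any, Dict, Set

def timestamp_distance(t_query: float, t_past: float,
                       half_life: float = 7 * 86400.0) -> float:
    """Exponential decay: 0 for a save made at the same moment, tending to 1
    as the past save gets older. `half_life` (seconds) is a hypothetical knob."""
    age = abs(t_query - t_past)
    return 1.0 - math.exp(-math.log(2) * age / half_life)

def equality_distance(a: str, b: str) -> float:
    """Domain feature: 0 if the domains match (case-insensitively), else 1."""
    return 0.0 if a.lower() == b.lower() else 1.0

def tokenize(text: str) -> Set[str]:
    """Lower-case and split paths, titles and filenames on non-alphanumerics."""
    return {tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok}

def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A & B| / |A | B|; two empty token sets count as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def combined_distance(q: Dict[str, Any], p: Dict[str, Any],
                      weights: Dict[str, float]) -> float:
    """Hypothetical weighted sum of per-feature distances (missing weight = 1)."""
    parts = {
        "time": timestamp_distance(q["time"], p["time"]),
        "domain": equality_distance(q["domain"], p["domain"]),
        "title": jaccard_distance(tokenize(q["title"]), tokenize(p["title"])),
        "filename": jaccard_distance(tokenize(q["filename"]),
                                     tokenize(p["filename"])),
    }
    return sum(weights.get(name, 1.0) * d for name, d in parts.items())

print(round(jaccard_distance(tokenize("WIMS 2014 slides"),
                             tokenize("wims-2014-paper.pdf")), 2))  # 0.6
print(round(timestamp_distance(0.0, 7 * 86400.0), 2))               # 0.5
```

With the default half-life, a save exactly one week older than the query gets a timestamp distance of 0.5.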

Experimental Setup

Implement classifier as a Firefox (FF) plugin
• DiDoCtor approach
• JavaScript 1-NN classifier

6 participants
• 4-month minimum use period

Baseline
• Last-by-domain (LBD), the current browser approach
• Simulated, based on the submitted result

Metrics
• Click distance: HNC, breadcrumbs
• Classification accuracy
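The LBD baseline and the accuracy metric can be simulated by replaying a log of saves in time order. This Python sketch assumes that LBD suggests the directory last used for a download from the same domain and falls back to a hypothetical default folder (`DEFAULT_DIR`) for domains not seen before; the log format and the names `simulate_lbd` and `accuracy` are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

DEFAULT_DIR = "Downloads/"  # hypothetical fallback for domains never seen before

def simulate_lbd(log: Sequence[Tuple[str, str]]) -> List[str]:
    """Replay (domain, chosen_directory) saves and record LBD's suggestion
    made just before each save."""
    last_by_domain: Dict[str, str] = {}
    predictions: List[str] = []
    for domain, chosen in log:
        # Suggest before the user answers, then learn from the submitted result.
        predictions.append(last_by_domain.get(domain, DEFAULT_DIR))
        last_by_domain[domain] = chosen
    return predictions

def accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of saves where the suggested directory equals the target T."""
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets) if targets else 0.0

log = [("arxiv.org", "docs/papers/"), ("flickr.com", "imgs/"),
       ("arxiv.org", "docs/papers/"), ("arxiv.org", "docs/slides/")]
preds = simulate_lbd(log)
targets = [chosen for _, chosen in log]
print(preds)                     # ['Downloads/', 'Downloads/', 'docs/papers/', 'docs/papers/']
print(accuracy(preds, targets))  # 0.25
```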

Preliminary Result Analysis

Take Home Messages

1. Users have different saving behaviors

2. Users have high variability in their accesses to each directory

Click Distance - HNC

Take Home Message

Significant reduction in the number of clicks needed to reach the target directory!

Click distance gain is even higher when considering a breadcrumbs UI!
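One way to see why breadcrumbs help, under an assumed cost model: in a breadcrumbs UI every ancestor of the current directory is shown and reachable with a single click, so climbing to the common ancestor costs at most one click, while descending still costs one click per level. The Python sketch below compares that assumed breadcrumbs cost with HNC; `parts`, `up_down`, `hnc` and `breadcrumb_clicks` are illustrative names, not the study's exact metric definitions.

```python
from typing import List, Tuple

def parts(path: str) -> List[str]:
    """Split a relative directory path such as 'docs/papers/' into components."""
    return [p for p in path.strip("/").split("/") if p]

def up_down(source: str, target: str) -> Tuple[int, int]:
    """Levels to climb from source to the lowest common ancestor, and levels
    to descend from that ancestor to target."""
    s, t = parts(source), parts(target)
    common = 0
    for a, b in zip(s, t):
        if a != b:
            break
        common += 1
    return len(s) - common, len(t) - common

def hnc(source: str, target: str) -> int:
    """One click per level, up or down."""
    up, down = up_down(source, target)
    return up + down

def breadcrumb_clicks(source: str, target: str) -> int:
    """Assumed breadcrumbs model: any ancestor is one click away."""
    up, down = up_down(source, target)
    return (1 if up > 0 else 0) + down

print(hnc("docs/papers/2014/", "imgs/"))                # 4
print(breadcrumb_clicks("docs/papers/2014/", "imgs/"))  # 2
```

The gap grows with the depth of the suggested directory, since HNC pays for every level climbed while the breadcrumbs model pays at most one click.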

Running Accuracy

Take Home Message

DiDoCtor is much more accurate than the LBD baseline in predicting the download directory
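Assuming that "running accuracy" means the accuracy over all saves made so far, recomputed after each new save, it can be sketched in Python as a cumulative hit rate. The function name and inputs are illustrative.

```python
from itertools import accumulate
from typing import List, Sequence

def running_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> List[float]:
    """Fraction of correct suggestions among the first n saves, for every n."""
    hits = accumulate(int(p == t) for p, t in zip(predictions, targets))
    return [h / n for n, h in enumerate(hits, start=1)]

print(running_accuracy(["docs/", "imgs/", "imgs/", "pdf/"],
                       ["docs/", "docs/", "imgs/", "pdf/"]))
# [1.0, 0.5, 0.6666666666666666, 0.75]
```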

Basic Model Extensions

Feature reweighting: RELIEF-F

Suggesting k directories
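RELIEF-F (Kononenko's multi-class extension of Relief) rewards features that differ between an instance and its nearest neighbours from other classes, and penalises features that differ from its nearest neighbours of the same class. The slides do not describe how it is wired into DiDoCtor, so this Python sketch is an assumption: directories are the classes, per-feature distances in [0, 1] play the role of RELIEF-F's diff function, every instance is used once instead of a random sample, and `relieff` and `abs_diffs` are hypothetical names.

```python
from collections import Counter
from typing import Callable, List, Sequence

Instance = Sequence[float]

def relieff(instances: Sequence[Instance], labels: Sequence[str],
            feature_diffs: Callable[[Instance, Instance], Sequence[float]],
            n_features: int, k: int = 1) -> List[float]:
    """RELIEF-F weights; feature_diffs(a, b) returns one value in [0, 1] per feature."""
    n = len(instances)
    prior = {c: cnt / n for c, cnt in Counter(labels).items()}
    weights = [0.0] * n_features
    for i, x in enumerate(instances):
        # Rank every other instance by its total (unweighted) distance to x.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sum(feature_diffs(x, instances[j])))
        # Nearest hits (same directory) pull feature weights down.
        hits = [j for j in others if labels[j] == labels[i]][:k]
        for h in hits:
            for f, d in enumerate(feature_diffs(x, instances[h])):
                weights[f] -= d / (n * k)
        # Nearest misses from each other directory push weights up,
        # scaled by that directory's prior probability.
        for c, p in prior.items():
            if c == labels[i]:
                continue
            scale = p / (1.0 - prior[labels[i]])
            misses = [j for j in others if labels[j] == c][:k]
            for m in misses:
                for f, d in enumerate(feature_diffs(x, instances[m])):
                    weights[f] += scale * d / (n * k)
    return weights

def abs_diffs(a: Instance, b: Instance) -> List[float]:
    """Toy per-feature distance; features are assumed pre-scaled to [0, 1]."""
    return [abs(u - v) for u, v in zip(a, b)]

data = [(0.0, 0.1), (0.1, 0.9), (0.9, 0.2), (1.0, 0.8)]
dirs = ["docs/", "docs/", "imgs/", "imgs/"]
weights = relieff(data, dirs, abs_diffs, n_features=2, k=1)
print([round(w, 2) for w in weights])  # [0.8, -0.6]
```

In the toy data, feature 0 separates the two directories and feature 1 is noise, so only feature 0 receives a positive weight. Such weights could then scale the per-feature distances of a weighted 1-NN.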

Alternative classifiers

Take Home Messages
• Classifiers can help!
• DiDoCtor generally performs the best
• Accuracy is affected by user behavior!

Conclusions & Future work

• Approach for facilitating downloads
• Optimization problem & classification framework
• Experimentation with real users
• Basic model extensions

Future work
• Further exploit the temporal dimension
• More informative features (e.g., entities)
• Automatic generation of directories

Thank you!

Questions?

Acknowledgements

To the evaluators of our plugin

Heraclitus II fellowship, THALIS-GeoComp, THALIS-DISFER, Aristeia-MMD, EU project INSIGHT