MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams

Franklin Anderson de Amorim

17th International Conference on Electronic Commerce and Web Technologies - EC-Web 2016

Marco Antonio Casanova, Gisele Rabello Lopes, Bernardo Pereira Nunes

Agenda

1. Frequent Itemsets and Data Streams
2. MFI-TransSW+ algorithm
3. ClickRec Recommendation System
4. Experiments and results

Frequent Itemsets

Transactions: {bread,milk,coffee}, {bread,milk,cheese}, {bread,cheese}

Itemsets (k=2)    Support
bread, milk       2
bread, coffee     1
milk, coffee      1
bread, cheese     2
milk, cheese      1

An itemset X is frequent if and only if sup(X) ≥ N · s, where N is the number of transactions and s is a user-defined threshold called the minimum support.

With N = 3 and s = 0.5, an itemset must appear in at least 1.5 transactions, so {bread, milk} and {bread, cheese} are the frequent 2-itemsets.

If a set I of items is frequent, then so is every subset of I.
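The support computation on this slide can be sketched in a few lines of Python. This is only an illustration of the definition sup(X) ≥ N · s, not the algorithm presented later; the function name `frequent_itemsets` is an assumption made for this example.

```python
from collections import Counter
from itertools import combinations

# The three transactions from the slide.
transactions = [
    {"bread", "milk", "coffee"},
    {"bread", "milk", "cheese"},
    {"bread", "cheese"},
]
s = 0.5  # minimum support


def frequent_itemsets(transactions, k, s):
    """Return {itemset: support} for every k-itemset with support >= N * s."""
    counts = Counter()
    for t in transactions:
        # Sort the items so that each itemset has one canonical tuple form.
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    threshold = len(transactions) * s
    return {itemset: sup for itemset, sup in counts.items() if sup >= threshold}


frequent_pairs = frequent_itemsets(transactions, 2, s)
print(frequent_pairs)
```

Counting every k-subset of every transaction is fine for a toy example but grows quickly with transaction size, which is one reason to prefer the bit-vector approach described in the following slides.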

Data Streams

{a,b,c},{a,b,d}, {c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f},{a,b,c}...

Data stream

Data Stream - Sliding Windows

{a,b,c},{a,b,d},{c,d,f},{a,d},{d,e},{b,c,e},{a,f},{a,c,f},{a,b,c}

Sliding window, window size = 6
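A sliding window over this stream can be sketched with a bounded queue. The example below is a minimal illustration, assuming the window holds the 6 most recent transactions; the names `window` and `snapshots` are introduced here only for the example.

```python
from collections import deque

# The data stream from the slide, in arrival order.
stream = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d", "f"}, {"a", "d"},
    {"d", "e"}, {"b", "c", "e"}, {"a", "f"}, {"a", "c", "f"}, {"a", "b", "c"},
]
WINDOW_SIZE = 6

# A deque with maxlen drops the oldest transaction when a new one arrives.
window = deque(maxlen=WINDOW_SIZE)
snapshots = []
for transaction in stream:
    window.append(transaction)
    if len(window) == WINDOW_SIZE:
        # Record the window contents after each slide.
        snapshots.append(list(window))

print(len(snapshots), "full windows")
```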

MFI-TransSW & MFI-TransSW+

MFI-TransSW (original algorithm)

• Processes sliding windows
• Uses bit vectors, e.g. bit(x)=101001

MFI-TransSW

Phases

1. Load window
2. Slide window
3. Generate frequent itemsets

MFI-TransSW

Loading and sliding window

Data stream: T1=(acd), T2=(bce), T3=(abce), T4=(be)

window size=3

After loading T1: bit(a)=1, bit(b)=0, bit(c)=1, bit(d)=1, bit(e)=0

[Figure: the bit vectors fill up as T2 and T3 are loaded into the window]

MFI-TransSW

Loading and sliding window

Data stream: T1=(acd), T2=(bce), T3=(abce), T4=(be)

window size=3

[Figure: when the window slides, a left bit-shift is applied to bit(a), bit(b), bit(c), bit(d) and bit(e), dropping the bit of the oldest transaction]
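The loading and sliding steps can be sketched with Python integers used as bit vectors. This is an illustrative reading of the slides, not the authors' implementation: the helper `add_transaction` is hypothetical, and it uses the same shift for loading and for sliding, with the leftmost bit standing for the oldest transaction in the window.

```python
WINDOW_SIZE = 3
MASK = (1 << WINDOW_SIZE) - 1  # keeps only the WINDOW_SIZE lowest bits


def add_transaction(bits, items, transaction):
    """Left-shift every bit vector and set the rightmost bit for the new transaction."""
    for item in items:
        present = 1 if item in transaction else 0
        # The shift pushes the oldest bit out of the window; the mask discards it.
        bits[item] = ((bits.get(item, 0) << 1) & MASK) | present
    return bits


def as_strings(bits):
    return {item: format(v, f"0{WINDOW_SIZE}b") for item, v in bits.items()}


items = "abcde"
stream = ["acd", "bce", "abce", "be"]  # T1..T4 from the slide

bits = {}
for t in stream[:WINDOW_SIZE]:  # load the window with T1, T2, T3
    add_transaction(bits, items, t)
loaded = as_strings(bits)

add_transaction(bits, items, stream[3])  # slide: T1 leaves, T4 enters
slid = as_strings(bits)

print(loaded)
print(slid)
```

After loading, the vectors agree with the values shown on the mining slide (bit(a)=101, bit(b)=011, and so on). After the slide, item d no longer occurs in any transaction of the window.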

MFI-TransSW

Mining frequent itemsets

window size=3, s=0.5

bit(a)=101    freq(a)=2
bit(b)=011    freq(b)=2
bit(c)=111    freq(c)=3
bit(d)=100    freq(d)=1
bit(e)=011    freq(e)=2

MFI-TransSW

Mining frequent itemsets

window size=3, s=0.5

bit(a)=101    freq(a)=2
bit(b)=011    freq(b)=2

bitwise AND: bit(a <and> b)=001, freq(a <and> b)=1
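The mining step can be sketched as follows: the frequency of an itemset is the number of 1 bits in the bitwise AND of its items' bit vectors. The example covers only 2-itemsets and is a simplified illustration; the bit vectors are those of the window T1=(acd), T2=(bce), T3=(abce), and the names `freq` and `frequent_pairs` are assumptions for this example.

```python
from itertools import combinations

WINDOW_SIZE = 3
s = 0.5  # minimum support

# Bit vectors for the window T1=(acd), T2=(bce), T3=(abce).
bits = {"a": 0b101, "b": 0b011, "c": 0b111, "d": 0b100, "e": 0b011}


def freq(vector):
    """Number of transactions in the window whose bit is set."""
    return bin(vector).count("1")


threshold = WINDOW_SIZE * s

# Only frequent items can take part in frequent 2-itemsets.
frequent_items = sorted(i for i, v in bits.items() if freq(v) >= threshold)

frequent_pairs = {}
for x, y in combinations(frequent_items, 2):
    joint = bits[x] & bits[y]  # transactions that contain both x and y
    if freq(joint) >= threshold:
        frequent_pairs[(x, y)] = freq(joint)

print(frequent_items)
print(frequent_pairs)
```

The pair (a, b) is rejected because 101 AND 011 = 001 has a single 1 bit, which is below the threshold of 1.5.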

MFI-TransSW

• Fast
• Finds all frequent itemsets
• No false positives or false negatives
• On-demand generation of frequent itemsets
• Small memory footprint

MFI-TransSW+

Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b)

Transactions: ({a}),({b,c}),({a,b})

After the click (user-2,a) arrives, the transactions become ({a}),({a,b,c}),({a,b}).

List of UID's

Position    0         1         2
UID         user-1    user-2    user-3

Bit vectors

Position    0    1    2
bit(a)      1    0    1
bit(b)      0    1    1
bit(c)      0    1    0

After (user-2,a): bit(a) 1 1 1

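MFI-TransSW+

Clickstream: (user-1,a),(user-2,b),(user-3,a),(user-2,c),(user-3,b),(user-2,a),(user-4,b)

window size=3

List of UID's

Position    0                  1         2
UID         user-1 → user-4    user-2    user-3

Bit vectors

Position    0        1    2
bit(a)      1 → 0    1    1
bit(b)      0 → 1    1    1
bit(c)      0        1    0

List of Bit Vectors per User

[Figure: for each position 0, 1, 2 of the window, the list of bit vectors set for that user; the entries recoverable from the slide are 0,1,2 and 0,1]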
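One possible reading of this slide is sketched below: each user owns one position in a circular window, and a per-position list of items makes it cheap to clear the bits of the user who leaves. The class `ClickWindow` and all of its attribute names are hypothetical and chosen for this example; the sketch uses Python lists as bit vectors for readability and is not the authors' implementation.

```python
class ClickWindow:
    """Sliding window over a clickstream, with one position per user."""

    def __init__(self, size):
        self.size = size
        self.uids = [None] * size                     # list of UIDs, by position
        self.pos_of = {}                              # uid -> position
        self.items_at = [set() for _ in range(size)]  # items set per position
        self.bits = {}                                # item -> bit vector (list)
        self.next_pos = 0                             # circular write pointer

    def click(self, uid, item):
        pos = self.pos_of.get(uid)
        if pos is None:
            # New user: reuse the oldest position in circular order.
            pos = self.next_pos
            self.next_pos = (pos + 1) % self.size
            old = self.uids[pos]
            if old is not None:
                # Clean: clear only the bits the departing user had set.
                del self.pos_of[old]
                for it in self.items_at[pos]:
                    self.bits[it][pos] = 0
                self.items_at[pos].clear()
            self.uids[pos] = uid
            self.pos_of[uid] = pos
        # Update: set the bit of the clicked item at the user's position.
        self.bits.setdefault(item, [0] * self.size)[pos] = 1
        self.items_at[pos].add(item)


w = ClickWindow(size=3)
for uid, item in [("user-1", "a"), ("user-2", "b"), ("user-3", "a"),
                  ("user-2", "c"), ("user-3", "b"), ("user-2", "a")]:
    w.click(uid, item)
before = {item: list(v) for item, v in w.bits.items()}

w.click("user-4", "b")  # user-1 leaves the window, user-4 takes position 0
after = {item: list(v) for item, v in w.bits.items()}

print(before)
print(after)
```

No bit vector is shifted when the window slides; only the bits at the reused position change, which is the kind of saving the slide attributes to treating bit vectors as circular lists.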

MFI-TransSW+

• Processes clickstreams
• Uses bit vectors as circular lists
• More efficient "clean and update"
• Faster

ClickRec

ClickRec

A real-time news article recommendation system based on web clickstreams and semantic annotations.

ClickRec

Clickstream

1) Data Streams Processor
2) Frequent Itemsets Miner
3) Recommender

[Figure: the diagram labels two of the components with MFI-TransSW+]

ClickRec

(user-1, {a,b,c})

(user-1, {<tag1>, <tag2>, <tag3>, <tag4>})

ClickRec

(user-1, {a,b,c})

(user-1, {<neymar>, <messi>, <c.ronaldo>, <barcelona>})

ClickRec

<messi> <neymar>

<c.ronaldo> <barcelona> <messi>

TF-IDF

Frequent itemsets

<neymar> <barcelona> <messi>

<c.ronaldo> <chelsea> <messi>

<c.ronaldo> <barcelona> <robben>

<neymar> <chelsea> <robben>

Experiments

Experiments

1. Real-world clickstream from one of the largest news Web sites in Brazil

2. Total = 24 hours of clickstream = 25 million "clicks" (pageviews)

3. Two editorials: sports and entertainment

Experiments: MFI-TransSW vs MFI-TransSW+

1. Load a window with w transactions
2. Execute 10k slidings
3. Measure the time to execute step 2

MFI-TransSW vs MFI-TransSW+

[Figure: execution time (seconds) for window size = 1,000. MFI-TransSW: 41.45; MFI-TransSW+: 0.41 (100x faster)]

MFI-TransSW vs MFI-TransSW+

Window size    Times faster
1,000          102x
2,000          216x
3,000          286x
4,000          337x
5,000          413x
6,000          476x
7,000          521x
8,000          623x
9,000          666x
10,000         816x

Experiments: MFI-TransSW vs MFI-TransSW+

Execution time (seconds)

Window size    MFI-TransSW    MFI-TransSW+
1,000          41.45          0.41
2,000          136.74         0.63
3,000          272.24         0.95
4,000          395.55         1.18
5,000          533.10         1.29
6,000          761.31         1.60
7,000          996.10         1.91
8,000          1,295.16       2.08
9,000          1,484.10       2.23
10,000         1,928.76       2.36
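The speedup for each window size follows directly from the execution times reported for the two algorithms. The short calculation below recomputes it from those times; because the times are rounded to two decimals, the results can differ by a unit or so from the "times faster" values reported in the talk (for example about 101x rather than 102x for a window of 1,000).

```python
# Execution times in seconds: window size -> (MFI-TransSW, MFI-TransSW+).
times = {
    1_000: (41.45, 0.41),
    2_000: (136.74, 0.63),
    3_000: (272.24, 0.95),
    4_000: (395.55, 1.18),
    5_000: (533.10, 1.29),
    6_000: (761.31, 1.60),
    7_000: (996.10, 1.91),
    8_000: (1295.16, 2.08),
    9_000: (1484.10, 2.23),
    10_000: (1928.76, 2.36),
}

# Speedup = time of the original algorithm / time of MFI-TransSW+.
speedups = {w: original / plus for w, (original, plus) in times.items()}

for w, x in speedups.items():
    print(f"{w:>6}: {x:.0f}x")
```

The speedup grows with the window size, from roughly 100x at 1,000 transactions to roughly 800x at 10,000.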

Experiments: ClickRec

1. Divide the clickstream into pairs of consecutive hours
   A. The first hour is used to mine the frequent itemsets
   B. The second hour is used to extract a sample of 10k users (the sample users must have accessed more than one page)

2. Test recommendations
   C. Feed the first page accessed by the user to ClickRec, which recommends 10 pages to the user
   D. Verify if the user accessed one of the recommendations
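The hit-rate measurement in this protocol can be sketched as below. The function `hit_rate`, the toy recommender `toy_recs` and the page identifiers are all hypothetical and stand in for ClickRec and the real clickstream; the sketch only shows how a hit is counted, assuming a user counts as a hit when any page accessed after the first one is among the recommendations.

```python
def hit_rate(sessions, recommend, k=10):
    """Fraction of users who accessed at least one of the k recommended pages.

    sessions: {uid: [first_page, later_page, ...]}, at least two pages per user.
    recommend: callable mapping a page to a ranked list of recommended pages.
    """
    hits = 0
    for pages in sessions.values():
        first, later = pages[0], set(pages[1:])
        recommendations = recommend(first)[:k]
        if later.intersection(recommendations):
            hits += 1
    return hits / len(sessions)


# Toy stand-in for the recommender (hypothetical data).
toy_recs = {"p1": ["p2", "p3"], "p4": ["p5"]}

sessions = {
    "user-1": ["p1", "p3"],        # hit: p3 was recommended
    "user-2": ["p4", "p6"],        # miss
    "user-3": ["p1", "p2", "p7"],  # hit: p2 was recommended
    "user-4": ["p9", "p1"],        # miss: nothing recommended for p9
}

rate = hit_rate(sessions, lambda page: toy_recs.get(page, []))
print(f"hit rate = {rate:.0%}")
```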

Experiments: ClickRec

[Figure: hit rate (y-axis, 0% to 40%) for the sports editorial, for the hour pairs 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00 and 18:00 vs 19:00, covering the late night, morning, afternoon and night periods]

Experiments: ClickRec

[Figure: hit rate (y-axis, 0% to 50%) for the entertainment editorial, for the hour pairs 0:00 vs 1:00, 6:00 vs 7:00, 12:00 vs 13:00 and 18:00 vs 19:00, covering the late night, morning, afternoon and night periods]

Conclusion

Conclusion

MFI-TransSW+
• Processes clickstreams
• Uses bit vectors as circular lists
• Up to 2 orders of magnitude faster than the original algorithm (MFI-TransSW)

Conclusion

ClickRec
• Based on MFI-TransSW+
• Uses semantic annotations
• Generates recommendations in real time
• Hit rate > 20%

References

[Agrawal et al. 1994] AGRAWAL, R.; SRIKANT, R. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases, VLDB, p. 1–32, 1994.

[Chi et al. 2006] CHI, Y.; WANG, H.; YU, P. S.; MUNTZ, R. R. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10(3):265–294, 2006.

[Li et al. 2009] LI, H.-F.; LEE, S.-Y. Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Systems with Applications, 36(2):1466–1477, 2009.

Thanks!