The Summary of My Work In Graduate Grade One

Reporter: Yuanshuai Sun

E-mail: sunyuan_2008@yahoo.cn

Content

1. Recommender System
2. KNN Algorithm (CF)
3. Matrix Factorization
4. MF on Hadoop
5. Thesis Framework


1 Recommender System

A recommender system suggests items that you may be interested in but have not tried yet.

For example, if you have bought a book about machine learning, the system would give you a recommendation list with books about data mining, pattern recognition, and even some programming techniques.


But how does the system get the recommendation list?

Purchased book: Machine Learning

Recommendation list:

1. Nuclear Pattern Recognition Method and Its Application
2. Introduction to Robotics
3. Data Mining
4. Beauty of Programming
5. Artificial Intelligence


There are many ways to obtain the list. Recommender systems are usually classified into the following categories, based on how recommendations are made:

1. Content-based recommendations: The user will be recommended items similar to the ones the user preferred in the past;


2. Collaborative recommendations: The user will be recommended items that people with similar tastes and preferences liked in the past;

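[Figure: the target user is matched to the most similar user (Top 1) through co-rated items; an item that the similar user likes but the target user has not bought is recommended to the target user.]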


3. Hybrid approaches: These methods combine collaborative and content-based methods, which can help to avoid certain limitations of content-based and collaborative systems.

Different ways to combine collaborative and content-based methods into a hybrid recommender system can be classified as follows:

1). implementing collaborative and content-based methods separately and combining their predictions,

2). incorporating some content-based characteristics into a collaborative approach,

3). incorporating some collaborative characteristics into a content-based approach,

4). constructing a general unifying model that incorporates both content-based and collaborative characteristics.


2 KNN Algorithm—CF

KDD CUP 2011 website: http://kddcup.yahoo.com/index.php

Recommending Music Items based on the Yahoo! Music Dataset.

The dataset is split into two subsets:
- Train data: in the file trainIdx2.txt
- Test data: in the file testIdx2.txt

In each subset, user rating data is grouped by user. The first line for a user is formatted as:
<UserId>|<#UserRatings>\n

Each of the next <#UserRatings> lines describes a single rating by <UserId>. Rating line format:
<ItemId>\t<Score>\n

The scores are integers between 0 and 100, and are withheld from the test set. All user IDs and item IDs are consecutive integers, both starting at zero.
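The file layout described above can be read with a short parser. This is a minimal sketch, assuming the training layout only (a header line per user, then one `<ItemId>\t<Score>` line per rating); the function name `parse_ratings` and the sample text are illustrative and not part of the KDD Cup materials.

```python
from typing import Dict, Iterable, List, Tuple


def parse_ratings(lines: Iterable[str]) -> Dict[int, List[Tuple[int, int]]]:
    """Parse '<UserId>|<#UserRatings>' headers followed by '<ItemId>\\t<Score>' lines."""
    ratings: Dict[int, List[Tuple[int, int]]] = {}
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between users
        user_str, count_str = header.split("|")
        user, count = int(user_str), int(count_str)
        user_ratings: List[Tuple[int, int]] = []
        for _ in range(count):
            # each rating line belongs to the user named in the last header
            item_str, score_str = next(it).rstrip("\n").split("\t")
            user_ratings.append((int(item_str), int(score_str)))
        ratings[user] = user_ratings
    return ratings


# Hypothetical sample in the training layout: user 0 has 2 ratings, user 1 has 1.
sample = "0|2\n10\t90\n11\t0\n1|1\n10\t50\n"
ratings = parse_ratings(sample.splitlines())
```

For the real file, the same function could be fed an open file object, since it only needs an iterable of lines.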


KNN is the algorithm I used when I participated in KDD CUP 2011 with my advisor, Mrs. Lin. KNN belongs to the collaborative recommendation category.

[Figure: the target user is matched to the most similar user (Top 1) through co-rated items; the similar user's favorite song that the target user has not seen is recommended to the target user.]
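The idea in the figure (find the most similar user through co-rated items, then recommend that user's favorite unseen item) can be sketched as follows. This is an illustration with k = 1 and cosine similarity, not the competition code; the user names, item names and scores are made up.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, int]]  # user -> {item: score}


def cosine(rx: Dict[str, int], ry: Dict[str, int]) -> float:
    """Cosine similarity restricted to the items rated by both users."""
    common = set(rx) & set(ry)
    if not common:
        return 0.0
    num = sum(rx[s] * ry[s] for s in common)
    den = math.sqrt(sum(rx[s] ** 2 for s in common)) * math.sqrt(
        sum(ry[s] ** 2 for s in common)
    )
    return num / den if den else 0.0


def recommend(target: str, ratings: Ratings) -> Optional[str]:
    """Return the top-rated item of the nearest neighbour that the target has not rated."""
    others = [u for u in ratings if u != target]
    nearest = max(others, key=lambda u: cosine(ratings[target], ratings[u]))
    unseen = {s: r for s, r in ratings[nearest].items() if s not in ratings[target]}
    if not unseen:
        return None
    return max(unseen, key=unseen.get)


# Hypothetical data: bob rates the co-rated items much like alice does.
ratings = {
    "alice": {"a": 90, "b": 10},
    "bob": {"a": 80, "b": 20, "c": 100, "d": 30},
    "carol": {"a": 10, "b": 90, "e": 100},
}
suggestion = recommend("alice", ratings)
```

Here bob is alice's nearest neighbour, so bob's highest-rated item that alice has not seen is suggested.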


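User-item rating matrix (rows are users 1 to 3, columns are items 1 to 4; "?" marks a rating to be predicted):

$$
\begin{array}{c|cccc}
\text{user} \backslash \text{item} & 1 & 2 & 3 & 4 \\ \hline
1 & r_{11} & ? & r_{13} & r_{14} \\
2 & r_{21} & r_{22} & ? & r_{24} \\
3 & r_{31} & r_{32} & r_{33} & ?
\end{array}
$$

Each user is therefore a vector of ratings:

$$
\begin{aligned}
\text{user}_1 &= (r_{11},\ ?,\ r_{13},\ r_{14}) \\
\text{user}_2 &= (r_{21},\ r_{22},\ ?,\ r_{24}) \\
\text{user}_3 &= (r_{31},\ r_{32},\ r_{33},\ ?)
\end{aligned}
$$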


1. Cosine distance

2. Pearson correlation coefficient

where $S_{xy}$ is the set of all items co-rated by both users x and y.
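Both measures can be computed directly over the co-rated set. The sketch below uses the textbook definitions, which may differ from the variants used in the competition; in particular, taking the means over the co-rated items only is an assumption. The function names and the sample ratings are illustrative.

```python
import math
from typing import Dict, List


def co_rated(rx: Dict[int, float], ry: Dict[int, float]) -> List[int]:
    """S_xy: the items rated by both users."""
    return sorted(set(rx) & set(ry))


def cosine_sim(rx: Dict[int, float], ry: Dict[int, float]) -> float:
    s_xy = co_rated(rx, ry)
    num = sum(rx[s] * ry[s] for s in s_xy)
    den = math.sqrt(sum(rx[s] ** 2 for s in s_xy)) * math.sqrt(
        sum(ry[s] ** 2 for s in s_xy)
    )
    return num / den if den else 0.0


def pearson_sim(rx: Dict[int, float], ry: Dict[int, float]) -> float:
    s_xy = co_rated(rx, ry)
    if not s_xy:
        return 0.0
    # assumption: user means are taken over the co-rated items only
    mean_x = sum(rx[s] for s in s_xy) / len(s_xy)
    mean_y = sum(ry[s] for s in s_xy) / len(s_xy)
    num = sum((rx[s] - mean_x) * (ry[s] - mean_y) for s in s_xy)
    den = math.sqrt(sum((rx[s] - mean_x) ** 2 for s in s_xy)) * math.sqrt(
        sum((ry[s] - mean_y) ** 2 for s in s_xy)
    )
    return num / den if den else 0.0


# Hypothetical users: y rates the co-rated items 1 and 2 at half of x's scores.
x = {1: 80, 2: 60, 3: 100}
y = {1: 40, 2: 30, 4: 90}
cos_xy = cosine_sim(x, y)
pearson_xy = pearson_sim(x, y)
```

Because y's co-rated scores are proportional to x's, both measures come out at (numerically) 1 for this pair.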


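1. Cosine distance

[Equation not recoverable from the slide extraction. The surviving fragments show a similarity $sim_{UC}(x, y)$ built from sums over $s \in S_{xy}$ of the terms $(r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)$ and $(100 - \bar{r}_x)(100 - \bar{r}_y)$, where $S_{xy}$ is divided into two subsets $S_{xy}^1$ and $S_{xy}^2$.]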


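2. Pearson correlation coefficient

[Equation not recoverable from the slide extraction. The surviving fragments show a similarity $sim_{UP}(x, y)$ built from sums over $s \in S_{xy}$ involving $r_{x,s}$ and $r_{y,s}$, together with the term $10000\,|S_{xy}^2|$, where $S_{xy}$ is divided into two subsets $S_{xy}^1$ and $S_{xy}^2$.]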


trackData.txt - Track information formatted as:
<TrackId>|<AlbumId>|<ArtistId>|<Optional GenreId_1>|...|<Optional GenreId_k>\n

albumData.txt - Album information formatted as:
<AlbumId>|<ArtistId>|<Optional GenreId_1>|...|<Optional GenreId_k>\n

artistData.txt - Artist listing formatted as:
<ArtistId>\n

genreData.txt - Genre listing formatted as:
<GenreId>\n
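A line of trackData.txt in the layout above can be split into its fields as sketched below. The `Track` class and `parse_track` function are illustrative names; IDs are kept as strings because how a missing album or artist is encoded in the real file is not stated here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    track_id: str
    album_id: str
    artist_id: str
    genre_ids: List[str]  # zero or more optional genre IDs


def parse_track(line: str) -> Track:
    """Split '<TrackId>|<AlbumId>|<ArtistId>|<GenreId_1>|...|<GenreId_k>'."""
    fields = line.rstrip("\n").split("|")
    return Track(fields[0], fields[1], fields[2], fields[3:])


# Hypothetical lines: one track with two genres, one with none.
with_genres = parse_track("1|2|3|4|5\n")
without_genres = parse_track("7|8|9\n")
```

The album file could be handled the same way with one leading field fewer.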


[Figure: taxonomy tree of the items, with nodes a to m arranged on four levels labelled Track, Genre, Album, and Artist.]


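1. The distance between a parent node p and a child node c

$$
wt(c, p) = \left(\beta + (1 - \beta)\,\frac{\bar{E}}{E(p)}\right)\left(\frac{d(p) + 1}{d(p)}\right)^{\alpha}\,\left[IC(c) - IC(p)\right]\,T(c, p)
$$

where $IC(\cdot)$ is the comentropy.

2. Similarity between c1 and c2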


3 Matrix Factorization

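Rating matrix (users u1 to u3, items i1 to i3; "?" is an unknown rating):

      i1   i2   i3
u1     1    3    ?
u2     2    ?    ?
u3     ?    1    3

Users Feature Matrix: the row of user ui is (xi1, xi2).
Items Feature Matrix: the row of item ij is (yj1, yj2).

The known ratings give the equations used to fit U, V:

x11*y11 + x12*y12 = 1

x11*y21 + x12*y22 = 3

x21*y11 + x22*y12 = 2

x31*y21 + x32*y22 = 1

x31*y31 + x32*y32 = 3

With U, V fitted, the unknown ratings are predicted as:

x11*y31 + x12*y32 = ?

x21*y21 + x22*y22 = ?

x21*y31 + x22*y32 = ?

x31*y11 + x32*y12 = ?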


Matrix factorization (abbr. MF), just as the name suggests, decomposes a big matrix into the product of several small matrices. It is defined mathematically as follows.

We assume the target matrix $R \in \mathbb{R}^{m \times n}$ and the factor matrices $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$, where $k \ll \min(m, n)$, so that

$$
R \approx K(U, V^T)
$$


Kernel Function

The kernel function decides how to compute the prediction matrix $\tilde{R}$; that is, it is a function with the feature matrices U and V as its arguments. We can express it as follows:

$$
\tilde{r}_{i,j} = a + c \cdot K(u_i, v_j)
$$


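For the kernel $K : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$ one can use one of the following well-known kernels:

linear: $K_l(u_i, v_j) = \langle u_i, v_j \rangle$

polynomial: $K_p(u_i, v_j) = (1 + \langle u_i, v_j \rangle)^d$

RBF: $K_r(u_i, v_j) = \exp\left(-\dfrac{\|u_i - v_j\|^2}{2\sigma^2}\right)$

logistic: $K_s(u_i, v_j) = \phi_s(b_{i,j} + \langle u_i, v_j \rangle)$, with $\phi_s(x) := \dfrac{1}{1 + e^{-x}}$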
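The four kernels listed above (linear, polynomial, RBF, logistic) can be sketched in a few lines. The parameter defaults (`d=2`, `sigma=1.0`, `b=0.0`) and the helper `predict`, which applies the form a + c * K(u, v), are illustrative choices and not values taken from the slides; the bias `b` is passed in as a plain number.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def k_linear(u: Vector, v: Vector) -> float:
    return dot(u, v)


def k_polynomial(u: Vector, v: Vector, d: int = 2) -> float:
    return (1 + dot(u, v)) ** d


def k_rbf(u: Vector, v: Vector, sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def k_logistic(u: Vector, v: Vector, b: float = 0.0) -> float:
    # logistic function applied to the bias plus the inner product
    return 1.0 / (1.0 + math.exp(-(b + dot(u, v))))


def predict(u: Vector, v: Vector, kernel: Callable[[Vector, Vector], float],
            a: float = 0.0, c: float = 1.0) -> float:
    """Prediction of the form a + c * K(u, v)."""
    return a + c * kernel(u, v)


u_i = [1.0, 2.0]
v_j = [3.0, 0.5]
linear_value = k_linear(u_i, v_j)  # inner product of the two feature vectors
rescaled = predict(u_i, v_j, k_linear, a=1.0, c=2.0)
```

With the linear kernel and a = 0, c = 1, the prediction reduces to the plain inner product used in ordinary matrix factorization.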


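We quantify the quality of the approximation with the Euclidean distance, so we get the objective function as follows,

$$
\arg\min_{U,V} f = \sum_{(i,j) \in R} \left(r_{i,j} - \tilde{r}_{i,j}\right)^2
$$

where $\tilde{r}_{i,j} = u_{i*} v_{*j} = \sum_{k=1}^{K} u_{ik} v_{kj}$ is the predicted value, i.e.

$$
\arg\min_{U,V} f = \sum_{(i,j) \in R} \left(r_{ij} - u_i v_j\right)^2
$$

The approximation can also be measured with a divergence:

$$
\arg\min_{U,V} f = \sum_{(i,j) \in R} \left( r_{i,j} \log \frac{r_{i,j}}{\tilde{r}_{i,j}} - r_{i,j} + \tilde{r}_{i,j} \right)
$$

Regularization term:

$$
\lambda_u \sum_i \|u_i\|^2 + \lambda_v \sum_j \|v_j\|^2
$$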
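To make the Euclidean objective concrete, the sketch below evaluates the squared error over the observed ratings plus an L2 regularization term. It assumes V is stored as a k-by-n matrix so that the prediction is the sum over k of U[i][k] * V[k][j]; the function name, the weights `lam_u` and `lam_v`, and the sample factors are illustrative.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def squared_loss(R: Dict[Tuple[int, int], float], U: Matrix, V: Matrix,
                 lam_u: float = 0.0, lam_v: float = 0.0) -> float:
    """Sum of squared errors on observed ratings plus L2 regularization."""
    k = len(V)  # number of latent features (rows of V)
    err = 0.0
    for (i, j), r in R.items():
        pred = sum(U[i][t] * V[t][j] for t in range(k))
        err += (r - pred) ** 2
    reg_u = lam_u * sum(x * x for row in U for x in row)
    reg_v = lam_v * sum(x * x for row in V for x in row)
    return err + reg_u + reg_v


# Observed ratings as (user, item) -> rating, zero-based indices.
R = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (2, 1): 1, (2, 2): 3}
U = [[1, 0], [0, 1], [1, 1]]   # 3 users x 2 features (made-up values)
V = [[1, 2, 0], [0, 1, 3]]     # 2 features x 3 items (made-up values)
plain = squared_loss(R, U, V)
regularized = squared_loss(R, U, V, lam_u=0.5, lam_v=0.1)
```

Only the observed entries contribute to the error term, which is what distinguishes this objective from a full-matrix reconstruction error.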


1. Alternating Descent Method

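This method only works when the loss function is the Euclidean distance. We set the partial derivative with respect to $U_i$ to zero:

$$
\frac{\partial f}{\partial U_i} = -\sum_j \left[(r_{ij} - U_i V_j)\,V_j\right] + \lambda_u U_i = 0
$$

So, we can get

$$
U_i = \frac{\sum_j r_{ij} V_j}{\sum_j (V_j V_j) + \lambda_u}
$$

The same holds for $V_j$.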
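The alternating scheme can be illustrated in the rank-1 case, where every U_i and V_j is a single number and the closed-form update is exact. This is a simplified sketch, not the general rank-k solver (which would solve a small linear system per user and per item); the function name `als_rank1` and the sample ratings are made up.

```python
from typing import Dict, List, Tuple


def als_rank1(R: Dict[Tuple[int, int], float], m: int, n: int,
              lam: float = 0.1, iters: int = 20) -> Tuple[List[float], List[float]]:
    """Alternate closed-form updates of u (with v fixed) and v (with u fixed)."""
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        for i in range(m):
            # sums run over the items j that user i has rated
            num = sum(r * v[j] for (a, j), r in R.items() if a == i)
            den = sum(v[j] ** 2 for (a, j), _ in R.items() if a == i) + lam
            if den > 0:
                u[i] = num / den
        for j in range(n):
            # sums run over the users i that have rated item j
            num = sum(r * u[i] for (i, b), r in R.items() if b == j)
            den = sum(u[i] ** 2 for (i, b), _ in R.items() if b == j) + lam
            if den > 0:
                v[j] = num / den
    return u, v


# Hypothetical fully observed ratings generated by u = (1, 2), v = (3, 4).
R = {(0, 0): 3.0, (0, 1): 4.0, (1, 0): 6.0, (1, 1): 8.0}
u, v = als_rank1(R, m=2, n=2, lam=0.0, iters=5)
```

Each half-step minimizes the objective exactly in one factor while the other is held fixed, which is why no step size is needed.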


2. Gradient Descent Method

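For the regularized objective

$$
\arg\min_{U,V} f = \sum_{(i,j) \in R} \left(r_{ij} - u_i v_j\right)^2 + \lambda_u \sum_i \|u_i\|^2 + \lambda_v \sum_j \|v_j\|^2
$$

the update rule of U is defined as follows (constant factors are absorbed into the step size $\eta$),

$$
\frac{\partial f}{\partial U_i} = -\sum_j \left[(r_{ij} - U_i V_j)\,V_j\right] + \lambda_u U_i
$$

$$
U_i' = U_i - \eta \cdot \frac{\partial f}{\partial U_i}
$$

where $U_i = u_i$. The same holds for $V_j$.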


Gradient Algorithm

Stochastic Gradient Algorithm
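The difference between the two algorithms can be sketched as follows: the gradient algorithm accumulates the gradient over all observed ratings before updating, while the stochastic version updates after every single rating. This is an illustration under assumptions: V is stored with one row per item, one shared regularization weight `lam` is used, regularization is applied per sample in the stochastic version, and the step size, seeds and ratings are made up.

```python
import random


def predict(U, V, i, j):
    """Inner product of user row U[i] and item row V[j]."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def loss(R, U, V, lam):
    """Squared error on observed ratings plus L2 regularization."""
    err = sum((r - predict(U, V, i, j)) ** 2 for (i, j), r in R.items())
    reg = sum(x * x for M in (U, V) for row in M for x in row)
    return err + lam * reg


def init_factors(m, n, k, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    return U, V


def batch_step(R, U, V, eta, lam):
    """Gradient algorithm: accumulate over all ratings, then update once."""
    gU = [[lam * x for x in row] for row in U]
    gV = [[lam * x for x in row] for row in V]
    for (i, j), r in R.items():
        e = r - predict(U, V, i, j)
        for t in range(len(U[i])):
            gU[i][t] -= e * V[j][t]
            gV[j][t] -= e * U[i][t]
    for M, G in ((U, gU), (V, gV)):
        for row, grad_row in zip(M, G):
            for t in range(len(row)):
                row[t] -= eta * grad_row[t]


def sgd_epoch(R, U, V, eta, lam, rng):
    """Stochastic gradient algorithm: update after every single rating."""
    samples = list(R.items())
    rng.shuffle(samples)
    for (i, j), r in samples:
        e = r - predict(U, V, i, j)  # error computed before touching the factors
        for t in range(len(U[i])):
            u, v = U[i][t], V[j][t]
            U[i][t] += eta * (e * v - lam * u)
            V[j][t] += eta * (e * u - lam * v)


R = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (2, 1): 1, (2, 2): 3}
LAM, ETA = 0.01, 0.02

U, V = init_factors(3, 3, 2)
initial_loss = loss(R, U, V, LAM)
for _ in range(500):
    batch_step(R, U, V, ETA, LAM)
batch_loss = loss(R, U, V, LAM)

U2, V2 = init_factors(3, 3, 2)
rng = random.Random(1)
for _ in range(500):
    sgd_epoch(R, U2, V2, ETA, LAM, rng)
sgd_loss = loss(R, U2, V2, LAM)
```

The stochastic version touches only one user row and one item row per update, which is what makes it attractive for large rating sets.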


Online Algorithm

Online-Updating Regularized Kernel Matrix Factorization Models for Large-Scale Recommender Systems


4 MF on Hadoop

Loss Function

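$$
\arg\min_{U,V} f = \sum_{(i,j) \in R} \left(r_{ij} - u_i v_j\right)^2
$$

We update the factor V to reduce the objective function f with conventional gradient descent, as follows,

$$
V_{ij} \leftarrow V_{ij} + \eta_{ij}\left[(R^T U)_{ij} - (V U^T U)_{ij}\right]
$$

and the same for the factor matrix U.

Here we set

$$
\eta_{ij} = \frac{V_{ij}}{(V U^T U)_{ij}}
$$

so we arrive at

$$
V_{ij} \leftarrow V_{ij}\,\frac{(R^T U)_{ij}}{(V U^T U)_{ij}}
$$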
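With that choice of step size, the gradient step turns into a multiplicative update, which can be sketched in plain Python as below. The sketch assumes a fully observed, non-negative R with R approximated by U times the transpose of V (U is m-by-k, V is n-by-k); the small `eps` in the denominator, the helper names and the sample matrices are illustrative additions.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def frob_error(R, U, V):
    """Squared Frobenius distance between R and U V^T."""
    P = matmul(U, transpose(V))
    return sum((r - p) ** 2 for r_row, p_row in zip(R, P) for r, p in zip(r_row, p_row))


def update_V(R, U, V, eps=1e-12):
    """V_ij <- V_ij * (R^T U)_ij / (V U^T U)_ij"""
    num = matmul(transpose(R), U)
    den = matmul(V, matmul(transpose(U), U))
    return [[V[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(V[0]))]
            for i in range(len(V))]


def update_U(R, U, V, eps=1e-12):
    """Same rule for U: U_ij <- U_ij * (R V)_ij / (U V^T V)_ij"""
    num = matmul(R, V)
    den = matmul(U, matmul(transpose(V), V))
    return [[U[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(U[0]))]
            for i in range(len(U))]


# Hypothetical non-negative rating matrix and positive starting factors.
R = [[5.0, 3.0, 1.0], [4.0, 2.0, 1.0], [1.0, 1.0, 5.0]]
U = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
V = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]

initial_error = frob_error(R, U, V)
for _ in range(50):
    V = update_V(R, U, V)
    U = update_U(R, U, V)
final_error = frob_error(R, U, V)
```

Because every factor entry is only ever multiplied by a non-negative ratio, the factors stay non-negative without any projection step. Both the numerator and the denominator are matrix products, which is the part that would be distributed in a MapReduce setting.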


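$$
AB =
\begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{pmatrix}
\begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}
=
\begin{pmatrix}
\text{R\_1\_1} & \text{R\_1\_2} & \cdots & \text{R\_1\_n} \\
\text{R\_2\_1} & \text{R\_2\_2} & \cdots & \text{R\_2\_n} \\
\vdots & \vdots & & \vdots \\
\text{R\_m\_1} & \text{R\_m\_2} & \cdots & \text{R\_m\_n}
\end{pmatrix}
$$

where the block R_i_j is the product of $a_i^T$ and $b_j$.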


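[Figure: the Left Matrix and the Right Matrix are split into blocks; each pair of blocks is multiplied (×) and the partial products are added (+) to give the result.]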


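$$
AB =
\begin{pmatrix} a_1 & a_2 & \cdots & a_s \end{pmatrix}
\begin{pmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_s^T \end{pmatrix}
= \text{R\_1\_1} + \text{R\_2\_2} + \cdots + \text{R\_s\_s}
$$

where the partial product R_k_k is the product of $a_k$ and $b_k^T$.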


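$$
AB = \begin{pmatrix}
C_{11} & \cdots & C_{1N} \\
\vdots & & \vdots \\
C_{M1} & \cdots & C_{MN}
\end{pmatrix},
\quad
A = \begin{pmatrix}
A_{11} & \cdots & A_{1S} \\
\vdots & & \vdots \\
A_{M1} & \cdots & A_{MS}
\end{pmatrix},
\quad
B = \begin{pmatrix}
B_{11} & \cdots & B_{1N} \\
\vdots & & \vdots \\
B_{S1} & \cdots & B_{SN}
\end{pmatrix}
$$

where

$$
C_{ij} = \sum_{k=1}^{S} A_{ik} B_{kj} \quad (i = 1, \ldots, M;\ j = 1, \ldots, N)
$$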
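The block formula can be checked with a small single-machine sketch: split A and B into blocks, compute every C_ij as a sum of block products, and compare with the direct product. The helper names and block sizes are illustrative; in a Hadoop job each block product could be an independent task, which is an assumption about the design and is not shown here.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M, row_sizes, col_sizes):
    """Cut M into a grid of blocks with the given row and column sizes."""
    blocks, r0 = [], 0
    for rs in row_sizes:
        row_blocks, c0 = [], 0
        for cs in col_sizes:
            row_blocks.append([row[c0:c0 + cs] for row in M[r0:r0 + rs]])
            c0 += cs
        blocks.append(row_blocks)
        r0 += rs
    return blocks


def block_matmul(Ab, Bb):
    """C_ij = sum over k of A_ik * B_kj, computed block by block."""
    M, S, N = len(Ab), len(Bb), len(Bb[0])
    C = []
    for i in range(M):
        row = []
        for j in range(N):
            acc = matmul(Ab[i][0], Bb[0][j])
            for k in range(1, S):
                acc = add(acc, matmul(Ab[i][k], Bb[k][j]))
            row.append(acc)
        C.append(row)
    return C


def assemble(blocks):
    """Glue a grid of blocks back into one matrix."""
    out = []
    for row_blocks in blocks:
        for r in range(len(row_blocks[0])):
            out.append([x for blk in row_blocks for x in blk[r]])
    return out


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # 4 x 3
B = [[1, 2], [3, 4], [5, 6]]                         # 3 x 2

# The column split of A must match the row split of B (here 1 + 2).
Ab = split(A, row_sizes=[2, 2], col_sizes=[1, 2])
Bb = split(B, row_sizes=[1, 2], col_sizes=[1, 1])
C = assemble(block_matmul(Ab, Bb))
```

The only constraint on the partition is that the column cuts of A line up with the row cuts of B, so that every block product A_ik B_kj is well defined.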


5 Thesis Framework

Recommendation System

1. Introduction to recommendation system

2. My work on KNN

3. Matrix factorization in recommendation system

4. MF incremental updating using Hadoop
