
Page 1

Singular Value Decomposition and Item-Based Collaborative Filtering for Netflix Prize

Presentation by Tingda Lu at the Saturday Research meeting 10_23_10, enhanced (with audio added) by William Perrizo, Computer Science, North Dakota State University, Fargo, ND 58108 USA

Page 2

Agenda
- Singular Value Decomposition
- Item-based P-Tree CF algorithm
- Similarity measurements
- Experimental results

Recommendation System
A recommendation system:
- analyzes the customer's purchase history
- identifies the customer's preferences
- recommends the most likely purchases
- increases customer satisfaction
- leads to business success
Examples: amazon.com and Netflix

SVD
- SVD is an important factorization of a rectangular real or complex matrix, with applications in signal processing and statistics.
- SVD was proposed for Netflix by Simon Funk.
- Mathematically, SVD looks nothing like this, but over many years engineers have boiled the technique down into very simple versions (such as this one) for quick and effective use.
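For reference, the exact decomposition alluded to above is the standard one (not shown on the slide): a rectangular matrix $A$ factors as
$$ A \;=\; U \Sigma V^{T}, $$
where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with the singular values on its diagonal. The Funk-style method used here keeps only a small number of trained factors rather than computing this factorization exactly.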

Page 3

SVD
Users rate movies with user preferences about various features of the movie. Features can be anything you want them to be (or nothing! randomly constructed!). In fact, it is typical to start with a fixed number of meaningless features populated with random values, then back propagate to "improve" those values until some satisfaction level is reached (in terms of the RMSE). This back propagation is identical to the back propagation of Neural Networks.

Tingda found 30 features too small and 100 about right (200 was too time consuming). Arijit: go to the Netflix site for feature ideas (meaningful features ought to be better?).
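As a sketch, in the notation of the meeting notes at the end of this deck (U^T is users x features, M is features x movies), the model approximates the rating matrix by a product of the two feature matrices:
$$ R \;\approx\; U^{T} M, \qquad \hat r_{u,i} \;=\; \sum_{f=1}^{F} U_{f,u}\, M_{f,i}, $$
and back propagation nudges the entries of $U$ and $M$ to shrink the RMSE over the known ratings.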

[Diagram: matrix U^T with users as rows and features as columns, next to matrix M with features as rows and movies as columns.]

What about creating and optimizing (with back propagation) a custom matrix for each prediction we have to make? I.e., in movie-vote.C or user-vote.C. The call from mpp-user.C to, e.g., movie-vote.C sends M, U, supM, supU.

*** In movie-vote [or user-vote], before entering the nested loop (outer VoterLoop, inner DimLoop), train optimal V^T and N matrices for that vote only (so the number of features could be raised substantially, since [pruned] supM and supU are << 17,000 and 500,000).

Page 4

SVD training
- Parameters: learning rate and lambda
- Tune the parameters to minimize error
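A minimal sketch of one Funk-style training step, assuming squared error with L2 regularization; the function and variable names are illustrative, not from Tingda's code:

#include <vector>

// One stochastic-gradient step of SVD training on a single known rating.
// U[f][u] is the user-feature matrix, M[f][i] the feature-movie matrix.
void train_step(std::vector<std::vector<double>>& U,
                std::vector<std::vector<double>>& M,
                int u, int i, double rating,
                double lrate, double lambda) {
    const int F = static_cast<int>(U.size());
    // Predicted rating: dot product over the F features.
    double pred = 0.0;
    for (int f = 0; f < F; ++f) pred += U[f][u] * M[f][i];
    const double err = rating - pred;
    // Move each feature along the error gradient; lambda penalizes
    // large feature values (regularization), lrate is the learning rate.
    for (int f = 0; f < F; ++f) {
        const double uf = U[f][u], mf = M[f][i];
        U[f][u] += lrate * (err * mf - lambda * uf);
        M[f][i] += lrate * (err * uf - lambda * mf);
    }
}

Tuning then amounts to sweeping lrate and lambda and keeping the pair that minimizes RMSE on held-out ratings.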

- The Collaborative Filtering (CF) algorithm is widely used in recommendation systems.
- The user-based CF algorithm is limited by its computational complexity.
- Movie-based (item-based) CF has fewer scalability concerns.

/* Movie-based PTree CF */
PTree.load_binary();

// Calculate the similarity between every pair of items
while i in I {
    while j in I {
        sim[i][j] = sim(PTree[i], PTree[j]);
    }
}

// Get the top K nearest neighbors to item i among the items user u rated
pt = PTree.get_items(u);
sort(pt.begin(), pt.end(), by descending sim[i][pt.get_index()]);

// Prediction of the rating on item i by user u
sum = 0.0; weight = 0.0;
for (j = 0; j < K; ++j) {
    sum    += r[u][pt[j]] * sim[i][pt[j]];
    weight += sim[i][pt[j]];
}
pred = sum / weight;

sim is any similarity function; the only requirement is that sim(i,i) >= sim(i,j). In movie-vote.C one could back-propagation-train V^T and N (see *** on the previous slide) anew for each call from mpp-user.C to movie-vote.C, and thereby allow a large number of features (much higher accuracy?), because V^T and N are much smaller than U^T and M.

Here, Closed Nearest Neighbor methods should improve the result! If the similarity is simple enough to allow calculation through P-Trees, then closed K Nearest Neighbor will be both faster and more accurate, as sketched below.
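One reading of the closed-KNN suggestion, as a hedged sketch: after ranking by similarity, keep every neighbor tied with the K-th one instead of cutting at exactly K, so the neighborhood is closed. The names here are illustrative:

#include <algorithm>
#include <cstddef>
#include <vector>

// Closed K nearest neighbors of item i: keep all items whose similarity
// ties with the K-th most similar one, rather than an arbitrary top-K cut.
std::vector<int> closed_knn(std::vector<int> items,
                            const std::vector<double>& sim_to_i,
                            std::size_t K) {
    std::sort(items.begin(), items.end(),
              [&](int a, int b) { return sim_to_i[a] > sim_to_i[b]; });
    if (items.size() <= K) return items;
    const double kth = sim_to_i[items[K - 1]];
    std::size_t end = K;
    // Extend the cut while later items tie with the K-th similarity.
    while (end < items.size() && sim_to_i[items[end]] == kth) ++end;
    items.resize(end);
    return items;
}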

Page 5

Similarities (correlations)
- Cosine based
- Pearson correlation
- Adjusted Cosine
- SVD item-feature or Tingda Lu similarity?
- or combining Pearson and Adjusted Cosine:

$$ \mathrm{sim}(i,j) \;=\; \frac{\sum_{u} r_{u,i}\, r_{u,j}}{\sqrt{\sum_{u} r_{u,i}^{2}}\;\sqrt{\sum_{u} r_{u,j}^{2}}} \quad \text{(cosine)} $$

$$ \mathrm{sim}(i,j) \;=\; \frac{\sum_{u} (r_{u,i}-\bar r_{i})(r_{u,j}-\bar r_{j})}{\sqrt{\sum_{u} (r_{u,i}-\bar r_{i})^{2}}\;\sqrt{\sum_{u} (r_{u,j}-\bar r_{j})^{2}}} \quad \text{(Pearson)} $$

$$ \mathrm{sim}(i,j) \;=\; \frac{\sum_{u} (r_{u,i}-\bar r_{u})(r_{u,j}-\bar r_{u})}{\sqrt{\sum_{u} (r_{u,i}-\bar r_{u})^{2}}\;\sqrt{\sum_{u} (r_{u,j}-\bar r_{u})^{2}}} \quad \text{(adjusted cosine)} $$

The sums run over the users u who rated both i and j; $\bar r_i$ is item i's mean rating and $\bar r_u$ is user u's mean rating.
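A sketch of the adjusted cosine computation over co-rated users, assuming ratings are stored densely with 0 meaning "not rated" (the Netflix convention noted at the end of this deck); names are illustrative:

#include <cmath>
#include <cstddef>
#include <vector>

// Adjusted cosine similarity between items i and j over the users who
// rated both. ri[u] and rj[u] are user u's ratings of items i and j
// (0 = not rated); user_mean[u] is u's mean rating over rated movies.
double adjusted_cosine(const std::vector<double>& ri,
                       const std::vector<double>& rj,
                       const std::vector<double>& user_mean) {
    double num = 0.0, di = 0.0, dj = 0.0;
    for (std::size_t u = 0; u < ri.size(); ++u) {
        if (ri[u] == 0.0 || rj[u] == 0.0) continue;  // not co-rated
        const double a = ri[u] - user_mean[u];
        const double b = rj[u] - user_mean[u];
        num += a * b;
        di  += a * a;
        dj  += b * b;
    }
    return (di > 0.0 && dj > 0.0) ? num / (std::sqrt(di) * std::sqrt(dj))
                                  : 0.0;
}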

Page 6

Similarity Correction
- Two items are not similar if only a few customers purchased or rated both.
- Co-support is included in the item similarity.
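Per the meeting notes at the end of this deck, Tingda's correction scales the raw similarity by the log of the co-support $N_{ij}$, the number of users who rated both i and j:
$$ \mathrm{sim}'(i,j) \;=\; \log(N_{ij}) \cdot \mathrm{sim}(i,j). $$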

Prediction
- Weighted Average
- Item Effects

RMSE by neighbor size:

Neighbor Size   Cosine    Pearson   Adj. Cos   SVD IF
K=10            1.0742    1.0092    0.9786     0.9865
K=20            1.0629    1.0006    0.9685     0.9900
K=30            1.0602    1.0019    0.9666     0.9972
K=40            1.0592    1.0043    0.9960     1.0031
K=50            1.0589    1.0064    0.9658     1.0078

Adjusted Cosine similarity gets a much lower RMSE. The reason is that the other algorithms do not exclude the user rating variance; the Adjusted Cosine algorithm discards the user variance and hence gets better prediction accuracy.

Page 7

Similarity Correction
All algorithms get better RMSE with similarity correction except Adjusted Cosine.

          Cosine    Pearson   Adj. Cos    SVD IF
Before    1.0589    1.0006    0.9658      0.9865
After     1.0588    0.9726    1.0637      0.9791
Improve   0.009%    2.798%    -10.137%    0.750%

Item Effects
Improvements for all algorithms. An individual's behavior is influenced by others.

          Cosine    Pearson   Adj. Cos   SVD IF
Before    1.0589    1.0006    0.9658     0.9865
After     0.9575    0.9450    0.9468     0.9381
Improve   9.576%    5.557%    1.967%     4.906%

Conclusion
Experiments were carried out on the Cosine, Pearson, Adjusted Cosine, and SVD item-feature algorithms. Support correction and item effects significantly improve the prediction accuracy. The Pearson and SVD item-feature algorithms achieve better results with similarity correction and item effects.

Page 8

10_23_10 Saturday notes (by Mohammad)
Participants: Mohammad, Arjun, Arijit; via Skype: Tingda and Prakash.
Tingda Lu: "Singular Value Decomposition and item-based collaborative filtering for Netflix prize". As Tingda went through the slides, the group members discussed various issues. Here are some key points of the discussions.

In the 5th slide, Tingda showed two matrices U and M. Matrix U^T contains the users in rows and the features in columns, so there would be 500,000 rows in the matrix (as there are half a million users in the Netflix problem), but the number of features is not known (it is not described in the problem). As Tingda mentioned, you can take as many features as you wish, but a larger number gives better results. The values of these features might be randomly filled, but they will converge to some values by neural-network back propagation. Tingda found that 10 to 30 features are too small, 40 to 60 still not large enough, and 100 is good enough.

M is the movie matrix, where rows represent the features and columns represent the movies. So there are 100 features and 17,000 movies, making it a 100x17000 matrix (the same goes for the features). Arijit suggested that we might go to Netflix's website to see what features they use to describe their movies, and we might use those features.

In slide no. 8, an algorithm is shown for "Item-based PTree CF". The algorithm first calculates the similarity between items in the item set I. Here a long discussion took place on choosing the similarity function:

– Tingda gave 4 similarity functions: cosine, Pearson, adjusted cosine, and SVD item-feature (shown in slides 9 and 10).

– Dr. Perrizo's similarity is Sim(i, j) = a positive real number with the property that Sim(i, i) >= Sim(i, j).

– Dr. Perrizo suggested combining the Pearson and Adjusted Cosine similarity functions (as on the similarities slide above).

In the 2nd part, the K nearest neighbors are computed. Dr. Perrizo suggested using Closed KNN, i.e., considering all neighbors at the same distance as the k-th. Dr. Perrizo: use the sum of Cor(ui, uj), not Nij.

Then Dr. Perrizo: use these similarities in user-vote.C and movie-vote.C to get "Pruned Training Set Support" (PTSS) values, which will be used by mpp-user.C to make the final prediction (?).

More features -> more accuracy: per the first point above, including more features gives more accuracy in prediction. But we already have too many rows in the user matrix (half a million), and we need to train the matrix using back propagation (very time consuming). So don't train the matrices before pruning severely (to something like 10 users), so that the number of features can be increased.

Make the code generic (not specific to the Netflix problem) so that it may be used on, e.g., satellite imagery (LandSat 5?). The Netflix convention that a 0 rating means "not rated" should be removed in the generic code, since 0 may be a valid rating in other problems.

Tingda used similarity correction; e.g., he did not treat 2 items (or movies) as similar if only a few users rated both. Tingda's formula: log(Nij) * Sim(i, j). Dr. Perrizo suggested using the sum of Cor(ui, uj) instead of Nij.