Scalable Recommendation Algorithms with LSH

  • Scalable Recommendation Algorithms for Massive Data

    Maruf Aytekin

    PhD Candidate

    Computer Engineering Department Bahcesehir University

  • Outline

    - Introduction
    - Collaborative Filtering (CF) and Scalability Problem
    - Locality Sensitive Hashing (LSH) for Recommendation
    - Improvement for LSH methods
    - Preliminary Results
    - Work Plan

  • Recommender Systems

    Applied to various domains:
    - Book/movie/news recommendations
    - Contextual advertising
    - Search engine personalization
    - Matchmaking

    Two types of problems:
    - Preference elicitation (prediction)
    - Set-based recommendations (top-N)

  • Recommender Systems

    - Content-based filtering
    - Collaborative filtering (CF): model-based or neighborhood-based

  • Neighborhood-based Methods

    The idea: similar users behave in a similar way.

    - User-based: rely on the opinions of like-minded users to predict a rating.
    - Item-based: look at the ratings given to similar items.

    Both require computing similarity weights to select the trusted neighbors whose ratings are used in the prediction.

  • Neighborhood-based Methods

    Problem:
    - All users/items must be compared to find trusted neighbors (k-nearest neighbors).
    - This does not scale well with data size (number of users/items).

    Computational complexity:

                  Space     Model Build   Query
    User-based    O(m^2)    O(m^2 n)      O(m)
    Item-based    O(n^2)    O(n^2 m)      O(n)

    m: number of users
    n: number of items

  • Various Methods

    Model-based recommendation techniques:
    - Dimensionality reduction (SVD, PCA, random projections)
    - Classification (like, dislike)
    - Neural network classifier
    - Clustering (ANN)
    - Bayesian inference techniques

    Distributed computation:
    - Map-reduce
    - Distributed CF algorithms

  • Locality Sensitive Hashing (LSH)

    - An approximate nearest neighbor (ANN) search method.
    - Avoids searching all of the data to find the nearest neighbors.
    - Finds the nearest neighbors quickly in basic neighborhood-based methods.

  • Locality Sensitive Hashing (LSH)

    General approach:
    - Hash items several times, in such a way that similar items are more likely to be hashed to the same bucket than dissimilar items are.
    - Pairs hashed to the same bucket are candidate pairs.
    - Check only the candidate pairs for similarity.

  • Locality-Sensitive Functions

    A function h hashes items, and the decision is based on whether or not the results are equal:
    - h(x) = h(y) makes x and y a candidate pair.
    - h(x) ≠ h(y) does not make x and y a candidate pair.

    Functions can be combined:

    g = h1 AND h2 AND h3

    or

    g = h1 OR h2 OR h3

    A collection of functions of this form is called a family of functions.
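
    The AND and OR combinations above can be sketched in a few lines. This is only an illustration: the helper names (`and_construction`, `or_construction`, `coordinate_sign`) and the toy hash functions are assumptions made for the example, not part of the slides.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Hash = Callable[[Vector], int]

def and_construction(hs: Sequence[Hash]) -> Callable[[Vector, Vector], bool]:
    """x and y are a candidate pair only if every h agrees on them."""
    return lambda x, y: all(h(x) == h(y) for h in hs)

def or_construction(hs: Sequence[Hash]) -> Callable[[Vector, Vector], bool]:
    """x and y are a candidate pair if at least one h agrees on them."""
    return lambda x, y: any(h(x) == h(y) for h in hs)

def coordinate_sign(i: int) -> Hash:
    """Toy hash (hypothetical): 1 when coordinate i is non-negative, else 0."""
    return lambda x: 1 if x[i] >= 0 else 0

hs = [coordinate_sign(i) for i in range(3)]
x = (1.0, 2.0, -3.0)
y = (4.0, -1.0, -2.0)

g_and = and_construction(hs)
g_or = or_construction(hs)
print(g_and(x, y))  # False: the hashes disagree on coordinate 1
print(g_or(x, y))   # True: the hashes agree on coordinate 0
```

    AND makes candidate pairs rarer (fewer false positives), while OR makes them more common (fewer false negatives).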

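  • LSH for Cosine

    Charikar defines a family of functions for cosine similarity as follows. Let u and v be rating vectors and let r be a randomly generated vector whose components are +1 and -1. The family of hash functions (H) generated:

    $$h_r(u) = \begin{cases} 1 & \text{if } r \cdot u \ge 0 \\ 0 & \text{if } r \cdot u < 0 \end{cases}$$

    , where

    $$\Pr[h_r(u) = h_r(v)] = 1 - \frac{\theta(u, v)}{\pi}$$

    shows the probability of u and v being declared as a candidate pair, with $\theta(u, v)$ the angle between u and v.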

  • LSH for Cosine

    Example:

    u1 = [5, 4, 0, 4, 1]
    u2 = [2, 1, 1, 1, 4]
    u3 = [4, 3, 0, 5, 2]

    r1 = [-1, 1, 1,-1,-1]
    r2 = [ 1, 1, 1,-1,-1]
    r3 = [-1,-1, 1,-1, 1]
    r4 = [-1, 1,-1, 1,-1]

    h1(u1) = u1.r1 = -6 => 0
    h2(u1) = u1.r2 = 4 => 1
    h3(u1) = u1.r3 = -12 => 0
    h4(u1) = u1.r4 = 2 => 1

    AND-construction: g(u1) = 0101

    g(u1) = 0 1 0 1
    g(u2) = 0 0 1 0
    g(u3) = 0 1 0 1

    max 2^4 = 16 buckets
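
    The example can be reproduced with a short sketch. The function name `signature` is an assumption made for illustration; a dot product of exactly 0 is mapped to bit 1, which is what g(u3) in the example requires (u3.r2 = 0).

```python
def signature(u, rs):
    """Concatenate one bit per random vector: 1 if u.r >= 0, else 0."""
    bits = []
    for r in rs:
        dot = sum(a * b for a, b in zip(u, r))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

rs = [
    [-1,  1,  1, -1, -1],  # r1
    [ 1,  1,  1, -1, -1],  # r2
    [-1, -1,  1, -1,  1],  # r3
    [-1,  1, -1,  1, -1],  # r4
]
u1 = [5, 4, 0, 4, 1]
u2 = [2, 1, 1, 1, 4]
u3 = [4, 3, 0, 5, 2]

print(signature(u1, rs))  # 0101
print(signature(u2, rs))  # 0010
print(signature(u3, rs))  # 0101
```

    u1 and u3 receive the same key, so they land in the same bucket and become a candidate pair, while u2 does not.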

  • LSH Model Build

    [Diagram: users U1, U2, U3, ..., Um are each hashed by K = 4 hash functions (h1, h2, h3, h4), each producing 0 or 1. The AND-construction combines the four bits into a bucket key. Buckets shown: bucket 1 (key: 0101), bucket 2 (key: 1110), bucket 3 (key: 1101), bucket 4 (key: 1001). User groups shown inside the buckets: U7 U11 U10 ...; U13 U39 ... Um; U1 U3 U5 ...; U2 U9 U6 ...]

  • Hash Tables (Bands)

    [Diagram: L = 2 hash tables (hash table 1 and hash table 2), each built with K = 4 hash functions. The candidate set for U5, C(U5), is drawn from the tables: U2, U6, U1, U3, ...]
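
    A minimal sketch of the model build and candidate lookup with L tables of K hash functions each. The names `build_tables` and `candidates`, the seed, and the sample data are assumptions made for illustration; the slides do not specify an implementation.

```python
import random
from collections import defaultdict

def signature(u, rs):
    """K-bit bucket key: one bit per random vector, 1 if u.r >= 0 else 0."""
    return "".join(
        "1" if sum(a * b for a, b in zip(u, r)) >= 0 else "0" for r in rs
    )

def build_tables(vectors, L, K, seed=0):
    """Build L hash tables; each uses its own K random +1/-1 vectors."""
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    tables = []
    for _ in range(L):
        rs = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(K)]
        buckets = defaultdict(list)
        for uid, vec in vectors.items():
            buckets[signature(vec, rs)].append(uid)
        tables.append((rs, buckets))
    return tables

def candidates(tables, vec, exclude=None):
    """Union of the buckets that vec falls into, one bucket per table."""
    found = set()
    for rs, buckets in tables:
        found.update(buckets.get(signature(vec, rs), []))
    found.discard(exclude)
    return found

vectors = {
    "u1": [5, 4, 0, 4, 1],
    "u1x2": [10, 8, 0, 8, 2],  # same direction as u1, so same key in every table
    "u2": [2, 1, 1, 1, 4],
}
tables = build_tables(vectors, L=2, K=4)
print(candidates(tables, vectors["u1"], exclude="u1"))
```

    Only the returned candidates need a similarity check, instead of all m users.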

  • LSH Methods

    Clustering based:
    - UB-KNN-LSH: User-based CF prediction with LSH
    - IB-KNN-LSH: Item-based CF with LSH

    Frequency based:
    - UB-LSH1: User-based prediction with LSH
    - IB-LSH1: Item-based prediction with LSH

  • LSH Methods for Prediction

  • LSH Methods Prediction

    UB-KNN-LSH:
    - Find the candidate set, C, for target user u with LSH.
    - Find the k nearest neighbors of u in C that have rated item i.
    - Use the k nearest neighbors to generate a prediction for u on i.

    IB-KNN-LSH:
    - Find the candidate set, C, for target item i with LSH.
    - Find the k nearest neighbors of i in C that user u has rated.
    - Use the k nearest neighbors to generate a prediction for u on item i.
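
    The user-based variant (UB-KNN-LSH) can be sketched as follows, assuming the candidate set C has already been obtained from the hash tables. The slides do not give the prediction formula, so the similarity-weighted average used here is an assumption, as are the names `cosine` and `predict_ub_knn_lsh` and the convention that a rating of 0 means "not rated".

```python
import math

def cosine(u, v):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_ub_knn_lsh(ratings, target, item, cand, k):
    """Predict target's rating on item from its k nearest neighbors within cand."""
    # Keep only candidates that rated the item (assumption: 0 means unrated).
    rated = [c for c in cand if c != target and ratings[c][item] > 0]
    scored = sorted(
        ((cosine(ratings[target], ratings[c]), c) for c in rated), reverse=True
    )[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        return None  # no usable neighbor, so no prediction (affects coverage)
    return sum(s * ratings[c][item] for s, c in scored) / total

ratings = {
    "u1": [5, 4, 0, 4, 1],
    "u2": [2, 1, 1, 1, 4],
    "u3": [4, 3, 0, 5, 2],
}
print(predict_ub_knn_lsh(ratings, "u1", item=2, cand={"u2", "u3"}, k=2))
```

    Only the |C| candidates are compared with the target, rather than every user.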

  • LSH Methods Prediction

    UB-LSH1:
    - Find the list of candidate users, Cl, for u who rated i, with LSH.
    - Calculate the frequency of each user in Cl who rated i.
    - Sort the candidate users by frequency and take the top k users.
    - Use frequency as the weight to predict the rating for u on i with user-based prediction.

    IB-LSH1:
    - Find the list of candidate items, Cl, for i with LSH.
    - Calculate the frequency of each item in Cl that is rated by u.
    - Sort the candidate items by frequency and take the top k items.
    - Use frequency as the weight to predict the rating for u on i with item-based prediction.
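
    A possible reading of UB-LSH1 in code, assuming Cl is a list in which a user appears once for every hash table where it shares a bucket with u. The frequency-weighted average and the name `predict_ub_lsh1` are assumptions made for illustration; the slides only say that frequency is used as the weight.

```python
from collections import Counter

def predict_ub_lsh1(ratings, item, cand_list, k):
    """Frequency-weighted average over the k most frequent candidates that rated item."""
    # Count how often each candidate appears (assumption: 0 means unrated).
    freq = Counter(c for c in cand_list if ratings[c][item] > 0)
    top = freq.most_common(k)  # sorted by frequency, highest first
    if not top:
        return None
    total = sum(f for _, f in top)
    return sum(f * ratings[c][item] for c, f in top) / total

ratings = {"a": [4], "b": [2], "c": [0]}
cand_list = ["a", "a", "a", "b", "c"]  # "a" collided with the target in 3 tables
print(predict_ub_lsh1(ratings, item=0, cand_list=cand_list, k=2))  # 3.5
```

    No similarity is computed at prediction time: the number of shared buckets stands in for it.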

  • Improvement Prediction

    UB-LSH2:
    - Find the list of candidate users, Cl, for u who rated i, with LSH.
    - Select k users from Cl randomly.
    - Predict the rating for u on i with user-based prediction as the average of the ratings of the k users.

    IB-LSH2:
    - Find the list of candidate items, Cl, for i with LSH.
    - Select k items rated by u from Cl randomly.
    - Predict the rating for u on i with item-based prediction as the average of the ratings of the k items.

    - Eliminates frequency calculation and sorting.
    - Frequent users or items in Cl have a higher chance of being selected randomly.
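
    A minimal sketch of UB-LSH2 under the same assumptions as before (Cl keeps duplicates, a rating of 0 means unrated). The name `predict_ub_lsh2` and the explicit `rng` parameter are hypothetical. Because duplicates stay in the list, users that collide in several tables are more likely to be drawn, which is the effect described above.

```python
import random

def predict_ub_lsh2(ratings, item, cand_list, k, rng=None):
    """Average the ratings of k entries drawn at random from the candidate list."""
    rng = rng or random.Random()
    # Duplicates are kept on purpose: frequent candidates are drawn more often.
    rated = [c for c in cand_list if ratings[c][item] > 0]
    if not rated:
        return None
    chosen = rng.sample(rated, min(k, len(rated)))
    return sum(ratings[c][item] for c in chosen) / len(chosen)

ratings = {"a": [4], "b": [2], "c": [0]}
cand_list = ["a", "a", "a", "b", "c"]
print(predict_ub_lsh2(ratings, item=0, cand_list=cand_list, k=2, rng=random.Random(0)))
```

    Compared with UB-LSH1, there is no counting and no sorting, only a random draw and an average.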

  • Complexity Prediction

                  Space    Model Build   Prediction
    User-based    O(m)     O(m^2)        O(mn)
    Item-based    O(n)     O(n^2)        O(mn)
    UB-KNN-LSH    O(mL)    O(mLKt)       O(L + |C|n + k)
    IB-KNN-LSH    O(nL)    O(nLKt)       O(L + |C|m + k)
    UB-LSH1       O(mL)    O(mLKt)       O(L + |Cl| + |Cl| lg(|Cl|) + k)
    IB-LSH1       O(nL)    O(nLKt)       O(L + |Cl| + |Cl| lg(|Cl|) + k)
    UB-LSH2       O(mL)    O(mLKt)       O(L + 2k)
    IB-LSH2       O(nL)    O(nLKt)       O(L + 2k)

    m: number of users
    n: number of items
    L: number of hash tables
    K: number of hash functions
    t: time to evaluate a hash function
    C: candidate user (or item) set (|C| ≈ Lm / 2^K or |C| ≈ Ln / 2^K)
    Cl: candidate user (or item) list (|Cl| ≈ Lm / 2^K or |Cl| ≈ Ln / 2^K)

  • Candidate List (Cl) Prediction

    [Figure, two panels: size of the candidate list versus the number of hash functions (1 to 10), with L = 5. Left panel: |Cl| ≈ Lm / 2^K with m = 16,042; y-axis: Number of Users (0 to 50,000); series: Cl and m. Right panel: |Cl| ≈ Ln / 2^K with n = 17,454; y-axis: Number of Items (0 to 50,000); series: Cl and n.]
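
    The estimate |Cl| ≈ Lm / 2^K can be tabulated directly for the values on this slide (L = 5, m = 16,042). The function name `expected_candidates` is hypothetical, and the estimate assumes that each of the 2^K bucket keys is about equally likely, which real rating data need not satisfy.

```python
def expected_candidates(L, m, K):
    """Rough size of the candidate list: L tables, each bucket holding about m / 2^K entries."""
    return L * m / 2 ** K

L, m = 5, 16042
for K in range(1, 11):
    print(K, round(expected_candidates(L, m, K), 1))
```

    Under this estimate the list is larger than m for K = 1 and K = 2 and shrinks by half with every additional hash function.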

  • Results Model Build

  • Results Prediction

    [Figure, two panels (MovieLens 1M and Amazon Movies): MAE versus number of hash functions (4 to 13) for UB-KNN, IB-KNN, UB-KNN-LSH, IB-KNN-LSH, UB-LSH1, UB-LSH2, IB-LSH1, and IB-LSH2.]

  • Results Prediction

    [Figure, two panels (MovieLens 1M and Amazon Movies): run time (ms) versus number of hash functions (4 to 13) for UB-KNN-LSH, IB-KNN-LSH, UB-LSH1, UB-LSH2, IB-LSH1, and IB-LSH2.]

  • Results Prediction

    [Figure, two panels (MovieLens 1M and Amazon Movies): run time (ms) versus number of hash functions (4 to 13) for UB-LSH1, UB-LSH2, IB-LSH1, and IB-LSH2 only.]

  • Results Prediction

    [Figure, two panels (MovieLens 1M and Amazon Movies): prediction coverage (0 to 1) versus number of hash functions (4 to 13) for UB-KNN-LSH, IB-KNN-LSH, UB-LSH1, UB-LSH2, IB-LSH1, and IB-LSH2.]

  • [Figure: coverage (0.4 to 1, higher is better) versus run time (0 to 0.7, lower is better).]
