10-405 BigML

Matrix Factorization

What is MF and what can you do with it?
Recovering latent factors in a matrix

[Figure: an n-row by m-column matrix V with entries $v_{11}, \dots, v_{ij}, \dots, v_{nm}$.]
Recovering latent factors in a matrix

Approximate V by the product of two thin matrices:

$$V \approx W H, \qquad W \in \mathbb{R}^{n \times K},\ H \in \mathbb{R}^{K \times m}$$

Row i of W is example i's latent vector (written $(x_i, y_i)$ in the figure, i.e. K = 2), and the rows of H are the latent factors $(a_1, \dots, a_m)$ and $(b_1, \dots, b_m)$.
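A minimal numpy sketch of this shape bookkeeping (illustrative sizes; W, H, K as above):

import numpy as np

n, m, K = 6, 4, 2
rng = np.random.default_rng(0)

W = rng.normal(size=(n, K))   # one K-dimensional latent vector per row of V
H = rng.normal(size=(K, m))   # one K-dimensional latent vector per column of V

V_approx = W @ H              # n x m reconstruction; V_approx[i, j] = W[i] . H[:, j]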
What is this for?

(The same picture: an n x K matrix times a K x m matrix, approximating V.)
MF for collaborative filtering

What is collaborative filtering?
Recovering latent factors in a matrix

[Figure: V is an n-users by m-movies matrix; V[i,j] = user i's rating of movie j.]
Recovering latent factors in a matrix

Factor the ratings matrix the same way:

$$V \approx W H$$

with one K-dimensional latent vector per user (the rows of W, n users) and one per movie (the columns of H, m movies); V[i,j] = user i's rating of movie j.
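Under this model a predicted rating is just a dot product; a one-line sketch (hypothetical helper, with W and H as above):

def predict_rating(W, H, i, j):
    # predicted rating of movie j by user i under the factor model
    return W[i] @ H[:, j]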
MF for image modeling
MF for images

Here V is 1000 images by 10,000 pixels (10,000,000 entries), with V[i,j] = pixel j in image i. With K = 2 the factorization yields 2 prototype images (PC1 and PC2), and each image is approximated as a weighted combination of the two prototypes.
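One common way to compute such a rank-2 factorization (a sketch, not necessarily how the slides' figure was produced) is a truncated SVD:

import numpy as np

V = np.random.rand(1000, 10000)      # stand-in for the 1000 x 10,000 image matrix
U, s, Vt = np.linalg.svd(V, full_matrices=False)
K = 2
W = U[:, :K] * s[:K]                 # 1000 x 2: per-image weights on the prototypes
H = Vt[:K, :]                        # 2 x 10,000: the two prototype images (PC1, PC2)
V_approx = W @ H                     # best rank-2 approximation in squared error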
MF for modeling text
Recovering latent factors in a matrix

Here V is a doc-term matrix, n documents by m terms, with V[i,j] = TF-IDF score of term j in doc i, and we factor it as $V \approx W H$.
(Worked example from https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/)
• The Neatest Little Guide to Stock Market Investing
• Investing For Dummies, 4th Edition
• The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing
• Value Investing: From Graham to Buffett and Beyond
• Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
• Investing in Real Estate, 5th Edition
• Stock Investing For Dummies
• Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss
[Figure: documents plotted in latent-topic space. Doc = weighted sum of topics. One topic loads on estate, land, invest, rich, … and another on dummy, stock, saving, advice, … The real-estate books ("Rich Dad’s Advisors: The ABC’s of Real Estate Investing", "Investing in Real Estate") lie along the first topic, while "The Little Book of Common Sense Investing" and "The Neatest Little Guide to Stock Market Investing" lie along the stock-investing topic.]
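An LSA-style sketch of recovering two such topics (illustrative; the titles list here is abbreviated):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

titles = ["The Neatest Little Guide to Stock Market Investing",
          "The Little Book of Common Sense Investing",
          "Rich Dad's Advisors: The ABC's of Real Estate Investing",
          "Investing in Real Estate"]

V = TfidfVectorizer().fit_transform(titles)  # doc-term TF-IDF matrix
svd = TruncatedSVD(n_components=2)           # 2 latent topics
doc_topics = svd.fit_transform(V)            # n_docs x 2: each doc as a weighted sum of topics
topic_terms = svd.components_                # 2 x n_terms: each topic as weights over terms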
MF vs other learning tasks

MF is like linear regression

MF is like multiple-output multi-variable linear regression:

$$y_1 = \boldsymbol{x} \cdot \boldsymbol{w}_1, \quad y_2 = \boldsymbol{x} \cdot \boldsymbol{w}_2, \quad y_3 = \boldsymbol{x} \cdot \boldsymbol{w}_3, \ \dots$$
Multi-output linear regression as MF

$$Y \approx X W$$

X holds the n examples (one per row), W holds the m weight vectors (one per column), and Y holds the m outputs (output 1, output 2, …) for each example $x_i$, i.e. $y_k = \boldsymbol{x} \cdot \boldsymbol{w}_k$.
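A numpy sketch of the correspondence (illustrative sizes):

import numpy as np

n, d, m = 100, 5, 3          # examples, features, outputs
X = np.random.rand(n, d)     # one example per row
W = np.random.rand(d, m)     # column k is weight vector w_k for output k

Y = X @ W                    # Y[i, k] = x_i . w_k: all m outputs for all n examples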
MF is like clustering

k-means clustering: pick centroids; each point is in one cluster, and each cluster (centroid) is a weighted average of its points.
k-means as MF

$$X \approx Z M$$

Z is an n x r indicator matrix for the r clusters (each row is all zeros except a single 1 marking the example's cluster), M is the r x d matrix of cluster means, and the product reconstructs the original data set with every example replaced by its cluster mean.
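A small sketch of this view (the assignments are a stand-in for what k-means would produce):

import numpy as np

X = np.random.rand(100, 4)                  # original data set
r = 3
assign = np.random.randint(0, r, size=100)  # cluster index for each example

Z = np.eye(r)[assign]                       # 100 x 3 indicators: one 1 per row
M = np.array([X[assign == k].mean(axis=0) for k in range(r)])  # 3 x 4 cluster means
X_approx = Z @ M                            # each example replaced by its cluster mean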
MF is “soft” clustering: each example is a weighted sum of clusters. In $V \approx W H$ (an n x K matrix times a K x m matrix), the K rows of H act as cluster centers and row i of W gives example i's real-valued weights on them.
How do you do MF?

(talk pilfered from → ….. KDD 2011)
Recovering latent factors in a matrix

$$V \approx W H$$

V is the n-users by m-movies ratings matrix (V[i,j] = user i's rating of movie j), W is n x r, and H is r x m, where r is the rank of the factorization.
Matrix factorization as SGD

For each observed entry V[i,j], take a gradient step (scaled by the step size) on that entry's local loss.

Matrix factorization as SGD: why does this work? Here's the key claim:
Checking the claim

Think of SGD for logistic regression:
• the LR loss compares y and ŷ = dot(w, x)
• MF is similar, but now we update both w (the user weights) and x (the movie weights)
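A minimal sketch of the resulting per-rating update (illustrative; squared-error local loss, step size lr):

import numpy as np

def sgd_step(W, H, i, j, v_ij, lr=0.01):
    # residual of the current prediction for entry (i, j)
    err = v_ij - W[i] @ H[:, j]
    w_old = W[i].copy()              # keep the old user vector for H's update
    W[i]    += lr * err * H[:, j]    # update user i's latent vector
    H[:, j] += lr * err * w_old      # update movie j's latent vector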
What loss functions are possible?
ALS = alternating least squares
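A sketch of one ALS round (illustrative; assumes a fully observed V and a ridge penalty reg):

import numpy as np

def als_pass(V, W, H, reg=0.1):
    K = W.shape[1]
    # solve for W with H fixed: ridge regression of each row of V on the rows of H
    W[:] = np.linalg.solve(H @ H.T + reg * np.eye(K), H @ V.T).T
    # then solve for H with W fixed
    H[:] = np.linalg.solve(W.T @ W + reg * np.eye(K), W.T @ V)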
limited memory quasi-Newton
[Figure: convergence comparison of iterative SGD (no mixing), limited-memory quasi-Newton, parameter ("param") mixing, alternating least squares, and IPM.]
Matrix factorization as SGD: why does this work? Here's the key claim:
[Figure: DSGD strata for 3 nodes, Epoch 1. V is split into 3 x 3 blocks, W into row blocks W1–W3, and H into column blocks H1–H3. Each stratum is a set of blocks that share no rows or columns, so the three nodes can run SGD in parallel without conflicts:

Stratum 1: Node 1 works on V11 (W1, H1); Node 2 on V22 (W2, H2); Node 3 on V33 (W3, H3).
Stratum 2: Node 1 works on V12 (W1, H2); Node 2 on V23 (W2, H3); Node 3 on V31 (W3, H1).
Stratum 3: Node 1 works on V13 (W1, H3); Node 2 on V21 (W2, H1); Node 3 on V32 (W3, H2).]
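The schedule is just a cyclic shift of column blocks per stratum; a toy sketch:

p = 3  # number of nodes = number of row/column blocks
for stratum in range(p):
    # blocks processed in parallel in this stratum: no two share a row or column block
    blocks = [(node + 1, (node + stratum) % p + 1) for node in range(p)]
    print(f"Stratum {stratum + 1}:", ["V%d%d" % b for b in blocks])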
Hadoop scalability

Hadoop process setup time starts to dominate.
MF is like logistic regression
Linear regression as MF

$$Y \approx X W$$

X is the training data (n examples, one per row), W the weight vectors (one per output), and Y the outputs (output 1, output 2, …), with $y_k = \boldsymbol{x} \cdot \boldsymbol{w}_k$ for each output k.
Logistic? regression as MF

The same picture, $Y \approx X W$: training data X times weight vectors W gives the output matrix Y (output 1, output 2, …).
Vectorizing logistic regression
• Many ML methods can be rewritten using nothing but vector-matrix operations ("vectorizing")
• Why do this?
– Simpler (once you understand it well)
– Faster (given the right infrastructure, e.g., numpy, GPUs, …)
– Can simplify optimization (more later)
Vectorized minibatch logistic regression
• Computation we'd like to vectorize:
– For each x in the minibatch, compute the prediction $p = \sigma(\boldsymbol{w} \cdot \boldsymbol{x})$
– For each feature j, update $w_j$ by stepping along the gradient $(y - p)\,x_j$
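A vectorized sketch of that minibatch update (illustrative; Xb is B x d, yb holds B labels in {0, 1}, lr is the step size):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_step(w, Xb, yb, lr=0.1):
    p = logistic(Xb @ w)         # all B predictions in one matrix-vector product
    w += lr * Xb.T @ (yb - p)    # all d weight updates in one matrix-vector product
    return w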
Vectorizing logistic regression
• Computation we'd like to parallelize:
– For each x in the minibatch $X_{batch}$, compute $\boldsymbol{w} \cdot \boldsymbol{x}$:

$$X_{batch}\,\boldsymbol{w} = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{B1} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} = \begin{pmatrix} \boldsymbol{w} \cdot \boldsymbol{x}_1 \\ \vdots \\ \boldsymbol{w} \cdot \boldsymbol{x}_B \end{pmatrix}$$
Vectorizing logistic regression
• Computation we'd like to parallelize:
– For each x in the minibatch, compute

$$\begin{pmatrix} \boldsymbol{w} \cdot \boldsymbol{x}_1 \\ \vdots \\ \boldsymbol{w} \cdot \boldsymbol{x}_B \end{pmatrix} + 1$$

In numpy, if M is an array, M + 1 does the "right thing" (applies elementwise); so do np.exp(M), M.dot(…), np.reciprocal(M), …
Vectorizing logistic regression
• Computation we'd like to parallelize:
– For each x in the minibatch, compute $\sigma(\boldsymbol{w} \cdot \boldsymbol{x})$:

import numpy as np

def logistic(X):
    # elementwise sigmoid: 1 / (1 + exp(-X))
    return np.reciprocal(np.exp(-X) + 1)

p = logistic(Xb.dot(w))  # B rows, 1 column
Binary to softmax logistic regression

In the binary case, one matrix-vector product computed every example's score:

$$X_{batch}\,\boldsymbol{w} = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{B1} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} = \begin{pmatrix} \boldsymbol{w} \cdot \boldsymbol{x}_1 \\ \vdots \\ \boldsymbol{w} \cdot \boldsymbol{x}_B \end{pmatrix}$$
Binary to softmax logistic regression

$$p_y \equiv \frac{\exp(\boldsymbol{x} \cdot \boldsymbol{w}_y)}{\sum_{y'} \exp(\boldsymbol{x} \cdot \boldsymbol{w}_{y'})}$$

With one weight vector per class (the columns of W), a single matrix-matrix product computes every class score for every example:

$$XW = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{B1} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} w_1^{y_1} & \cdots & w_1^{y_K} \\ \vdots & \ddots & \vdots \\ w_d^{y_1} & \cdots & w_d^{y_K} \end{pmatrix} = \begin{pmatrix} \boldsymbol{w}^{y_1} \cdot \boldsymbol{x}_1 & \cdots & \boldsymbol{w}^{y_K} \cdot \boldsymbol{x}_1 \\ \vdots & \ddots & \vdots \\ \boldsymbol{w}^{y_1} \cdot \boldsymbol{x}_B & \cdots & \boldsymbol{w}^{y_K} \cdot \boldsymbol{x}_B \end{pmatrix}$$
(The code annotated below is from http://minpy.readthedocs.io/en/latest/get-started/logistic_regression.html)
Matrix-multiply XW, then exponentiate component-wise to get the numerators of

$$p_y \equiv \frac{\exp(\boldsymbol{x} \cdot \boldsymbol{w}_y)}{\sum_{y'} \exp(\boldsymbol{x} \cdot \boldsymbol{w}_{y'})}$$

Sum the columns to get the denominator; keepdim=True means that the division will work correctly even though 'a' and 'a_sum' have different shapes. 'prob' will have B rows and K columns, and each row will sum to 1.
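A numpy rendering of those annotated steps (a sketch; the names a, a_sum, prob follow the tutorial):

import numpy as np

def softmax_probs(X, W):
    a = np.exp(X @ W)                      # matrix multiply, then exponentiate elementwise
    a_sum = a.sum(axis=1, keepdims=True)   # per-row denominator, shape B x 1
    prob = a / a_sum                       # broadcasts: B x K, each row sums to 1
    return prob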
Error on each example x in the batch and each class y; then the gradient step. (python bug in the tutorial: it should be x.T, the transpose)

$$x^{T}\,dy = \begin{pmatrix} x_{11} & \cdots & x_{B1} \\ \vdots & \ddots & \vdots \\ x_{1d} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} dy_{1}^{y_1} & \cdots & dy_{1}^{y_K} \\ \vdots & \ddots & \vdots \\ dy_{B}^{y_1} & \cdots & dy_{B}^{y_K} \end{pmatrix}$$
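Putting the pieces together, a minimal full step (illustrative; Yb is B x K one-hot labels, lr the step size):

import numpy as np

def softmax_sgd_step(W, Xb, Yb, lr=0.1):
    a = np.exp(Xb @ W)
    prob = a / a.sum(axis=1, keepdims=True)  # B x K predicted class probabilities
    dy = Yb - prob                           # error for each example and each class
    W += lr * Xb.T @ dy                      # the gradient step: a d x K update
    return W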