10-405 BigML

Matrix Factorization

What is MF and what can you do with it?
Recovering latent factors in a matrix

[Figure: an n-row by m-column matrix V with entries $v_{11}, \dots, v_{ij}, \dots, v_{nm}$.]
Recovering latent factors in a matrix

Approximate V by the product of two thin matrices:

$$V \approx W H, \qquad W \in \mathbb{R}^{n \times K},\ H \in \mathbb{R}^{K \times m}$$

Row i of W is example i's latent vector (written $(x_i, y_i)$ in the figure, i.e. K = 2), and the rows of H are the latent factors $(a_1, \dots, a_m)$ and $(b_1, \dots, b_m)$.
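A minimal numpy sketch of this shape bookkeeping (illustrative sizes; W, H, K as above):

import numpy as np

n, m, K = 6, 4, 2
rng = np.random.default_rng(0)

W = rng.normal(size=(n, K))   # one K-dimensional latent vector per row of V
H = rng.normal(size=(K, m))   # one K-dimensional latent vector per column of V

V_approx = W @ H              # n x m reconstruction; V_approx[i, j] = W[i] . H[:, j]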
What is this for?

(The same picture: an n x K matrix times a K x m matrix, approximating V.)
MF for collaborative filtering

What is collaborative filtering?
Recovering latent factors in a matrix

[Figure: V is an n-users by m-movies matrix; V[i,j] = user i's rating of movie j.]
Recovering latent factors in a matrix

Factor the ratings matrix the same way:

$$V \approx W H$$

with one K-dimensional latent vector per user (the rows of W, n users) and one per movie (the columns of H, m movies); V[i,j] = user i's rating of movie j.
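Under this model a predicted rating is just a dot product; a one-line sketch (hypothetical helper, with W and H as above):

def predict_rating(W, H, i, j):
    # predicted rating of movie j by user i under the factor model
    return W[i] @ H[:, j]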
MF for image modeling
MF for images

Here V is 1000 images by 10,000 pixels (10,000,000 entries), with V[i,j] = pixel j in image i. With K = 2 the factorization yields 2 prototype images (PC1 and PC2), and each image is approximated as a weighted combination of the two prototypes.
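One common way to compute such a rank-2 factorization (a sketch, not necessarily how the slides' figure was produced) is a truncated SVD:

import numpy as np

V = np.random.rand(1000, 10000)      # stand-in for the 1000 x 10,000 image matrix
U, s, Vt = np.linalg.svd(V, full_matrices=False)
K = 2
W = U[:, :K] * s[:K]                 # 1000 x 2: per-image weights on the prototypes
H = Vt[:K, :]                        # 2 x 10,000: the two prototype images (PC1, PC2)
V_approx = W @ H                     # best rank-2 approximation in squared error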
MF for modeling text
Recovering latent factors in a matrix

Here V is a doc-term matrix, n documents by m terms, with V[i,j] = TF-IDF score of term j in doc i, and we factor it as $V \approx W H$.
(Worked example from https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/)
• The Neatest Little Guide to Stock Market Investing
• Investing For Dummies, 4th Edition
• The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
• The Little Book of Value Investing
• Value Investing: From Graham to Buffett and Beyond
• Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
• Investing in Real Estate, 5th Edition
• Stock Investing For Dummies
• Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss
[Figure: documents plotted in latent-topic space. Doc = weighted sum of topics. One topic loads on estate, land, invest, rich, … and another on dummy, stock, saving, advice, … The real-estate books ("Rich Dad’s Advisors: The ABC’s of Real Estate Investing", "Investing in Real Estate") lie along the first topic, while "The Little Book of Common Sense Investing" and "The Neatest Little Guide to Stock Market Investing" lie along the stock-investing topic.]
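An LSA-style sketch of recovering two such topics (illustrative; the titles list here is abbreviated):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

titles = ["The Neatest Little Guide to Stock Market Investing",
          "The Little Book of Common Sense Investing",
          "Rich Dad's Advisors: The ABC's of Real Estate Investing",
          "Investing in Real Estate"]

V = TfidfVectorizer().fit_transform(titles)  # doc-term TF-IDF matrix
svd = TruncatedSVD(n_components=2)           # 2 latent topics
doc_topics = svd.fit_transform(V)            # n_docs x 2: each doc as a weighted sum of topics
topic_terms = svd.components_                # 2 x n_terms: each topic as weights over terms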
MF vs other learning tasks

MF is like linear regression

MF is like multiple-output multi-variable linear regression:

$$y_1 = \boldsymbol{x} \cdot \boldsymbol{w}_1, \quad y_2 = \boldsymbol{x} \cdot \boldsymbol{w}_2, \quad y_3 = \boldsymbol{x} \cdot \boldsymbol{w}_3, \ \dots$$
Multi-output linear regression as MF

$$Y \approx X W$$

X holds the n examples (one per row), W holds the m weight vectors (one per column), and Y holds the m outputs (output 1, output 2, …) for each example $x_i$, i.e. $y_k = \boldsymbol{x} \cdot \boldsymbol{w}_k$.
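A numpy sketch of the correspondence (illustrative sizes):

import numpy as np

n, d, m = 100, 5, 3          # examples, features, outputs
X = np.random.rand(n, d)     # one example per row
W = np.random.rand(d, m)     # column k is weight vector w_k for output k

Y = X @ W                    # Y[i, k] = x_i . w_k: all m outputs for all n examples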
MF is like clustering

k-means clustering: pick centroids; each point is in one cluster, and each cluster (centroid) is a weighted average of its points.
k-means as MF

$$X \approx Z M$$

Z is an n x r indicator matrix for the r clusters (each row is all zeros except a single 1 marking the example's cluster), M is the r x d matrix of cluster means, and the product reconstructs the original data set with every example replaced by its cluster mean.
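A small sketch of this view (the assignments are a stand-in for what k-means would produce):

import numpy as np

X = np.random.rand(100, 4)                  # original data set
r = 3
assign = np.random.randint(0, r, size=100)  # cluster index for each example

Z = np.eye(r)[assign]                       # 100 x 3 indicators: one 1 per row
M = np.array([X[assign == k].mean(axis=0) for k in range(r)])  # 3 x 4 cluster means
X_approx = Z @ M                            # each example replaced by its cluster mean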
MF is “soft” clustering: each example is a weighted sum of clusters. In $V \approx W H$ (an n x K matrix times a K x m matrix), the K rows of H act as cluster centers and row i of W gives example i's real-valued weights on them.
How do you do MF?

(talk pilfered from → ….. KDD 2011)
Recovering latent factors in a matrix

$$V \approx W H$$

V is the n-users by m-movies ratings matrix (V[i,j] = user i's rating of movie j), W is n x r, and H is r x m, where r is the rank of the factorization.
Matrix factorization as SGD

For each observed entry V[i,j], take a gradient step (scaled by the step size) on that entry's local loss.

Matrix factorization as SGD: why does this work? Here's the key claim:
Checking the claim

Think of SGD for logistic regression:
• the LR loss compares y and ŷ = dot(w, x)
• MF is similar, but now we update both w (the user weights) and x (the movie weights)
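A minimal sketch of the resulting per-rating update (illustrative; squared-error local loss, step size lr):

import numpy as np

def sgd_step(W, H, i, j, v_ij, lr=0.01):
    # residual of the current prediction for entry (i, j)
    err = v_ij - W[i] @ H[:, j]
    w_old = W[i].copy()              # keep the old user vector for H's update
    W[i]    += lr * err * H[:, j]    # update user i's latent vector
    H[:, j] += lr * err * w_old      # update movie j's latent vector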
What loss functions are possible?
ALS = alternating least squares
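A sketch of one ALS round (illustrative; assumes a fully observed V and a ridge penalty reg):

import numpy as np

def als_pass(V, W, H, reg=0.1):
    K = W.shape[1]
    # solve for W with H fixed: ridge regression of each row of V on the rows of H
    W[:] = np.linalg.solve(H @ H.T + reg * np.eye(K), H @ V.T).T
    # then solve for H with W fixed
    H[:] = np.linalg.solve(W.T @ W + reg * np.eye(K), W.T @ V)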
limited memory quasi-Newton
[Figure: convergence comparison of iterative SGD (no mixing), limited-memory quasi-Newton, parameter ("param") mixing, alternating least squares, and IPM.]
Matrix factorization as SGD: why does this work? Here's the key claim:
[Figure: DSGD strata for 3 nodes, Epoch 1. V is split into 3 x 3 blocks, W into row blocks W1–W3, and H into column blocks H1–H3. Each stratum is a set of blocks that share no rows or columns, so the three nodes can run SGD in parallel without conflicts:

Stratum 1: Node 1 works on V11 (W1, H1); Node 2 on V22 (W2, H2); Node 3 on V33 (W3, H3).
Stratum 2: Node 1 works on V12 (W1, H2); Node 2 on V23 (W2, H3); Node 3 on V31 (W3, H1).
Stratum 3: Node 1 works on V13 (W1, H3); Node 2 on V21 (W2, H1); Node 3 on V32 (W3, H2).]
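The schedule is just a cyclic shift of column blocks per stratum; a toy sketch:

p = 3  # number of nodes = number of row/column blocks
for stratum in range(p):
    # blocks processed in parallel in this stratum: no two share a row or column block
    blocks = [(node + 1, (node + stratum) % p + 1) for node in range(p)]
    print(f"Stratum {stratum + 1}:", ["V%d%d" % b for b in blocks])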
Hadoop scalability

Hadoop process setup time starts to dominate.
MF is like logistic regression
Linear regression as MF

$$Y \approx X W$$

X is the training data (n examples, one per row), W the weight vectors (one per output), and Y the outputs (output 1, output 2, …), with $y_k = \boldsymbol{x} \cdot \boldsymbol{w}_k$ for each output k.
Logistic? regression as MF

The same picture, $Y \approx X W$: training data X times weight vectors W gives the output matrix Y (output 1, output 2, …).
Vectorizing logistic regression
• Many ML methods can be rewritten using nothing but vector-matrix operations ("vectorizing")
• Why do this?
– Simpler (once you understand it well)
– Faster (given the right infrastructure, e.g., numpy, GPUs, …)
– Can simplify optimization (more later)
Vectorized minibatch logistic regression
• Computation we'd like to vectorize:
– For each x in the minibatch, compute the prediction $p = \sigma(\boldsymbol{w} \cdot \boldsymbol{x})$
– For each feature j, update $w_j$ by stepping along the gradient $(y - p)\,x_j$
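A vectorized sketch of that minibatch update (illustrative; Xb is B x d, yb holds B labels in {0, 1}, lr is the step size):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_step(w, Xb, yb, lr=0.1):
    p = logistic(Xb @ w)         # all B predictions in one matrix-vector product
    w += lr * Xb.T @ (yb - p)    # all d weight updates in one matrix-vector product
    return w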
Vectorizing logistic regression
• Computation we'd like to parallelize:
– For each x in the minibatch $X_{batch}$, compute $\boldsymbol{w} \cdot \boldsymbol{x}$:

$$X_{batch}\,\boldsymbol{w} = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{B1} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} = \begin{pmatrix} \boldsymbol{w} \cdot \boldsymbol{x}_1 \\ \vdots \\ \boldsymbol{w} \cdot \boldsymbol{x}_B \end{pmatrix}$$
Vectorizing logistic regression
• Computation we'd like to parallelize:
– For each x in the minibatch, compute

$$\begin{pmatrix} \boldsymbol{w} \cdot \boldsymbol{x}_1 \\ \vdots \\ \boldsymbol{w} \cdot \boldsymbol{x}_B \end{pmatrix} + 1$$

In numpy, if M is an array, M + 1 does the "right thing" (applies elementwise); so do np.exp(M), M.dot(…), np.reciprocal(M), …
Vectorizing logistic regression
• Computation we'd like to parallelize:
– For each x in the minibatch, compute $\sigma(\boldsymbol{w} \cdot \boldsymbol{x})$:

import numpy as np

def logistic(X):
    # elementwise sigmoid: 1 / (1 + exp(-X))
    return np.reciprocal(np.exp(-X) + 1)

p = logistic(Xb.dot(w))  # B rows, 1 column
Binary to softmax logistic regression

In the binary case, one matrix-vector product computed every example's score:

$$X_{batch}\,\boldsymbol{w} = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{B1} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} = \begin{pmatrix} \boldsymbol{w} \cdot \boldsymbol{x}_1 \\ \vdots \\ \boldsymbol{w} \cdot \boldsymbol{x}_B \end{pmatrix}$$
Binary to softmax logistic regression

$$p_y \equiv \frac{\exp(\boldsymbol{x} \cdot \boldsymbol{w}_y)}{\sum_{y'} \exp(\boldsymbol{x} \cdot \boldsymbol{w}_{y'})}$$

With one weight vector per class (the columns of W), a single matrix-matrix product computes every class score for every example:

$$XW = \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{B1} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} w_1^{y_1} & \cdots & w_1^{y_K} \\ \vdots & \ddots & \vdots \\ w_d^{y_1} & \cdots & w_d^{y_K} \end{pmatrix} = \begin{pmatrix} \boldsymbol{w}^{y_1} \cdot \boldsymbol{x}_1 & \cdots & \boldsymbol{w}^{y_K} \cdot \boldsymbol{x}_1 \\ \vdots & \ddots & \vdots \\ \boldsymbol{w}^{y_1} \cdot \boldsymbol{x}_B & \cdots & \boldsymbol{w}^{y_K} \cdot \boldsymbol{x}_B \end{pmatrix}$$
(The code annotated below is from http://minpy.readthedocs.io/en/latest/get-started/logistic_regression.html)
Matrix-multiply XW, then exponentiate component-wise to get the numerators of

$$p_y \equiv \frac{\exp(\boldsymbol{x} \cdot \boldsymbol{w}_y)}{\sum_{y'} \exp(\boldsymbol{x} \cdot \boldsymbol{w}_{y'})}$$

Sum the columns to get the denominator; keepdim=True means that the division will work correctly even though 'a' and 'a_sum' have different shapes. 'prob' will have B rows and K columns, and each row will sum to 1.
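A numpy rendering of those annotated steps (a sketch; the names a, a_sum, prob follow the tutorial):

import numpy as np

def softmax_probs(X, W):
    a = np.exp(X @ W)                      # matrix multiply, then exponentiate elementwise
    a_sum = a.sum(axis=1, keepdims=True)   # per-row denominator, shape B x 1
    prob = a / a_sum                       # broadcasts: B x K, each row sums to 1
    return prob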
Error on each example x in the batch and each class y; then the gradient step. (python bug in the tutorial: it should be x.T, the transpose)

$$x^{T}\,dy = \begin{pmatrix} x_{11} & \cdots & x_{B1} \\ \vdots & \ddots & \vdots \\ x_{1d} & \cdots & x_{Bd} \end{pmatrix} \begin{pmatrix} dy_{1}^{y_1} & \cdots & dy_{1}^{y_K} \\ \vdots & \ddots & \vdots \\ dy_{B}^{y_1} & \cdots & dy_{B}^{y_K} \end{pmatrix}$$
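Putting the pieces together, a minimal full step (illustrative; Yb is B x K one-hot labels, lr the step size):

import numpy as np

def softmax_sgd_step(W, Xb, Yb, lr=0.1):
    a = np.exp(Xb @ W)
    prob = a / a.sum(axis=1, keepdims=True)  # B x K predicted class probabilities
    dy = Yb - prob                           # error for each example and each class
    W += lr * Xb.T @ dy                      # the gradient step: a d x K update
    return W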