Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

Shanshan Wu¹, Alex Dimakis¹, Sujay Sanghavi¹, Felix Yu², Dan Holtmann-Rice², Dmitry Storcheus², Afshin Rostamizadeh², Sanjiv Kumar²
¹University of Texas at Austin, ²Google Research

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189
Motivation

• Goal: create good representations for sparse data
  • Amazon employee dataset: $d = 15k$, nnz = 9
  • RCV1 text dataset: $d = 47k$, nnz = 76
  • Wiki multi-label dataset: $d = 31k$, nnz = 19
• One-hot encoded categorical data + text parts
• eXtreme Multi-label Learning (XML): multiple labels per item, drawn from a very large set of labels
• Unlike image/video data, there is no notion of spatial/temporal locality, so CNNs do not apply
• Instead: reduce the dimensionality via a linear sketch/embedding
• Want: beyond sparsity, learn additional structure
Representing vectors in low dimension

• $A \in \mathbb{R}^{m \times d}$: measurement matrix
• Encode (linear): $x \in \mathbb{R}^d \mapsto y = Ax \in \mathbb{R}^m$, with $m < d$
• Recover (linear): $\hat{x} \approx x$
• If we ask for linear compression and linear recovery, what are the best learned measurement/reconstruction matrices for the $\ell_2$ norm? PCA (see the sketch below).
• But if $x$ is sparse, we can do better.
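A minimal numpy sketch of this PCA baseline (the toy data and variable names are ours, not from the paper): the top-$m$ principal directions give a linear encode/decode pair that is optimal under squared error.

```python
import numpy as np

# Toy data: rows of X are points in R^d; we compress to R^m with m < d.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))     # n = 1000 samples, d = 50
m = 10

Xc = X - X.mean(axis=0)                 # center before PCA
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
A = Vt[:m]                              # top-m principal directions = "measurement" matrix
Y = Xc @ A.T                            # linear encode: y = A x
X_hat = Y @ A                           # linear decode: x_hat = A^T y

print("mean squared reconstruction error:", np.mean((Xc - X_hat) ** 2))
```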
Compressed Sensing (Donoho; Candès et al.; …)

• $A \in \mathbb{R}^{m \times d}$: measurement matrix
• Compress (linear): $x \in \mathbb{R}^d \mapsto y = Ax \in \mathbb{R}^m$, with $m < d$
• Recover by convex optimization: $\ell_1$-min, Lasso, …
  • $\ell_1$-min: $f(A, y) := \arg\min_{x'} \|x'\|_1 \ \text{s.t.}\ Ax' = y$
• Near-perfect recovery for sparse vectors, provably for a Gaussian random $A$ (a toy decoding sketch follows this list).

Questions:
1. What if our vectors are sparse, plus have additional unknown structure (e.g., one-hot encoded features, text + features, XML, etc.)?
2. Can we LEARN a measurement matrix $A$?
3. Can we make it work well with a convex-optimization decoder?
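To make the decoder concrete, here is a minimal $\ell_1$-min sketch using cvxpy (our choice of solver wrapper; the dimensions and names are ours, not the paper's code). With enough Gaussian measurements, the $k$-sparse signal is recovered essentially exactly.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 200, 40, 5                          # ambient dim, measurements, sparsity
x = np.zeros(d)
x[rng.choice(d, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal

A = rng.standard_normal((m, d)) / np.sqrt(m)  # Gaussian measurement matrix
y = A @ x                                     # linear compression

# f(A, y) := argmin ||x'||_1  s.t.  A x' = y
x_prime = cp.Variable(d)
cp.Problem(cp.Minimize(cp.norm1(x_prime)), [A @ x_prime == y]).solve()
print("recovery error:", np.linalg.norm(x - x_prime.value))  # ~0 for large enough m
```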
[Figure: comparison of recovery performance, plotting the fraction of exactly recovered points (exact recovery: $\|x - \hat{x}\|_2 \le 10^{-10}$) against the number of measurements $m$. Learned measurements + $\ell_1$-min decoding [our method] outperforms both Gaussian measurements + model-based CoSaMP (annotated: "It has structure knowledge!") and Gaussian measurements + $\ell_1$-min decoding.]
Learning a measurement matrix

• Training data: $n$ sparse vectors $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$
• Encode (linear): $y = Ax_i$, with $A \in \mathbb{R}^{m \times d}$
• Recover ($\ell_1$-min): $\hat{x}_i = f(A, Ax_i) \in \mathbb{R}^d$, where $f(A, y) := \arg\min_{x'} \|x'\|_1 \ \text{s.t.}\ Ax' = y$

Objective function:
$\min_{A \in \mathbb{R}^{m \times d}} \sum_{i=1}^{n} \|x_i - f(A, Ax_i)\|_2^2$

Problem: how to compute the gradient w.r.t. $A$? ($f$ is defined as an argmin, so it has no closed form.)
Key idea: replace $f(A, y)$ by a few steps of projected subgradient descent, which are differentiable in $A$ (see the derivation below).
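To see where the unrolled step on the next slide comes from, here is a short derivation sketch under the simplifying assumption $AA^\top = I$ (our simplification, made so the Euclidean projection onto $\{x : Ax = y\}$ takes the affine form $\Pi(v) = v - A^\top(Av - y)$):

```latex
\begin{align*}
x^{(t+1)} &= \Pi\big(x^{(t)} - \alpha_t \operatorname{sign}(x^{(t)})\big)
  && \text{subgradient step on } \|x\|_1 \text{, then project}\\
&= x^{(t)} - \alpha_t \operatorname{sign}(x^{(t)})
   - A^\top\big(Ax^{(t)} - y\big) + \alpha_t A^\top A \operatorname{sign}(x^{(t)})\\
&= x^{(t)} - \alpha_t \big(I - A^\top A\big)\operatorname{sign}(x^{(t)})
  && \text{since } Ax^{(t)} = y \text{, starting from } x^{(1)} = A^\top y.
\end{align*}
```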
$\ell_1$-AE: a novel autoencoder architecture

[Architecture diagram: the encoder is a single linear layer, $y = Ax$. The decoder unrolls $T$ projected-subgradient steps: starting from $x^{(1)} = A^\top y$, each step computes $z_t = \operatorname{sign}(x^{(t)})$, applies the update $-\alpha_t (I - A^\top A) z_t$, and is followed by batch norm (BN); the output is $\hat{x} = \operatorname{ReLU}(x^{(T+1)})$.]

One step of projected subgradient:
$x^{(t+1)} = x^{(t)} - \alpha_t (I - A^\top A)\operatorname{sign}(x^{(t)})$
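A minimal PyTorch sketch of the resulting training procedure (hedged: a simplified rendering with one shared step size and no batch norm, not the paper's implementation; all names and the toy data are ours). Because the decoder is a fixed number of differentiable steps, the reconstruction loss backpropagates to $A$:

```python
import torch

def decode(A, Y, T=10, alpha=0.1):
    """Unrolled decoder: T projected-subgradient steps of l1-min.
    A: (m, d) measurement matrix; Y: (m, batch) measurements."""
    X = A.t() @ Y                              # x^(1) = A^T y
    for _ in range(T):
        Z = torch.sign(X)                      # z_t = sign(x^(t))
        X = X - alpha * (Z - A.t() @ (A @ Z))  # x - alpha (I - A^T A) z
    return torch.relu(X)                       # x_hat = ReLU(x^(T+1))

d, m, n = 200, 40, 256
A = torch.randn(m, d, requires_grad=True)      # the learnable measurement matrix
opt = torch.optim.Adam([A], lr=1e-3)

# Toy sparse training data: 5 nonzeros per column (stand-in for real data).
X_train = torch.zeros(d, n)
X_train[torch.randint(d, (5, n)), torch.arange(n)] = 1.0

for _ in range(200):
    X_hat = decode(A, A @ X_train)             # encode, then unrolled decode
    loss = ((X_train - X_hat) ** 2).sum()      # sum_i ||x_i - f(A, A x_i)||_2^2
    opt.zero_grad(); loss.backward(); opt.step()
```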
Real sparse datasets

[Figure: test RMSE and fraction of exactly recovered test points vs. number of measurements $m$, on three real datasets ($d$ = 15k, nnz = 9; $d$ = 31k, nnz = 19; $d$ = 47k, nnz = 76). Our method performs the best, including against a 2-layer baseline.]
Summary

• Key idea: we learn a compressed sensing measurement matrix by unrolling the projected subgradient of the $\ell_1$-min decoder
• Implemented as an autoencoder, $\ell_1$-AE
• Compared 12 algorithms over 6 datasets (3 synthetic and 3 real)
• Our method achieves perfect reconstruction with 1.1-3x fewer measurements than previous state-of-the-art methods
• Applied to extreme multi-label classification, our method outperforms SLEEC (Bhatia et al., 2015)

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189