Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Greedy and Randomized Projection Methods
Jamie Haddock
UCLA CAM Colloquium,
October 30, 2019
Computational and Applied Mathematics
UCLA
joint with Jesús A. De Loera, Deanna Needell, and Anna Ma
https://arxiv.org/abs/1802.03126 (BIT Numerical Mathematics 2019)
https://arxiv.org/abs/1605.01418 (SISC 2017)
1
BIG Data
2
BIG Data
2
Motivation: Average Consensus
0By Everaldo Coelho (YellowIcon); - All Crystal icons were posted by the author as LGPL on kde-look;, LGPL,
https://commons.wikimedia.org/w/index.php?curid=7288951. 3
Motivation: Average Consensus
0By Everaldo Coelho (YellowIcon); - All Crystal icons were posted by the author as LGPL on kde-look;, LGPL,
https://commons.wikimedia.org/w/index.php?curid=7288951. 3
Motivation: Average Consensus
0By Everaldo Coelho (YellowIcon); - All Crystal icons were posted by the author as LGPL on kde-look;, LGPL,
https://commons.wikimedia.org/w/index.php?curid=7288951. 3
Motivation: Average Consensus
0By Everaldo Coelho (YellowIcon); - All Crystal icons were posted by the author as LGPL on kde-look;, LGPL,
https://commons.wikimedia.org/w/index.php?curid=7288951.
3
Motivation: Average Consensus
AC solution:
argminx1
2‖x− c‖2 s.t.
1 0 −1 0 . . . 00 1 −1 0 . . . 1...
......
.... . .
...
0 0 1 0 . . . −1
x = 0
Gossip Method: Begin with x0 = c ∈ R|V|:1. Choose eij ∈ E .
2. Define x(i)k = x
(j)k =
x(i)k−1+x
(j)k−1
2 .
3. Repeat.
Applications:
. clock synchronization
. PageRank
. opinion formation
. blockchain technology
4
Motivation: Average Consensus
AC solution:
argminx1
2‖x− c‖2 s.t.
1 0 −1 0 . . . 00 1 −1 0 . . . 1...
......
.... . .
...
0 0 1 0 . . . −1
x = 0
Gossip Method: Begin with x0 = c ∈ R|V|:1. Choose eij ∈ E .
2. Define x(i)k = x
(j)k =
x(i)k−1+x
(j)k−1
2 .
3. Repeat.
Applications:
. clock synchronization
. PageRank
. opinion formation
. blockchain technology
4
Motivation: Average Consensus
AC solution:
argminx1
2‖x− c‖2 s.t.
1 0 −1 0 . . . 00 1 −1 0 . . . 1...
......
.... . .
...
0 0 1 0 . . . −1
x = 0
Gossip Method: Begin with x0 = c ∈ R|V|:1. Choose eij ∈ E .
2. Define x(i)k = x
(j)k =
x(i)k−1+x
(j)k−1
2 .
3. Repeat.
Applications:
. clock synchronization
. PageRank
. opinion formation
. blockchain technology
4
Motivation: Average Consensus
AC solution:
argminx1
2‖x− c‖2 s.t.
1 0 −1 0 . . . 00 1 −1 0 . . . 1...
......
.... . .
...
0 0 1 0 . . . −1
x = 0
Gossip Method: Begin with x0 = c ∈ R|V|:1. Choose eij ∈ E .
2. Define x(i)k = x
(j)k =
x(i)k−1+x
(j)k−1
2 .
3. Repeat.
Applications:
. clock synchronization
. PageRank
. opinion formation
. blockchain technology
4
Motivation: Average Consensus
AC solution:
argminx1
2‖x− c‖2 s.t.
1 0 −1 0 . . . 00 1 −1 0 . . . 1...
......
.... . .
...
0 0 1 0 . . . −1
x = 0
Gossip Method: Begin with x0 = c ∈ R|V|:1. Choose eij ∈ E .
2. Define x(i)k = x
(j)k =
x(i)k−1+x
(j)k−1
2 .
3. Repeat.
Applications:
. clock synchronization
. PageRank
. opinion formation
. blockchain technology
4
Motivation: Average Consensus
AC solution:
argminx1
2‖x− c‖2 s.t.
1 0 −1 0 . . . 00 1 −1 0 . . . 1...
......
.... . .
...
0 0 1 0 . . . −1
x = 0
Gossip Method: Begin with x0 = c ∈ R|V|:1. Choose eij ∈ E .
2. Define x(i)k = x
(j)k =
x(i)k−1+x
(j)k−1
2 .
3. Repeat.
Applications:
. clock synchronization
. PageRank
. opinion formation
. blockchain technology
4
Motivation: Average Consensus
AC solution:
argminx1
2‖x− c‖2 s.t.
1 0 −1 0 . . . 00 1 −1 0 . . . 1...
......
.... . .
...
0 0 1 0 . . . −1
x = 0
Gossip Method: Begin with x0 = c ∈ R|V|:1. Choose eij ∈ E .
2. Define x(i)k = x
(j)k =
x(i)k−1+x
(j)k−1
2 .
3. Repeat.
Applications:
. clock synchronization
. PageRank
. opinion formation
. blockchain technology
4
General Setup
We are interested in solving highly overdetermined systems of equations
(or inequalities), Ax = b (Ax ≤ b), where A ∈ Rm×n, b ∈ Rm andm� n. Rows are denoted aTi .
5
Iterative Projection Methods
If {x ∈ Rn : Ax = b} is nonempty, these methods construct anapproximation to a solution:
1. Randomized Kaczmarz Method
2. Motzkin’s Method
3. Sampling Kaczmarz-Motzkin Methods (SKM)
Applications:
1. Tomography (Algebraic Reconstruction Technique)
2. Linear programming
3. Average consensus (greedy gossip with eavesdropping)
6
Iterative Projection Methods
If {x ∈ Rn : Ax = b} is nonempty, these methods construct anapproximation to a solution:
1. Randomized Kaczmarz Method
2. Motzkin’s Method
3. Sampling Kaczmarz-Motzkin Methods (SKM)
Applications:
1. Tomography (Algebraic Reconstruction Technique)
2. Linear programming
3. Average consensus (greedy gossip with eavesdropping)
6
Iterative Projection Methods
If {x ∈ Rn : Ax = b} is nonempty, these methods construct anapproximation to a solution:
1. Randomized Kaczmarz Method
2. Motzkin’s Method
3. Sampling Kaczmarz-Motzkin Methods (SKM)
Applications:
1. Tomography (Algebraic Reconstruction Technique)
2. Linear programming
3. Average consensus (greedy gossip with eavesdropping)
6
Kaczmarz Method
x0
Given x0 ∈ Rn:1. Choose ik ∈ [m] with probability
‖aik ‖2
‖A‖2F.
2. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
3. Repeat.
7
Kaczmarz Method
x0
x1
Given x0 ∈ Rn:1. Choose ik ∈ [m] with probability
‖aik ‖2
‖A‖2F.
2. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
3. Repeat.
7
Kaczmarz Method
x0
x1
x2
Given x0 ∈ Rn:1. Choose ik ∈ [m] with probability
‖aik ‖2
‖A‖2F.
2. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
3. Repeat.
7
Kaczmarz Method
x0
x1
x2
x3
Given x0 ∈ Rn:1. Choose ik ∈ [m] with probability
‖aik ‖2
‖A‖2F.
2. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
3. Repeat.
7
Motzkin’s Method
x0
Given x0 ∈ Rn:1. Choose ik ∈ [m] as
ik := argmaxi∈[m]
|aTi xk−1 − bi |.
2. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
3. Repeat. 8
Motzkin’s Method
x0
x1
Given x0 ∈ Rn:1. Choose ik ∈ [m] as
ik := argmaxi∈[m]
|aTi xk−1 − bi |.
2. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
3. Repeat. 8
Motzkin’s Method
x0
x1x2
Given x0 ∈ Rn:1. Choose ik ∈ [m] as
ik := argmaxi∈[m]
|aTi xk−1 − bi |.
2. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
3. Repeat. 8
Our Hybrid Method (SKM)
x0
Given x0 ∈ Rn:1. Choose τk ⊂ [m] to be a
sample of size β constraints
chosen uniformly at random
among the rows of A.
2. From the β rows, choose
ik := argmaxi∈τk
|aTi xk−1 − bi |.
3. Define
xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
4. Repeat.9
Our Hybrid Method (SKM)
x0
x1
Given x0 ∈ Rn:1. Choose τk ⊂ [m] to be a
sample of size β constraints
chosen uniformly at random
among the rows of A.
2. From the β rows, choose
ik := argmaxi∈τk
|aTi xk−1 − bi |.
3. Define
xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
4. Repeat.9
Our Hybrid Method (SKM)
x0
x1
x2
Given x0 ∈ Rn:1. Choose τk ⊂ [m] to be a
sample of size β constraints
chosen uniformly at random
among the rows of A.
2. From the β rows, choose
ik := argmaxi∈τk
|aTi xk−1 − bi |.
3. Define
xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
4. Repeat.9
Glimpse of HUGE Body of Literature
RK: [Strohmer-Vershynin ’09], [Needell-Srebro-Ward ’16]
Greedy: [Censor ’81], [Nutini et al ’16], [Bai-Wu ’18], [Du-Gao ’19]
Accel.: [Hanke-Niethammer ’90], [Liu-Wright ’16], [Morshed-Islam ’19]
Block: [Popa et al ’12], [Needell-Tropp ’14], [Needell-Zhao-Zouzias ’15],
Sketching: [Gower-Richtarik ’15], [Needell-Rebrova ’19]
Phase retrieval: [Tan-Vershynin ’17], [Jeong-Güntürk ’17]
LP: [Motzkin-Schoenberg ’54], [Agmon ’54], [Goffin ’80], [Chubanov ’12]
10
Experimental Convergence
. β: sample size
. A is 50000× 100 Gaussian matrix, consistent system
. ‘faster’ convergence for larger sample size
11
Experimental Convergence
. β: sample size
. A is 50000× 100 Gaussian matrix, consistent system
. ‘faster’ convergence for larger sample size
11
Experimental Convergence
. β: sample size
. A is 50000× 100 Gaussian matrix, consistent system
. ‘faster’ convergence for larger sample size
11
Convergence Rates
Below are the convergence rates for the methods on a system, Ax = b,
which is consistent with unique solution x, whose rows have been
normalized to have unit norm.
. RK (Strohmer - Vershynin ’09):
E||xk − x||22≤(
1− σ2min(A)
m
)k||x0 − x||22
. MM (Agmon ’54):
‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
. SKM (DeLoera - H. - Needell ’17):
E‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
Why are these all the same?
12
Convergence Rates
Below are the convergence rates for the methods on a system, Ax = b,
which is consistent with unique solution x, whose rows have been
normalized to have unit norm.
. RK (Strohmer - Vershynin ’09):
E||xk − x||22≤(
1− σ2min(A)
m
)k||x0 − x||22
. MM (Agmon ’54):
‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
. SKM (DeLoera - H. - Needell ’17):
E‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
Why are these all the same?
12
Convergence Rates
Below are the convergence rates for the methods on a system, Ax = b,
which is consistent with unique solution x, whose rows have been
normalized to have unit norm.
. RK (Strohmer - Vershynin ’09):
E||xk − x||22≤(
1− σ2min(A)
m
)k||x0 − x||22
. MM (Agmon ’54):
‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
. SKM (DeLoera - H. - Needell ’17):
E‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
Why are these all the same?
12
Convergence Rates
Below are the convergence rates for the methods on a system, Ax = b,
which is consistent with unique solution x, whose rows have been
normalized to have unit norm.
. RK (Strohmer - Vershynin ’09):
E||xk − x||22≤(
1− σ2min(A)
m
)k||x0 − x||22
. MM (Agmon ’54):
‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
. SKM (DeLoera - H. - Needell ’17):
E‖xk − x‖22≤(
1− σ2min(A)
m
)k‖x0 − x‖22
Why are these all the same?12
A Pathological Example
Because.
x0
13
Structure of the Residual
Several works have used sparsity of the residual to improve the
convergence rate of greedy methods.
[De Loera, H., Needell ’17], [Bai, Wu ’18], [Du, Gao ’19]
However, not much sparsity can be expected in most cases. Instead, we’d
like to use dynamic range of the residual to guarantee faster convergence.
γk :=‖Axk − Ax‖2
‖Axk − Ax‖2∞
14
Structure of the Residual
Several works have used sparsity of the residual to improve the
convergence rate of greedy methods.
[De Loera, H., Needell ’17], [Bai, Wu ’18], [Du, Gao ’19]
However, not much sparsity can be expected in most cases. Instead, we’d
like to use dynamic range of the residual to guarantee faster convergence.
γk :=‖Axk − Ax‖2
‖Axk − Ax‖2∞
14
An Accelerated Convergence Rate
Theorem (H. - Needell ’19)
Let x denote the solution of the consistent, normalized system Ax = b.
Motzkin’s method exhibits the (possibly highly accelerated) convergence
rate:
‖xT − x‖2≤T−1∏k=0
(1− σ
2min(A)
4γk
)· ‖x0 − x‖2
Here γk bounds the dynamic range of the kth residual, γk :=‖Axk−Ax‖2‖Axk−Ax‖2∞
.
. improvement over previous result when 4γk < m
15
Netlib LP Systems
200 400 600 800
Iterations
0
2
4
6
8
10
||x
k-x
||2
1013
Motzkin
RK
(1 - min
2/4
k)||x
k-1-x||
2
(1-min
2/m)
k||x
0-x||
2
200 400 600 800 1000 1200
Iterations
0
0.5
1
1.5
2
2.5
||x
k-x
||2
1012
Motzkin
RK
(1 - min
2/4
k)||x
k-1-x||
2
(1-min
2/m)
k||x
0-x||
2
200 400 600 800 1000 1200 1400 1600
Iterations
0
0.5
1
1.5
2
2.5
||x
k-x
||2
1012
Motzkin
RK
(1 - min
2/4
k)||x
k-1-x||
2
(1-min
2/m)
k||x
0-x||
2
100 200 300 400 500 600
Iterations
0
0.5
1
1.5
2
2.5
3
3.5
||x
k-x
||2
104
Motzkin
RK
(1 - min
2/4
k)||x
k-1-x||
2
(1-min
2/m)
k||x
0-x||
2
16
Extending to SKM
Corollary (H. - Ma 2019+)
Let A be normalized so ‖ai‖2= 1 for all rows i = 1, ...,m. If the systemAx = b is consistent with the unique solution x∗ then the SKM method
converges at least linearly in expectation and the rate depends on the
dynamic range of the random sample of rows of A, τj . Precisely, in the
j + 1st iteration of SKM, we have
Eτj‖xj+1 − x∗‖22≤(
1− βσ2min(A)
γjm
)‖xj − x∗‖22
where γj =
∑τj∈([m]β )
‖Aτj xj−bτj ‖22∑
τj∈([m]β )‖Aτj xj−bτj ‖2∞
.
17
Extending to SKM
. A is 50000× 100 Gaussian matrix, consistent system
. bound uses dynamic range of sample of β rows
18
What can we say about γj?
1 ≤ γj ≤ β
↗ ↖Axk − b = ei Axk − b = c1“Best” case “Worst” case
Best Case Worst Case Previous Best Previous Worst
MM 1− σ2min(A)1− σ
2min(A)m
1− σ2min(A)
4
1− σ2min(A)mSKM 1−
βσ2min(A)m 1− σ
2min(A)mRK 1− σ
2min(A)m
Table 1: Contraction terms α such that Eτk ‖ek‖2≤ α‖ek−1‖2.
Nervous? γk ≥ βmσ2min(A) when A is row-normalized
19
What can we say about γj?
1 ≤ γj ≤ β↗ ↖
Axk − b = ei Axk − b = c1
“Best” case “Worst” case
Best Case Worst Case Previous Best Previous Worst
MM 1− σ2min(A)1− σ
2min(A)m
1− σ2min(A)
4
1− σ2min(A)mSKM 1−
βσ2min(A)m 1− σ
2min(A)mRK 1− σ
2min(A)m
Table 1: Contraction terms α such that Eτk ‖ek‖2≤ α‖ek−1‖2.
Nervous? γk ≥ βmσ2min(A) when A is row-normalized
19
What can we say about γj?
1 ≤ γj ≤ β↗ ↖
Axk − b = ei Axk − b = c1“Best” case “Worst” case
Best Case Worst Case Previous Best Previous Worst
MM 1− σ2min(A)1− σ
2min(A)m
1− σ2min(A)
4
1− σ2min(A)mSKM 1−
βσ2min(A)m 1− σ
2min(A)mRK 1− σ
2min(A)m
Table 1: Contraction terms α such that Eτk ‖ek‖2≤ α‖ek−1‖2.
Nervous? γk ≥ βmσ2min(A) when A is row-normalized
19
What can we say about γj?
1 ≤ γj ≤ β↗ ↖
Axk − b = ei Axk − b = c1“Best” case “Worst” case
Best Case Worst Case Previous Best Previous Worst
MM 1− σ2min(A)1− σ
2min(A)m
1− σ2min(A)
4
1− σ2min(A)mSKM 1−
βσ2min(A)m 1− σ
2min(A)mRK 1− σ
2min(A)m
Table 1: Contraction terms α such that Eτk ‖ek‖2≤ α‖ek−1‖2.
Nervous? γk ≥ βmσ2min(A) when A is row-normalized
19
What can we say about γj?
1 ≤ γj ≤ β↗ ↖
Axk − b = ei Axk − b = c1“Best” case “Worst” case
Best Case Worst Case Previous Best Previous Worst
MM 1− σ2min(A)1− σ
2min(A)m
1− σ2min(A)
4
1− σ2min(A)mSKM 1−
βσ2min(A)m 1− σ
2min(A)mRK 1− σ
2min(A)m
Table 1: Contraction terms α such that Eτk ‖ek‖2≤ α‖ek−1‖2.
Nervous?
γk ≥ βmσ2min(A) when A is row-normalized
19
What can we say about γj?
1 ≤ γj ≤ β↗ ↖
Axk − b = ei Axk − b = c1“Best” case “Worst” case
Best Case Worst Case Previous Best Previous Worst
MM 1− σ2min(A)1− σ
2min(A)m
1− σ2min(A)
4
1− σ2min(A)mSKM 1−
βσ2min(A)m 1− σ
2min(A)mRK 1− σ
2min(A)m
Table 1: Contraction terms α such that Eτk ‖ek‖2≤ α‖ek−1‖2.
Nervous? γk ≥ βmσ2min(A) when A is row-normalized
19
γk : Gaussian systems
γk .nβ
log β
20
γk : Gaussian systems
γk .nβ
log β
20
Gaussian Convergence
. A is 50000× 100 Gaussian matrix, consistent system
21
γk : AC systems
γk ≤β∑
eij∈E(x(i)k − x
(j)k )
2
|E|(x (βi )k − x(βj )k )
2
βi , βj : nodes connected by the βth smallest residual
22
Generalizing the Convergence Result
. can immediately generalize to varying β (SKM with βk)
. to generalize to non-normalized A, we need
• a sampling distribution that depends upon ‖ai‖2
• a sampling distribution that depends upon xk
Generalized SKM sampling distribution: px :(
[m]βk
)→ [0, 1)
• t(τ, x) = argmaxt∈τ (atx− bt)2
greedy subresidual choice
• px(τk) =‖at(τk ,x)‖
2∑τ∈([m]βk)
‖at(τ,x)‖2
proportional to norm of selected row
23
Generalizing the Convergence Result
. can immediately generalize to varying β (SKM with βk)
. to generalize to non-normalized A, we need
• a sampling distribution that depends upon ‖ai‖2
• a sampling distribution that depends upon xk
Generalized SKM sampling distribution: px :(
[m]βk
)→ [0, 1)
• t(τ, x) = argmaxt∈τ (atx− bt)2
greedy subresidual choice
• px(τk) =‖at(τk ,x)‖
2∑τ∈([m]βk)
‖at(τ,x)‖2
proportional to norm of selected row
23
Generalizing the Convergence Result
. can immediately generalize to varying β (SKM with βk)
. to generalize to non-normalized A, we need
• a sampling distribution that depends upon ‖ai‖2
• a sampling distribution that depends upon xk
Generalized SKM sampling distribution: px :(
[m]βk
)→ [0, 1)
• t(τ, x) = argmaxt∈τ (atx− bt)2
greedy subresidual choice
• px(τk) =‖at(τk ,x)‖
2∑τ∈([m]βk)
‖at(τ,x)‖2
proportional to norm of selected row
23
Generalizing the Convergence Result
. can immediately generalize to varying β (SKM with βk)
. to generalize to non-normalized A, we need
• a sampling distribution that depends upon ‖ai‖2
• a sampling distribution that depends upon xk
Generalized SKM sampling distribution: px :(
[m]βk
)→ [0, 1)
• t(τ, x) = argmaxt∈τ (atx− bt)2
greedy subresidual choice
• px(τk) =‖at(τk ,x)‖
2∑τ∈([m]βk)
‖at(τ,x)‖2
proportional to norm of selected row
23
Generalizing the Convergence Result
. can immediately generalize to varying β (SKM with βk)
. to generalize to non-normalized A, we need
• a sampling distribution that depends upon ‖ai‖2
• a sampling distribution that depends upon xk
Generalized SKM sampling distribution: px :(
[m]βk
)→ [0, 1)
• t(τ, x) = argmaxt∈τ (atx− bt)2
greedy subresidual choice
• px(τk) =‖at(τk ,x)‖
2∑τ∈([m]βk)
‖at(τ,x)‖2
proportional to norm of selected row
23
Generalizing the Convergence Result
. can immediately generalize to varying β (SKM with βk)
. to generalize to non-normalized A, we need
• a sampling distribution that depends upon ‖ai‖2
• a sampling distribution that depends upon xk
Generalized SKM sampling distribution: px :(
[m]βk
)→ [0, 1)
• t(τ, x) = argmaxt∈τ (atx− bt)2
greedy subresidual choice
• px(τk) =‖at(τk ,x)‖
2∑τ∈([m]βk)
‖at(τ,x)‖2
proportional to norm of selected row
23
Generalizing the Convergence Result
. can immediately generalize to varying β (SKM with βk)
. to generalize to non-normalized A, we need
• a sampling distribution that depends upon ‖ai‖2
• a sampling distribution that depends upon xk
Generalized SKM sampling distribution: px :(
[m]βk
)→ [0, 1)
• t(τ, x) = argmaxt∈τ (atx− bt)2
greedy subresidual choice
• px(τk) =‖at(τk ,x)‖
2∑τ∈([m]βk)
‖at(τ,x)‖2
proportional to norm of selected row
23
Generalized SKM
Given x0 ∈ Rn:
1. Choose τk ∈(
[m]βk
)according to pxk−1 .
2. Choose ik := t(τk , xk−1).
3. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
4. Repeat.
. If βk = 1, this is the distribution in [Strohmer, Vershynin ’09].
. If ‖ai‖2= 1, this is the uniform distribution over(
[m]βk
).
24
Generalized SKM
Given x0 ∈ Rn:
1. Choose τk ∈(
[m]βk
)according to pxk−1 .
2. Choose ik := t(τk , xk−1).
3. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
4. Repeat.
. If βk = 1, this is the distribution in [Strohmer, Vershynin ’09].
. If ‖ai‖2= 1, this is the uniform distribution over(
[m]βk
).
24
Generalized SKM
Given x0 ∈ Rn:
1. Choose τk ∈(
[m]βk
)according to pxk−1 .
2. Choose ik := t(τk , xk−1).
3. Define xk := xk−1 +bik−a
Tikxk−1
||aik ||2aik .
4. Repeat.
. If βk = 1, this is the distribution in [Strohmer, Vershynin ’09].
. If ‖ai‖2= 1, this is the uniform distribution over(
[m]βk
).
24
Generalized Result
Theorem (H. - Ma 2019+)
Let x∗ denote the unique solution to the system of equations Ax = b.
Then generalized SKM converges at least linearly in expectation and the
bound on the rate depends on the dynamic range, γk of the random
sample of βk rows of A, τk . Precisely, in the kth iteration of generalized
SKM, we have
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2.
. If all rows of A have the same norm, then
Eτk‖xk − x∗‖2≤(
1− βkσ2min(A)
γk‖A‖2F
)‖xk−1 − x∗‖2.
25
Generalized Result
Theorem (H. - Ma 2019+)
Let x∗ denote the unique solution to the system of equations Ax = b.
Then generalized SKM converges at least linearly in expectation and the
bound on the rate depends on the dynamic range, γk of the random
sample of βk rows of A, τk . Precisely, in the kth iteration of generalized
SKM, we have
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2.
. If all rows of A have the same norm, then
Eτk‖xk − x∗‖2≤(
1− βkσ2min(A)
γk‖A‖2F
)‖xk−1 − x∗‖2.
25
Sketch of Proof
‖xk − x∗‖2 = ‖xk−1 − x∗‖2−‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
1
Eτk‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
=∑
τ∈([m]βk)
‖at(τ,xk−1)‖2∑π∈([m]βk)
‖at(π,xk−1)‖2‖Aτxk−1 − bτ‖2∞‖at(τ,xk−1)‖2
=1∑
π∈([m]βk)‖at(π,xk−1)‖2
∑τ∈([m]βk)
‖Aτxk−1 − bτ‖2∞
=
(mβk
)βk
γkm∑π∈([m]βk)
‖at(π,xk−1)‖2‖Axk−1 − b‖2
1Originally created by en:User:Michael Hardy, then scaled, with colour and labels being added by en:User:Wapcaplet, transformed in svg
format by fr:Utilisateur:Steff, changed colors and font by de:Leo2004. (https://commons.wikimedia.org/wiki/File:Pythagorean.svg)
26
Sketch of Proof
‖xk − x∗‖2 = ‖xk−1 − x∗‖2−‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
1
Eτk‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
=∑
τ∈([m]βk)
‖at(τ,xk−1)‖2∑π∈([m]βk)
‖at(π,xk−1)‖2‖Aτxk−1 − bτ‖2∞‖at(τ,xk−1)‖2
=1∑
π∈([m]βk)‖at(π,xk−1)‖2
∑τ∈([m]βk)
‖Aτxk−1 − bτ‖2∞
=
(mβk
)βk
γkm∑π∈([m]βk)
‖at(π,xk−1)‖2‖Axk−1 − b‖2
1Originally created by en:User:Michael Hardy, then scaled, with colour and labels being added by en:User:Wapcaplet, transformed in svg
format by fr:Utilisateur:Steff, changed colors and font by de:Leo2004. (https://commons.wikimedia.org/wiki/File:Pythagorean.svg)
26
Sketch of Proof
‖xk − x∗‖2 = ‖xk−1 − x∗‖2−‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
1
Eτk‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
=∑
τ∈([m]βk)
‖at(τ,xk−1)‖2∑π∈([m]βk)
‖at(π,xk−1)‖2‖Aτxk−1 − bτ‖2∞‖at(τ,xk−1)‖2
=1∑
π∈([m]βk)‖at(π,xk−1)‖2
∑τ∈([m]βk)
‖Aτxk−1 − bτ‖2∞
=
(mβk
)βk
γkm∑π∈([m]βk)
‖at(π,xk−1)‖2‖Axk−1 − b‖2
1Originally created by en:User:Michael Hardy, then scaled, with colour and labels being added by en:User:Wapcaplet, transformed in svg
format by fr:Utilisateur:Steff, changed colors and font by de:Leo2004. (https://commons.wikimedia.org/wiki/File:Pythagorean.svg)
26
Sketch of Proof
‖xk − x∗‖2 = ‖xk−1 − x∗‖2−‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
1
Eτk‖Aτkxk−1 − bτk‖2∞‖at(τk ,xk−1)‖2
=∑
τ∈([m]βk)
‖at(τ,xk−1)‖2∑π∈([m]βk)
‖at(π,xk−1)‖2‖Aτxk−1 − bτ‖2∞‖at(τ,xk−1)‖2
=1∑
π∈([m]βk)‖at(π,xk−1)‖2
∑τ∈([m]βk)
‖Aτxk−1 − bτ‖2∞
=
(mβk
)βk
γkm∑π∈([m]βk)
‖at(π,xk−1)‖2‖Axk−1 − b‖2
1Originally created by en:User:Michael Hardy, then scaled, with colour and labels being added by en:User:Wapcaplet, transformed in svg
format by fr:Utilisateur:Steff, changed colors and font by de:Leo2004. (https://commons.wikimedia.org/wiki/File:Pythagorean.svg)
26
Now can we determine the optimal β?
Roughly, if we know the value of γj , we can (just) do it.
27
Now can we determine the optimal β?
Roughly, if we know the value of γj , we can (just) do it.
27
Now can we determine the optimal β?
Roughly, if we know the value of γj , we can (just) do it.
27
Conclusions and Future Work
. presented a convergence rate for SKM which generalizes (and
improves) results for RK and Motzkin
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2
. proved a result which “easily” yields results which illustrate SKM
improvement for specific types of measurement matrices
. specialized our result to Gaussian matrices and AC systems
. identify useful bounds on γk for other useful systems
. identify optimal β of systems for which γk is known
28
Conclusions and Future Work
. presented a convergence rate for SKM which generalizes (and
improves) results for RK and Motzkin
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2
. proved a result which “easily” yields results which illustrate SKM
improvement for specific types of measurement matrices
. specialized our result to Gaussian matrices and AC systems
. identify useful bounds on γk for other useful systems
. identify optimal β of systems for which γk is known
28
Conclusions and Future Work
. presented a convergence rate for SKM which generalizes (and
improves) results for RK and Motzkin
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2
. proved a result which “easily” yields results which illustrate SKM
improvement for specific types of measurement matrices
. specialized our result to Gaussian matrices and AC systems
. identify useful bounds on γk for other useful systems
. identify optimal β of systems for which γk is known
28
Conclusions and Future Work
. presented a convergence rate for SKM which generalizes (and
improves) results for RK and Motzkin
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2
. proved a result which “easily” yields results which illustrate SKM
improvement for specific types of measurement matrices
. specialized our result to Gaussian matrices and AC systems
. identify useful bounds on γk for other useful systems
. identify optimal β of systems for which γk is known
28
Conclusions and Future Work
. presented a convergence rate for SKM which generalizes (and
improves) results for RK and Motzkin
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2
. proved a result which “easily” yields results which illustrate SKM
improvement for specific types of measurement matrices
. specialized our result to Gaussian matrices and AC systems
. identify useful bounds on γk for other useful systems
. identify optimal β of systems for which γk is known
28
Conclusions and Future Work
. presented a convergence rate for SKM which generalizes (and
improves) results for RK and Motzkin
Eτk‖xk − x∗‖2≤(
1−βk(mβk
)σ2min(A)
γkm∑τ∈([m]βk)
‖at(τ,xk−1)‖2)‖xk−1 − x∗‖2
. proved a result which “easily” yields results which illustrate SKM
improvement for specific types of measurement matrices
. specialized our result to Gaussian matrices and AC systems
. identify useful bounds on γk for other useful systems
. identify optimal β of systems for which γk is known
28
Thanks for listening!
Questions?
[1] J. A. De Loera, J. Haddock, and D. Needell. A sampling Kaczmarz-Motzkin
algorithm for linear feasibility. SIAM Journal on Scientific Computing,
39(5):S66–S87, 2017.
[2] J. Haddock and D. Needell. On Motzkins method for inconsistent linear systems.
BIT Numerical Mathematics, 59(2):387–401, 2019.
[3] Nicolas Loizou and Peter Richtárik. Revisiting randomized gossip algorithms:
General framework, convergence rates and novel block and accelerated
protocols. arXiv preprint arXiv:1905.08645, 2019.
[4] T. S. Motzkin and I. J. Schoenberg. The relaxation method for linear
inequalities. Canadian J. Math., 6:393–404, 1954.
[5] T. Strohmer and R. Vershynin. A randomized Kaczmarz algorithm with
exponential convergence. J. Fourier Anal. Appl., 15:262–278, 2009.
29
Block Kaczmarz
30