
Lecture 1: Numerical Issues from Inverse Problems
(Parameter Estimation, Regularization Theory, and Parallel Algorithms)

Youzuo Lin 1

Joint work with: Rosemary A. Renaut 2

Brendt Wohlberg 1

Hongbin Guo 2

1. Los Alamos National Laboratory
2. School of Mathematical and Statistical Sciences, Arizona State University

Graduate Student Workshop on Inverse Problems, 2016

Outline

1 Introduction: Inverse Problem and Regularization

2 Topic 1: Multisplitting for Regularized Least Squares
    Regularization Parallelization
    Multiple Right-Hand-Side Problem

3 Topic 2: Total-Variation Regularization
    Numerical Methods for Total Variation
    Optimal Parameter Selection, UPRE Approach

4 Topic 3: Projected Krylov Subspace Solvers on GPU
    Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU
    Optimizations of Krylov Subspace Algorithms on GPU
    Numerical Results

5 References


Inverse Problems and Ill-posedness

• General Linear Inverse Problem

  b = A f + η,

  where b is the measured data, A is the system (or transform) matrix, and η is the noise; the goal is to recover the input data f.

• Image Restoration - Deconvolution: the transform matrix A is a blurring matrix, η is the noise.

• Image Reconstruction - Tomography: the transform matrix A is the Radon transform, η is the noise.


Direct Inverse Filtering

• How about f = A^{-1} b?

Figure: Left: Blurred and Noisy Image with SNR of 12.21 dB, Right: Restored Image by Direct Inversion with SNR of −11.17 dB


Constrained Least Squares and Regularization Model

Constrained Least Squares:

  min_f { J(f) }   subject to   ||A f − b||_2^2 = σ^2.

Regularization Model:

  min_f { ||A f − b||_2^2 + λ^2 J(f) },   λ > 0,

where ||A f − b||_2^2 is the data fidelity term, J(f) is the regularization term, and λ is the regularization parameter.

Specific Regularization

Tikhonov Regularization: when J(f) is a quadratic functional f^T L^T L f,

  min_f { ||A f − b||_2^2 + λ^2 ||L f||_2^2 },

Total Variation (TV) Regularization: when J(f) is a nonquadratic functional like Total Variation,

  min_f { ||A f − b||_2^2 + λ^2 ||∇f||_1 },

Others: Learning-Based Regularization, Compressive Sensing, etc.
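Since the Tikhonov model above is just a stacked least squares problem once L and λ are fixed, the following minimal NumPy sketch (hypothetical names A, b, L, lam; not code from the lecture) shows how the fidelity block and the penalty block combine into one solve:

import numpy as np

def tikhonov_solve(A, b, L, lam):
    # min_f ||A f - b||_2^2 + lam^2 ||L f||_2^2, solved by stacking
    # the fidelity rows A on top of the scaled penalty rows lam * L.
    K = np.vstack([A, lam * L])
    rhs = np.concatenate([b, np.zeros(L.shape[0])])
    f, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return f

# Toy usage on a small ill-conditioned system:
rng = np.random.default_rng(0)
A = np.vander(np.linspace(0, 1, 50), 20, increasing=True)   # ill-conditioned forward operator
f_true = rng.standard_normal(20)
b = A @ f_true + 1e-3 * rng.standard_normal(50)
f_reg = tikhonov_solve(A, b, np.eye(20), lam=1e-2)          # L = I (standard-form Tikhonov)

The TV model cannot be rewritten this way because ||∇f||_1 is not quadratic; the iterative methods of Topic 2 handle that case.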


Topic 1: Multisplitting for Regularized Least Squares [Renaut, Lin and Guo, 2009]

• Regularization Parallelization
  • Bottleneck for Applications
  • Formulation: Multisplitting and Domain Decomposition
  • Local Solver Selection and Computational Cost
  • Update Scheme

• Multiple Right-Hand Sides Problem
  • A Second Look at Multisplitting
  • Hybrid Regularization with Multisplitting and Multiple Right-Hand Sides
  • Computational Cost Analysis
  • Application in Image Reconstruction and Restoration

• Numerical Results

Tikhonov Solution and Memory Bottleneck

• Solving the TK-regularized problem is equivalent to solving a large-scale linear system:

  (A^T A + λ^2 L^T L) f = A^T b.

• What about the RAM requirement? In a positron emission tomography (PET) image reconstruction example in single precision, for an image of size 256 × 256 with projection angles from 0° to 180° in 2° increments, the RAM required to store A is:

  65536 × 33030 × 4 / 1024^3 ≈ 8.02 GB!

• What can we do about this? We apply the Domain Decomposition idea to our problem.
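A quick back-of-the-envelope check of that storage estimate (assuming 33030 measurement rows, 65536 = 256^2 image pixels, and 4 bytes per single-precision entry):

rows, cols, bytes_per_entry = 33030, 65536, 4          # PET example from the slide
print(rows * cols * bytes_per_entry / 1024**3)         # about 8 GB, too large to hold A in RAM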

Formulation - Multisplitting

Linear multisplitting (LMS): Given a matrix A ∈ R^{n×n} and a collection of matrices M^(i), N^(i), E^(i) ∈ R^{n×n}, i = 1, . . . , p, satisfying

1. A = M^(i) − N^(i) for each i, i = 1, . . . , p;
2. M^(i) is regular (nonsingular), i = 1, . . . , p;
3. E^(i) is a non-negative diagonal matrix, i = 1, . . . , p, and ∑_{i=1}^{p} E^(i) = I;

the collection of triples (M^(i), N^(i), E^(i)), i = 1, . . . , p, is called a multisplitting of A, and the LMS method is defined by the iteration

  x^(k+1) = ∑_{i=1}^{p} E^(i) (M^(i))^{−1} (N^(i) x^(k) + b).
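One concrete multisplitting is block Jacobi: M^(i) agrees with A on the diagonal block owned by split i and with the identity elsewhere, N^(i) = M^(i) − A, and E^(i) selects the same indices. A minimal NumPy sketch (function and variable names are illustrative, not from the paper), assuming A is square with nonsingular diagonal blocks:

import numpy as np

def lms_block_jacobi(A, b, blocks, x0, iters=50):
    # blocks: list of index arrays partitioning {0, ..., n-1}
    x = x0.copy()
    for _ in range(iters):
        x_new = np.zeros_like(x)
        for idx in blocks:
            Mi = A[np.ix_(idx, idx)]                      # local diagonal block of A
            # (N^(i) x^(k) + b) restricted to idx, written without forming N^(i):
            ri = b[idx] - A[idx, :] @ x + Mi @ x[idx]
            x_new[idx] = np.linalg.solve(Mi, ri)          # E^(i) (M^(i))^{-1} (...)
        x = x_new
    return x

# e.g. blocks = np.array_split(np.arange(A.shape[0]), p) for p subdomains.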

Formulation - Domain Decomposition

• Domain decomposition is a particular method of multisplitting in which the submatrices M^(i) are defined to be consistent with a partitioning of the domain:

  x = (x_1^T, x_2^T, . . . , x_p^T)^T.

• e.g.: Different Splitting Schemes


Formulation - Operator Split [Renaut 1998]

• Corresponding to the splitting of the image x, we have a splitting of the kernel operator A,

  A = (A_1, A_2, . . . , A_p).

• The linear system A x ≅ b is replaced with the split systems

  A_i y_i ≅ b_i(x),

  where b_i(x) = b − ∑_{j≠i} A_j x_j = b − A x + A_i x_i.


Formulation - Local Problem

• Local Problem - Solving A x ≅ b turns into solving p subproblems,

  min_{y_i ∈ R^{n_i}} ||A_i y_i − b_i(x)||_2,   1 ≤ i ≤ p.

• Update Scheme - The global solution update from local solutions at step k to step k + 1 is given by

  x^(k+1) = ∑_{i=1}^{p} τ_i^(k+1) (x_local)_i^(k+1),

  where (x_local)_i^(k+1) = ((x_1^(k))^T, . . . , (x_{i−1}^(k))^T, (y_i^(k+1))^T, (x_{i+1}^(k))^T, . . . , (x_p^(k))^T)^T.


Formulation - Regularized Multisplitting

• Tikhonov Regularization:

  min_x { ||A x − b||_2^2 + λ^2 ||L x||_2^2 },   λ > 0,

• is equivalent to

  min_x || [A; Λ L] x − [b; 0] ||_2^2,

  where [A; Λ L] stacks A on top of Λ L, and Λ is block diagonal with entries λ I_{n_i}.


Formulation - Regularized Multisplitting

• Similarly, we have a splitting of the augmented operator,

  [A; Λ L] = ( [A_1; λ_1 L_1]  [A_2; λ_2 L_2]  · · ·  [A_p; λ_p L_p] ).

• The i-th block equation of the normal equations is given by

  (A_i^T A_i + λ_i^2 L_i^T L_i) y_i = A_i^T b_i(x) − λ_i ∑_{j=1, j≠i}^{p} λ_j L_i^T L_j x_j.
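A small NumPy sketch of the i-th block solve above, assuming A and L have been column-partitioned into blocks A_j and L_j matching the partition of x (all names are hypothetical); a dense direct solve stands in here for the iterative local solver discussed next:

import numpy as np

def local_regularized_solve(A_blocks, L_blocks, lams, b, x_blocks, i):
    # (A_i^T A_i + lam_i^2 L_i^T L_i) y_i
    #     = A_i^T b_i(x) - lam_i * sum_{j != i} lam_j L_i^T L_j x_j
    p = len(A_blocks)
    Ai, Li, lam_i = A_blocks[i], L_blocks[i], lams[i]
    bi = b - sum(A_blocks[j] @ x_blocks[j] for j in range(p) if j != i)   # b_i(x)
    rhs = Ai.T @ bi - lam_i * sum(lams[j] * (Li.T @ (L_blocks[j] @ x_blocks[j]))
                                  for j in range(p) if j != i)
    lhs = Ai.T @ Ai + lam_i**2 * (Li.T @ Li)
    return np.linalg.solve(lhs, rhs)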


Local Solver Selection and Computational Cost

• How to select the local solver?
• Computation Cost and Memory Cost
  • Iterative Solver: Conjugate Gradient for Least Squares (CGLS)
  • Computation Cost: C_CGLS ≈ 2 n_i^2 m̃ + K (2(k_i + 1) n_i^2 + 10 k_i n_i)
  • Memory Cost: only matrix-vector multiplications are needed, so only the matrix A_i has to be stored
• Solver: CGLS seems to be reasonable at this point!
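For reference, a compact NumPy version of CGLS as it might be used for the local solves (stopping rule and names are illustrative, not the lecture's code):

import numpy as np

def cgls(A, b, x0=None, tol=1e-8, maxit=200):
    # CG applied to the normal equations A^T A x = A^T b, without forming A^T A.
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    r = b - A @ x                  # data-space residual
    s = A.T @ r                    # normal-equations residual
    p = s.copy()
    s0_norm = np.linalg.norm(s)
    for _ in range(maxit):
        q = A @ p
        alpha = (s @ s) / (q @ q)
        x += alpha * p
        r -= alpha * q
        s_new = A.T @ r
        if np.linalg.norm(s_new) <= tol * s0_norm:
            break
        beta = (s_new @ s_new) / (s @ s)
        p = s_new + beta * p
        s = s_new
    return x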


Four Different Update Schemes

• Block Jacobian Update
  • Set τ_i^(k+1) = τ_i = 1, so
    x^(k+1) = ((y_1^(k))^T, (y_2^(k))^T, . . . , (y_{i−1}^(k))^T, (y_i^(k+1))^T, (y_{i+1}^(k))^T, . . . , (y_p^(k))^T)^T.

• Convex Update
  • Set τ_i^(k+1) = τ_i = 1/p, so
    x^(k+1) = x^(k) + (1/p) (y^(k+1) − x^(k)).

• Local Update
  • x^(k+1) = (x_local)_i^(k+1).

• Optimal τ Update
  • min_τ ||r(x^(k+1))||,
    where r(x^(k+1)) = r(x^(k)) − ∑_{i=1}^{p} τ_i^(k+1) [A_i; λ_i L_i] δ_i^(k+1).
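The optimal-τ update above is itself a tiny least squares problem in the p unknowns τ_i. A hedged NumPy sketch (names are illustrative) that computes the minimizer, given the p update columns c_i = [A_i; λ_i L_i] δ_i^(k+1):

import numpy as np

def optimal_tau(r_k, update_columns):
    # Choose tau minimizing || r(x^k) - sum_i tau_i * c_i ||_2.
    C = np.column_stack(update_columns)      # one column per subdomain
    tau, *_ = np.linalg.lstsq(C, r_k, rcond=None)
    return tau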


A Second Look at the Subproblem

The i-th block equation of the normal equations is given by

  (A_i^T A_i + λ_i^2 L_i^T L_i) y_i = A_i^T b_i(x) − λ_i ∑_{j=1, j≠i}^{p} λ_j L_i^T L_j x_j.

• The system matrix is unchanged from one outer iteration to the next;

A Second Look at the Subproblem

Rewrite the right-hand side (RHS) as

  b_i(x^(k+1)) = b_i(x^(k)) − ∑_{j=1, j≠i}^{p} τ_j^(k+1) B_j^(k+1) + OB_i^(k+1),

where B_j^(k+1) = A_j (x_j^(k+1) − x_j^(k)), and OB_i^(k+1) is the update of the overlapped problem.

• The new RHS is updated based on the previous RHS;

Linear System with Multiple RHS

• Multiple RHS Problem Setup

  A X = B = [b^(1), . . . , b^(s)],

  where A ∈ R^{m×n}, X ∈ R^{n×s} and B ∈ R^{m×s}.

• What about solving the s systems independently?
  • Direct Method (LU): major computational cost ≈ O(n^3) + s · O(n^2);
  • Iterative Method (CGLS): major computational cost ≈ s · O(n^2);

• Several algorithms have been proposed to speed up solving the above system.


Efficient Iterative Algorithm for Solving Multiple RHS

• If b^(1), . . . , b^(s) are random or completely orthogonal to each other, there is no way to speed up the algorithm!

• If b^(1), . . . , b^(s) are close to each other and share information:
  • Galerkin-projection based Krylov subspace methods have been proposed as an efficient solver.
  • We choose CG as the specific Krylov subspace method, as proposed in [Chan and Wan, 1997], because it is well supported theoretically.


Basic Idea of the Projected CG Algorithm for Solving MRHS

• Step 1: Solve the seed system.
  • Select the seed;
  • Apply the CGLS algorithm to the seed system and save the conjugate directions for Step 2.

• Step 2: Solve the remaining systems by projection.
  • Use the conjugate directions from the seed system to span the Krylov subspace;
  • Project the remaining systems onto the Krylov subspace for an approximate solution.

• Step 3: Restart the seed and refine the solution.
  • A new seed may be selected from the remaining systems to restart the procedure;
  • Refine the approximation if the accuracy is insufficient.
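A minimal NumPy sketch of Steps 1-2 for a symmetric positive definite system, in the spirit of [Chan and Wan, 1997] (plain CG rather than CGLS, and all names are illustrative):

import numpy as np

def cg_seed(A, b, maxit=100, tol=1e-8):
    # Solve the seed system with CG and keep the search directions p_i and A p_i.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    P, AP = [], []
    for _ in range(maxit):
        Ap = A @ p
        P.append(p.copy()); AP.append(Ap.copy())
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x, P, AP

def project_onto_seed(b, P, AP):
    # Galerkin projection of a new right-hand side onto the seed's Krylov subspace.
    x = np.zeros_like(b)
    r = b.copy()
    for p, Ap in zip(P, AP):
        alpha = (p @ r) / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
    return x, r    # if ||r|| is still too large, restart/refine with more CG steps (Step 3)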


Adapting to Our Problem: Solving Each Subproblem from RLSMS, Scheme 1

• Scheme 1 (Project/Restart): Treat the i-th subproblem across all outer iterations as a multiple-RHS problem with s = 2, i.e., at iteration k = 1 solve the subproblem as the seed; for iterations k > 1, solve it by projection and refine as necessary.

• Comment: A straightforward adaptation, but the accuracy remaining after projection is usually not sufficient, so extra steps are needed for further refinement.


Adapting to Our Problem: Solving Each Subproblem from RLSMS, Scheme 2

• Scheme 2 (Project/Augment): With the seed selection as in Scheme 1, in the refinement stages of solving the remaining systems, store the newly generated conjugate directions together with those from solving the seeds.

• Comment: The motivation is again to improve the Krylov subspace. By updating the conjugate directions, we are able to add new information that does not come from solving the seed system. Numerical tests indicate this scheme works best on large-scale problems!


Computational Cost Analysis

The computational cost of projected CGLS consists of two components: the cost of solving the seed system, which is the cost of CGLS itself, and the cost of solving the remaining systems, which includes the projection plus the few additional CGLS steps needed:

  C_ProjCG ≈ 2(K + 2 k_i + 2) m n_i + (6(K − 1) k_i + 10 k_i + 1) n_i + µ · C_CG.

1-D Phillips, Restoration Result

Figure: Restoration results using different methods for the Phillips sample of size 1024 × 1, noise level 6%.

1-D Phillips, Global and Local Iteration Result

Figure: Maximum numbers of inner iterations against outer iteration count.


2-D Seismic Reconstruction, Phantom Size 64 × 64

Figure: SNR (Without Splitting): 11.65 dB, SNR (CGLS/Zero): 11.68 dB, SNR (Project/Restart): 11.63 dB, SNR (Project/Augment): 11.60 dB.

2-D Seismic Reconstruction, Global and Local Iteration Result

Figure: Maximum numbers of inner iterations against outer iteration count.


Topic 2: Total-Variation Regularization

• Numerical Methods for Total Variation
  • Iteratively Reweighted Norm Algorithm
  • Lagged Diffusivity Fixed Point Iteration Algorithm

• Optimal Regularization Parameter Selection
  • Trace Approximation
  • UPRE Approach

• Numerical Results

Iteratively Reweighted Norm Algorithm

Approximating the L1 norm by a weighted L2 norm, the TV regularization problem becomes

  min_f { (1/2) ||A f − b||_2^2 + (λ/2) ||W_r^{1/2} D f||_2^2 }

  ⇐⇒ min_f { (1/2) || diag(I, W_r^{1/2}) ( [A; √λ D] f − [b; 0] ) ||_2^2 },

where W_r = diag( 2 ((D_x f)^2 + (D_y f)^2)^{−1/2} ).

This minimization problem can be solved with a CG solver.
Ref: “B. Wohlberg, P. Rodríguez, An Iteratively Reweighted Norm Algorithm for Minimization of Total Variation Functionals, IEEE Signal Processing Letters, 2007.”
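A 1-D NumPy sketch of the IRN idea (constant factors simplified, a dense direct solve in place of the CG inner solver, and all names illustrative), assuming D is a forward-difference operator:

import numpy as np

def irn_tv_1d(A, b, lam, iters=20, eps=1e-8):
    # Repeatedly solve the weighted quadratic problem
    #   min_f ||A f - b||_2^2 + lam ||W^{1/2} D f||_2^2,
    # with W = diag((|D f|^2 + eps)^{-1/2}) recomputed from the previous iterate,
    # so that the weighted L2 penalty approximates lam ||D f||_1 (a 1-D TV term).
    n = A.shape[1]
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]            # forward differences
    f = np.linalg.lstsq(A, b, rcond=None)[0]            # unregularized start
    for _ in range(iters):
        w = 1.0 / np.sqrt((D @ f) ** 2 + eps)           # reweighting from current f
        lhs = A.T @ A + lam * (D.T @ (w[:, None] * D))  # normal equations of the weighted problem
        f = np.linalg.solve(lhs, A.T @ b)
    return f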

Lagged Diffusivity Fixed Point Iteration Algorithm

Approximate the L1 norm by ∑ ψ(f) = ∑ √( (df/dx)^2 + β^2 ). Then

  Gradient = A^T (A f − b) + λ L(f) f,

  Approximate Hessian = A^T A + λ L(f),

where L(f) = Δx D^T diag(ψ′(f)) D. Based on the quasi-Newton idea, we solve for f iteratively by

  f_{k+1} = f_k − (A^T A + λ L(f_k))^{−1} Gradient,

which is called the “Lagged Diffusivity Fixed Point Iteration”.
Ref: “C. R. Vogel, M. E. Oman, Iterative Methods for Total Variation, SIAM J. on Sci. Comp., 17 (1996).”
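A 1-D NumPy sketch of the lagged diffusivity iteration above, using the standard diffusion coefficient 1/√((Df)^2 + β^2) (grid spacing absorbed into λ, dense solves, names illustrative):

import numpy as np

def lagged_diffusivity_1d(A, b, lam, iters=20, beta=1e-3):
    # f_{k+1} = f_k - (A^T A + lam L(f_k))^{-1} (A^T (A f_k - b) + lam L(f_k) f_k),
    # with L(f_k) = D^T diag(1 / sqrt((D f_k)^2 + beta^2)) D.
    n = A.shape[1]
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]        # forward differences
    f = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        coeff = 1.0 / np.sqrt((D @ f) ** 2 + beta**2)
        L = D.T @ (coeff[:, None] * D)              # lagged diffusion operator L(f_k)
        grad = A.T @ (A @ f - b) + lam * (L @ f)
        H = A.T @ A + lam * L                       # approximate Hessian
        f = f - np.linalg.solve(H, grad)
    return f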

Image Deblurring Comparison using TV and Tikhonov

Figure: Left: Original Clean Image, Right: Blurred and Noisy Image

Image Deblurring Comparison using TV and Tikhonov

Figure: Left: TK-based Deblurred Result, Right: TV-based Deblurred Result

Problem Description - Simple Example

(Figure-only slides; no text content.)

Fundamental Idea of UPRE

Derived from the mean squared error (MSE) of the predictive error,

  (1/n) ||P_λ||^2 = (1/n) ||A f_λ − A f_true||^2,

where f_λ is the regularized solution and f_true is the true solution. To avoid using f_true, we define a functional that is an unbiased estimator of the above and name it UPRE (Unbiased Predictive Risk Estimator):

  λ_opt = arg min_λ UPRE(λ).

UPRE for Tikhonov

Vogel [1] proved that, because the Tikhonov solution depends linearly on the right-hand side, i.e.,

  f_TK = R_TK (b + η),

where R_TK = (A^T A + λ I)^{−1} A^T, we have

  UPRE_Tikhonov(λ) = E( (1/n) ||P_λ||^2 )
                   = (1/n) ||r_λ||^2 + (2 σ^2 / n) trace(A_λ) − σ^2,

where r_λ = A f_λ − b, A_λ = A (A^T A + λ I)^{−1} A^T, and σ^2 is the variance of the noise.
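A small dense NumPy sketch that evaluates UPRE_Tikhonov(λ) for one value of λ by forming the influence matrix explicitly (only sensible for small problems; names and the grid search are illustrative, not the lecture's code):

import numpy as np

def upre_tikhonov(A, b, lam, sigma2):
    # UPRE(lam) = (1/n)||r_lam||^2 + (2 sigma^2 / n) trace(A_lam) - sigma^2
    n = b.size
    R = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)   # R_TK
    f_lam = R @ b
    r = A @ f_lam - b
    trace_Alam = np.trace(A @ R)                                    # trace of the influence matrix
    return (r @ r) / n + (2.0 * sigma2 / n) * trace_Alam - sigma2

# lam_opt over a grid: min(lams, key=lambda l: upre_tikhonov(A, b, l, sigma2))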

UPRE for TV

• Difficulty 1: Nonquadratic penalty functional in the TV term.
• Hint: Linearization approximation!
• In the Lagged Diffusivity approach,

  f_TV = R_TV (b + η),

  where R_TV = (A^T A + (λ/2) L(f_k))^{−1} A^T, and the influence matrix is

  A_λ = A (A^T A + (λ/2) L(f_k))^{−1} A^T.

A_λ can be proved to be symmetric, which gives the same derivation of the UPRE functional for TV.

UPRE for TV

• Difficulty 2: How to compute trace(A_λ) = trace( A (A^T A + (λ/2) L(f_k))^{−1} A^T )?
• Hint: Krylov subspace methods!
• Basic Idea:

  trace(f(M)) ≈ E( u^T f(M) u ),

  where u is a discrete multivariate random vector whose entries take the values −1 and +1 with probability 0.5, and the matrix M is symmetric positive definite (SPD).

UPRE for TV

• Golub [2] pointed out that

  E( u^T f(M) u ) = ∑_{i=1}^{k} ω_i f(θ_i),

  where θ_i are the eigenvalues of M, and ω_i are the squares of the first components of the normalized eigenvectors of M.

• Start the Lanczos procedure on M, and find the most significant eigenvectors until the desired accuracy is reached.
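The randomized part of the idea above can be sketched in a few lines of NumPy (a plain Hutchinson-style estimator with ±1 probe vectors; the Lanczos quadrature of Golub [2] is one way to evaluate each u^T f(M) u when f is a general matrix function; names are illustrative):

import numpy as np

def stochastic_trace(apply_M, n, num_samples=20, seed=0):
    # trace(M) ~ average of u^T (M u) over random +/-1 vectors u;
    # only matrix-vector products with M are required, never M itself.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        u = rng.choice([-1.0, 1.0], size=n)
        total += u @ apply_M(u)
    return total / num_samples

# For trace(A_lambda), apply_M(v) would apply A^T, solve the inner regularized
# system iteratively (e.g. with CG), and then apply A.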

Trace Approximation Results

Figure: Left: Accurate Trace vs. Estimated Trace, Right: Time Cost for Computing the Accurate Trace vs. Time Cost for Computing the Estimated Trace

TV-Based Deblurring Using Optimal and Random Parameters

Figure: Left: Use Optimal Parameter, Right: Use Random Parameter

More General Results for Deblurring

Figure: Plot of optimal SNR and estimated SNR versus original noisy SNR on the satellite image.

More General Results for Denoising

Figure: Plot of optimal SNR and estimated SNR versus original noisy SNR on the satellite image. Ref: “G. Gilboa, N. Sochen, Y. Y. Zeevi, Estimation of optimal PDE-based denoising in the SNR sense, IEEE Transactions on Image Processing 15(8), 2006.”

Conclusions

• Tikhonov Regularization:
  • Pros: easy to implement, cheap cost;
  • Cons: the result is usually smoothed out;

• Total Variation (TV) Regularization:
  • Pros: preserves image edges given a reasonable parameter, produces piecewise-constant results;
  • Cons: expensive cost.


Topic 3: Projected Krylov Subspace Solvers on GPU [Lin and Renaut, 2010]

• Problem Description and Krylov Subspace Solvers
  • 3-D Medical Image Reconstruction
  • Inverse Linear Systems with Multiple Right-hand Sides
  • Conjugate Gradient Algorithm

• Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU
  • GPU Structure
  • BLAS Performance on GPU
  • CG on GPU (ver 0)

• Optimizations of Krylov Subspace Algorithms on GPU
  • Projected CG
  • Modified Projected CG

• Numerical Results

3-D Medical Image Reconstruction

Figure: Image Slices from a 3-D Scan


Inverse Linear Systems with Multiple Right-hand Sides

• Multiple Right-hand Sides System:

  A X = B = [b^(1), . . . , b^(s)],

  where b^(i) stands for the i-th measurement, corresponding to the i-th slice of a 3-D medical image.

• Tikhonov regularization is applied to the above systems.
• The regularization parameter is assumed to be close enough to the optimal value.

Conjugate Gradient Algorithm

k = 0, r_0 = b − A x_0;
while (||r_k||_2 > TOL * ||b||_2)
    k = k + 1;
    if (k == 1)
        p_1 = r_0;
    else
        β_k = <r_{k−1}, r_{k−1}> / <r_{k−2}, r_{k−2}>;
        p_k = r_{k−1} + β_k p_{k−1};
    end;
    α_k = <r_{k−1}, r_{k−1}> / <p_k, A p_k>;
    x_k = x_{k−1} + α_k p_k;
    r_k = r_{k−1} − α_k A p_k;
end (while).
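A runnable NumPy transcription of the pseudocode above, for reference (a plain CPU sketch; the GPU profiling later breaks CG into exactly these kernels: matrix-vector products, dot products, and SAXPY updates):

import numpy as np

def cg(A, b, x0=None, tol=1e-5, maxit=1000):
    # CG for a symmetric positive definite A, following the slide's recurrence.
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(maxit):
        if np.sqrt(rs_old) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p                       # matrix-vector product (dominant cost)
        alpha = rs_old / (p @ Ap)        # dot products
        x += alpha * p                   # SAXPY-style updates
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x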

GPU Structure - NVIDIA Tesla C1060

Figure: NVIDIA Tesla C1060


GPU Structure - NVIDIA Tesla C1060

Streaming Multiprocessors (SM): 30
Streaming Processors (SP) per SM: 8

Registers: 16 K per SM
Shared Memory: 16 KB per SM
Constant Memory: 64 KB
Global Memory: 4 GB
Bandwidth: 102 GB/sec

Peak GFLOPS
Single Precision: 933
Double Precision: 78

Figure: Structure of the NVIDIA Tesla C1060

BLAS Performance on GPU - Matrix Vector Multiplication

• Testing Environment
  Operating System: Ubuntu, 64-bit
  CPU (Host): Intel Quad Core Xeon E5506, 2.13 GHz
  GPU (Device): NVIDIA Tesla C1060
  Programming on Host: Matlab using Single / Multi Core(s)
  Programming on Device: CUDA + CUBLAS, Matlab + Jacket

BLAS Performance on GPU - Matrix Vector Multiplication

• Testing matrices are selected from Matrix Market

  Benchmark Prob.    Size
  bcsstm19            817
  1138_bus           1138
  bcsstm23           3134
  bcsstk16           4884
  s1rmt3m1           5489
  bcsstk18          15439

BLAS Performance on GPU - Matrix Vector Multiplication

Figure: Left: Time Cost, Right: Speedup Ratio

BLAS Performance on GPU - Matrix Vector Multiplication

Figure: Relative Error

BLAS Performance on GPU - BLAS 1, BLAS 2 and BLAS 3

Figure: GFLOPS Comparison

• Comment: BLAS 3 outperforms BLAS 1 and BLAS 2 significantly.

CG on GPU (ver 0)

• A synthetic problem setup using Chan’s example [Chan and Wan, 1997]:

  A = diag(1, . . . , n), n = 10240, b_i(t) = sin(t + i Δt), i = 1, . . . , n, with Δt = 2π/100.

• TOL = 1.0 × 10^{−5}.
• Results:
  CG on Host: 31.90 sec on a single core, or 17.53 sec on multiple cores.
  CG on Device: 5.49 sec.

CG on GPU (ver 0) - A closer look

Total Time: 5.49 sec
Mat-Vec: 3.98 sec (72.5%)
Dot-Prod: 0.57 sec (10.4%)
SAXPY: 0.16 sec (2.9%)

Comment: Mat-Vec is too expensive!

Projected CG (PrCG) - Algorithm

• By applying the Lanczos-Galerkin projection method to CG, we obtain PrCG [Chan and Wan, 1997].

  r_0^(q,1) = b^(q) − A x_0^(q,1);
  for i = 1 to k
      α_i^(q,1) = <p_i^(1), r_{i−1}^(q,1)> / <p_i^(1), A p_i^(1)>;
      x_i^(q,1) = x_{i−1}^(q,1) + α_i^(q,1) p_i^(1);
      r_i^(q,1) = r_{i−1}^(q,1) − α_i^(q,1) A p_i^(1);
  end (for);
  % Restart the system if needed;

Projected CG (PrCG)

Total Time: 1.91 sec
Mat-Vec: 1.01 sec (52.8%)
Dot-Prod: 0.45 sec (23.6%)
SAXPY: 0.29 sec (15.2%)

Comment: SAXPY and Dot-Prod are becoming relatively expensive.

Augmented Projected CG (APrCG)

Using the orthogonality properties <p_j^(1), A p_i^(1)> = 0 for j ≠ i, and <p_j^(1), r_i^(q,1)> = 0 for j < i, and taking the dot product with p_i^(1) on both sides of

  r_{k+1}^(q,1) = b^(q) − ∑_{i=1}^{k} α_i^(q,1) A p_i^(1),

we can rewrite α_i as

  α_i^(q,1) = <p_i^(1), b^(q)> / <p_i^(1), A p_i^(1)>.

Augmented Projected CG (APrCG) - Algorithm

• Let P_k = [p_1, . . . , p_k] and AP_k = [A p_1, . . . , A p_k].

  r_0^(q,1) = b^(q) − A x_0^(q,1);
  α = <P_k, r_0^(q,1)> ./ diagVec(<P_k, AP_k>);
  Λ = diag(α);
  x^(q,1) = x_0^(q,1) + sum(P_k · Λ);
  r^(q,1) = r_0^(q,1) − sum(AP_k · Λ);
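A NumPy sketch of this batched projection step (names are illustrative). All k coefficients α_i = <p_i, r_0> / <p_i, A p_i> are computed at once, so the projection becomes a handful of matrix-vector products instead of a loop of vector operations, which is the kind of work the earlier BLAS comparison shows running efficiently on the GPU:

import numpy as np

def aprcg_projection(Pk, APk, r0, x0):
    # Pk, APk: n-by-k arrays whose columns are p_i and A p_i from the seed solve.
    alpha = (Pk.T @ r0) / np.sum(Pk * APk, axis=0)   # denominators: diag(Pk^T APk)
    x = x0 + Pk @ alpha
    r = r0 - APk @ alpha
    return x, r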

Augmented Projected CG (APrCG)

Figure: Left: PrCG, Right: APrCG

Numerical Test / 3D Shepp-Logan Phantom

Figure: Slice Show of A 3D Shepp-Logan Phantom


3D Shepp-Logan Phantom - Results

Slice   CG Cost   CG SNR   PrCG Cost   PrCG SNR   APrCG Cost   APrCG SNR   ImpRate
65      17.41     13.67    2.16        13.67      2.06         13.67        4.63%
66      16.34     13.80    2.74        13.83      2.58         13.83        5.84%
67      14.35     13.91    3.11        13.95      2.81         13.95        9.65%
68      13.03     13.80    3.53        13.83      3.46         13.83        1.98%

Table: Reconstruction of four consecutive slices, 65 to 68; the 64th slice is selected as the seed. SNR is in dB, and ImpRate is the cost improvement of APrCG relative to PrCG.

3D Shepp-Logan Phantom - Results

Figure: Reconstruction results on Host and Device, slice index = 66.


References

1. R. A. Renaut, Y. Lin, and H. Guo. “Solving Least Squares Problems by Using a Regularized Multisplitting Method.” Numerical Linear Algebra with Applications, vol. 19, issue 4, 2009.

2. Y. Lin, B. Wohlberg, and H. Guo. “UPRE Method for Total Variation Parameter Selection.” Signal Processing, vol. 90, no. 8, 2010.
