
Lecture 1: Numerical Issues from Inverse Problems
(Parameter Estimation, Regularization Theory, and Parallel Algorithms)

Youzuo Lin 1

Joint work with: Rosemary A. Renaut 2

Brendt Wohlberg 1

Hongbin Guo 2

1. Los Alamos National Laboratory
2. School of Mathematical and Statistical Sciences, Arizona State University

Graduate Student Workshop on Inverse Problems, 2016

Outline

1 Introduction: Inverse Problem and Regularization

2 Topic 1: Multisplitting for Regularized Least Squares
    Regularization Parallelization
    Multiple Right-Hand-Side Problem

3 Topic 2: Total-Variation Regularization
    Numerical Methods for Total Variation
    Optimal Parameter Selection, UPRE Approach

4 Topic 3: Projected Krylov Subspace Solvers on GPU
    Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU
    Optimizations of Krylov Subspace Algorithms on GPU
    Numerical Results

5 References


Inverse Problems and Ill-posedness

• General Linear Inverse Problem

  b = A f + η,

  where b is the measured data, A is the system (or transform) matrix, and η is the noise; the goal is to recover the input data f.

• Image Restoration - Deconvolution: the transform matrix A is a blurring matrix, η is the noise.

• Image Reconstruction - Tomography: the transform matrix A is the Radon transform, η is the noise.


Direct Inverse Filtering

• How about f = A^{-1} b?

Figure: Left: Blurred and Noisy Image with SNR of 12.21 dB, Right: Restored Image by Direct Inversion with SNR of −11.17 dB


Constrained Least Squares and Regularization Model

Constrained Least Squares:

  min_f { J(f) }   subject to   ||A f − b||_2^2 = σ^2.

Regularization Model:

  min_f { ||A f − b||_2^2 + λ^2 J(f) },   λ > 0,

where ||A f − b||_2^2 is the data fidelity term, J(f) is the regularization term, and λ is the regularization parameter.

Specific Regularization

Tikhonov Regularization: when J(f) is a quadratic functional f^T L^T L f,

  min_f { ||A f − b||_2^2 + λ^2 ||L f||_2^2 },

Total Variation (TV) Regularization: when J(f) is a nonquadratic functional like Total Variation,

  min_f { ||A f − b||_2^2 + λ^2 ||∇f||_1 },

Others: Learning-Based Regularization, Compressive Sensing, etc.
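Since the Tikhonov model above is just a stacked least squares problem once L and λ are fixed, the following minimal NumPy sketch (hypothetical names A, b, L, lam; not code from the lecture) shows how the fidelity block and the penalty block combine into one solve:

import numpy as np

def tikhonov_solve(A, b, L, lam):
    # min_f ||A f - b||_2^2 + lam^2 ||L f||_2^2, solved by stacking
    # the fidelity rows A on top of the scaled penalty rows lam * L.
    K = np.vstack([A, lam * L])
    rhs = np.concatenate([b, np.zeros(L.shape[0])])
    f, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return f

# Toy usage on a small ill-conditioned system:
rng = np.random.default_rng(0)
A = np.vander(np.linspace(0, 1, 50), 20, increasing=True)   # ill-conditioned forward operator
f_true = rng.standard_normal(20)
b = A @ f_true + 1e-3 * rng.standard_normal(50)
f_reg = tikhonov_solve(A, b, np.eye(20), lam=1e-2)          # L = I (standard-form Tikhonov)

The TV model cannot be rewritten this way because ||∇f||_1 is not quadratic; the iterative methods of Topic 2 handle that case.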


Topic 1: Multisplitting for Regularized Least Squares [Renaut, Lin and Guo, 2009]

• Regularization Parallelization
  • Bottleneck for Applications
  • Formulation: Multisplitting and Domain Decomposition
  • Local Solver Selection and Computational Cost
  • Update Scheme

• Multiple Right-Hand Sides Problem
  • A Second Look at Multisplitting
  • Hybrid Regularization with Multisplitting and Multiple Right-Hand Sides
  • Computational Cost Analysis
  • Application in Image Reconstruction and Restoration

• Numerical Results

Tikhonov Solution and Memory Bottleneck

• Solving the TK-regularized problem is equivalent to solving a large-scale linear system:

  (A^T A + λ^2 L^T L) f = A^T b.

• What about the RAM requirement? In a positron emission tomography (PET) image reconstruction example in single precision, for an image of size 256 × 256 with projection angles from 0° to 180° in 2° increments, the RAM required to store A is:

  65536 × 33030 × 4 / 1024^3 ≈ 8.02 GB!

• What can we do about this? We apply the Domain Decomposition idea to our problem.
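A quick back-of-the-envelope check of that storage estimate (assuming 33030 measurement rows, 65536 = 256^2 image pixels, and 4 bytes per single-precision entry):

rows, cols, bytes_per_entry = 33030, 65536, 4          # PET example from the slide
print(rows * cols * bytes_per_entry / 1024**3)         # about 8 GB, too large to hold A in RAM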

Formulation - Multisplitting

Linear multisplitting (LMS): Given a matrix A ∈ R^{n×n} and a collection of matrices M^(i), N^(i), E^(i) ∈ R^{n×n}, i = 1, . . . , p, satisfying

1. A = M^(i) − N^(i) for each i, i = 1, . . . , p;
2. M^(i) is regular (nonsingular), i = 1, . . . , p;
3. E^(i) is a non-negative diagonal matrix, i = 1, . . . , p, and ∑_{i=1}^{p} E^(i) = I;

the collection of triples (M^(i), N^(i), E^(i)), i = 1, . . . , p, is called a multisplitting of A, and the LMS method is defined by the iteration

  x^(k+1) = ∑_{i=1}^{p} E^(i) (M^(i))^{−1} (N^(i) x^(k) + b).
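One concrete multisplitting is block Jacobi: M^(i) agrees with A on the diagonal block owned by split i and with the identity elsewhere, N^(i) = M^(i) − A, and E^(i) selects the same indices. A minimal NumPy sketch (function and variable names are illustrative, not from the paper), assuming A is square with nonsingular diagonal blocks:

import numpy as np

def lms_block_jacobi(A, b, blocks, x0, iters=50):
    # blocks: list of index arrays partitioning {0, ..., n-1}
    x = x0.copy()
    for _ in range(iters):
        x_new = np.zeros_like(x)
        for idx in blocks:
            Mi = A[np.ix_(idx, idx)]                      # local diagonal block of A
            # (N^(i) x^(k) + b) restricted to idx, written without forming N^(i):
            ri = b[idx] - A[idx, :] @ x + Mi @ x[idx]
            x_new[idx] = np.linalg.solve(Mi, ri)          # E^(i) (M^(i))^{-1} (...)
        x = x_new
    return x

# e.g. blocks = np.array_split(np.arange(A.shape[0]), p) for p subdomains.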

Formulation - Domain Decomposition

• Domain decomposition is a particular method of multisplitting in which the submatrices M^(i) are defined to be consistent with a partitioning of the domain:

  x = (x_1^T, x_2^T, . . . , x_p^T)^T.

• e.g.: Different Splitting Schemes


Formulation - Operator Split [Renaut 1998]

• Corresponding to the splitting of the image x, we have a splitting of the kernel operator A,

  A = (A_1, A_2, . . . , A_p).

• The linear system A x ≅ b is replaced with the split systems

  A_i y_i ≅ b_i(x),

  where b_i(x) = b − ∑_{j≠i} A_j x_j = b − A x + A_i x_i.


Formulation - Local Problem

• Local Problem - Solving A x ≅ b turns into solving p subproblems,

  min_{y_i ∈ R^{n_i}} ||A_i y_i − b_i(x)||_2,   1 ≤ i ≤ p.

• Update Scheme - The global solution update from local solutions at step k to step k + 1 is given by

  x^(k+1) = ∑_{i=1}^{p} τ_i^(k+1) (x_local)_i^(k+1),

  where (x_local)_i^(k+1) = ((x_1^(k))^T, . . . , (x_{i−1}^(k))^T, (y_i^(k+1))^T, (x_{i+1}^(k))^T, . . . , (x_p^(k))^T)^T.


Formulation - Regularized Multisplitting

• Tikhonov Regularization:

  min_x { ||A x − b||_2^2 + λ^2 ||L x||_2^2 },   λ > 0,

• is equivalent to

  min_x || [A; Λ L] x − [b; 0] ||_2^2,

  where [A; Λ L] stacks A on top of Λ L, and Λ is block diagonal with entries λ I_{n_i}.


Formulation - Regularized Multisplitting

• Similarly, we have a splitting of the augmented operator,

  [A; Λ L] = ( [A_1; λ_1 L_1]  [A_2; λ_2 L_2]  · · ·  [A_p; λ_p L_p] ).

• The i-th block equation of the normal equations is given by

  (A_i^T A_i + λ_i^2 L_i^T L_i) y_i = A_i^T b_i(x) − λ_i ∑_{j=1, j≠i}^{p} λ_j L_i^T L_j x_j.
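A small NumPy sketch of the i-th block solve above, assuming A and L have been column-partitioned into blocks A_j and L_j matching the partition of x (all names are hypothetical); a dense direct solve stands in here for the iterative local solver discussed next:

import numpy as np

def local_regularized_solve(A_blocks, L_blocks, lams, b, x_blocks, i):
    # (A_i^T A_i + lam_i^2 L_i^T L_i) y_i
    #     = A_i^T b_i(x) - lam_i * sum_{j != i} lam_j L_i^T L_j x_j
    p = len(A_blocks)
    Ai, Li, lam_i = A_blocks[i], L_blocks[i], lams[i]
    bi = b - sum(A_blocks[j] @ x_blocks[j] for j in range(p) if j != i)   # b_i(x)
    rhs = Ai.T @ bi - lam_i * sum(lams[j] * (Li.T @ (L_blocks[j] @ x_blocks[j]))
                                  for j in range(p) if j != i)
    lhs = Ai.T @ Ai + lam_i**2 * (Li.T @ Li)
    return np.linalg.solve(lhs, rhs)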


Local Solver Selection and Computational Cost

• How to select the local solver?
• Computation Cost and Memory Cost
  • Iterative Solver: Conjugate Gradient for Least Squares (CGLS)
  • Computation Cost: C_CGLS ≈ 2 n_i^2 m̃ + K (2(k_i + 1) n_i^2 + 10 k_i n_i)
  • Memory Cost: only matrix-vector multiplications are needed, so only the matrix A_i has to be stored
• Solver: CGLS seems to be reasonable at this point!
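For reference, a compact NumPy version of CGLS as it might be used for the local solves (stopping rule and names are illustrative, not the lecture's code):

import numpy as np

def cgls(A, b, x0=None, tol=1e-8, maxit=200):
    # CG applied to the normal equations A^T A x = A^T b, without forming A^T A.
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    r = b - A @ x                  # data-space residual
    s = A.T @ r                    # normal-equations residual
    p = s.copy()
    s0_norm = np.linalg.norm(s)
    for _ in range(maxit):
        q = A @ p
        alpha = (s @ s) / (q @ q)
        x += alpha * p
        r -= alpha * q
        s_new = A.T @ r
        if np.linalg.norm(s_new) <= tol * s0_norm:
            break
        beta = (s_new @ s_new) / (s @ s)
        p = s_new + beta * p
        s = s_new
    return x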


Four Different Update Schemes

• Block Jacobian Update
  • Set τ_i^(k+1) = τ_i = 1, so
    x^(k+1) = ((y_1^(k))^T, (y_2^(k))^T, . . . , (y_{i−1}^(k))^T, (y_i^(k+1))^T, (y_{i+1}^(k))^T, . . . , (y_p^(k))^T)^T.

• Convex Update
  • Set τ_i^(k+1) = τ_i = 1/p, so
    x^(k+1) = x^(k) + (1/p) (y^(k+1) − x^(k)).

• Local Update
  • x^(k+1) = (x_local)_i^(k+1).

• Optimal τ Update
  • min_τ ||r(x^(k+1))||,
    where r(x^(k+1)) = r(x^(k)) − ∑_{i=1}^{p} τ_i^(k+1) [A_i; λ_i L_i] δ_i^(k+1).
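The optimal-τ update above is itself a tiny least squares problem in the p unknowns τ_i. A hedged NumPy sketch (names are illustrative) that computes the minimizer, given the p update columns c_i = [A_i; λ_i L_i] δ_i^(k+1):

import numpy as np

def optimal_tau(r_k, update_columns):
    # Choose tau minimizing || r(x^k) - sum_i tau_i * c_i ||_2.
    C = np.column_stack(update_columns)      # one column per subdomain
    tau, *_ = np.linalg.lstsq(C, r_k, rcond=None)
    return tau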


A Second Look at the Subproblem

The i-th block equation of the normal equations is given by

  (A_i^T A_i + λ_i^2 L_i^T L_i) y_i = A_i^T b_i(x) − λ_i ∑_{j=1, j≠i}^{p} λ_j L_i^T L_j x_j.

• The system matrix is unchanged from one outer iteration to the next;

A Second Look at the Subproblem

Rewrite the right-hand side (RHS) as

  b_i(x^(k+1)) = b_i(x^(k)) − ∑_{j=1, j≠i}^{p} τ_j^(k+1) B_j^(k+1) + OB_i^(k+1),

where B_j^(k+1) = A_j (x_j^(k+1) − x_j^(k)), and OB_i^(k+1) is the update of the overlapped problem.

• The new RHS is updated based on the previous RHS;

Linear System with Multiple RHS

• Multiple RHS Problem Setup

  A X = B = [b^(1), . . . , b^(s)],

  where A ∈ R^{m×n}, X ∈ R^{n×s} and B ∈ R^{m×s}.

• What about solving the s systems independently?
  • Direct Method (LU): major computational cost ≈ O(n^3) + s · O(n^2);
  • Iterative Method (CGLS): major computational cost ≈ s · O(n^2);

• Several algorithms have been proposed to speed up solving the above system.


Efficient Iterative Algorithm for Solving Multiple RHS

• If b^(1), . . . , b^(s) are random or completely orthogonal to each other, there is no way to speed up the algorithm!

• If b^(1), . . . , b^(s) are close to each other and share information:
  • Galerkin-projection based Krylov subspace methods have been proposed as an efficient solver.
  • We choose CG as the specific Krylov subspace method, as proposed in [Chan and Wan, 1997], because it is well supported theoretically.


Basic Idea of the Projected CG Algorithm for Solving MRHS

• Step 1: Solve the seed system.
  • Select the seed;
  • Apply the CGLS algorithm to the seed system and save the conjugate directions for Step 2.

• Step 2: Solve the remaining systems by projection.
  • Use the conjugate directions from the seed system to span the Krylov subspace;
  • Project the remaining systems onto the Krylov subspace for an approximate solution.

• Step 3: Restart the seed and refine the solution.
  • A new seed may be selected from the remaining systems to restart the procedure;
  • Refine the approximation if the accuracy is insufficient.
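A minimal NumPy sketch of Steps 1-2 for a symmetric positive definite system, in the spirit of [Chan and Wan, 1997] (plain CG rather than CGLS, and all names are illustrative):

import numpy as np

def cg_seed(A, b, maxit=100, tol=1e-8):
    # Solve the seed system with CG and keep the search directions p_i and A p_i.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    P, AP = [], []
    for _ in range(maxit):
        Ap = A @ p
        P.append(p.copy()); AP.append(Ap.copy())
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x, P, AP

def project_onto_seed(b, P, AP):
    # Galerkin projection of a new right-hand side onto the seed's Krylov subspace.
    x = np.zeros_like(b)
    r = b.copy()
    for p, Ap in zip(P, AP):
        alpha = (p @ r) / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
    return x, r    # if ||r|| is still too large, restart/refine with more CG steps (Step 3)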


Adapting to Our Problem: Solving Each Subproblem from RLSMS, Scheme 1

• Scheme 1 (Project/Restart): Treat the i-th subproblem across all outer iterations as a multiple-RHS problem with s = 2, i.e., at iteration k = 1 solve the subproblem as the seed; for iterations k > 1, solve it by projection and refine as necessary.

• Comment: A straightforward adaptation, but the accuracy remaining after projection is usually not sufficient, so extra steps are needed for further refinement.


Adapting to Our Problem: Solving Each Subproblem from RLSMS, Scheme 2

• Scheme 2 (Project/Augment): With the seed selection as in Scheme 1, in the refinement stages of solving the remaining systems, store the newly generated conjugate directions together with those from solving the seeds.

• Comment: The motivation is again to improve the Krylov subspace. By updating the conjugate directions, we are able to add new information that does not come from solving the seed system. Numerical tests indicate this scheme works best on large-scale problems!


Computational Cost Analysis

The computational cost of projected CGLS consists of two components: the cost of solving the seed system, which is the cost of CGLS itself, and the cost of solving the remaining systems, which includes the projection plus the few additional CGLS steps needed:

  C_ProjCG ≈ 2(K + 2 k_i + 2) m n_i + (6(K − 1) k_i + 10 k_i + 1) n_i + µ · C_CG.

1-D Phillips, Restoration Result

Figure: Restoration results using different methods for the Phillips sample of size 1024 × 1, noise level 6%.

1-D Phillips, Global and Local Iteration Result

Figure: Maximum numbers of inner iterations against outer iteration count.


2-D Seismic Reconstruction, Phantom Size 64 × 64

Figure: SNR (Without Splitting): 11.65 dB, SNR (CGLS/Zero): 11.68 dB, SNR (Project/Restart): 11.63 dB, SNR (Project/Augment): 11.60 dB.

2-D Seismic Reconstruction, Global and Local Iteration Result

Figure: Maximum numbers of inner iterations against outer iteration count.


Topic 2: Total-Variation Regularization

• Numerical Methods for Total Variation
  • Iteratively Reweighted Norm Algorithm
  • Lagged Diffusivity Fixed Point Iteration Algorithm

• Optimal Regularization Parameter Selection
  • Trace Approximation
  • UPRE Approach

• Numerical Results

Iteratively Reweighted Norm Algorithm

Approximating the L1 norm by a weighted L2 norm, the TV regularization problem becomes

  min_f { (1/2) ||A f − b||_2^2 + (λ/2) ||W_r^{1/2} D f||_2^2 }

  ⇐⇒ min_f { (1/2) || diag(I, W_r^{1/2}) ( [A; √λ D] f − [b; 0] ) ||_2^2 },

where W_r = diag( 2 ((D_x f)^2 + (D_y f)^2)^{−1/2} ).

This minimization problem can be solved with a CG solver.
Ref: “B. Wohlberg, P. Rodríguez, An Iteratively Reweighted Norm Algorithm for Minimization of Total Variation Functionals, IEEE Signal Processing Letters, 2007.”
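A 1-D NumPy sketch of the IRN idea (constant factors simplified, a dense direct solve in place of the CG inner solver, and all names illustrative), assuming D is a forward-difference operator:

import numpy as np

def irn_tv_1d(A, b, lam, iters=20, eps=1e-8):
    # Repeatedly solve the weighted quadratic problem
    #   min_f ||A f - b||_2^2 + lam ||W^{1/2} D f||_2^2,
    # with W = diag((|D f|^2 + eps)^{-1/2}) recomputed from the previous iterate,
    # so that the weighted L2 penalty approximates lam ||D f||_1 (a 1-D TV term).
    n = A.shape[1]
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]            # forward differences
    f = np.linalg.lstsq(A, b, rcond=None)[0]            # unregularized start
    for _ in range(iters):
        w = 1.0 / np.sqrt((D @ f) ** 2 + eps)           # reweighting from current f
        lhs = A.T @ A + lam * (D.T @ (w[:, None] * D))  # normal equations of the weighted problem
        f = np.linalg.solve(lhs, A.T @ b)
    return f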

Lagged Diffusivity Fixed Point Iteration Algorithm

Approximate the L1 norm by ∑ ψ(f) = ∑ √( (df/dx)^2 + β^2 ). Then

  Gradient = A^T (A f − b) + λ L(f) f,

  Approximate Hessian = A^T A + λ L(f),

where L(f) = Δx D^T diag(ψ′(f)) D. Based on the quasi-Newton idea, we solve for f iteratively by

  f_{k+1} = f_k − (A^T A + λ L(f_k))^{−1} Gradient,

which is called the “Lagged Diffusivity Fixed Point Iteration”.
Ref: “C. R. Vogel, M. E. Oman, Iterative Methods for Total Variation, SIAM J. on Sci. Comp., 17 (1996).”
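A 1-D NumPy sketch of the lagged diffusivity iteration above, using the standard diffusion coefficient 1/√((Df)^2 + β^2) (grid spacing absorbed into λ, dense solves, names illustrative):

import numpy as np

def lagged_diffusivity_1d(A, b, lam, iters=20, beta=1e-3):
    # f_{k+1} = f_k - (A^T A + lam L(f_k))^{-1} (A^T (A f_k - b) + lam L(f_k) f_k),
    # with L(f_k) = D^T diag(1 / sqrt((D f_k)^2 + beta^2)) D.
    n = A.shape[1]
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]        # forward differences
    f = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        coeff = 1.0 / np.sqrt((D @ f) ** 2 + beta**2)
        L = D.T @ (coeff[:, None] * D)              # lagged diffusion operator L(f_k)
        grad = A.T @ (A @ f - b) + lam * (L @ f)
        H = A.T @ A + lam * L                       # approximate Hessian
        f = f - np.linalg.solve(H, grad)
    return f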

Image Deblurring Comparison using TV and Tikhonov

Figure: Left: Original Clean Image, Right: Blurred and Noisy Image

Image Deblurring Comparison using TV and Tikhonov

Figure: Left: TK-based Deblurred Result, Right: TV-based Deblurred Result

Problem Description - Simple Example

(Figure-only slides; no text content.)

Fundamental Idea of UPRE

Derived from the mean squared error (MSE) of the predictive error,

  (1/n) ||P_λ||^2 = (1/n) ||A f_λ − A f_true||^2,

where f_λ is the regularized solution and f_true is the true solution. To avoid using f_true, we define a functional that is an unbiased estimator of the above and name it UPRE (Unbiased Predictive Risk Estimator):

  λ_opt = arg min_λ UPRE(λ).

UPRE for Tikhonov

Vogel [1] proved that, because the Tikhonov solution depends linearly on the right-hand side, i.e.,

  f_TK = R_TK (b + η),

where R_TK = (A^T A + λ I)^{−1} A^T, we have

  UPRE_Tikhonov(λ) = E( (1/n) ||P_λ||^2 )
                   = (1/n) ||r_λ||^2 + (2 σ^2 / n) trace(A_λ) − σ^2,

where r_λ = A f_λ − b, A_λ = A (A^T A + λ I)^{−1} A^T, and σ^2 is the variance of the noise.
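A small dense NumPy sketch that evaluates UPRE_Tikhonov(λ) for one value of λ by forming the influence matrix explicitly (only sensible for small problems; names and the grid search are illustrative, not the lecture's code):

import numpy as np

def upre_tikhonov(A, b, lam, sigma2):
    # UPRE(lam) = (1/n)||r_lam||^2 + (2 sigma^2 / n) trace(A_lam) - sigma^2
    n = b.size
    R = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)   # R_TK
    f_lam = R @ b
    r = A @ f_lam - b
    trace_Alam = np.trace(A @ R)                                    # trace of the influence matrix
    return (r @ r) / n + (2.0 * sigma2 / n) * trace_Alam - sigma2

# lam_opt over a grid: min(lams, key=lambda l: upre_tikhonov(A, b, l, sigma2))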

UPRE for TV

• Difficulty 1: Nonquadratic penalty functional in the TV term.
• Hint: Linearization approximation!
• In the Lagged Diffusivity approach,

  f_TV = R_TV (b + η),

  where R_TV = (A^T A + (λ/2) L(f_k))^{−1} A^T, and the influence matrix is

  A_λ = A (A^T A + (λ/2) L(f_k))^{−1} A^T.

A_λ can be proved to be symmetric, which gives the same derivation of the UPRE functional for TV.

UPRE for TV

• Difficulty 2: How to compute trace(A_λ) = trace( A (A^T A + (λ/2) L(f_k))^{−1} A^T )?
• Hint: Krylov subspace methods!
• Basic Idea:

  trace(f(M)) ≈ E( u^T f(M) u ),

  where u is a discrete multivariate random vector whose entries take the values −1 and +1 with probability 0.5, and the matrix M is symmetric positive definite (SPD).

UPRE for TV

• Golub [2] pointed out that

  E( u^T f(M) u ) = ∑_{i=1}^{k} ω_i f(θ_i),

  where θ_i are the eigenvalues of M, and ω_i are the squares of the first components of the normalized eigenvectors of M.

• Start the Lanczos procedure on M, and find the most significant eigenvectors until the desired accuracy is reached.
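The randomized part of the idea above can be sketched in a few lines of NumPy (a plain Hutchinson-style estimator with ±1 probe vectors; the Lanczos quadrature of Golub [2] is one way to evaluate each u^T f(M) u when f is a general matrix function; names are illustrative):

import numpy as np

def stochastic_trace(apply_M, n, num_samples=20, seed=0):
    # trace(M) ~ average of u^T (M u) over random +/-1 vectors u;
    # only matrix-vector products with M are required, never M itself.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        u = rng.choice([-1.0, 1.0], size=n)
        total += u @ apply_M(u)
    return total / num_samples

# For trace(A_lambda), apply_M(v) would apply A^T, solve the inner regularized
# system iteratively (e.g. with CG), and then apply A.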

Trace Approximation Results

Figure: Left: Accurate Trace vs. Estimated Trace, Right: Time Cost for Computing the Accurate Trace vs. Time Cost for Computing the Estimated Trace

TV-Based Deblurring Using Optimal and Random Parameters

Figure: Left: Use Optimal Parameter, Right: Use Random Parameter

More General Results for Deblurring

Figure: Plot of optimal SNR and estimated SNR versus original noisy SNR on the satellite image.

More General Results for Denoising

Figure: Plot of optimal SNR and estimated SNR versus original noisy SNR on the satellite image. Ref: “G. Gilboa, N. Sochen, Y. Y. Zeevi, Estimation of optimal PDE-based denoising in the SNR sense, IEEE Transactions on Image Processing 15(8), 2006.”

Conclusions

• Tikhonov Regularization:
  • Pros: easy to implement, cheap cost;
  • Cons: the result is usually smoothed out;

• Total Variation (TV) Regularization:
  • Pros: preserves image edges given a reasonable parameter, produces piecewise-constant results;
  • Cons: expensive cost.


Topic 3: Projected Krylov Subspace Solvers on GPU [Lin and Renaut, 2010]

• Problem Description and Krylov Subspace Solvers
  • 3-D Medical Image Reconstruction
  • Inverse Linear Systems with Multiple Right-hand Sides
  • Conjugate Gradient Algorithm

• Fine-Grained Parallelism Model for Krylov Subspace Solvers on GPU
  • GPU Structure
  • BLAS Performance on GPU
  • CG on GPU (ver 0)

• Optimizations of Krylov Subspace Algorithms on GPU
  • Projected CG
  • Modified Projected CG

• Numerical Results

3-D Medical Image Reconstruction

Figure: Image Slices from a 3-D Scan


Inverse Linear Systems with Multiple Right-hand Sides

• Multiple Right-hand Sides System:

  A X = B = [b^(1), . . . , b^(s)],

  where b^(i) stands for the i-th measurement, corresponding to the i-th slice of a 3-D medical image.

• Tikhonov regularization is applied to the above systems.
• The regularization parameter is assumed to be close enough to the optimal value.

Conjugate Gradient Algorithm

k = 0, r_0 = b − A x_0;
while (||r_k||_2 > TOL * ||b||_2)
    k = k + 1;
    if (k == 1)
        p_1 = r_0;
    else
        β_k = <r_{k−1}, r_{k−1}> / <r_{k−2}, r_{k−2}>;
        p_k = r_{k−1} + β_k p_{k−1};
    end;
    α_k = <r_{k−1}, r_{k−1}> / <p_k, A p_k>;
    x_k = x_{k−1} + α_k p_k;
    r_k = r_{k−1} − α_k A p_k;
end (while).
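A runnable NumPy transcription of the pseudocode above, for reference (a plain CPU sketch; the GPU profiling later breaks CG into exactly these kernels: matrix-vector products, dot products, and SAXPY updates):

import numpy as np

def cg(A, b, x0=None, tol=1e-5, maxit=1000):
    # CG for a symmetric positive definite A, following the slide's recurrence.
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(maxit):
        if np.sqrt(rs_old) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p                       # matrix-vector product (dominant cost)
        alpha = rs_old / (p @ Ap)        # dot products
        x += alpha * p                   # SAXPY-style updates
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x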

GPU Structure - NVIDIA Tesla C1060

Figure: NVIDIA Tesla C1060


GPU Structure - NVIDIA Tesla C1060

Streaming Multiprocessors (SM): 30
Streaming Processors (SP) per SM: 8

Registers: 16 K per SM
Shared Memory: 16 KB per SM
Constant Memory: 64 KB
Global Memory: 4 GB
Bandwidth: 102 GB/sec

Peak GFLOPS
Single Precision: 933
Double Precision: 78

Figure: Structure of the NVIDIA Tesla C1060

BLAS Performance on GPU - Matrix Vector Multiplication

• Testing Environment
  Operating System: Ubuntu, 64-bit
  CPU (Host): Intel Quad Core Xeon E5506, 2.13 GHz
  GPU (Device): NVIDIA Tesla C1060
  Programming on Host: Matlab using Single / Multi Core(s)
  Programming on Device: CUDA + CUBLAS, Matlab + Jacket

BLAS Performance on GPU - Matrix Vector Multiplication

• Testing matrices are selected from Matrix Market

  Benchmark Prob.    Size
  bcsstm19            817
  1138_bus           1138
  bcsstm23           3134
  bcsstk16           4884
  s1rmt3m1           5489
  bcsstk18          15439

BLAS Performance on GPU - Matrix Vector Multiplication

Figure: Left: Time Cost, Right: Speedup Ratio

BLAS Performance on GPU - Matrix Vector Multiplication

Figure: Relative Error

BLAS Performance on GPU - BLAS 1, BLAS 2 and BLAS 3

Figure: GFLOPS Comparison

• Comment: BLAS 3 outperforms BLAS 1 and BLAS 2 significantly.

CG on GPU (ver 0)

• A synthetic problem setup using Chan’s example [Chan and Wan, 1997]:

  A = diag(1, . . . , n), n = 10240, b_i(t) = sin(t + i Δt), i = 1, . . . , n, with Δt = 2π/100.

• TOL = 1.0 × 10^{−5}.
• Results:
  CG on Host: 31.90 sec on a single core, or 17.53 sec on multiple cores.
  CG on Device: 5.49 sec.

CG on GPU (ver 0) - A closer look

Total Time: 5.49 sec
Mat-Vec: 3.98 sec (72.5%)
Dot-Prod: 0.57 sec (10.4%)
SAXPY: 0.16 sec (2.9%)

Comment: Mat-Vec is too expensive!

Projected CG (PrCG) - Algorithm

• By applying the Lanczos-Galerkin projection method to CG, we obtain PrCG [Chan and Wan, 1997].

  r_0^(q,1) = b^(q) − A x_0^(q,1);
  for i = 1 to k
      α_i^(q,1) = <p_i^(1), r_{i−1}^(q,1)> / <p_i^(1), A p_i^(1)>;
      x_i^(q,1) = x_{i−1}^(q,1) + α_i^(q,1) p_i^(1);
      r_i^(q,1) = r_{i−1}^(q,1) − α_i^(q,1) A p_i^(1);
  end (for);
  % Restart the system if needed;

Projected CG (PrCG)

Total Time: 1.91 sec
Mat-Vec: 1.01 sec (52.8%)
Dot-Prod: 0.45 sec (23.6%)
SAXPY: 0.29 sec (15.2%)

Comment: SAXPY and Dot-Prod are becoming relatively expensive.

Augmented Projected CG (APrCG)

Using the orthogonality properties <p_j^(1), A p_i^(1)> = 0 for j ≠ i, and <p_j^(1), r_i^(q,1)> = 0 for j < i, and taking the dot product with p_i^(1) on both sides of

  r_{k+1}^(q,1) = b^(q) − ∑_{i=1}^{k} α_i^(q,1) A p_i^(1),

we can rewrite α_i as

  α_i^(q,1) = <p_i^(1), b^(q)> / <p_i^(1), A p_i^(1)>.

Augmented Projected CG (APrCG) - Algorithm

• Let P_k = [p_1, . . . , p_k] and AP_k = [A p_1, . . . , A p_k].

  r_0^(q,1) = b^(q) − A x_0^(q,1);
  α = <P_k, r_0^(q,1)> ./ diagVec(<P_k, AP_k>);
  Λ = diag(α);
  x^(q,1) = x_0^(q,1) + sum(P_k · Λ);
  r^(q,1) = r_0^(q,1) − sum(AP_k · Λ);
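A NumPy sketch of this batched projection step (names are illustrative). All k coefficients α_i = <p_i, r_0> / <p_i, A p_i> are computed at once, so the projection becomes a handful of matrix-vector products instead of a loop of vector operations, which is the kind of work the earlier BLAS comparison shows running efficiently on the GPU:

import numpy as np

def aprcg_projection(Pk, APk, r0, x0):
    # Pk, APk: n-by-k arrays whose columns are p_i and A p_i from the seed solve.
    alpha = (Pk.T @ r0) / np.sum(Pk * APk, axis=0)   # denominators: diag(Pk^T APk)
    x = x0 + Pk @ alpha
    r = r0 - APk @ alpha
    return x, r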

Augmented Projected CG (APrCG)

Figure: Left: PrCG, Right: APrCG

Numerical Test / 3D Shepp-Logan Phantom

Figure: Slice Show of A 3D Shepp-Logan Phantom


3D Shepp-Logan Phantom - Results

Slice   CG Cost   CG SNR   PrCG Cost   PrCG SNR   APrCG Cost   APrCG SNR   ImpRate
65      17.41     13.67    2.16        13.67      2.06         13.67        4.63%
66      16.34     13.80    2.74        13.83      2.58         13.83        5.84%
67      14.35     13.91    3.11        13.95      2.81         13.95        9.65%
68      13.03     13.80    3.53        13.83      3.46         13.83        1.98%

Table: Reconstruction of four consecutive slices, 65 to 68; the 64th slice is selected as the seed. SNR is in dB, and ImpRate is the cost improvement of APrCG relative to PrCG.

3D Shepp-Logan Phantom - Results

Figure: Reconstruction results on Host and Device, slice index = 66.


References

1. R. A. Renaut, Y. Lin, and H. Guo. “Solving Least Squares Problems by Using a Regularized Multisplitting Method.” Numerical Linear Algebra with Applications, vol. 19, issue 4, 2009.

2. Y. Lin, B. Wohlberg, and H. Guo. “UPRE Method for Total Variation Parameter Selection.” Signal Processing, vol. 90, no. 8, 2010.
