
CWP-651

Full waveform inversion with image-guided gradient

Yong Ma¹, Dave Hale¹, Zhaobo (Joe) Meng² & Bin Gong²
¹Center for Wave Phenomena, Colorado School of Mines, Golden, CO 80401, USA
²Seismic Technology, ConocoPhillips Company, Houston, TX 77252, USA

Figure 1. Change of data misfit functions vs. iterations in full waveform inversion and image-guided full waveform inversion.

ABSTRACT

The objective of seismic full waveform inversion (FWI) is to estimate a model of the subsurface that minimizes the difference between recorded seismic data and synthetic data simulated in that model. Although FWI can yield accurate and high-resolution models, multiple problems have prevented widespread application of this technique in practice. First, FWI is computationally intensive, in part because it typically requires many iterations of costly gradient-descent calculations to converge to a solution model. Second, FWI often converges to spurious local minima in the data misfit function of the difference between recorded and synthetic data. Third, FWI is an underdetermined inverse problem with many solutions, most of which may make no geological sense. These problems are related to a typically large number of model parameters and to the absence of low frequencies in recorded data.

FWI with an image-guided gradient mitigates these problems by reducing the number of parameters in the subsurface model. We represent the subsurface model with a sparse set of values, and from these values, we use image-guided interpolation (IGI) to compute finely- and uniformly-sampled gradients of the data misfit function in FWI. Because the interpolation is guided by seismic images, gradients computed in this way conform to geologic structures and subsequently yield models that also agree with subsurface structures. Because of sparse parametrization in the model space, IGI creates models that are more blocky than finely-sampled models, and this blockiness from the model space mitigates the absence of low frequencies in recorded data. A smaller number of parameters to invert also reduces the number of iterations required to converge to a solution model. Tests with a synthetic model and data demonstrate these improvements.

Key words: waveform inversion, image-guided


1 INTRODUCTION

With greater computing power, seismic full waveform inversion (FWI) (Tarantola, 1984; Pratt et al., 1998; Pratt, 1999; Symes, 2008) has become an increasingly practical tool for estimating subsurface parameters, which is the ultimate goal in exploration seismology. FWI iteratively updates an estimated subsurface model and computes corresponding synthetic data to reduce the difference (the data misfit) between the synthetic and recorded data. The FWI technique is attractive in its capability to estimate a subsurface model with generally higher resolution (Operto et al., 2004) than traveltime tomography (Stork, 1992; Woodward, 1992; Vasco & Majer, 1993; Zelt & Barton, 1998) and migration velocity analysis (MVA) (Yilmaz & Chambers, 1984; Sava & Biondi, 2004a,b). In practice, a macromodel generated by traveltime tomography or MVA may serve as a starting model for FWI.

Although FWI has a long history and definite benefits, two obstacles have prevented its widespread application in exploration seismology. One obstacle is computational cost. FWI requires a large number of simulations and reconstructions of seismic wavefields, and its computational cost is proportional to the number of sources (shots). For large 3D models and seismic data sets, these computations may be prohibitive. Therefore, various efforts from different perspectives have been made to reduce the computational cost. One such method is to apply phase-encoding techniques (Krebs et al., 2009) that combine all shots together to form a simultaneous source. The computational cost of FWI using encoding techniques is thereby reduced by a factor roughly equal to the number of shots.

FWI also requires multiple iterations of gradient descent to minimize the data misfit (see Figure 1), and the computational cost is therefore proportional to the number of required iterations. To reduce this number, one may reduce the number of model parameters of the subsurface, for example by representing a finely-sampled model using a sparse set of parameters and some basis functions. Many different compression methods employed for this purpose, such as the Fourier, wavelet, and curvelet transforms, share the same principle of projecting a model into another, sparse domain. Through this sparse representation, one discards unwanted or unresolvable details that could be present in a more complete model. The wavelet transform is a representative technique used in inverse problems (Meng & Scales, 1996). However, such methods do not account for geological structures of the subsurface that may be apparent in seismic images and so may yield models that are geologically unreasonable.

A second obstacle is that the inverse problem posed by FWI has no unique solution. Many different models may yield synthetic data that match recorded data within a reasonable tolerance that accounts for uncertainties and inadequacies in both recorded data and the theory underlying computed synthetic data. In particular, low-wavenumber components of models are often poorly recovered by FWI because corresponding low-frequency content in data is rarely recorded. In practice, it can be difficult to obtain an adequate initial model that is consistent with unrecorded low frequencies. This fact and the nonlinear relationship between model and data in FWI lead to cycle-skipping and local minima, which correspond to models that poorly approximate the subsurface.

To mitigate such problems, multiscale approaches (Bunks, 1995; Sirgue & Pratt, 2004; Boonyasiriwat et al., 2009) have been proposed. These methods recursively add higher-frequency details to models first computed from lower-frequency data. The fidelity of multiscale techniques depends fundamentally on the fidelity of low-frequency content in recorded data. In practice, the low frequencies required to bootstrap a multiscale FWI technique may be unavailable. Other methods for addressing the problems of cycle-skipping and local minima have been proposed as well. These include minimizing data misfit functions in logarithmic and Laplace domains (Shin & Min, 2006; Shin & Ha, 2008).

To obtain better subsurface models, a priori information may be useful. The a priori knowledge can take different forms. For example, both geological and geophysical data, such as those obtained from boreholes, may provide useful a priori constraints. Other useful constraints may be specified shapes and orientations of geologic structures in the subsurface.

Inspired by image-guided interpolation (IGI) (Hale, 2009a), we have proposed the use of structure-oriented metric tensor fields to constrain FWI gradients (Meng, 2009). We first presented this idea at the 2009 SEG post-convention workshop (Meng et al., 2009). In this paper, we show how IGI and its adjoint may be used to calculate and guide gradients, with structural information derived from seismic images as the a priori constraints. We first review basic concepts of FWI and illustrate some practical problems with a synthetic example. We then construct image-guided FWI by incorporating the image-guided interpolation and its adjoint to constrain the calculations of image-guided gradients. Subsurface models computed from these image-guided gradients conform to geologic structures apparent in the seismic images. Synthetic results further demonstrate the effectiveness of image-guided FWI in reducing the number of iterations required for convergence of FWI (see Figure 1).

2 FULL WAVEFORM INVERSION

Full waveform inversion (Tarantola, 2005) uses recorded seismic data d to estimate parameters of a subsurface model m, given a forward operator F that synthesizes data. In FWI, we seek a model m that minimizes the difference d − F(m). In seismic inversion, as for most geophysical inversion problems, the forward data-synthesizing operator F is a non-linear function of model parameters, such as seismic wave velocities.

2.1 FWI as an optimization problem

Unfortunately, the forward operator F has no inverse F⁻¹ for almost any geophysical inverse problem, so we cannot simply invert the model from the data using m = F⁻¹(d). Therefore, FWI is usually formulated as a least-squares optimization problem, in which we compute a model m that minimizes the data misfit function

$$ E(\mathbf{m}) = \frac{1}{2} \left\| \mathbf{d} - F(\mathbf{m}) \right\|^2 , \tag{1} $$

where ‖·‖ denotes an L2 norm. All information in recorded seismic waveforms should, in principle, be taken into account in the data misfit function. Therefore, FWI comprehensively minimizes the difference in traveltimes, amplitudes, converted waves, multiples, etc. between recorded and synthetic data. This all-or-nothing approach distinguishes FWI from other methods, such as traveltime tomography, which focuses only on traveltime differences. Monte Carlo (random) methods (Nocedal & Wright, 2000; Tarantola, 2005) test randomly generated models to find one that minimizes the data misfit function E(m). However, the typically large number of model parameters makes such Monte Carlo methods impractical.
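As a minimal sketch of equation 1, the following evaluates the data misfit for a toy problem. It assumes a small matrix `G` as a hypothetical linear stand-in for the wave-equation operator F; the names `data_misfit`, `G`, and `m_true` are illustrative and do not come from the paper.

```python
import numpy as np

def data_misfit(d, synthetic):
    """Least-squares misfit E = 0.5 * ||d - F(m)||^2 (equation 1)."""
    residual = d - synthetic
    return 0.5 * float(np.dot(residual.ravel(), residual.ravel()))

# Hypothetical toy forward operator: a fixed matrix G standing in for F.
G = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
m_true = np.array([2.0, -1.0])
d = G @ m_true                      # noise-free "recorded" data
m0 = np.zeros(2)                    # initial model

E0 = data_misfit(d, G @ m0)         # misfit of the initial model
E_true = data_misfit(d, G @ m_true) # misfit of the true model
```

For noise-free data the misfit vanishes at the true model, which is the property that gradient-based FWI tries to exploit.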

The gradient descent method is a more practical alternative to a random search. We begin with an initial model m_0, which can be found using other inversion methods (e.g., traveltime tomography or migration velocity analysis); then we use the gradient of the data misfit function, g ≡ ∇_m E = ∂E/∂m, evaluated at m_0 to search locally for a model m = m_0 + δm that reduces the data misfit E(m).

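The Taylor series expansion of equation 1 about the initial model is

$$ E(\mathbf{m}_0 + \delta\mathbf{m}) = E(\mathbf{m}_0) + \delta\mathbf{m}^T \mathbf{g}_0 + \frac{1}{2}\, \delta\mathbf{m}^T \mathbf{H}_0\, \delta\mathbf{m} + \cdots , \tag{2} $$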

where E(m_0) denotes the data misfit evaluated at m_0, g_0 = g(m_0), and H_0 denotes the Hessian matrix composed of the second partial derivatives of E(m), again evaluated at m_0. If we ignore terms higher than second order in equation 2, this Taylor approximation is quadratic in the model perturbation δm, and we can minimize the data misfit E(m) by solving a set of linear equations:

$$ \mathbf{H}_0\, \delta\mathbf{m} = -\mathbf{g}_0 \tag{3} $$

with a solution

$$ \delta\mathbf{m} = -\mathbf{H}_0^{-1} \mathbf{g}_0 . \tag{4} $$

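In Newton's method for minimization of the data misfit E(m), we begin with the initial model m_0 and solve iteratively for

$$ \delta\mathbf{m}_i = -\mathbf{H}_i^{-1} \mathbf{g}_i , \tag{5} $$

and

$$ \mathbf{m}_{i+1} = \mathbf{m}_i - \mathbf{H}_i^{-1} \mathbf{g}_i , \tag{6} $$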

where g_i ≡ g(m_i), and H_i is the Hessian matrix for the model m_i. If we neglect nonlinearity (e.g., multiple scattering) in the forward operator F, we obtain a Gauss-Newton method (Pratt et al., 1998). However, in practice, the large size of the Hessian matrix H_i, which depends on the number of parameters in the model, prevents the application of Newton-like methods.
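The Newton update of equations 3 to 6 can be sketched on a toy problem. The example assumes a hypothetical linear forward operator `G`, for which the misfit is exactly quadratic and the Hessian G^T G is constant, so a single Newton step reaches the minimizer; for the nonlinear operator F of FWI, the Hessian changes with each model and several iterations would be needed.

```python
import numpy as np

# Hypothetical linear stand-in for the forward operator F.
G = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
d = G @ np.array([2.0, -1.0])            # noise-free "recorded" data

def gradient(m):
    # g = dE/dm for E = 0.5 * ||d - G m||^2
    return -G.T @ (d - G @ m)

H = G.T @ G                               # Hessian of E; constant for linear F
m0 = np.zeros(2)
dm = -np.linalg.solve(H, gradient(m0))    # equations 3-4, without forming H^-1
m1 = m0 + dm                              # equation 6
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, but the cost still grows rapidly with the number of model parameters, which is the practical obstacle noted above.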

Alternatively, the model update in equation 6 can be iteratively approximated by replacing the inverse of the Hessian matrix with a scalar step length α_i:

$$ \mathbf{m}_{i+1} = \mathbf{m}_i - \alpha_i \mathbf{h}_i , \tag{7} $$

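where the search direction h_i is determined by conjugate gradients (Vigh & Starr, 2008; Gong et al., 2008):

$$ \begin{aligned} \mathbf{h}_0 &= \mathbf{g}_0 , \\ \beta_i &= \frac{\mathbf{g}_i^T \left( \mathbf{g}_i - \mathbf{g}_{i-1} \right)}{\mathbf{g}_{i-1}^T \mathbf{g}_{i-1}} , \\ \mathbf{h}_i &= \mathbf{g}_i + \beta_i \mathbf{h}_{i-1} . \end{aligned} \tag{8} $$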

In each iteration, we compute the step length α_i using a quadratic line search algorithm (Nocedal & Wright, 2000).
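The direction update of equation 8 is short enough to write out directly. The sketch below is illustrative only; the function name `conjugate_direction` and the numerical gradients are hypothetical.

```python
import numpy as np

def conjugate_direction(g, g_prev=None, h_prev=None):
    """Return the search direction h_i of equation 8."""
    if g_prev is None or h_prev is None:
        return g.copy()                           # h_0 = g_0
    beta = g @ (g - g_prev) / (g_prev @ g_prev)   # beta_i in equation 8
    return g + beta * h_prev

g0 = np.array([1.0, 2.0])
h0 = conjugate_direction(g0)
g1 = np.array([0.5, -1.0])
h1 = conjugate_direction(g1, g0, h0)              # beta = 0.55 for these values
```

Only the current and previous gradients and the previous direction are stored, so the memory cost is a few model-sized vectors rather than a Hessian matrix.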

2.2 Implementation of FWI

A gradient-descent implementation of FWI consists of four steps performed iteratively, beginning with an initial model m_0:

(i) Compute d − F(m_i), the difference between recorded data d and synthetic data F(m_i) computed for the current model m_i;

(ii) Compute the gradient g_i = ∇_m E_i;

(iii) Search for a step length α_i in the conjugate direction h_i;

(iv) Compute the updated model m_{i+1} using equation 7.

Most of the computational cost in this implementation lies in steps (ii) and (iii).
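The four steps above can be sketched as a loop on a toy problem. This is not the RTM-based implementation used in the paper: it assumes a hypothetical linear forward operator `G`, and the helper `quadratic_step`, which fits a parabola through the misfit at three step lengths, is only one simple way to realize a quadratic line search.

```python
import numpy as np

G = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])   # hypothetical linear F
d = G @ np.array([2.0, -1.0])                         # "recorded" data

def misfit(m):
    r = d - G @ m
    return 0.5 * (r @ r)

def quadratic_step(m, h, a1=0.1, a2=0.2):
    # Fit a parabola through E at steps 0, a1, a2 along -h; return its minimizer.
    e0, e1, e2 = misfit(m), misfit(m - a1 * h), misfit(m - a2 * h)
    num = (e1 - e0) * a2**2 - (e2 - e0) * a1**2
    den = (e1 - e0) * a2 - (e2 - e0) * a1
    return 0.5 * num / den

m = np.zeros(2)
g_prev = h_prev = None
history = [misfit(m)]
for i in range(2):
    r = d - G @ m                       # step (i): data residual
    g = -G.T @ r                        # step (ii): gradient of E
    if g_prev is None:
        h = g.copy()                    # h_0 = g_0
    else:
        beta = g @ (g - g_prev) / (g_prev @ g_prev)
        h = g + beta * h_prev           # equation 8
    alpha = quadratic_step(m, h)        # step (iii): step length
    m = m - alpha * h                   # step (iv): equation 7
    g_prev, h_prev = g, h
    history.append(misfit(m))
```

Because this toy misfit is exactly quadratic, the parabola fit is exact and two iterations recover the two model parameters. With the nonlinear wave-equation operator, neither property holds in general, and each misfit evaluation in the line search costs a full set of forward simulations.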

This version of FWI can be implemented both in the time domain (Tarantola, 1984, 1986; Mora, 1989) and in the frequency domain (Pratt, 1999). Perhaps the greatest benefit of using frequency domain FWI is that we can select only a few frequencies for inversion (Sirgue & Pratt, 2004). Unfortunately, this advantage does not extend to inversion for deep subsurface models that require more frequencies. Because the gradient calculation for full waveform inversion is similar to the process of reverse time migration (RTM) (Tarantola & Valette, 1982; Pratt, 1999), a straightforward approach is to perform FWI using an RTM engine. Vigh & Starr (2008) note that the advantages of implementing FWI in the time domain include increased parallelism and reduced memory requirements, thereby making FWI more applicable to large 3D models and data sets. In the examples shown in this report, we used RTM and implemented FWI in the time domain.

2.3 Synthetic example

Figure 2a depicts a subsurface velocity model with two anomalies. One is a low-velocity zone and the other is a high-velocity bar, as shown separately in Figure 2c. We refer to the model in Figure 2a as the true model m. Figure 2b displays the initial model m_0 that we used in FWI; it is simply the true model m without the two anomalies.

To test FWI, we first create data d = F(m) using the true model m. Henceforth, for consistency with the discussion above, we refer to these data as the "recorded" data, even though we compute these noise-free data using the forward operator F, a finite-difference constant-density solution to the 2D acoustic wave equation. A total of 25 shots are evenly distributed on the top surface with an interval of 120 m; the receiver interval is 10 m. The source is a Ricker wavelet with a peak frequency of 15 Hz. For example, Figure 3a shows a common-shot gather for shot number 13 of the recorded data d. Figure 3b shows the corresponding synthetic data F(m_0) computed for the initial model m_0 displayed in Figure 2b. Figure 3c displays the difference d − F(m_0), which is also known as the data residual, that part of the recorded data that cannot be explained by the current model. In the four steps of FWI, computation of this data residual is step (i).

In step (ii), we compute the gradient of the data misfit. As discussed by Tarantola & Valette (1982) and Pratt (1999), this gradient is equal to the output of RTM applied to the data residual shown in Figure 3c, using the current model m_0 shown in Figure 2b. This method for calculating the gradient is also referred to as the adjoint-state method (Tromp et al., 2005). Figure 4a shows the gradient g_0 computed in this way for the first iteration of FWI.

In step (iii), we then compute a step length α_0 that determines how much to change our velocity model in this first iteration. We compute the step length using a quadratic line search algorithm and search in a direction defined by conjugate gradients (Vigh & Starr, 2008; Gong et al., 2008). This line search requires computation of at least 2 synthetic data sets.

Finally, in step (iv), we update the current velocity model according to equation 7. Figure 5a is the change δm in velocity computed in the 1st iteration; in this 1st iteration, this change is simply a scaled version of the gradient computed in step (ii). In subsequent iterations, the iterative four-step FWI process introduces additional details, as indicated by the gradients displayed in Figures 4b and 4c, which correspond to the 2nd and 5th iterations, respectively. Figures 5b and 5c show the corresponding accumulated velocity updates, the difference between the current and initial velocity models.

Figure 2. (a) The LVZ model courtesy of ConocoPhillips, (b) the initial velocity model, and (c) velocity anomalies (one low velocity zone and one high velocity bar) created by subtracting the initial model in (b) from the true model in (a).

Figure 3. (a) The common-shot gather of shot number 13 in the recorded data set, (b) the corresponding synthetic common-shot gather simulated in the initial velocity model (Figure 2b), and (c) the data residual for this shot.

Figure 4. Gradient of the data misfit function in (a) the 1st iteration, (b) the 2nd iteration and (c) the 5th iteration.

After the 1st iteration, the data residual corresponding to shot number 13, as shown in Figure 6a, becomes significantly smaller than that in Figure 3c. However, in subsequent iterations, the data residuals shown in Figures 6b and 6c increase.

In principle, each iteration of FWI should reduce the data misfit E(m), but in the search for a step length α_i, FWI risks producing unsatisfactory models with larger data residuals. Figure 1 plots the data misfit function E(m) as a function of the iteration number. For example, the data residual after the 2nd iteration of FWI is even larger than the residual after the 1st iteration; a similar increase occurs in the 4th iteration. This up-and-down relationship between E(m) and the iteration number has two main causes. First, FWI sometimes fails to find a step length α_i that decreases the data misfit function E(m) within a limited number (e.g., 5 in this paper) of gradient descent trials. We cannot simply stop FWI; to continue, we must provide a step length and hope that FWI can reduce the data misfit function in subsequent iterations. FWI does, in fact, reduce the data residual in the 3rd iteration, but we encounter another increase of the data residual in the 4th iteration. Second, we search along the conjugate direction h_i instead of the gradient direction g_i. The gradient direction guarantees descent of the data misfit function, whereas the conjugate direction may temporarily increase the data residual.
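The failure mode described above, a line search that exhausts its trial budget, can be illustrated with a small sketch. The step-halving strategy and the function `limited_line_search` are assumptions made for illustration; the paper uses a quadratic line search whose details are not reproduced here.

```python
def limited_line_search(misfit, m, h, alpha0=1.0, max_trials=5):
    """Try at most max_trials step lengths along -h, halving after each failure.

    Returns (alpha, found). If no trial lowers the misfit, the smallest step
    tried is returned with found=False, so the caller can still update the
    model and continue iterating.
    """
    e0 = misfit(m)
    alpha = alpha0
    for trial in range(max_trials):
        if misfit(m - alpha * h) < e0:
            return alpha, True
        if trial < max_trials - 1:
            alpha *= 0.5
    return alpha, False

def misfit(m):
    # Hypothetical one-parameter misfit with its minimum at m = 1.
    return (m - 1.0) ** 2

# Descent case: h is the gradient of the misfit at m = 0.
alpha_ok, found_ok = limited_line_search(misfit, 0.0, -2.0)
# Non-descent case: moving along -h can only increase the misfit.
alpha_bad, found_bad = limited_line_search(misfit, 0.0, 2.0)
```

In the second call every trial increases the misfit, so the search returns after 5 evaluations without success. In FWI each such evaluation is a full forward simulation, which is why failed trials are costly.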

Another problem noted in FWI is that, as shown in Figure 5, the accumulated velocity updates produced by FWI contain the imprint of the seismic wavelet; these updates look more like migrated images than reasonable perturbations to our initial velocity model. Because we use a Ricker wavelet with a peak frequency of 15 Hz, which lacks low frequencies, local-minima and cycle-skipping problems may occur in the conventional FWI example above.

3 IMAGE-GUIDED FWI

Conjugate-gradient methods are guaranteed to minimize positive-definite quadratic misfit functions within M iterations, where M is the number of model parameters in the solution vector m (Nocedal & Wright, 2000). More precisely, the convergence rate of a conjugate-gradient method depends on the condition number of the Hessian matrix H (Cohen, 1972; Wheeler & Wilton, 1988). The condition number is the ratio of the largest eigenvalue of the Hessian matrix H to the smallest eigenvalue, and in practice, FWI is usually ill-posed due to a typically large condition number of the Hessian matrix. A large condition number tends to arise when an inverse problem has a large number of model parameters in m, some of which do not cause the data misfit function E(m) to change significantly. If the data misfit function E(m) is insensitive to a change in a model parameter in the solution vector m, the eigenvalue corresponding to this parameter is small and may be nearly zero, thereby yielding a large condition number. In this case, the gradient descent method converges slowly.

Figure 5. Accumulated velocity updates after (a) 1 iteration, (b) 2 iterations and (c) 5 iterations.

Figure 6. Data residual after (a) 1 iteration, (b) 2 iterations and (c) 5 iterations.

Conversely, if FWI only needs to invert a few model parameters, to which the data misfit function is sensitive, we can reduce the condition number of the Hessian matrix and thereby the number of required iterations. Pratt et al. (1998, Appendix A) discuss a point collocation scheme to reparameterize the model space m for this purpose. In this section, our scheme is to use image-guided interpolation (Hale, 2009a) to reduce the number of model parameters in the calculation of the gradient of the data misfit function. We then use this image-guided gradient in FWI.
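The effect of the condition number on iteration count can be seen on a two-parameter quadratic. Note that the sketch uses steepest descent with exact line searches rather than conjugate gradients, because its dependence on the condition number is easy to demonstrate in two dimensions; the diagonal Hessians and the starting point are hypothetical choices, with the start picked as the classic slowest-converging case.

```python
import numpy as np

def steepest_descent_iterations(H, x0, tol=1e-6, max_iter=10000):
    """Minimize 0.5 * x^T H x with exact line searches; count iterations."""
    x = np.array(x0, dtype=float)
    g = H @ x
    g0_norm = np.linalg.norm(g)
    for k in range(1, max_iter + 1):
        alpha = (g @ g) / (g @ (H @ g))   # exact minimizer along -g
        x = x - alpha * g
        g = H @ x
        if np.linalg.norm(g) <= tol * g0_norm:
            return k
    return max_iter

def worst_case_count(kappa):
    H = np.diag([1.0, kappa])             # condition number equals kappa
    x0 = [kappa, 1.0]                     # slowest-converging starting point
    return np.linalg.cond(H), steepest_descent_iterations(H, x0)

cond_good, iters_good = worst_case_count(2.0)
cond_bad, iters_bad = worst_case_count(100.0)
```

With this starting point the gradient norm shrinks by a factor of (κ − 1)/(κ + 1) per iteration, so raising the condition number κ from 2 to 100 increases the iteration count from about a dozen to several hundred. A parameter to which the misfit is nearly insensitive plays the role of the small eigenvalue here.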

3.1 Fewer model parameters

Similar to the point collocation scheme, subspace methods (Kennett et al., 1988; Oldenburg et al., 1993) reconstruct the finely- and uniformly-sampled (dense) model m from a sparse model s that contains a much smaller number of model parameters than the dense model m:

$$ \mathbf{m} = \mathbf{R}\mathbf{s} , \tag{9} $$

where R denotes a linear operator that projects model parameters from the sparse model to the dense model.

Differentiating both sides of equation 9, we have

$$ \delta\mathbf{m} = \mathbf{R}\, \delta\mathbf{s} . \tag{10} $$

Then, substituting equation 10 into equation 5, we can reformulate the inverse problem posed in equation 5, with respect to a smaller number of model parameters in the sparse model s, as

$$ \mathbf{H}_i \mathbf{R}\, \delta\mathbf{s}_i = -\mathbf{g}_i . \tag{11} $$

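However, we cannot solve equation 11 with a solution like δs_i = −(H_i R)⁻¹ g_i in the sparse domain s because equation 11 is overdetermined, i.e., there are more equations than parameters. Instead, multiplying both sides of equation 11 by R^T yields a square system, from which we obtain a solution in the sparse domain s:

$$ \delta\mathbf{s}_i = -\left( \mathbf{R}^T \mathbf{H}_i \mathbf{R} \right)^{-1} \mathbf{R}^T \mathbf{g}_i , \tag{12} $$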

where R^T is the adjoint operator of R. This adjoint operator projects model parameters from the dense model m to the sparse model s.
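Equation 12 can be sketched with small matrices. The example assumes a hypothetical piecewise-constant operator `R` that copies each sparse value into a block of dense samples, together with a toy diagonal Hessian; neither is the image-guided operator developed later in the paper.

```python
import numpy as np

n, k = 8, 2                       # dense and sparse model sizes (toy values)
# Hypothetical R: each sparse value fills a block of 4 dense samples.
R = np.kron(np.eye(k), np.ones((n // k, 1)))       # shape (8, 2)

# Toy symmetric positive-definite Hessian and gradient in the dense space.
H = np.diag(np.linspace(1.0, 8.0, n))
g = np.arange(1.0, n + 1.0)

# Equation 12: solve the k-by-k projected system instead of the n-by-n one.
H_s = R.T @ H @ R                  # projected Hessian R^T H R
ds = -np.linalg.solve(H_s, R.T @ g)
dm = R @ ds                        # equation 10: update in the dense model
```

Here the projected system has only 2 unknowns instead of 8. The dense update `dm` is constant within each block, which illustrates how the choice of R controls the structure of the resulting model update.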

Like equation 7, the model update δs_i can be iteratively approximated by replacing the inverse of the projected Hessian matrix (R^T H_i R) with a scalar step length α_i:

$$ \mathbf{s}_{i+1} = \mathbf{s}_i - \alpha_i \mathbf{h}_i^s , \tag{13} $$

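where the conjugate direction h^s_i is determined by

$$ \begin{aligned} \mathbf{h}_0^s &= \mathbf{R}^T \mathbf{g}_0 , \\ \beta_i &= \frac{\left( \mathbf{R}^T \mathbf{g}_i \right)^T \left( \mathbf{R}^T \mathbf{g}_i - \mathbf{R}^T \mathbf{g}_{i-1} \right)}{\left( \mathbf{R}^T \mathbf{g}_{i-1} \right)^T \mathbf{R}^T \mathbf{g}_{i-1}} = \frac{\mathbf{g}_i^T \mathbf{R}\mathbf{R}^T \left( \mathbf{g}_i - \mathbf{g}_{i-1} \right)}{\mathbf{g}_{i-1}^T \mathbf{R}\mathbf{R}^T \mathbf{g}_{i-1}} , \\ \mathbf{h}_i^s &= \mathbf{R}^T \mathbf{g}_i + \beta_i \mathbf{h}_{i-1}^s . \end{aligned} \tag{14} $$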

In equation 13, the step length can again be obtained with a quadratic line-search method. Equation 14 differs from equation 8 in that the gradient g_i is replaced by R^T g_i, which implies that equation 13 provides a solution for the FWI problem in the sparse domain s. Because fewer model parameters are involved, the projected Hessian matrix (R^T H_i R) can become better-conditioned, and thus equation 13 requires fewer iterations than equation 7 to converge to a solution model s.

As noted in equation 9, we can apply the linear operator R to both sides of equation 13 and thereby project the sparse model update δs_i to obtain the dense model update δm_i:

$$ \mathbf{m}_{i+1} = \mathbf{m}_i - \alpha_i \mathbf{h}_i^m , \tag{15} $$

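where we compute the search direction h^m_i by projecting the sparse conjugate direction h^s_i to the dense domain:

$$ \begin{aligned} \mathbf{h}_0^m &= \mathbf{R}\mathbf{h}_0^s = \mathbf{R}\mathbf{R}^T \mathbf{g}_0 , \\ \beta_i &= \frac{\left( \mathbf{R}^T \mathbf{g}_i \right)^T \left( \mathbf{R}^T \mathbf{g}_i - \mathbf{R}^T \mathbf{g}_{i-1} \right)}{\left( \mathbf{R}^T \mathbf{g}_{i-1} \right)^T \mathbf{R}^T \mathbf{g}_{i-1}} = \frac{\mathbf{g}_i^T \mathbf{R}\mathbf{R}^T \left( \mathbf{g}_i - \mathbf{g}_{i-1} \right)}{\mathbf{g}_{i-1}^T \mathbf{R}\mathbf{R}^T \mathbf{g}_{i-1}} , \\ \mathbf{h}_i^m &= \mathbf{R}\mathbf{R}^T \mathbf{g}_i + \beta_i \mathbf{h}_{i-1}^m . \end{aligned} \tag{16} $$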

Equations 15 and 16 provide a solution for FWI in the dense space m while taking advantage of fewer model parameters.
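Equation 16 translates into a few lines of code. The sketch below assumes a hypothetical piecewise-constant `R` in place of image-guided interpolation, and the function name `image_guided_direction` is illustrative only.

```python
import numpy as np

def image_guided_direction(R, g, g_prev=None, h_prev=None):
    """Equation 16: conjugate direction built from the projected gradient R R^T g."""
    Rg = R.T @ g                                  # gradient in the sparse space
    if g_prev is None or h_prev is None:
        return R @ Rg                             # h^m_0 = R R^T g_0
    Rg_prev = R.T @ g_prev
    beta = Rg @ (Rg - Rg_prev) / (Rg_prev @ Rg_prev)
    return R @ Rg + beta * h_prev

# Hypothetical R: piecewise-constant fill from 2 sparse values to 4 samples.
R = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
g0 = np.array([1.0, 3.0, -2.0, 0.0])
h0 = image_guided_direction(R, g0)
g1 = np.array([0.0, 1.0, 1.0, 1.0])
h1 = image_guided_direction(R, g1, g0, h0)        # beta = 0.25 for these values
```

All inner products are taken between sparse vectors, while the returned direction lives in the dense space and therefore can be used directly in equation 15.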

3.2 Choice of R

The projection operator R can take different forms, including Fourier transform, wavelet transform, cubic splines, etc. Unfortunately, none of these forms accounts for the geological information of the subsurface. In this paper, we implement R with image-guided interpolation (IGI) (Hale, 2009a), which uses metric tensor fields to guide interpolation of a few sparsely scattered data points, making the interpolant conform to structural features in the gradient image.

3.2.1 Image-guided interpolation

The input of IGI is a set of scattered data, a set

$$ \mathcal{F} = \{ f_1, f_2, \ldots, f_K \} $$

of K known sample values f_k ∈ ℝ that correspond to a set

$$ \chi = \{ \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K \} $$

of K known sample points x_k ∈ ℝ^n. Combining these two sets forms a space (e.g., the sparse model s), in which ℱ and χ denote sample values and coordinates, respectively. The result of the interpolation is a function q(x): ℝ^n → ℝ, such that q(x_k) = f_k. Here, the dense model m consists of all interpolation points x and values q(x).

Image-guided interpolation is a two-step process:

$$ \mathbf{R} = \mathbf{Q}\mathbf{P} , \tag{17} $$

where P and Q denote nearest neighbor interpolation and blended neighbor interpolation, respectively. We follow the steps in Hale (2009a) to describe the details of P and Q:

(i) P: solve

$$ \nabla t(\mathbf{x}) \cdot \mathbf{D}(\mathbf{x}) \nabla t(\mathbf{x}) = 1 , \quad \mathbf{x} \notin \chi ; \qquad t(\mathbf{x}) = 0 , \quad \mathbf{x} \in \chi \tag{18} $$

for
t(x): the minimum time from x to the nearest known sample point x_k, and
p(x): the nearest neighbor interpolant corresponding to f_k, the value of the sample point x_k nearest to the point x.

(ii) Q: for a specified constant e ≥ 2 (e.g., e = 4 in this paper), solve

$$ q(\mathbf{x}) - \frac{1}{e} \nabla \cdot t^2(\mathbf{x}) \mathbf{D}(\mathbf{x}) \nabla q(\mathbf{x}) = p(\mathbf{x}) \tag{19} $$

for the blended neighbor interpolant q(x).

In equation 18, the metric tensor field D(x) (van Vliet & Verbeek, 1995; Fehmers & Höcker, 2003) represents structural features of the subsurface, such as structural orientation, coherence, and dimensionality, and therefore the image-guided interpolation result makes geological sense. In n dimensions, each tensor D in the metric tensor field is a symmetric positive-definite n × n matrix (Hale, 2009a). Here, the minimum time t(x) measures a non-Euclidean distance between a sample point x_k and an interpolation point x. By this measurement, we can determine that a sample point x_k is nearest to a point x if the time t(x) to x_k is less than that to any other sample point.
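In one dimension, equation 18 can be solved by direct integration, which makes the role of the metric easy to see. The sketch below is a 1D analogue only, assuming a scalar metric d(x) on a regular grid; the function name `nearest_neighbor_1d`, the metric values, and the sample values are hypothetical, and the multidimensional algorithm of Hale (2009a) is not reproduced here.

```python
import numpy as np

def nearest_neighbor_1d(x, d, sample_idx, sample_vals):
    """1D analogue of step P: times t(x) and nearest-neighbor interpolant p(x).

    In 1D, equation 18 reduces to |dt/dx| = 1/sqrt(d(x)), so the time between
    two points is the integral of 1/sqrt(d) between them.
    """
    slowness = 1.0 / np.sqrt(d)
    # Cumulative time from x[0], by the trapezoid rule on the grid.
    seg = 0.5 * (slowness[1:] + slowness[:-1]) * np.diff(x)
    T = np.concatenate(([0.0], np.cumsum(seg)))
    times = np.abs(T[:, None] - T[sample_idx][None, :])   # shape (n, K)
    nearest = np.argmin(times, axis=1)
    t = times[np.arange(len(x)), nearest]
    p = np.asarray(sample_vals)[nearest]
    return t, p

x = np.linspace(0.0, 10.0, 11)
d = np.where(x < 5.0, 1.0, 100.0)     # hypothetical metric: short times for x >= 5
sample_idx = np.array([0, 10])        # known samples at x = 0 and x = 10
t, p = nearest_neighbor_1d(x, d, sample_idx, [1.5, 3.0])
```

With a Euclidean distance, the boundary between the two samples would lie at the midpoint x = 5. Under this metric, the sample at x = 10 is "nearest" to every grid point from x = 3 onward, which is the non-Euclidean behavior that lets the interpolant follow structure.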

Figure 7. (a) The original Marmousi model, (b) a decimated Marmousi model, with only 0.2% samples remaining, (c) the metric tensor fields illustrated by ellipses, and (d) a Marmousi model produced by image-guided interpolation.

Letting p and q denote vectors that contain all elements in p(x) and q(x), respectively, we can rewrite equation 19 in a matrix-vector form:

$$ \left( \mathbf{I} + \mathbf{B}^T \mathbf{D} \mathbf{B} \right) \mathbf{q} = \mathbf{p} , \tag{20} $$

where B corresponds to a finite-difference approximation of the gradient operator (Hale, 2009b). Therefore, q = Qp, where

$$ \mathbf{Q} = \left( \mathbf{I} + \mathbf{B}^T \mathbf{D} \mathbf{B} \right)^{-1} , \tag{21} $$

and this inverse can be efficiently approximated by conjugate-gradient iterations because I + B^T D B is symmetric and positive-definite (SPD). Intuitively, the nearest neighbor interpolation operator P scatters values f_k from sample points x_k to the interpolation points x, and Q smooths the nearest neighbor interpolant p.
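The conjugate-gradient solution of equation 20 can be sketched in one dimension. The example assumes a forward-difference matrix for B and a diagonal matrix of positive weights for D; both are simplified stand-ins for the operators described in the text, and the function name `blend` is hypothetical.

```python
import numpy as np

def blend(p, D_diag, tol=1e-10, max_iter=500):
    """Solve (I + B^T D B) q = p by conjugate gradients (equation 20) in 1D."""
    n = len(p)
    B = np.diff(np.eye(n), axis=0)               # (n-1, n) forward differences
    A = np.eye(n) + B.T @ (D_diag[:, None] * B)  # SPD system matrix
    q = np.zeros(n)
    r = p - A @ q                                # initial residual
    s = r.copy()                                 # initial search direction
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol:
            break
        As = A @ s
        alpha = rr / (s @ As)
        q += alpha * s
        r -= alpha * As
        rr_new = r @ r
        s = r + (rr_new / rr) * s
        rr = rr_new
    return q, A

p = np.array([1.5, 1.5, 1.5, 3.0, 3.0, 3.0])    # blocky nearest-neighbor input
q, A = blend(p, D_diag=np.full(5, 2.0))
```

The output `q` is a smoothed version of the blocky input `p`. The matrix is formed explicitly here only because the example is tiny; for large grids one would apply B, D, and B^T as operators inside the iteration instead.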

Figure 7 illustrates an example of image-guided interpolation with a Marmousi velocity model. This example demonstrates the power of IGI for reducing the number of model parameters. Figure 7a shows the original Marmousi model with 400 × 500 samples; Figure 7b represents an undersampled Marmousi model, with only 20 × 25 (0.2%) samples remaining; ellipses in Figure 7c indicate the metric tensor field D(x) of the Marmousi model; Figure 7d displays the image-guided interpolation result. With IGI, we can reconstruct the Marmousi model in great detail from only a sparsely-sampled model. In practice, it is more practical to compute the metric tensor field from migrated images.

3.2.2 Adjoint image-guided interpolation

Note that Q^T = Q, so we can configure the adjoint image-guided interpolation as

$$ \mathbf{R}^T = \mathbf{P}^T \mathbf{Q}^T = \mathbf{P}^T \mathbf{Q} . \tag{22} $$

The adjoint operator R^T is again a two-step process:

(i) Q^T or Q: solve equation 19 again to smooth the input image;

(ii) P^T: solve equation 18 for t(x) and gather information from the interpolation points x to the sample points x_k.
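A common check for an operator and its adjoint is the dot-product test, which verifies that the inner product of Rs with m equals the inner product of s with R^T m. The sketch below applies it to small explicit matrices; the assignment of grid points to samples in `nearest` and the constant weight 2.0 are hypothetical, and a dense inverse is used only because the example is tiny.

```python
import numpy as np

n, K = 6, 2
nearest = np.array([0, 0, 0, 1, 1, 1])            # index of nearest sample

# P scatters K sample values to n grid points; P^T gathers them back.
P = np.zeros((n, K))
P[np.arange(n), nearest] = 1.0

B = np.diff(np.eye(n), axis=0)                     # forward differences
Q = np.linalg.inv(np.eye(n) + 2.0 * B.T @ B)       # symmetric smoother

R = Q @ P                                          # equation 17

def R_adjoint(m):
    return P.T @ (Q @ m)                           # equation 22: P^T Q

rng = np.random.default_rng(0)
s = rng.standard_normal(K)
m = rng.standard_normal(n)
lhs = (R @ s) @ m                                  # <R s, m>
rhs = s @ R_adjoint(m)                             # <s, R^T m>
```

The two inner products agree to rounding error, which confirms that smoothing first and gathering second is the adjoint of scattering first and smoothing second.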


3.3 Synthetic example of image-guided FWI

Because we choose image-guided interpolation as the operator to link the dense model m and the sparse model s, we refer to the gradient R R^T g_i in equation 16 as the image-guided gradient. We also refer to implementation of FWI using the image-guided gradient as image-guided FWI, which again consists of four steps performed iteratively, beginning with an initial model m_0:

(i) Compute the data difference d − F(m_i);

(ii) Compute the gradient g_i and the image-guided gradient R R^T g_i;

(iii) Search for a step length α_i in the conjugate direction h^m_i;

(iv) Compute the updated model m_{i+1} using equation 15.

Compared with the four steps of conventional FWI, the only significant difference is the calculation of an image-guided gradient in step (ii). To illustrate the feasibility of image-guided FWI, we test this technique using the previous model with the same experimental settings and compare the image-guided FWI results with conventional FWI results.

In step (i), we start with the same initial model m_0 displayed in Figure 2b, and so we obtain the same data residual d − F(m_0) displayed in Figure 3c.

In step (ii), we first compute the gradient of the data misfit function just as in step (ii) of conventional FWI, and thereby obtain the gradient displayed in Figure 4a, which corresponds to the data residual shown in Figure 3c and the current model m_0 shown in Figure 2b. We then compute the image-guided gradient. To obtain this gradient, one must compute the metric tensor field D(x) that corresponds to the original gradient g_0 of the data misfit function E(m). Because of the structural coincidence between the migrated image and the gradient, we can obtain the metric tensor field D(x) from the migrated image. Figure 8a displays ellipses, overlaid on the migrated image, that indicate the structural orientation of the subsurface. One also needs to choose several sample points, as depicted by red dots in Figure 8b. In this example, we select only 6 samples, two of which are located in the middle of the reflectivities. Figure 8c shows the image-guided gradient R R^T g_0 computed in this way for the 1st iteration of image-guided FWI.

In step (iii), we use the same quadratic line-search algorithm to compute a step length α_0. The search direction is determined by conjugate gradients in equation 16.

Finally, in step (iv), we update the current velocity model according to equation 15. Figure 9a is the change δm in the velocity model computed in the 1st iteration of image-guided FWI; this change is simply a scaled version of the image-guided gradient in step (ii). Figure 10a depicts the data residual of shot number 13 after the 1st iteration; this data residual is the input to step (i) of the next iteration.

Figure 8. (a) The metric tensor field and (b) selected sample locations overlaid on the migrated image. (c) Image-guided gradient R R^T g_0.

On the one hand, image-guided FWI with the image-guided gradient (shown in Figure 8c) can recover, even in the 1st iteration, most velocity anomalies, as indicated by Figure 9a. On the other hand, a comparison between the data residual shown in Figure 10a and the data residual shown in Figure 6a indicates that the 1st iteration of image-guided FWI does not reduce the data misfit as significantly as conventional FWI does. This is because the image-guided gradient R R^T g_0 employed in image-guided FWI cannot clearly depict the boundaries of the velocity anomalies, due to the smoothing process Q embedded in the second step of the image-guided interpolation R.

We solve this problem that is apparent in the 1st iteration of image-guided FWI by running several iterations of conventional FWI to enhance the boundaries of velocity anomalies. Figures 9b and 9c are accumulated velocity updates after the 2nd and 5th iterations, respectively. With enhanced boundaries, the data misfit corresponding to shot number 13 significantly decreases, as shown in Figures 10b and 10c.

4 DISCUSSION

The synthetic example demonstrates the process of image-guided FWI, which only changes one step in the four-step implementation of conventional FWI. Using an image-guided gradient, image-guided FWI speeds up the convergence of FWI.

4.1 Limitation of line search

We used a quadratic line-search method in this paper to seek a scalar step length that determines how much the velocity model is updated. Ideally, this quadratic line search would require only 2 attempts of gradient descent to calculate a step length that decreases the data misfit function. Unfortunately, in many cases, even after many attempts of gradient descent, FWI cannot find a step length that decreases the data misfit function. Because each gradient descent requires a simulation of seismic wavefields of all sources in a full model space, the line-search approach is quite expensive. Figure 1 clearly indicates the failure of conventional FWI in searching for a proper step length in the 2nd and 4th iterations, within 5 trials of gradient descent.

Although more sophisticated line-search methods may help mitigate the limitations of the quadratic line search, we offer the option of image-guided FWI to avoid the same limitations, as indicated by the change of the data misfit function in Figure 1. Image-guided FWI successfully finds a step length to decrease the data misfit function in the first 10 iterations, within 5 attempts of gradient descent.

Figure 9. Accumulated velocity updates after (a) 1 iteration, (b) 2 iterations and (c) 5 iterations. In (a)-(c), the image-guided gradient is only used in the first iteration.

Figure 10. Data residual after (a) 1 iteration, (b) 2 iterations and (c) 5 iterations. In (a)-(c), the image-guided gradient is only used in the first iteration.

Figure 11. Migrated images with (a) the initial model, (b) the FWI model after 5 iterations, and (c) the image-guided FWI model after 5 iterations. Two red lines in each figure indicate the correct depth of reflectors.

4.2 Low frequencies

As mentioned before, the absence of low frequencies in data is one of the major causes of local minima and cycle-skipping, and thereby prevents FWI from converging to a correct model. Multiscale approaches have been proposed to solve the problem by gradually adding high-frequency details to inversion results obtained from low-frequency data. Although those multiscale approaches often start from impractically low frequencies, a question remains: do low frequencies in data really help? As noted earlier, the velocity updated by FWI maintains imprints of the seismic wavelet. For this reason, even though one can take advantage of low frequencies in data, wavelet imprints remain and counteract the velocity updates. Migrated images can explain this counteraction.
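To make the cycle-skipping argument above concrete, the sketch below measures how far a synthetic trace can be shifted in time before the least-squares misfit stops increasing, that is, the half-width of the basin around the correct solution. It assumes a Ricker wavelet and peak frequencies of 5 Hz and 20 Hz, none of which are taken from the experiments in this paper; `ricker` and `basin_half_width` are hypothetical helper names.

```python
import numpy as np


def ricker(t, f_peak):
    """Ricker wavelet with peak frequency f_peak (Hz) at times t (s)."""
    arg = (np.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)


def basin_half_width(f_peak, dt=0.001, t_max=1.0, max_shift=0.4):
    """Smallest time shift (s) at which the L2 misfit stops increasing.

    The 'observed' trace is an unshifted wavelet; the 'synthetic' trace
    is the same wavelet delayed by a trial shift.
    """
    t = np.arange(-t_max, t_max + dt, dt)
    observed = ricker(t, f_peak)
    shifts = np.arange(0.0, max_shift, dt)
    misfit = np.array(
        [np.sum((observed - ricker(t - s, f_peak)) ** 2) for s in shifts]
    )
    # Index of the first shift after which the misfit starts to decrease.
    turning = np.flatnonzero(np.diff(misfit) < 0.0)
    return shifts[turning[0]] if turning.size else max_shift


width_low = basin_half_width(5.0)    # assumed low-frequency data
width_high = basin_half_width(20.0)  # assumed higher-frequency data
```

For the Ricker wavelet the half-width scales inversely with peak frequency, so the 5 Hz basin is about four times wider than the 20 Hz basin. A starting model whose traveltime error exceeds this half-width risks convergence to a local minimum.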

Figure 11 compares images migrated with the initial model shown in Figure 2b, the updated model with changes shown in Figure 5c, and the updated model with changes shown in Figure 9c, respectively. Because of velocity anomalies, deeper reflectors in Figure 11a are not located at the correct depth; these deeper reflectors in Figure 11b appear at almost the same position as in Figure 11a. This implies that the velocity updated by conventional FWI cannot correct the traveltime mismatch in the data set. One reason for this is the wavelet imprint that appears in the velocity updates shown in Figure 5. Only the image migrated with the image-guided FWI model places these deeper reflectors at the correct depth, as indicated by Figure 11c.
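The link between velocity errors and reflector depth in Figure 11 can be illustrated with a back-of-the-envelope calculation. The sketch assumes a zero-offset, vertical raypath through a constant-velocity overburden; the velocities and depth are hypothetical numbers, not values from the synthetic model used here.

```python
def migrated_depth(t0, v_mig):
    """Depth (m) assigned to a reflector with zero-offset two-way
    traveltime t0 (s) when migrating with velocity v_mig (m/s)."""
    return 0.5 * v_mig * t0


true_depth = 2000.0  # m, hypothetical reflector depth
v_true = 2500.0      # m/s, hypothetical true overburden velocity
v_initial = 2300.0   # m/s, hypothetical (too slow) migration velocity

# Two-way traveltime recorded in the data is set by the true velocity.
t0 = 2.0 * true_depth / v_true

# Migrating with the wrong velocity misplaces the reflector in depth.
depth_error = migrated_depth(t0, v_initial) - true_depth
```

Under these assumptions, an 8% velocity error places the reflector 160 m too shallow. A velocity update that leaves the average overburden velocity unchanged leaves this depth error in place, whatever fine detail it adds.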

5 CONCLUSIONS

We have proposed image-guided FWI for speeding up convergence and mitigating the absence of low frequencies. In contrast to multiscale approaches that take advantage of unreliable low frequencies in the data space, our method reduces the number of model parameters and yields low frequencies in the model space by computing the image-guided gradient with image-guided interpolation and its adjoint. The synthetic example shown in this paper illustrates that image-guided FWI improves both inversion speed and quality without adding significant cost. Because the structural features of the subsurface are taken into consideration, models updated by image-guided FWI make good geological sense. Further investigation of criteria for selecting sample points for image-guided interpolation is needed.

6 ACKNOWLEDGMENT

This work was done in part during Yong Ma’s 2009 summer internship with ConocoPhillips; Yong Ma thanks Leming Qu from Boise State University for many thoughtful discussions during the internship. This work is partially sponsored by the research agreement between ConocoPhillips Company and Colorado School of Mines (SST-20090254-SRA). Special thanks to Diane Witters for polishing this manuscript.

REFERENCES

Boonyasiriwat, C., Valasek, P., Routh, P., Cao, W., Schuster, G. T., and Macy, B., 2009, An efficient multiscale method for time-domain waveform tomography: Geophysics, 74, no. 6, WCC59–WCC68.

Bunks, C., 1995, Multiscale seismic waveform inversion: Geophysics, 60, no. 5, 1457.

Cohen, A. I., 1972, Rate of convergence of several conjugate gradient algorithms: SIAM Journal on Numerical Analysis, 9, no. 2, 248–259.

Fehmers, G. C., and Höcker, C. F. W., 2003, Fast structural interpretation with structure-oriented filtering: Geophysics, 68, no. 4, 1286–1293.

Gong, B., Chen, G., Yingst, D., and Bloor, R., 2008, 3D waveform inversion based on reverse time migration engine: SEG Technical Program Expanded Abstracts, 27, no. 1, 1900–1903.

Hale, D., 2009a, Image-guided blended neighbor interpolation of scattered data: SEG Technical Program Expanded Abstracts, 28, no. 1, 1127–1131.

Hale, D., 2009b, Structure-oriented smoothing and semblance: CWP Report, 635.

Kennett, B., Sambridge, M., and Williamson, P., 1988, Subspace methods for large inverse problems with multiple parameter classes: Geophysical Journal, 94, no. 2, 237–247.

Krebs, J. R., Anderson, J. E., Hinkley, D., Neelamani, R., Lee, S., Baumstein, A., and Lacasse, M.-D., 2009, Fast full-wavefield seismic inversion using encoded sources: Geophysics, 74, no. 6, WCC177–WCC188.

Meng, Z., and Scales, J. A., 1996, 2D tomography in multi-resolution analysis model space: SEG Technical Program Expanded Abstracts, 15, no. 1, 1126–1129.

Meng, Z., Qu, L., Ma, Y., and Hale, D., 2009, Dip guided full waveform inversion for high resolution velocity: 2009 SEG Post-convention Workshop W8: Full wave-equation methods for complex imaging challenges.

Meng, Z., 2009, Dip guided full waveform inversion: US Patent pending, 41279-USPRO, ConocoPhillips.

Mora, P., 1989, Inversion = migration + tomography: Geophysics, 54, no. 12, 1575.

Nocedal, J., and Wright, S. J., 2000, Numerical optimization: Springer.


Oldenburg, D., McGillvray, P., and Ellis, R., 1993, Generalized subspace methods for large-scale inverse problems: Geophysical Journal International, 114, no. 1, 12–20.

Operto, S., Ravaut, C., Improta, L., Virieux, J., Herrero, A., and Dell’Aversana, P., 2004, Quantitative imaging of complex structures from dense wide-aperture seismic data by multiscale traveltime and waveform inversions: a case study: Geophysical Prospecting, 52, no. 6, 625–651.

Pratt, R., Shin, C., and Hicks, G., 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133, no. 2, 341–362.

Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part 1: Theory and verification in a physical scale model: Geophysics, 64, no. 3, 888.

Sava, P., and Biondi, B., 2004a, Wave-equation migration velocity analysis. I. Theory: Geophysical Prospecting, 52, no. 6, 593–606.

Sava, P., and Biondi, B., 2004b, Wave-equation migration velocity analysis. II. Subsalt imaging examples: Geophysical Prospecting, 52, no. 6, 607–623.

Shin, C., and Ha, W., 2008, A comparison between the behavior of objective functions for waveform inversion in the frequency and Laplace domains: Geophysics, 73, no. 5, VE119–VE133.

Shin, C., and Min, D., 2006, Waveform inversion using a logarithmic wavefield: Geophysics, 71, no. 3, R31–R42.

Sirgue, L., and Pratt, R. G., 2004, Efficient waveform inversion and imaging: A strategy for selecting temporal frequencies: Geophysics, 69, no. 1, 231.

Stork, C., 1992, Reflection tomography in the postmigrated domain: Geophysics, 57, no. 5, 680–692.

Symes, W. W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56, no. 6, 765–790.

Tarantola, A., and Valette, B., 1982, Generalized nonlinear inverse problems solved using the least-squares criterion: Reviews of Geophysics, 20, no. 2, 219–232.

Tarantola, A., 1984, Inversion of seismic-reflection data in the acoustic approximation: Geophysics, 49, no. 8, 1259–1266.

Tarantola, A., 1986, A strategy for nonlinear elastic inversion of seismic reflection data: Geophysics, 51, no. 10, 1893–1903.

Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: Society for Industrial and Applied Mathematics.

Tromp, J., Tape, C., and Liu, Q., 2005, Seismic tomography, adjoint methods, time reversal and banana-doughnut kernels: Geophysical Journal International, 160, no. 1, 195–216.

van Vliet, L. J., and Verbeek, P. W., 1995, Estimators for orientation and anisotropy in digitized images: Proceedings of the First Annual Conference of the Advanced School for Computing and Imaging, 442–450.

Vasco, D., and Majer, E., 1993, Wavepath travel-time tomography: Geophysical Journal International, 115, no. 3, 1055–1069.

Vigh, D., and Starr, E., 2008, 3D prestack plane-wave, full-waveform inversion: Geophysics, 73, no. 5, VE135–VE144.

Wheeler, J., and Wilton, D., 1988, Comparison of convergence rates of the conjugate gradient method applied to various integral equation formulations: Antennas and Propagation Society International Symposium, 229–232.

Woodward, M. J., 1992, Wave-equation tomography: Geophysics, 57, no. 1, 15–26.

Yilmaz, O., and Chambers, R., 1984, Migration velocity analysis by wave-field extrapolation: Geophysics, 49, no. 10, 1664–1674.

Zelt, C. A., and Barton, P. J., 1998, Three-dimensional seismic refraction tomography: A comparison of two methods applied to data from the Faeroe Basin: J. Geophys. Res., 103, no. B4, 7187–7210.
