Support vector regression for estimating earthquake response spectra

Bull Earthquake Eng (2012) 10:1205–1219DOI 10.1007/s10518-012-9350-2

ORIGINAL RESEARCH PAPER

Support vector regression for estimating earthquakeresponse spectra

Jale Tezcan · Qiang Cheng

Received: 11 December 2011 / Accepted: 22 March 2012 / Published online: 7 April 2012© Springer Science+Business Media B.V. 2012

Abstract This study uses support vector regression (SVR), a supervised machine learningalgorithm, to model the average horizontal response spectrum as a nonparametric functionof a set of predictor ground motion variables. Traditional ground motion prediction equa-tions (GMPEs) are derived using parametric regression, where a fixed functional form isselected, and the model coefficients are determined by minimizing the errors on the trainingset. The SVR model is nonparametric; there is no need to assume a fixed functional form.Using nonlinear basis functions, the data points are mapped into a high dimensional featurespace, where nonlinear input-output relationships can be expressed as a linear combinationof nonlinear functions, using a subset of the data points. The combination weights are deter-mined by minimizing the generalization error, using a formal mechanism to characterize thetrade-off between the model complexity and the quality of fit to the data. Minimization of thegeneralization error instead of the fitting error leads to better generation to unseen data, andthus reduces the risk of over-fitting for a given number of data points. The SVR model is notbased on a specific probability distribution, and is readily applicable to non-Gaussian data.An example application is presented for vertical strike-slip earthquakes, and the predictionsfrom the SVR model are compared to the recently developed GMPEs. The results demon-strate the validity of the proposed model, and suggest that it can be used as an alternative tothe conventional ground motion prediction models.

Keywords Spectral acceleration · Ground motion prediction ·Support vector regression · Nonparametric regression · Generalization error

J. Tezcan (B)Department of Civil and Environmental Engineering, Southern Illinois University Carbondale,1230 Lincoln Dr. MC 6603, Carbondale, IL 62901, USAe-mail: [email protected]

Q. ChengDepartment of Computer Science, Southern Illinois University Carbondale,1000 Faner Dr., Carbondale, IL 62901, USAe-mail: [email protected]

123

1206 Bull Earthquake Eng (2012) 10:1205–1219

1 Introduction

A critical step in the design of earthquake resistant structures is the description of the seismichazard. Ground motion prediction equations (GMPEs) play an important role in the char-acterization of seismic hazard and in the development of the seismic hazard maps used inthe building codes. Recently, as part of the NGA project, five GMPEs have been developedfor the shallow crustal earthquakes in the Western United States (WUS). These models areAS08 (Abrahamson and Silva 2008); BA08 (Boore and Atkinson 2008); CB08 (Campbelland Bozorgnia 2008); CY08 (Chiou and Youngs 2008) and I08 (Idriss 2008). Although thesemodels were intended for the WUS, they were also found to be applicable to Europe (e.g.Stafford et al. 2008).

While the NGA models represent the state-of-the-art in ground motion modeling, they mayproduce inconsistent and unrealistic estimates especially for low probability events (Musson2009). The differences in the selected set of records, and the precise form of the adoptedfunction contribute to the differences in the predictions of the NGA models (e.g. Abrahamsonet al. 2008; Peruš and Fajfar 1997; Musson 2009).

Selection of a proper functional form is not a straightforward task. It not only requiresproper identification and inclusion of the input variables dominating the peak ground motionparameter being modeled (e.g. Douglas and Smit 2001), but also the parametric descriptionof the relationship between the input variables and the output. Nonparametric models, by notrequiring a predetermined functional form, offer an attractive alternative to their parametriccounterparts. To date, several nonparametric models for the horizontal ground motion havebeen developed (e.g. Brillinger and Preisler 1984; Anderson and Lei 1994; Peruš and Fajfar1997, 2010; Anderson 1997). While having the advantage of flexibility, nonparametric mod-els typically require a large number of observations. The number of observations requiredto maintain a given level of accuracy increases exponentially with the number of predic-tive variables. This phenomenon is known as the “curse of dimensionality”, a term coined byBellman (1961). Models with a large number of coefficients become increasingly susceptibleto over-fitting as the number of available training samples decreases. The risk of over-fittingis higher if the performance of the model is judged by the goodness of fit to the data. Asdiscussed by Kuehn et al. (2009), the goodness-of-fit to the data is not a reliable indicator ofthe predictive capability of the model on samples outside the training data.

Kernel regression is a nonparametric approach that helps alleviate the dimensionalityproblem. In kernel regression, the training data is implicitly mapped into a high dimensionalspace, where linear regression is performed. The mapping is implicit because the mappingfunction itself is never evaluated. Instead, a “kernel function” which returns the dot product ofthe transformed points, is computed. The kernel function always returns a scalar regardless ofthe dimension of the space in which the data points are located. An immediate consequenceis that regression can be performed in a very high, even infinite, dimensional space with noincrease in the number of coefficients to be determined. In other words, the number of coeffi-cients becomes independent of the number of predictive variables. Sparse kernel regression,where the model is expressed as a linear combination of nonlinear kernel functions evaluatedat a subset of the input points, has recently been used in ground motion modeling (Tezcanand Piolatto 2012; Tezcan and Cheng 2012).

This paper uses Support Vector Regression (SVR), a special case of kernel regression,to develop a ground motion model that can be used as an alternative to the conventionalGMPEs. The SVR model is proposed for the 5 %-damped horizontal spectral values, andthe predictions from the SVR model are compared to the NGA models. Based on the con-cept of support vectors proposed by Vapnik (2000), SVR is a supervised learning technique

123

Bull Earthquake Eng (2012) 10:1205–1219 1207

characterized by its compact representation and high generalization capability (Drucker et al.1997). In SVR, the relationship between the input and the output is learned directly from thetraining data, without a fixed functional form or any assumptions on the underlying prob-ability distribution. What sets SVR apart from other supervised learning approaches (e.g.parametric regression and nonparametric regression via conventional neural networks) is theoptimization paradigm it uses. Instead of minimizing the fitting error on the training data,SVR minimizes a residual function that takes into account both the model complexity andthe quality of fit to the data Shawe-Taylor and Cristianini (2004). By reducing the model’ssusceptibility to overfitting, this approach makes SVR superior to minimizing fitting errors(Burges 1998).

The remainder of this paper is organized as follows. Section 2 constructs the ground motionmodel. Following a description of the dataset and predictive variables, the SVR algorithm isintroduced and the training results are presented. In Sect. 3, the predictions from the SVRmodel for vertical strike-slip faults are presented and the results are compared to the NGAmodels. In Sect. 4, main conclusions from this study are summarized and the advantages andlimitations of the proposed approach are discussed.

2 Construction of the SVR-based ground motion model

2.1 Dataset

To train the SVR model, we used the mainshocks used in the AS08 model (Abrahamson andSilva 2008). The AS08 dataset consists of 2,754 records from 135 events, 41 of which areclassified as aftershocks. The remaining 94 events (with event class MS, FS or Swarm) areclassified as mainshocks. More detailed information about these records can be found in theoriginal paper (Abrahamson and Silva 2008).

Our data set consists of 1,482 recordings from 1,068 stations and 94 mainshock events.The magnitude-distance distribution of the selected records is shown in Fig. 1.

2.2 Input and output variables

While the NGA models do not use the same predictive variable set, moment magnitude,source-to-site distance, soil conditions, and fault mechanism are common. The SVR modeluses the following predictors: moment magnitude (M), natural logarithm of the closest dis-tance between the station and the rupture surface in kilometers (lnR), shear wave velocityin the top 30 m of the soil profile in meters per second (Vs30), and two flags (FRV , FN M )

representing reverse faults and normal faults, respectively. For the two flags, we adopt thedefinition used in the AS08 model:

FRV ={

1, 30◦ ≤ rake ≤ 150◦0, otherwise

(1)

and

FN M ={

1,−120◦ ≤ rake ≤ −60◦0, otherwise

. (2)

Therefore, the input vector representing i th record has the following form:

inputi = [Mi ln (Ri ) Vs30i FRV i FN Mi ] . (3)

123


In the ground motion data used in this study, the ranges of the predictive variables are4.53 ≤ M ≤ 7.90, 0.07 km ≤ R ≤ 199.80 km, 116 m/s ≤ Vs30 ≤ 1525 m/s. To preventnumerical problems due to large variations between the ranges of the variables, the elementsof the input vectors are linearly scaled to [-1 1] as a pre-processing step. When scaled to [-11], the input vector for the i th record becomes:

xi =[

2Mi − 12.43

3.37,

2 ln (Ri )− 2.64

7.96,

2Vs30i − 1641

1409, 2FRV i − 1, 2FN Mi − 1

]. (4)

The target values consist of the natural logarithm of 5 %-damped spectral accelerationvalues for a given vibration period (T):

yi = ln(Si (T)) (5)

where Si (T) is the median value of the geometric mean of the two orthogonal horizon-tal components, computed using non-redundant rotations between 0◦ and 90◦ (Boore et al.2006).

2.3 The SVR algorithm

Consider the set of ground motion records D = {(xi , yi ), i = 1, . . . , N } where xi and yi areinput and output variables defined in Eqs. (4) and (5), respectively. The goal in regression isto find a function y = f (x) that describes the input-output relationship, assuming that therecords are independent, and identically distributed (iid).

In SVR, the function f (x) is a linear combination of nonlinear basis functions ψ(x) =[ψ1(x), ψ2(x), . . . , ψN (x)] centered on each xi :

f (x) = wTψ(x)+ b. (6)

In Eq. (6), w is a column vector consisting of the weights wi , i = 1 : N , and b is the inter-cept. This function represents a hyperplane in an N -dimensional space. The goal in SVR isto locate the hyperplane that minimizes the following residual function

R(w, δ, δ∗

) = (1/2)wTw + CN∑

i=1

(δi + δ∗i

). (7)

The residual function in Eq. (7) is composed of two terms with different objectives. Thefirst term is the Euclidean norm of the weight vector, multiplied by 1/2 for notational con-venience. By forcing small w values, this term encodes a preference for a smoother decisionsurface reducing the risk of over-fitting. The second term measures the accuracy of theselected function using the ε-insensitive loss function (Vapnik 2000). Because noise andoutliers are often present in the data, requiring that the function approximate all data pointswith the same precision may not be feasible. To relax this constraint, we introduce positiveslack variables δi and δ∗i for points above and below the ε-insensitive zone, respectively (seeFig. 2). The slack variables are given by:

δi = yi − f (xi , w)− ε for yi > f (xi , w)+ ε (8)

δ∗i = −yi + f (xi , w)− ε for yi < f (xi , w)− ε. (9)

The trade-off between the complexity and accuracy of the function is controlled by the costparameter, C > 0. The weights are determined by minimizing the risk function given inEq. (7) subject to the following constraints:

123


Fig. 1 Magnitude-distance distribution of the ground motion records

ε*δ

δ

εsupport vector

non-support vector

( )f x

Fig. 2 The tolerance zone, slack variables and support vectors

yi − wTψ(x)− b ≤ δi + ε, i = 1 : N (10)

wTψ(x)+ b − yi ≤ δ∗i + ε, i = 1 : N (11)

δi , δ∗i ≥ 0, i = 1 : N (12)

After finding the optimal weights and the bias term (see Appendix), the regression functioncan be written as a linear combination of kernel functions:

f (x) =Nsv∑j=1

z j K (x, x j )+ b (13)

where K (xi , x j ) = ψ(x j )Tψ(x) is a user-defined kernel function. In this paper, we restrict

our attention to the Gaussian radial basis function (RBF) defined as

K (xi , x j ) = exp(−γ (xi − x j )T(xi − x j )), γ > 0 (14)

which offers good robustness properties and guarantees the convergence of the optimiza-tion problem (Shawe-Taylor and Cristianini 2004). Our selection of RBF is also motivatedby the fact that it allows relaxing the i.i.d. assumption commonly used in many regressionalgorithms (Steinwart et al. 2009). Specifically, Steinwart et al. (2009) has shown that whenRBF is used in SVR, the predictions on a test data set are consistent even when the trainingdata includes correlated samples, as long as the training data and the test data are statisticallysimilar.

123


2.4 Model selection

Our goal in regression is to obtain a model y = f (x) that generalizes well to unseen data.More specifically, what has been learned from the training data D = {(xi , yi ); i = 1, . . . , N }should be applicable to other x drawn from the same distribution as the training data. Selectingthe function based on the training error typically leads to overfitting. This should not be sur-prising considering that given a set of N points, a polynomial of order N +1 achieves a perfectfit to the data, regardless of the actual functional relationship. This approach results in a high-variance model which gives large prediction errors unless x coincides with one of the trainingdata points(x = xi ). On the other hand, if the model is too simple (e.g. a linear function todescribe a nonlinear relationship), it will not be able to satisfactorily describe the input-outputrelationship, resulting in high bias. Both high-variance and high-bias models lead to poorestimates when presented with new (unseen) samples that are not in the training set (x �= xi ).

Ideally, we would like to minimize the expected error on the unseen samples (the gen-eralization error). Clearly, the generalization error cannot be computed in most practicalapplications, because its computation requires the knowledge of the probability distributionof x , in addition to the conditional distribution of y given x , both of which are unknown.However, assuming that the training points (x = xi ) are independent, identically distributedsamples, the generalization error can be estimated based on the training set.

A commonly used method to estimate the generalization error is cross validation (Webb2002). In 10-fold cross validation, given a set of candidate models, the training data is ran-domly partitioned into 10 equal-sized sets; one set is retained for validation and the remaining9 sets are used for training. This process is repeated 10 times using each set only once forvalidation and the mean squared error (MSE) over the validation set is calculated. The MSEvalues from each repetition are averaged to obtain the 10-fold cross validation error corre-sponding to the selected set of model parameters. For each candidate model, this processis repeated, and the model with the lowest cross validation error is determined. Finally, theselected model is trained using all 10 sets.

In SVR, the generalization error is determined by the cost parameter (C), the toleranceparameter (ε), and the kernel parameter (γ ). As γ , C and ε are continuous parameters, wecannot perform cross validation for all possible settings, and a finite subset must be chosen.

To select a finite subset, we need to consider how these parameters affect the performanceof the SVR model. The set of (γ , C, ε) values corresponding to a given level of generalizationperformance is not unique. For example, one can find an infinite number of (C, ε) pairs thatproduce the same cross-validation error, for a given kernel function. Cherkassky and Ma(2002) has shown that the performance of SVR depends heavily on the parameter ε, and theparameter C has negligible effect on the generalization performance as long as C is largerthan a certain threshold determined from the training data. A conservative value for C can bedetermined using (Cherkassky and Ma 2004)

C = max(∣∣μy + 3σy

∣∣ , ∣∣μy − 3σy∣∣) (15)

where μy and σy represent the average and the standard deviation of the training outputs,ln(S(T)). It is also known that the performance of SVR is not very sensitive to the parameterγ as long as a reasonable value is selected (Chapelle et al. 2002). A commonly used valuefor the kernel parameter γ is (e.g. Cherkassky and Ma 2004)

γ = 1

2p2 where p = r n√

0.3, (16)

where n is the number of predictor variables and r is the range of input values.

123


Table 1 Cost parameter (C), tolerance parameter (ε), kernel parameter (γ ), number of support vectors (Nsv),intercept term (b), mean-squared error (MSE) and squared correlation coefficient (R2)

T (s) C ε γ Nsv b MSE R2

0.00 5.88 0.51 0.202 429 −4.49 0.26 0.88

0.05 5.76 0.55 0.202 433 −4.33 0.30 0.87

0.10 5.44 0.57 0.202 436 −4.20 0.32 0.86

0.15 5.23 0.55 0.202 443 −4.25 0.31 0.86

0.20 5.18 0.55 0.202 462 −4.23 0.31 0.86

0.30 5.34 0.56 0.202 461 −3.77 0.31 0.87

0.50 5.93 0.58 0.202 449 −3.63 0.34 0.89

0.75 6.58 0.61 0.202 453 −3.61 0.37 0.89

1.00 7.07 0.62 0.202 435 −3.66 0.39 0.90

1.50 7.96 0.64 0.202 446 −4.19 0.41 0.91

2.00 8.76 0.66 0.202 451 −4.30 0.44 0.92

3.00 9.97 0.67 0.202 482 −4.99 0.46 0.93

4.00 10.82 0.69 0.202 490 −5.46 0.49 0.93

In the light of the above discussion, instead of searching for the best model in an arbitrarilylarge space of (γ , C, ε) values, we have set the C and γ values using the equations givenabove and determined the ε values using 10-fold cross-validation. The C values have beencomputed using Eq. (15), using the mean and standard deviation of ln(S(T)) for the vibrationperiods listed in Table 1. We have computed the kernel parameter γ using the Eq. (16). TheSVR model proposed in this paper has 5 predictor variables, each scaled to [−1 1]. Usingn = 5 and r = 2, γ = 0.202 is found. The ε values obtained from 10-fold cross validationare shown in Table 1. Using these (γ , C, ε) values, the support vectors (x j ), their weights(z j ) and the intercept term b have been computed as described in the Appendix.

Due to space limitations, the list of support vectors and their coefficients are not presentedin this paper. They can be computed as described in the Appendix using the (γ , C, ε) val-ues listed in Table 1. Alternatively, the list of support vectors and their coefficients can beobtained by e-mail from the authors.

The resulting number of support vectors NSV, the intercept term (b), mean-squared errors(MSE) and squared correlation coefficients (R2) are shown in Table 1.

Figure 3 shows the standardized residuals for T = 1 s. The residuals do not show a trendwith respect to magnitude, distance or the shear wave velocity. The residuals for other periods(not shown) are similar.

2.5 Prediction procedure

The procedure for predicting the ln(S(T)) for a given set of predictor variables (M, ln(R),Vs30, FRV , FN M ) is as follows:

1. Compute the scaled input xi using Eq. (4)2. Using xi from the previous step, and γ = 0.202, evaluate the kernel function Eq. (14)

for each support vector x j , j = 1, . . . , Nsv .3. Evaluate Eq. (13) to obtain the ln(S(T)).

123


Fig. 3 Standardized residuals for T = 1 s

The SVR model is not is based on a specific probability distribution, and is readily appli-cable to non-Gaussian data. If, however, zero-mean Gaussian noise about the ln(S(T)) valuesis assumed, the MSE values listed in Table 1 can be treated as the variance of the noise (σ 2

in the NGA models).

3 Comparison with the NGA models

The SVR model has been used to predict the median ground motion for different magnitude,distance, and shear wave velocity and the results have been compared to the NGA models.To facilitate comparisons between the models, vertical strike-slip faulting has been assumed.This assumption allows relating the two distance measures, the closest horizontal distanceto the surface projection of the rupture plane (Rjb) used in the Boore and Atkinson (2008)model and the closest distance to the rupture plane (Rrup) used in the rest of the models, with-out introducing additional source parameters. For vertical strike-slip faults, the two distancemeasures are related by the equation

Rjb =√

R2rup − Z2

tor, (17)

where Ztor is the depth to top of rupture. In evaluating the NGA models, Ztor has been takenas the median value from the NGA database, 6 km for M = 5, 3 km for M = 6, 1 km forM = 7 and 0 for M = 8 (Abrahamson et al. 2008). The parameter Z1.0 in the AS08 and CY08models which represents the depth to Vs = 1 km/s has been determined using Eqs. (18) and(19) respectively, as suggested by these models’ developers (Abrahamson and Silva 2008;Chiou and Youngs 2008).

123


Fig. 4 Median ground motion for Vs30 = 270 m/s,T = 0.2 s for vertical strike-slip faults

ln Z1.0 =

⎧⎪⎪⎨⎪⎪⎩

6.745 for Vs30 < 180 m/s

6.745 − 1.35 ln(

Vs30180

)for 180 < Vs30 < 500 m/s

5.394 − 4.48 ln(

Vs30500

)for Vs30 > 500 m/s.

(18)

ln Z1.0 = 28.5 − 3.82

8ln

(V8

s30 + 378.78) . (19)

The parameter Z2.5 in the CB08 model, which represents the depth to Vs = 2.5 km/s has beencomputed as suggested by the model’s developers, using the following equation (Campbelland Bozorgnia 2006)

Z2.5 = 0.519 + 3.595Z1.0 (20)

The parameter Z1.0. in Eq. (20) has been determined using Eq. (18).Figure 4 shows the median spectral accelerations for T = 0.2 s for soil sites (V s30 =

270 m/s) for distances up to 200 km. The shaded area represents the range of predictions fromthe four NGA models (the I08 model is not applicable to this case). To avoid crowding in thefigure, individual model predictions were not shown. The black dots represent the spectralaccelerations from strike-slip earthquakes with M = 6 ± 0.25 (left plot) and M = 7 ± 0.25(right plot), recorded on sites with V s30 = 270 ± 125 m/s. For T = 0.2 s, the SVR modelis in general agreement with the recorded values and the NGA predictions. For T = 1 s (seeFig. 5), the SVR predictions are in the upper range of NGA values.

Figures 6 and 7 compare the median spectral accelerations for rock sites (V s30 =760 m/s). For T = 0.2 s, the SVR predictions generally exceed the NGA values. For T = 1 s,the SVR predictions tend to be lower than the NGA values.

Figure 8 shows the median spectral accelerations at Rrup = 10 km for soil sites (V s30 =270 m/s). For M = 7, the SVR predictions tend to be larger than the NGA values, especiallyfor mid-periods. The predictions for rock sites (V s30 = 760 m/s) are shown in Fig. 9. Forperiods longer than 0.5 s, SVR values tend to be lower than the NGA predictions.

123


Fig. 5 Median ground motion for Vs30 = 270 m/s,T = 1 s for vertical strike-slip faults

Fig. 6 Median spectral acceleration for Vs30 = 760 m/s,T = 0.2 s for vertical strike-slip faults

4 Discussion and conclusions

In this paper, a nonparametric ground motion model has been developed using support vectorregression (SVR), a supervised machine learning technique characterized by its good gener-alization performance. The proposed model uses magnitude, distance, shear wave velocityand the faulting mechanism as the predictors and the 5 %-damped spectral acceleration valueat a given period as the target variable. The predictions from the proposed model have beencompared to the NGA models for vertical strike-slip faulting, and it has been shown that theproposed model is a viable alternative to conventional ground motion models.

The main limitation of the proposed approach is that although the structure of the resultingmodel is simple (a linear combination of exponential functions), this simplicity comes at the

123


Fig. 7 Median spectral acceleration for Vs30 = 760 m/s,T = 1 s for vertical strike-slip faults

Fig. 8 Median spectral acceleration for Vs30 = 270 m/s,Rrup = 10 km for vertical strike-slip faults

expense of physical interpretability. The SVR algorithm requires a set of learning parameters,some of which may not have a clear intuitive meaning. While cross validation can be used toselect these parameters, the computational cost may become prohibitive when dealing withvery large datasets with thousands of training examples.

Unlike standard GMPEs, the SVR model does not allow using any known physical con-straints. Not imposing a functional form can be an advantage or a disadvantage dependingon the level of confidence on the selected functional form and the availability of the data.Currently, no consensus exists as to the correct form of the predictive equation. Therefore,imposing a functional form introduces subjective judgment into the predictions. On the otherhand, nonparametric models do not take advantage of any known physical relationships, andthey rely solely on the available data. Consequently, they cannot be used for scenarios beyond

123


Fig. 9 Median spectral acceleration for Vs30 = 760 m/s,Rrup = 10 km for vertical strike-slip faults

the range of the training data. However, we do not consider this to be serious drawback ofthe proposed model as the reliability of empirical ground motion models in general, whenapplied to scenarios outside the range of the dataset, is controversial (Bommer et al. 2007).While the current practice of ground motion modeling is based heavily on parametric models,given the continual growth of seismic databases, and the ever increasing computer power, webelieve that nonparametric methods like ours will gain wider acceptance from the seismichazard community in the near future.

Another issue that deserves attention is the existing correlations in the database. Likemany regression algorithms, the SVR algorithm assumes that the samples come from realiza-tions of an i.i.d process, while ground motion databases usually contain correlated records.In SVR regression, the problem with correlated records is not as severe as in the least-squares-regression, as many of the highly correlated records are eliminated during optimiza-tion (when the support vectors are selected), somewhat alleviating the issue of the violationof the i.i.d assumption. Further, as Steinwart et al. (2009) has shown, when used with theRBF kernel, the SVR gives consistent predictions even when the training data includes cor-related samples, as long as the data satisfies the law of large numbers. Roughly speaking, thiscorresponds to relaxing the i.i.d assumption by requiring only statistical similarity betweenthe training and test samples. We certainly cannot claim that the proposed model is immuneto the violations of the i.i.d. assumption. However, considering the inherent robustness ofthe SVR algorithm and the fact that we did not include any aftershocks in this study, webelieve that the accuracy of the proposed model is not significantly affected by the existingcorrelations in the data. We plan to investigate ground motion variability in more detail inthe future.

Like any regression model, the accuracy of the SVR model is limited by the selected setof predictive variables. In this study, we used magnitude, distance, shear wave velocity andfault mechanism, which are the common variables in NGA models. The SVR algorithm isindependent of the number of the predictive variables used in the model; additional variablescan be introduced with no change in the algorithm. In this study, no attempt was made toidentify or include additional predictive variables to improve the accuracy of predictions, or

123


to increase the ability of the model to learn complex phenomena. This will be an extensionto our current work.

Acknowledgments This study was supported in part by a grant (CMMI-1100735) from the National ScienceFoundation

Appendix

The SVR model minimizes the error function

(1/2)wTw + CN∑

i=1

(δi + δ∗i ) (A.1)

subject to

yi − wTψ(x)− b ≤ δi + ε, i = 1 : N (A.2)

wTψ(x)+ b − yi ≤ δ∗i + ε, i = 1 : N (A.3)

δi , δ∗i ≥ 0, i = 1 : N . (A.4)

Burges (1998) has shown that solution of this optimization problem corresponds to select-ing a model which gives the lowest upper bound on the generalization error.

This problem cannot be treated as a standard quadratic optimization problem due to thepresence of the nonlinear function ψ(x) in the inequality constraints. This can be remediedby recasting the problem in a dual form Shawe-Taylor and Cristianini (2004). Using themethod Lagrange multipliers (Fletcher 1981), the dual optimization problem is written as:

maximizeN∑

i=1

yi(αi − α∗

i

) − ε

N∑i=1

yi(αi + α∗

i

)

−1

2

N∑i, j=1

(αi − α∗

i

) (α j − α∗

j

)ψ(xi )

Tψ(x j ) (A.5)

subject to

N∑i=1

α∗i =

N∑i=1

αi (A.6)

0 ≤ αi ≤ C, i = 1 : N (A.7)

0 ≤ α∗i ≤ C, i = 1 : N (A.8)

where αi and α∗i are the Lagrange multipliers. After the optimal values of the Lagrange

multipliers are determined, the weight vector can be computed using

w =N∑

i=1

(αi − α∗

i

)ψ(xi ) (A.9)

All the points inside the ε-tube have αi = α∗i = 0 and thus they can be removed from the

model. The remaining points, i.e. the points outside the tube and the ones on the boundary,determine the regression function. These points are called the support vectors. In general,support vectors constitute a small fraction of the input vectors.

123


Substituting Eq. (A.9) into Eq. (6) and removing the non-support vectors from the model,the regression function becomes

f (x) =Nsv∑j=1

(α j − α∗

j

)ψ

(x j

)Tψ(x)+ b (A.10)

where Nsv is the number of support vectors.Note that in both the dual objective function Eq. (A.5) and the regression function

Eq. (A.10), the nonlinear function ψ(x) appears in pairs. If a kernel function that oper-ates directly on pairs of input points can be defined such that K (xi , x j ) = ψ(x j )

Tψ(xi ),

the nonlinear function does not need to be evaluated. Using the kernel function, and settingz j = α j − α∗

j the regression equation can be written as

f (x) =Nsv∑j=1

z j K (x, x j )+ b. (A.11)

The bias term has not yet been calculated, as it does not appear in the dual optimization prob-lem. The bias can be determined using the support vectors on the boundary of the ε-tube,where the prediction error is known to be ∓ε. Specifically, if the number of boundary supportvectors is Nbsv , the bias term is given by

b = 1

Nbsv

Nbsv∑i=1

⎛⎝yi − εsgn (zi )−

Nsv∑j=1

z j K (x j , xi )

⎞⎠. (A.12)

References

Abrahamson N, Atkinson G, Boore D, Bozorgnia Y, Campbell K, Chiou B, Idriss IM, Silva W, YoungsR (2008) Comparisons of the NGA ground-motion relations. Earthq Spectra 24(1): 45–66

Abrahamson N, Silva W (2008) Summary of the Abrahamson & Silva NGA ground-motion relations. Earthqspectra 24(1):67–97

Anderson JG (1997) Nonparametric description of peak acceleration above a subduction thrust. Seismol ResLett 68(1):86–93

Anderson JG, Lei Y (1994) Nonparametric description of peak acceleration as a function of magnitude, dis-tance, and site in Guerrero, Mexico. Bull Seismol Soc Am 84(3):1003–1017

Bellman RE (1961) Adaptive control processes: a guided tour. Princeton University Press, NJBommer JJ, Stafford PJ, Alarcón JE, Akkar S (2007) The influence of magnitude range on empirical ground-

motion prediction. Bull Seismol Soc Am 97(6):2152–2170Boore DM, Atkinson GM (2008) Ground-motion prediction equations for the average horizontal component

of PGA, PGV, and 5%-damped PSA at spectral periods between 0.01 s and 10.0 s. Earthq Spectra24(1):99–138

Boore DM, Watson-Lamprey J, Abrahamson NA (2006) Orientation-independent measures of ground motion.Bull Seismol Soc Am 96(4A):1502–1511

Brillinger DR, Preisler HK (1984) An exploratory analysis of the Joyner-Boore attenuation data. Bull SeismolSoc Am 74(4):1441–1450

Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowl Discov2(2):121–167

Campbell KW, Bozorgnia Y (2006) Campbell-Bozorgnia NGA empirical ground motion model for the aver-age horizontal component of PGA, PGV and SA at selected spectral periods ranging from 0.01–10.0 s.Interim Report for USGS Review

Campbell KW, Bozorgnia Y (2008) NGA ground motion model for the geometric mean horizontal componentof PGA, PGV, PGD and 5% damped linear elastic response spectra for periods ranging from 0.01 to 10s.Earthq Spectra 24(1):139–171

123


Chapelle O, Vapnik V, Bousquet O, Mukherjee S (2002) Choosing multiple parameters for support vectormachines. Mach Learn 46(1):131–159

Cherkassky V, Ma Y (2002) Selection of meta-parameters for support vector regression. Artif Neural NetwICANN 2002:687–693

Cherkassky V, Ma Y (2004) Practical selection of SVM parameters and noise estimation for SVM regression.Neural Netw 17(1):113–126

Chiou BSJ, Youngs RR (2008) An NGA model for the average horizontal component of peak ground motionand response spectra. Earthq Spectra 24(1):173–215

Douglas J, Smit PM (2001) How accurate can strong ground motion attenuation relations be?. Bull SeismolSoc Am 91(6):1917–1923

Drucker H, Burges CJC, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. In:Mozer MC, Jordan JT, Petsche T (eds) Advances in neural information processing systems 9. MIT Press,Cambridge, MA, pp 155–161

Fletcher R (1981) Practical methods of optimization, vol 2: constrained optimization. Wiley, New York, NYIdriss IM (2008) An NGA empirical model for estimating the horizontal spectral values generated by shallow

crustal earthquakes. Earthq Spectra 24(1):217–242Kuehn NM, Scherbaum F, Riggelsen C (2009) Deriving empirical ground-motion models: balancing data con-

straints and physical assumptions to optimize prediction capability. Bull Seismol Soc Am 99(4):2335–2347

Musson RMW (2009) Ground motion and probabilistic hazard. Bull Earthquake Eng 7(3):575–589Peruš I, Fajfar P (1997) A non-parametric approach to attenuation relations. J Earthq Eng 1(2):319–340Peruš I, Fajfar P (2010) Ground motion prediction by a non parametric approach. Earthq Eng Struct Dyn

39(12):1395–1416Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press,

New York, NYStafford PJ, Strasser FO, Bommer JJ (2008) An evaluation of the applicability of the NGA models to ground-

motion prediction in the Euro-Mediterranean region. Bull Earthquake Eng 6(2):149–177Steinwart I, Hush D, Scovel C (2009) Learning from dependent observations. J Multivar Anal 100(1):175–194Tezcan J, Cheng Q (2012) A nonparametric characterization of vertical ground motion effects. Earthq Eng

Struct Dyn 41(3):515–530Tezcan J, Piolatto A (2012) A probabilistic nonparametric model for the vertical-to-horizontal spectral ratio.

J Earthq Eng 16(1):142–157Vapnik VN (2000) The nature of statistical learning theory. Springer, New York, NYWebb A (2002) Statistical pattern recognition. Wiley, New York, NY

123

Documents

Support vector regression for estimating earthquake response spectra