
Decentralised one-class kernel classification-based damage detection and localisation

James Long and Oral Büyüköztürk*,†

Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA

SUMMARY

In this paper, a data-based damage detection algorithm that uses a novel one-class kernel classifier for detection and localisation of damage is presented. The demands of wireless sensing are carefully considered in the development of this fully decentralised and automated methodology. The one-class kernel classifier proposed in this paper is trained through an iterative procedure that is faster and simpler to implement than those of other kernel classification methods, while retaining the same advantages over parametric methods, making it especially attractive for embedded damage detection. Acceleration time series at each sensor location are processed into autoregressive and continuous wavelet transform-based damage-sensitive features. Baseline values of these features are used to train the classifier, which can then classify features from new tests as damaged or undamaged and output a localisation index that can be used to identify the location of damage in the structure. This methodology is evaluated using acceleration data taken from a steel-frame laboratory structure under various damage scenarios. A number of parametric studies are also conducted to investigate the effect of sampling frequency and baseline data sample size.

Copyright © 2016 John Wiley & Sons, Ltd.

Received 7 March 2016; Revised 23 June 2016; Accepted 17 July 2016

KEY WORDS: data-based damage detection; machine learning; smart sensing; one-class kernel methods

1. INTRODUCTION

The aim of Structural Health Monitoring (SHM) is to provide warning when the condition of a structure deteriorates, by analysing data from sensors placed on the structure. A system that reliably achieves this goal has the obvious benefits of preventing disastrous structural collapses, extending infrastructure lifetime, and reducing maintenance costs. The advent of ‘smart sensors’, as described by Spencer et al. [1], equipped with sensors, microcontrollers, memory and wireless radios, has dramatically reduced the cost of deploying dense networks of sensors, potentially making widespread application of SHM feasible.

However, the goal of achieving low-cost, widely applicable damage detection systems requires methodologies that account for the constraints of wireless sensing. In particular, algorithms that mitigate the need to transmit full acceleration series are desirable for a number of reasons. The transmission of full vibration time series is likely to saturate wireless bandwidth, in addition to quickly depleting battery life. While energy harvesting techniques may mitigate the need to replace batteries manually, the reduction of power use is still a desirable goal. By carrying out some, or all, of the damage detection methodology prior to transmission, the full time series can be condensed to a set of damage-sensitive features (DSFs), or even a single scalar number indicating a decision on whether the structure has incurred damage. Lynch et al. [2] report that by processing data into DSFs on a microcontroller at the sensor level prior to data transmission, dramatic reductions in power consumption can be achieved. This decentralised approach reduces not only power consumption but also bandwidth usage. Additionally, in the event that wireless transmission is interrupted, the reduction in the volume of relevant data facilitates temporary storage until transmission can be successfully completed.

*Correspondence to: Oral Büyüköztürk, Professor of Civil and Environmental Engineering, Massachusetts Institute of Technology, Room 1-281, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.
†E-mail: [email protected]

Structural Control and Health Monitoring, Struct. Control Health Monit. (2016). Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/stc.1930

Previous research [2,3] aimed at developing automated, ‘smart sensing’ SHM technology, as described earlier, has adopted a data-based approach to damage identification, where time series of data acquired from sensors are processed into damage-sensitive features, and statistical pattern recognition is then conducted to compare DSFs with baseline values. For many applications, particularly in infrastructure monitoring, this comparison can most easily be made using a novelty detection approach, meaning new tests are compared only with nominally ‘normal’ baseline values. It is important to note that this approach assumes that the structure is in a safe condition when monitoring first starts, and that it can therefore only be used to diagnose changes from the initial, baseline condition. Nevertheless, this methodology is sensible and attractive and will be adopted in this paper.

The traditional supervised support vector machine (SVM) [4] is now a well-established approach to damage detection. The supervised SVM requires data from both damaged and undamaged cases in order to train a model for damage detection. The one-class SVM, in contrast, does not require the use of data from the damaged structure to train the model. The significance of this difference for monitoring of infrastructure is clear. One-class methods are applicable to damage detection on existing structures, where the damaged data is not available a priori.

In this paper, we discuss the one-class SVM and the related least-squares one-class SVM as possible candidates for use in a distributed data-based damage detection and localisation system. We then introduce a novel variant of these methods, which we call the ‘early stopping one-class kernel least squares’, which is more suited to the computing constraints of the smart sensing paradigm. We develop and experimentally verify a novel methodology that uses this early stopping one-class kernel least squares method for detection and localisation of damage in the context of distributed monitoring of civil infrastructure. An important aspect of the proposed methodology is the ability to localise damage. This methodology is fully automated and suitable for embedding on board a smart sensor, and its efficacy will be demonstrated in this paper using experimental data taken from a steel-frame laboratory structure under various damage scenarios.

Furthermore, using the developed methodology, we present several novel experimental results. We demonstrate that the reliability of damage detection is strongly influenced by sampling frequency and that, surprisingly for a steel-frame civil structure, sampling rates greater than 1000 Hz can improve performance. We also investigate the effect of damage-sensitive features on damage localisation performance and demonstrate that the use of high-frequency, nonstationary vibration features can improve damage localisation.

In what follows, we describe related state-of-the-art research efforts and, in view of the drawbacks of these approaches, present the novel developments of the methodology proposed in this paper.

2. REVIEW OF RELATED WORK

Lynch et al. [2] developed functioning smart sensing hardware that can extract autoregressive (AR)-based features from time series onboard a sensor equipped with a microcontroller. The damage-sensitive feature used is the standard deviation of the residual error of an AR exogenous model. For statistical pattern recognition, a prescreening step is first carried out to identify which entry in the baseline database has the most similar AR model, to reduce the effect of operational variation. Once the most similar baseline measurement is identified, the ratio between the baseline and new damage-sensitive features is computed. A threshold value for this ratio is chosen by ‘engineering judgement’. This approach, while suitable for onboard computation, requires the ad hoc selection of thresholds and is not suited to multivariate vectors of damage-sensitive features, which may capture a greater range of damage types and scenarios. Worden et al. [5] published one of the first SHM studies to explicitly address the issue of statistical pattern recognition as part of a comprehensive damage detection methodology. In this work, the Mahalanobis squared distance is used as a discordancy measure between new features and baseline values. A threshold value of this measure, above which new tests are classified as damaged, is calculated by conducting a Monte Carlo simulation and retaining the value above which only some small percentage of the simulated values fall. This outlier analysis procedure has been adopted by many other SHM researchers, for example, Gul and Catbas [6]. The drawback of this approach is that the Monte Carlo procedure makes assumptions on the probability distribution of the data. These assumptions may be reasonable in some cases, but depending on the choice of damage-sensitive features, or variations in environmental conditions, feature vectors may have non-Gaussian or even multimodal distributions.
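The outlier analysis described above can be illustrated with a minimal sketch. It is not the implementation of [5]: the function names `mahalanobis_sq` and `monte_carlo_threshold`, the use of standard normal simulated data, the choice of the largest distance per trial and the 99th percentile are all assumptions made here for illustration.

```python
import numpy as np

def mahalanobis_sq(features, baseline):
    """Mahalanobis squared distance of each row of `features` from the baseline set."""
    mean = baseline.mean(axis=0)
    cov = np.atleast_2d(np.cov(baseline, rowvar=False))
    diff = features - mean
    # Solve cov @ z = diff.T rather than forming the explicit inverse.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)

def monte_carlo_threshold(n_samples, n_features, n_trials=1000,
                          percentile=99.0, seed=0):
    """Threshold from simulated Gaussian data of the same size as the baseline.

    Assumption: each trial keeps the largest distance found in a simulated
    data set, and the threshold is a high percentile of those maxima.
    """
    rng = np.random.default_rng(seed)
    largest = np.empty(n_trials)
    for trial in range(n_trials):
        simulated = rng.standard_normal((n_samples, n_features))
        largest[trial] = mahalanobis_sq(simulated, simulated).max()
    return np.percentile(largest, percentile)
```

A new feature vector would then be flagged when its distance from the baseline exceeds the threshold. The Gaussian sampling inside `monte_carlo_threshold` is exactly the distributional assumption identified above as the drawback of this approach.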

Nair et al. [7] combine AR coefficients into a single damage-sensitive feature. Statistical pattern recognition is then conducted by using a t hypothesis test to establish the statistical significance of the difference in mean between the baseline DSFs and the new damaged DSFs. This approach also has drawbacks. The mean value of a DSF representing a damaged condition may be very close to the mean value of the baseline data. Consider the case of a bimodal distribution of baseline damage-sensitive features: the mean of such a distribution lies far from the actual baseline data, so new data lying near this mean would likely indicate the occurrence of damage, yet this methodology would attribute them to the baseline condition of the structure.
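The bimodal case can be made concrete with a small numerical illustration. The data below are synthetic and chosen only to make the point; the Welch form of the t statistic and the helper name `welch_t` are assumptions, not details taken from [7].

```python
import numpy as np

# Synthetic bimodal baseline: two clusters centred on -5 and +5.
baseline = np.concatenate([np.linspace(-5.5, -4.5, 50),
                           np.linspace(4.5, 5.5, 50)])
# New data sitting in the gap between the clusters, around the baseline mean.
new = np.linspace(-0.5, 0.5, 20)

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    var_a, var_b = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(var_a / len(a) + var_b / len(b))

t_stat = welch_t(new, baseline)
# Smallest distance between any new value and any baseline value.
gap = np.abs(new[:, None] - baseline[None, :]).min()
```

Here the t statistic is essentially zero, so a test on the means reports no change, even though no new value lies within 4 units of any baseline value.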

Nair and Kiremidjian [3] suggest the use of Gaussian mixtures modelling (GMM) for statistical pattern recognition. GMM models data as the superposition of Gaussian distributions. In this work, it is reasoned that once damage occurs the number of mixtures in the model will be altered. This approach overcomes many of the issues with parametric methods. However, outside of the realm of SHM, nonparametric kernel methods for novelty detection have consistently shown excellent performance, and may outperform GMM. In particular, the one-class SVM (OCSVM), developed by Schölkopf [8], has performed well on novelty detection for hyperspectral imagery problems [9], document classification [10] and seizure detection [11].

Oh and Sohn [12] used the least squares OCSVM for the purpose of data normalisation under environmental and operating uncertainty, rather than as a model for damage detection and localisation. They used the least squares SVM as a means to extract nonlinear principal components, which are then used as features for a damage classifier based on the sequential probability ratio test. Hayton et al. [13] proposed the use of the OCSVM for the detection of abnormalities in the vibration shaft of an operating jet engine, using simple features based on harmonics of the rotation frequency.

The early stopping one-class kernel least squares algorithm we propose in this paper can be solved through a simple iterative procedure rather than the quadratic programming solution required to solve the SVM formulation used by Hayton et al. [13], and as such offers potential advantages for embedded damage detection. Furthermore, our approach uses a distributed network of sensors, operating independently to detect damage at a location that is unknown a priori, whereas in [13], a single measurement location is considered. We also address the issue of damage localisation and propose the use of a localisation index from our one-class classification algorithm, the efficacy of which is experimentally validated in this study. Lastly, we advocate high-frequency sampling and experimentally demonstrate that damage detection performance is significantly improved by higher sampling rates.

We believe that the early stopping one-class kernel least squares classifier proposed in this paper is an ideal candidate for powerful, low-cost statistical pattern recognition for embedded structural damage detection and localisation. The ability to accurately describe multimodal or non-Gaussian data distributions represents an advance over popular parametric methods for SHM, while performance of similar algorithms in other fields suggests that our approach may outperform current nonparametric methods used in SHM.

3. OVERVIEW OF DAMAGE IDENTIFICATION METHODOLOGY

The damage detection methodology presented in this paper has been developed for the purpose of embedding the data-based damage detection approach at the individual sensor level, creating a low-power, low-cost, automated smart sensing system. The data-based damage detection approach requires the following steps:

• Acquisition of raw data from the structure.
• Feature extraction.
• Statistical pattern recognition for damage detection and localisation.

The statistical pattern recognition methodology presented in this paper is based on a one-class kernel classification scheme, which will be explained in detail later. The classifier is trained using a database of features extracted from tests on the baseline condition of the structure.

This strategy can be used for either new or existing structures. In either case, the monitoring process begins with the collection of baseline data prior to the live monitoring phase. The baseline does not necessarily have to correspond with the original intact state of the structure. In many cases, we are more interested in whether structures are degrading from their current state than in differences from the originally constructed state. Indeed, even new structures often contain defects or behave slightly differently than intended by designers. Therefore, rather than attempt to quantify and model every deviation from the designers' intention, the data-based approach is intended to detect changes from the current state.

In the initial phase, data acquisition and feature extraction are carried out onboard each sensor. Once the database is sufficient, a one-class kernel classifier is trained for each sensor. Depending on the individual circumstance, this training can be completed either on board the sensor or offline at a central base station: for example, if wireless connectivity is poor, or the sensor is multiple hops from the central server, it may be advantageous to carry out the training on board the sensor. Once each sensor has completed the training phase, the system enters the live testing phase. From this point on, for every new test, data is acquired, features are extracted and pattern recognition is carried out onboard each individual sensor.

4. FEATURE EXTRACTION

The requirements of a low-cost, low-power, flexible damage detection system place a number of constraints on the feature-extraction process. While a significant research effort has been dedicated to the suggestion of damage-sensitive features for structural damage detection, many of these features are not appropriate for embedding at the sensor level. Specifically, in addition to being sensitive to damage, features should also require low power for computation, should only require data from one sensor location, should not require explicit comparison with previously computed features prior to statistical pattern recognition (as is the case with many damage indices) and ideally should exhibit sensitivity with proximity to the damage location, in order to allow localisation. Based on these requirements, a review of the existing literature was conducted to identify suitable features.

Many of the early studies in SHM focused on the fact that changes in structure can manifest themselves as changes in natural frequency, mode shape or modal damping. Advances in the highly related field of modal analysis have provided algorithms that can identify in an automated fashion the natural frequency, mode shapes and damping using only output from sensors. Stochastic subspace identification, developed by Van Overschee and De Moor [14], and frequency domain decomposition, developed by Brincker et al. [15], are two such algorithms that can identify modal parameters in an accurate and automated fashion. However, modal parameters are generally considered to be relatively insensitive to damage, and their identification requires output from multiple sensor locations. Therefore, the use of modal parameters as damage-sensitive features will not be adopted in this study, as they are not well suited to the decentralised approach.

Outside the realm of modal parameters, researchers have focused on the potential nonlinear effects of damage. Damage in a structure because of cracking, yielding or connection loosening may cause an increase in nonlinearity. If this effect can be captured from the measured sensor data, it may enable earlier and more accurate detection of damage. Todd [16] uses nonlinear dynamics concepts to define the local attractor variance as a damage-sensitive feature. The structure is provided with a chaotic excitation, and the response is analysed in state-space, where it is hypothesised that damage will cause changes to the steady state trajectory, or ‘attractor’. This is potentially a very powerful methodology, but unfortunately requires precise measurement of the input excitation, which may not be possible for many real-life structures.



Empirical mode decomposition and the related Hilbert–Huang transform are nonlinear signal processing techniques that have been used for SHM feature extraction. Unlike Fourier or wavelet transformation, empirical mode decomposition does not specify basis functions for signal expansion, but rather uses an empirical, iterative algorithm to expand the signal into well-behaved ‘intrinsic modes’. This adaptive method preserves nonlinearity in the signal and allows the ‘modes’ to have time-varying frequency and amplitude. While some very promising applications of this algorithm to damage detection have been demonstrated in the literature [17,18], the computational effort required is high, likely making this approach unsuitable for embedded sensing applications.

Ying et al. [19] note that no one feature is likely to be universally sensitive to all potential damage scenarios and robust to all operational variations for all types of structure; therefore, instead of trying to find one optimal feature, the strategy should be to identify a number of potentially useful features for statistical pattern recognition. With these considerations in mind, promising features are identified and described subsequently.

4.1. Autoregressive time series analysis

Autoregressive time series models have been proposed as a means for extracting damage-sensitive features by a number of researchers, for example, by Sohn and Farrar [20]. While various related algorithms, such as AR moving average models and AR with exogenous input models, have also been demonstrated to provide useful damage-sensitive features for SHM, in this paper, we focus on the simple AR model, as its potential for on-board processing has already been shown [2]. Before the AR model is fitted, all time series are normalised to have zero mean and unit standard deviation. The simple AR model is then given by

$$x(t) = \sum_{k=1}^{p} \phi_{k}\, x(t-k) + e_{x}(t) \tag{1}$$

where $x(t)$ is a single acceleration time series, $\phi_k$ are the autoregression coefficients, $p$ is the order of the model and $e_x(t)$ is the residual error. The Burg method for estimating the AR model is chosen here. Figueiredo et al. [21] investigated the effect of the AR model order on damage detection and evaluated several different strategies for selecting this order. One such strategy uses the Akaike information criterion (AIC) to calculate a score, estimating the quality of the model, at different orders. Given a set of candidate models, the best model is the one with the smallest AIC score. The AIC score is given by

$$\mathrm{AIC}(p) = \ln\left(\rho_{p}\right) + \frac{2p + 1}{N} \tag{2}$$

where $\rho_p$ is the variance of the residual error at order $p$ and $N$ is the length of the time series. Shown in Figure 1 is an example of the variation of the AIC score for model orders between 1 and 25 evaluated for the real experimental data used in this study. Although the AIC score continues to decrease as the model order increases, the decrease is much more gradual once the model order is larger than 7, and given the constraints of wireless sensing, there is an advantage in choosing a lower model order. A model order of 7 will initially be chosen for the damage detection results in this study. A further parametric study on the effect of increasing the model order is also conducted in Section 10.

Figure 1. Akaike information criterion for various model orders.

Because of the linear nature of the model, these features are most likely to capture changes such as reductions in stiffness, cross-section or mass.
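The AR fit of Equation (1) and the order score of Equation (2) can be sketched as follows. Note the substitution: the paper uses the Burg method, whereas this sketch fits the coefficients by ordinary least squares purely to keep the example short. The function names are hypothetical and the signal is synthetic, not the experimental data.

```python
import numpy as np

def fit_ar_least_squares(x, p):
    """Fit an AR(p) model by least squares (a stand-in for the Burg method)."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()            # zero mean, unit standard deviation
    n = len(x)
    # Column k-1 holds x(t-k) for t = p, ..., n-1.
    lagged = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    target = x[p:]
    phi, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    residual = target - lagged @ phi
    return phi, residual.var()

def aic(x, p):
    """AIC score of Equation (2): ln(rho_p) + (2p + 1) / N."""
    _, rho_p = fit_ar_least_squares(x, p)
    return np.log(rho_p) + (2 * p + 1) / len(x)

# Synthetic AR(2) signal used only to exercise the functions.
rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)
signal = np.zeros(5000)
for t in range(2, 5000):
    signal[t] = 0.5 * signal[t - 1] - 0.3 * signal[t - 2] + noise[t]

scores = {p: aic(signal, p) for p in range(1, 11)}
```

Inspecting `scores` for the order beyond which the decrease flattens mirrors the way the model order of 7 is selected from Figure 1.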

4.2. Continuous wavelet transform

Another promising method for extraction of damage-sensitive features is the use of wavelet transforms for processing acceleration histories. Unlike traditional Fourier analysis, or the time series approach described in the previous section, wavelet analysis is nonstationary, providing information in both the time and scale domains, which may yield more insightful information on transient phenomena. Nair and Kiremidjian [22] also demonstrated that the energy of the wavelet coefficients obtained from a continuous wavelet transform is directly related to the modal properties of the structure and can therefore be used as a damage-sensitive feature. The continuous wavelet transform at scale $a$ and time $b$ is given by

$$X(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi\!\left(\frac{t - b}{a}\right) dt \tag{3}$$

where $\psi\left(\frac{t-b}{a}\right)$ is the mother wavelet function, which has been scaled by the factor $a$ and translated by $b$.

In this paper, the Morlet wavelet is chosen to be the mother wavelet:

$$\psi(t) = e^{j\omega_{0} t}\, e^{-t^{2}/2} \tag{4}$$

where $j$ is the imaginary unit, $\omega_0$ is the central frequency of the mother wavelet and $t$ is time. Nair and Kiremidjian [22] defined the damage-sensitive features as the total energy contained at each wavelet scale:

$$E_{a} = \sum_{b=1}^{T} \left|X(a, b)\right|^{2} \tag{5}$$
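Equations (3) to (5) can be approximated numerically as in the sketch below, which replaces the integral by a sum over samples. The function names, the default value `omega0=6.0` and the use of a dense matrix of wavelet values are assumptions made here for illustration; the paper does not state a numerical value for $\omega_0$ in this section.

```python
import numpy as np

def morlet(t, omega0=6.0):
    """Morlet mother wavelet of Equation (4)."""
    return np.exp(1j * omega0 * t) * np.exp(-t ** 2 / 2.0)

def wavelet_energies(x, scales, dt, omega0=6.0):
    """Energy E_a of Equation (5) for each scale a, with Equation (3) discretised."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) * dt
    energies = []
    for a in scales:
        # Entry [b, n] holds psi((t_n - t_b) / a).
        arg = (t[None, :] - t[:, None]) / a
        coeffs = (morlet(arg, omega0) @ x) * dt / np.sqrt(a)
        energies.append(np.sum(np.abs(coeffs) ** 2))
    return np.array(energies)

# Synthetic 50 Hz sinusoid sampled at 1000 Hz.
dt = 1e-3
time = np.arange(512) * dt
x = np.sin(2 * np.pi * 50 * time)
omega0 = 6.0
# Scales whose approximate equivalent Fourier frequencies are 50 Hz and 200 Hz.
scales = [omega0 / (2 * np.pi * 50), omega0 / (2 * np.pi * 200)]
energies = wavelet_energies(x, scales, dt, omega0)
```

The scale matched to 50 Hz carries far more energy than the one matched to 200 Hz, which is the behaviour the features rely on. The dense matrix costs memory proportional to the square of the record length, so an embedded implementation would need a more economical convolution.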

The choice of wavelet scales impacts the usefulness of wavelet-based features for both damage detection and localisation. Noh et al. [23] provide a detailed discussion on choosing optimal wavelet bases for structural damage detection and adopt a strategy in which the basis is chosen to minimise the Shannon entropy of the wavelet scalogram for the undamaged structure. In this paper, we adopt a slightly different approach. It is anticipated that the lowest wavelet scales will yield the most informative data for damage localisation, and as such, we are most interested in capturing shifts in high-frequency components of the acceleration signal. We therefore adopt an empirical approach, choosing the wavelet scales such that their equivalent Fourier frequencies correspond with peaks and troughs in the higher end of the power spectral density of the undamaged acceleration signals. Shown in Figure 2 is the power spectral density of an acceleration signal from the undamaged structure, with the equivalent Fourier frequency of the chosen wavelet scales indicated. These scales have been chosen to match well with the frequency peaks at higher frequencies, especially at 1300 and 1050 Hz. Wavelet analysis allows us to trade off resolution in the frequency domain for resolution in the time domain, by adjusting the parameter $\omega_0$. We choose the lowest admissible value of this parameter, lowering the resolution in the frequency domain, but increasing it in the time domain. Capturing local, high-frequency vibration may aid in localising damage, and so we anticipate that choosing lower frequency resolution but higher time resolution may enable the capture of these vibrations.

4.3. Detection versus localisation

The ability to localise damage using the methodology outlined in this paper depends on the sensitivity of extracted features with proximity to the damage location. We believe the continuous wavelet transform-based features are suitable choices for damage localisation, particularly in the case of damages involving discontinuities, such as loosening connections or cracks. These types of damages may manifest themselves as local, higher frequency vibrations, which damp out quickly both in time and space, thereby providing the desired sensitivity to damage location. In this study, we will use wavelet-based damage-sensitive features, corresponding to the total energy $E_a$ contained at each of the first five wavelet scales $a$, in an attempt to capture this local, high-frequency information. We expect the AR features, because of their linear, stationary nature and because they contain information about the entire frequency response of the structure, to be more suitable for damage detection purposes than localisation.

4.4. Data fusion

The feature extraction stage is completed by combining or fusing the features into a vector that provides more comprehensive information than any individual feature. Triaxial sensors are used in this study, providing acceleration time series in the x, y and z directions for each test. For damage detection purposes, we combine the first seven AR coefficients with the first five wavelet energies, for each axis, to give a total of 36 features for each test, at each sensor location. For damage localisation, we use just the five wavelet-based features to capture local, higher frequency information.
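The count of 36 follows from (7 + 5) features on each of 3 axes. A minimal sketch of the fusion step for the detection vector is given below; the function name `fuse_features`, the dictionary layout and the ordering of features within the vector are assumptions, since the paper does not specify them.

```python
import numpy as np

N_AR = 7        # AR coefficients per axis (model order used in the text)
N_WAVELET = 5   # wavelet energies per axis
AXES = ("x", "y", "z")

def fuse_features(ar_coeffs, wavelet_energies):
    """Concatenate per-axis features into one detection vector of length 36."""
    parts = []
    for axis in AXES:
        ar = np.asarray(ar_coeffs[axis], dtype=float)
        energy = np.asarray(wavelet_energies[axis], dtype=float)
        if ar.shape != (N_AR,) or energy.shape != (N_WAVELET,):
            raise ValueError(f"unexpected feature count on axis {axis!r}")
        parts.extend([ar, energy])
    return np.concatenate(parts)

# Placeholder values only; real features come from the AR and wavelet steps.
ar = {axis: np.zeros(N_AR) for axis in AXES}
energy = {axis: np.ones(N_WAVELET) for axis in AXES}
vector = fuse_features(ar, energy)
```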

5. STATISTICAL PATTERN RECOGNITION

In data-based damage detection, the extraction of relevant damage-sensitive features is followed by statistical pattern recognition, which for civil structures is typically a novelty detection problem. In this paper, we present a novelty detection approach for damage detection and localisation based on one-class kernel classification methods. We first describe the OCSVM algorithm and the closely related least squares OCSVM. We then introduce and propose a novel variant of these algorithms, the ‘early stopping one-class kernel least squares’, which will be used for damage detection and localisation in this paper. It should be noted that all three of these algorithms are suitable for use within the general framework for distributed damage detection and localisation proposed in this paper, but that the newly proposed algorithm is simpler to implement, and provides a computational advantage under certain conditions, described later in this section.

5.1. One-class support vector machine

Figure 2. Comparison of chosen wavelet central frequencies with the power spectral density of the acceleration signal.

Support vector machine methods have attracted some attention from researchers in SHM, but almost exclusively in the context of binary or multi-class classification, where data from the expected damaged state of the structure is available during the training stage. In contrast, the OCSVM requires only examples of data from the baseline state of the structure to decide if a new test point is abnormal and should be cause for concern. This is crucial, as we cannot realistically expect to be able to generate examples of data from the expected damage scenarios for most civil structures. Perhaps in the case of a small, mass-produced component, it may be possible to acquire examples of data from damaged conditions by inducing cracks, notches and holes in a test component and undertaking extensive testing. In the vast majority of cases, particularly for the monitoring of existing large-scale infrastructure, it is obviously not practical to generate this data: hence, the usefulness of one-class classifiers, or novelty detectors, such as the OCSVM. The more commonly used binary version of SVM aims to separate two classes of training data by transforming them to a nonlinear feature space and finding a linear function that optimally separates them. The OCSVM, on the other hand, does not have any data for the second class, and so instead seeks to find a hyperplane that separates the training data from the origin, which serves as a proxy for the second class [8]. This is formulated as the following convex optimisation problem:

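$$\min_{w,\ \xi \in \mathbb{R}^{N},\ \rho \in \mathbb{R}} \ \frac{1}{2}\|w\|^{2} - \rho + \frac{1}{\nu N} \sum_{i} \xi_{i} \quad \text{subject to} \quad w \cdot \phi(x_{i}) \ge \rho - \xi_{i}, \quad \xi_{i} \ge 0 \tag{6}$$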

where $\nu$ is an upper bound on the fraction of training examples classified as outliers, $N$ is the number of training examples, $\xi_i$ are slack variables and $w \cdot \phi(x) - \rho = 0$ is the separating hyperplane in feature space. The decision function for a new test point $x$ is then given by

$$f(x) = \operatorname{sgn}\left(w \cdot \phi(x) - \rho\right) \tag{7}$$

Using Lagrangian multipliers, and the kernel trick, the dual problem can be formulated as

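$$\min_{\alpha} \ \frac{1}{2} \sum_{i,j} \alpha_{j} \alpha_{i} K(x_{j}, x_{i}) \quad \text{subject to} \quad 0 \le \alpha_{i} \le \frac{1}{\nu N}, \quad \sum_{i} \alpha_{i} = 1 \tag{8}$$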
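The step from Equation (6) to Equation (8) can be sketched as follows. This is the standard Lagrangian argument rather than a derivation reproduced from the paper, and the multipliers $\beta_i$ for the constraints $\xi_i \ge 0$ are notation introduced here for illustration.

$$L = \frac{1}{2}\|w\|^{2} - \rho + \frac{1}{\nu N}\sum_{i}\xi_{i} - \sum_{i}\alpha_{i}\left(w \cdot \phi(x_{i}) - \rho + \xi_{i}\right) - \sum_{i}\beta_{i}\xi_{i}, \qquad \alpha_{i} \ge 0,\ \beta_{i} \ge 0$$

Setting the derivatives with respect to $w$, $\rho$ and $\xi_i$ to zero gives

$$w = \sum_{i}\alpha_{i}\phi(x_{i}), \qquad \sum_{i}\alpha_{i} = 1, \qquad \alpha_{i} = \frac{1}{\nu N} - \beta_{i} \le \frac{1}{\nu N}$$

Substituting these back, the terms in $\rho$ and $\xi_i$ cancel and the Lagrangian reduces to

$$L = -\frac{1}{2}\sum_{i,j}\alpha_{i}\alpha_{j}\,\phi(x_{i}) \cdot \phi(x_{j})$$

Maximising this over $\alpha$ is equivalent to minimising $\frac{1}{2}\sum_{i,j}\alpha_{i}\alpha_{j}K(x_{i}, x_{j})$ once the inner product is written as a kernel, subject to the two conditions on $\alpha_i$ found above.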

In this dual formulation, the feature space mapping $\phi(x)$ never needs to be computed explicitly. Because of Mercer’s theorem, an inner product in feature space can be replaced by a kernel function $K(x_j, x_i)$, provided the kernel function is positive semi-definite. This allows for the replacement of $\phi(x_j) \cdot \phi(x_i)$ with the kernel function. This is known as the kernel trick, and it allows for the efficient computation of nonlinear decision boundaries. In this paper, the Gaussian kernel will be used

$$K(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{\sigma^{2}}\right) \tag{9}$$

where $x$ and $y$ are vectors describing single training examples, and $\sigma$ is a free parameter. The choice of this parameter can dramatically affect the performance of the OCSVM. Small values of $\sigma$ can lead to overfitting of the data, and thus, poor generalisation. Large values of $\sigma$ underfit the data, resulting in an inability to detect non-trivial patterns. Long and Büyüköztürk [24] investigated three potential methods for automatic selection of this parameter, and found the method developed by Khazai et al. [25] to give good results. This method is also used in this paper.
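The kernel of Equation (9) and the effect of $\sigma$ can be sketched as below. The function name `gaussian_kernel` and the sample points are illustrative assumptions; the automatic selection method of [25] is not reproduced here.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / sigma^2)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq_dist = ((A ** 2).sum(axis=1)[:, None]
               + (B ** 2).sum(axis=1)[None, :]
               - 2.0 * A @ B.T)
    sq_dist = np.maximum(sq_dist, 0.0)   # guard against small negative round-off
    return np.exp(-sq_dist / sigma ** 2)

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_small = gaussian_kernel(points, points, sigma=0.1)    # close to the identity
K_large = gaussian_kernel(points, points, sigma=100.0)  # close to all ones
```

With a very small $\sigma$ every point resembles only itself, which corresponds to the overfitting case; with a very large $\sigma$ all points look alike, which corresponds to underfitting.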

Most of the Lagrange multipliers, $\alpha_i$, in Equation (8), will evaluate to zero. Any training point $x_i$ with a non-zero $\alpha_i$ is called a support vector. Only the support vectors are required to evaluate the decision function for a new point $x$. Once the convex minimisation problem in Equation (8) has been solved, the OCSVM is ready to evaluate new test points. The binary decision function on new test points $x$ is given by

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{\text{no. SVs}} \alpha_{i} K(x, x_{i}) - \rho\right) \tag{10}$$

where ρ can be recovered as

$$\rho = \sum_{j} \alpha_{j} K(x_{k}, x_{j}) \quad \text{for any } k \text{ with } \alpha_{k} \in \left(0, \frac{1}{\nu N}\right) \tag{11}$$

5.2. Least squares one-class support vector machine

The least squares OCSVM is conceptually very similar to the OCSVM described in the previous section. However, instead of minimising the so-called hinge loss, we can use the quadratic loss, leading to the formulation in Equation (12), proposed by Choi et al. [26]:

$$\min_{w,\ \rho \in \mathbb{R}} \ \frac{1}{2}\|w\|^{2} - \rho + \frac{1}{2} C \sum_{i} \left(\rho - w \cdot \phi(x_{i})\right)^{2} \tag{12}$$

where $C$ is a regularisation parameter that sets the trade-off between the squared-error term and the penalty on the $L_2$ norm of the solution. Introducing Lagrange multipliers as shown in [26], the problem can be reduced to the following system of equations, which is then solved for $\alpha$ and $\rho$ by block matrix inversion.

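$$\begin{bmatrix} 0 & e^{T} \\ e & K + I/C \end{bmatrix} \begin{bmatrix} -\rho \\ \alpha \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{13}$$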

where $e$ is the vector of ones and $I$ is the identity matrix.

Choi et al. [26] describe this approach as finding a hyperplane that has ‘maximum distance from the origin, and with respect to which the sum of the squares of errors are minimised’. The decision function is now given by Equation (14).

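$$f(x) = w \cdot \phi(x) - \rho = \mathbf{k}^{T}\mathbf{c} - \rho \tag{14}$$

where $k_i = K(x_i, x)$ are the entries of $\mathbf{k}$, and $\mathbf{c}$ is the coefficient vector $\alpha$ obtained from Equation (13).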

The decision function outputs a measure of how far a new point lies from the optimal hyperplane, or in other words, how dissimilar it is to the training data. Obtaining a binary classification decision can be achieved by computing the value of the decision function in Equation (14) for each training point and finding the kth percentile of this value, where k is the percentage of training examples we wish to be classified as normal. Designating this value as k, the decision function on whether a new point is anomalous is given by

$$g(x) = \operatorname{sgn}\left(k - \left|\mathbf{k}^{T}\alpha - \rho\right|\right) \tag{15}$$
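The least squares OCSVM steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the `exp(-d²/(2σ²))` form of the Gaussian kernel, the values of `sigma` and `C`, the 95th percentile threshold, and the synthetic two-dimensional data are all assumptions made for the example. The score is taken as the absolute distance to the hyperplane, which is one reading of Equations (14) and (15).

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix; the exp(-d^2 / (2 sigma^2)) form is an assumption."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_ls_ocsvm(X, sigma, C):
    """Solve the block linear system of Equation (13) for alpha and rho."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # e^T
    A[1:, 0] = 1.0                      # e
    A[1:, 1:] = K + np.eye(n) / C       # K + I/C
    rhs = np.zeros(n + 1)
    rhs[0] = 1.0
    sol = np.linalg.solve(A, rhs)
    rho = -sol[0]                       # the unknown vector is [-rho, alpha]
    alpha = sol[1:]
    return alpha, rho

def score(X_train, alpha, rho, sigma, X_new):
    """Distance-like score |k^T alpha - rho| for each row of X_new."""
    return np.abs(gaussian_kernel(X_new, X_train, sigma) @ alpha - rho)

# Synthetic stand-in for baseline feature vectors (not data from the paper).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
sigma, C = 1.0, 10.0

alpha, rho = fit_ls_ocsvm(X_train, sigma, C)
train_scores = score(X_train, alpha, rho, sigma, X_train)
threshold = np.percentile(train_scores, 95.0)   # 95% of training data kept as normal

far_point = np.array([[8.0, 8.0]])
far_score = score(X_train, alpha, rho, sigma, far_point)[0]
is_anomalous = far_score > threshold
```

Using `np.linalg.solve` on the full block matrix avoids forming an explicit inverse; on an embedded device without a linear algebra library this step is the one the early stopping variant is intended to avoid.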

5.3. Early stopping one-class kernel least squares

The solution of the least squares OCSVM yields a hyperplane with maximum distance from the origin, which also minimises the sum of the squares of errors with respect to the training data. Examining Equation (12) from the point of view of regularisation rather than from a geometric point of view tells a different story: the least squares OCSVM is in fact a Tikhonov regularised least squares problem. The right-hand side of Equation (12) is a simple least squares minimisation that finds a function that most closely matches the labels of the training data (all the same in the one-class case), while the term on the left-hand side, $\frac{1}{2}\|w\|^2 - \rho$, penalises the L2 norm of the solution, thereby preventing overfitting. The necessity of regularisation is abundantly clear in the one-class case: a function with constant value of 1 will have a sum of square errors of exactly 0, but will be useless in evaluating unseen data.

In [27], Lo Gerfo et al. point out that there is a strong analogy between ill-posed inverse problems and statistical learning. For example, Tikhonov regularisation, one of the most popular approaches for regularisation of ill-posed inverse problems, is now also commonly used in machine learning and statistics, where it is known as ridge regression, or regularised least squares.

In a similar vein, Lo Gerfo et al. [27] point out that a supervised ordinary least squares problem that is solved by gradient descent, but stopped early, is a special case of the Landweber iteration used in solving ill-posed inverse problems. In this algorithm, the number of iterations completed before early stopping plays the role of a regularisation parameter: the earlier the iteration is halted, the less prone the solution is to overfitting. In this paper, we propose that a similar approach can be applied to the problem of one-class learning.

Dropping the L2 norm regularisation from Equation (12), the remaining term is a simple least squares minimisation:

$$\min_{w \in \mathbb{R}^2,\ \rho \in \mathbb{R}} \ \frac{1}{N} \sum_{i} \left(\rho - w \cdot \phi(x_i)\right)^2 \tag{16}$$

As per the representer theorem, Equation (16) can be rewritten in kernel form as

$$\min_{\alpha \in \mathbb{R}^N,\ \rho \in \mathbb{R}} \ \frac{1}{N}\left\|K\alpha - \rho e\right\|_2^2 \tag{17}$$

Similarly to [27], the solution can now be written as a Landweber iteration, and is equivalent to the minimisation of Equation (17) using gradient descent. It should be noted that in this scheme the parameter ρ is not optimised, but rather set to a constant value.



$$\alpha_0 = 0, \qquad \alpha_{i+1} := \alpha_i + \frac{1}{N}\left(\rho e - K\alpha_i\right)^{T}, \quad i = 0, \ldots, t \tag{18}$$

where t is the number of iterations performed before stopping. The decision function is identical to that given in Equation (14), and can also be modified in the same way as Equation (15) to output a binary decision. In practice, this early stopping one-class kernel least squares algorithm gives very similar results to the least squares OCSVM proposed by Choi [26], and there are a number of potential advantages of our proposed approach. First, the early stopping one-class approach is very simple, and can be implemented in a few lines of code. Secondly, our approach requires exactly tn² multiplications, where t is the number of iterations and n is the number of training examples, while the least squares OCSVM requires the inversion of an n × n matrix, which is an O(n³) operation. If the number of iterations is significantly less than the number of training examples, this can represent a large saving in computation. In the context of smart sensing, it is plausible that the early stopping one-class kernel least squares could be trained directly on a smart sensing device. Whether this is advantageous or not depends on the size of the training data and the wireless connectivity. It should nonetheless be noted that with modern microcontroller processing speeds surpassing 100 MHz, training a classifier on a set of 200 examples could be achieved in a fraction of a second, without requiring convex optimisation or matrix inversion libraries as in the case of the OCSVM and least squares OCSVM, respectively.
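To illustrate how little code the iteration in Equation (18) needs, a minimal NumPy sketch follows. It is not the authors' embedded implementation: the function names, the constant `rho = 1.0`, the `exp(-d²/(2σ²))` kernel form, the absolute-value score, and the synthetic data are assumptions chosen for the example. The optional `alpha0` argument is a hypothetical convenience that lets the iteration resume from stored coefficients.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix; the exp(-d^2 / (2 sigma^2)) form is an assumption."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_early_stopping(K, rho=1.0, n_iter=100, alpha0=None):
    """Run n_iter steps of alpha <- alpha + (rho*e - K alpha) / N."""
    n = K.shape[0]
    alpha = np.zeros(n) if alpha0 is None else alpha0.copy()
    for _ in range(n_iter):
        # One matrix-vector product per step: n^2 multiplications.
        alpha = alpha + (rho - K @ alpha) / n
    return alpha

# Synthetic stand-in for baseline feature vectors (not data from the paper).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
sigma, rho = 1.0, 1.0
K = gaussian_kernel(X_train, X_train, sigma)

alpha_10 = fit_early_stopping(K, rho, n_iter=10)
alpha_200 = fit_early_stopping(K, rho, n_iter=200)
# Resuming from stored coefficients gives the same result as a single long run.
alpha_resumed = fit_early_stopping(K, rho, n_iter=190, alpha0=alpha_10)

residual_10 = np.linalg.norm(rho - K @ alpha_10)
residual_200 = np.linalg.norm(rho - K @ alpha_200)

# Distance-like score |k^T alpha - rho| on training data and on a distant point.
train_scores = np.abs(K @ alpha_200 - rho)
far_point = np.array([[8.0, 8.0]])
far_score = np.abs(gaussian_kernel(far_point, X_train, sigma) @ alpha_200 - rho)[0]
```

With a Gaussian kernel the diagonal of K is 1, so the largest eigenvalue of K cannot exceed N and the step size 1/N keeps the training residual from growing; more iterations therefore fit the baseline data more closely, which is why the stopping point acts as the regulariser.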

The number of iterations performed controls the complexity of the solution: more iterations lead to a more complex model, and eventually to overfitting. Conversely, too few iterations will produce a model that is too simple. The number of iterations is clearly an important hyperparameter. We propose a simple approach to choosing this hyperparameter: during initial training, we hold out 30% of the training examples. At regular intervals of the iteration, the current model is used to predict the labels of the held-out data. As the number of iterations increases, and overfitting of the training data starts to occur, the model will begin to predict higher numbers of the held-out data as anomalous. The chosen number of iterations should be the largest number before this degradation starts to occur. This approach is illustrated in Figure 3. Here, it can be seen that at approximately 100 iterations the accuracy on the held-out data starts to decrease. It should also be noted that optimising this hyperparameter requires very little additional computation. If the values of the α coefficients are stored when predictions are made on the held-out data, the iteration can simply resume using the previous values as the initialisation point.
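The hold-out procedure described above might be sketched as follows. The 30% hold-out fraction comes from the text; everything else is an assumption for illustration, including the function name `choose_iterations`, the checkpoint interval, the 95th percentile threshold, the tolerance `tol` used to decide when degradation has started, and the synthetic data.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix; the exp(-d^2 / (2 sigma^2)) form is an assumption."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def choose_iterations(X, sigma, rho=1.0, max_iter=500, check_every=25,
                      pct=95.0, holdout_frac=0.3, tol=0.02, seed=0):
    """Pick the last checkpoint before held-out accuracy starts to degrade."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_hold = int(round(holdout_frac * len(X)))
    hold, train = X[idx[:n_hold]], X[idx[n_hold:]]
    K = gaussian_kernel(train, train, sigma)
    K_hold = gaussian_kernel(hold, train, sigma)
    n = len(train)

    alpha = np.zeros(n)
    history = []                         # (iterations so far, held-out accuracy)
    for start in range(0, max_iter, check_every):
        # Resume from the stored alpha instead of restarting from zero.
        for _ in range(check_every):
            alpha = alpha + (rho - K @ alpha) / n
        threshold = np.percentile(np.abs(K @ alpha - rho), pct)
        accuracy = np.mean(np.abs(K_hold @ alpha - rho) <= threshold)
        history.append((start + check_every, float(accuracy)))

    chosen, best = history[0]
    for n_iter, accuracy in history[1:]:
        if accuracy < best - tol:        # degradation has started: stop here
            break
        chosen, best = n_iter, max(best, accuracy)
    return chosen, history

# Synthetic stand-in for baseline feature vectors (not data from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 2))
chosen, history = choose_iterations(X, sigma=1.0)
```

Because each checkpoint continues from the previous coefficients, the total cost of the search is the same as a single run of `max_iter` iterations plus one held-out prediction per checkpoint.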

Figure 3. Choice of number of iterations.

5.4. Damage localisation

Damage localisation is typically a more difficult problem than pure detection. In the data-based smart sensing paradigm, this is made more challenging by the fact that we would like sensors to operate independently, avoiding communication between each other. If the damage-sensitive features computed during feature extraction exhibit sensitivity with proximity to damage, that is, the values change more the closer they are to the damage location, localisation using independently operating sensors is possible. Nair et al. [7] demonstrated this concept using a localisation index based on the Euclidean distance between damaged and baseline clouds of AR feature vectors at each sensor. In theory, if the feature vectors are sensitive to the proximity of damage, the highest value of the localisation index will be at the sensor closest to the damage.

All three one-class kernel methods described in the previous section can also output a distance metric, in addition to a binary decision, which can be used as a localisation index. In this paper, a localisation methodology is proposed that is thematically similar to that of Nair et al. [7], but which uses a more sophisticated localisation index based on the early stopping one-class kernel least squares. The decision function given by Equation (14) is a measure of how dissimilar a new test point is to the training data, and as such is a natural choice for a localisation index. Large values of this localisation index indicate new data that is more anomalous. This distance is a more accurate descriptor of proximity to the training data than a Euclidean distance from the mean of the training data, or a Mahalanobis distance that imposes an elliptical shape on the data. We expect that the sensor location closest to the damage will output the largest value of the localisation index. If effective, this localisation index has an obvious benefit: it requires no additional analysis or memory on top of that already needed for the damage detection process. A visualisation of the early stopping one-class kernel least squares-based localisation index is shown in Figure 4. Because we are using the Gaussian kernel, the localisation index (LI) will tend towards an asymptotic value as we move away from the baseline data, making it difficult to distinguish between values that are distant from the baseline, as in the case of more major damage. To counteract this effect, the Gaussian bandwidth parameter σ can be adjusted to give a function that moves more gradually towards the asymptotic value. For the localisation results presented in this study, the bandwidth parameter was increased by a factor of 10 compared with the detection results.
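The saturation effect, and why a wider bandwidth helps, can be seen in a deliberately simplified toy case. The sketch below assumes a single baseline point with unit coefficient, a constant `rho = 1.0`, the `exp(-d²/(2σ²))` kernel form, and a localisation index read as the absolute value of the decision function; none of these specifics come from the paper, and the distances are arbitrary.

```python
import math

def gaussian_kernel_value(distance, sigma):
    """Gaussian kernel as a function of distance; the 2*sigma^2 form is an assumption."""
    return math.exp(-distance**2 / (2.0 * sigma**2))

def toy_localisation_index(distance, sigma, rho=1.0):
    """Toy LI for one baseline point with unit coefficient: |K - rho|."""
    return abs(gaussian_kernel_value(distance, sigma) - rho)

# Arbitrary distances of a test point from the baseline point.
distances = [1.0, 3.0, 6.0]

narrow = [toy_localisation_index(d, sigma=1.0) for d in distances]
wide = [toy_localisation_index(d, sigma=10.0) for d in distances]

# How well the two most distant cases can be told apart under each bandwidth.
narrow_gap = narrow[2] - narrow[1]
wide_gap = wide[2] - wide[1]

for d, a, b in zip(distances, narrow, wide):
    print(f"distance={d:.1f}  LI(sigma=1)={a:.4f}  LI(sigma=10)={b:.4f}")
```

With the narrow bandwidth the index is already near its asymptote at the two largest distances, so they are almost indistinguishable, whereas the bandwidth increased by a factor of 10 still separates them. This mirrors the reason given in the text for using a wider bandwidth for localisation than for detection.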

Figure 4. Contours of localisation index.

6. EXPERIMENTAL RESULTS

The efficacy of the damage identification methods presented in this paper will now be demonstrated using data acquired from an experimental laboratory structure, which has been developed specifically for the purposes of testing and validating SHM algorithms. The structure is a modular, three-story, two-bay steel-frame structure with bolted connections. Eighteen triaxial accelerometers measure the vibration response of the structure. The locations of these sensors are illustrated in Figure 5. The structure is excited by a shaker, located in close proximity to sensor no. 18. This shaker is programmed to input random, white Gaussian noise. In effect, because of the frequency response of the shaker itself, the excitation provided to the structure is drawn from a spectrum of approximately 5 to 350 Hz.

As previously discussed, any data-based damage detection methodology requires comparison with data from the baseline condition of the structure in order to detect damage. With this in mind, an initial database of 258 10-s samples was taken at a sampling rate of 3000 Hz at each sensor location. A test dataset was then acquired, which includes data from the same, intact structure, as well as from five different damaged configurations of the structure. These scenarios are shown in Table I. The structure is a steel frame, consisting of beam and column elements. These elements are connected with a bolted connection containing four bolts. Damage is induced by loosening the bolts at the connection, and location is described by the closest sensor. Note that this does not mean the sensor itself is damaged, just that the damaged connection is adjacent to the sensor location. Minor damage corresponds with loosening two of the four bolts, while for major damage all four bolts are loosened.

Figure 6 shows average values of the first four natural frequencies of the structure in all configurations. Error bars show the standard deviation. These natural frequencies were calculated using a simple peak picking strategy where the natural frequency is defined as the frequency with the maximum value within a user-specified window identified by visual inspection of the frequency spectrum. The minor damage scenarios show very similar natural frequencies to the undamaged cases, while the major damage scenarios show changes of less than 5% compared with the undamaged structure.
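A peak picking step of this kind can be sketched with NumPy's FFT routines. The function name, the use of an unwindowed magnitude spectrum, and the synthetic two-tone test signal (12 Hz and 47 Hz, chosen arbitrarily) are assumptions for illustration; only the 3000 Hz sampling rate and 10 s record length are taken from the text.

```python
import numpy as np

def pick_peak_frequency(signal, fs, f_lo, f_hi):
    """Return the frequency with the largest spectral magnitude in [f_lo, f_hi]."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    window = (freqs >= f_lo) & (freqs <= f_hi)     # user-specified search window
    return freqs[window][np.argmax(spectrum[window])]

# Synthetic record: 10 s at 3000 Hz with two made-up modal frequencies.
fs = 3000.0
t = np.arange(int(10 * fs)) / fs
signal = np.sin(2 * np.pi * 12.0 * t) + 0.5 * np.sin(2 * np.pi * 47.0 * t)

first_peak = pick_peak_frequency(signal, fs, 5.0, 20.0)
second_peak = pick_peak_frequency(signal, fs, 30.0, 60.0)
```

The frequency resolution is the reciprocal of the record length, so a 10 s sample resolves peaks to within 0.1 Hz, which sets a floor on how small a frequency shift this strategy can register.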

7. DAMAGE DETECTION RESULTS

Representative results of the damage detection performance are shown in this section for discussion. Results in this section are shown in the form of column charts. As described previously, we train a separate novelty detection model for each sensor location, and hence each set of three columns in the chart corresponds to the model for that sensor location. Figure 7 shows the fraction of tests from damage scenarios number 4 and 5, described in Table I, which were classified as damaged for each sensor location. For comparison, results from the undamaged test scenario (Test 01–45) are also plotted to illustrate the false positive rate.

The damage detection performance is excellent, for both major and minor damage scenarios, with 100% detection rates at the damage location, and false positive rates below 5% at every sensor location. We can also observe that the major damage scenario, which is at the connection closest to sensor 17, is detected by all 18 classifiers, while the minor damage detection rate drops at sensor locations farther from the damage location. However, we can see that the simple damage detection process does not provide any real information on the location of the damage. Damage detection results for all other damage scenarios show similar performance.

Figure 5. (a) Diagram of sensor locations on lab structure. (b) Photograph of lab structure.

Table I. Experimental damage scenarios and locations.

| Scenario | Test number | Damage type | Damage location | Excitation |
|---|---|---|---|---|
| 1 | 01–45 | No damage | - | Shaker white noise |
| 2 | 46–75 | Two bolts loosened at connection | Sensor 1 | Shaker white noise |
| 3 | 76–105 | Four bolts loosened at connection | Sensor 1 | Shaker white noise |
| 4 | 106–135 | Two bolts loosened at connection | Sensor 17 | Shaker white noise |
| 5 | 136–165 | Four bolts loosened at connection | Sensor 17 | Shaker white noise |
| 6 | 165–195 | Four bolts loosened at both connections | Sensor 17 | Shaker white noise |

Figure 6. Natural frequencies of the experimental structure.

The results demonstrated in Figure 7 show a very low false positive rate under laboratory conditions. To further examine the robustness of the proposed methodology in higher noise environments, a supplementary investigation was conducted. To investigate the effect of noise, we artificially contaminated our experimental data with varying levels of simulated white noise. Specifically, for a single sensor location (sensor 1), the undamaged test data was corrupted with artificial white noise at various signal to noise ratios. Although the experimental data already contains noise, for the purposes of this investigation, we considered the experimental data to be pure signal. For a specific desired signal to noise ratio, we then added random white noise of variance $\sigma^2_{\text{noise}}$ such that

$$\frac{\sigma^2_{\text{signal}}}{\sigma^2_{\text{noise}}} = \mathrm{SNR} \tag{19}$$

where $\sigma^2_{\text{signal}}$ is the variance of the zero-mean acceleration signal, $\sigma^2_{\text{noise}}$ is the variance of the simulated Gaussian white noise and SNR is the desired signal to noise ratio. The original data is taken from the undamaged structure, as is the data used to train the one-class kernel classifier. As the signal to noise ratio decreases, meaning higher noise contamination, we expect that the classifier will start to exhibit a higher false positive rate, as the artificially noisy data becomes sufficiently anomalous compared with the uncontaminated training data. The results of this investigation are shown in Figure 8, with the signal to noise ratio plotted in decibels for convenience. We can see that as the signal to noise ratio drops into the low 30-dB range, the false positive rate starts to increase, but that at higher signal to noise ratios the false positive rate is very low. It should be noted that this circumstance is particularly challenging, as none of the artificially noisy data is included in the baseline training data, and as such this can be viewed as an extreme case of noise contamination.
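The noise contamination step can be sketched as below. The function name, the `10·log10` conversion between decibels and a variance ratio, the 30 dB target, and the synthetic sinusoidal stand-in for an acceleration record are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add Gaussian white noise so that var(signal) / var(noise) matches snr_db."""
    signal = signal - np.mean(signal)              # treat the record as zero mean
    snr_linear = 10.0 ** (snr_db / 10.0)           # decibels to a variance ratio
    noise_var = np.var(signal) / snr_linear
    noise = rng.normal(0.0, np.sqrt(noise_var), size=signal.shape)
    return signal + noise, noise

# Synthetic stand-in for a 10 s acceleration record sampled at 3000 Hz.
rng = np.random.default_rng(0)
fs = 3000.0
t = np.arange(int(10 * fs)) / fs
clean = np.sin(2 * np.pi * 20.0 * t)

noisy, noise = add_white_noise(clean, snr_db=30.0, rng=rng)
measured_snr_db = 10.0 * np.log10(np.var(clean) / np.var(noise))
```

Checking `measured_snr_db` against the target is a cheap sanity test, since the realised ratio depends on the sample variance of the particular noise draw.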

Figure 7. Detection of damage at sensor 17.

Figure 8. Effect of artificial noise contamination on false positive rate.

8. DAMAGE LOCALISATION RESULTS

As we have seen in Figure 7, the binary damage detection methodology does not typically provide us with any information on the location of the damage. We will now present results using the localisation index derived from the early stopping one-class kernel least squares, using wavelet-based features. Localisation results from all damage scenarios are shown subsequently.

8.1. Damage at sensor 1

In Figure 9, the average value of the LI for the minor and major damage scenarios at sensor 1 is shown. There are 30 individual tests in each damaged scenario, so we average the values of the localisation index returned for each test at each sensor location and then plot them as a column chart, with error bars indicating the standard deviation of the 30 tests. For both major and minor damage scenarios, the localisation index is highest at sensor 1, the closest sensor to the damage location.

8.2. Damage at sensor 17

The results of the damage localisation methodology, for the damage scenarios at sensor 17, are shown in this section. In Figure 10, the average value of the LI for both the major and minor damage scenarios at sensor 17 is shown. Again, for both major and minor damage scenarios, the highest average value of the LI occurs at sensor 17, closest to the damage location. The error bars indicate that for the major damage scenario the maximum value of the LI is likely to occur at sensor 17 for almost every test, while for the minor damage scenario some tests may show the maximum value at sensor 16, which is 0.6 m from the damage location.

Figure 9. Localisation of damage at sensor 1 using wavelet features.

Figure 10. Localisation of damage at sensor 17 using wavelet features.

8.3. Combined damage at sensor 1 and sensor 17

To create a damage scenario with multiple damage locations, four bolts were loosened at the connections nearest to both sensor 1 and sensor 17. Referring to Figure 5, we can see that these damage locations are on diagonally opposite corners of the three-story two-bay steel-frame structure, making the localisation of these damages a difficult proposition. Plotted in Figure 11 are the results of the LI for this combined damage scenario. Here, we see that the values of the LI are highest at sensor 1 and sensor 17, and considerably lower at other sensor locations, effectively identifying the location of the damages.

To gain more insight into the damage detection and localisation process, and the nature of the data we are analysing, we will now proceed to show selected visualisations of the data.

9. DATA VISUALISATION

To visualise how the data is distributed and the nature of the early stopping one-class kernel least squares decision boundary, we perform principal component analysis on the 30-dimensional feature vectors, retaining the two largest principal components. We then fit the classifier to this two-dimensional data to allow visualisation of the decision boundary.
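The projection step can be sketched with a singular value decomposition. The function name and the random stand-in data are assumptions; the 258 baseline samples and 30 feature dimensions are taken from the text, but the values themselves are synthetic. New test features would be projected with the mean and components learned from the baseline data, so that baseline and test points share the same two-dimensional axes.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the leading principal components via SVD."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

# Synthetic stand-in for 258 baseline feature vectors of dimension 30.
rng = np.random.default_rng(0)
scales = np.linspace(3.0, 0.1, 30)            # made-up per-feature spread
X_baseline = rng.normal(size=(258, 30)) * scales
X_test = rng.normal(size=(30, 30)) * scales

Z_baseline, components, mean = pca_project(X_baseline)
# Project new data with the baseline mean and components, not its own.
Z_test = (X_test - mean) @ components.T
```

The classifier is then fitted to `Z_baseline` purely for plotting; detection in the paper operates on the full feature vectors.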

9.1. Visualisation of damage scenarios at sensor 1

Baseline training data for sensor 1 and the decision boundary generated are plotted in Figure 12. Data from the baseline test case and from the minor and major damage scenarios at sensor 1 are also shown. This visualisation provides insight into how and why the damage detection and localisation process works. Referring to Figure 12, we can see that the baseline data and undamaged test data are in good agreement, coming from the same underlying system, leading to low false positive rates in damage detection. In addition, we see that both major and minor damage are well detected by the decision boundary and that the major damage is clearly much further from the baseline data.

Data from sensor 4 is plotted in Figure 13. This data, from a sensor farther from the damage location, allows us to understand how damage can be localised. In contrast to the data from sensor 1, we can see that the major damage scenario data is much closer to the baseline data. This leads to a higher value of the LI at sensor 1, and clear localisation of damage for the major damage scenario. Figure 13 also illustrates some of the advantages the early stopping one-class kernel least squares classification method demonstrated in this paper has over previously reported SHM pattern recognition algorithms. The data from sensor 4 is asymmetric and non-Gaussian, exhibiting a steep drop-off in density in the positive y direction from the region of highest probability density. The data from the damaged scenarios lies quite close to the baseline data in a Euclidean space, but clearly in a region of low probability density. The early stopping one-class kernel least squares, because it is a nonparametric kernel method, allows us to identify these patterns and correctly classify the data from the damaged scenarios.

Figure 11. Localisation of combined damage with wavelet features.

Figure 12. Two-dimensional visualisation of data and decision boundary from sensor 1, for the damage scenario at sensor 1.

Figure 13. Two-dimensional visualisation of data and decision boundary from sensor 4, for the damage scenario at sensor 1.

10. PARAMETRIC STUDIES

In this section, a number of parametric studies will be conducted to understand where improvements in efficiency to the methodology presented in this paper can be made. In order to easily evaluate global performance of the algorithm, we will first define a performance metric, which takes into account both false positives and false negatives across the whole network of sensors. Because our system has 18 individual components acting independently, we need to clarify exactly what the error rate is. We will define the false positive rate over all 18 individual sensor components, but the false negative rate will be calculated only at the sensor location closest to the damage. The false positive rate in this experimental study is defined as the total fraction of tests from scenario 1 (the undamaged test scenario), which were classified as damaged, at all sensor locations. For the damage scenarios, the false negative rate is defined as the fraction of tests at the sensor location closest to the damage, which were classified as healthy. So for damage at sensor 1, this quantity is only calculated for sensor 1. We can use the F1 score as a metric that accounts for both the false positive and false negative rates:

$$F_1 = 2 \times \frac{\mathrm{SEN} \cdot \mathrm{SPC}}{\mathrm{SEN} + \mathrm{SPC}}$$

SEN = Fraction of damaged tests classified as damaged

SPC = Fraction of undamaged tests classified as undamaged

A high sensitivity score indicates the ability to correctly identify damage at the sensor location, while a high specificity score indicates the ability to correctly classify healthy tests. The F1 score is the harmonic mean of these two scores, and a higher F1 score indicates better performance of the methodology.
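As a small worked example, the metric defined above can be computed from classification counts as follows. The function names and the counts are hypothetical and chosen only to exercise the formula. Note that this follows the paper's definition, the harmonic mean of sensitivity and specificity, which differs from the more common F1 score built from precision and recall.

```python
def sensitivity(damaged_flagged, damaged_total):
    """Fraction of damaged tests classified as damaged."""
    return damaged_flagged / damaged_total

def specificity(undamaged_cleared, undamaged_total):
    """Fraction of undamaged tests classified as undamaged."""
    return undamaged_cleared / undamaged_total

def f1_from_sen_spc(sen, spc):
    """Harmonic mean of sensitivity and specificity, as defined in the text."""
    if sen + spc == 0.0:
        return 0.0          # avoid division by zero when both rates are zero
    return 2.0 * sen * spc / (sen + spc)

# Hypothetical counts, for illustration only.
sen = sensitivity(damaged_flagged=27, damaged_total=30)          # 0.9
spc = specificity(undamaged_cleared=780, undamaged_total=810)    # 18 sensors x 45 tests
score = f1_from_sen_spc(sen, spc)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a methodology cannot score well by detecting everything at the cost of many false alarms, or by staying silent to avoid them.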

10.1. Sensitivity to sample size

Thus far, all results presented have been generated using a baseline training dataset of 258 samples. In this section, we will investigate what effect, if any, the size of the training dataset has on the damage detection performance. To investigate what sample size we require before reliable damage detection results are obtained, the damage detection algorithm was executed using training sample sizes varying from 1 to 258. Additionally, to investigate if the effect of sample size depended on the dimension of the feature vector, this analysis was carried out for three different feature vector types: a 15-dimensional feature vector of AR features, a 30-dimensional feature vector of combined AR and wavelet features, and a 45-dimensional feature vector, again of combined AR and wavelet features. The damage detection performance was evaluated using the F1 metric. The results of this analysis are shown in Figure 14.

We can see that at sample sizes above approximately 50, the performance is very stable. Once the sample size goes below 50, the performance degrades quickly. The effect of feature vector dimension is not pronounced, but as might be expected, the 45-dimensional features require a slightly bigger sample size before reaching a stable, reliable damage detection performance level.

11. EFFECT OF DATA ACQUISITION PARAMETERS

A key consideration in the design of algorithms for smart sensing is the computational and memory requirements of the sensors. One particularly important decision is the choice of sampling frequency. When a signal moves from the analog domain to the digital domain, it must be discretised. The spacing of the discrete samples determines the sampling frequency. A lower sampling frequency reduces the power consumption of the device, but potentially at the cost of performance. Thus far, all results presented in this paper were obtained from a signal sampled at 3000 Hz. A parametric study on the effect of varying the sampling frequency on damage detection performance is presented in this section.

Figure 14. Effect of sample size on damage detection performance.



To investigate the effect of the sampling frequency on damage detection performance, the original 3000-Hz data was resampled. The 3000-Hz data was downsampled to 2000, 1500, 1000, 750, 600, 500, 400, 300, 200, 150, 120 and 100 Hz. The damage detection algorithm was then executed on the downsampled data. The F1 score was used to evaluate damage detection performance. Results are shown in Figure 15.

Figure 15 shows the variation in F1 score with increasing sampling frequency. The effect of sampling frequency on the damage detection performance is clear. Sampling rates of greater than 1500 Hz show relatively stable and high F1 scores, but as the sampling rate decreases from 1500 Hz, the performance degrades; even a decrease from 1500 to 1000 Hz shows a noticeable difference. Although most of the vibration of the laboratory structure occurs below 100 Hz, it seems clear that detecting the damage relies on capturing higher frequency vibration features.

12. SUMMARY AND CONCLUSIONS

A novel one-class kernel classification algorithm for data-based structural damage detection has been proposed in this paper. This algorithm is conceptually related to the OCSVM, and one-class kernel least squares, but can be solved through a simple iterative procedure, rather than through quadratic programming or linear system solution. Under typical conditions, this leads to faster training and has the additional advantage of simple implementation, and smaller code size, a significant benefit in an embedded context. Using this algorithm, a comprehensive, fully automated, decentralised damage detection and localisation strategy for SHM has been developed. Raw acceleration data obtained from sensors placed on a structure is first processed into damage-sensitive features. The proposed one-class kernel classifier is then applied to compare these features with baseline values from the intact structure. The efficacy of the algorithms has been demonstrated using data taken from a steel-frame laboratory structure, under various damage scenarios. In light of recent trends towards smart sensing networks, which are capable of executing damage detection algorithms on board sensors, careful consideration has been given to computational requirements. AR and wavelet-based features were chosen for feature extraction in this paper, as they meet the needs of a decentralised methodology.

Once the acceleration data has been processed into these damage-sensitive features, damage detection is accomplished using the early stopping one-class kernel least squares algorithm, which returns a binary value indicating whether the structure is damaged or not. This approach has numerous advantages over previous approaches in the SHM literature, and is particularly suitable for embedding on a microcontroller. For all damage scenarios contained in the test data from the steel laboratory structure, damage is detected extremely reliably, with a low false positive rate.

The effect of the sample size of baseline data was also investigated. Two hundred fifty-eight samples from the intact laboratory structure were initially collected and used to train the classifier. To investigate whether damage detection performance degrades when this sample size is smaller, the damage detection algorithm was carried out on various smaller training data sets. Performance was stable above a sample size of 50, but quickly deteriorated below this number. An additional parametric study was carried out to investigate the effect of sampling frequency. The initial data was acquired at a sampling frequency of 3000 Hz. This data was downsampled to lower frequencies. As the sampling frequency of the data decreased, the damage detection performance deteriorated significantly.

Figure 15. Effect of sampling frequency on damage detection performance.

A binary decision is returned at each location by the early stopping one-class kernel least squares. This typically does not provide any information on the location of damage, and therefore a damage localisation methodology has been developed and presented here. This localisation methodology uses the decision function of the early stopping kernel least squares as a localisation index, at each sensor location. Results from the laboratory structure data are promising.

In conclusion, a novel one-class classification algorithm for damage detection and localisation has been proposed in this paper, and an automated, decentralised data-based damage detection and localisation methodology has been developed using this algorithm. The efficacy of the approach has been evaluated on real experimental data from a steel-frame structure instrumented with accelerometers and subjected to several different damage scenarios. With this method, damage is detected reliably, with a very low false alarm rate, and the location of damage is identified successfully.

ACKNOWLEDGEMENTS

The authors acknowledge the support provided by Royal Dutch Shell, and thank chief scientists Dr. Dirk Smit and Dr. Sergio Kapust for their oversight of this work. Thanks to Justin Chen and Reza Mohammadi for their assistance in data collection. Also, thanks to Dr. Michael Feng and his team from the Charles Stark Draper Laboratory for their collaboration in the development of the laboratory structural model and sensor systems.

FUNDING

This research was supported by Royal Dutch Shell and by the Linde Family Foundation.

REFERENCES

1. Spencer BF, Ruiz-Sandoval ME, Kurata N. Smart sensing technology: opportunities and challenges. Structural Control andHealth Monitoring 2004; 11(4):349–368.

2. Lynch JP, Sundararajan A, Law KH, Kiremidjian AS, Carryer E. Embedding damage detection algorithms in a wirelesssensing unit for operational power efficiency. Smart Materials and Structures 2004; 13(4):800.

3. Nair KK, Kiremidjian AS. Time series based structural damage detection algorithm using Gaussian mixtures modeling.Journal of Dynamic Systems, Measurement, and Control 2007; 129(3):285–293.

4. Cortes C, Vapnik V. Support-vector networks. Machine Learning 1995; 20(3):273–297.5. Worden K, Manson G, Fieller N. Damage detection using outlier analysis. Journal of Sound and Vibration 2000; 229(3):

647–667.6. Gul M, Necati CF. Statistical pattern recognition for structural health monitoring using time series modeling: theory and

experimental verifications. Mechanical Systems and Signal Processing 2009; 23(7):2192–2204.7. Nair KK, Kiremidjian AS, Law KH. Time series-based damage detection and localization algorithm with application to the

ASCE benchmark structure. Journal of Sound and Vibration 2006; 291(1):349–368.8. Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the support of a high-dimensional distribu-

tion. Neural Computation 2001; 13(7):1443–1471.9. Banerjee A, Burlina P, Diehl C. A support vector method for anomaly detection in hyperspectral imagery. EEE Transactions

on Geoscience and Remote Sensing 2006; 44(8):2282–2291.10. Manevitz LM, Yousef M. One-class SVMs for document classification. The Journal of Machine Learning Research 2002;

2:139–154.11. Gardner AB, Krieger AM, Vachtsevanos G, Litt B. One-class novelty detection for seizure analysis from intracranial EEG.

The Journal of Machine Learning Research 2006; 7:1025–1044.12. Oh CK, Sohn H. Damage diagnosis under environmental and operational variations using unsupervised support vector ma-

chine. Journal of Sound and Vibration 2009; 325(1):224–239.13. Hayton P, Utete S, King D, King S, Anuzis P, Tarassenko L. Static and dynamic novelty detection methods for jet engine

health monitoring. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and EngineeringSciences 2007; 365(1851):493–514.

14. Van Overschee P, De Moor B. Subspace identification for linear systems: Theory–Implementation–Applications. Springer Science & Business Media 2012.

15. Brincker R, Zhang L, Andersen P. Modal identification of output-only systems using frequency domain decomposition. Smart Materials and Structures 2001; 10(3):441.

16. Todd M, Nichols J, Pecora L, Virgin L. Vibration-based damage assessment utilizing state space geometry changes: local attractor variance ratio. Smart Materials and Structures 2001; 10(5):1000.

17. Loutridis S. Damage detection in gear systems using empirical mode decomposition. Engineering Structures 2004; 26(12):1833–1841.

18. Xu Y, Chen J. Structural damage detection using empirical mode decomposition: experimental investigation. Journal of Engineering Mechanics 2004; 130(11):1279–1288.

19. Ying Y, Garrett JH Jr, Oppenheim IJ, Soibelman L, Harley JB, Shi J, Jin Y. Toward data-driven structural health monitoring: application of machine learning and signal processing to damage detection. Journal of Computing in Civil Engineering 2012; 27(6):667–680.

20. Sohn H, Farrar CR. Damage diagnosis using time series analysis of vibration signals. Smart Materials and Structures 2001; 10(3):446.

21. Figueiredo E, Figueiras J, Park G, Farrar CR, Worden K. Influence of the autoregressive model order on damage detection. Computer-Aided Civil and Infrastructure Engineering 2011; 26(3):225–238.

22. Nair KK, Kiremidjian AS. Derivation of a damage sensitive feature using the Haar wavelet transform. Journal of Applied Mechanics 2009; 76(6):061015-061015-9.

23. Young Noh H, Krishnan Nair K, Lignos DG, Kiremidjian AS. Use of wavelet-based damage-sensitive features for structural damage diagnosis using strong motion data. Journal of Structural Engineering 2011; 137(10):1215–1228.

24. Long J, Buyukozturk O. Automated structural damage detection using one-class machine learning. In Dynamics of Civil Structures, Volume 4. Springer, 2014; 117–128.

25. Khazai S, Homayouni S, Safari A, Mojaradi B. Anomaly detection in hyperspectral images based on an adaptive support vector method. IEEE Geoscience and Remote Sensing Letters 2011; 8(4):646–650.

26. Choi YS. Least squares one-class support vector machine. Pattern Recognition Letters 2009; 30(13):1236–1240.

27. Gerfo LL, Rosasco L, Odone F, Vito ED, Verri A. Spectral algorithms for supervised learning. Neural Computation 2008; 20(7):1873–1897.

A. NOTATION

The following symbols are used in this paper:

SYMBOLS

a          wavelet scale number
b          wavelet time shift
C          regularisation parameter
e_x(t)     autoregressive residual
E_a        wavelet energy at scale a
Hz         Hertz
j          the imaginary unit
N          number of training examples given to classifier
ℝ^n        real coordinate space of n dimensions
sgn        signum function
K(x, y)    kernel function
t          time
X(a, b)    wavelet transform at scale a and time b
w          coefficients describing classifier hyperplane
x(t)       vibration time series
α_j        Lagrangian multiplier
ν          OCSVM parameter controlling outlier fraction
ξ_i        slack variable
ρ          constant bias term of hyperplane
σ          Gaussian kernel parameter
ϕ_k        autoregressive model coefficients
φ(x)       feature space induced by kernel
ψ(t)       wavelet function
ω_0        Morlet mother wavelet central frequency

ACRONYMS

AR         autoregressive
ARMA       autoregressive moving average
ASCE       American Society of Civil Engineers
CWT        continuous wavelet transform
DSF        damage sensitive feature
FFT        fast Fourier transform
LI         localisation index
MEMS       microelectromechanical systems
OCSVM      one-class support vector machine
PCA        principal component analysis
SHM        Structural Health Monitoring
SPR        statistical pattern recognition
SVM        support vector machine
SVs        support vectors


