Bearing defect identification based on acoustic
emission signals
Botond Cseke
Faculty of Science, Radboud University Nijmegen
Email: [email protected]
Tom Heskes
Faculty of Science, Radboud University Nijmegen
Email: [email protected]
Abstract— In this paper we classify seeded bearing defects based on acoustic emission data. We use data from recordings of the experiment carried out by Al-Ghamd and Mba [1]. The classification method is based on autoregression model features and acoustic emission features such as root mean square, maximum amplitude and kurtosis value. We use support vector machines and k-nearest neighbor methods as classification tools. Autoregression model features significantly improve the results obtained with acoustic emission features alone.
I. INTRODUCTION
Acoustic emission (AE) signal analysis is a standard tool
for monitoring the “health” state of materials and therefore of
various mechanical equipment. Quoting from Ganji [2]:
AE is the monitoring technique which analyses
elastic waves naturally generated above the human
hearing threshold (> 20 kHz). It is associated with
the range of phenomena which generate broadband
activity from the transient release of stored elastic
energy from localized sources. ...

AE has been proven to be useful for condition monitoring of
bearing states. Ganji [2], Ganji and Holsnijders [3] provide an
AE–signal feature based interpretation – peak value, root of
mean squared values (RMS), kurtosis value, crest factor, form
factor, AE count – of lubrication conditions, while Jamaludin
and Mba [4], [5] provide an autoregression parameter based
clustering of acoustic emission signatures in case of slowly
rolling bearings. Recently, Al-Ghamd and Mba [1] conducted
an experiment for detecting the presence and size of seeded
defects in radially loaded bearings. Their analysis was based
on measuring signal features such as RMS, kurtosis and maximum
amplitude. We briefly describe their experiment in section
III.
In this paper we use standard machine learning tools such
as support vector machines (SVM) and the k-nearest neighbor
(kNN) method to analyze and classify features extracted
from the AE signals recorded during the above mentioned
experiments.
Section II describes the feature extraction methods and the
machine learning tools we used. It has two parts: section II-A
presents AE signal features used in [2] and the autoregression
models (AR) while section II-B presents in brief the machine
learning tools and techniques employed in our analysis.
In section III we describe the dataset we worked with and
the experiment we conducted for classifying the AE signatures.
We end with a discussion and conclusion in sections IV and V.
II. FEATURES AND ALGORITHMS
This section gives a brief description of the framework in
which we embedded the problem. We made use of the AE signal
characteristics employed in [1]–[5] in order to create a set of
AE signal features which can be used in a classification task.
We give a brief description of AE signal features, AR models
and support vector machines. Readers interested only in the
results may skip this section and return to it later if needed.
A. Features
Acoustic emission signal features: In his report Ganji [2]
classifies AE signals into 3 broad classes:
1. Burst activity: the signal has the form of a sequence of
transients, each of which can be roughly described
as an exponentially decaying sinusoid. These bursts
may overlap and can have varying amplitudes and decay
factors. The most common method to detect the “arrival”
of bursts is to set a threshold value and check if and when
the signal value exceeds it.
2. Continuous activity: due to the high frequency of bursts
and the wide range of indistinguishable burst character-
istics (amplitude, decay factor) the signal has a random
oscillatory appearance.
3. Mixed mode activity: the burst activity is superimposed
on a continuous activity, meaning that some of the bursts
have distinguishable characteristics.
Because of the enormous amount and redundancy of data
that an AE sensor can provide, most of the monitoring tools
restrict themselves to the measurement of a few relevant
quantities. Empirical studies (see [2]) show that the most
important ones are:
• peak value: the maxima of the signal at peaks;
• RMS value;
• kurtosis value: a characterization of the signal value distribution by 4th-order statistics;
• crest factor: peak value divided by RMS;
• form factor: RMS value divided by mean value;
• AE count: the count of burst events.
From these we have chosen to measure those that were also
measured in the experiment carried out by Al-Ghamd and Mba
(see [1]), i.e. maximum amplitude or crest factor, root mean
square (we use the term power) and kurtosis value.
When dealing with time series data, one usually first verifies
whether or not the data can be modelled by autoregressive
(AR) processes. It turned out that the AE signal recordings
of the Al-Ghamd and Mba experiment can be modelled by
AR processes of second order. We give a brief introduction to
AR processes and summarize a few important characteristics
to be used later in this paper.
AR models for Time Series modeling: An autoregressive
process of order p – abbreviated by AR( p) – on a discrete
time domain is defined by the linear model

$$ y_t = \sum_{j=1}^{p} \phi_j y_{t-j} + \epsilon_t $$

where the $\epsilon_t$ are independent and normally distributed. Usually
$t$ starts at 1 and we have to specify the first $p$ values of the
process, or their distribution. In the following we work with a
finite time domain, i.e. $t$ runs through $\{1, \ldots, T\}$.
Using the notation $y = y_{1:T}$ and $\epsilon_t \sim N(0, s)$, we can
write the probabilistic model in the form

$$ p(y \mid Y_p, \phi, s) = p(y_{1:p}) \prod_{t=p+1}^{T} N\big(y_t \mid \phi^T y_{(t-1):(t-p)},\, s\big) $$

where the parameters of the model are $\phi$, $s$ and the parameters
of the distribution of the first $p$ terms.
For better understanding we can rewrite the model in a
vectorized form

$$ p(y \mid Y_p, \phi, s) \propto \exp\left( -\frac{(y - Y_p^T \phi)^T (y - Y_p^T \phi)}{2s} \right) $$

where we have used the notation $y = (y_{p+1}, \ldots, y_T)^T$,
$(Y_p)_{i,:} = y_{(p+i-1):i}$, $i = 1, \ldots, T-p$, and considered the
first $p$ terms given.
We can perform both maximum likelihood (ML) and
Bayesian estimation of the model parameters. The ML method
is equivalent to least squares estimation, yielding the parameter
estimates

$$ \hat{\phi} = (Y_p^T Y_p)^{-1} Y_p^T y, \qquad \hat{s} = \frac{1}{T-p}\,(y - Y_p\hat{\phi})^T (y - Y_p\hat{\phi}). $$
Bayesian estimation is usually performed with the so-called
reference or improper prior $p(\phi, s) \propto 1/s$. Calculating

$$ p(\phi, s \mid y, Y_p) = \frac{p(y \mid Y_p, \phi, s)\, p(\phi, s)}{p(y \mid Y_p)} $$

one obtains that the posterior marginal of $\phi$ is a multivariate
Student-t distribution with $T - 2p$ degrees of freedom,

$$ p(\phi \mid y, Y_p) \propto \left[ 1 + \frac{(\phi - \hat{\phi})^T Y_p^T Y_p (\phi - \hat{\phi})}{(T-p)\,\hat{s}} \right]^{-(T-p)/2} $$

which for large $T$ values is roughly $N(\phi \mid \hat{\phi},\, \hat{s}\,(Y_p^T Y_p)^{-1})$. For
a more detailed description of parameter estimation in AR
models the reader is referred to [6].
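To make the estimation concrete, the least-squares/ML estimates above can be computed in a few lines. The following is a minimal NumPy sketch (not the authors' code), conditioning on the first p values as in the text; the simulated coefficients are in the range the paper later reports for the AE recordings:

```python
import numpy as np

def fit_ar(y, p=2):
    """ML / least-squares fit of an AR(p) model, conditioning on the
    first p values. Returns (phi_hat, s_hat) as in the text."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Design matrix: column j holds the series values at lag j + 1.
    Y = np.column_stack([y[p - j - 1:T - j - 1] for j in range(p)])
    target = y[p:]
    phi_hat, *_ = np.linalg.lstsq(Y, target, rcond=None)
    resid = target - Y @ phi_hat
    s_hat = resid @ resid / (T - p)
    return phi_hat, s_hat

# Usage: simulate a stationary AR(2) and recover its parameters.
rng = np.random.default_rng(0)
phi_true, noise_std, T = (1.6, -0.85), 0.05, 5000
y = np.zeros(T)
for t in range(2, T):
    y[t] = phi_true[0]*y[t-1] + phi_true[1]*y[t-2] + rng.normal(0, noise_std)
phi_hat, s_hat = fit_ar(y, p=2)
```

With 5000 samples the estimates land close to the true coefficients, illustrating why the ML fit is a reasonable feature extractor here.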
In the following we give a short characterization of the
AR(2) processes in terms of autoregression parameters based
on [6]. An AR(p) process is stationary if the autoregression
polynomial, defined by

$$ \Phi(u) = 1 - \sum_{j=1}^{p} \phi_j u^j, $$

has roots with moduli greater than unity; in our case $p = 2$.
For simplicity, by the term autoregression polynomial we will
refer to $u^p \Phi(1/u)$; it is easy to see that the roots of the former
and the latter are reciprocals of each other. The stationarity
condition translated to AR(2) coefficients is as follows: $-2 < \phi_1 < 2$,
$\phi_1 < 1 - \phi_2$ and $\phi_1 > \phi_2 - 1$. The roots can be (1) two
real roots if $\phi_1^2 + 4\phi_2 \geq 0$, or (2) a pair of complex conjugate
roots if $\phi_1^2 + 4\phi_2 < 0$ (for a graphical representation
see figure 4). In the latter case the model behaves like an
exponentially damped cosine wave whose phase and amplitude
characteristics vary in response to the noise $\epsilon_t$. In order to
have both stationarity and complex roots, the condition $-1 < \phi_2 < -\phi_1^2/4$
must be satisfied. One may also verify that the forecast function
$E[y_{t+k} \mid y_{1:t}]$ has the form $A^k \cos(\omega k + \varphi)$,
where $A$ and $\omega$ are the modulus and phase of the complex
conjugate roots and $\varphi$ is a phase shift.
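The amplitude and period of the complex conjugate roots (used later as features, cf. figure 3) follow directly from the coefficients; a small sketch using the conventions above:

```python
import cmath

def ar2_root_features(phi1, phi2):
    """Amplitude (modulus) and period of the complex conjugate roots of
    u^2 - phi1*u - phi2, i.e. u^p Phi(1/u) for p = 2.

    Requires phi1**2 + 4*phi2 < 0 (the complex-root regime)."""
    disc = phi1**2 + 4*phi2
    if disc >= 0:
        raise ValueError("roots are real; no damped-cosine behaviour")
    root = (phi1 + cmath.sqrt(disc)) / 2      # one root of the conjugate pair
    amplitude = abs(root)                      # A: damping factor per step
    omega = abs(cmath.phase(root))             # angular frequency
    period = 2 * cmath.pi / omega              # wavelength in time steps
    return amplitude, period

# Typical coefficient values from figure 2 of the paper:
A, wl = ar2_root_features(1.6, -0.85)
```

For these coefficients the amplitude is about 0.92 and the period about 12 time steps, consistent with the ranges shown in figure 3.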
B. Algorithms
In this section we show how the probabilistic model
and its parameters can be used to characterize time series
data. Before defining features related to the AR model
described in the previous section, we present in brief
the classification tools we used during the data analysis.
Support Vector Machines: Support vector machines (SVM)
as classification tools have been widely used in machine
learning since the mid-nineties, and their applications to
different types of problems are still active areas of research.
In the following we shall give a very brief description. For a
comprehensive tutorial interested readers are referred to [7].
SVMs come from an area of machine learning called
statistical learning theory (SLT). SLT classification deals with
the following task: given a set of data pairs {(xi, yi)}ni=1 with
xi-s belonging to some predefined set X and yi ∈ {−1, 1},
select a class of functions (from X to {−1, 1}) and a function
from that class for which the error function defined by the sum
of misclassifications and the complexity of the function class
is minimal. In general this procedure is done in two steps.
First we choose the class and then we choose the function –
from that class – which produces the smallest misclassification
error. Usually $X$ is a Euclidean space and the function class
implemented by the SVM is the class of linear separators, i.e.
$\{\mathrm{sign}(w^T x + b) \mid w \in X,\ b \in \mathbb{R}\}$.
If the data is separable, the SVM chooses the linear sepa-
rator which produces the largest margin: it is equally close
to the convex hulls of the two sets or it has the smallest
average distance from the points. Otherwise, if the data is not
separable it optimizes both w.r.t. large margin and number of
misclassifications.
Finding the optimal hyperplane reduces to a convex
quadratic optimization problem. Once the optimum is found, the
function value for a new input point $x_*$ is given by

$$ f(x_*) = \mathrm{sign}\left( \sum_{i=1}^{n} y_i \alpha_i\, x_i^T x_* \right) \qquad (1) $$

where the $\alpha_i$ are the dual optimal parameters of the problem.
In general a high percentage of the $\alpha_i$ are zero, so the function
value can be calculated from the points $x_i$ corresponding to
non-zero $\alpha_i$; these vectors are called support vectors.
Another important characteristic of the hyperplane opti-
mization problem is that both the optimization procedure and
the calculation of function values involve only the scalar
product between the elements of X , therefore instead of the
usual Euclidean scalar product one may use other – non-linear
– scalar product functions too. Theoretically, this corresponds
to mapping the points of $X$ into another space through the
eigenfunctions of the new scalar product and performing the linear
separation there. The procedure is often called the “kernel trick”
and leads to non-linear separating functions: denoting the
above mentioned new scalar product by $K(\cdot,\cdot)$, we can rewrite
equation 1 as

$$ f(x_*) = \mathrm{sign}\left( \sum_{i=1}^{n} y_i \alpha_i K(x_i, x_*) \right). $$
Since the optimization is still carried out in X and the
only thing we need the data for is the calculation of the
pairwise scalar products, the algorithm is insensitive to the
dimensionality of the input space. Figure 1 visualizes two
SVM settings.
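As an illustration of the kernel form of the decision function, here is a toy NumPy sketch with made-up support vectors and α values (not a trained SVM), using the radial basis scalar product that appears again in section III-B:

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d = np.asarray(x) - np.asarray(z)
    return np.exp(-d @ d / (2 * sigma**2))

def svm_decision(x_star, support_vectors, labels, alphas, sigma=1.0):
    """f(x*) = sign(sum_i y_i alpha_i K(x_i, x*)); only the support
    vectors (non-zero alpha_i) contribute to the sum."""
    total = sum(a * y * rbf_kernel(xi, x_star, sigma)
                for xi, y, a in zip(support_vectors, labels, alphas))
    return 1 if total >= 0 else -1

# Toy example: one support vector per class.
sv = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ys = [1, -1]
alphas = [0.5, 0.5]
label = svm_decision(np.array([0.2, 0.1]), sv, ys, alphas)
```

A query point near the first support vector receives the label +1; one near the second receives −1.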
Fisher kernels for probabilistic models: It often happens
that the quality or size of the data does not allow us to use
it directly in SVM. Time series are a good example because
we often have sequences of different size or sequences that
are not aligned. We have a probability model for the inputs
and we would like to enhance the SVM using information
from this model. The SVM requires metric relations between
inputs, so our goal is to build such relations based on the
probability model. The first thing that naturally pops up is the
difference in log-likelihood values, but this only tells us about
the relation between the samples and the distribution (or its
parameters). To be able to capture the relation between the
samples one has to use the gradient space of the distribution
w.r.t. the parameters. For a given sample $x$, the gradient of the
log-likelihood, $s(x;\theta) \equiv \frac{\partial}{\partial\theta} \log p(x \mid \theta)$,
tells us the direction and scale of change in parameter space
induced by x (in statistical literature this quantity is called
the Fisher score). Therefore, one may think that if for two
samples x and x′ the gradients s(x; θ) and s(x′; θ) are close
to each other, then it means that they generate the same change
in parameters and they can be assumed similar with regard to
that parameter or probabilistic model. Now, taking into account
the set of probability models $\{p(x \mid \theta)\}_\theta$, two issues have to be
considered: (1) the Newton direction $F(\theta)^{-1} s(x;\theta)$ provides
a theoretically better motivated measure of the change in parameters;
(2) the set of probability distributions parameterized by
$\theta$ has a local metric defined by $F(\theta)$. Here

$$ F(\theta) = -E_\theta\!\left[ \frac{\partial^2}{\partial\theta\,\partial\theta^T} \log p(x \mid \theta) \right] $$

is the Fisher information matrix of the model.

Fig. 1. An example of a linear SVM on a separable dataset (upper) and an
RBF-kernel SVM on a linearly non-separable dataset (lower). The two classes
are plotted as ◦-s and ×-s; the solid curve represents the classification boundary
corresponding to the 0-level curve, while the dashed curves represent the −1 and
1 level curves. Contours around the points are proportional to the α values
of the points.
Following this line of argument, Jaakkola and Haussler [8]
propose the scalar product

$$ K(x, x') = s(x;\theta)^T F(\theta)^{-1} s(x';\theta) $$

and the “easier to calculate” substitute $K(x,x') = s(x;\theta)^T s(x';\theta)$.
It is easy to see that these simplify to using the
features $F(\theta)^{-\frac{1}{2}} s(x;\theta)$ and $s(x;\theta)$, respectively, together with
the standard scalar product (from now on we will refer to
the former as Fisher features). For a detailed explanation the
reader is referred to [8].
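Concretely, the Fisher features amount to whitening the Fisher scores with the inverse square root of the Fisher matrix; a small generic sketch (assuming a symmetric positive-definite F, with illustrative numbers):

```python
import numpy as np

def fisher_features(scores, F):
    """Map Fisher scores s(x; theta) to Fisher features F^{-1/2} s(x; theta).

    scores: (n_samples, d) array of Fisher scores.
    F: (d, d) symmetric positive-definite Fisher information matrix."""
    w, V = np.linalg.eigh(F)                   # F = V diag(w) V^T
    F_inv_sqrt = V @ np.diag(w**-0.5) @ V.T    # matrix inverse square root
    return scores @ F_inv_sqrt.T               # one feature vector per row

# Illustration: with these features the plain dot product equals the
# Fisher kernel s(x)^T F^{-1} s(x').
F = np.array([[4.0, 0.0], [0.0, 1.0]])
s = np.array([[2.0, 3.0], [1.0, -1.0]])
feats = fisher_features(s, F)
```

The identity holds because F^{-1/2} is symmetric, so the feature inner product reproduces the quadratic form with F^{-1}.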
With the aid of Fisher score and Fisher features we can
define AR model based features to be used with SVMs. The
calculation of Fisher score and Fisher matrix for AR models
is presented in the appendix.
III. EXPERIMENTAL RESULTS
In this section we describe in a nutshell the dataset we were
working on and present the results of our analysis.

A. Description of experiments
Dataset: Our analysis is based on the dataset created by
A. M. Al-Ghamd and D. Mba [1]. In that paper the authors
investigate the relationship between AE signal RMS, amplitude
and kurtosis for a range of defect conditions such as smooth
defects, point defects, line defects and rough defects.
The experiment was carried out on a Cooper split-type bearing
(01B40MEX 01C/40GR) with the following parameters: internal
bore diameter 40 mm, external diameter 84 mm, roller diameter
12 mm, diameter of roller centers 166 mm and number
of rollers 10. There were two measurement devices: an AE
sensor and a resonance-type accelerometer. For our analysis
we used only the AE signals.
For measuring AE signatures a piezoelectric AE sensor
(Physical Acoustic Corporation type WD) with operating
frequency range 100-1000 kHz was used. The sensor was
placed on the bearing housing and its pre-amplification was
set to 40 dB. The signal output from the pre-amplifier was
connected to a data-acquisition card which provided a sampling
rate of 10 MHz with 16-bit precision. Anti-aliasing
filters (100 kHz–1.2 MHz) were built into the data-acquisition card.
The broadband piezoelectric transducer was differentially connected
to the pre-amplifier. Sequences of 256000 data points
were recorded with sampling rates varying from 2 MHz to 8
MHz, depending on the experiment type. In each experiment
around 20 such sequences were recorded.
There were two test programs: (1) AE source identification
and defects of varying severity, in which five test conditions of
varying severity were simulated on the outer race of the test
bearing, with the defects positioned at the top-dead-center;
(2) defects of varying sizes, in which a point defect was
increased in length and width in various ways.
In test program (1) there were 5 types of measurements as
follows:
(1) baseline defect-free operating conditions where the bear-
ing was operated with no defects;
(2) smooth defect with a surface discontinuity not influencing
the average surface roughness;
(3) point defect of size 0.85 × 0.85 mm² (abbreviated from
now on by PD);
(4) line defect of size 5.6 × 1.2 mm² (abbreviated from now
on by LD);
(5) rough defect of size 17.5 × 0.9 mm² (abbreviated from
now on by RD).
There were 4 speed conditions (600 rpm, 1000 rpm, 2000 rpm
and 3000 rpm) and 3 load conditions (0.1 kN, 4.43 kN and
8.86 kN).
Fig. 2. A plot of the ML parameter estimates. The axes correspond to the
φ1, φ2 and s (MSE) parameters. Circles, squares and triangles correspond to the
PD, LD and RD conditions.
Experiment design: Our analysis was carried out on the data
recorded from test program (1). We analyzed defect conditions
(3)–(5) and used only 10 data sequences for each combination
of defect, speed and load conditions. Therefore we formulated
a 3-class classification problem with a dataset of 360 sequences
of length 256000 each.
According to the subsections of section II-A we calculated
a set of features from each sequence and we used them in the
subsequent analysis.
AR(2) models seemed to fit the data sequences well
(see figure 2), therefore we calculated four sets of AR-related
features. These were:
(1) the ML parameter estimations of each sequence;
(2) the Fisher scores of each data sequence based on the AR
model;
(3) the Fisher features of each data sequence based on the
AR model;
(4) the amplitude and period of complex conjugate roots of
the autoregression polynomial.
In addition we also extracted the AE-related features:
(5) power or RMS;
(6) kurtosis;
(7) maximum amplitude.
See section II-A and I for more details about these quantities.
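The AE features (5)–(7) are simple to compute from a raw sequence. Here is a minimal NumPy sketch; the kurtosis convention used below (the raw fourth standardized moment, which equals 3 for a Gaussian signal) is an assumption, since the text does not specify one:

```python
import numpy as np

def ae_features(x):
    """Compute the AE signal features used in the text:
    RMS (power), kurtosis and maximum amplitude."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x**2))
    centered = x - x.mean()
    # Fourth standardized moment; equals 3 for a Gaussian signal.
    kurtosis = np.mean(centered**4) / np.mean(centered**2)**2
    max_amplitude = np.max(np.abs(x))
    return rms, kurtosis, max_amplitude

# Example: a burst-free (continuous-activity-like) Gaussian signal.
rng = np.random.default_rng(1)
rms, kurt, amp = ae_features(rng.normal(0, 1, 100000))
```

For a bursty signal the kurtosis rises well above 3, which is why it serves as a burst-sensitivity feature.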
In figure 2 we see the plots of the ML parameter estimates
for each observation sequence. As we can see, the values of
the mean squared error (MSE) are reasonably small, and the
parameters φ1 and φ2 vary from 1.4 to 1.8 and from −0.75
to −0.95, respectively. According to the conditions in section II-A,
and as can be seen in figures 3 and 4, the measurements
are well approximated by stationary AR(2) processes and the
autoregressive polynomials have complex roots.
Figures 5 and 6 show the Fisher scores and the Fisher
features described in section II-B. It seems that these quantities
provide a better separation w.r.t. class attributes, but there is an
area of high concentration where all 3 classes overlap. This can
Fig. 3. Absolute values and wavelengths of the autoregressive polynomial
roots (axes: amplitude in volts vs. period in time steps). Circles, squares and
triangles correspond to the PD, LD and RD conditions.
Fig. 4. Characterization of AR(2) processes in the (φ1, φ2) plane. Coefficients
below the solid line correspond to stationary processes, while those within
the area bounded by the dashed curve correspond to AR(2)-s with complex
roots. The “patch” on the figure represents the ML parameter estimates for the
elements of the dataset under consideration.
be due to the fact that all the scores and features are calculated
relative to the ML parameter estimates of the whole dataset.
We also measured the AE signal characteristics presented in
section II-A. The measurement results for the average signal
power are shown in figure 7. We observe that the signal power
increases both with defect severity and speed. For PD and RD
it also increases with the load, however for LD it seems to
show an interesting behavior: it peaks for the second load
condition.
The kurtosis values are plotted in figure 9. They “peak”
roughly at LD high speed and RD low speed, show a slow
increase for the PD and LD conditions and a fast decay for the RD
conditions. Their behaviour w.r.t. load conditions varies.
The measurements for the AE features are similar to the
ones presented in Al-Ghamd and Mba and therefore for more
Fig. 5. Fisher scores (axes: ∂L/∂φ1, ∂L/∂φ2 and ∂L/∂v). Circles, squares
and triangles correspond to the PD, LD and RD conditions.
Fig. 6. Fisher features (axes: φ1 feature, φ2 feature and s feature). Circles,
squares and triangles correspond to the PD, LD and RD conditions.
detailed explanations the reader is referred to [1].
B. Classification Results
Once the feature extraction part of the data analysis pro-
cedure was carried out, we used k-Nearest Neighbor (kNN)
and SVM methods to classify the data. Two types of SVMs
were used: (1) with the linear scalar product (SVMlin), providing
linear separation boundaries; (2) with the nonlinear scalar
product given by the radial basis function $K(x, x'; \sigma) = \exp(-\frac{1}{2\sigma^2}\|x - x'\|^2)$
(abbreviated from now on by SVMrbf).
All these methods have parameters to be tuned: the
parameter of kNN is the number of neighbors k, and the
parameter of SVMlin is the percentage of allowed misclassifications.
SVMrbf has two parameters: the percentage of allowed
misclassifications and the scalar product parameter σ.
Since these methods are designed for dealing with 2-class
problems only, we employed the one-against-the-rest classification
scheme: we used 3 different classifiers of the same type, each
separating one class from the others. Prediction for a new input
Fig. 7. Logarithm of the power (volts), per sample. The 360 examples are
divided in the following way: every 120 represent a defect condition (PD, LD
and RD, in order); within these, every 30 represent a speed condition (600 rpm,
1000 rpm, 2000 rpm and 3000 rpm); within these, every 10 represent a
load condition (0.1 kN, 4.43 kN and 8.86 kN). For example, the 10 samples
with LD, 2000 rpm and 8.86 kN can be found at positions 200–210.
Fig. 8. Logarithm of maximum amplitude (volts), per sample. The same sample
ordering applies as in figure 7.
is made by voting.
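The one-against-the-rest voting scheme can be sketched as follows; the nearest-centroid rule below is a trivial stand-in for the per-class binary classifiers, purely for illustration (the paper used kNN and SVMs):

```python
import numpy as np

def one_vs_rest_predict(x_star, X, y, classes):
    """One-against-the-rest voting: one binary decision value per class
    (here a nearest-centroid surrogate); the largest vote wins."""
    votes = {}
    for c in classes:
        mu_in = X[y == c].mean(axis=0)        # centroid of class c
        mu_out = X[y != c].mean(axis=0)       # centroid of "the rest"
        # Decision value: how much closer to class c than to the rest.
        votes[c] = (np.linalg.norm(x_star - mu_out)
                    - np.linalg.norm(x_star - mu_in))
    return max(votes, key=votes.get)

# Toy 3-class data in 2D (stand-ins for the PD/LD/RD feature vectors).
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
pred = one_vs_rest_predict(np.array([4.8, 5.2]), X, y, classes=[0, 1, 2])
```

Taking the largest decision value rather than a strict majority also resolves ties when no single binary classifier claims the point.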
In order to test the methods we used 10 repetitions of 5-fold
cross-validation and analyzed the mean value of the classification
error. (The n-fold cross-validation method is used both for
testing and model fitting: we split the dataset into n folds,
fit the model's parameters on the first n − 1 of them and
test the model's prediction performance on the nth one, then
repeat the procedure by circularly permuting the folds. The
procedure is finished when all n possible cases have been
performed and the classification errors averaged.)
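The fold rotation described above looks roughly like this in NumPy, with a placeholder evaluation function standing in for fitting and testing an actual classifier:

```python
import numpy as np

def cross_val_error(X, y, n_folds, eval_fold):
    """n-fold cross-validation: rotate which fold is held out, call
    eval_fold(train_idx, test_idx) for each split, average the errors."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, n_folds)
    errors = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])
        errors.append(eval_fold(train_idx, test_idx))
    return float(np.mean(errors))

# Placeholder "classifier": error = fraction of held-out labels equal to 1.
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
X = np.zeros((10, 2))
err = cross_val_error(X, y, 5, lambda tr, te: np.mean(y[te] == 1))
```

Repeating this 10 times with reshuffled data, as in the text, simply averages ten such estimates.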
We used 2 types of settings: (1) each feature set considered
alone; (2) the AR-related and AE-related features used
together.
The results of the classification task are shown in table I.
Fig. 9. Logarithm of kurtosis, per sample. The same sample ordering applies
as in figure 7.
                                      kNN     SVMlin   SVMrbf
log AE features                       0.244   0.481    0.228
AR                                    0.180   0.285    0.168
AR roots                              0.168   0.278    0.162
Fisher score                          0.145   0.431    0.235
Fisher features                       0.118   0.428    0.227
AR and log AE features                0.106   0.204    0.089
AR roots and log AE features          0.093   0.181    0.081
Fisher score and log AE features      0.173   0.266    0.175
Fisher features and log AE features   0.173   0.267    0.161

TABLE I
CLASSIFICATION ERRORS.
IV. DISCUSSION
As we can see in table I, AR model based features perform
better than the AE signal based features. When combined they
produce better results than separately. The best performances
are achieved with the AR coefficients and the amplitude
and period given by the complex conjugate roots of the
autoregression polynomial.
Overall, the plain AR parameters or their corresponding root
characteristics seem to yield better classification performance
than the Fisher scores and Fisher features. This may be
due to the fact that the AR parameters themselves are more
homogeneously distributed (compare figure 2 with figures
5 and 6), which makes it easier to separate them. kNN’s
performance is less sensitive to inhomogeneity: it takes into
account the k nearest neighbors, no matter how far these are
apart. This might explain why Fisher scores and Fisher features
do much better for kNN than for SVMrbf. Apart from that,
the performance of kNN and SVMrbf is roughly the same.
For calculating the function value for a new input kNN
uses all the data in the dataset (of features), while SVM uses
only a fraction of them (the support vectors, see section II-B).
Because of its good performance but high cost kNN is only
used as a benchmark method. The SVM results are considered
more relevant. As we can see in table I the best performance
achieved with SVMs is around 90% classification rate.
V. CONCLUSION
In our analysis we focused on the classification of bear-
ing defects based on acoustic emission signals. We brought
together the probabilistic model related features with the AE
features and used them jointly to complete the task. We can
conclude that using both improves classification performance.
Our future goal is to improve performance with the introduction
of frequency and “burst-form” based features and to use
methods that are computationally less expensive.
ACKNOWLEDGMENTS
The authors would like to thank Ali Ganji and Bas van der
Vorst for supervising the work and Abdullah M. Al-Ghamd
and David Mba for providing the data.
REFERENCES
[1] A. M. Al-Ghamd and D. Mba, “A comparative experimental study of the use of acoustic emission and vibration analysis for bearing defect identification and estimation of defect size,” Mechanical Systems and Signal Processing, vol. 20, pp. 1537–1571, 2006.
[2] A. Ganji, “Acoustic emission to assess bearing lubrication condition: a pre-study,” SKF E.R.C., Tech. Rep., 2003.
[3] A. Ganji and J. Holsnijders, “Acoustic emission measurements focused on bearing lubrication,” SKF E.R.C., Tech. Rep., 2004.
[4] N. Jamaludin and D. Mba, “Monitoring extremely slowly rolling element bearings: part I,” NDT&E International, vol. 35, pp. 349–358, 2002.
[5] ——, “Monitoring extremely slowly rolling element bearings: part II,” NDT&E International, vol. 35, pp. 359–366, 2002.
[6] R. Prado and M. West, “Time series modelling, inference and forecasting,” 2005, manuscript (available on M. West's webpage).
[7] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2.
[8] T. Jaakkola and D. Haussler, “Exploiting generative models in discriminative classifiers,” in Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, 1999, pp. 487–493.
APPENDIX
FISHER SCORE AND FISHER MATRIX FOR AR(p) MODELS
In the sequel we present the calculation of the Fisher score
and Fisher matrix for AR(p) models. For ease of computation,
instead of s we use the log-precision parameter v = log(1/s)
and define

$$ L(\phi, v) = \log p(y \mid Y_p, \phi, v). $$

The Fisher score is given by

$$ \frac{\partial}{\partial \phi} L(\phi, v) = \exp(v)\,(Y_p^T Y_p)\,\big(\hat{\phi}(y, Y_p) - \phi\big) $$

$$ \frac{\partial}{\partial v} L(\phi, v) = -\frac{1}{2}\exp(v)\, Q(y, \phi; Y_p) + \frac{T}{2} $$

where $Q(y, \phi; Y_p) = (y - Y_p \phi)^T (y - Y_p \phi)$ denotes the residual
sum of squares, and the elements of the Fisher matrix are

$$ -E\!\left[ \frac{\partial^2}{\partial \phi\, \partial \phi^T} L(\phi, v) \right] = \exp(v)\, Y_p^T Y_p $$

$$ -E\!\left[ \frac{\partial^2}{\partial \phi\, \partial v} L(\phi, v) \right] = 0 $$

$$ -E\!\left[ \frac{\partial^2}{\partial v^2} L(\phi, v) \right] = \frac{1}{2}. $$
We assumed that the sequences in the dataset are indepen-
dently sampled, therefore the Fisher matrix of the model for
the whole dataset is given by the sum of Fisher matrices of
each sample.
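A minimal NumPy sketch of the score formulas above (using the conditional likelihood as in section II-A; Q is the residual sum of squares computed from the lag design matrix):

```python
import numpy as np

def fisher_score_ar(y, phi, v, p=2):
    """Fisher score (dL/dphi, dL/dv) of an AR(p) model at parameters
    (phi, v), with v = log(1/s), following the appendix formulas."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Lag design matrix: column j holds the values at lag j + 1.
    Yp = np.column_stack([y[p - j - 1:T - j - 1] for j in range(p)])
    target = y[p:]
    phi_hat, *_ = np.linalg.lstsq(Yp, target, rcond=None)  # ML estimate
    d_phi = np.exp(v) * (Yp.T @ Yp) @ (phi_hat - phi)
    Q = np.sum((target - Yp @ phi)**2)         # residual sum of squares
    d_v = -0.5 * np.exp(v) * Q + T / 2
    return d_phi, d_v

# Sanity check: at the ML estimate, the phi-component of the score
# vanishes by construction.
rng = np.random.default_rng(2)
y = rng.normal(size=500)
Yp = np.column_stack([y[1:499], y[0:498]])
phi_ml, *_ = np.linalg.lstsq(Yp, y[2:], rcond=None)
d_phi, d_v = fisher_score_ar(y, phi_ml, v=0.0)
```

Summing such scores (and the per-sequence Fisher matrices) over independently sampled sequences gives the dataset-level quantities used in section III.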