Bearing defect identification based on acoustic
emission signals
Botond Cseke
Faculty of Science, Radboud University Nijmegen
Email: [email protected]
Tom Heskes
Faculty of Science, Radboud University Nijmegen
Email: [email protected]
Abstract— In this paper we classify seeded bearing defects based on acoustic emission data. We use data from recordings of the experiment carried out by Al-Ghamd and Mba [1]. The classification method is based on autoregression model features and acoustic emission features such as root mean square, maximum amplitude and kurtosis value. We use support vector machines and k-nearest neighbor methods as classification tools. Autoregression model features significantly improve the results obtained with acoustic emission features alone.
I. INTRODUCTION
Acoustic emission (AE) signal analysis is a standard tool
for monitoring the “health” state of materials and therefore of
various mechanical equipment. Quoting from Ganji [2]:
AE is the monitoring technique which analyses
elastic waves naturally generated above the human
hearing threshold (> 20 kHz). It is associated with
the range of phenomena which generate broadband
activity from the transient release of stored elastic
energy from localized sources. ...

AE has been proven to be useful for condition monitoring of
bearing states. Ganji [2], Ganji and Holsnijders [3] provide an
AE–signal feature based interpretation – peak value, root of
mean squared values (RMS), kurtosis value, crest factor, form
factor, AE count – of lubrication conditions, while Jamaludin
and Mba [4], [5] provide an autoregression parameter based
clustering of acoustic emission signatures in case of slowly
rolling bearings. Recently, Al-Ghamd and Mba [1] conducted
an experiment for detecting the presence and size of seeded
defects in radially loaded bearings. Their analysis was based
on measuring signal features such as RMS, kurtosis and maximum
amplitude. We briefly describe their experiment in section
III.
In this paper we use standard machine learning tools such
as support vector machines (SVM) and the k-nearest neighbor
(kNN) method to analyze and classify features extracted
from the AE signals recorded during the above mentioned
experiments.
Section II describes the feature extraction methods and the
machine learning tools we used. It has two parts: section II-A
presents AE signal features used in [2] and the autoregression
models (AR) while section II-B presents in brief the machine
learning tools and techniques employed in our analysis.
In section III we describe the dataset we worked with and
the experiment we conducted for classifying the AE signatures.
We end with a discussion and conclusion in sections IV and V.
II. FEATURES AND ALGORITHMS
This section gives a brief description of the framework in
which we embedded the problem. We made use of the AE signal
characteristics employed in [1]–[5] in order to create a set of
AE signal features which can be used in a classification task.
We give a brief description of AE signal features, AR models
and support vector machines. Readers interested only in the
results may skip this section and return to it later if needed.
A. Features
Acoustic emission signal features: In his report Ganji [2]
classifies AE signals into 3 broad classes:
1. Burst activity: the signal has the form of a sequence of
transients, each of which can be roughly described
as an exponentially decaying sinusoid. These bursts
may overlap and can have varying amplitudes and decay
factors. The most common method to detect the “arrival”
of bursts is to set a threshold value and check if and when
the signal value exceeds it.
2. Continuous activity: due to the high frequency of bursts
and the wide range of indistinguishable burst character-
istics (amplitude, decay factor) the signal has a random
oscillatory appearance.
3. Mixed mode activity: the burst activity is superimposed
on a continuous activity, meaning that some of the bursts
have distinguishable characteristics.
Because of the enormous amount and redundancy of data
that an AE sensor can provide, most of the monitoring tools
restrict themselves to the measurement of a few relevant
quantities. Empirical studies (see [2]) show that the most
important ones are:
• peak value: the maxima of the signal at peaks;
• RMS value;
• kurtosis value: a characterization of the signal value distribution by 4th-order statistics;
• crest factor: peak value divided by RMS;
• form factor: RMS value divided by mean value;
• AE count: the count of burst events.
From these we have chosen to measure those that were also
measured in the experiment carried out by Al-Ghamd and Mba
(see [1]), i.e. maximum amplitude or crest factor, root mean
square (we use the term power) and kurtosis value.
When dealing with time series data, one usually first verifies
whether or not the data can be modelled by autoregressive
(AR) processes. It turned out that the AE signal recordings
of the Al-Ghamd and Mba experiment can be modelled by
AR processes of second order. We give a brief introduction to
AR processes and summarize a few important characteristics
to be used later in this paper.
AR models for Time Series modeling: An autoregressive
process of order p – abbreviated by AR( p) – on a discrete
time domain is defined by the linear model

$$ y_t = \sum_{j=1}^{p} \phi_j y_{t-j} + \epsilon_t $$

where the $\epsilon_t$ are independent and normally distributed. Usually
$t$ starts at 1 and we have to specify the first $p$ values of the
process, or their distribution. In the following we work with a
finite time domain, i.e. $t$ runs through $\{1, \ldots, T\}$.
Using the notation $y = y_{1:T}$ and $\epsilon_t \sim N(0, s)$, we can
write the probabilistic model in the form

$$ p(y \mid Y_p, \phi, s) = p(y_{1:p}) \prod_{t=p+1}^{T} N\big(y_t \mid \phi^T y_{(t-1):(t-p)},\, s\big) $$

where the parameters of the model are $\phi$, $s$ and the parameters
of the distribution of the first $p$ terms.
For better understanding we can rewrite the model in a
vectorized form

$$ p(y \mid Y_p, \phi, s) \propto \exp\left( -\frac{(y - Y_p^T \phi)^T (y - Y_p^T \phi)}{2s} \right) $$

where we have used the notation $y = (y_{p+1}, \ldots, y_T)^T$,
$(Y_p)_{i,:} = y_{(p+i-1):i}$, $i = 1, \ldots, T-p$, and considered the
first $p$ terms given.
We can perform both maximum likelihood (ML) and
Bayesian estimation of the model parameters. The ML method
is equivalent to least squares estimation, yielding the parameter
estimates

$$ \hat{\phi} = (Y_p^T Y_p)^{-1} Y_p^T y, \qquad \hat{s} = \frac{1}{T-p}\,(y - Y_p\hat{\phi})^T (y - Y_p\hat{\phi}). $$
Bayesian estimation is usually performed with the so-called
reference or improper prior $p(\phi, s) \propto 1/s$. Calculating

$$ p(\phi, s \mid y, Y_p) = \frac{p(y \mid Y_p, \phi, s)\, p(\phi, s)}{p(y \mid Y_p)} $$

one obtains that the posterior marginal of $\phi$ is a multivariate
Student-t distribution with $T - 2p$ degrees of freedom,

$$ p(\phi \mid y, Y_p) \propto \left[ 1 + \frac{(\phi - \hat{\phi})^T Y_p^T Y_p (\phi - \hat{\phi})}{(T-p)\,\hat{s}} \right]^{-(T-p)/2} $$

which for large $T$ values is roughly $N(\phi \mid \hat{\phi},\, \hat{s}\,(Y_p^T Y_p)^{-1})$. For
a more detailed description of parameter estimation in AR
models the reader is referred to [6].
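To make the estimation concrete, the least-squares/ML estimates above can be computed in a few lines. The following is a minimal NumPy sketch (not the authors' code), conditioning on the first p values as in the text; the simulated coefficients are in the range the paper later reports for the AE recordings:

```python
import numpy as np

def fit_ar(y, p=2):
    """ML / least-squares fit of an AR(p) model, conditioning on the
    first p values. Returns (phi_hat, s_hat) as in the text."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Design matrix: column j holds the series values at lag j + 1.
    Y = np.column_stack([y[p - j - 1:T - j - 1] for j in range(p)])
    target = y[p:]
    phi_hat, *_ = np.linalg.lstsq(Y, target, rcond=None)
    resid = target - Y @ phi_hat
    s_hat = resid @ resid / (T - p)
    return phi_hat, s_hat

# Usage: simulate a stationary AR(2) and recover its parameters.
rng = np.random.default_rng(0)
phi_true, noise_std, T = (1.6, -0.85), 0.05, 5000
y = np.zeros(T)
for t in range(2, T):
    y[t] = phi_true[0]*y[t-1] + phi_true[1]*y[t-2] + rng.normal(0, noise_std)
phi_hat, s_hat = fit_ar(y, p=2)
```

With 5000 samples the estimates land close to the true coefficients, illustrating why the ML fit is a reasonable feature extractor here.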
In the following we give a short characterization of the
AR(2) processes in terms of autoregression parameters based
on [6]. An AR(p) process is stationary if the autoregression
polynomial, defined by

$$ \Phi(u) = 1 - \sum_{j=1}^{p} \phi_j u^j, $$

has roots with moduli greater than unity; in our case $p = 2$.
For simplicity, by the term autoregression polynomial we will
refer to $u^p \Phi(1/u)$; it is easy to see that the roots of the former
and the latter are reciprocals of each other. The stationarity
condition translated to AR(2) coefficients is as follows: $-2 < \phi_1 < 2$,
$\phi_1 < 1 - \phi_2$ and $\phi_1 > \phi_2 - 1$. The roots can be (1) two
real roots if $\phi_1^2 + 4\phi_2 \geq 0$, or (2) a pair of complex conjugate
roots if $\phi_1^2 + 4\phi_2 < 0$ (for a graphical representation
see figure 4). In the latter case the model behaves like an
exponentially damped cosine wave whose phase and amplitude
characteristics vary in response to the noise $\epsilon_t$. In order to
have both stationarity and complex roots, the condition $-1 < \phi_2 < -\phi_1^2/4$
must be satisfied. One may also verify that the forecast function
$E[y_{t+k} \mid y_{1:t}]$ has the form $A^k \cos(\omega k + \varphi)$,
where $A$ and $\omega$ are the modulus and phase of the complex
conjugate roots and $\varphi$ is a phase shift.
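The amplitude and period of the complex conjugate roots (used later as features, cf. figure 3) follow directly from the coefficients; a small sketch using the conventions above:

```python
import cmath

def ar2_root_features(phi1, phi2):
    """Amplitude (modulus) and period of the complex conjugate roots of
    u^2 - phi1*u - phi2, i.e. u^p Phi(1/u) for p = 2.

    Requires phi1**2 + 4*phi2 < 0 (the complex-root regime)."""
    disc = phi1**2 + 4*phi2
    if disc >= 0:
        raise ValueError("roots are real; no damped-cosine behaviour")
    root = (phi1 + cmath.sqrt(disc)) / 2      # one root of the conjugate pair
    amplitude = abs(root)                      # A: damping factor per step
    omega = abs(cmath.phase(root))             # angular frequency
    period = 2 * cmath.pi / omega              # wavelength in time steps
    return amplitude, period

# Typical coefficient values from figure 2 of the paper:
A, wl = ar2_root_features(1.6, -0.85)
```

For these coefficients the amplitude is about 0.92 and the period about 12 time steps, consistent with the ranges shown in figure 3.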
B. Algorithms
In this section we show how the probabilistic model
and its parameters can be used to characterize time series
data. Before defining features related to the AR model
described in the previous section, we present in brief
the classification tools we used during the data analysis.
Support Vector Machines: Support vector machines (SVM)
as classification tools have been widely used in machine
learning since the mid-nineties, and their applications to
different types of problems are still active areas of research.
In the following we shall give a very brief description. For a
comprehensive tutorial interested readers are referred to [7].
SVMs come from an area of machine learning called
statistical learning theory (SLT). SLT classification deals with
the following task: given a set of data pairs {(xi, yi)}ni=1 with
xi-s belonging to some predefined set X and yi ∈ {−1, 1},
select a class of functions (from X to {−1, 1}) and a function
from that class for which the error function defined by the sum
of misclassifications and the complexity of the function class
is minimal. In general this procedure is done in two steps.
First we choose the class and then we choose the function –
from that class – which produces the smallest misclassification
error. Usually $X$ is a Euclidean space and the function class
implemented by the SVM is the class of linear separators, i.e.
$\{\mathrm{sign}(w^T x + b) \mid w \in X,\ b \in \mathbb{R}\}$.
If the data is separable, the SVM chooses the linear sepa-
rator which produces the largest margin: it is equally close
to the convex hulls of the two sets or it has the smallest
average distance from the points. Otherwise, if the data is not
separable it optimizes both w.r.t. large margin and number of
misclassifications.
Finding the optimal hyperplane reduces to a convex
quadratic optimization problem. Once the optimum is found, the
function value for a new input point $x_*$ is given by

$$ f(x_*) = \mathrm{sign}\left( \sum_{i=1}^{n} y_i \alpha_i\, x_i^T x_* \right) \qquad (1) $$

where the $\alpha_i$ are the dual optimal parameters of the problem.
In general a high percentage of the $\alpha_i$ are zero, so the function
value can be calculated from the points $x_i$ corresponding to
non-zero $\alpha_i$; these vectors are called support vectors.
Another important characteristic of the hyperplane opti-
mization problem is that both the optimization procedure and
the calculation of function values involve only the scalar
product between the elements of X , therefore instead of the
usual Euclidean scalar product one may use other – non-linear
– scalar product functions too. Theoretically, this corresponds
to mapping the points of $X$ into another space through the
eigenfunctions of the new scalar product and performing the linear
separation there. The procedure is often called the “kernel trick”
and leads to non-linear separating functions: denoting the
above mentioned new scalar product by $K(\cdot,\cdot)$, we can rewrite
equation 1 as

$$ f(x_*) = \mathrm{sign}\left( \sum_{i=1}^{n} y_i \alpha_i K(x_i, x_*) \right). $$
Since the optimization is still carried out in X and the
only thing we need the data for is the calculation of the
pairwise scalar products, the algorithm is insensitive to the
dimensionality of the input space. Figure 1 visualizes two
SVM settings.
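As an illustration of the kernel form of the decision function, here is a toy NumPy sketch with made-up support vectors and α values (not a trained SVM), using the radial basis scalar product that appears again in section III-B:

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d = np.asarray(x) - np.asarray(z)
    return np.exp(-d @ d / (2 * sigma**2))

def svm_decision(x_star, support_vectors, labels, alphas, sigma=1.0):
    """f(x*) = sign(sum_i y_i alpha_i K(x_i, x*)); only the support
    vectors (non-zero alpha_i) contribute to the sum."""
    total = sum(a * y * rbf_kernel(xi, x_star, sigma)
                for xi, y, a in zip(support_vectors, labels, alphas))
    return 1 if total >= 0 else -1

# Toy example: one support vector per class.
sv = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ys = [1, -1]
alphas = [0.5, 0.5]
label = svm_decision(np.array([0.2, 0.1]), sv, ys, alphas)
```

A query point near the first support vector receives the label +1; one near the second receives −1.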
Fisher kernels for probabilistic models: It often happens
that the quality or size of the data does not allow us to use
it directly in SVM. Time series are a good example because
we often have sequences of different size or sequences that
are not aligned. We have a probability model for the inputs
and we would like to enhance the SVM using information
from this model. The SVM requires metric relations between
inputs, so our goal is to build such relations based on the
probability model. The first thing that naturally pops up is the
difference in log-likelihood values, but this only tells us about
the relation between the samples and the distribution (or its
parameters). To be able to capture the relation between the
samples one has to use the gradient space of the distribution
w.r.t. the parameters. For a given sample $x$, the gradient of the
log-likelihood, $s(x;\theta) \equiv \frac{\partial}{\partial\theta} \log p(x \mid \theta)$,
tells us the direction and scale of change in parameter space
induced by x (in statistical literature this quantity is called
the Fisher score). Therefore, one may think that if for two
samples x and x′ the gradients s(x; θ) and s(x′; θ) are close
to each other, then it means that they generate the same change
in parameters and they can be assumed similar with regard to
that parameter or probabilistic model. Now, taking into account
the set of probability models $\{p(x \mid \theta)\}_\theta$, two issues have to be
considered: (1) the Newton direction $F(\theta)^{-1} s(x;\theta)$ provides
a theoretically better motivated measure of the change in parameters;
(2) the set of probability distributions parameterized by
$\theta$ has a local metric defined by $F(\theta)$. Here

$$ F(\theta) = -E_\theta\!\left[ \frac{\partial^2}{\partial\theta\,\partial\theta^T} \log p(x \mid \theta) \right] $$

is the Fisher information matrix of the model.

Fig. 1. An example of a linear SVM on a separable dataset (upper) and an
RBF-kernel SVM on a linearly non-separable dataset (lower). The two classes
are plotted as ◦-s and ×-s; the solid curve represents the classification boundary
corresponding to the 0-level curve, while the dashed curves represent the −1 and
1 level curves. Contours around the points are proportional to the α values
of the points.
Following this line of argument, Jaakkola and Haussler [8]
propose the scalar product

$$ K(x, x') = s(x;\theta)^T F(\theta)^{-1} s(x';\theta) $$

and the “easier to calculate” substitute $K(x,x') = s(x;\theta)^T s(x';\theta)$.
It is easy to see that these simplify to using the
features $F(\theta)^{-\frac{1}{2}} s(x;\theta)$ and $s(x;\theta)$, respectively, together with
the standard scalar product (from now on we will refer to
the former as Fisher features). For a detailed explanation the
reader is referred to [8].
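Concretely, the Fisher features amount to whitening the Fisher scores with the inverse square root of the Fisher matrix; a small generic sketch (assuming a symmetric positive-definite F, with illustrative numbers):

```python
import numpy as np

def fisher_features(scores, F):
    """Map Fisher scores s(x; theta) to Fisher features F^{-1/2} s(x; theta).

    scores: (n_samples, d) array of Fisher scores.
    F: (d, d) symmetric positive-definite Fisher information matrix."""
    w, V = np.linalg.eigh(F)                   # F = V diag(w) V^T
    F_inv_sqrt = V @ np.diag(w**-0.5) @ V.T    # matrix inverse square root
    return scores @ F_inv_sqrt.T               # one feature vector per row

# Illustration: with these features the plain dot product equals the
# Fisher kernel s(x)^T F^{-1} s(x').
F = np.array([[4.0, 0.0], [0.0, 1.0]])
s = np.array([[2.0, 3.0], [1.0, -1.0]])
feats = fisher_features(s, F)
```

The identity holds because F^{-1/2} is symmetric, so the feature inner product reproduces the quadratic form with F^{-1}.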
With the aid of Fisher score and Fisher features we can
define AR model based features to be used with SVMs. The
calculation of Fisher score and Fisher matrix for AR models
is presented in the appendix.
III. EXPERIMENTAL RESULTS
In this section we describe in a nutshell the dataset we were
working on and present the results of our analysis.

A. Description of experiments
Dataset: Our analysis is based on the dataset created by
A. M. Al-Ghamd and D. Mba [1]. In that paper the authors
investigate the relationship between AE signal RMS, amplitude
and kurtosis for a range of defect conditions such as smooth
defects, point defects, line defects and rough defects.
The experiment was carried out on a Cooper split-type bearing
(01B40MEX 01C/40GR) with the following parameters: internal
bore diameter 40 mm, external diameter 84 mm, roller diameter
12 mm, diameter of roller centers 166 mm and number
of rollers 10. There were two measurement devices: an AE
sensor and a resonance-type accelerometer. For our analysis
we used only the AE signals.
For measuring AE signatures a piezoelectric AE sensor
(Physical Acoustic Corporation type WD) with operating
frequency range 100-1000 kHz was used. The sensor was
placed on the bearing housing and its pre-amplification was
set to 40 dB. The signal output from the pre-amplifier was
connected to a data-acquisition card which provided a sampling
rate of 10 MHz with 16-bit precision. Anti-aliasing
filters (100 kHz–1.2 MHz) were built into the data-acquisition card.
The broadband piezoelectric transducer was differentially connected
to the pre-amplifier. Sequences of 256000 data points
were recorded with sampling rates varying from 2 MHz to 8
MHz, depending on the experiment type. In each experiment
around 20 such sequences were recorded.
There were two test programs: (1) AE source identification
and defects of varying severity, in which five test conditions of
varying severity were simulated on the outer race of the test
bearing, with the defects positioned at the top-dead-center;
(2) defects of varying sizes, in which a point defect was
increased in length and width in various ways.
In test program (1) there were 5 types of measurements as
follows:
(1) baseline defect-free operating conditions where the bear-
ing was operated with no defects;
(2) smooth defect with a surface discontinuity not influencing
the average surface roughness;
(3) point defect of size 0.85 × 0.85 mm² (abbreviated from
now on by PD);
(4) line defect of size 5.6 × 1.2 mm² (abbreviated from now
on by LD);
(5) rough defect of size 17.5 × 0.9 mm² (abbreviated from
now on by RD).
There were 4 speed conditions (600 rpm, 1000 rpm, 2000 rpm
and 3000 rpm) and 3 load conditions (0.1 kN, 4.43 kN and
8.86 kN).
Fig. 2. A plot of the ML parameter estimates. The axes correspond to the
φ1, φ2 and s (MSE) parameters. Circles, squares and triangles correspond to the
PD, LD and RD conditions.
Experiment design: Our analysis was carried out on the data
recorded from test program (1). We analyzed defect conditions
(3)–(5) and used only 10 data sequences for each combination
of defect, speed and load conditions. Therefore we formulated
a 3-class classification problem with a dataset of 360 sequences
of length 256000 each.
According to the subsections of section II-A we calculated
a set of features from each sequence and we used them in the
subsequent analysis.
AR(2) models seemed to fit the data sequences well
(see figure 2), therefore we calculated four sets of AR-related
features. These were:
(1) the ML parameter estimations of each sequence;
(2) the Fisher scores of each data sequence based on the AR
model;
(3) the Fisher features of each data sequence based on the
AR model;
(4) the amplitude and period of complex conjugate roots of
the autoregression polynomial.
In addition we also extracted the AE-related features:
(5) power or RMS;
(6) kurtosis;
(7) maximum amplitude.
See section II-A and I for more details about these quantities.
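The AE features (5)–(7) are simple to compute from a raw sequence. Here is a minimal NumPy sketch; the kurtosis convention used below (the raw fourth standardized moment, which equals 3 for a Gaussian signal) is an assumption, since the text does not specify one:

```python
import numpy as np

def ae_features(x):
    """Compute the AE signal features used in the text:
    RMS (power), kurtosis and maximum amplitude."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x**2))
    centered = x - x.mean()
    # Fourth standardized moment; equals 3 for a Gaussian signal.
    kurtosis = np.mean(centered**4) / np.mean(centered**2)**2
    max_amplitude = np.max(np.abs(x))
    return rms, kurtosis, max_amplitude

# Example: a burst-free (continuous-activity-like) Gaussian signal.
rng = np.random.default_rng(1)
rms, kurt, amp = ae_features(rng.normal(0, 1, 100000))
```

For a bursty signal the kurtosis rises well above 3, which is why it serves as a burst-sensitivity feature.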
In figure 2 we see the plots of the ML parameter estimates
for each observation sequence. As we can see, the values of
the mean squared error (MSE) are reasonably small, and the
parameters φ1 and φ2 vary from 1.4 to 1.8 and from −0.75
to −0.95, respectively. According to the conditions in section II-A,
and as can be seen in figures 3 and 4, the measurements
are well approximated by stationary AR(2) processes and the
autoregressive polynomials have complex roots.
Figures 5 and 6 show the Fisher scores and the Fisher
features described in section II-B. It seems that these quantities
provide a better separation w.r.t. class attributes, but there is an
area of high concentration where all 3 classes overlap. This can
Fig. 3. Absolute values and wavelengths of the autoregressive polynomial
roots (axes: amplitude in volts vs. period in time steps). Circles, squares and
triangles correspond to the PD, LD and RD conditions.
Fig. 4. Characterization of AR(2) processes in the (φ1, φ2) plane. Coefficients
below the solid line correspond to stationary processes, while those within
the area bounded by the dashed curve correspond to AR(2)-s with complex
roots. The “patch” on the figure represents the ML parameter estimates for the
elements of the dataset under consideration.
be due to the fact that all the scores and features are calculated
relative to the ML parameter estimates of the whole dataset.
We also measured the AE signal characteristics presented in
section II-A. The measurement results for the average signal
power are shown in figure 7. We observe that the signal power
increases both with defect severity and speed. For PD and RD
it also increases with the load, however for LD it seems to
show an interesting behavior: it peaks for the second load
condition.
The kurtosis values are plotted in figure 9. They “peak”
roughly at LD high speed and RD low speed, show a slow
increase for the PD and LD conditions and a fast decay for the RD
conditions. Their behaviour w.r.t. load conditions varies.
The measurements for the AE features are similar to the
ones presented in Al-Ghamd and Mba and therefore for more
Fig. 5. Fisher scores (axes: ∂L/∂φ1, ∂L/∂φ2 and ∂L/∂v). Circles, squares
and triangles correspond to the PD, LD and RD conditions.
Fig. 6. Fisher features (axes: φ1 feature, φ2 feature and s feature). Circles,
squares and triangles correspond to the PD, LD and RD conditions.
detailed explanations the reader is referred to [1].
B. Classification Results
Once the feature extraction part of the data analysis pro-
cedure was carried out, we used k-Nearest Neighbor (kNN)
and SVM methods to classify the data. Two types of SVMs
were used: (1) with the linear scalar product (SVMlin), providing
linear separation boundaries; (2) with the nonlinear scalar
product given by the radial basis function $K(x, x'; \sigma) = \exp(-\frac{1}{2\sigma^2}\|x - x'\|^2)$
(abbreviated from now on by SVMrbf).
All these methods have parameters to be tuned: the
parameter of kNN is the number of neighbors k, and the
parameter of SVMlin is the percentage of allowed misclassifications.
SVMrbf has two parameters: the percentage of allowed
misclassifications and the scalar product parameter σ.
Since these methods are designed for dealing with 2-class
problems only, we employed the one-against-the-rest classification
scheme: we used 3 different classifiers of the same type, each
separating one class from the others. Prediction for a new input
Fig. 7. Logarithm of the power (volts), per sample. The 360 examples are
divided in the following way: every 120 represent a defect condition (PD, LD
and RD, in order); within these, every 30 represent a speed condition (600 rpm,
1000 rpm, 2000 rpm and 3000 rpm); within these, every 10 represent a
load condition (0.1 kN, 4.43 kN and 8.86 kN). For example, the 10 samples
with LD, 2000 rpm and 8.86 kN can be found at positions 200–210.
Fig. 8. Logarithm of maximum amplitude (volts), per sample. The same sample
ordering applies as in figure 7.
is made by voting.
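The one-against-the-rest voting scheme can be sketched as follows; the nearest-centroid rule below is a trivial stand-in for the per-class binary classifiers, purely for illustration (the paper used kNN and SVMs):

```python
import numpy as np

def one_vs_rest_predict(x_star, X, y, classes):
    """One-against-the-rest voting: one binary decision value per class
    (here a nearest-centroid surrogate); the largest vote wins."""
    votes = {}
    for c in classes:
        mu_in = X[y == c].mean(axis=0)        # centroid of class c
        mu_out = X[y != c].mean(axis=0)       # centroid of "the rest"
        # Decision value: how much closer to class c than to the rest.
        votes[c] = (np.linalg.norm(x_star - mu_out)
                    - np.linalg.norm(x_star - mu_in))
    return max(votes, key=votes.get)

# Toy 3-class data in 2D (stand-ins for the PD/LD/RD feature vectors).
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
pred = one_vs_rest_predict(np.array([4.8, 5.2]), X, y, classes=[0, 1, 2])
```

Taking the largest decision value rather than a strict majority also resolves ties when no single binary classifier claims the point.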
In order to test the methods we used 10 repetitions of 5-fold
cross-validation and analyzed the mean value of the classification
error. (The n-fold cross-validation method is used both for
testing and model fitting: we split the dataset into n folds,
fit the model's parameters on the first n − 1 of them and
test the model's prediction performance on the nth one, then
repeat the procedure by circularly permuting the folds. The
procedure is finished when all n possible cases have been
performed and the classification errors averaged.)
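The fold rotation described above looks roughly like this in NumPy, with a placeholder evaluation function standing in for fitting and testing an actual classifier:

```python
import numpy as np

def cross_val_error(X, y, n_folds, eval_fold):
    """n-fold cross-validation: rotate which fold is held out, call
    eval_fold(train_idx, test_idx) for each split, average the errors."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, n_folds)
    errors = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])
        errors.append(eval_fold(train_idx, test_idx))
    return float(np.mean(errors))

# Placeholder "classifier": error = fraction of held-out labels equal to 1.
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
X = np.zeros((10, 2))
err = cross_val_error(X, y, 5, lambda tr, te: np.mean(y[te] == 1))
```

Repeating this 10 times with reshuffled data, as in the text, simply averages ten such estimates.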
We used 2 types of settings: (1) each feature set considered
alone; (2) the AR-related and AE-related features used
together.
The results of the classification task are shown in table I.
Fig. 9. Logarithm of kurtosis, per sample. The same sample ordering applies
as in figure 7.
                                      kNN     SVMlin   SVMrbf
log AE features                       0.244   0.481    0.228
AR                                    0.180   0.285    0.168
AR roots                              0.168   0.278    0.162
Fisher score                          0.145   0.431    0.235
Fisher features                       0.118   0.428    0.227
AR and log AE features                0.106   0.204    0.089
AR roots and log AE features          0.093   0.181    0.081
Fisher score and log AE features      0.173   0.266    0.175
Fisher features and log AE features   0.173   0.267    0.161

TABLE I
CLASSIFICATION ERRORS.
IV. DISCUSSION
As we can see in table I, AR model based features perform
better than the AE signal based features. When combined they
produce better results than separately. The best performances
are achieved with the AR coefficients and the amplitude
and period given by the complex conjugate roots of the
autoregression polynomial.
Overall, the plain AR parameters or their corresponding root
characteristics seem to yield better classification performance
than the Fisher scores and Fisher features. This may be
due to the fact that the AR parameters themselves are more
homogeneously distributed (compare figure 2 with figures
5 and 6), which makes it easier to separate them. kNN’s
performance is less sensitive to inhomogeneity: it takes into
account the k nearest neighbors, no matter how far these are
apart. This might explain why Fisher scores and Fisher features
do much better for kNN than for SVMrbf. Apart from that,
the performance of kNN and SVMrbf is roughly the same.
For calculating the function value for a new input kNN
uses all the data in the dataset (of features), while SVM uses
only a fraction of them (the support vectors, see section II-B).
Because of its good performance but high cost kNN is only
used as a benchmark method. The SVM results are considered
more relevant. As we can see in table I the best performance
achieved with SVMs is around 90% classification rate.
V. CONCLUSION
In our analysis we focused on the classification of bear-
ing defects based on acoustic emission signals. We brought
together the probabilistic model related features with the AE
features and used them jointly to complete the task. We can
conclude that using both improves classification performance.
Our future goal is to improve performance with the introduction
of frequency and “burst-form” based features and to use
methods that are computationally less expensive.
ACKNOWLEDGMENTS
The authors would like to thank Ali Ganji and Bas van der
Vorst for supervising the work and Abdullah M. Al-Ghamd
and David Mba for providing the data.
REFERENCES
[1] A. M. Al-Ghamd and D. Mba, “A comparative experimental study of the use of acoustic emission and vibration analysis for bearing defect identification and estimation of defect size,” Mechanical Systems and Signal Processing, vol. 20, pp. 1537–1571, 2006.
[2] A. Ganji, “Acoustic emission to assess bearing lubrication condition: a pre-study,” SKF E.R.C., Tech. Rep., 2003.
[3] A. Ganji and J. Holsnijders, “Acoustic emission measurements focused on bearing lubrication,” SKF E.R.C., Tech. Rep., 2004.
[4] N. Jamaludin and D. Mba, “Monitoring extremely slowly rolling element bearings: part I,” NDT&E International, vol. 35, pp. 349–358, 2002.
[5] ——, “Monitoring extremely slowly rolling element bearings: part II,” NDT&E International, vol. 35, pp. 359–366, 2002.
[6] R. Prado and M. West, “Time series modelling, inference and forecasting,” 2005, manuscript (available on M. West's webpage).
[7] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2.
[8] T. Jaakkola and D. Haussler, “Exploiting generative models in discriminative classifiers,” in Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, 1999, pp. 487–493.
APPENDIX
FISHER SCORE AND FISHER MATRIX FOR AR(p) MODELS
In the sequel we present the calculation of the Fisher score
and Fisher matrix for AR(p) models. For ease of computation,
instead of s we use the log-precision parameter v = log(1/s)
and define

$$ L(\phi, v) = \log p(y \mid Y_p, \phi, v). $$

The Fisher score is given by

$$ \frac{\partial}{\partial \phi} L(\phi, v) = \exp(v)\,(Y_p^T Y_p)\,\big(\hat{\phi}(y, Y_p) - \phi\big) $$

$$ \frac{\partial}{\partial v} L(\phi, v) = -\frac{1}{2}\exp(v)\, Q(y, \phi; Y_p) + \frac{T}{2} $$

where $Q(y, \phi; Y_p) = (y - Y_p \phi)^T (y - Y_p \phi)$ denotes the residual
sum of squares, and the elements of the Fisher matrix are

$$ -E\!\left[ \frac{\partial^2}{\partial \phi\, \partial \phi^T} L(\phi, v) \right] = \exp(v)\, Y_p^T Y_p $$

$$ -E\!\left[ \frac{\partial^2}{\partial \phi\, \partial v} L(\phi, v) \right] = 0 $$

$$ -E\!\left[ \frac{\partial^2}{\partial v^2} L(\phi, v) \right] = \frac{1}{2}. $$
We assumed that the sequences in the dataset are indepen-
dently sampled, therefore the Fisher matrix of the model for
the whole dataset is given by the sum of Fisher matrices of
each sample.
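A minimal NumPy sketch of the score formulas above (using the conditional likelihood as in section II-A; Q is the residual sum of squares computed from the lag design matrix):

```python
import numpy as np

def fisher_score_ar(y, phi, v, p=2):
    """Fisher score (dL/dphi, dL/dv) of an AR(p) model at parameters
    (phi, v), with v = log(1/s), following the appendix formulas."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Lag design matrix: column j holds the values at lag j + 1.
    Yp = np.column_stack([y[p - j - 1:T - j - 1] for j in range(p)])
    target = y[p:]
    phi_hat, *_ = np.linalg.lstsq(Yp, target, rcond=None)  # ML estimate
    d_phi = np.exp(v) * (Yp.T @ Yp) @ (phi_hat - phi)
    Q = np.sum((target - Yp @ phi)**2)         # residual sum of squares
    d_v = -0.5 * np.exp(v) * Q + T / 2
    return d_phi, d_v

# Sanity check: at the ML estimate, the phi-component of the score
# vanishes by construction.
rng = np.random.default_rng(2)
y = rng.normal(size=500)
Yp = np.column_stack([y[1:499], y[0:498]])
phi_ml, *_ = np.linalg.lstsq(Yp, y[2:], rcond=None)
d_phi, d_v = fisher_score_ar(y, phi_ml, v=0.0)
```

Summing such scores (and the per-sequence Fisher matrices) over independently sampled sequences gives the dataset-level quantities used in section III.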