Informatica Economică, vol. 13, no. 3/2009
Partially Supervised Approach in Signal Recognition
Catalina COCIANU¹, Luminita STATE², Doru CONSTANTIN², Corina SARARU²
¹ Academy of Economic Studies, Bucharest, Romania
² University of Pitesti, Pitesti, Romania
[email protected], [email protected], [email protected]
The paper focuses on the potential of principal directions based approaches in signal classification and recognition. In probabilistic models, the classes are represented in terms of multivariate density functions, and an object coming from a certain class is modeled as a random vector whose repartition has the density function corresponding to this class. When there is no statistical information concerning the set of density functions corresponding to the classes involved in the recognition process, estimates based on the information extracted from available data are usually used instead. In the proposed methodology, the characteristics of a class are given by a set of eigenvectors of the sample covariance matrix. The overall dissimilarity between an object X and a given class C is computed as the disturbance of the structure of C when X is allotted to C. A series of tests concerning the behavior of the proposed recognition algorithm is reported in the final section of the paper.
Keywords: signal processing, classification, pattern recognition, compression/decompression
1 Introduction
Machine recognition of patterns can be viewed as a two-fold task, comprising learning the invariant properties of a set of samples characterizing a class, and deciding that a new sample is a possible member of the class by noting that it has properties common to those of the set of samples.
A typical pattern recognition system consists of three phases: data acquisition, feature extraction and classification. In the data acquisition phase, depending on the environment within which the objects are to be classified, data are gathered using a set of sensors. These are then passed on to the feature extraction phase, where the dimensionality of the data is reduced by measuring and retaining only some characteristic features or properties. In a broader perspective, this stage significantly influences the entire recognition process. Finally, in the classification phase, the extracted features are passed on to the classifier that evaluates the incoming information and makes a final decision. This phase basically establishes a transformation between the features and the classes. Therefore, pattern recognition can be described as a transformation from the measurement space M to the feature space F and finally to the decision space D.
The problem of classification basically establishes a transformation F→D between the features and the classes. In other words, it provides a partitioning of the feature space into regions, one region for each category of input. That is, it attempts to assign every data point in the entire feature space to one of the possible (say K) classes. Different forms of this transformation can be a Bayesian rule computing the a posteriori class probabilities, the nearest neighbor rule, linear discriminant functions, the perceptron rule, the nearest prototype rule, etc. Classifiers are usually, but not always, designed with labeled data, in which case these problems are sometimes referred to as supervised classification (where the parameters of the classifier function D are learned). Some common examples of supervised pattern classification techniques are the nearest neighbor rule, Bayes' maximum likelihood classifier and the perceptron rule [4], [5], [10].
The process of feature selection aims to map a data space into a feature space that, in theory, has precisely the same dimension as the original data space. The mapping is designed such that the available data set is represented by a reduced number of effective
features, the most informative ones (that is, by retaining most of the intrinsic information content of the data). Principal Component Analysis (PCA) has been broadly used in a large series of signal and image processing, pattern recognition and data analysis applications. Classical PCA is based on the second-order statistics of the data and, in particular, on the eigen-structure of the data covariance matrix; accordingly, the PCA neural models incorporate only cells with linear activation functions. More recently, several generalizations of the classical PCA models to non-Gaussian models, the Independent Component Analysis (ICA) and the Blind Source Separation (BSS) techniques, have become a very attractive and promising framework for developing more efficient signal processing algorithms [10].
The paper focuses on the potential of principal directions-based approaches in signal classification. The structure of a class is represented in terms of the estimates of its principal directions computed from data, the overall dissimilarity between a particular object and a given class being given by the disturbance of the structure when the object is identified as a member of this class. A series of experimentally established conclusions are presented in the final section of the paper.
2 The Proposed Methodology for Classification Purposes
According to the well-known result established by Karhunen and Loève, the set of principal directions corresponds to the maximum variability of the cloud from a metric point of view, and the principal directions are also almost optimal from an informational point of view, being recommended by the maximum entropy principle as the most reliable characteristics of the repartition. PCA allows the identification of a linear transform such that the axes of the resulting coordinate system correspond to the largest variability of the investigated signal. The signal features corresponding to the new coordinate system are uncorrelated; in the case of normal models these components are independent. The advantage of using principal components resides in the fact that the information contained by each band is maximum for the whole set of bits [3]. The principal directions of a class are given by a set of unit orthogonal eigenvectors of the covariance matrix.
When the available data is represented by a set of objects $X_1, X_2, \ldots, X_N$ belonging to a certain class C, the covariance matrix is estimated by the sample covariance matrix,
$$\hat{\Sigma}_N = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \hat{\mu}_N\right)\left(X_i - \hat{\mu}_N\right)^T \qquad (1)$$

where $\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} X_i$.
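For concreteness, the estimates in (1) and the resulting class characteristics can be computed directly; the following is a minimal NumPy sketch on synthetic data (the sample, its size and its dimensionality are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 200, 5                       # sample size and dimensionality (arbitrary)
X = rng.normal(size=(N, n))         # the objects X_1, ..., X_N of a class C

mu_hat = X.mean(axis=0)                          # sample mean, (1/N) * sum X_i
centered = X - mu_hat
sigma_hat = centered.T @ centered / (N - 1)      # sample covariance, eq. (1)

# np.cov with rowvar=False uses the same 1/(N-1) normalization
assert np.allclose(sigma_hat, np.cov(X, rowvar=False))

# the class characteristics: orthonormal eigenvectors of sigma_hat,
# sorted by decreasing eigenvalue
lam, psi = np.linalg.eigh(sigma_hat)
lam, psi = lam[::-1], psi[:, ::-1]
assert np.allclose(psi.T @ psi, np.eye(n))       # orthonormality
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the columns are reversed to match the descending convention used in the text.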
Let us denote by NnNN ...21 the
eigen values and by NnN
,...,1 a set of
orthonormal eigen vectors of N .
If a new example XN+1
( )( ) NT
NNNNNNNN
XX 1
1
1111
++= +++
coming from thesame class has to be included in the sample,the new estimate of the covariance matrix
can be recomputed as,
(2)
Using first order approximations [11], the estimates of the eigenvalues and eigenvectors respectively are given by,

$$\lambda_i^{N+1} = \lambda_i^N + \left(\Psi_i^N\right)^T\left(\hat{\Sigma}_{N+1} - \hat{\Sigma}_N\right)\Psi_i^N = \left(\Psi_i^N\right)^T \hat{\Sigma}_{N+1}\,\Psi_i^N \qquad (3)$$
$$\Psi_i^{N+1} = \Psi_i^N + \sum_{j=1,\, j\neq i}^{n} \frac{\left(\Psi_j^N\right)^T \hat{\Sigma}_{N+1}\,\Psi_i^N}{\lambda_i^N - \lambda_j^N}\,\Psi_j^N \qquad (4)$$
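The recursion (2) and the first-order updates (3)-(4) can be checked against a full eigendecomposition; the sketch below is illustrative (synthetic data with a well-separated spectrum is assumed, since the perturbation formulas degrade for nearly equal eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 500, 4
scales = np.array([3.0, 2.0, 1.0, 0.5])          # well-separated spectrum
X = rng.normal(size=(N, n)) * scales             # sample X_1, ..., X_N
x_new = rng.normal(size=n) * scales              # the new example X_{N+1}

mu = X.mean(axis=0)
sigma_N = np.cov(X, rowvar=False)
lam, psi = np.linalg.eigh(sigma_N)
lam, psi = lam[::-1], psi[:, ::-1]               # sort eigenvalues descending

# eq. (2): recursive estimate of the covariance matrix (exact identity)
d = (x_new - mu).reshape(-1, 1)
sigma_N1 = (N - 1) / N * sigma_N + (d @ d.T) / (N + 1)
assert np.allclose(sigma_N1, np.cov(np.vstack([X, x_new]), rowvar=False))

# eq. (3): first-order estimates of the eigenvalues
lam_new = np.array([psi[:, i] @ sigma_N1 @ psi[:, i] for i in range(n)])

# eq. (4): first-order estimates of the eigenvectors
psi_new = psi.copy()
for i in range(n):
    for j in range(n):
        if j != i:
            coef = (psi[:, j] @ sigma_N1 @ psi[:, i]) / (lam[i] - lam[j])
            psi_new[:, i] += coef * psi[:, j]

# the approximations track the exact eigen-structure of sigma_N1
lam_exact, psi_exact = np.linalg.eigh(sigma_N1)
lam_exact, psi_exact = lam_exact[::-1], psi_exact[:, ::-1]
assert np.max(np.abs(lam_new - lam_exact)) < 0.05
assert all(abs(psi_new[:, i] @ psi_exact[:, i]) > 0.99 for i in range(n))
```

The recursion (2) is exact, so the first assertion holds to machine precision; (3) and (4) are first-order, so they only approach the exact eigen-structure as N grows.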
Let $\Psi_1^N, \ldots, \Psi_n^N$ be the set of principal directions of the class C computed using $\hat{\Sigma}_N$. When the example $X_{N+1}$ is identified as a member of the class C, the disturbance implied by extending C is expressed as,

$$D = \frac{1}{n}\sum_{k=1}^{n} d\left(\Psi_k^N, \Psi_k^{N+1}\right) \qquad (6)$$
where d is the Euclidean distance and $\Psi_1^{N+1}, \ldots, \Psi_n^{N+1}$ are the principal directions computed using $\hat{\Sigma}_{N+1}$.
Let $H = \{C_1, C_2, \ldots, C_M\}$ be a set of classes, where the class $C_j$ contains $N_j$ elements. The new object X is allotted to $C_j$, one of the classes for which

$$D_j = \frac{1}{n}\sum_{k=1}^{n} d\left(\Psi_k^{N_j,j}, \Psi_k^{N_j+1,j}\right) = \min_{1 \leq p \leq M} \frac{1}{n}\sum_{k=1}^{n} d\left(\Psi_k^{N_p,p}, \Psi_k^{N_p+1,p}\right) \qquad (7)$$
In order to protect against misclassifications due to insufficient closeness to any class, we implement this recognition technique using a threshold T > 0 such that the example X is allotted to $C_j$ only if relation (7) holds and $D_j \leq T$. Briefly, the recognition procedure is described below [2].
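The decision rule (6)-(7) with the rejection threshold can be sketched as follows. This is an illustrative implementation, not the authors' code: the principal directions are recomputed exactly via a full eigendecomposition rather than by the first-order updates, and the sign alignment of eigenvectors is an implementation detail not discussed in the text; the two classes and the object are synthetic.

```python
import numpy as np

def principal_directions(data):
    """Orthonormal eigenvectors of the sample covariance matrix,
    sorted by decreasing eigenvalue (the principal directions)."""
    _, psi = np.linalg.eigh(np.cov(data, rowvar=False))
    return psi[:, ::-1]

def disturbance(class_data, x):
    """Eq. (6): mean Euclidean distance between the principal directions
    of the class before and after x is allotted to it."""
    psi_old = principal_directions(class_data)
    psi_new = principal_directions(np.vstack([class_data, x]))
    # align eigenvector signs before measuring the distance
    signs = np.sign(np.sum(psi_old * psi_new, axis=0))
    return float(np.mean(np.linalg.norm(psi_old - psi_new * signs, axis=0)))

def classify(classes, x, T):
    """Eq. (7): allot x to the class of minimal disturbance D_j;
    reject (return None) when even the minimal D_j exceeds T."""
    D = [disturbance(C, x) for C in classes]
    j = int(np.argmin(D))
    return j if D[j] <= T else None

rng = np.random.default_rng(2)
C1 = rng.normal(size=(100, 4)) * np.array([3.0, 1.0, 1.0, 1.0])
C2 = rng.normal(size=(100, 4)) * np.array([1.0, 1.0, 1.0, 3.0])
x = rng.normal(size=4) * np.array([3.0, 1.0, 1.0, 1.0])   # drawn like class 1
label = classify([C1, C2], x, T=np.inf)   # index of the least-disturbed class
```

With T set to a finite value, objects that noticeably disturb every class structure are rejected rather than forced into the nearest class.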
Using first order approximations, from (11) we get,

$$\hat{\Sigma}_N \Psi_i^{N+1} + \hat{\Sigma}_{N+1}\,\Delta\Psi_i^{N+1} \approx \lambda_i^{N+1}\Psi_i^{N+1} + \lambda_i^{N+1}\,\Delta\Psi_i^{N+1} + \Delta\lambda_i^{N+1}\,\Psi_i^{N+1} \qquad (12)$$

where $\Delta\Psi_i^{N+1} = \Psi_i^N - \Psi_i^{N+1}$ and $\Delta\lambda_i^{N+1} = \lambda_i^N - \lambda_i^{N+1}$, hence,

$$\left(\Psi_i^{N+1}\right)^T \hat{\Sigma}_N \Psi_i^{N+1} + \left(\Psi_i^{N+1}\right)^T \hat{\Sigma}_{N+1}\,\Delta\Psi_i^{N+1} \approx \lambda_i^{N+1} + \lambda_i^{N+1}\left(\Psi_i^{N+1}\right)^T \Delta\Psi_i^{N+1} + \Delta\lambda_i^{N+1} \qquad (13)$$

Using $\left(\Psi_i^{N+1}\right)^T \hat{\Sigma}_{N+1} = \lambda_i^{N+1}\left(\Psi_i^{N+1}\right)^T$ we obtain,

$$\left(\Psi_i^{N+1}\right)^T \hat{\Sigma}_N \Psi_i^{N+1} + \lambda_i^{N+1}\left(\Psi_i^{N+1}\right)^T \Delta\Psi_i^{N+1} \approx \lambda_i^{N+1} + \lambda_i^{N+1}\left(\Psi_i^{N+1}\right)^T \Delta\Psi_i^{N+1} + \Delta\lambda_i^{N+1} \qquad (14)$$

hence $\Delta\lambda_i^{N+1} \approx \left(\Psi_i^{N+1}\right)^T \hat{\Sigma}_N \Psi_i^{N+1} - \lambda_i^{N+1}$, that is,

$$\lambda_i^N \approx \left(\Psi_i^{N+1}\right)^T \hat{\Sigma}_N \Psi_i^{N+1} \qquad (15)$$
The first order approximations of the orthonormal eigenvectors of $\hat{\Sigma}_N$ can be derived using the expansion of each correction $\Delta\Psi_i^{N+1}$ in the basis represented by the orthonormal eigenvectors of $\hat{\Sigma}_{N+1}$,

$$\Delta\Psi_i^{N+1} = \sum_{j=1}^{n} b_{i,j}\,\Psi_j^{N+1} \qquad (16)$$

where

$$b_{i,j} = \left(\Psi_j^{N+1}\right)^T \Delta\Psi_i^{N+1} \qquad (17)$$
Using the orthonormality, we get,

$$1 = \left\|\Psi_i^N\right\|^2 = \left\|\Psi_i^{N+1} + \Delta\Psi_i^{N+1}\right\|^2 = 1 + 2\left(\Psi_i^{N+1}\right)^T \Delta\Psi_i^{N+1} + \left\|\Delta\Psi_i^{N+1}\right\|^2 \qquad (18)$$

that is, at the first order,

$$b_{i,i} = \left(\Psi_i^{N+1}\right)^T \Delta\Psi_i^{N+1} = 0 \qquad (19)$$
Using (11), the approximation,

$$\hat{\Sigma}_N \Psi_i^{N+1} + \hat{\Sigma}_{N+1}\,\Delta\Psi_i^{N+1} \approx \lambda_i^{N+1}\Psi_i^{N+1} + \lambda_i^{N+1}\,\Delta\Psi_i^{N+1} + \Delta\lambda_i^{N+1}\,\Psi_i^{N+1} \qquad (20)$$

holds for each $1 \leq i \leq n$.
For $1 \leq j \neq i \leq n$, from (20) we obtain the following equations:

$$\left(\Psi_j^{N+1}\right)^T \hat{\Sigma}_N \Psi_i^{N+1} + \left(\Psi_j^{N+1}\right)^T \hat{\Sigma}_{N+1}\,\Delta\Psi_i^{N+1} \approx \lambda_i^{N+1}\left(\Psi_j^{N+1}\right)^T \Delta\Psi_i^{N+1} + \Delta\lambda_i^{N+1}\left(\Psi_j^{N+1}\right)^T \Psi_i^{N+1} \qquad (21)$$

$$\left(\Psi_j^{N+1}\right)^T \hat{\Sigma}_N \Psi_i^{N+1} + \lambda_j^{N+1}\left(\Psi_j^{N+1}\right)^T \Delta\Psi_i^{N+1} \approx \lambda_i^{N+1}\left(\Psi_j^{N+1}\right)^T \Delta\Psi_i^{N+1} \qquad (22)$$

$$\left(\Psi_j^{N+1}\right)^T \hat{\Sigma}_N \Psi_i^{N+1} \approx \left(\lambda_i^{N+1} - \lambda_j^{N+1}\right)\left(\Psi_j^{N+1}\right)^T \Delta\Psi_i^{N+1} \qquad (23)$$
From (23) we get,

$$b_{i,j} = \left(\Psi_j^{N+1}\right)^T \Delta\Psi_i^{N+1} = \frac{\left(\Psi_j^{N+1}\right)^T \hat{\Sigma}_N\,\Psi_i^{N+1}}{\lambda_i^{N+1} - \lambda_j^{N+1}} \qquad (24)$$
Consequently, the first order approximations of the eigenvectors of $\hat{\Sigma}_N$ are,

$$\Psi_i^N \approx \Psi_i^{N+1} + \sum_{j=1,\, j\neq i}^{n} \frac{\left(\Psi_j^{N+1}\right)^T \hat{\Sigma}_N\,\Psi_i^{N+1}}{\lambda_i^{N+1} - \lambda_j^{N+1}}\,\Psi_j^{N+1} \qquad (25)$$
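As a numerical sanity check of this derivation, the backward approximations (15) and (25) can be verified on synthetic data (an illustrative sketch under the same assumptions as before, not part of the original experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
N, n = 400, 4
scales = np.array([3.0, 2.0, 1.0, 0.5])          # keeps the spectrum well separated
X = rng.normal(size=(N + 1, n)) * scales         # X_1, ..., X_{N+1}

sigma_N = np.cov(X[:N], rowvar=False)            # covariance of the first N examples
sigma_N1 = np.cov(X, rowvar=False)               # covariance of all N+1 examples

lam1, psi1 = np.linalg.eigh(sigma_N1)
lam1, psi1 = lam1[::-1], psi1[:, ::-1]           # eigen-structure of Sigma_{N+1}

# eq. (15): lambda_i^N ~ (Psi_i^{N+1})^T Sigma_N Psi_i^{N+1}
lam_approx = np.array([psi1[:, i] @ sigma_N @ psi1[:, i] for i in range(n)])

# eq. (25): Psi_i^N ~ Psi_i^{N+1} plus the first-order correction
psi_approx = psi1.copy()
for i in range(n):
    for j in range(n):
        if j != i:
            b = (psi1[:, j] @ sigma_N @ psi1[:, i]) / (lam1[i] - lam1[j])
            psi_approx[:, i] += b * psi1[:, j]

lam_exact, psi_exact = np.linalg.eigh(sigma_N)
lam_exact, psi_exact = lam_exact[::-1], psi_exact[:, ::-1]
assert np.max(np.abs(lam_approx - lam_exact)) < 0.05
assert all(abs(psi_approx[:, i] @ psi_exact[:, i]) > 0.99 for i in range(n))
```

The recovered eigenvalues and eigenvectors of $\hat{\Sigma}_N$ agree closely with an exact eigendecomposition, as expected for first-order approximations of a small perturbation.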
3 Experiments and Concluding Remarks
Several tests of the proposed recognition procedure were performed on different classes of signals. The results proved very good performance in terms of the recognition error.
Test 1. The evaluation of the error using the leave-one-out method.
Sequentially, one of the given examples is removed from the sample. The classifier is designed using the remaining 2NP-1 examples (that is, the characteristics of the classes are computed in terms of the NP and NP-1 remaining examples respectively) and the removed example is classified into one of the resulting classes. The error is evaluated as $\frac{F}{2NP}$, where F is the number of misclassified examples.
Let $\lambda_i^{1,NP}, \lambda_i^{2,NP}$, $1 \leq i \leq n$, and $\left\{\Psi_i^{1,NP}\right\}, \left\{\Psi_i^{2,NP}\right\}$, $1 \leq i \leq n$, be the eigenvalues and the characteristics of the classes at the initial moment, and $\hat{\mu}_{NP}^1, \hat{\mu}_{NP}^2$ and $\hat{\Sigma}_{NP}^1, \hat{\Sigma}_{NP}^2$ the sample means and the sample covariance matrices respectively. Let X be the removed example. In case X comes from the first class, the new characteristics are,
$$\Psi_i^{1,NP-1} \approx \Psi_i^{1,NP} + \sum_{j=1,\, j\neq i}^{n} \frac{\left(\Psi_j^{1,NP}\right)^T \hat{\Sigma}_{NP-1}^1\, \Psi_i^{1,NP}}{\lambda_i^{1,NP} - \lambda_j^{1,NP}}\,\Psi_j^{1,NP}$$
for the first class, while the characteristics of the second class remain unchanged, where

$$\hat{\mu}_{NP-1}^1 = \frac{NP\,\hat{\mu}_{NP}^1 - X}{NP-1}$$

$$\hat{\Sigma}_{NP-1}^1 = \frac{1}{NP-2}\left[(NP-1)\,\hat{\Sigma}_{NP}^1 - \frac{NP}{NP-1}\left(X - \hat{\mu}_{NP}^1\right)\left(X - \hat{\mu}_{NP}^1\right)^T\right]$$
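Unlike the first-order eigen approximations, these downdating formulas for the sample mean and covariance are exact, which can be verified directly (a sketch on synthetic data; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
NP, n = 75, 5
X = rng.normal(size=(NP, n))       # class 1 sample; the last row will be removed

mu = X.mean(axis=0)
sigma = np.cov(X, rowvar=False)
x = X[-1]                          # the removed example X

# downdated sample mean and covariance of the remaining NP-1 examples
mu_down = (NP * mu - x) / (NP - 1)
d = (x - mu).reshape(-1, 1)
sigma_down = ((NP - 1) * sigma - NP / (NP - 1) * d @ d.T) / (NP - 2)

# both match the statistics recomputed from scratch on the NP-1 examples
assert np.allclose(mu_down, X[:-1].mean(axis=0))
assert np.allclose(sigma_down, np.cov(X[:-1], rowvar=False))
```

This makes leave-one-out evaluation cheap: each of the 2NP rounds reuses the full-sample statistics instead of re-estimating them from scratch.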
In case X comes from the second class, similar formulas are used. The evaluation of the error is performed for NP = 10, 20, 30, 40, 50, 75, 150. Several tests were performed on samples generated from three repartitions, Gaussian, Rayleigh and geometric, each class corresponding to one of them. All tests led to the surprising conclusion that the misclassification error is close to 0.
a) The classes correspond to the Gaussian repartition and the Rayleigh repartition respectively, NP = 75, 150, n = 50, e = 50, where n is the data dimensionality and e is the number of epochs. The resulting empirical errors are 0.213 in case NP = 75 and 0.0327 in case NP = 150. Several other tests confirmed that the performance can be significantly improved by using larger training sets.
The variation of the empirical error in terms of e is presented in Figure 1 and Figure 2. In Figure 1 the test is performed for NP = 75 and in Figure 2 the volume is NP = 150.
Fig. 1. The variation of empirical error for NP=75
Fig. 2. The variation of empirical error for NP=150
b) The classes correspond to the geometric repartition and the Rayleigh repartition respectively, NP = 50, 150, n = 50, e = 50, where n is the data dimensionality and e is the number of epochs. The resulting empirical errors are 0.35 in case NP = 50 and 0.0112 in case NP = 150. The variation of the empirical error in terms of e is presented in Figure 3 and Figure 4. In Figure 3 the test is performed for NP = 50 and in Figure 4 the volume is NP = 150.
Fig. 3. The variation of empirical error for NP=50
Fig. 4. The variation of empirical error for NP=150
c) The classes correspond to the Gaussian repartition, NP = 50, 150, n = 50, e = 50, where n is the data dimensionality and e is the number of epochs. The resulting empirical errors are 0.15 in case NP = 50 and 0.0261 in case NP = 150. The variation of the empirical error in terms of e is presented in Figure 5 and Figure 6. In Figure 5 the test is performed for NP = 50 and in Figure 6 the volume is NP = 150.
Fig. 5. The variation of empirical error for NP=50
Fig. 6. The variation of empirical error for NP=150
Test 2. The evaluation of the error by counting the misclassified examples from a set of NC new test samples coming from the given classes of the same repartitions.
In this case, the learning is performed in an adaptive way, that is, each newly classified example contributes to the new characteristics of the class the example is assigned to, the new characteristics being computed using first order approximations in terms of the previous ones. Besides, after each iteration, the characteristics of the newly resulting classes are re-computed using an exact method M.
The tests were performed for NP = 50, 100, 150, NC = 10, 20, 30, 40, 50, n = 50, e = 100, where n is the data dimensionality and e is the number of epochs.
a) The classes correspond to the Gaussian repartition and the Rayleigh repartition respectively. The variation of the empirical error in terms of e is presented in Figure 7 and Figure 8. In Figure 7 the test is performed for NP = 100 and in Figure 8 the volume is NP = 150.
Fig. 10. The variation of empirical error for NP=150
c) The classes correspond to the Gaussian repartition. The variation of the empirical error in terms of e is presented in Figure 11 and Figure 12. In Figure 11 the test is performed for NP = 50 and in Figure 12 the volume is NP = 150.
Fig. 11. The variation of empirical error for NP=50
Fig. 12. The variation of empirical error for NP=150
The results of a test on a two-class problemin signal recognition are presented in Figure
13, Figure 14, and Figure 15.
Fig. 13. The input signals
The samples are extracted from the signals depicted in Figure 13. Figure 14 shows the marginal probability distribution function (PDF) of each signal. The correct recognition of 20 new examples coming from these two classes using P1 failed in 2 cases.
Fig. 14. The marginal PDF of the input signals
Fig. 15. A set of correctly recognized examples using the proposed algorithm

The correctly recognized examples are presented in Figure 15. The performance improved significantly when the volume of the initial samples increased. The values of the resulting empirical mean error are less than 0.05 (more than 95% of the new examples are correctly recognized).
References
[1] C. Cocianu, L. State, P. Vlamos, D. Constantin and C. Sararu, "PCA Supervised and Unsupervised Classifiers in Signal Processing," in Proceedings of PRIS 2009, Milan, Italy, 2009.
[2] C. Cocianu, L. State, I. Rosca and P. Vlamos, "A New Adaptive Classification Scheme Based on Skeleton Information," in Proceedings of SIGMAP 2007, Barcelona, Spain, pp. 128-136, 2007.
[3] K. I. Diamantaras and S. Y. Kung, Principal Component Neural Networks: Theory and Applications, John Wiley & Sons, 1996.
[4] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
[5] M. Friedman and A. Kandel, Introduction to Pattern Recognition: Statistical, Structural, Neural and Fuzzy Logic Approaches, World Scientific, Singapore, 1999.
[6] J. Goldberger, S. Roweis, G. Hinton and R. Salakhutdinov, "Neighbourhood Component Analysis," in Proceedings of the Conference on Advances in Neural Information Processing Systems, pp. 324-341, 2004.
[7] A. D. Gordon, Classification, 2nd Edition, Chapman & Hall/CRC, 1999.
[8] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
[9] A. Hyvarinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
[10] D. T. Larose, Data Mining Methods and Models, Wiley-Interscience, John Wiley and Sons, Inc., Hoboken, New Jersey, 2006.
[11] L. State, C. Cocianu, P. Vlamos and V. Stefanescu, "PCA-Based Data Mining Probabilistic and Fuzzy Approaches with Applications in Pattern Recognition," in Proceedings of ICSOFT 2006, Portugal, pp. 55-60, 2006.
[12] L. State, C. Cocianu and P. Vlamos, "An Unsupervised Skeleton Based Method to Discover the Structure of the Class System," in Proceedings of the IEEE International Conference on Research Challenges in Information Science, RCIS 2008, Marrakech, Morocco, IEEE Computer Society Press, pp. 169-178, 2008.
Catalina-Lucia COCIANU, Professor, PhD, currently working with the Academy of Economic Studies, Faculty of Cybernetics, Statistics and Informatics, Department of Informatics in Economy. Competence areas: statistical pattern recognition, digital image processing. Research in the fields of pattern recognition, data mining, signal processing. Author of 12 books and more than 70 papers published in national and international journals.

Luminita STATE, Professor, PhD, currently working with the University of Pitesti, Department of Mathematics and Computer Science. Competence areas: artificial intelligence, machine learning, statistical pattern recognition, digital image processing. Research in the fields of machine learning, pattern recognition, neural computation. Author of 15 books and more than 120 papers published in national and international journals.

Doru CONSTANTIN, Lecturer, PhD, currently working with the University of Pitesti, Department of Mathematics and Computer Science. Competence areas: pattern recognition, artificial intelligence, digital signal processing, programming logic. Research in the fields of pattern recognition, independent component analysis and neural computation. Author of more than 20 papers published in national and international journals.

Corina SARARU, Assistant, PhD, currently working with the University of Pitesti, Department of Mathematics and Computer Science. Competence areas: digital image processing, pattern recognition, dynamical systems. Research in the fields of image processing and pattern recognition. Author of 6 papers published in national and international journals.