

    Partially Supervised Approach in Signal Recognition

Catalina COCIANU¹, Luminita STATE², Doru CONSTANTIN², Corina SARARU²
¹ Academy of Economic Studies, Bucharest, Romania
² University of Pitesti, Pitesti, Romania
[email protected], [email protected], [email protected]

The paper focuses on the potential of principal directions-based approaches in signal classification and recognition. In probabilistic models, the classes are represented in terms of multivariate density functions, and an object coming from a certain class is modeled as a random vector whose repartition has the density function corresponding to this class. In cases when there is no statistical information concerning the set of density functions corresponding to the classes involved in the recognition process, estimates based on the information extracted from available data are usually used instead. In the proposed methodology, the characteristics of a class are given by a set of eigenvectors of the sample covariance matrix. The overall dissimilarity of an object X with a given class C is computed as the disturbance of the structure of C, when X is allotted to C. A series of tests concerning the behavior of the proposed recognition algorithm are reported in the final section of the paper.

    Keywords: signal processing, classification, pattern recognition, compression/decompression

1 Introduction

Machine recognition of patterns can be viewed as a two-fold task, comprising learning the invariant properties of a set of samples characterizing a class, and deciding that a new sample is a possible member of the class by noting that it has properties common to those of the set of samples.

A typical pattern recognition system consists of three phases: data acquisition, feature extraction and classification. In the data acquisition phase, depending on the environment within which the objects are to be classified, data are gathered using a set of sensors. These are then passed on to the feature extraction phase, where the dimensionality of the data is reduced by measuring and retaining only some characteristic features or properties. In a broader perspective, this stage significantly influences the entire recognition process. Finally, in the classification phase, the extracted features are passed on to the classifier that evaluates the incoming information and makes a final decision. This phase basically establishes a transformation between the features and the classes. Therefore pattern recognition can be described as a transformation from the measurement space M to the feature space F and finally to the decision space D.

The problem of classification basically establishes a transformation F → D between the features and the classes. In other words, it provides a partitioning of the feature space into regions, one region for each category of input. That is, it attempts to assign every data point in the entire feature space to one of the possible (say K) classes. Different forms of this transformation can be a Bayesian rule computing the a posteriori class probabilities, the nearest neighbor rule, linear discriminant functions, the perceptron rule, the nearest prototype rule, etc. Classifiers are usually, but not always, designed with labeled data, in which case these problems are sometimes referred to as supervised classification (where the parameters of the classifier function are learned). Some common examples of supervised pattern classification techniques are the nearest neighbor rule, Bayes' maximum likelihood classifier and the perceptron rule [4], [5], [10].

The process of feature selection aims to map a data space into a feature space that, in theory, has precisely the same dimension as the original data space. The mapping is designed such that the available data set is represented by a reduced number of effective features, the most informative ones, that is, by retaining most of the intrinsic information content of the data. Principal Component Analysis (PCA) has been broadly used in a large series of signal and image processing, pattern recognition and data analysis applications. Classical PCA is based on the second-order statistics of the data and, in particular, on the eigen-structure of the data covariance matrix; accordingly, the PCA neural models incorporate only cells with linear activation functions. More recently, several generalizations of the classical PCA models to non-Gaussian models, the Independent Component Analysis (ICA) and Blind Source Separation (BSS) techniques, have become a very attractive and promising framework for developing more efficient signal processing algorithms [10].

The paper focuses on the potential of principal directions-based approaches in signal classification. The structure of a class is represented in terms of the estimates of its principal directions computed from data, the overall dissimilarity of a particular object with a given class being given by the disturbance of the structure when the object is identified as a member of this class. A series of experimentally established conclusions is exposed in the final section of the paper.

2 The Proposed Methodology for Classification Purposes

According to the well known result established by Karhunen and Loeve, a set of principal directions corresponds to the maximum variability of the cloud from the metric point of view, and it is also almost optimal from the informational point of view, the principal directions being recommended by the maximum entropy principle as the most reliable characteristics of the repartition. PCA allows the identification of a linear transform such that the axes of the resulting coordinate system correspond to the largest variability of the investigated signal. The signal features corresponding to the new coordinate system are uncorrelated, that is, in case of normal models these components are independent. The advantages of using principal components reside in the fact that the information contained by each band is maximum for the whole set of bits [3].

The principal directions of a class are given by a set of unit orthogonal eigenvectors of the covariance matrix.

When the available data is represented by a set of objects $X_1, X_2, \ldots, X_N$ belonging to a certain class C, the covariance matrix is estimated by the sample covariance matrix,

$\hat{\Sigma}_N = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \hat{\mu}_N\right)\left(X_i - \hat{\mu}_N\right)^T$   (1)

where $\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} X_i$.

Let us denote by $\lambda_1^N \geq \lambda_2^N \geq \ldots \geq \lambda_n^N$ the eigenvalues and by $\Psi_1^N, \ldots, \Psi_n^N$ a set of orthonormal eigenvectors of $\hat{\Sigma}_N$.
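As an illustration of how the class characteristics can be obtained in practice, the following is a minimal NumPy sketch (not the authors' code; the function and variable names are ours) that computes the sample mean, the sample covariance matrix (1) and the principal directions as the orthonormal eigenvectors of the covariance sorted by decreasing eigenvalue.

```python
import numpy as np

def class_characteristics(X):
    """Class characteristics from a sample X of shape (N, n): sample mean,
    sample covariance (1), eigenvalues sorted decreasingly and the
    corresponding orthonormal eigenvectors (principal directions, as columns)."""
    N = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / (N - 1)      # sample covariance, eq. (1)
    lam, psi = np.linalg.eigh(sigma)             # eigendecomposition (symmetric)
    order = np.argsort(lam)[::-1]                # lambda_1 >= ... >= lambda_n
    return mu, sigma, lam[order], psi[:, order]
```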

If a new example $X_{N+1}$ coming from the same class has to be included in the sample, the new estimate of the covariance matrix can be recomputed as,

$\hat{\Sigma}_{N+1} = \frac{N-1}{N}\,\hat{\Sigma}_N + \frac{1}{N+1}\left(X_{N+1} - \hat{\mu}_N\right)\left(X_{N+1} - \hat{\mu}_N\right)^T$   (2)

Using first order approximations [11], the estimates of the eigenvalues and eigenvectors respectively are given by,

$\lambda_i^{N+1} \simeq \lambda_i^N + \left(\Psi_i^N\right)^T\left(\hat{\Sigma}_{N+1} - \hat{\Sigma}_N\right)\Psi_i^N = \left(\Psi_i^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N$   (3)

$\Psi_i^{N+1} \simeq \Psi_i^N + \sum_{j=1,\, j\neq i}^{n} \frac{\left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N}{\lambda_i^N - \lambda_j^N}\,\Psi_j^N$   (4)
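A minimal sketch of the resulting update step, under the reading of (2)-(4) given above (the helper below and its argument names are ours, not part of the published procedure):

```python
import numpy as np

def incremental_update(sigma, mu, lam, psi, x_new, N):
    """First-order update of the class characteristics when the new example
    x_new is included in the sample: eq. (2) for the covariance and
    eqs. (3)-(4) for the eigenvalues / principal directions (psi columns)."""
    diff = x_new - mu
    sigma_new = (N - 1) / N * sigma + np.outer(diff, diff) / (N + 1)   # eq. (2)
    n = lam.size
    lam_new = np.empty(n)
    psi_new = psi.copy()
    for i in range(n):
        lam_new[i] = lam[i] + psi[:, i] @ (sigma_new - sigma) @ psi[:, i]   # (3)
        for j in range(n):
            if j != i:
                coef = psi[:, j] @ sigma_new @ psi[:, i] / (lam[i] - lam[j])
                psi_new[:, i] += coef * psi[:, j]                           # (4)
    mu_new = (N * mu + x_new) / (N + 1)          # updated sample mean
    return sigma_new, mu_new, lam_new, psi_new
```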

Let $\Psi_1^N, \ldots, \Psi_n^N$ be the set of principal directions of the class C computed using $\hat{\Sigma}_N$. When the example $X_{N+1}$ is identified as a member of the class C, then the disturbance implied by extending C is expressed as,

$D = \frac{1}{n}\sum_{k=1}^{n} d\left(\Psi_k^N, \Psi_k^{N+1}\right)$   (6)

where d is the Euclidean distance and $\Psi_1^{N+1}, \ldots, \Psi_n^{N+1}$ are the principal directions computed using $\hat{\Sigma}_{N+1}$.

Let $H = \{C_1, C_2, \ldots, C_M\}$ be a set of classes, where the class $C_j$ contains $N_j$ elements. The new object X is allotted to $C_j$, one of the classes for which

$D_j = \frac{1}{n}\sum_{k=1}^{n} d\left(\Psi_{k,j}^{N_j}, \Psi_{k,j}^{N_j+1}\right) = \min_{1\leq p\leq M}\ \frac{1}{n}\sum_{k=1}^{n} d\left(\Psi_{k,p}^{N_p}, \Psi_{k,p}^{N_p+1}\right)$   (7)

where $\Psi_{k,j}^{N_j}$ and $\Psi_{k,j}^{N_j+1}$ denote the principal directions of $C_j$ before and after X is allotted to it. In order to protect against misclassifications due to insufficient closeness to any class, we implement this recognition technique using a threshold T > 0 such that the example X is allotted to $C_j$ only if relation (7) holds and $D_j$ does not exceed T. Briefly, the recognition procedure is described below [2].
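A minimal sketch of the decision rule (6)-(7) with the rejection threshold, assuming an example is rejected whenever the smallest disturbance exceeds T; this is our illustration, not the procedure of [2]. It uses the exact eigendecomposition of the recomputed covariance, for which the first-order update sketched above can be substituted.

```python
import numpy as np

def disturbance_if_allotted(x, mu, sigma, psi, N):
    """Disturbance (6) produced by allotting x to a class with sample mean mu,
    sample covariance sigma, principal directions psi (as columns) and size N."""
    diff = x - mu
    sigma_new = (N - 1) / N * sigma + np.outer(diff, diff) / (N + 1)    # eq. (2)
    lam_new, psi_new = np.linalg.eigh(sigma_new)
    psi_new = psi_new[:, np.argsort(lam_new)[::-1]]
    # fix the sign indeterminacy so corresponding directions are comparable
    psi_new *= np.sign(np.sum(psi_new * psi, axis=0, keepdims=True))
    return np.mean(np.linalg.norm(psi - psi_new, axis=0))               # eq. (6)

def classify(x, classes, T):
    """Rule (7) with threshold T > 0: allot x to the class whose structure it
    disturbs the least, provided the disturbance does not exceed T.
    'classes' is a list of (mu, sigma, psi, N) tuples (our naming)."""
    D = [disturbance_if_allotted(x, *c) for c in classes]
    j = int(np.argmin(D))
    return (j, D[j]) if D[j] <= T else (None, D[j])
```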


Using first order approximations, from (11) we get,

$\hat{\Sigma}_{N+1}\Psi_i^N + \hat{\Sigma}_{N+1}\left(\Psi_i^{N+1} - \Psi_i^N\right) \simeq \lambda_i^{N+1}\Psi_i^N + \lambda_i^{N+1}\left(\Psi_i^{N+1} - \Psi_i^N\right)$   (12)

hence, multiplying on the left by $\left(\Psi_i^N\right)^T$,

$\left(\Psi_i^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N + \left(\Psi_i^N\right)^T\hat{\Sigma}_{N+1}\left(\Psi_i^{N+1} - \Psi_i^N\right) \simeq \lambda_i^{N+1} + \lambda_i^{N+1}\left(\Psi_i^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right)$   (13)

Using $\left(\Psi_i^N\right)^T\hat{\Sigma}_N = \lambda_i^N\left(\Psi_i^N\right)^T$ we obtain,

$\left(\Psi_i^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N + \lambda_i^N\left(\Psi_i^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right) \simeq \lambda_i^{N+1} + \lambda_i^{N+1}\left(\Psi_i^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right)$   (14)

hence, the term $\left(\lambda_i^{N+1} - \lambda_i^N\right)\left(\Psi_i^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right)$ being of second order, $\left(\Psi_i^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N \simeq \lambda_i^{N+1}$, that is,

$\lambda_i^{N+1} \simeq \lambda_i^N + \left(\Psi_i^N\right)^T\left(\hat{\Sigma}_{N+1} - \hat{\Sigma}_N\right)\Psi_i^N$   (15)

The first order approximations of the orthonormal eigenvectors of $\hat{\Sigma}_{N+1}$ can be derived using the expansion of each vector $\Psi_i^{N+1}$ in the basis represented by the orthonormal eigenvectors of $\hat{\Sigma}_N$,

$\Psi_i^{N+1} = \sum_{j=1}^{n} b_{i,j}\,\Psi_j^N$   (16)

where

$b_{i,j} = \left(\Psi_j^N\right)^T\Psi_i^{N+1}$   (17)

Using the orthonormality, we get,

$1 = \left\|\Psi_i^{N+1}\right\|^2 = 1 + 2\left(\Psi_i^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right) + \left\|\Psi_i^{N+1} - \Psi_i^N\right\|^2$   (18)

that is, neglecting the second order term,

$b_{i,i} - 1 = \left(\Psi_i^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right) \simeq 0$   (19)

Using (11), the approximation,

$\hat{\Sigma}_{N+1}\Psi_i^N + \hat{\Sigma}_{N+1}\left(\Psi_i^{N+1} - \Psi_i^N\right) \simeq \lambda_i^{N+1}\Psi_i^N + \lambda_i^{N+1}\left(\Psi_i^{N+1} - \Psi_i^N\right)$   (20)

holds for each $1 \leq i \leq n$.

For $1 \leq j \neq i \leq n$, from (20) we obtain the following equations:

$\left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N + \left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1}\left(\Psi_i^{N+1} - \Psi_i^N\right) \simeq \lambda_i^{N+1}\left(\Psi_j^N\right)^T\Psi_i^N + \lambda_i^{N+1}\left(\Psi_j^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right)$   (21)

Since $\left(\Psi_j^N\right)^T\Psi_i^N = 0$ and, to first order, $\left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1} \simeq \lambda_j^N\left(\Psi_j^N\right)^T$ in the second term of the left-hand side,

$\left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N + \lambda_j^N\left(\Psi_j^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right) \simeq \lambda_i^{N+1}\left(\Psi_j^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right)$   (22)

and, since $\left(\Psi_j^N\right)^T\left(\Psi_i^{N+1} - \Psi_i^N\right) = \left(\Psi_j^N\right)^T\Psi_i^{N+1}$ for $j \neq i$,

$\left(\lambda_i^{N+1} - \lambda_j^N\right)\left(\Psi_j^N\right)^T\Psi_i^{N+1} \simeq \left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N$   (23)

From (23) we get, replacing $\lambda_i^{N+1}$ by $\lambda_i^N$ at the cost of a second order error,

$b_{i,j} = \left(\Psi_j^N\right)^T\Psi_i^{N+1} \simeq \frac{\left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N}{\lambda_i^N - \lambda_j^N}$   (24)

Consequently, the first order approximations of the eigenvectors of $\hat{\Sigma}_{N+1}$ are,

$\Psi_i^{N+1} \simeq \Psi_i^N + \sum_{j=1,\, j\neq i}^{n} \frac{\left(\Psi_j^N\right)^T\hat{\Sigma}_{N+1}\Psi_i^N}{\lambda_i^N - \lambda_j^N}\,\Psi_j^N$   (25)
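The quality of these approximations is easy to check numerically. The snippet below is our sketch, run on synthetic data with well separated eigenvalues; it compares the first-order estimates (3)-(4) / (15), (25) against the exact eigendecomposition of $\hat{\Sigma}_{N+1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 200
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.0])        # well separated spectrum
X = rng.normal(size=(N, n)) * scales

mu = X.mean(axis=0)
sigma = (X - mu).T @ (X - mu) / (N - 1)              # eq. (1)
lam, psi = np.linalg.eigh(sigma)
order = np.argsort(lam)[::-1]
lam, psi = lam[order], psi[:, order]

x_new = rng.normal(size=n) * scales
diff = x_new - mu
sigma_new = (N - 1) / N * sigma + np.outer(diff, diff) / (N + 1)   # eq. (2)

# first-order estimates, eqs. (3)-(4) / (15), (25)
lam_fo = np.array([lam[i] + psi[:, i] @ (sigma_new - sigma) @ psi[:, i]
                   for i in range(n)])
psi_fo = psi.copy()
for i in range(n):
    for j in range(n):
        if j != i:
            psi_fo[:, i] += (psi[:, j] @ sigma_new @ psi[:, i]
                             / (lam[i] - lam[j])) * psi[:, j]

# exact eigenpairs of the updated covariance, with matched order and signs
lam_ex, psi_ex = np.linalg.eigh(sigma_new)
order = np.argsort(lam_ex)[::-1]
lam_ex, psi_ex = lam_ex[order], psi_ex[:, order]
psi_ex *= np.sign(np.sum(psi_ex * psi, axis=0, keepdims=True))

print("max eigenvalue error :", np.max(np.abs(lam_fo - lam_ex)))
print("max eigenvector error:", np.max(np.abs(psi_fo - psi_ex)))
```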

3 Experiments and Conclusive Remarks

Several tests on the proposed recognition procedure were performed on different classes of signals. The results proved very good performance in terms of the recognition error.

Test 1. The evaluation of the error using the leave-one-out method. Sequentially, one of the given examples is removed from the sample. The classifier is designed using the remaining 2NP-1 examples (that is, the characteristics of the classes are computed in terms of the NP and NP-1 remaining examples) and the removed example is classified into one of the resulting classes. The error is evaluated as $\frac{F}{2NP}$, where F is the number of misclassified examples.

Let $\Psi_i^{NP_1,1}$, $1 \leq i \leq n$, $\Psi_i^{NP_2,2}$, $1 \leq i \leq n$, and $\left\{\lambda_i^{NP_1,1}\right\}$, $\left\{\lambda_i^{NP_2,2}\right\}$ be the characteristics of the classes and the corresponding eigenvalues at the initial moment, and $\hat{\mu}_{NP_1}, \hat{\mu}_{NP_2}$ and $\hat{\Sigma}_{NP_1}, \hat{\Sigma}_{NP_2}$ the sample means and the sample covariance matrices respectively. Let X be the removed example. In case X comes from the first class, the new characteristics are,

$\Psi_i^{NP_1-1,1} \simeq \Psi_i^{NP_1,1} + \sum_{j=1,\, j\neq i}^{n} \frac{\left(\Psi_j^{NP_1,1}\right)^T\hat{\Sigma}_{NP_1-1}\,\Psi_i^{NP_1,1}}{\lambda_i^{NP_1,1} - \lambda_j^{NP_1,1}}\,\Psi_j^{NP_1,1}$

for the first class and remain unchanged for the second one, where

$\hat{\mu}_{NP_1-1} = \frac{NP_1\,\hat{\mu}_{NP_1} - X}{NP_1 - 1}$

$\hat{\Sigma}_{NP_1-1} = \frac{NP_1 - 1}{NP_1 - 2}\,\hat{\Sigma}_{NP_1} - \frac{NP_1}{\left(NP_1 - 1\right)\left(NP_1 - 2\right)}\left(X - \hat{\mu}_{NP_1}\right)\left(X - \hat{\mu}_{NP_1}\right)^T$
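A minimal sketch of this leave-one-out step for the class the removed example comes from, under our reading of the formulas above (the helpers follow the 1/(NP-1) convention of eq. (1) and are ours, not the authors' code):

```python
import numpy as np

def remove_example(x, mu, sigma, NP):
    """Down-date the sample mean and covariance of a class of NP examples
    when example x is removed (reconstruction of the formulas above)."""
    mu_out = (NP * mu - x) / (NP - 1)
    diff = x - mu
    sigma_out = ((NP - 1) / (NP - 2) * sigma
                 - NP / ((NP - 1) * (NP - 2)) * np.outer(diff, diff))
    return mu_out, sigma_out

def downdate_directions(psi, lam, sigma_out):
    """First-order approximation of the principal directions after the removal,
    expressed in terms of the previous eigenpairs (psi holds columns)."""
    psi_out = psi.copy()
    n = lam.size
    for i in range(n):
        for j in range(n):
            if j != i:
                psi_out[:, i] += (psi[:, j] @ sigma_out @ psi[:, i]
                                  / (lam[i] - lam[j])) * psi[:, j]
    return psi_out
```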

In case X comes from the second class, similar formulas are used. The evaluation of the error is performed for NP = 10, 20, 30, 40, 50, 75, 150. Several tests were performed on samples generated from 3 repartitions, Gaussian, Rayleigh and geometric, each class corresponding to one of them. All tests led to the surprising conclusion that the misclassification error is close to 0.

a) The classes correspond to the Gaussian repartition and the Rayleigh repartition respectively, NP = 75, 150, n = 50, e = 50, where n is the data dimensionality and e is the number of epochs. The resulting values of the empirical error are 0.213 in case NP = 75 and 0.0327 in case NP = 150. Several other tests confirmed the idea that the performance can be significantly improved by using larger training sets. The variation of the empirical error in terms of e is presented in Figure 1 and Figure 2. In Figure 1 the test is performed for NP = 75 and in Figure 2 the volume is NP = 150.

    Fig. 1. The variation of empirical error for NP=75


    Fig. 2. The variation of empirical error for NP=150

b) The classes correspond to the geometric repartition and the Rayleigh repartition respectively, NP = 50, 150, n = 50, e = 50, where n is the data dimensionality and e is the number of epochs. The resulting values of the empirical error are 0.35 in case NP = 50 and 0.0112 in case NP = 150. The variation of the empirical error in terms of e is presented in Figure 3 and Figure 4. In Figure 3 the test is performed for NP = 50 and in Figure 4 the volume is NP = 150.

Fig. 3. The variation of empirical error for NP=50

Fig. 4. The variation of empirical error for NP=150


c) The classes correspond to the Gaussian repartition, NP = 50, 150, n = 50, e = 50, where n is the data dimensionality and e is the number of epochs. The resulting values of the empirical error are 0.15 in case NP = 50 and 0.0261 in case NP = 150. The variation of the empirical error in terms of e is presented in Figure 5 and Figure 6. In Figure 5 the test is performed for NP = 50 and in Figure 6 the volume is NP = 150.

    Fig. 5. The variation of empirical error for NP=50

    Fig. 6. The variation of empirical error for NP=150

Test 2. The evaluation of the error by counting the misclassified examples from a set of NC new test samples coming from the given classes of the same repartitions. In this case, the learning is performed in an adaptive way, that is, each newly classified example contributes to the new characteristics of the class the example is assigned to, the new characteristics being computed using first order approximations in terms of the previous ones. Besides, after each iteration, the characteristics of the resulting classes are re-computed using an exact method M.
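The excerpt does not spell out the exact method M; the following sketch of the adaptive scheme (reusing the classify, incremental_update and class_characteristics helpers from the earlier sketches, with a full re-estimation standing in for M) illustrates the intended flow under these assumptions.

```python
import numpy as np

def adaptive_test(classes, test_stream, T, refresh):
    """Adaptive evaluation in the spirit of Test 2: every classified example is
    absorbed into the class it is assigned to via the first-order updates, and
    every 'refresh' steps the characteristics of that class are recomputed
    exactly (stand-in for the exact method M).  Each class is a dict with keys
    X (its data, one row per example), mu, sigma, lam, psi and N; test_stream
    is a list of (x, true_label) pairs."""
    errors = 0
    for t, (x, true_label) in enumerate(test_stream, start=1):
        label, _ = classify(x, [(c["mu"], c["sigma"], c["psi"], c["N"])
                                for c in classes], T)
        if label is None or label != true_label:
            errors += 1
        if label is not None:
            c = classes[label]
            c["sigma"], c["mu"], c["lam"], c["psi"] = incremental_update(
                c["sigma"], c["mu"], c["lam"], c["psi"], x, c["N"])
            c["X"] = np.vstack([c["X"], x])
            c["N"] += 1
            if t % refresh == 0:                      # exact recomputation
                c["mu"], c["sigma"], c["lam"], c["psi"] = \
                    class_characteristics(c["X"])
    return errors / len(test_stream)
```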

The tests were performed for NP = 50, 100, 150, NC = 10, 20, 30, 40, 50, n = 50, e = 100, where n is the data dimensionality and e is the number of epochs.

a) The classes correspond to the Gaussian repartition and the Rayleigh repartition respectively. The variation of the empirical error in terms of e is presented in Figure 7 and Figure 8. In Figure 7 the test is performed for NP = 100 and in Figure 8 the volume is NP = 150.


    Fig. 10. The variation of empirical error for NP=150

c) The classes correspond to the Gaussian repartition. The variation of the empirical error in terms of e is presented in Figure 11 and Figure 12. In Figure 11 the test is performed for NP = 50 and in Figure 12 the volume is NP = 150.

    Fig. 11. The variation of empirical error for NP=50

Fig. 12. The variation of empirical error for NP=150


The results of a test on a two-class problem in signal recognition are presented in Figure 13, Figure 14, and Figure 15.

    Fig. 13. The input signals

The samples are extracted from the signals depicted in Figure 13. In Figure 14 the marginal probability distribution function (PDF) of each signal is represented. The correct recognition of 20 new examples coming from these two classes using P1 failed in 2 cases.

    Fig. 14. The marginal PDF of the input signals


Fig. 15. A set of correctly recognized examples using the proposed algorithm

The correctly recognized examples are presented in Figure 15. The performance improved significantly when the volume of the initial samples increased. The values of the resulting empirical mean error are less than 0.05 (more than 95% of the new examples are correctly recognized).

References

[1] C. Cocianu, L. State, P. Vlamos, D. Constantin and C. Sararu, "PCA Supervised and Unsupervised Classifiers in Signal Processing," in Proceedings of PRIS 2009, Milan, Italy, 2009.

[2] C. Cocianu, L. State, I. Rosca and P. Vlamos, "A New Adaptive Classification Scheme Based on Skeleton Information," in Proceedings of SIGMAP 2007, Barcelona, Spain, pp. 128-136, 2007.

[3] K. I. Diamantaras and S. Y. Kung, Principal Component Neural Networks: Theory and Applications, John Wiley & Sons, 1996.

[4] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.

[5] M. Friedman and A. Kandel, Introduction to Pattern Recognition: Statistical, Structural, Neural and Fuzzy Logic Approaches, World Scientific, Singapore, 1999.

[6] J. Goldberger, S. Roweis, G. Hinton and R. Salakhutdinov, "Neighbourhood Component Analysis," in Proceedings of the Conference on Advances in Neural Information Processing Systems, pp. 324-341, 2004.

[7] A. D. Gordon, Classification, 2nd Edition, Chapman & Hall/CRC, 1999.

[8] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.

[9] A. Hyvarinen, J. Karhunen and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.

[10] D. T. Larose, Data Mining Methods and Models, Wiley-Interscience, John Wiley & Sons, Inc., Hoboken, New Jersey, 2006.

[11] L. State, C. Cocianu, P. Vlamos and V. Stefanescu, "PCA-Based Data Mining Probabilistic and Fuzzy Approaches with Applications in Pattern Recognition," in Proceedings of ICSOFT 2006, Portugal, pp. 55-60, 2006.

[12] L. State, C. Cocianu and P. Vlamos, "An Unsupervised Skeleton Based Method to Discover the Structure of the Class System," in Proceedings of the IEEE International Conference on Research Challenges in Information Science, RCIS 2008, Marrakech, Morocco, IEEE Computer Society Press, pp. 169-178, 2008.

Catalina-Lucia COCIANU, Professor, PhD, currently working with the Academy of Economic Studies, Faculty of Cybernetics, Statistics and Informatics, Department of Informatics in Economy. Competence areas: statistical pattern recognition, digital image processing. Research in the fields of pattern recognition, data mining, signal processing. Author of 12 books and more than 70 papers published in national and international journals.

Luminita STATE, Professor, PhD, currently working with the University of Pitesti, Department of Mathematics and Computer Science. Competence areas: artificial intelligence, machine learning, statistical pattern recognition, digital image processing. Research in the fields of machine learning, pattern recognition, neural computation. Author of 15 books and more than 120 papers published in national and international journals.

Doru CONSTANTIN, Lecturer, PhD, currently working with the University of Pitesti, Department of Mathematics and Computer Science. Competence areas: pattern recognition, artificial intelligence, digital signal processing, programming logic. Research in the fields of pattern recognition, independent component analysis and neural computation. Author of more than 20 papers published in national and international journals.

Corina SARARU, Assistant, PhD, currently working with the University of Pitesti, Department of Mathematics and Computer Science. Competence areas: digital image processing, pattern recognition, dynamical systems. Research in the fields of image processing and pattern recognition. Author of 6 papers published in national and international journals.