
Sparse coding Neural Gas for analysis of Nuclear Magnetic Resonance Spectroscopy

Frank-Michael Schleif, Matthias Ongyerth, Thomas Villmann
University Leipzig, Dept. of Medicine, 04103 Leipzig, Germany
{schleif,villmann}@informatik.uni-leipzig.de, {matthias.ongyerth}@medizin.uni-leipzig.de

    Abstract

Nuclear Magnetic Resonance Spectroscopy is a technique for the analysis of complex biochemical materials. For such analyses the identification of known sub-patterns is important. These measurements require an accurate preprocessing and analysis to meet clinical standards. Here we present a method for an appropriate sparse encoding of NMR spectral data combined with a fuzzy classification system, allowing the identification of sub-patterns including mixtures thereof. The method is evaluated in contrast to an alternative approach using simulated metabolic spectra.

keywords: data analysis, nuclear magnetic resonance, sparse coding

    1 Introduction

Nuclear Magnetic Resonance Spectroscopy (NMR) is a technique for the analysis of complex substances such as cell extracts. One prominent NMR application is metabolite profiling in stem cell biology [3]. NMR spectra are high dimensional functional signals consisting of a multitude of peaks. The peak positions describe the presence of specific chemical compounds in the analyzed material, while the areas of the peaks are quantitative with respect to the amount of the analyte in the substrate.

To meet clinical standards and high throughput demands a fully automatic evaluation is required. Hereby, an accurate preprocessing of the NMR spectra and the identification of metabolites in a set of NMR spectra are key issues. High level analyses such as metabolite identification still require a lot of manual interaction to fit expected patterns against the measured signal. Typically, some assumptions are incorporated to simplify the fitting. The NMR signal consists of multiple peaks of specific shape, whereby the shape is approximated by a Lorentzian or some type of Gaussian, introducing a bias into the analyses. Here we present a method which estimates data specific basis functions such that an appropriately compact representation of the NMR spectra becomes possible. This helps to reduce two of the mentioned problems, namely the high dimensionality of the analyzed data and the approximation problem in the fitting approach.

The paper is organized as follows. First we present the used methods, including a short description of the basic preprocessing of the NMR spectra and how the simulated metabolites are obtained. Subsequently we present the method of sparse coding in a specific variant tailored to the analysis of functional data. The introduced approach is applied on sets of simulated pure NMR metabolites as well as of mixtures of such metabolites. We close the paper with a discussion of the obtained results and an outlook on further extensions.

    2 Methods

We consider simulated 1H NMR spectra recorded at 700.15 MHz with 20K complex data points. The simulations are done using the gamma-System as given in [13]. It is assumed that each sample is dissolved in D2O with an additive of DSS as a standard reference at 4.7 ppm. All spectra were Fourier transformed with 1.5 Hz line broadening. The NMR spectra are phased and baseline corrected; the signal range of the water peak [...]
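For illustration, a minimal MATLAB sketch of the line broadening and Fourier transform step; the function name and the spectral-width parameter sw are our own assumptions, only the 1.5 Hz broadening is taken from the text:

function spec = fid2spec(vFid, sw, lb)
% Apply exponential line broadening (lb in Hz) to a complex FID sampled with
% spectral width sw (in Hz, i.e. dwell time 1/sw) and Fourier transform it.
% Phase and baseline correction would follow as separate steps.
n    = numel(vFid);
t    = (0:n-1)' / sw;                       % acquisition time axis
spec = fftshift(fft(vFid(:) .* exp(-pi * lb * t)));
end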


function [vBL] = basedet(vSignal, dDSSPoints)
% Baseline detection: estimate the baseline from per-window minima.
% Signal size, window width
ns = size(vSignal, 1); w = dDSSPoints;
% Empty windows over the whole signal
temp = zeros(w, ceil(ns/w)) + NaN;
% Fill in over windows, minima per window
temp(1:ns) = vSignal; [m, h] = min(temp);
g = h > 1 & h < w;   % keep windows whose minimum does not lie on the window border
% The remainder of the routine is cut off in the source; a plausible completion
% interpolates the kept window minima to a full-length baseline:
x = (find(g) - 0.5) * w;                                  % centres of the kept windows
vBL = interp1(x, m(g), (1:ns)', 'linear', 'extrap');      % interpolated baseline
end


[...] DSS characteristics. Each peak is limited to a minimal peak width assumed as 20% of the DSS width (DSSw) and a maximal width of 1.5 DSSw. Each peak list of a spectrum constitutes its line spectrum and can be considered as a dense representation of the original signal. While the peak picking approach is promising, e.g. an analysis of noise free data is possible, there are also some problems. First, the peak picking becomes complicated in case of stronger noise, partially resolved peaks and overlapping signals; second, the quantification of the identified patterns based on the peak lists may be biased due to slightly incorrect start/end positions of the peaks and hiding effects due to overlapping peaks. As a positive point, the peak shape becomes less crucial and one gets a natural measure of the reliability of a pattern match by considering the number of matched peaks in the pattern. In conclusion, the difference approach always gives a match and concentration value, sensitive to the fit; the PCA method may potentially lead to incorrect results due to the huge number of dimensions or failed constraints; the peak picking, on the other hand, is promising to achieve most of the requirements but is also sensitive with respect to noise and signals with poor resolution. To overcome some of these problems an alternative is proposed which can be seen as a compromise between the different approaches.

    2.2 Sparse coding for functional data

Sparse coding refers back to the work given in [10], which has shown that images in the primary visual cortex of mammals are encoded by sparse representations of the image data such that a set of sparse codes is obtained. Here we focus on a special variant of sparse coding applied on sets of functional spectral data. This encoding can be seen as a natural compact representation of the data, balancing dimensionality reduction and information preservation.

    2.2.1 Minimum sparse coding

We suppose that N functional data vectors f_k are available, each containing D values, i.e. f_k ∈ R^D and ‖f_k‖ = 1. A set of M, maybe overcomplete and/or not necessarily orthogonal, basis function vectors φ_j ∈ R^D should be used for the representation of the data in form of a linear combination:

    f_k = Σ_j α_{j,k} φ_j + ε_k                                   (1)

with ε_k ∈ R^D being the reconstruction error vector and α_{j,k} the weighting coefficients, with α_{j,k} ∈ [0, 1], Σ_j α_{j,k} = 1 and α_k = (α_{1,k}, ..., α_{M,k}). We define a cost function E_k for f_k as

    E_k = ‖ε_k‖² + S_k                                            (2)

which has to be minimized. It contains a regularization term S_k. This term judges the sparseness of the representation. It can be defined as

    S_k = Σ_j g(α_{j,k})                                          (3)

whereby g(x) is a non-linear function like exp(−x²), log(1/(1+x²)), etc. Another choice would be to take

    S_k = H(α_k)                                                  (4)

being the entropy of the vector α_k. We remark that minimum sparseness is achieved iff α_{j,k} = 1 for exactly one arbitrary j and zero elsewhere. Using this minimum scenario, optimization is reduced to the minimization of the description errors ‖ε_k‖² or, equivalently, to the optimization of the basis functions φ_j. The span for a set of data vectors consists of vectors φ_j chosen as principal components. Minimum principal component analysis requires at least the determination of the first principal component. Taking into account higher components improves the approximation. However, as mentioned above, if the data space is non-linear, principal component analysis (PCA) may be suboptimal. In NMR spectroscopy, one possible way to overcome this problem is to split the data space into continuous patches, to build homogeneous subsets on these patches and to carry out a PCA on each subset, taking only the first principal component. The respective approach to determine the principal component is a combination of adaptive PCA


(Oja-learning, [9]) and prototype-based vector quantization (neural gas [8]), called sparse coding neural gas.
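As a small illustration of the minimum sparse coding scenario, the following MATLAB fragment (our own variable names) encodes a single normalized data vector fk by exactly one basis vector, chosen by maximum inner product, so that the cost (2) reduces to the reconstruction error:

% fk: D x 1 normalized data vector; Phi: D x M matrix whose columns are the basis vectors.
[~, j]   = max(Phi' * fk);          % best matching basis vector
alpha    = zeros(size(Phi, 2), 1);
alpha(j) = 1;                       % one-hot coefficients: the minimum of S_k
eps_k    = fk - Phi * alpha;        % reconstruction error of Eq. (1)
E_k      = sum(eps_k.^2);           % cost of Eq. (2) up to the constant sparseness term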

    2.2.2 Sparse coding neural gas (SCNG)

We now briefly describe the basic variant of SCNG according to [7]. In SCNG, N prototypes W = {w_i} approximate the first principal components p_i of the subsets Ω_i. A functional data vector f_k belongs to Ω_i iff its correlation to p_i, defined by the inner product O(w_i, f_k) = ⟨w_i, f_k⟩, is maximum:

    Ω_i = { f_k | i = argmax_j ⟨p_j, f_k⟩ }                       (5)

The approximations w_i can be obtained adaptively by Oja-learning, starting with random vectors w_i at time t = 0 with ‖w_i‖ = 1. Let P be the probability density in Ω_i. Then, for each time step t a data vector f_k ∈ Ω_i is selected according to P and the prototype w_i is updated by

    Δw_i = ε_t O(w_i, f_k) (f_k − O(w_i, f_k) w_i)                (6)

with ε_t > 0, ε_t → 0 for t → ∞, Σ_t ε_t = ∞ and Σ_t ε_t² < ∞, which is a converging stochastic process [6]. The final limit of the process is w_i = p_i [9].

Yet, the subsets Ω_i are initially unknown, while their determination according to (5) requires knowledge of their first principal components p_i. This problem is solved in analogy to the original neural gas in vector quantization [8]. For a randomly selected functional data vector f_k (according to P), the correlation O(w_i, f_k) is determined for each prototype and the rank r_i is computed according to

    r_i(f_k, W) = Σ_{j=1}^{N} Θ( O(w_j, f_k) − O(w_i, f_k) )      (7)

counting the number of pointers w_j for which the relation O(w_i, f_k) < O(w_j, f_k) is valid [8]; Θ(x) is the Heaviside function. All prototypes are updated according to

    Δw_i = ε_t h_t(f_k, W, i) O(w_i, f_k) (f_k − O(w_i, f_k) w_i) (8)

with

    h_t(f_k, W, i) = exp( − r_i(f_k, W) / σ_t )                   (9)

being the so-called neighborhood function with neighborhood range σ_t > 0. Thus, the update strength of each prototype is correlated with its matching ability. Further, the temporary data subset Ω_i(t) for a given prototype is

    Ω_i(t) = { f_k | i = argmax_j ⟨w_j, f_k⟩ }                    (10)

For t → ∞ the range is decreased as σ_t → 0 and, hence, only the best matching prototype is updated in (8) in the limit. Then, in the equilibrium of the stochastic process (8) one has Ω_i(t) → Ω_i for a certain subset configuration which is related to the data space shape and the density P [16]. Further, one gets w_i = p_i in the limit. Both results are in complete analogy to the usual neural gas, because the maximum over inner products is mathematically equivalent to the minimum of the Euclidean distance between the vectors [5],[8].
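The following MATLAB sketch implements one pass of the SCNG updates (5)-(9); the function name and the convention that the learning rate eps_t and neighborhood range sigma_t are supplied by the caller are our own choices, not prescribed by the paper:

function W = scng_epoch(F, W, eps_t, sigma_t)
% F: D x K matrix of normalized functional data vectors (columns f_k).
% W: D x N matrix of prototypes w_i (normalized at t = 0).
N = size(W, 2);
r = zeros(1, N);
for k = randperm(size(F, 2))                 % present the data in random order
    fk = F(:, k);
    O  = W' * fk;                            % correlations O(w_i, f_k) = <w_i, f_k>
    [~, order] = sort(O, 'descend');
    r(order)   = 0:N-1;                      % rank r_i: number of better matching prototypes, Eq. (7)
    h = exp(-r(:) / sigma_t);                % neighborhood function, Eq. (9)
    % Oja-type update of all prototypes, Eq. (8):
    W = W + eps_t * (fk * (h .* O)' - W .* repmat((h .* O.^2)', size(W, 1), 1));
end
end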

2.2.3 Classification with Fuzzy Labeled Self Organizing Map

The sparse coded spectra have been fed into a special variant of a self organizing map, called Fuzzy Labeled Self Organizing Map (FL-SOM), as given in [12]. We do not detail the FL-SOM here but mention that it generates a classifier and a topological mapping of the data. The parameters of the FL-SOM are: map size 5 × 10, final neighborhood range 0.75, and the remaining parameters as in [12]. The map has been trained up to convergence as specified in [12]. To obtain the sparse coding on NMR data, the spectra were split into 90 so-called patches, which are fragments of the NMR signal (see [7]), with a width of 200 points, motivated by the DSS width (see the sketch after Table 1). For the SCNG algorithm 30 prototypes have been used, determined by a grid search over different values. We would like to mention that the number of prototypes did not significantly influence the results but should be chosen in accordance with the diversity of the substructures expected in the overall dataset.


The sparse coding achieved a dimensionality reduction by a factor of 10.

     A   G   Y   S   12  13  14  23  24  34
A    .9  .1
G    .8
Y    1   .2
S    .9  .2
12   1
13   .9
14   1
23   .8
24   .8
34   1

Table 1. Classification of metabolites using peak lists. Simulated metabolites are almost perfectly recovered (A, G, Y, S) whereas for the unknown mixtures some misidentifications can be observed.
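The splitting of the spectra into patches, as described above, can be sketched in MATLAB as follows; the function name, the simple non-overlapping splitting and the per-patch normalization are our own assumptions:

function P = makePatches(vSpec, patchLen)
% Cut a processed spectrum into consecutive patches of patchLen points
% (200 points, roughly one DSS width, in the setting of the paper) and
% normalize each patch, so the columns of P can serve as the f_k for SCNG.
if nargin < 2, patchLen = 200; end
nPatch = floor(numel(vSpec) / patchLen);
P = reshape(vSpec(1:nPatch*patchLen), patchLen, nPatch);
P = P ./ repmat(sqrt(sum(P.^2, 1)) + eps, patchLen, 1);   % ||f_k|| = 1
end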

    3 Experiments and Results

Here we compare the peak picking encoding and the sparse coding for a set of simulated metabolite spectra. We consider four types of metabolites relevant in metabolic studies of the stem cell: Alanine (Ala), Glutamine (Gln), Glycine (Gly) and Serine (Ser), simulated at 39 linearly increasing concentration levels (1-39). Hence we obtain 156 spectra, simulated using the previously mentioned NMR system parameters. Additionally we generated mixtures of these metabolites by combining two metabolites in all pairwise combinations, again with 39 concentration levels. This gives 6 × 39 mixture spectra, which are not used in the subsequent training steps but are used for external validation. All spectra are processed as mentioned above and either encoded to peak lists or alternatively encoded by sparse coding. The results for the peak based approach are collected in Table 1 and for the sparse coding in Table 2 (Alanine - A, Glutamine - G, Glycine - Y, Serine - S). Thereby the peak lists of the patterns are directly matched against the peak lists of the measurement using a tolerance of 0.005 ppm.
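A sketch of such a peak-list matching, written in MATLAB under our own naming; the 0.005 ppm tolerance is from the text and the 75% criterion from footnote 1:

function hit = matchPattern(patternPpm, measuredPpm, tol, minFrac)
% A pattern counts as identified if at least minFrac of its peak positions
% find a measured peak within the tolerance tol (in ppm).
if nargin < 3, tol = 0.005; end
if nargin < 4, minFrac = 0.75; end
matched = 0;
for p = patternPpm(:)'                       % loop over the pattern peaks
    matched = matched + any(abs(measuredPpm - p) <= tol);
end
hit = (matched / numel(patternPpm)) >= minFrac;
end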

For the peak lists we observe a very good recognition as well as prediction accuracy. On average the recognition (on the 4 training classes) is 91% and on the unknown 6 mixture classes 90%. It should be noted that the fractions in a column of Table 1 do not necessarily accumulate to 1.0 = 100% because the peak based identification is not forced to identify one of the metabolites in each analysis¹.

     A   G   Y   S   12  13  14  23  24  34
A    .8
G    1   1   .4  1
Y    1
S    .8
12
13   .2  1   .7
14   .3
23   .2  1
24   .6
34

Table 2. Classification of metabolites using sparse coding. Pure metabolites are almost perfectly recovered (A, G, Y, S) whereas for the unknown mixture data stronger misidentifications are observed.

The sparse coded data have been analyzed using the FL-SOM and the obtained map (which was topology preserving, topographic error < 0.05). The model has been trained with the 4 classes of pure metabolites. The FL-SOM model generates a fuzzy labeling of the underlying prototypes and hence is also able to give assignments to more than one class. Using a majority vote scheme to classify the data, the training data have been learned with 100% accuracy. But we also wanted to determine the 6 new mixture classes. To do this we defined prototypical labels for each class, such as {1, 0, 0, 0} for class 1 (Alanine) and {0.5, 0.5, 0, 0} for a mixture class of Alanine and Glutamine. Spectra were assigned to the closest prototype and classified by the prototypical label closest (in the Euclidean distance) to the fuzzy label of that prototype. For example, let a data point v have the label {1, 0, 0, 0}, which assigns it to class 1 or Alanine. Let further some prototype w be the winner (closest prototype) for this data point, with a fuzzy label of {0.6, 0.4, 0, 0}. Then two classifications are possible. Using majority vote the prototype label becomes {1, 0, 0, 0} and hence the data point is assigned to class 1 (Alanine), which is correct. Using the alternative scheme, the fuzzy label {0.6, 0.4, 0, 0} is closer to {0.5, 0.5, 0, 0} than to {1, 0, 0, 0} and hence the prototype is labeled as a mixture of Alanine and Glutamine; consequently the data point is assigned to the 1/2 or Alanine/Glutamine class, leading to a (in this case) wrong classification because the data point was labeled as Alanine.
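The two labeling schemes of this example can be sketched in MATLAB as follows; the variable names and the explicit list of prototypical labels are ours, and the winner's fuzzy label is just the illustrative value from the text:

% Prototypical labels: 4 pure classes (A, G, Y, S) plus the 6 pairwise mixtures.
pureLabels  = eye(4);
pairs       = nchoosek(1:4, 2);                    % [1 2; 1 3; 1 4; 2 3; 2 4; 3 4]
mixLabels   = 0.5 * (pureLabels(pairs(:,1), :) + pureLabels(pairs(:,2), :));
protoLabels = [pureLabels; mixLabels];             % 10 x 4 reference labels

wLabel = [0.6 0.4 0 0];                            % fuzzy label of the winning prototype (example)

% Scheme 1: majority vote over the winner's fuzzy label -> always a pure class.
[~, majorityClass] = max(wLabel);                  % class 1 (Alanine)

% Scheme 2: closest prototypical label (Euclidean distance) -> pure class or mixture.
d = sum((protoLabels - repmat(wLabel, size(protoLabels, 1), 1)).^2, 2);
[~, nearestRef] = min(d);                          % reference 5, i.e. {0.5, 0.5, 0, 0} = Ala/Gln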

¹ At least 75% of the peaks had to match for a classification to be counted.


Figure 2. FL-SOM (bar plot) for the 4 metabolite classes. The map is given as a 5 × 10 grid and for each grid node 4 bars are depicted, indicating the fuzzy values for the respective classes. As shown in the picture (indicated by manually added closed regions, e.g. ellipsoids in the corners), the map contains receptive fields with high responsibility for a single class, but there is also a region in the map responsible for data points which are topologically located between different classes - in our case metabolite mixtures.

Using this scheme and considering the receptive fields of the prototypes of the FL-SOM, the picture is a bit different, as shown in Table 2. On average the recognition (on the 4 training classes) becomes 87% and on the unknown 6 mixture classes we obtain 50%. However, it should be noted that the FL-SOM classifier never saw the mixture classes during training but nevertheless learned prototypes which are located between different classes in a topology preserving manner. The errors for the mixtures Ala/Gln and Gly/Ser are caused by the fact that no prototypes representing these mixtures were learned on the map, as depicted in Figure 2.

    4 Conclusions

We presented a method for the sparse coded representation of functional data applied in NMR spectroscopy and compared it to an alternative peak based approach. All approaches were able to recognize the pure metabolite spectra at different concentration levels. For the analysis of mixtures the peak picking approach performed better, but this result is potentially biased because the simulated data always show a perfect peak shape. For the SCNG approach we found promising results; the metabolic information encoded in the spectra could be preserved and a significant data reduction by a factor of 10 was achieved. The SCNG provided a sufficient and accurate data reduction such that the FL-SOM classifier could be used in a topology preserving manner. The SCNG encoding also allows the application of other data analysis methods, such as different classifiers or statistical tests, which need a compact data representation. The SCNG generated a compact and discriminative encoding. Future directions of improvement will focus on a better combination of the sparse coded data and the FL-SOM, the additional integration of NMR specific knowledge and an advanced determination of the patches. In a next step all methods will be analyzed on the basis of real NMR metabolite and NMR cell extract measurements.²

    References

[1] David Chang, Cory D. Banack, and Sirish L. Shah. Robust baseline correction algorithm for signal dense NMR spectra. Journal of Magnetic Resonance, 187(2):288-292, 2007.

[2] Li Chen, Zhiqiang Weng, Laiyoong Goh, and Marc Garland. An efficient algorithm for automatic phase correction of NMR spectra based on entropy minimization. Journal of Magnetic Resonance, 158(1-2):164-168, 2002.

[3] M. Cross, R. Alt, and D. Niederwieser. The case for a metabolic stem cell niche. Cells Tissues Organs, in press, 2008.

[4] S. W. Homans. A Dictionary of Concepts in NMR. Clarendon Press, Oxford, 1992.

[5] Teuvo Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, Heidelberg, 1995. (Second Extended Edition 1997).

[6] H. J. Kushner and D. S. Clark. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, New York, 1978.

[7] K. Labusch, E. Barth, and T. Martinetz. Learning data representations with sparse coding neural gas. In M. Verleysen, editor, Proceedings of the European Symposium on Artificial Neural Networks (ESANN), in press. d-side publications, 2008.

[8] Thomas M. Martinetz, Stanislav G. Berkovich, and Klaus J. Schulten. Neural-gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4):558-569, 1993.

[9] E. Oja. Neural networks, principal components and subspaces. International Journal of Neural Systems, 1:61-68, 1989.

[10] B. Olshausen and D. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996.

[11] T. H. Park. Towards Automatic Musical Instrument Timbre Recognition. PhD thesis, Princeton University, 2004.

[12] F.-M. Schleif, T. Villmann, and B. Hammer. Prototype based fuzzy classification in clinical proteomics. International Journal of Approximate Reasoning, 47(1):4-16, 2008.

[13] S. A. Smith, T. O. Levante, B. H. Meier, and R. R. Ernst. Computer simulations in magnetic resonance. An object-oriented programming approach. Journal of Magnetic Resonance, Series A, 106:75-105, 1994.

[14] Lucksanaporn Tarachiwin, Koichi Ute, Akio Kobayashi, and Eiichiro Fukusaki. 1H NMR based metabolic profiling in the evaluation of Japanese green tea quality. J. Agric. Food Chem., 55(23):9330-9336, 2007.

[15] M. Verleysen and D. Francois. The curse of dimensionality in data mining and time series prediction. In J. Cabestany, A. Prieto, and F. S. Hernandez, editors, Computational Intelligence and Bioinspired Systems, Proceedings of the 8th International Work-Conference on Artificial Neural Networks 2005 (IWANN), Barcelona, 2005.

[16] T. Villmann and J.-C. Claussen. Magnification control in self-organizing maps and neural gas. Neural Computation, 18(2):446-469, 2006.

[17] B. Williams, S. Cornett, B. Dawant, A. Crecelius, B. Bodenheimer, and R. Caprioli. An algorithm for baseline correction of MALDI mass spectra. In Proceedings of the 43rd Annual Southeast Regional Conference - Volume 1, pages 137-142, Kennesaw, Georgia, 2005. ACM.

² Acknowledgment: We are grateful to Thomas Riemer, IZKF, Leipzig University.