
Targeted Profiling:  Quantitative Analysis of 1H NMR Metabolomics Data


Aalim M. Weljie,†,‡ Jack Newton,† Pascal Mercier,† Erin Carlson,† and Carolyn M. Slupsky*,†,§

Chenomx Inc., Edmonton, Alberta, Canada, and Metabolomics Research Centre, University of Calgary, Calgary, Canada

Extracting meaningful information from complex spectroscopic data of metabolite mixtures is an area of active research in the emerging field of “metabolomics”, which combines metabolism, spectroscopy, and multivariate statistical analysis (pattern recognition) methods. Chemometric analysis and comparison of 1H NMR (1) spectra is commonly hampered by intersample peak position and line width variation due to matrix effects (pH, ionic strength, etc.). Here a novel method for mixture analysis is presented, defined as “targeted profiling”. Individual NMR resonances of interest are mathematically modeled from pure compound spectra. This database is then interrogated to identify and quantify metabolites in complex spectra of mixtures, such as biofluids. The technique is validated against a traditional “spectral binning” analysis on the basis of sensitivity to water suppression (presaturation, NOESY-presaturation, WET, and CPMG), relaxation effects, and NMR spectral acquisition times (3, 4, 5, and 6 s/scan) using PCA pattern recognition analysis. In addition, a quantitative validation is performed against various metabolites at physiological concentrations (9 µM-8 mM). “Targeted profiling” is highly stable in PCA-based pattern recognition, insensitive to water suppression, relaxation times (within the ranges examined), and scaling factors; hence, direct comparison of data acquired under varying conditions is made possible. In particular, analysis of metabolites at low concentration and overlapping regions are well suited to this analysis. We discuss how targeted profiling can be applied for mixture analysis and examine the effect of various acquisition parameters on the accuracy of quantification.

Measurement of small-molecule metabolites, either endogenous or exogenous, provides a chemical “snapshot” of an organism’s metabolic state. Simultaneous characterization of numerous metabolites, or metabolic profiling, is an emerging area of analytical chemistry exemplified by the fusion of analytical metabolite measurement with pattern recognition chemometric statistical analysis (1, 2) and can be considered as a biological subset of general mixture analysis problems. Commonly referred to as “metabolomics” or “metabonomics”, this technique has wide applicability to a number of fields including medicine, plant sciences, toxicology, and food sciences (3). Within the realm of medicine, such profiling has provided both diagnostic and prognostic information using a variety of sources including serum, urine, cerebrospinal fluid, and tissue extracts (see Table 1 from Holmes and Antti (2) and references therein for representative examples).

The essence of the technology lies in distilling chemical information from complex spectroscopic information, such as nuclear magnetic resonance (NMR) spectroscopy or mass spectrometry data. NMR has been pursued as an analytical platform for metabolomics due to its inherent quantitative nature and the wealth of chemical information it can provide about nuclei that are “NMR visible”. Typical 1H 1D NMR experiments involve multiple samples (N from ∼10 to 100s or 1000s) with between ∼25-75 (tissue) and >200 (urine) NMR-visible metabolites in each sample. The sensitivity of NMR analysis to the chemical environment also brings certain challenges; subtle differences in pH, ionic strength, temperature, protein content, etc., between samples will cause differences in the NMR-detected peak position and line widths of a given metabolite (Figure 1). Furthermore, each metabolite is differentially sensitive to these effects, and in many compounds, the NMR resonances are affected independently, thus making a global correction infeasible for a mixture of multiple metabolites. Finally, in complex spectra there is a high degree of overlap in certain regions of the NMR spectrum, which hampers analysis. While this complexity can be reduced using selective data acquisition (e.g., ref 4), much more common is chemometric analysis of complete 1D or 2D spectra.

The most widely employed method to overcome issues of peak and line width changes is a form of data reduction referred to as “spectral binning” (5, 6). The spectrum is subdivided into a number of regions, and the total area within each bin is considered in further analysis. The assumption is that by considering regions of the spectra, as opposed to individual data points, minor peak shifts and line width differences for the same compound can be accounted for across samples. The size of the bins can be of a fixed width or variably sized using manual inspection (best optimization) or automated algorithms (7, 8). For example, a typical 64k point NMR spectrum is reduced to ∼250 variables (K) upon fixed width binning at 0.04 ppm. In theory, each of these variables (Ki, where i is the ith bin) will contain the same latent chemical information for each sample (Nj, where j is the jth sample), i.e., the contribution of all resonances from the original spectral region.

* Corresponding author. Tel: 780-492-8919. Fax: 780-492-5329. E-mail: [email protected].
† Chenomx Inc.
‡ University of Calgary.
§ Current address: Metabolomics Centre, Department of Medicine, University of Alberta, Edmonton, Alberta, Canada.

(1) Eriksson, L.; Antti, H.; Gottfries, J.; Holmes, E.; Johansson, E.; Lindgren, F.; Long, I.; Lundstedt, T.; Trygg, J.; Wold, S. Anal. Bioanal. Chem. 2004, 380, 419-429.
(2) Holmes, E.; Antti, H. Analyst 2002, 127, 1549-1557.
(3) Griffin, J. L. Philos. Trans. R. Soc. London, Ser. B 2004, 359, 857-871.
(4) Sandusky, P.; Raftery, D. Anal. Chem. 2005, 77, 7717-7723.
(5) Gartland, K. P. R.; Beddell, C. R.; Lindon, J. C.; Nicholson, J. K. Mol. Pharmacol. 1991, 39, 629-642.
(6) Anthony, M. L.; Sweatman, B. C.; Beddell, C. R.; Lindon, J. C.; Nicholson, J. K. Mol. Pharmacol. 1994, 46, 199-211.

Anal. Chem. 2006, 78, 4430-4442

4430 Analytical Chemistry, Vol. 78, No. 13, July 1, 2006 10.1021/ac060209g CCC: $33.50 © 2006 American Chemical Society Published on Web 05/17/2006

Success of subsequent multivariate pattern recognition analysis to elucidate metabolite patterns, biomarkers, or both is critically dependent on the assumption that the input variables (K) contain the same latent information across all samples (N). In practice, however, the sensitivity to sample conditions, coupled with baseline and other spectral distortions that can arise during an experiment, means that the bin integrations will not necessarily reflect true changes in the spectral areas (9, 10). Since pattern recognition techniques, such as principal component analysis (PCA), depend on linear combinations of input spectral bins, artifacts in the integrations of input bins will compromise the analysis. If the metabolites of interest are modulated in a subtle manner, peak position and line width differences between samples as well as instrumental and spectral artifacts may mask significant changes to their concentrations, especially for low-concentration metabolites, which may be important in disease processes. Statistical tools such as orthogonal signal correction can be applied to regression-type experiments, and these have been shown to eliminate components of the data that are not relevant to the analysis, such as the effects of physiological variation or instrumental effects (11). However, these analysis corrections are still dependent on the assumption that the latent information in each variable K is the same across N samples.

Several studies have examined methodologies to align spectra prior to analysis using all points in the spectra or binning (12, 13). Successful implementation of these approaches would ensure the same chemical information content for each point or bin in the spectrum. One significant limitation to these approaches is that overlapping peaks are highly problematic, especially when the peak shifts of the overlapping entities are influenced by heterogeneous factors. Alignment is usually successful for the most intense peaks and less so for lower concentration metabolites.

Despite these advances to spectral preprocessing algorithms and pattern recognition methods for spectral binning data, little information is available with respect to individual metabolites and their concentrations in the biofluid. Underlying any statistical treatment of NMR spectra in metabolomics is the basic notion that metabolites are the actual variables of interest. The ideal statistical treatment would be based directly on the concentrations of all metabolites in the samples, since these data represent the underlying physical model that generated the observed data (the NMR spectrum). Such treatment would allow for analysis of compounds as an entity or selectively (such as pathway modeling). In this case, there would still be N samples; however, each variable K would be the concentration of a particular metabolite.

An approach to data reduction, which addresses the quantitative concern, entails profiling a spectrum by comparison to NMR spectral signatures of individual metabolites found in a reference database. This technique works by reducing spectral data to quantified metabolites, which can then be used as input variables into pattern recognition tools such as PCA or partial least squares (PLS). An advantage of this approach is a reduction in the dimensionality of problem space compared to spectral binning, as assignment of all protons in a chemical compound will show all spectral regions correlated to a specific compound. As a result, a variety of approaches to targeted profiling have recently been developed for both in vivo and ex vivo NMR (9, 14, 15). This methodology is most useful when the compounds of interest are previously defined and can be targeted.

(7) ACD/Labs. Ref Type: Internet Communication. 2006.
(8) Eads, C. D.; Furnish, C. M.; Noda, I.; Juhlin, K. D.; Cooper, D. A.; Morrall, S. W. Anal. Chem. 2004, 76, 1982-1990.
(9) Crockford, D. J.; Keun, H. C.; Smith, L. M.; Holmes, E.; Nicholson, J. K. Anal. Chem. 2005, 77, 4556-4562.
(10) Potts, B. C.; Deese, A. J.; Stevens, G. J.; Reily, M. D.; Robertson, D. G.; Theiss, J. J. Pharm. Biomed. Anal. 2001, 26, 463-476.
(11) Beckwith-Hall, B. M.; Brindle, J. T.; Barton, R. H.; Coen, M.; Holmes, E.; Nicholson, J. K.; Antti, H. Analyst 2002, 127, 1283-1288.
(12) Stoyanova, R.; Nicholls, A. W.; Nicholson, J. K.; Lindon, J. C.; Brown, T. R. J. Magn. Reson. 2004, 170, 329-335.
(13) Forshed, J.; Torgrip, R. J.; Aberg, K. M.; Karlberg, B.; Lindberg, J.; Jacobsson, S. P. J. Pharm. Biomed. Anal. 2005, 38, 824-832.

Table 1. List of Urine Metabolites Studied and Their Concentrations (µM) in Each Sample

sample  1-methylhistidine  citrate  creatine  creatinine  fucose  glycine  glycolate  hippurate  isoleucine  lactate  taurine
1       991                997      251       7677        84      924      221        819        12          79       149
2       352                425      127       5333        70      1273     172        619        10          101      78
3       707                357      229       7967        71      455      211        1659       10          129      169
4       736                1289     206       6377        91      1327     266        1487       12          111      248
5       520                780      149       5874        65      407      226        734        9           61       104
6       383                1129     166       7753        69      470      251        754        13          98       164

Figure 1. Structures for the compounds analyzed in this study. Note that compounds added for chemical shift indication or pH referencing (DSS, imidazole) were excluded from the analysis, unless otherwise noted. Numbering corresponds to assignments in Table 2.

NMR relaxation properties of metabolites are a key factor influencing the accuracy of metabolite quantification by NMR in targeted profiling approaches (16). In high-throughput metabolomics experiments, it is not feasible to allow complete relaxation of all nuclei as experimental times would be too long due to the relatively large longitudinal relaxation time (T1) of small molecules. In principle, if sample NMR spectra are acquired under the exact same conditions (solvent suppression, relaxation time, etc.) as a particular reference database, quantitation should be ideal as each one will undergo the same relaxation phenomenon. However, in practice, this is often not feasible. For example, sample data that were acquired prior to knowledge of reference database acquisition parameters, or sample data requiring Carr-Purcell-Meiboom-Gill (CPMG)-type experiments for the reduction of interfering protein or lipid baseline artifacts (17), will produce quantification errors.
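The relaxation argument can be illustrated with the single-exponential recovery model that this paper uses for its T1 fits, I = I0(1 - e^(-TR/T1)). The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes concentrations are read from the analyte/internal-standard area ratio calibrated under the library conditions, and the T1 values, repetition times, and function names are hypothetical.

```python
import math

def recovered_fraction(t1_s, tr_s):
    """Fraction of the equilibrium signal observed at repetition time tr_s."""
    return 1.0 - math.exp(-tr_s / t1_s)

def apparent_concentration_error(t1_analyte, t1_standard, tr_sample, tr_library):
    """Percent error when a sample acquired at tr_sample is quantified
    against a reference library acquired at tr_library.

    Assumption: concentration is taken from the analyte/standard area
    ratio, with the library defining the expected ratio.
    """
    ratio_sample = (recovered_fraction(t1_analyte, tr_sample)
                    / recovered_fraction(t1_standard, tr_sample))
    ratio_library = (recovered_fraction(t1_analyte, tr_library)
                     / recovered_fraction(t1_standard, tr_library))
    return 100.0 * (ratio_sample / ratio_library - 1.0)

# Hypothetical values: fast-relaxing analyte (T1 = 0.7 s), slower standard (T1 = 3.0 s).
matched = apparent_concentration_error(0.7, 3.0, tr_sample=5.0, tr_library=5.0)
shorter = apparent_concentration_error(0.7, 3.0, tr_sample=3.0, tr_library=5.0)
print(f"matched conditions: {matched:.1f}%   shorter repetition time: {shorter:.1f}%")
```

Under this simplified model the error vanishes when sample and library share the same repetition time, and it grows when the analyte and the standard have different T1 values and the repetition times differ.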

Here we present a novel method of quantitatively characterizing 1H NMR spectra, hereafter referred to as “targeted profiling”. In this approach, metabolites of interest are first chemically modeled using their peak center and J-coupling information. This information is stored in a database, which is accessed during the analysis of an unknown metabolite mixture spectrum, to create a mathematical model of each metabolite in a cumulative manner. Quantification is achieved through the use of an internal standard. We have previously successfully used this approach to identify and quantify overlapped glutamine/glutamate peaks from rat brain extracts (18). Here we demonstrate the utility of the method using multivariate statistical analysis in a mixture of metabolites at physiological concentrations. We compare this approach to the ideal case of manually created bins in a spectral binning approach. Specifically, we address the effect of water suppression techniques and acquisition parameters on pattern recognition using either spectral binning or targeted profiling to determine the stability of PCA models. The effect of presaturation (19), 1D nuclear Overhauser enhancement spectroscopy (NOESY)-presaturation (20), and water suppression enhanced through T1 effects (WET) (21) methods of solvent suppression has previously been studied with respect to spectral binning (10). Here we additionally examine the CPMG (22) pulse sequence often used in metabolic profiling of samples containing larger MW components. We expect that the different pulse sequences will cause differences in baseline due to the difference in water suppression techniques. However, these differences will not be quantifiable for a particular pulse sequence since spectral distortions are also a result of sample conditions and magnet shimming. The choice of scaling factor, autoscaling (unit variance) or Pareto scaling, is also evaluated as an influencing variable on the PCA analysis; scaling has been shown to have a greater impact on analysis than bin size for spectral binning (23). The effect of variable acquisition times is considered for the evaluation of the influence of resolution and relaxation on pattern recognition. Furthermore, the influence of water suppression techniques and relaxation is investigated as a factor in the accuracy of quantification of metabolites using the targeted profiling approach. Finally, the results of the analyses are described with respect to the impact of each pulse sequence on different chemical groups and regions of the spectrum, and T1 properties of the metabolites characterized.

METHODOLOGY

Sample Preparation. To unambiguously analyze the spectral information, a simplified synthetic urine was prepared for six samples using the metabolite concentrations listed in Table 1. Metabolite concentrations were chosen to represent actual concentrations that may be observed in urine. To determine representative metabolite concentrations, we averaged the concentrations of 1-methylhistidine, citrate, creatinine, fucose, glycine, glycolate, hippurate, isoleucine, lactate, and taurine measured in the urine of 6 healthy males over 25 days, using the Chenomx NMR Suite 3.1 software (Marrie et al., unpublished observations). All assignments were confirmed by comparison with pure reference compounds. All samples contained a mixture of salts (24), including 78 mM NaCl, 16 mM Na2SO4, 18.6 mM NH4Cl, 21.2 mM KCl, 0.02% NaN3, 3.26 mM CaCl2, 3.17 mM MgCl2, 20 mM phosphate, 10 mM creatinine, 400 mM urea, 10 mM imidazole (pH indicator), 10% D2O, and 0.50 mM DSS (chemical shift indicator), and were at pH 7.00 ± 0.05 (uncorrected for HOD) as measured with an appropriately calibrated pH meter.

NMR Data Acquisition and Processing. The NMR spectra for each sample were acquired using four water suppression methods: NOESY-presaturation (noesypr), presaturation (presat), CPMG, and WET on a four-channel Varian Inova 600 spectrometer with a Triax-gradient 5-mm HCN probe. Acquisition parameters were as follows: presat and noesypr water saturation of 0.99 s during prescan delay, preceded by a short delay of 0.01 s; noesypr mixing time of 100 ms with prescan delay of 1.00 s; CPMG saturation time of 80 ms with prescan delay of 1.00 s. All spectra were acquired with a 12 ppm sweep width, 4-s acquisition time, four dummy scans, and 32 transients. An additional set of four spectra for each sample was acquired with the noesypr pulse sequence using a sweep width of 20 ppm and acquisition times of 2, 3, 4, and 5 s, for a total of eight spectra per sample. All spectra were zero filled to 128k data points, Fourier transformed with a 0.5-Hz line broadening applied, and manually phased and baseline corrected using VNMR software.
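The first two processing steps named above (0.5-Hz exponential line broadening and zero filling to 128k points) can be sketched as follows. This is a minimal illustration, not the VNMR implementation: the function name, the nominal 600 MHz frequency used to convert the 12 ppm sweep width to hertz, and the synthetic free induction decay are assumptions, and the Fourier transform, phasing, and baseline correction are not shown.

```python
import cmath
import math

def apodize_and_zero_fill(fid, dwell_s, lb_hz=0.5, n_out=128 * 1024):
    """Apply exponential line broadening, then zero-fill to n_out points."""
    if len(fid) > n_out:
        raise ValueError("FID is longer than the requested output size")
    # Multiplying by exp(-pi * lb * t) adds lb Hz to each Lorentzian line width.
    weighted = [s * math.exp(-math.pi * lb_hz * k * dwell_s)
                for k, s in enumerate(fid)]
    return weighted + [0j] * (n_out - len(weighted))

# Assumed acquisition: 12 ppm sweep width at a nominal 600 MHz, 4 s acquisition.
sweep_hz = 12 * 600.0
dwell = 1.0 / sweep_hz
n_acq = int(sweep_hz * 4.0)

# Synthetic single-resonance FID (100 Hz offset, 0.5 s decay), illustrative only.
fid = [cmath.exp(2j * math.pi * 100.0 * k * dwell) * math.exp(-k * dwell / 0.5)
       for k in range(n_acq)]
processed = apodize_and_zero_fill(fid, dwell)
print(len(fid), "acquired points ->", len(processed), "points after zero filling")
```

Zero filling only interpolates the spectrum after Fourier transformation, whereas the exponential window trades resolution for signal-to-noise.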

T1 Measurements. A separate sample containing the metabolites listed in Table 1 at 1.0 mM was prepared to measure T1 values for nonoverlapping resonances. Nine spectra were acquired with T1 relaxation delays of 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, and 128 s and a prescan delay of 180 s. The area for individual resonances was used and fit to the equation $I = I_0(1 - e^{-T_R/T_1})$, where TR is the repetition time for one scan.

(14) Provencher, S. W. NMR Biomed. 2001, 14, 260-264.
(15) Bamforth, F. J.; Dorian, V.; Vallance, H.; Wishart, D. S. J. Inherited Metab. Dis. 1999, 22, 297-301.
(16) Evilia, R. F. Anal. Lett. 2001, 34, 2227-2236.
(17) Lucas, L. H.; Larive, C. K.; Wilkinson, P. S.; Huhn, S. J. Pharm. Biomed. Anal. 2005, 39, 156-163.
(18) McGrath, B. M.; Greenshaw, A. J.; McKay, R.; Weljie, A. M.; Slupsky, C. M.; Silverstone, P. H. Int. J. Neurosci. 2006, (In Press).
(19) Hoult, D. I. J. Magn. Reson. 1976, 21, 337-347.
(20) Kumar, A.; Ernst, R. R.; Wuthrich, K. Biochem. Biophys. Res. Commun. 1980, 95, 1-6.
(21) Smallcombe, S. H.; Patt, S. L.; Keifer, P. A. J. Magn. Reson. A 1995, 117, 295-303.
(22) Meiboom, S.; Gill, D. Rev. Sci. Instrum. 1958, 29, 688-691.
(23) Webb-Robertson, B. J.; Lowry, D. F.; Jarman, K. H.; Harbo, S. J.; Meng, Q. R.; Fuciarelli, A. F.; Pounds, J. G.; Lee, K. M. J. Pharm. Biomed. Anal. 2005, 39, 830-836.
(24) Jones, D. S.; Djokic, J.; Gorman, S. P. J. Biomed. Mater. Res., B: Appl. Biomater. 2006, 76, 1-7.

Targeted Profiling. Quantification was achieved using the 600-MHz library from Chenomx NMR Suite 3.1 (Chenomx Inc., Edmonton, Canada), which uses the concentration of a known reference signal (in this case DSS) to determine the concentration of individual compounds. The library is predicated on a database of individual metabolite spectra acquired using the noesypr sequence with a 4-s acquisition time and 1-s recycle delay between transients, and contained 215 metabolites. Each reference compound was fit to record peak centers and homonuclear J-coupling constants at pH 7.00, and this information was stored in a field-dependent database. Profiling of the analyte mixture spectra was accomplished using the Profiler module. Essentially, a Lorentzian peak shape model of each reference compound is generated from the database information and superimposed upon the actual spectrum. The linear combination of all modeled metabolites gives rise to the total spectral fit, which can be evaluated with a summation line. While automated numerical optimization algorithms would be possible for the data in this study, and are the long-term goal of such work, quantification was achieved by ensuring that all nonexchangeable peaks from the reference model and the analyte spectrum were at the same level in the y-dimension (concentration) by visual inspection, as this is currently the best method for analyzing biofluids. Total fitting time was ∼2 min/spectrum for 14 compounds (metabolites in Table 1, DSS, imidazole, and urea).
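The idea of modeling each compound as a sum of Lorentzian peaks and describing the mixture as a linear combination of those models can be sketched as below. Note that this is not the procedure used in the study, where fitting was done by visual inspection in the Profiler module; here an ordinary least-squares solve is swapped in for illustration. The two compounds, their peak lists, the line width, and all function names are hypothetical, and the templates are assumed to be already scaled against the reference signal so that the coefficients read as concentrations.

```python
import math

def lorentzian(x, center, width):
    """Unit-area Lorentzian with full width at half maximum `width` (ppm)."""
    hw = width / 2.0
    return (hw / math.pi) / ((x - center) ** 2 + hw ** 2)

def template(axis, peaks, width):
    """Model spectrum at unit concentration; peaks = [(center_ppm, n_protons)]."""
    return [sum(n * lorentzian(x, c, width) for c, n in peaks) for x in axis]

def fit_two(spectrum, t1, t2):
    """Least-squares (a, b) for spectrum ~ a*t1 + b*t2 via the 2x2 normal equations."""
    s11 = sum(u * u for u in t1)
    s22 = sum(v * v for v in t2)
    s12 = sum(u * v for u, v in zip(t1, t2))
    y1 = sum(y * u for y, u in zip(spectrum, t1))
    y2 = sum(y * v for y, v in zip(spectrum, t2))
    det = s11 * s22 - s12 * s12
    return (y1 * s22 - y2 * s12) / det, (y2 * s11 - y1 * s12) / det

axis = [3.90 + 0.0005 * i for i in range(201)]        # 3.90 to 4.00 ppm
t_x = template(axis, [(3.945, 2)], 0.002)              # hypothetical singlet
t_y = template(axis, [(3.954, 1), (3.964, 1)], 0.002)  # hypothetical doublet

# The "summation line": a linear combination of the individual models.
mixture = [0.25 * u + 0.60 * v for u, v in zip(t_x, t_y)]
est_x, est_y = fit_two(mixture, t_x, t_y)
print(f"estimated concentrations: {est_x:.3f}, {est_y:.3f}")
```

Because each compound contributes a fixed pattern of peaks, overlapping resonances can still be apportioned between compounds, which a single bin integral cannot do.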

Spectral Binning. Integral bins were created in such a manner as to ensure that each resonance was in the same bin throughout all spectra. Custom bin sizes were created for each resonance over all spectra, with overlapping resonances being considered together in one bin. The spectra were identical to those used for quantification using the targeted profiling approach. All bins were normalized to the area of the DSS methyl peak to provide a measure of the absolute contributions of particular resonances to the spectrum, as well as to the total spectral area for a comparison corrected for dilution effects.
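A small numerical sketch may help show why custom-sized bins were used. It integrates a synthetic spectrum over named ppm windows and normalizes to a DSS bin, then compares a custom bin with fixed 0.04 ppm bins when a peak shifts by 0.004 ppm between samples. The Gaussian line shape, peak positions, bin edges, and function names are assumptions chosen for illustration only.

```python
import math

def gaussian(x, center, sigma):
    """Unit-area Gaussian line shape (an illustrative stand-in for a real peak)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bin_areas(ppm, intensity, bins):
    """Integrate intensity over each window; bins maps name -> (low, high) in ppm."""
    step = ppm[1] - ppm[0]
    return {name: step * sum(y for x, y in zip(ppm, intensity) if low <= x < high)
            for name, (low, high) in bins.items()}

def normalize_to_reference(areas, ref="DSS"):
    """Express every bin area relative to the reference (DSS methyl) bin."""
    return {name: area / areas[ref] for name, area in areas.items() if name != ref}

ppm = [-0.05 + 0.0005 * i for i in range(8201)]  # -0.05 to 4.05 ppm

def sample(center):
    # One analyte peak whose position varies between samples, plus DSS at 0 ppm.
    return [2.0 * gaussian(x, center, 0.002) + gaussian(x, 0.0, 0.002) for x in ppm]

custom = {"DSS": (-0.03, 0.03), "analyte": (3.93, 3.98)}
fixed = {"DSS": (-0.03, 0.03), "bin_3.92": (3.92, 3.96), "bin_3.96": (3.96, 4.00)}

for center in (3.953, 3.957):
    spectrum = sample(center)
    print(center,
          normalize_to_reference(bin_areas(ppm, spectrum, custom)),
          normalize_to_reference(bin_areas(ppm, spectrum, fixed)))
```

In this toy case the custom bin reports the same normalized area for both peak positions, while the fixed bins move part of the area across the 3.96 ppm edge even though the concentration is unchanged.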

Principal Component Analysis. PCA was performed using standard procedures as implemented in Simca P+ 10.5 (Umetrics, Umeå, Sweden). Input variables consisted of either compound concentrations (targeted profiling) or integral bin areas (spectral binning). Data were preprocessed by mean-centering and either autoscaling or Pareto scaling prior to analysis. Autoscaling, also known as unit variance scaling, divides each of the “x” variables by its standard deviation, while Pareto scaling entails a division by the square root of the standard deviation. The quality of the models was judged by the goodness-of-fit parameter (R2), and the goodness of prediction parameter based on the fraction correctly predicted in a 1/7th cross-validation (Q2) (25), with each sample being used in the test set.
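The two scaling options described above can be written out directly. The sketch below is not the Simca P+ implementation; the function name is hypothetical and the use of the sample (n - 1) standard deviation is an assumption. The example data are the creatinine and isoleucine concentrations (µM) from Table 1, chosen because they differ in magnitude by nearly three orders.

```python
import math
import statistics

def scale_columns(rows, method="pareto"):
    """Mean-center each column, then divide by s (autoscale) or sqrt(s) (pareto)."""
    if method not in ("autoscale", "pareto"):
        raise ValueError("method must be 'autoscale' or 'pareto'")
    scaled_columns = []
    for column in zip(*rows):
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # sample standard deviation (assumed n - 1)
        divisor = sd if method == "autoscale" else math.sqrt(sd)
        scaled_columns.append([(value - mean) / divisor for value in column])
    return [list(row) for row in zip(*scaled_columns)]

creatinine = [7677, 5333, 7967, 6377, 5874, 7753]
isoleucine = [12, 10, 10, 12, 9, 13]
rows = list(zip(creatinine, isoleucine))

auto = scale_columns(rows, "autoscale")
pareto = scale_columns(rows, "pareto")
for name, data in (("autoscale", auto), ("pareto", pareto)):
    spreads = [statistics.stdev(column) for column in zip(*data)]
    print(name, [round(s, 3) for s in spreads])
```

Autoscaling gives every variable unit variance, so a low-concentration metabolite weighs as much as an abundant one; Pareto scaling leaves each variable with a spread equal to the square root of its original standard deviation, which reduces but does not remove the dominance of large signals.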

In the case of targeted profiling data, PCA models were built for quantitative analysis with the input variables (K) being the percent error in concentration determination of each metabolite over all samples (N). All variables used for quantitative analysis were Pareto scaled. Separate models were built to examine the effect of pulse sequence and relaxation times.

RESULTS

Spectral Binning and T1 Analysis. A total number of 45 custom bins were created from the data representing the 11 compounds listed in Table 1 and depicted in Figure 1. Each bin was custom-sized considering all spectra as a group, such that peak and peak clusters would remain in the same bin over all spectra. The goal of the binning procedure was to ensure that peak areas from each resonance were assigned to a specific bin (and not divided across two or more bins). This was done to account for the minor change in pH and ionic strength between samples. This method of binning is simply a manually optimized implementation of variable bucketing algorithms (7, 13) where local minima in spectra are recognized and buckets are adjusted to prevent peaks from being split. Typically, the buckets range from 0.02 to 0.06 ppm using this method. In our study, the bin sizes ranged from 0.016 (creatine peak at ∼3.91 ppm) to 0.248 ppm (exchanging hippurate peak at ∼8.40 ppm) (Table 2). As can be seen from Table 2, most buckets were less than 0.1 ppm in width.

(25) Multi- and Megavariate Data Analysis: Principles and Applications; Umetrics AB: Umeå, Sweden, 2001.

$$I = I_0\left(1 - e^{-T_R/T_1}\right)$$

Table 2. T1 Values as a Function of Compound and Individual Resonance

compound                    resonance(a)      T1 (s)  bin index  bin width (ppm)
taurine                     1a,2b             2.47    1          0.035
taurine/fucose              1b,2a                     2          0.095
creatinine/creatine         H6                2.73    3          0.053
creatinine                  H5                        4          0.018
hippurate/methylhistidine   H9                        6          0.050
hippurate                   H3,H5             2.87    7          0.058
                            H4                3.84    8          0.053
                            H6,H2             2.73    9          0.055
                            H8                        10         0.248
fucose                      aH6               0.89    11         0.029
fucose/isoleucine           bH6                       12         0.084
                            bH3               1.46    14         0.037
                            bH4,aH2,bH5,aH4   1.38    15         0.087
                            aH3                       16         0.038
fucose/unknown              aH5                       17         0.050
fucose/water                bH1                       18         0.036
                            aH1               1.98    19         0.032
citrate                     2a,4a             0.71    20         0.102
                            2b,4b             0.61    21         0.078
isoleucine                  H5                1.55    22         0.048
                            H6                1.07    23         0.058
                            4b                1.23    25         0.102
                            H3                1.80    26         0.091
                            H2                1.82    27         0.028
lactate                     H3                1.79    28         0.052
                            H2                3.99    29         0.060
1-methylhistidine           3b                0.88    30         0.057
                            3a                0.79    31         0.055
                            H6                1.92    32         0.038
                            H5                3.42    34         0.123
                            H4                3.76    35         0.156
glycine                     H2                2.74    36         0.037
glycolate                   H2                3.21    37
creatine                    H2                1.28    39         0.016
DSS                         H1                2.87    40         0.033
                            H3                1.62    41         0.058
                            H4                1.65    42         0.078
DSS/creatinine satellite    H5                        43         0.060
imidazole                                     7.46    44
                                              7.71    45

(a) Note that numbering corresponds to structures in Figure 1.


Interestingly, with very similar ionic strengths and pH values within 0.1 pH unit, many peak chemical shifts change dramatically. This could unfortunately lead to a great deal of error when doing spectral PCA type of analysis unless these changes are taken into account.

Figure 2A shows an example of a bin (bin 35) corresponding to the downfield aromatic proton from 1-methylhistidine. Shown is the superposition of all spectra (six different samples) containing this resonance. As may be seen, the bin width is relatively large due to large chemical shift changes of this resonance with the minor pH and ionic differences between the samples. Resonance overlap between and within compounds resulted in some bins consisting of more than one resonance and, in some cases, resulted in multiple resonances per bin. Figure 2B depicts an overlapping region (bin 6) containing 1-methylhistidine, glycolate, and hippurate resonances with assignment of peaks using spectral profiling software shown in Figure 2C. The peaks from glycolate (∼3.945 ppm) and hippurate (doublet at ∼3.959 ppm) do not shift significantly between samples; however, the resonance from 1-methylhistidine (doublet of doublets centered at ∼3.953 ppm) shows significant chemical shift changes.

For T1 analysis, only bins with unique resonances originating from a single compound were analyzed. Table 2 lists the T1 values obtained for resonances that could be separated into unique bins. Citrate clearly demonstrates the fastest relaxation in the group of compounds considered, with T1 values of 0.71 and 0.61 s for the two bins considered. The two bins containing imidazole had the longest T1 values of 7.46 and 7.71 s, respectively. The methyl group of DSS had a T1 value of 2.87 s. Bins containing peaks from DSS or imidazole were excluded from the principal component analysis procedure.

PCA Analysis. PCA is a classical multivariate statistical method that has been widely used in all types of data analysis to reduce the dimensions of a single set of measured variables and to determine which variables or set(s) of variables form coherent subsets that are relatively independent of one another. In other words, PCA transforms a number of correlated variables into a smaller number of uncorrelated variables called principal components. The first component accounts for the greatest variability in the data, and each succeeding component accounts for as much of the remaining variability as possible. Often the first two principal components are represented in a PCA plot, and these represent the two largest components separating the data. However, the data may be best separated using other components.
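As a concrete illustration of this description, PCA can be computed from the singular value decomposition of the mean-centered data matrix. The sketch below uses NumPy and synthetic data; the function name `pca`, the random data, and the choice of SVD are assumptions for illustration and do not reproduce the commercial software used in this work.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA by SVD of the mean-centered data matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable weights
    explained = s**2 / np.sum(s**2)                   # variance fractions
    return scores, loadings, explained[:n_components]


# Synthetic data: three correlated variables driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(48, 1))
X = latent @ np.array([[1.0, 0.8, -0.5]]) + 0.05 * rng.normal(size=(48, 3))

scores, loadings, explained = pca(X, n_components=2)
```

Because the three variables share one underlying factor, nearly all of the variance falls on the first component, and the score vectors of different components are uncorrelated by construction.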

PCA has been widely used in the field of metabolomics (for a review, see refs 2 and 3) for comparing raw NMR spectral data obtained from various biofluids and tissue extracts to determine whether NMR spectral differences exist between groups such as “healthy” and “diseased”. Using NMR spectral data for this type of analysis can be difficult, as 1H NMR chemical shifts are sensitive to pH and ionic strength, as shown above. Often bins are set (rather than using discrete points) since peaks move around, but as shown, these bins need to be dynamic in nature. Even the variable-sized bins approach, however, has its limitations

Figure 2. Representative spectral bins showing the superposition of all spectra (6 samples, 48 spectra) collected. (A) Downfield 1-methylhistidine aromatic proton (bin 35) illustrates the large variation in chemical shift with slight changes in pH and ionic strength between samples. In this case, the bin width was set to 0.156 ppm to account for all variations. (B) Bin 6 contains resonances from 1-methylhistidine, hippurate, and glycolate. Hippurate and glycolate show little chemical shift movement whereas the resonance from 1-methylhistidine changes. In this case, the bin width was set to be much smaller (0.05 ppm) since the 1-methylhistidine resonance in this bin does not move as much with slight variations in pH and ionic strength as the resonance in bin 35. (C) Targeted profiling illustrates how overlapping resonances may be deconvoluted. The spectral line is shown in black, 1-methylhistidine in purple, hippurate in blue, glycolate in green, and summation of all three fit compounds in red. Note how the summation (red) line and black line superimpose well.


as the complexity of the spectrum increases. Furthermore, the use of bins complicates analysis, as more than one bin can represent the same compound. In our case, we have broken up the spectra into 45 bins. Analyzing compounds with their corresponding concentrations directly (targeted profiling) greatly simplifies the analysis and reduces our 45 bins to 13 compounds, making the analysis easier to interpret. We are interested in a direct comparison of these two techniques of data reduction by comparing the use of different data acquisition schemes (pulse sequences and acquisition times).

Figure 3 shows PCA plots using our variable spectral binning and targeted profiling approaches, separating samples by concentration, pulse sequence, and spectral acquisition time used. The number of components (A) and explained variance relating to the goodness of fit (R2) for each model are provided in Table 3. Each color in Figure 3 represents a different sample with varying concentrations of metabolites (as described in the figure legend), and each point represents a different acquisition condition (either pulse sequence or acquisition time). In Figure 3A, the spectral data have been normalized to DSS, whereas in Figure 3B, the spectral data have been normalized to the total area under the spectrum. Figure 3C illustrates the PCA plot using direct compound profiling. As may be observed, irrespective of the acquisition time and method of analysis, each set of data collected with the same pulse sequence tends to cluster together when using the targeted profiling method. Interestingly, the binned data

Figure 3. PCA, using Pareto scaling, of the NMR spectral data arising from six different samples, with metabolites at the concentrations indicated in Table 1. 1-D NMR spectra were acquired using different water suppression schemes present in four pulse sequences: WET, prnoesy, presat, and CPMG. Additionally, four different acquisition times (2, 3, 4, and 5 s) with a constant 1-s relaxation delay were collected using the prnoesy pulse sequence. Each sample is represented by a unique color (samples: 1, black; 2, red; 3, blue; 4, green; 5, orange; 6, magenta), and selected pulse sequences have been indicated (2, CPMG; *, WET; 4, PRESAT; 9, PRNOESY; ×, PRNOESY-2s; 0, PRNOESY-3s; b, PRNOESY-4s; O, PRNOESY-5s). The ellipses in all PCA plots represent the 95% confidence limit (Hotelling T2). (A) Spectral binning data normalized to DSS. (B) Spectral binning data normalized to total area (Table 3). (C) Targeted profiling data.


normalized to DSS are more susceptible to changes in acquisition time than the binned data normalized to total area.
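The two normalization schemes compared in Figure 3A and 3B can be written compactly. The sketch below is an assumption-laden illustration: the function names and the bin areas are invented, and the text does not state whether the DSS bins themselves were included in the total-area sum.

```python
def normalize_to_reference(bin_areas, reference):
    """Divide every bin area by the area of one reference bin (e.g., DSS)."""
    ref_area = bin_areas[reference]
    return {name: area / ref_area for name, area in bin_areas.items()}


def normalize_to_total_area(bin_areas):
    """Divide every bin area by the summed area of all supplied bins."""
    total = sum(bin_areas.values())
    return {name: area / total for name, area in bin_areas.items()}


# Hypothetical bin areas in arbitrary units.
bin_areas = {"DSS": 4.0, "citrate": 10.0, "lactate": 6.0}

by_dss = normalize_to_reference(bin_areas, "DSS")
by_total = normalize_to_total_area(bin_areas)
```

Reference normalization ties every value to a single peak, so any acquisition-dependent change in that one peak propagates to all bins, whereas total-area normalization spreads such effects over the whole spectrum. This is one plausible reading of why the DSS-normalized data respond more strongly to acquisition time.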

A comparison of the binned data normalized to the total area with the targeted profiling data was also made. In addition to the goodness-of-fit metric (R2), the fraction of data correctly predicted (Q2) using 1/7 cross-validation (Umetrics AB 2001) was determined. A value of 1 for both of these variables implies complete data correlation and 100% prediction for a given data set. Typically, Q2 is lower than R2; however, a significant difference between the two parameters is undesirable, and a negative Q2 implies that the model is not predictive. Figure 4A shows a plot of the contribution of each bin to the overall model, and Figure 4B shows each compound for the targeted profiling model. The R2 and Q2 values for all variables in the spectral binning model are relatively good except for bins 22-26, which correspond to isoleucine. In other words, all bins fit the model relatively well and are reasonably well predicted using cross-validation except

Table 3. Table of Models

                                               Pareto scaling          autoscaling
model                                          A    R2     Q2          A    R2     Q2
binned, normalized to DSS                      12   0.996  0.881       5    0.776  0.517
binned, normalized to spectral area            12   0.995  0.839       6    0.841  0.492
targeted profiling                             5    0.993  0.853       6    0.962  0.768
targeted profiling errors (pulse sequence)     2    0.606  -0.108
targeted profiling errors (acquisition time)   2    0.825  0.607

Figure 4. Comparison of contribution plots between (A) spectral binning (over 12 components separating the data in Figure 3B, Table 3) and (B) targeted profiling (over 5 components separating the data in Figure 3C). Bin number (for spectral binning) or metabolite ID (for targeted profiling, Table 3) are plotted as a function of R2 (gray) and Q2 (black). R2 represents the goodness of fit, and Q2 represents the fraction of data correctly predicted in a 1/7 cross-validation and provides an estimate of the predictive ability of the model. Negative Q2 indicates a nonpredictive model for bins 22-26.


for the bins representing this compound. In addition, bins 10, 12, 14, 29, and 36 show a higher than desirable gap between R2 and Q2, prompting further investigation. Bin 10 derives from a slow exchanging peak from hippurate near 8.4 ppm, bin 12 from an overlap of fucose and isoleucine near 1.2 ppm, bin 14 from a fucose resonance at 3.61 ppm, bin 29 from the lactate peak near 4.05 ppm, and bin 36 from the glycine resonance near 3.5 ppm. These latter three bins suggest some residual artifacts from solvent suppression which the Pareto scaling was not able to completely eliminate. This phenomenon has been observed by other researchers with respect to differences in water suppression methods.10
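For readers unfamiliar with these statistics, the definitions commonly used in chemometrics are given below. They are stated here as an assumption, since the text does not spell them out, and the exact cross-validation scheme of the software used may differ in detail. Here x_{ik} is the scaled, centered value of variable k in observation i, \hat{x}_{ik} is its value reconstructed by the model fitted to all data, and \hat{x}_{ik}^{(-i)} is its value predicted by a model built with the group containing observation i left out (one-seventh of the data at a time).

```latex
\mathrm{SS} = \sum_{i,k} x_{ik}^{2}, \qquad
\mathrm{RSS} = \sum_{i,k} \bigl(x_{ik} - \hat{x}_{ik}\bigr)^{2}, \qquad
\mathrm{PRESS} = \sum_{i,k} \bigl(x_{ik} - \hat{x}_{ik}^{(-i)}\bigr)^{2}

R^{2} = 1 - \frac{\mathrm{RSS}}{\mathrm{SS}}, \qquad
Q^{2} = 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}}
```

Under these definitions, Q2 becomes negative whenever PRESS exceeds SS, that is, when the cross-validated predictions are worse than predicting the mean, which is consistent with the interpretation of negative Q2 as a nonpredictive model.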

It is worth noting that compound-based profiling does not produce the same result for isoleucine (Figure 4B), where it contributes positively to Q2. In addition, none of the other compounds appears affected by solvent suppression. This result implies that it is not the integral values from the compound itself causing the deviation in the binning model, but some other artifact. Further analysis of the bins comprising peaks related to isoleucine demonstrates that the combination of low concentration and high J-multiplicity of the aliphatic resonances produces signal very close to the baseline. It is the baseline that may be the source of error in this case. In the targeted profiling model, since all of the regions in isoleucine combine to form a single variable, these variations are eliminated.

Effect of Scaling Parameters on PCA Analysis. A number of different scaling parameters are available for preprocessing data prior to PCA analysis. The choice of scaling parameter is important as it defines the relationship between variables. Autoscaling forces all x values to have equal weight, irrespective of the starting intensity, and hence is prone to distortion from poor baseline and other spectral artifacts.10 On the other hand, Pareto scaling gives a greater weight to variables that start with a larger value; for example, intense peaks in an NMR spectrum will have greater weight than weak ones. Figure 5A shows the pattern that emerges when binned data normalized to the DSS methyl peak are subjected to PCA using autoscaling. There is a separation of the six samples through PC1. However, in PC2, data separation is caused by a factor other than sample difference. Inspection of the outliers indicates that acquisition with CPMG, and to a lesser degree WET pulse sequences, was responsible for the PC2 separation. The same analysis with Pareto scaling instead of autoscaling (Figure 3A) demonstrates better clustering, although the effect of the CPMG pulse sequence is still evident. While the separation is improved in PC2 with Pareto scaling, the dispersion through PC1 and PC2 still suggests that a combination of intrinsic sample differences and intrasample acquisition condition variations is responsible for diffuse clustering.
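The difference between the two scaling schemes is easiest to see in code. The sketch below uses the standard definitions (autoscaling divides each mean-centered variable by its standard deviation; Pareto scaling divides by the square root of the standard deviation). The function names and the two-variable toy matrix are assumptions for illustration.

```python
import numpy as np


def autoscale(X):
    """Mean-center each column and divide by its standard deviation."""
    Xc = X - X.mean(axis=0)
    return Xc / X.std(axis=0, ddof=1)


def pareto_scale(X):
    """Mean-center each column and divide by the square root of its SD."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt(X.std(axis=0, ddof=1))


# Toy data: column 0 is an intense peak, column 1 a weak peak with the
# same relative variation (exactly 100 times smaller).
X = np.array([[100.0, 1.0],
              [120.0, 1.2],
              [80.0, 0.8],
              [110.0, 1.1]])

auto = autoscale(X)
par = pareto_scale(X)
```

After autoscaling the two columns are indistinguishable, so a weak peak (or a baseline artifact of similar size) carries as much weight as an intense one. After Pareto scaling the intense peak retains a larger spread, although the difference is compressed from a factor of 100 to a factor of 10.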

The bins resulting from the integration procedure were further analyzed, after normalization to the entire total spectral area, using both autoscaling (Figure 5B) and Pareto scaling (Figure 3B). In this instance, autoscaling still provides components with mixed inter- and intrasample variation. The most obvious intersample separation of all binned data, reducing the differences to just acquisition parameters in PC2, is with the total area normalization using Pareto scaling. Figure 6A,B illustrates loading plots for the Pareto scaled and autoscaled data normalized to total spectral area (Figures 3B and 5B, respectively). The compounds causing variation in the PCA plots are highlighted in Figure 6A, although in some cases (e.g., bins 2, 3, 6, 30, and 31), there are overlapping resonances possibly confounding the analysis. There is a substantial change in the loadings plot due to scaling. In the case of the autoscaling, bins from fucose, lactate, creatine, and isoleucine between 3.65 and 4.21 ppm are distinct from the remaining bins (Figure 6B), suggesting that the acquisition parameters (specifically the CPMG pulse sequences and, to a lesser extent, WET) cause a significant difference in this region (Figure 7). Pareto scaling (Figure 6A) reduces this effect and allows resonances from individual compounds to be discriminated with relative ease as they cluster into moderately defined regions.

Compound-based targeted profiling results in transforming integral regions from the various resonances of a single compound into a single parameter, its concentration (Figure 2C). One would therefore expect PCA results from targeted profiling to be similar to those from spectral binning. This result is shown in Figures 5C (autoscaling) and 3C (Pareto scaling). As with the binned data, Pareto scaling of targeted profiling data results in tight clustering. Interestingly, the effect of scaling is marginal, only causing a slight change in both PC1 and PC2. In both cases, the effect of acquisition parameters is clearly minor relative to the intersample variation. This result is reflected in the loadings plots (Figure 6C and D, for the Pareto scaled and autoscaled models, respectively). Since the initial input to PCA analysis was compound concentrations, the loadings plot is relatively easy to interpret. A change in scaling causes the variation in both components to be interpreted slightly differently. However, key features such as the difference between citrate and hippurate in PC2 are preserved. This is in contrast to spectral binning (Figure 6B) where autoscaling indicated larger differences in PC2 due to pulse sequence selection rather than variation between citrate and hippurate, which appears to be obscured by artifacts.
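The idea of collapsing several overlapping resonances into one amplitude per compound can be illustrated with a linear least-squares fit of fixed compound signatures. This is a deliberately simplified sketch and not the profiling software used here: the Lorentzian line shape, the line width, the individual line positions (chosen only to be consistent with the approximate shifts quoted for bin 6), the amplitudes, and all names are assumptions, and peak positions are held fixed even though real profiling must let them move with pH and ionic strength.

```python
import numpy as np


def lorentzian(ppm, center, width):
    """Unit-height Lorentzian line with full width at half-maximum `width`."""
    hw = width / 2.0
    return hw**2 / ((ppm - center) ** 2 + hw**2)


def signature(ppm, centers, width=0.002):
    """Compound signature: sum of equal-height lines at the given positions."""
    return sum(lorentzian(ppm, c, width) for c in centers)


ppm = np.linspace(3.92, 3.98, 601)

# Hypothetical line positions (ppm) for the three compounds in bin 6.
lines = {
    "glycolate": [3.945],                                # singlet
    "hippurate": [3.955, 3.963],                         # doublet
    "1-methylhistidine": [3.946, 3.950, 3.956, 3.960],   # doublet of doublets
}

# One column per compound; the fit returns one amplitude per compound.
A = np.column_stack([signature(ppm, c) for c in lines.values()])

true_amplitudes = np.array([2.0, 1.0, 0.5])              # invented values
rng = np.random.default_rng(1)
spectrum = A @ true_amplitudes + 0.005 * rng.normal(size=ppm.size)

fitted, *_ = np.linalg.lstsq(A, spectrum, rcond=None)
residual = spectrum - A @ fitted
```

Each fitted amplitude plays the role of a concentration: every line of a compound contributes to the same parameter, so noise or baseline error under any single line is averaged over the whole signature. An unconstrained fit can return negative amplitudes for absent compounds, so a practical implementation would likely add a nonnegativity constraint.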

Effect of Acquisition Parameters. One distinct advantage of using a targeted profiling approach is that the data of interest are quantitative. As a result, we were interested in assessing the accuracy of quantitation and the underlying factors from this study that might influence absolute concentration determination. The effect of the various acquisition parameters on the analysis was evaluated using the difference between the measured concentrations from the targeted profiling approach and the actual known concentrations of metabolites. This method was chosen since the difference between the measured and actual concentrations represents a vector in multivariate space that provides valuable information about how the particular condition (acquisition time, pulse sequence) will change the measured concentration. In essence, this is a pattern recognition analysis on the error associated with concentration determination; larger vectors will be associated with increased error. Because this is a measure of error in quantitation, all metabolites, including imidazole, were included in the analysis. Typical errors for the prnoesy and presat sequences were <15%, and WET and CPMG <20%, with a few notable exceptions as described below. Figure 8A shows the scores plot for the PCA analysis of the difference between real and measured metabolite concentrations when acquired with different pulse sequences. Figure 8B illustrates the corresponding loadings plot. Clearly, there is a defined difference based on pulse sequence in PC1. A distortion is evident from spectra acquired using the WET sequence, as all samples cluster in the top right


corner, and some minor distortions from CPMG. The presat and prnoesy samples cluster in the middle, which is not entirely surprising given that they both use water presaturation for suppression of the water signal. The loadings plot and subsequent error calculations indicate that these differences are caused by an underestimation of the concentrations of glycine (-33%) and 1-methylhistidine (-17%), as well as overestimation of citrate (+18%), imidazole (+15%), fucose (+10%), and isoleucine (+18%) with WET. Presumably the absence of solvent saturation affects the excitation profile of these compounds in a reproducible manner, although more work is required to establish the underlying cause.
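The error vectors used in this analysis can be sketched as follows. The concentrations below are invented (in arbitrary µM) and were chosen only so that the WET percentages mimic three of the values quoted above; the "presat" numbers and the function names are purely hypothetical.

```python
import math


def percent_errors(measured, actual):
    """Signed percent error of each measured concentration."""
    return {name: 100.0 * (measured[name] - actual[name]) / actual[name]
            for name in actual}


def error_magnitude(errors):
    """Euclidean length of the error vector (larger means more error)."""
    return math.sqrt(sum(e**2 for e in errors.values()))


# Hypothetical known and measured concentrations.
actual = {"glycine": 100.0, "citrate": 1000.0, "1-methylhistidine": 200.0}
measured_wet = {"glycine": 67.0, "citrate": 1180.0, "1-methylhistidine": 166.0}
measured_presat = {"glycine": 97.0, "citrate": 1020.0, "1-methylhistidine": 196.0}

errors_wet = percent_errors(measured_wet, actual)
errors_presat = percent_errors(measured_presat, actual)
```

One error vector per spectrum, stacked into a matrix with compounds as columns, is the kind of input on which the PCA of Figure 8 operates; the direction of a vector shows which compounds are over- or underestimated, and its length shows by how much.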

The effect of acquisition time is shown in Figure 8C. As one would expect, there is a clear progression from shorter to longer acquisition times in PC1, with the shortest (2 s) and longest (5 s) times showing the largest deviations. Since the library used for quantification was based on an acquisition time of ∼4 s, we would expect the extent of error to be the least near this value and, as observed, larger at higher and lower values. In other words, with acquisition times shorter or longer than 4 s, the error will be directly related to the magnitude of T1. For compounds with short T1’s, quantitation becomes inaccurate only at very short acquisition times, whereas compounds with long T1’s will show errors as the acquisition time is altered from 4 s as shown in eq 1.
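The dependence of this error on T1 can be made concrete with a simple saturation model. The sketch below is an assumption and may not match eq 1 of the paper: it supposes 90° pulses, so that the signal available per scan is proportional to 1 − exp(−TR/T1), takes the recycle time TR as the acquisition time plus the 1-s delay, and ignores any effect of normalizing to the DSS reference. The T1 values are those listed in Table 2; the function names are illustrative.

```python
import math


def steady_state_fraction(recycle_time, t1):
    """Fraction of equilibrium magnetization available per scan,
    assuming 90-degree pulses repeated every `recycle_time` seconds."""
    return 1.0 - math.exp(-recycle_time / t1)


def relative_bias(t1, recycle_time, library_recycle_time=5.0):
    """Relative intensity change versus the library condition (0.0 = none)."""
    return (steady_state_fraction(recycle_time, t1)
            / steady_state_fraction(library_recycle_time, t1)) - 1.0


# T1 values (s) from Table 2; recycle time = acquisition time + 1-s delay.
t1_values = {"citrate": 0.71, "imidazole": 7.71}
recycle_times = [3.0, 4.0, 5.0, 6.0]

bias = {name: [relative_bias(t1, tr) for tr in recycle_times]
        for name, t1 in t1_values.items()}

for name, values in bias.items():
    print(name, [f"{100 * v:+.1f}%" for v in values])
```

Under these assumptions the short-T1 citrate resonance is almost fully relaxed at every recycle time and shows little bias, whereas the long-T1 imidazole resonance is underestimated at shorter recycle times and overestimated at longer ones, in line with the ordering of compounds by T1 described for the loadings plot.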

The loadings plot (Figure 8D) implies two interesting features. First, there is a distinct vector created primarily in PC1 by the compounds based on T1 values, from imidazole with the longest T1 (Table 2) to citrate with the shortest. The rest of the compounds fall roughly in order of the average T1 values over all resonances. In addition, PC2 is heavily influenced by the error in the measured

Figure 5. PCA, using autoscaling, of the NMR spectral data arising from six different samples. Samples and conditions are as described in Figure 3. (A) Spectral binning data normalized to DSS. (B) Spectral binning data normalized to total area (Table 3). (C) Targeted profiling data. Autoscaling enhances artifacts arising from poor baseline in the CPMG spectra regardless of the type of normalization used (DSS or total area). The targeted profiling data remain well clustered and demonstrate little difference.


values. Creatine and creatinine exhibit relatively high errors due to conversion of creatinine to creatine, which has been well documented26 for both urine and buffered systems. PC2 is characterized by overestimated creatine and underestimated creatinine, as is expected.

DISCUSSION

We have demonstrated here that pattern recognition techniques such as PCA are more stable using data derived from a targeted profiling approach as opposed to a conventional spectral binning approach. The reduction of spectral information to singular compound concentrations via a pseudospectral model preserves the inherent chemical information, while abstracting away interfering factors. For example, PCA has been shown to be sensitive to spectral noise,27 as well as instrumental imperfections28 that will be picked up in binning type analysis. Spectral binning approaches are also inherently prone to artifacts from water suppression and baseline correction due to the indiscriminate nature of the binning process. Previous work using complex mouse urine spectra has shown that the region near the water peak (∼3.3-4.3 ppm) is particularly sensitive to binomial-type sequences.10 As demonstrated here, the aliphatic region between ∼3.5 and 4.2 ppm is relatively sensitive to the CPMG pulse sequence and to a lesser extent the WET sequence as opposed to noesy-presat and presat. Presumably, the choice of field used for the CPMG spin-lock used for transverse relaxation time (T2) filtering will also have an impact

(26) Fuller, N. J.; Elia, M. Clin. Chim. Acta 1988, 175, 199-210.

(27) Halouska, S.; Powers, R. J. Magn. Reson. 2006, 178, 88-95.

(28) Siuda, R.; Balcerowska, G.; Aberdam, D. Chemom. Intell. Lab. Syst. 1998, 40, 193-201.

Figure 6. Comparison of loadings plots for PCA obtained using the Pareto scaling and autoscaling routines. (A) Pareto-scaled binned data normalized to total area (Figure 3B). Each bin is marked. Bins corresponding to specific metabolites for which all integral bins can be clearly distinguished are circled and annotated. (B) Autoscaled binned data normalized to total area (Figure 5B). Each bin is marked. The broken ellipse indicates the separation of bins 14-17, 27, 29, and 39 corresponding to many different compounds described in the text and Figure 7. (C) Pareto-scaled spectral profiling data. (D) Autoscaled spectral profiling data. All compounds are marked.


on the nature and extent of the distortion. The CPMG type of NMR experiment is of importance as many biofluids, such as serum and cerebrospinal fluid, contain large MW components such as protein, which can result in serious baseline distortions, hampering multivariate analysis. Recently, the CPMG sequence has been hyphenated with WET (WET-CPMG) in an effort to reduce distortions specifically for metabolomics work,29 and a pulsed-field gradient sequence has also been introduced with the same intent.17

Interpretation of the data acquired with variable acquisition times and digital resolution was quite straightforward in these experiments. The biological variation evident from the compounds profiled here was much greater than the acquisition condition differences, as these spectra clustered quite tightly in PCA scores plots using either spectral profiling or binning. This is noteworthy as it suggests that the number of points acquired and total relaxation delay (between 3 and 6 s) have little influence on pattern recognition. This is somewhat surprising given that the database of compound spectra used for spectral profiling was acquired entirely with a total recycle time of 5 s between scans, and some quantitative effects are evident from the error analysis (Figure 8D). This result has positive practical implications for spectra that have already been acquired. For example, the most common experimental protocol from the COMET consortium30 uses the noesy-presat sequence with a sweep width of 20 ppm and a 1-s relaxation delay. The results here suggest that spectra acquired with this protocol or similar ones can be reasonably analyzed using targeted profiling pattern recognition methods using a library that is acquired under similar, but not necessarily the same, relaxation conditions.

Spectral Binning versus Targeted Profiling. One of the aims of the current study was to ensure that all binning was done under “perfect” conditions. In other words, each bin contains resonances from the same compounds through all spectra, eliminating analysis artifacts due to resonances split over multiple bins. In addition, tight controls on the pH and ionic strength of the samples allowed for isolation of the acquisition variables responsible for separation; in a scenario where these conditions could not be met, significant variations in the spectral signatures of compounds would be observed, thereby confounding virtually any attempt at reaching a “correct” spectral binning configuration.

The data presented here are relatively simple in comparison to analysis of actual urine, and both the binning and targeted profiling approaches face challenges due to the inherent complexity of real urine. For example, when comparing many NMR spectra in a study, it is essentially impossible to ensure that all chemical resonances are contained within the same bins, leading to difficulties with this type of analysis. However, targeted profiling can be done regardless of the pH or ionic strength of the sample. At present, targeted profiling is time-consuming and labor-intensive due to the large number of compounds present in a urine sample since

(29) Van, Q. N.; Chmurny, G. N.; Veenstra, T. D. Biochem. Biophys. Res. Commun. 2003, 301, 952-959.

(30) Lindon, J. C.; Nicholson, J. K.; Holmes, E.; Antti, H.; Bollard, M. E.; Keun, H.; Beckonert, O.; Ebbels, T. M.; Reily, M. D.; Robertson, D.; Stevens, G. J.; Luke, P.; Breau, A. P.; Cantor, G. H.; Bible, R. H.; Niederhauser, U.; Senn, H.; Schlotterbeck, G.; Sidelmann, U. G.; Laursen, S. M.; Tymiak, A.; Car, B. D.; Lehman-McKeeman, L.; Colet, J. M.; Loukaci, A.; Thomas, C. Toxicol. Appl. Pharmacol. 2003, 187, 137-146.

Figure 7. Comparison of the NMR spectral region between 3.4 and 4.2 ppm for a representative sample acquired with the four different pulse sequences showing baseline deviations. Note that, upfield of 3.65 ppm, the baseline appears flat. Downfield of 3.65 ppm reveals differences in the baseline. In particular, the spectrum acquired with the CPMG sequence, and to a lesser degree the spectrum acquired with WET, shows a significant negative trend in the baseline. These deviations are attributable largely to the water resonance at ∼4.7 ppm. These deviations in the baseline are pronounced when autoscaling is used in spectral binning approaches as shown in Figures 5 and 6.


the technique is not yet automated. For compounds that are heavily overlapped with no obvious recognizable spectral pattern among the background, targeted profiling requires sample spiking or the use of other analytical techniques to ensure that assignment is correct. However, there are many metabolites (on the order of 100 in the present Chenomx library) in urine that have unique NMR spectral signatures and are easily identifiable by eye, and it is for these metabolites that this approach shows great promise as an analytical tool to monitor compound concentrations. We appreciate that low-concentration metabolites and metabolites that are highly overlapped will suffer quantitation errors at the moment, but as baseline correction procedures become more robust and more compounds are added to libraries used for this purpose, we expect that quantitation errors will decrease. When approaching the limit of detection, we will always expect higher errors, as anticipated for any quantitative instrument at its limit of detection.

In the application of targeted profiling demonstrated here, PCA analysis was conducted on a complete matrix of compound concentrations determined for all samples. There are numerous permutations to this methodology in which targeted profiling may be combined with binning approaches. For example, when a large sample size is available for 1H NMR metabolomics, covariance analysis of spectral regions has been shown to be a promising method for resolving related spectral regions.31 In principle, extracted rows or columns from such a covariance matrix would

Figure 8. PCA of quantitation error using targeted profiling. (A) Score and (B) loadings plot of the error based upon pulse sequence (CPMG, blue; PRESAT, cyan; PRNOESY, green; WET, red). The WET pulse sequence shows the largest difference when compared to the other pulse sequences, caused mainly by differences in imidazole, glycine, and 1-methylhistidine. (C) Score and (D) loadings plot of the error using different acquisition times while keeping a constant 1-s prescan delay. The arrow in (D) denotes a relationship between the average T1 values of individual compounds, with citrate having the shortest (∼0.6-0.7 s) and imidazole the longest (7.5-7.7 s) (see Table 2).


provide suitable simplified spectral traces from which specific compounds could be automatically identified using targeted profiling and other biochemically related compounds identified for further analysis. Such an approach has the advantage of greatly simplifying the analysis from a complex mixture such as urine. Another intriguing potential application combines targeted profiling analysis with nonnegative criteria, such as described in the molecular factor analysis technique.8 In this case, a collection of mixture spectra gives rise to positive ingredient loadings spectra that would be ideal candidates for targeted analysis.

One area in which targeted profiling is of great value is the characterization of compounds at low concentration that are overlapped. For example, in this study some of the isoleucine resonances contribute negatively to Q2 using the integral bins, but positively using targeted profiling. This may be because the low concentration of isoleucine (∼9-13 µM), combined with high multiplicity of these resonances, caused the baseline to influence the integrals obtained. Combining the information from all integral regions using the targeted profiling approach allows for an informed means of data reduction, and as a result, isoleucine does indeed contribute positively to the Q2. The overlap of fucose and isoleucine in bin 12 also caused undesirable artifacts in the spectral binning model, as well as the slow exchange NH peak of hippurate, which is presumably more affected by the choice of solvent suppression and pH than other nonexchanging protons. This is a particularly notable point when one considers that often autoscaling is avoided as a preprocessing algorithm due to distortion of spectral information. As a result, spectral features at relatively high intensity dominate the analysis and may overwhelm more important changes in low-concentration metabolites. The combination of targeted profiling and autoscaling will ensure that the effect of low-concentration metabolites is on a par with high-concentration ones during feature selection. Further investigations are ongoing into the relative merits of targeted profiling in urine and other complex biofluids.

Ultimately, the choice of technique for data reduction will depend on a number of factors, including the nature and number of samples, the acquisition conditions used, and the desired information. Spectral binning techniques have much to offer in terms of their ease of use, rapid screening ability, and complete spectral coverage. However, when data acquired under different conditions must be compared, when quantitative information about metabolites is desired, or when ease of interpretation is important, targeted profiling clearly offers significant advantages.

ACKNOWLEDGMENT

We are indebted to Dr. Thomas J. Marrie for access to NMR data collected from the urine of healthy individuals and Mr. David Chang for valuable advice. We thank the Canadian National High Field NMR Centre (NANUC) for their assistance and use of the facilities, especially Dr. Ryan McKay. Operation of NANUC is funded by the Canadian Institutes of Health Research, the Natural Science and Engineering Research Council of Canada, and the University of Alberta. A.M.W. and P.M. are the recipients of Alberta Ingenuity Industrial Associateships.

Received for review January 31, 2006. Accepted April 17, 2006.

AC060209G

(31) Cloarec, O.; Dumas, M. E.; Craig, A.; Barton, R. H.; Trygg, J.; Hudson, J.; Blancher, C.; Gauguier, D.; Lindon, J. C.; Holmes, E.; Nicholson, J. Anal. Chem. 2005, 77, 1282-1289.
