


Review

    Chemometrics tools used in analytical chemistry: An overview

Naveen Kumar, Ankit Bansal, G.S. Sarma, Ravindra K. Rawal*

    Department of Pharmaceutical Analysis, Indo-Soviet College of Pharmacy (ISFCP), Moga 142001, India

Article info

Article history: Received 7 November 2013; Received in revised form 2 February 2014; Accepted 3 February 2014; Available online 14 February 2014

Keywords: Singular value decomposition; Evolving factor analysis; Chemometrics; Rank annihilation; Multivariate calibration; Principal component

Abstract

This article presents the principal tools of chemometrics used to evaluate the data generated by hyphenated analytical techniques, together with their applications from the field's inception to the present. The work is divided into sections covering multivariate regression methods and multivariate resolution methods; the final section deals with the applicability of chemometric tools in analytical chemistry. The main objective of this article is to review the chemometric methods used in analytical chemistry (qualitative/quantitative) to determine elution sequences, classify data sets, assess peak purity and estimate the number of chemical components. The reviewed methods can further be used for treating n-way data obtained by hyphenating LC with multi-channel detectors. We provide a detailed view of the most important methods, with their algorithms, so that researchers not very familiar with chemometrics can understand and employ them.

© 2014 Elsevier B.V. All rights reserved.

    Contents

1. Introduction
2. Origin and development of chemometrics
3. Chemometrics (multivariate data analysis) tools
   3.1. Multivariate regression method
      3.1.1. Multiple linear regression
      3.1.2. Multivariate calibration methods
   3.2. Multivariate decomposition methods
      3.2.1. Principal component analysis (PCA)
      3.2.2. Parallel factor analysis (PARAFAC)
   3.3. Hierarchical cluster analysis (HCA)
   3.4. Pattern recognition methods
      3.4.1. Soft independent modeling of class analogy (SIMCA)
      3.4.2. Linear discriminant analysis (LDA)
   3.5. Chemometric resolution methods (CRMs)
      3.5.1. Rank annihilation factor analysis (RAFA)
      3.5.2. Generalized rank annihilation method (GRAM)
      3.5.3. Evolving factor analysis (EFA)
      3.5.4. Heuristic evolving latent projections (HELP)
      3.5.5. Subwindow factor analysis (SFA)
      3.5.6. Window factor analysis (WFA)
      3.5.7. Alternating least squares (ALS)
      3.5.8. Orthogonal projection analysis (OPA)
4. Application of chemometric tools in analytical chemistry




http://dx.doi.org/10.1016/j.talanta.2014.02.003 (ISSN 0039-9140)

* Corresponding author. Tel.: +91 9878 882520; fax: +91 1636 239515. E-mail address: [email protected] (R.K. Rawal).

Talanta 123 (2014) 186–199

5. Conclusion
Acknowledgments
References

    1. Introduction

An analytical procedure is a powerful tool that can increase the efficiency of audits: it is a relatively low-cost procedure with considerable value in identifying errors or irregularities and in guiding audits. Analytical results are one of the keys to guaranteeing the quality of products, enabling us to verify the stability and purity of pharmaceutical and food products. Hence, analytical procedures must comply with audits, provided the consistency of their results is assured [1].

In the present era analysts have an increasing interest in the chromatographic and spectroscopic analysis of complex mixtures such as drugs, herbal medicines, food and blood plasma samples. To achieve the best separation quality for the ensuing qualitative and/or quantitative analysis, chromatographic and spectroscopic conditions should be optimized according to the particular analytical objectives. The significant factors that affect the outcome of the analysis, such as solvent, mobile phase, pH and column temperature, should therefore be defined. In addition, the experimental procedure should approach the chemical reality of the sample. The next step involves determining the resolution or separation quality, measured in terms of a response or optimization function. This is followed by the development of a mathematical or statistical model that describes the relationship between the analytical (spectroscopic/chromatographic) parameters and the responses of the designed experiments, and finally by the prediction of optimal separation conditions. The procedure may be repeated a few times, with the predicted optimal conditions modified and validated, until the desired analytical results are achieved [2].

Analytical results involve resolution, asymmetry ratio, peak purity, precision, accuracy, robustness and so on. When evaluating these parameters for complex mixtures, a practical response is most likely to be multi-criterion, i.e., one weighting combines several different criteria. Response functions such as Derringer's desirability function have therefore traditionally been used [3,4].

Obviously, high-quality analytical results can rarely be obtained by optimizing an inappropriate response function that is not well related to the actual separation quality and the specific analytical objectives. Many traditional response functions involve single-response detectors; such functions may fail to characterize some crucial aspects of separation quality and encounter difficulties in practical use. Recent approaches instead involve hyphenated techniques, e.g. HPLC–DAD, GC–MS and LC–NMR, which substantially increase the available information and are useful for both qualitative and quantitative analyses. With the aid of the spectral information from hyphenated instruments, greatly enhanced performance is seen in terms of the elimination of instrumental interferences, retention-time shift correction, selectivity, chromatographic separation ability and measurement precision; however, the data coming from these instruments are complex and difficult to resolve or interpret [5].

Therefore, to extract these results from our instruments, we require an appropriate mathematical and statistical approach using a suitable response function. The proper use of such a model with the information obtained from hyphenated techniques casts new light on the evaluation of analytical data. Hyphenated techniques are therefore further combined with chemometric approaches, for example to develop a clear picture of a herbal fingerprint.

Thus, our main aim is to discuss the various chemometric tools with respect to routine analysis, in order to improve the quality of analytical determinations of complex samples by fulfilling the performance criteria. As we know, there is a wide gap between analysts and chemometricians, which often prevents the full exploitation of analytical instruments. In the present review we discuss the analytical process hyphenated with multivariate analysis, and the application of the various tools of multivariate analysis in analytical chemistry.

    2. Origin and development of chemometrics

In 1971, the Swedish scientist Svante Wold coined the term kemometri in Swedish, equivalent to chemometrics in English [6]. The science of chemometrics can briefly be described as the application of mathematical and statistical methods to chemical measurement processes. It developed as a consequence of the change in the data obtained with the emergence of new analytical techniques as well as microprocessors. During 1986–1987, two journals, Chemometrics and Intelligent Laboratory Systems and Journal of Chemometrics, were launched.

The breakthrough in chemometrics came in the 21st century through various software development companies, which promoted equipment intellectualization and offered new methods for the construction of new, high-dimensional hyphenated equipment. This hyphenated equipment has opened many new options for improving data-analysis methods. Chemometrics now plays a major role in analytical chemistry [7].

    3. Chemometrics (multivariate data analysis) tools

Multivariate data analysis involves the analysis of data consisting of numerous variables measured on a number of samples. Its aim is to capture all the variation in the data matrix under study. Chemometric tools thus try to find the relationships between the samples and variables in a given data set and convert them into new latent variables. Multivariate data analysis is mainly classified into multivariate regression and multivariate calibration methods, based on the complexity of the data to be estimated.

Multiple linear regression is widely applied to problems involving one or a few components; however, in many cases the interaction of multiple variables, i.e. of analytes with each other, especially in herbal medicines (HMs), leads to quantification errors. In those cases biased regression methods can provide better results [8]; these are commonly known as multivariate calibration methods. In this article we therefore discuss both multiple linear regression and multivariate calibration methods, with corresponding examples.

    3.1. Multivariate regression method

In most chemical studies, the concentration of one or more analytes must be determined from measured properties of the system [9]. For example, the linear regression equation between two variables, concentration and absorbance, for the spectrophotometric determination of an analyte X at wavelength λi can be defined as

A_Xλi = b_Xλi C_X + a_Xλi   (1)

where C_X and A_Xλi are the concentration and the absorbance of the analyte X at wavelength λi, respectively, b_Xλi is the slope of the linear regression equation and a_Xλi is the intercept of the regression model [10]. The absorption of electromagnetic radiation at λ can be related to concentration through the Beer–Lambert law:

log(I₀/I) = εcl   (2)

where I is the intensity of light at wavelength λ after passing through a sample of path length l, I₀ is the light intensity incident on the sample, c is the concentration and ε is the absorptivity of the sample at the specific wavelength. A mixture containing multiple analytes is more complicated, and to resolve its components we need a regression model. Although there are different types of models, linear, interaction and quadratic models are most often used [11]. A linear model is

Y = ε₁c₁l + ε₂c₂l + … + ε_p c_p l + E   (3)

where Y is the response of a particular sample at λ, ε_p is the molar extinction coefficient of the pth component, c_p is the corresponding concentration and E is the spectral error. If Y is measured at several wavelengths, the equation becomes

Y_ji = Σ_{k=1..p} ε_ki c_kj l + E_y   (4)

where i is the index for wavelengths, j is the index for samples, p is the number of components to be determined and E_y is the spectral error. In matrix notation, Eq. (4) can be written as

Y = C·K + E_Y   (5)

where Y is the j × i matrix of calibration spectra, C is the j × l matrix of component concentrations, K is the l × i matrix of absorptivity–path length products, and E_Y is the j × i matrix of spectral errors.
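As a numerical illustration of Eq. (5), the matrix form can be simulated in a few lines; the dimensions and values below are invented for the sketch, not taken from the paper:

```python
import numpy as np

# Illustrative sketch of Eq. (5): Y = C K + E_Y
# j = 4 samples, l = 2 components, i = 6 wavelengths (all invented)
rng = np.random.default_rng(0)
C = rng.uniform(0.1, 1.0, size=(4, 2))   # j x l component concentrations
K = rng.uniform(0.5, 2.0, size=(2, 6))   # l x i absorptivity-path length products
E = rng.normal(0.0, 1e-3, size=(4, 6))   # j x i spectral errors
Y = C @ K + E                            # j x i calibration spectra

print(Y.shape)  # (4, 6)
```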

3.1.1. Multiple linear regression

3.1.1.1. Tri-linear regression-calibration (TLRC). This is a calibration model used to determine a mixture of three analytes, with the condition that they do not interfere with each other. For example, suppose three analytes (X, Y and Z) are measured at three wavelength sets λi (i = 1, 2, 3). The following equations can be written for a three-component analysis:

A_mix1 = b_X1 C_X + b_Y1 C_Y + b_Z1 C_Z + a_XYZ1   (6a)

A_mix2 = b_X2 C_X + b_Y2 C_Y + b_Z2 C_Z + a_XYZ2   (6b)

A_mix3 = b_X3 C_X + b_Y3 C_Y + b_Z3 C_Z + a_XYZ3   (6c)

where A_mix1, A_mix2 and A_mix3 represent the absorbance of the mixture of analytes X, Y and Z at the three wavelength sets; b_X1,2,3, b_Y1,2,3 and b_Z1,2,3 are the slopes of the linear regression equations of X, Y and Z, respectively; and a_XYZ1, a_XYZ2 and a_XYZ3 are the sums of the intercepts of the linear regression equations at the three wavelengths.

Eq. (6) in matrix notation is represented as

[A_mix1]   [b_X1 b_Y1 b_Z1]   [C_X]   [a_XYZ1]
[A_mix2] = [b_X2 b_Y2 b_Z2] × [C_Y] + [a_XYZ2]   (7)
[A_mix3]   [b_X3 b_Y3 b_Z3]   [C_Z]   [a_XYZ3]

This equation can also be written as

(A_mix − a_XYZ)_3×1 = K_3×3 · C_3×1   (8)

The matrix b, corresponding to the slope values of the linear regression equations, is called the matrix K, which is expressed as

    [b_X1 b_Y1 b_Z1]
K = [b_X2 b_Y2 b_Z2]   (9)
    [b_X3 b_Y3 b_Z3]

To calculate the concentrations of the analytes X, Y and Z in the ternary mixture, the matrix (A_mix − a_XYZ)_3×1 is multiplied by the inverse (K⁻¹)_3×3 of the matrix K_3×3:

C_3×1 = (K⁻¹)_3×3 (A_mix − a_XYZ)_3×1   (10)

This procedure is the mathematical basis of the TLRC method for multi-component analysis. The developed calibration model can easily be applied to the resolution of three-component mixtures. However, this method is not appropriate for resolving pharmaceutical mixtures with more than three components [12,13].
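The TLRC inversion of Eq. (10) can be sketched as follows; the slope matrix, intercepts and concentrations are invented for illustration:

```python
import numpy as np

# Invented TLRC example for Eqs. (7)-(10)
K = np.array([[0.80, 0.15, 0.05],     # slopes b at wavelength set 1
              [0.10, 0.90, 0.20],     # slopes b at wavelength set 2
              [0.05, 0.25, 0.70]])    # slopes b at wavelength set 3
a = np.array([0.010, 0.020, 0.015])   # summed intercepts a_XYZ
c_true = np.array([0.5, 1.2, 0.8])    # "unknown" concentrations of X, Y, Z

A_mix = K @ c_true + a                    # simulate mixture absorbances, Eq. (7)
c_est = np.linalg.inv(K) @ (A_mix - a)    # Eq. (10): C = K^-1 (A_mix - a)
print(np.round(c_est, 6))                 # recovers [0.5, 1.2, 0.8]
```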

3.1.1.2. Multi-linear regression-calibration (MLRC). If the absorbance values of a mixture of three or more analytes are measured at n wavelengths λi (i = 1, 2, …, n), the following set of equations can be written for a multi-component analysis:

A_mix1 = b_X1 C_X + b_Y1 C_Y + … + b_M1 C_M + a_XY…M1
A_mix2 = b_X2 C_X + b_Y2 C_Y + … + b_M2 C_M + a_XY…M2
⋮
A_mixn = b_Xn C_X + b_Yn C_Y + … + b_Mn C_M + a_XY…Mn   (11)

where A_mix1, A_mix2, …, A_mixn represent the absorbances of the mixture of the m analytes X, Y, …, M at the selected wavelength set (λ1 … λn); b_X1,…,n, b_Y1,…,n, …, b_M1,…,n are the slopes of the n linear regression equations of the m components; and a_XY…M1, a_XY…M2, …, a_XY…Mn are the sums of the intercepts of the linear regression equations at the n wavelengths.

In matrix terms, the multi-equation system in Eq. (11) can be formulated as follows:

[A_mix1]   [b_X1 b_Y1 … b_M1]   [C_X]   [a_XY…M1]
[A_mix2] = [b_X2 b_Y2 … b_M2] × [C_Y] + [a_XY…M2]   (12)
[  ⋮   ]   [  ⋮    ⋮      ⋮ ]   [ ⋮ ]   [   ⋮    ]
[A_mixn]   [b_Xn b_Yn … b_Mn]   [C_M]   [a_XY…Mn]

Thus Eq. (12) can also be written as

(A_mix − a_XY…M)_n×1 = K_n×m · C_m×1   (13)

The matrix C_m×1 is calculated from Eq. (13) as follows:

C_m×1 = (Kᵀ_m×n K_n×m)⁻¹ Kᵀ_m×n (A_mix − a_XY…M)_n×1   (14)

In this case, the MLRC model makes use of linear algebra, also known as matrix mathematics. This calibration model can be applied for the multi-resolution of a multi-component mixture system containing m compounds, but it has limited applicability to biological and herbal mixtures [12–14].
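The least-squares step of Eq. (14) can be sketched as follows, with n = 6 wavelengths and m = 3 components; all values are invented:

```python
import numpy as np

# Invented MLRC example for Eqs. (13)-(14)
rng = np.random.default_rng(1)
K = rng.uniform(0.1, 1.0, size=(6, 3))   # n x m matrix of regression slopes
a = rng.uniform(0.0, 0.05, size=6)       # summed intercepts at each wavelength
c_true = np.array([0.4, 0.9, 1.5])       # "unknown" concentrations

A_mix = K @ c_true + a                   # simulated mixture absorbances
# Eq. (14): C = (K^T K)^-1 K^T (A_mix - a), i.e. ordinary least squares
c_est = np.linalg.solve(K.T @ K, K.T @ (A_mix - a))
print(np.round(c_est, 6))                # recovers [0.4, 0.9, 1.5]
```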

3.1.2. Multivariate calibration methods

Multivariate calibration methods have been widely used because they can analyse analytes that interact with each other. In these methods, we first require the preparation of a training set, from which a series of properties is measured; a prediction set is then made, in which the training set is used to determine the concentrations of the components of unknown mixtures from their spectral data [15].

3.1.2.1. Classical least squares (CLS). This method assumes Beer's law, with the absorbance at each frequency proportional to the component concentrations. In matrix notation, Beer's law model for m calibration standards containing l chemical components, with spectra of n digitized absorbances, is given by

A = C·K + E_A   (15)

where A is the m × n matrix of calibration spectra, C is the m × l matrix of component concentrations, K is the l × n matrix of absorptivity–path length products, and E_A is the m × n matrix of spectral errors.

The classical least squares solution during calibration is given as follows:

K = (CᵀC)⁻¹Cᵀ·A   (16)

where K represents the matrix of pure component spectra at unit concentration and unit path length.

The analysis, based on the spectrum a of a sample with unknown component concentrations, is given as follows:

c₀ = (K·Kᵀ)⁻¹K·a   (17)

where c₀ is the vector of predicted concentrations and Kᵀ is the transpose of the matrix K [16,17]. CLS is a linear least-squares method; its main disadvantages are the limited shapes that linear models can assume over long ranges, possibly poor extrapolation properties, and sensitivity to outliers.
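A minimal CLS sketch of Eqs. (15)–(17) on simulated spectra (all data invented) is:

```python
import numpy as np

# Invented CLS example: m standards, l components, n wavelengths
rng = np.random.default_rng(2)
m, l, n = 8, 2, 10
C = rng.uniform(0.1, 1.0, size=(m, l))          # known calibration concentrations
K_true = rng.uniform(0.2, 1.5, size=(l, n))     # "true" pure-component spectra
A = C @ K_true + rng.normal(0, 1e-6, (m, n))    # Eq. (15): A = C K + E_A

K_hat = np.linalg.solve(C.T @ C, C.T @ A)       # Eq. (16): K = (C^T C)^-1 C^T A

c_true = np.array([0.3, 0.7])                   # unknown sample
a = c_true @ K_true                             # its measured spectrum
c0 = np.linalg.solve(K_hat @ K_hat.T, K_hat @ a)  # Eq. (17)
print(np.round(c0, 3))                          # close to [0.3, 0.7]
```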

3.1.2.2. Inverse least squares (ILS). This method treats concentration as a function of absorbance. The inverse of Beer's law model for m calibration standards with spectra of n digitized absorbances is given by

C = A·P + E_c   (18)

where C and A are as before, P is the n × l matrix of unknown calibration coefficients relating the l component concentrations to the spectral intensities, and E_c is the m × l matrix of errors. The inverse least squares solution during calibration for P is

P = (AᵀA)⁻¹Aᵀ·C   (19)

In this method the concentration of the analyte in the unknown sample is given as

c₀ = aᵀ·P   (20)

where c₀ and a represent the concentration and the spectrum of the unknown analyte, respectively. Since in ILS the number of frequencies cannot exceed the total number of calibration mixtures, stepwise multiple linear regression has been used for the selection of frequencies [17–19].
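An ILS sketch of Eqs. (18)–(20) follows; the data are invented, and the number of selected frequencies (n = 3) is kept below the number of standards (m = 10), as required:

```python
import numpy as np

# Invented ILS example
rng = np.random.default_rng(3)
m, l, n = 10, 3, 3
C = rng.uniform(0.1, 1.0, size=(m, l))              # calibration concentrations
K_true = np.eye(l) + rng.uniform(0.0, 0.5, (l, n))  # well-conditioned "true" spectra
A = C @ K_true + rng.normal(0, 1e-6, (m, n))        # measured calibration spectra

P = np.linalg.solve(A.T @ A, A.T @ C)               # Eq. (19): P = (A^T A)^-1 A^T C

c_true = np.array([0.6, 0.4, 0.9])
a = c_true @ K_true                                 # spectrum of unknown sample
c0 = a @ P                                          # Eq. (20): c0 = a^T P
print(np.round(c0, 3))                              # close to [0.6, 0.4, 0.9]
```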

3.1.2.3. Partial least squares regression (PLSR). PLSR is used to analyze strongly collinear and noisy data with numerous X variables (independent variables), and simultaneously to model several response variables Y (dependent variables). MLR, in which Y is modeled by means of X, works as long as the variables are few and fairly uncorrelated; with modern instrumentation, however, the X variables are numerous, strongly correlated, and usually noisy and incomplete [20,21]. PLSR allows one to inspect more complex problems by handling numerous collinear X variables and response variables Y, and to analyze the data in a more rational way.

In PLSR, both the X and Y data matrices are mean-centered or scaled, and the model is fitted so that it describes the variance of X and Y. PLSR is a maximum-covariance method, because the main aim of PLS is to predict the y-variables from the x-variables. PLSR finds new variables for both the X and Y matrices, the X-scores (T) and Y-scores (U), respectively [22].

The X-scores estimate the linear combinations of the variables x_k with the coefficient weights W*:

T = X·W*   (21)

The weights W can be transformed to W*, which is directly related to X [23]. From Eq. (21), W* can be written as

W* = W(PᵀW)⁻¹   (22)

The PLSR model can be considered to consist of an outer relation and an inner relation: the outer relation describes the X and Y matrices individually, while the inner relation links the two matrices together. The outer relations are given by the following equations:

X = T·Pᵀ + E   (23)

Y = U·Cᵀ + F   (24)

where Pᵀ is the loading matrix of the X space, Cᵀ is the loading matrix of the Y space, and E and F are the residual matrices of the X and Y spaces, respectively.

The X-scores (T) are also good predictors of the Y variables, correlated according to the following equation:

Y = T·Cᵀ + G   (25)

By combining Eqs. (21) and (25) we can write

Y = X·W*·Cᵀ + G = X·B + G   (26)

where

B = W*·Cᵀ   (26a)

Here B represents the PLSR coefficients and G is the residual matrix. The y-variables of new samples are predicted from Eq. (26). Substituting W* from Eq. (22) into Eq. (26a) gives

B = W(PᵀW)⁻¹Cᵀ   (27)

The part that is not explained by the model is called the residual, which is useful in judging model applicability: a large residual value indicates a poor model. Once the first PLS component has been calculated, further components can be calculated from the residual matrices. This process continues until approximately 99% of the variance is explained. The number of significant PLS components in a calibration model can be decided by means of cross-validation [24–27]. The main limitations of this method are the need to prepare calibration and prediction sets, and the human decision involved in selecting the number of factors. In spite of its wide applicability, the method is not applicable to complex herbal mixtures.
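The algorithm above can be sketched for the single-response case (PLS1); the deflation loop and the coefficient formula B = W(PᵀW)⁻¹q correspond to Eq. (27). Variable names and data are invented; with all components retained, B coincides with the ordinary least-squares solution:

```python
import numpy as np

# Minimal single-response (PLS1) sketch; invented data
def pls1_coefficients(X, y, n_comp):
    X = X - X.mean(axis=0)           # mean centering
    y = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w = w / np.linalg.norm(w)    # weight vector
        t = X @ w                    # X-score
        p = X.T @ t / (t @ t)        # X-loading
        c = (y @ t) / (t @ t)        # inner-relation coefficient
        X = X - np.outer(t, p)       # deflate X
        y = y - c * t                # deflate y
        W.append(w); P.append(p); q.append(c)
    W = np.array(W).T; P = np.array(P).T; q = np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)   # B = W (P^T W)^-1 q, cf. Eq. (27)

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))
b_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ b_true
B = pls1_coefficients(X, y, n_comp=5)
print(np.round(B, 4))                # close to b_true
```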

    3.2. Multivariate decomposition methods

These methods are useful for resolving multivariate data by reducing its dimensionality to a small number of latent variables (principal components) that summarize the correlated original variables.


These methods are applied by decomposing the data into fewer dimensions; e.g., unfolding three-way data amounts to slicing the three-dimensional data cube into two-dimensional tables and then placing these tables side by side to create one large two-dimensional data matrix. These methods are further explained in the sections below.

3.2.1. Principal component analysis (PCA)

The central idea of PCA is to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, which are uncorrelated and ordered so that the first few components retain most of the variation present in all the original variables [28–30].

PCA is based on an eigenvector decomposition of the covariance matrix of the process variables. Here the rows of the data matrix X correspond to samples and the columns to variables [31,32]. For a data matrix X with m rows and n columns, the covariance matrix of X is defined as

COV(X) = XᵀX/(m − 1)   (28)

If the columns of X are mean-centered and autoscaled, then Eq. (28) gives the correlation matrix of X. PCA decomposes the data matrix X as the sum of outer products of score vectors t (which contain information on how the samples relate to each other) and loading vectors p (the eigenvectors of the covariance matrix), plus a residual matrix E:

X = t₁p₁ᵀ + t₂p₂ᵀ + … + t_k p_kᵀ + E   (29)

Here k must be less than or equal to the smaller dimension of X. Fig. 1 represents the decomposition of a data matrix using PCA, where X is decomposed into scores and loadings up to the desired number of components.

COV(X)·p_i = λ_i p_i   (30)

where λ_i is the eigenvalue associated with the eigenvector p_i.

The score vector t_i (an orthogonal set) is the linear combination of the original X data defined by p_i (an orthonormal set), calculated by the following equation:

X·p_i = t_i   (31)

The λ_i in Eq. (30) describes the amount of variance captured by the (t_i, p_i) pair. The (t_i, p_i) pairs are arranged in descending order of λ_i; the first pair captures the largest amount of variance of any pair, and indeed the greatest amount of variation that can be captured by a linear factor, with each subsequent pair capturing the greatest possible amount of the variance remaining at that step. PCA thus describes the data using far fewer variables than the original data [33].

It is also possible to calculate a lack-of-model-fit statistic Q for the ith sample in data matrix X, i.e. x_i:

Q = x_i (I − P_k P_k^T) x_i^T    (32)

where P_k is the matrix of the first k loading vectors retained in the PCA model and I is the identity matrix of appropriate size (n × n). The Q statistic measures the amount of variation in each sample not captured by the k principal components retained in the model, and thus indicates how well each sample conforms to the PCA model.

Hotelling's T² statistic measures the variation within the PCA model. T² is the sum of the normalized squared scores, defined as

T² = x_i P_k λ^{-1} P_k^T x_i^T    (33)

The matrix λ^{-1} is a diagonal matrix containing the inverses of the eigenvalues associated with the k eigenvectors (principal components) retained in the model.
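As a concrete illustration of Eqs. (32) and (33), both diagnostics can be computed for a single mean-centered sample once the retained loadings and their eigenvalues are available. The following is a minimal numpy sketch; the function and variable names are ours, not from the paper:

```python
import numpy as np

def q_and_t2(xi, Pk, eigvals):
    """Q residual (Eq. 32) and Hotelling's T^2 (Eq. 33) for one
    mean-centered sample xi, given the k retained loadings Pk
    (n x k, orthonormal columns) and their associated eigenvalues."""
    n = xi.size
    I = np.eye(n)
    # Q: variation in xi NOT captured by the k retained components
    Q = xi @ (I - Pk @ Pk.T) @ xi
    # T^2: variation in xi WITHIN the model, scores scaled by 1/lambda_i
    T2 = xi @ Pk @ np.diag(1.0 / eigvals) @ Pk.T @ xi
    return Q, T2
```

If all n components are retained, P_k P_k^T becomes the identity and Q drops to zero, which is a quick sanity check on an implementation.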

PCA based on singular value decomposition (SVD) [30,34] is explained using the matrix X, which can be written as

X = U D V^T + E    (34)

where U contains the same column vectors as does t (scores), V^T is identical to P^T (loadings) but normalized to length one, and D is a diagonal matrix. The diagonal elements of D are the square roots of the eigenvalues of X^T X.
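The SVD route of Eq. (34) can be sketched in a few lines of numpy; scores are U·D and loadings are the rows of V^T. This is a minimal illustration under our own naming, not the authors' code:

```python
import numpy as np

def pca_svd(X, k):
    """PCA of a data matrix via SVD (Eq. 34): X = U D V^T.
    Returns the first k score vectors (U*D), loadings (columns of V)
    and the corresponding eigenvalues of the covariance matrix."""
    Xc = X - X.mean(axis=0)                     # column mean centering
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * d[:k]                   # t vectors
    loadings = Vt[:k].T                         # p vectors, unit length
    explained = d[:k] ** 2 / (X.shape[0] - 1)   # eigenvalues of COV(X)
    return scores, loadings, explained
```

With k equal to the full rank, scores @ loadings.T reproduces the centered data exactly, mirroring Eq. (34) with E = 0.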

Once the PC model has been developed for a training matrix, new data are fitted to the model, giving scores for the new objects or loadings for the new variables. The formula for a new object x is as follows:

t = xP    (35)

PCA based on the non-linear iterative partial least squares (NIPALS) [34–36] algorithm is explained using a matrix X which is scaled for each dimension.

The score vector t is selected as the column of matrix X with the largest variance, and then the loading vector p^T is calculated using the following equation:

p^T = t^T X / (t^T t)    (36)

Now p is normalized to unit length by multiplying by a constant factor c:

c = 1/√(p^T p)    (37)

    Then the new score vector is calculated for the ith elementusing the following equation:

t = Xp / (p^T p)    (38)

Convergence is checked using the sum of squared differences between all elements of two consecutive score vectors. If the score vector has not yet converged, the iteration continues with Eq. (38); once it has converged, the procedure moves on to the direction with the next largest variance. If convergence has not been reached within the maximum number of iterations (e.g. 20), there is no strongly preferred direction of maximum variance.

For calculating the residual we use the following equation:

E = X − t p^T    (39)

PCA is widely applicable to herbal and biological mixtures and readily resolves data using latent variables, but it also requires the selection of the number of components, which is a human decision and thus introduces a chance of error. Dimension reduction can only be achieved if the original variables were correlated. If the

Fig. 1. Schematic description of the decomposition of data matrix X into various PCs.

N. Kumar et al. / Talanta 123 (2014) 186–199

original variables were uncorrelated, PCA does nothing except order them according to their variance.
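The NIPALS iteration of Eqs. (36)–(39) can be sketched as follows for a single component. This is a minimal numpy illustration under our own naming and tolerance choices, not the authors' implementation:

```python
import numpy as np

def nipals_pc(X, tol=1e-12, max_iter=20):
    """One principal component by NIPALS (Eqs. 36-39).
    X is assumed mean-centered/scaled; returns t, p and residual E."""
    # start from the column of X with the largest variance
    t = X[:, np.argmax(X.var(axis=0))].copy()
    for _ in range(max_iter):
        p = (t @ X) / (t @ t)            # Eq. (36): loading from current score
        p /= np.sqrt(p @ p)              # Eq. (37): normalize p to unit length
        t_new = (X @ p) / (p @ p)        # Eq. (38): updated score vector
        if np.sum((t_new - t) ** 2) < tol:   # convergence on consecutive scores
            t = t_new
            break
        t = t_new
    E = X - np.outer(t, p)               # Eq. (39): deflated residual matrix
    return t, p, E
```

For further components the same routine is applied to the residual E, which is how NIPALS extracts the t_i, p_i pairs one at a time.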

3.2.2. Parallel factor analysis (PARAFAC)

Parallel factor analysis (PARAFAC) is a decomposition method for three-way arrays that can be seen as a simplification of bilinear PCA to higher-order arrays. This multi-way method originates from psychometrics [37,38]. Here, the data are decomposed into tri-linear components. Each component consists of one score vector and two loading vectors, instead of one score and one loading vector as in PCA.

The PARAFAC model for a three-way array is given by three loading matrices, A, B and C, with elements a_{if}, b_{jf} and c_{kf}. The PARAFAC model of a three-way array minimizes the sum of the squares of the residuals e_{ijk} for F components [39,40]:

x_{ijk} = Σ_{f=1}^{F} a_{if} b_{jf} c_{kf} + e_{ijk}    (40)

In terms of the fth columns of the loading matrices, the equation can be written as

X = Σ_{f=1}^{F} a_f ∘ b_f ∘ c_f    (41)

where a_f, b_f and c_f are the fth columns of the loading matrices A, B and C, respectively, and ∘ denotes the outer product.

Alternating least squares (ALS) finds the solution of the PARAFAC model sequentially, assuming that two modes of loadings are known and then estimating the parameters for the unknown set [41].

An additional advantage of PARAFAC over PCA is that it eliminates the rotational problem. By using the right number of components in PARAFAC, where tri-linear data are required, we can find the true spectra with an appropriate signal-to-noise ratio. Hence, loadings obtained with PARAFAC can be interpreted directly [42]. Furthermore, in the PARAFAC algorithm all the components are calculated simultaneously, which is not possible in PCA because it requires a number of steps. It is the most advanced method used for the resolution of three-dimensional data obtained from different hyphenated techniques, and it has wide applicability in herbal mixtures, impurity profiling, etc.
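The ALS scheme for fitting the tri-linear model of Eqs. (40)–(41) can be sketched with numpy: each mode's loadings are solved by least squares while the other two are held fixed. This is a minimal sketch under our own naming (the unfolding and Khatri-Rao helper are standard constructions, not from the paper), without the refinements of production PARAFAC codes:

```python
import numpy as np

def khatri_rao(B, C):
    # Column-wise Kronecker (Khatri-Rao) product used in PARAFAC-ALS.
    return np.einsum('jf,kf->jkf', B, C).reshape(-1, B.shape[1])

def parafac_als(X, F, n_iter=500, seed=0):
    """Minimal PARAFAC-ALS sketch for a three-way array X (Eq. 40):
    x_ijk ~ sum_f a_if * b_jf * c_kf. Returns loading matrices A, B, C."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, F))
    B = rng.standard_normal((J, F))
    C = rng.standard_normal((K, F))
    X1 = X.reshape(I, J * K)                        # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)     # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)     # mode-3 unfolding
    for _ in range(n_iter):
        # fix two modes, solve the third by least squares (ALS step)
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

On an exactly tri-linear array with the correct F, the recovered loadings reproduce the data essentially to numerical precision, which reflects the uniqueness property that makes PARAFAC loadings directly interpretable.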

    3.3. Hierarchical cluster analysis (HCA)

Cluster analysis is a multivariate technique that arranges components on the basis of their characteristics; it classifies components on the basis of their similarity in space. As a result, clusters exhibit high homogeneity within groups and high heterogeneity among different groups [43–45].

Hierarchical agglomerative clustering provides a good similarity correlation between the data of one sample and the data of all other samples. It is further expressed graphically as a picture of the different groups and their proximity, with a large reduction in the dimensionality of the original data. The HCA dendrogram is based on the assumption that samples with numerically similar values are close to each other in space, so that samples with high similarity have high proximity. In cluster analysis, similarity is measured on the basis of the distance between data points; the Euclidean distance (ED) and the Mahalanobis distance (MD) are used to measure the distance between clusters.

For two clusters r and s having numbers of components n_r and n_s, respectively, ED and MD are calculated as

ED_{rs} = √((x_r − x_s)(x_r − x_s)^T)    (42)

MD_{rs} = √((x_r − x_s) V^{-1} (x_r − x_s)^T)    (43)

where V is the covariance matrix.

The basic process of HCA can be explained using N data sets to be clustered. Each data set is first assigned to its own cluster, so that there are N clusters, and the distance between clusters represents the similarity between data sets. The closest pair of clusters is then found and merged into a single cluster, and the distances between the newly formed cluster and the remaining clusters are recomputed. Finally, a clustering tree of size N is obtained. The analysis is not stable when cases are dropped; this occurs because the selection of a case (or the merger of clusters) depends on the similarity of one case to the cluster. Dropping one case can drastically affect the course in which the analysis progresses.

    3.4. Pattern recognition methods

3.4.1. Soft independent modeling of class analogy (SIMCA)

Soft independent modeling of class analogy (SIMCA) [46–48] is a supervised classification technique that uses PCA or PLS for classification by creating a confidence region around each class using the residuals of the samples in the calibration set. SIMCA is a pattern recognition technique in which new objects are assigned as members of a particular class based on their Euclidean distance from that class's principal component space; the Euclidean distance must not exceed a confidence limit from the particular principal component. SIMCA can also be treated as a multivariate outlier test because it checks for outliers in the space of the selected PCs. On the basis of the residual variance of the objects in the training set, the residual standard deviation (s_0) is calculated as

s_0 = √( Σ_{k=1}^{m} Σ_{i=1}^{n} e_{ik}² / ((m − a − 1)(n − a)) )    (44)

where e_{ik}² is the residual of object k in the calibration set for variable i, m is the number of observations in the calibration set, n is the number of variables and a is the number of principal components. The number of degrees of freedom given in the equation is used when the number of observations is less than the number of variables, i.e. m ≤ n.

The residuals of the training set follow a normal distribution, and the F-test may be used to derive the critical value of the Euclidean distances of the objects towards the model:

s_crit = √(F_crit s_0²)    (45)

where F_crit is the tabulated value for the specific degrees of freedom at a given significance level. For a new object x_j^new that belongs to a certain class and has residual vector e_{ji}^new, the residual standard deviation (s_j) is calculated as

s_j = √( Σ_{i=1}^{n} (e_{ji}^new)² / (n − a) )    (46)

It is then investigated whether the residual variance s_j² differs significantly from the pooled residual variance s_0² of the model. This is done by calculating F_j^new and comparing it with the tabulated critical value F_crit:

F_j^new = s_j² / s_0²    (47)

The Mahalanobis distance (MD) proposed by Hawkins is used instead of the Euclidean distance for multivariate outlier tests. Further descriptions of SIMCA can be found elsewhere. An attractive quality of SIMCA is that a principal component mapping of the data has occurred; hence, samples that may be described by spectra are mapped onto a much lower-dimensional subspace for classification. If a sample is similar to the other samples in


the class, it will lie close to them in the principal component map defined by the samples representing that class [49,50].

3.4.2. Linear discriminant analysis (LDA)

LDA is a supervised pattern recognition method, which is used in cases where the class variance is asymmetrical [51,52]. The method maximizes the ratio of the between-group variance to the within-group variance and searches for a linear function of the variables in multivariate space. When the number of variables is larger than the number of observations in multi-dimensional data, LDA cannot be used directly; in that case PCA is first employed for data compression, transforming the original data set of many inter-correlated variables into a reduced new set of variables. LDA constructs a discriminant function for each group according to the following equation:

F_{Gi} = k_i + Σ_{j=1}^{n} w_{ij} p_{ij}    (48)

where i indexes the groups (G), k_i is the constant for each group, n is the number of parameters used to classify a data set into a given group and w_{ij} is the weight coefficient assigned by LDA to a selected parameter (p_{ij}).

    3.5. Chemometric resolution methods (CRMs)

The main goal in the analysis of any multi-component system is to extract useful information from the raw experimental data, together with knowledge of the number of chemical components in the analyzed sample or of certain important retention regions. This is an important analytical objective, especially for herbal drugs. The common purpose of all CRMs is to provide a linear model of the individual component contributions using solely the raw experimental measurements [53–55]. CRMs mathematically decompose a mixed instrumental response into the pure contributions due to each of the components in the system. This response is organized in a matrix X containing the raw information about all the components present in the data set. Resolution methods allow the decomposition of the data matrix X into the dot product of two data matrices C and S^T, associated with the row and the column direction of the data matrix X, respectively, each of which includes the pure response profiles of the n components. In matrix notation, the expression valid for all resolution methods is

X = C S^T + E    (49)

where X (r × c) is the original data matrix, C (r × n) and S^T (n × c) are the matrices containing the pure response profiles related to the data variation in the row and in the column direction, respectively, and E (r × c) is the residual variation of the data set that is not related to any chemical contribution.

Parameters r and c represent the number of rows and the number of columns of the original data matrix, respectively, and n is the number of chemical components. A schematic representation of how the various MCR methods work is given in Fig. 2. In resolution methods, no previous knowledge of a chemical or mathematical expression is needed to analyze the data set, which makes these methods useful in routine analysis. However, initial information obtained from the instrumental analysis positively influences the resolution of the system; such information can be used to build good initial estimates of the concentration profiles and responses.

In many CRMs, a preliminary analysis derived from principal component analysis (PCA), one of the most basic and widely used chemometric tools discussed above, is used to find the number and direction of the relevant sources of variation in a bilinear data set. The information provided by PCA is essential in the resolution process, and in many resolution methods it determines the total number of chemical components, which strongly affects the ambiguity of the final solutions. Therefore, our main aim is to discuss the various important resolution methods which minimize the sources of uncertainty in the analytical results, and to make these methods familiar for routine analysis.

3.5.1. Rank annihilation factor analysis (RAFA)

Rank annihilation factor analysis (RAFA) [56,57] was the first method used for the decomposition of three-way data. RAFA is based on rank analysis of two-way bilinear matrices and on the principle that the rank of a pure analyte is one. It is used to quantify a given component in the presence of other, possibly unknown, components in a two-way bilinear matrix. RAFA involves two bilinear data sets: the first is the calibration set M_s and the second is the sample set M_u. A calibration set is prepared by measuring a mixture that contains a known amount of the analyte, and the analyte is then quantified in the sample set, which contains measurements of the sample of interest. After applying principal component analysis (PCA) to the data, ranks R_u and R_s are obtained for the data matrices M_u and M_s, respectively. Data matrix M_s has one component, with rank R_s = 1, and R_u is theoretically equal to 1 + n_i (the number of interfering compounds). The rank of the residual E is then calculated as

E = M_u − β M_s    (50)

where β is equal to the ratio of the concentration of the kth component in the standard solution to its value in the mixture. Therefore, the rank of the residual matrix E is R_u − 1, i.e. it only shows the interfering compounds. Matrices M_u and M_s are written as

M_s = x_k y_k^T    (51)

M_u = Σ_{k=1}^{N} x_k y_k^T    (52)

where x_k is the column vector containing the x_i and y_k is the column vector containing the y_j for component k. The data matrix of the samples is

Fig. 2. Flow chart of the application of multivariate curve resolution methods: example of a 3D chromatogram obtained from LC-DAD, from which PCA and various MCR tools resolve the pure spectral and chromatographic profiles.


composed of N components. The concentration of the analyte in the standard solution is c_k^0 and in the sample is c_k. To find β, an iterative procedure is used in which the eigenvalues are plotted. Ideally the eigenvalues show the number of components N, but in real situations the exact number of components is not known for certain. The residual becomes minimal when β exactly compensates for the signal of the analyte in the sample.

    The concentration ck of the kth constituent is now computed as

c_k = c_k^0 / β    (53)

3.5.2. Generalized rank annihilation method (GRAM)

GRAM [58,59] is a non-iterative method with which the simultaneous quantification of all the analytes present in a sample can be carried out using one bilinear matrix from a mixture of standards (the calibration set), i.e. one standard for each analyte. It determines the relative concentration of each analyte in a sample (unknown mixture), from which the actual concentration is easily obtained. GRAM involves two bilinear data sets: the first is a calibration set M_std containing spectra at each retention time for the analytes in the standard mixture, and the second is a sample set M_u containing spectra at each retention time for the analytes in the unknown sample. The matrices M_std and M_u are written as

M_std = X C_o Y^T    (54)

M_u = X C_u Y^T    (55)

where X (J_1 × K) and Y (J_2 × K) are matrices in which J_1 relates to the normalized chromatographic profiles, J_2 to the spectral data, and K is the total number of components. C_o and C_u are diagonal matrices representing the concentrations of the analytes. The next step is to perform singular value decomposition (SVD) to calculate the number of significant factors, i.e. the total number of analytes:

M = U D V^T    (56)

where U contains the same column vectors as does J_1, V^T is identical to J_2 but normalized to length one, and D is a diagonal matrix. The eigenvalue matrix is obtained from the eigenvalue-eigenvector equation

D^{-1} U^T M_u V Z = Z Λ    (57)

Here Z is the eigenvector matrix and Λ is the matrix of eigenvalues. After this, the chromatographic and spectral profiles are calculated:

X = U Z    (58)

Y = V (Z^{-1})^T    (59)

    The concentration ck of the kth constituent is now computed as

c_k = c_k^0 / λ_k    (60)

Here λ_k is the eigenvalue of the kth analyte, i.e. the analyte of interest. The assignment of the eigenvalue corresponding to the analyte of interest is done by trial and error: the correlation coefficient between the spectrum of the analyte in the standard and in the sample is calculated, and the eigenvalue associated with the spectrum of highest correlation is selected.

3.5.3. Evolving factor analysis (EFA)

EFA [60,61] is a non-iterative method based on time-dependent rank analysis, i.e. each added row is associated with an increase in rank by one. The method is based on the first-in-first-out assumption, i.e. the compound that starts to elute first will disappear first. EFA involves evolving PCA in two directions along the retention points, namely forward and backward. The eigenvalues from the forward PCA show the retention points where chemical components begin to appear, while the backward PCA indicates the retention points where chemical components begin to disappear; by combining the information from these two analyses, the elution sequence of each of the components is found.

The collected information takes the form of two different kinds of regions, i.e. concentration regions and zero-concentration regions. The concentration window, i.e. the region where each compound exists, is defined by performing the process in both the forward and backward directions, i.e. starting from the first spectrum to the last one and from the last one to the first, respectively. Zero-concentration windows are the regions where nothing elutes for a particular component.

The method can be understood by considering a data matrix X with n chemical components, containing the true concentration profile C, which is a linear combination of the abstract chromatogram U, and the spectral profile S, which is a linear combination of the abstract spectrum V. Both U and V can be determined by performing singular value decomposition (SVD):

X = U D V^T + E    (61)

where U contains the same column vectors as does t (scores), V^T is identical to P^T (loadings) but normalized to length one, and D is a diagonal matrix. The diagonal elements of D are the square roots of the eigenvalues of X^T X.

After this, the transformation matrix T is found using Eq. (62); it is used to determine the individual concentration profiles:

C = U T    (62)

The specific component concentration profile c_i is then determined by the following equation:

c_i = U t_i,   i = 1, 2, …, n    (63)

This procedure is repeated to determine the n components, and at the end the concentration profile is used to resolve the spectral profile:

S = (C^T C)^{-1} C^T X    (64)

By taking advantage of the evolutionary feature of chromatographic separation, this method may be regarded as the first method used in a sophisticated manner to obtain the elution sequence, and it has served as a main step for many self-modeling curve resolution (SMCR) methods to date. Since in EFA one analyzes a progressively increasing sub-matrix, Keller and Massart [62] suggested a method in which a window of fixed size is used and moved along the retention points to perform PCA, called fixed-size moving window evolving factor analysis (FSMWEFA).

Eigenstructure tracking analysis (ETA), proposed by Toft and Kvalheim [63,64], is very similar to FSMWEFA. It starts with a window of size 2, and the size is progressively increased by 1 until the window exceeds the number of components in the examined window by one. The local noise can be determined from the last evolving eigenvalue, so ETA can reveal the noise pattern as well as the elution sequence.
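The fixed-size moving window idea can be sketched directly: slide a window of spectra along the retention axis, take the local singular values (the square roots of the local eigenvalues), and read off the local rank from how many values rise above the noise. A minimal numpy sketch under our own naming:

```python
import numpy as np

def fsmwefa(X, window=5):
    """Fixed-size moving window EFA sketch: slide a window of spectra
    along the retention axis and record the local singular values; the
    number of values above the noise level estimates the local rank."""
    m = X.shape[0]                 # rows = spectra at successive retention points
    profiles = []
    for start in range(m - window + 1):
        s = np.linalg.svd(X[start:start + window], compute_uv=False)
        profiles.append(s)
    # one row of min(window, n_channels) singular values per window position
    return np.array(profiles)
```

In a selective region the second singular value stays at the noise level (local rank one), while in an overlap region it rises, which is exactly the information used to locate concentration and zero-concentration windows.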

3.5.4. Heuristic evolving latent projections (HELP)

HELP [65,66] is a method used to resolve two-way bilinear multi-component data into the spectra and chromatograms of the pure constituents. The method features a visual interface based on the


latent-variable projection graph, and it also provides information on the rank of the data matrix. It employs an eigenvalue plot in which the logarithms of the eigenvalues are plotted against retention time. In this method one systematically calculates the eigenvalues for m sub-matrices containing the spectra collected at retention points 1, 1 to 2, 1 to 3, …, 1 to m − 1, 1 to m, where m is the total number of retention points. All the eigenvalues obtained are then joined by a line. In the next step, the zero-concentration regions and the selective concentration region of each compound are determined from the eigenvalue plot. A selective concentration region is defined as a region in which only a single component elutes, i.e. a region of rank one.

The method can be understood by considering a data matrix X with n chemical components, containing the true concentration profile C, which is a linear combination of the abstract chromatogram U, and the spectral profile S, which is a linear combination of the abstract spectrum V. Both U and V can be determined by performing singular value decomposition (SVD):

X = U D V^T + E    (65)

where U contains the same column vectors as does t (scores), V^T is identical to P^T (loadings) but normalized to length one, and D is a diagonal matrix. The diagonal elements of D are the square roots of the eigenvalues of X^T X.

Then the transformation matrix T, which is used to determine the individual concentration profiles, is found:

C = U T    (66)

In order to determine t_i for an individual component i, both the zero-concentration region and the selective concentration region are incorporated into the relation c_i = U t_i (Eq. (63)):

c_{sel+zero,i} = U_{sel+zero,i} t_i,   i = 1, 2, …, n    (67)

where the subscript sel+zero, i implies the use of the selective region plus the zero-concentration region for component i. Then t_i can be written as

t_i = (U_{sel+zero,i}^T U_{sel+zero,i})^{-1} U_{sel+zero,i}^T c_{sel+zero,i}    (68)

The evolving latent projection (ELP) graph gives information about the selective region of the individual components in wavelength as well as retention-time space. Such a region can be identified by drawing straight lines in the ELP graph. After determining t_i, the next step is to determine the elution pattern of a particular component by Eq. (66). This procedure is repeated to determine the n components, and at the end the concentration profile is used to resolve the spectral profile using Eq. (64).

3.5.5. Subwindow factor analysis (SFA)

Subwindow factor analysis (SFA) [67] is a method introduced directly as a solution for extracting component spectra from overlapping structures obtained from a hyphenated instrument, without resolving the concentration profiles. It is based on the theorem [68] that if, for every interferent, the concentration window of the analyte has a subwindow where that interferent is absent, then it is possible to calculate the spectrum of the analyte. One has to select subwindows (≥ 2) in which only the spectral component corresponding to the particular analyte is common, followed by identification of that component.

SFA is employed to determine a selective region, i.e. a subwindow in which the analyte appears without interferents. It involves determining three further subwindows, namely the left, right and middle subwindows. The left subwindow is one in which only left interferents are present, i.e. interfering compounds which begin to elute before the selective analyte and appear in the chromatogram to its left. In the same way, the right subwindow is one in which only right interferents are present, i.e. interfering compounds which continue to elute after the selective analyte and appear in the chromatogram to its right. A subwindow in which both kinds of interferents are present is called a middle subwindow. The first step in SFA is the recognition of the subwindows, which is similar to other window-based methods and is done by any one of the methods discussed in this article (EFA, HELP, WFA, OPA). The difference between SFA and the other methods is the combination of elution limits in windows or subwindows.

The method is discussed by considering a data matrix X on which rank analysis is performed, giving the numbers of chemical components of the left and right subwindows, say m and n, respectively. The total number of chemical components in both subwindows is then m + n − 1, because the analyte is common to both. A procedure is then used to determine the vectors corresponding to the largest singular values, which carry the chemical information in the singular value decomposition. The common vector v corresponding to the analyte in both subspaces can be determined using the following equation:

v = M a = N b    (69)

where M and N are matrices with m and n columns, respectively. In practice Ma is not exactly equal to Nb, and one has to find the a and b which minimize the squared norm E:

E = ‖Ma − Nb‖² = a^T M^T M a + b^T N^T N b − 2 a^T M^T N b    (70)

where a^T a = b^T b = 1 and M^T M and N^T N are identity matrices. Then we can write

E = 2(1 − a^T M^T N b)    (71)

If a and b are the left and right singular vectors associated with the largest singular value d_1 of the matrix M^T N, then we can write

E = 2(1 − d_1)    (72)

The singular values d_i lie in the range 0–1. By determining the vector v one can check that only this vector is common to both regions: if there is no common vector, d_1 will be significantly less than 1, and if there is more than one common vector, d_2 will also be close to 1.

Sometimes it is also possible that the rank is less than the number of components present, i.e. the subwindow limits are too narrow. In such cases the outer limits of both windows are chosen wider than suggested by the rank analysis, and the inner limits are adjusted until a good fit is obtained.
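The core computation of Eqs. (69)–(72) reduces to an SVD of M^T N: the common vector is recovered from the singular vector pair of the largest singular value, and d_1 ≈ 1 confirms that exactly one vector is shared. A minimal numpy sketch, assuming M and N already hold orthonormal basis vectors of the two subwindows (names are ours):

```python
import numpy as np

def sfa_common_vector(M, N):
    """SFA sketch (Eqs. 69-72): M and N hold orthonormal basis vectors
    (columns) of the left and right subwindows; the analyte's spectral
    direction is the common vector, found from the SVD of M^T N."""
    Ua, d, Vbt = np.linalg.svd(M.T @ N)
    a, b = Ua[:, 0], Vbt[0]        # pair for the largest singular value d_1
    v = M @ a                      # common vector; E = 2(1 - d_1) ~ 0 when shared
    return v, d[0]
```

When the two subspaces genuinely share one direction, d_1 equals 1 and v matches that direction up to sign; a d_1 clearly below 1 signals that no common vector exists.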

3.5.6. Window factor analysis (WFA)

WFA [69,70] is a self-modeling technique developed for the resolution of multivariate data matrices obtained from evolutionary processes. The method relies on each chemical component having a single unique maximum in the evolutionary process, specified by its window, i.e. the region along the evolutionary axis which is indigenous to that chemical component. The concentration profile of the component is then obtained directly, without recourse to any information concerning the other components.

Let X represent a data matrix in which each column is a spectrum recorded at a different retention point during an evolutionary process. Window factor analysis (WFA) assumes that the total spectra S consist of a linear sum of abstract spectra s. If there are n components, then

X = Σ_{i=1}^{n} X_i = Σ_{i=1}^{n} s_i c_i^T    (73)


where s_i and c_i are vectors representing the spectrum and concentration profile of component i, respectively, and X_i is the matrix representing the contribution of component i to the data matrix.

Using SVD, the data matrix X can be decomposed into the product of the spectral profile S, which is a linear combination of the abstract spectrum V, and the concentration profile C, which is a linear combination of the abstract chromatogram U, multiplied together to reproduce the data within experimental uncertainty:

X = U D V^T    (74)

where D is a diagonal matrix whose diagonal elements are the square roots of the eigenvalues of X^T X.

Suppose X^0 is a sub-matrix of X obtained by removing all data columns containing signals of component n. If all components other than component n exist outside the window, then the SVD of matrix X^0 will yield a matrix S^0 containing n − 1 orthonormal spectral vectors V_j^0 and a matrix C^0 containing n − 1 mutually orthogonal concentration profile vectors U_j^0, so that

X^0 = U^0 D V^{0T}    (75)

A matrix X_j with columns proportional to the real concentration of the jth compound can be calculated as follows [68–70]:

X_j = (I − V^0 V^{0T}) X = b c_j s_j^{0T}    (76)

where I is the identity matrix of dimension n, b is a constant, and c_j and s_j are the concentration and spectral profiles of the jth component, respectively. X_j represents an average concentration profile of the jth component, and once the concentration profile of the component has been determined, the true spectral profile is obtained by least squares.

3.5.7. Alternating least squares (ALS)

Multivariate curve resolution (MCR) supported by the alternating least squares (ALS) [24,71–73] optimization algorithm has proved to be a powerful tool for resolving two- and three-way data matrices. The method is easily adapted to tri-linear data sets of different complexities and provides least-squares solutions. It is used to improve an initial estimate of either the spectral or the concentration profiles. The method is explained using data matrices X_i obtained by analyzing mixtures of different concentrations C_i for the ith component:

X_i = C_i S^T + E_i,   i = 1, 2, 3, …    (77)

If the total numbers of rows and columns of the spectral profiles are the same for all the samples, then all the samples are estimated simultaneously:

[X_1; X_2; …; X_i] = [C_1; C_2; …; C_i] S^T + E    (78)

As a starting point, the number of co-eluted components in a particular peak cluster is first determined by principal component analysis (PCA). Afterwards, an initial estimate of the spectral or concentration profiles of the analytes is obtained by one of the best-suited methods discussed above in this article (EFA, WFA, SFA, OPA) before the constrained ALS optimization starts. In ALS, four constraints, namely non-negativity [54,74], unimodality [75,76], selectivity [65,77] and normalization, are applied.

These constraints are applied to avoid rotational and intensity ambiguities.

The non-negativity constraint is applied to both the concentration and spectral profiles, because chemical concentrations and spectra are only defined as positive or zero. Unimodality is a constraint frequently applied to force the chromatographic elution profiles of matrix X into a single peak shape; it is applied in the case of co-eluted components that have similar spectra, where it avoids the appearance of elution profiles with double peaks. The selectivity constraint is applied either to the concentration or to the spectral profiles. Finally, a normalization constraint is applied to fix the scale indeterminacy during the ALS resolution.

In this way, all the components in a particular chromatographic run are assumed to have the same relative signal contribution in the spectral domain, and all the differences in the concentrations of the components in the mixtures are expressed in the areas of the elution profiles. The outcomes of the ALS optimization are the estimates of the C, S^T and E matrices, i.e. the MCR-ALS resolved elution profiles, pure species spectra and residuals, respectively, fitted by the constrained ALS optimization procedure.
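The alternating scheme with the non-negativity and normalization constraints can be sketched in a few lines: starting from an initial estimate of C (e.g. from EFA), S^T and C are solved in turn by least squares and clipped to non-negative values. This is a minimal numpy sketch under our own naming, without the unimodality and selectivity constraints or the convergence machinery of full MCR-ALS codes:

```python
import numpy as np

def mcr_als(X, C0, n_iter=100):
    """MCR-ALS sketch for Eq. (49), X = C S^T + E: starting from an
    initial estimate C0 of the concentration profiles, alternate
    least-squares updates of S^T and C under non-negativity."""
    C = np.clip(C0, 0, None)
    for _ in range(n_iter):
        # normalization constraint: fix the scale of each profile
        C = C / np.linalg.norm(C, axis=0, keepdims=True)
        # solve X ~ C S^T for the spectra, then clip to non-negative
        St = np.clip(np.linalg.lstsq(C, X, rcond=None)[0], 0, None)
        # solve X^T ~ S C^T for the concentrations, then clip
        C = np.clip(np.linalg.lstsq(St.T, X.T, rcond=None)[0].T, 0, None)
    return C, St
```

With a reasonable initial estimate, C @ St reproduces the data matrix closely; the residual E = X − C S^T is what the constrained optimization minimizes.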

3.5.8. Orthogonal projection analysis (OPA)

OPA [78,79] is a stepwise procedure that selects one key variable in each step. The method calculates dissimilarity based on the mathematical concept of orthogonalization [80,81], and it relies on the fact that the pure spectra are the extreme spectra spanning the mixture spectra. It compares each spectrum with one or more reference spectra and searches for the least correlation; the first dissimilarity plot represents the comparison of each spectrum with the average spectrum. The principle of OPA is as follows: the instrument produces a data matrix X (m × n), where the m rows are spectra measured at regular time intervals and the n columns are chromatograms measured at different wavenumbers. The data matrix X is bilinear, i.e. it can be decomposed into the product of the individual concentration matrix (C) and the pure compound spectra matrix (S).

    The dissimilarity of the ith spectrum, d_i, is defined as the determinant of the dispersion matrix of Y_i. In general, the matrices Y_i consist of one or more reference spectra and the spectrum measured at the ith analysis time.

    d_i = det(Y_i^T · Y_i),  i = 1, 2, …, m    (79)

    A dissimilarity plot is then obtained by plotting the dissimilarity values against the analysis time. The spectrum having the highest dissimilarity value is the least correlated with the mean spectrum and is selected as x_s1. Then the dissimilarity of each individual spectrum of X with respect to x_s1 is calculated. As before, Y_i consists of one reference spectrum, which is the mean spectrum of matrix X, and each x_i. Then the mean spectrum is substituted by x_s1, the most dissimilar spectrum, in the dispersion matrix Y_i of dimensions (n × 2), and a second dissimilarity plot is obtained by applying Eq. (79). The spectrum most dissimilar from x_s1 is selected as x_s2 and added to matrix Y_i (n × 3). This process continues until each spectrum contained in X has been compared with the spectra already selected, including the most dissimilar one in matrix Y_i. Each step is plotted and visualized; a random profile indicates that the number of selected spectra equals the number of components.

    In short, the procedure consists of three steps: (1) comparison of each spectrum of X with all spectra already selected by applying Eq. (79), (2) plotting of the dissimilarity plot and (3) selection of the spectrum with the highest dissimilarity value, including it as a reference in matrix Y_i.
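The three steps above can be sketched in NumPy as follows. This is an illustrative implementation under simplifying assumptions (spectra are placed as columns of Y_i, the dissimilarity is Eq. (79), and no spectrum normalization is applied), not the published OPA code; the function names are hypothetical.

```python
import numpy as np

def dissimilarity(x_i, refs):
    """Eq. (79): d_i = det(Y_i^T Y_i), with Y_i = [refs | x_i] as columns."""
    Y = np.column_stack(refs + [x_i])
    return np.linalg.det(Y.T @ Y)

def opa_select(X, n_select):
    """Stepwise selection of the n_select mutually most dissimilar rows of X."""
    m = X.shape[0]
    refs = [X.mean(axis=0)]              # first reference: the mean spectrum
    selected = []
    for step in range(n_select):
        d = [dissimilarity(X[i], refs) for i in range(m)]
        i_best = int(np.argmax(d))       # highest dissimilarity wins
        selected.append(i_best)
        if step == 0:
            refs = [X[i_best]]           # substitute the mean by x_s1
        else:
            refs.append(X[i_best])       # grow Y_i with each new selection
    return selected
```

On a toy matrix whose rows mix two pure spectra plus one independent spectrum, the procedure first picks the spectrum least correlated with the mean and then the one most dissimilar from it.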

    N. Kumar et al. / Talanta 123 (2014) 186–199

    Then, the concentration is found by a least-squares procedure using the following equation:

    C = X · S · (S^T · S)^(−1)    (80)

    where S represents the spectral matrix. The concentration profile obtained from least squares is subjected to the constraints of non-negativity and unimodality. A new set of spectra (matrix S) is then obtained by the following equation:

    S = X^T · C · (C^T · C)^(−1)    (81)

    Then, the sum of squared residuals (SSR) is calculated:

    R = X − C · S^T    (82)

    SSR = Σ_{i=1}^{m} Σ_{j=1}^{n} r_{ij}^2    (83)

    where r_ij is the difference between the measured and the predicted absorbance values at time i and wavelength j. This process continues until the relative difference in the SSR between two successive iterations falls below a pre-defined threshold.
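The alternating cycle of Eqs. (80)–(83) can be written compactly in NumPy. The sketch below is a minimal illustration under stated assumptions (a simple non-negativity clip as the only constraint, a pseudo-inverse for numerical safety, and an SSR-based stopping rule as in the text); it is not the authors' software, and the function name is hypothetical.

```python
import numpy as np

def als_resolve(X, S0, n_iter=100, tol=1e-8):
    """Alternating least squares following Eqs. (80)-(83).

    X  : (m x n) data matrix (rows: spectra in time; cols: wavelengths)
    S0 : (n x k) initial estimate of the k pure spectra (e.g. from OPA)
    """
    S = S0.copy()
    sse_prev = np.inf
    for _ in range(n_iter):
        # Eq. (80): C = X S (S^T S)^-1, then the non-negativity constraint
        C = X @ S @ np.linalg.pinv(S.T @ S)
        C = np.maximum(C, 0.0)
        # Eq. (81): S = X^T C (C^T C)^-1, also constrained non-negative
        S = X.T @ C @ np.linalg.pinv(C.T @ C)
        S = np.maximum(S, 0.0)
        # Eqs. (82)-(83): residual matrix and its sum of squares
        R = X - C @ S.T
        sse = float(np.sum(R ** 2))
        # stop when the relative SSR change between iterations is small
        if abs(sse_prev - sse) / max(sse_prev, 1e-30) < tol:
            break
        sse_prev = sse
    return C, S, sse
```

Starting from a slightly perturbed spectral estimate, the loop recovers an exactly bilinear, non-negative data matrix with a very small residual.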

    4. Application of chemometric tools in analytical chemistry

    Analytical applications of chemometric techniques are given in Table 1, which shows the use of multivariate data generated with the aid of analytical techniques. The data are a compilation of the literature available in Scopus [82] covering research carried out using chemometric tools. This comprehensive review focuses on the application of various chemometric techniques from their advent to the present era. In this article we have tried to incorporate the maximum research work dealing with multivariate data analysis in the field of analytical chemistry (i.e. plant and herbal medicines and mixtures/miscellaneous compounds) and classified

    Table 1. Chemometric methods used in analytical techniques, classified on the basis of herbal medicines and mixtures of miscellaneous compounds.

    Analytical technique: chemometric method [plant and herbal medicine refs] / [mixtures and miscellaneous compounds refs]

    UV spectrometry: TLRC [83]; MLRC [83]; CLS [83,84,86–88,92,96,99,100,102–104]; ILS [19,84–88,99,102–104]; PCR [19,84–86,88,89,92–96,99–104,106]; PLSR [84,88–106,145]
    HPLC-DAD: PCA [107–110,112,113,115,146–151] / [120]; CLS, PLS, PCR [154]; PLS [107,112]; PARAFAC [136,152] / [119,135,155–157]; HCA [108,110,125] / [147]; SIMCA [112,129]; RAFA [123,124,126,127,158]; GRAM [121,122,135,159–161]; EFA [138,151]; FSMW-EFA [130]; WFA [138]; OPA [130,138]; MCR-ALS [136] / [120,128,130,131,135,155,162]; PCA-LLS [153]
    LC–MS: PARAFAC [155]; MCR-ALS [132,133]
    LC–NMR: EFA, SFA, ITTFA, MCR-ALS, OPA, AUTOWFA [139]; PCA [163]
    UPLC-DAD: PCA, HCA [111,134]
    UPLC–MS: PCA [164]
    GC–MS: PCA [117] / [165]; GRAM [141]; PARAFAC [140,165]
    GC×GC–MS: PARAFAC [137]; PLS-DA [166]
    HS-SPME–GC–MS: PCA [114]
    NIR: PLS, PCA [118]; SIMCA [50]
    FT-IR: PLS-DA [167,168]; PCA [116,169]
    Raman spectroscopy: PCA, SIMCA [170]
    TLC-FD: MCR-ALS [171]
    Spectrofluorimetry: PLS [172]; PARAFAC [173]

    CLS: classical least squares; EFA: evolving factor analysis; FSMW-EFA: fixed size moving window evolving factor analysis; FT-IR: Fourier transform infrared; GC–MS: gas chromatography–mass spectrometry; GC×GC–MS: two-dimensional gas chromatography–mass spectrometry; GRAM: generalized rank annihilation method; HCA: hierarchical cluster analysis; HPLC-DAD: high performance liquid chromatography with diode array detection; HS-SPME–GC–MS: head-space solid-phase microextraction coupled with gas chromatography–mass spectrometry; ILS: inverse least squares; ITTFA: iterative target transformation factor analysis; LC–MS: liquid chromatography–mass spectrometry; LC–NMR: liquid chromatography–nuclear magnetic resonance; LLS: local least squares; MCR-ALS: multivariate curve resolution–alternating least squares; MLRC: multi-linear regression-calibration; NIR: near infrared; OPA: orthogonal projection analysis; PARAFAC: parallel factor analysis; PCA: principal component analysis; PCR: principal component regression; PLS: partial least squares; PLS-DA: partial least squares discriminant analysis; PLSR: partial least squares regression; RAFA: rank annihilation factor analysis; SFA: subwindow factor analysis; SIMCA: soft independent modeling of class analogy; TLC-FD: thin layer chromatography with fluorescence detection; TLRC: tri-linear regression-calibration; UPLC-DAD: ultra-performance liquid chromatography with diode array detection; UPLC–MS: ultra-performance liquid chromatography with mass spectrometry; WFA: window factor analysis.


    them according to the analytical and chemometric techniques used. This section describes them on the basis of the analytical results obtained.

    TLRC and MLRC methods are used for the multi-resolution of a ternary mixture [83]. These methods have proved useful, with very simple mathematical content, for the multi-resolution of three-component mixture systems, and show accuracy in the range 99–101% for a mixture of highly overlapping spectra. Various approaches such as CLS, ILS, PCR and PLSR have been used for the resolution of multi-component mixtures using a UV spectrophotometer; their results are comparable with HPLC methods, and the accuracy was found to be in the range 98–103% [19,84–106]. A six-component mixture was resolved by PLS and PCR, and the obtained results show good agreement with the HPLC method [95].

    Two-wavelength HPLC fingerprinting was applied to the quality assessment of 46 Cassia seed samples [107]. Based on this method, the roasted COS, raw COS, roasted CTS and raw CTS samples were discriminated by PCA calculation. Moreover, the PLS prediction models produced satisfactory results on the test set. Ten triterpenoid acids were simultaneously determined by the HPLC-DAD method in fruits of Ziziphus jujuba [108], and the method was successfully applied to 42 samples. HPLC fingerprint analysis with chemometric methods was used for species differentiation, quality evaluation and consistency checking of Radix Paeoniae collected from different sources [109]. The chemometric methods, including HCA and PCA, proved satisfactory for matching and discrimination of Artemisia selengensis [110] and Rhizoma Coptidis samples [111]. HCA, PCA, PLS-DA and SIMCA were able to classify samples of Ganoderma lucidum successfully in accordance with the province of origin [112]. An HPLC-PCA assay is used to differentiate notoginseng root extract from the extract of other plant parts of notoginseng, and also from the extract of Asian or American ginseng plant parts [113].

    HS-SPME/GC–MS analysis of the volatile profile of Balsamic Vinegar of Modena, coupled with statistical data analysis by means of PCA, has shown this method to be useful for discriminating Balsamic Vinegar of Modena samples with different maturation and aging characteristics [114]. Cnidium monnieri fruits obtained from different regions of China have been clustered reasonably into different groups based on coumarin content by principal component analysis (PCA) and cluster analysis [115]. Adulteration of cod liver oil with selected vegetable oils (CaO, CO, SO and WO) can be monitored with FTIR spectroscopy, and by using PLS, normal spectra can successfully be used to detect the level of oil adulterants [116]. Scutellaria barbata from different origins was studied by GC–MS, the results were evaluated by principal component analysis, and it was found that this method was useful for the discrimination of its adulterants [117]. NIR spectroscopy combined with multivariate analysis was used for the simultaneous quantification of phosphatidylcholine and cholesterol in liposomes [118]. PARAFAC was used as a technique for resolving partly separated peaks of lidocaine and prilocaine into their pure chromatographic, spectral and concentration profiles [119]. HPLC-DAD combined with MCR-ALS is used for impurity profiling in order to obtain a good estimate of the content or relative response factors of small chromatographic impurity peaks without knowledge of their molar absorption coefficients and without any pre-calibration [120]. GRAM proved to be very useful for quantifying contaminants in complex marine samples [121]. GRAM was used to quantify aromatic sulfonates in water with HPLC when interfering components co-eluted with the analytes of interest; concentrations of the analytes of interest were determined more quickly because complete resolution is not required [122].
    The RAFA procedure was used to estimate the model parameters in a complex gray chemical system when there was not enough information about the whole process [123]. Mean centering of ratio spectra, using the spectrum of the absorbing reagent as divisor, combined with RAFA is used to detect the contribution of one chemical component in an unknown sample [124]. The HPLC-DAD method was developed to evaluate the quality of Receptaculum Nelumbinis (dried receptacle of Nelumbo nucifera) through establishing a chromatographic fingerprint and the simultaneous determination of five flavonol glycosides, including hyperoside, isoquercitrin, quercetin-3-O-β-D-glucuronide, isorhamnetin-3-O-β-D-galactoside and syringetin-3-O-β-D-glucoside [125]. RAFA for quantitative analysis of multi-component fluorescence data, acquired in the form of an excitation–emission matrix (EEM) by the video fluorimeter, is demonstrated using 10 different samples of a six-component polynuclear aromatic hydrocarbon solution whose constituents have a wide range of fluorescence quantum efficiencies and spectral overlaps [126]. A simultaneous multi-component rank annihilation method was applied to a set of six-component polynuclear aromatic hydrocarbons [127]. Liquid–solid extraction coupled to LC-DAD was applied for the quantitation of the co-eluted organophosphorus pesticides fenitrothion, azinphos-ethyl, diazinon, fenthion and parathion-ethyl, with data evaluation by multivariate self-modeling curve resolution; this shows improved resolution of the co-eluted organophosphorus insecticides and their quantitation at trace level [128]. PCA, SIMCA and HCA were applied to HPLC fingerprints of Epimedium wushanense in order to identify and distinguish their secondary metabolites [129].

    HPLC-DAD was used for the analysis of a tetracycline hydrochloride sample, and the data obtained were resolved by OPA and FSMW-EFA into pure spectra and individual concentration profiles. This method was successfully applied to the detection of impurities in the sample [130]. The use of MCR-ALS was evaluated in the analysis of complex biocide environmental sample mixtures by LC-DAD, and multivariate curve resolution has been shown to be a powerful chemometric tool to overcome the lack of chromatographic resolution and the strong co-elution problems encountered in the analysis of complex biocide mixtures in environmental samples [131]. The resolution and quantitative determination of a mixture of co-eluted pesticides (carbofuran, propoxur and pirimicarb) in LC–MS by MCR are shown; carbofuran and pirimicarb exhibit similar mass spectra and co-elute, and they are distinguished by the proposed MCR method [132]. Mixtures of multiple biocide compounds were simultaneously analyzed in standard mixtures and in environmental samples with little sample pretreatment using LC-DAD; all biocide compounds were properly resolved by MCR-ALS and quantitatively analyzed, with the estimated errors always below 20% [133]. Ultra-performance liquid chromatography–quadrupole time-of-flight mass spectrometry (UPLC-QTOF/MS) and multivariate statistical analysis were used to investigate the processing technology of loquat (Eriobotrya japonica) leaf (pipaye, PPY). The differences between samples processed by different methods were revealed by unsupervised principal component analysis (PCA) [134].

    GRAM, PARAFAC and MCR-ALS were able to quantify overlapped and highly drifted chromatographic profiles. Such profiles can be found in the determination of compounds at very low concentrations in natural samples [135]. MCR-ALS was evaluated in the analysis of nine phenolic acids, both in standard mixture samples and in strawberry juice samples, by LC-DAD [136].

    In [137], 100% of 12 selected trimethylsilyl-derivatized organic acid metabolites in human infant urine were located with the Dot Map algorithm. Vanillic acid (TMS) was located by Dot Map but also exhibited overlap with other organic acids. The presence of vanillic acid (TMS) was confirmed by PARAFAC, yielding pure component information suitable for subsequent quantification. In [138], OPA is used for the determination of the number of compounds present in a multi-component system. The performance of the OPA algorithm is compared with that of two window-based self-modeling curve


    resolution approaches: EFA and WFA. The results obtained with the EFA method are slightly better than those with OPA.

    Multivariate curve resolution methods are classified, and the need to check the applicability of various curve resolution methods to data obtained with different types of instruments is highlighted [139]. It is demonstrated that LC–NMR data can be resolved if NMR peak cluster information is utilized. PARAFAC is proposed in [140] for the alignment of LC–MS data in the chromatographic direction when moderate shifts are present in the data. In [141] a semi-automated approach is proposed for the resolution and quantification of unresolved target analytes with GC–MS. In this paper two methods were utilized to correct retention-time shifts before the GRAM method was applied. When an environmental sample of 1,4-dithiacyclohexane was analyzed, the GRAM-calculated concentration had an error of about 10%.

    From Table 1 we conclude that the applicability of chemometric approaches has increased rapidly. Thousands of research papers across these fields used chemometric data-processing methods during the period 2008–2013. There has been increased use of multivariate methods across broad ranges of scientific disciplines, and these tools have become rather standard training in most graduate-level scientific curricula.

    In the past, one had to code these algorithms oneself in order to use them; this has now become straightforward because of the more powerful computer software packages available today, which provide standard implementations of the algorithms. Multivariate analysis methods are now standard for basic and applied research.

    The application of hyphenated techniques with the aid of chemometrics for fingerprint analysis can be very useful, and the data collected are analyzed in the form of two-dimensional matrices. The results obtained from the matrices allow a better discrimination of the samples depending on the two variables measured. Chemometrics also allows the utilization of more than one analytical method in order to find the correlation between different variables. Chemometrics is an efficient and powerful tool for the quality control of different herbs and plants [142,143]. In [144] Lavine describes the importance of chemometrics in routine analysis and also cites various articles published during 2008–2009.

    5. Conclusion

    To evaluate the fingerprint of complex herbal products, hyphenation with chromatography offers a powerful tool for separating the individual components. After the analysis is accomplished, various data-handling methods are used to extract the useful information that resides within the generated data. Therefore, we have stressed the various multivariate methods that are used to extract the information in the data. First, various multivariate regression methods are discussed, which are used to explain the variation in a data set by latent variables. These methods are very useful, and even necessary, in all branches of chemistry, and the applicability of chemometrics to very complicated problems makes the life of a chemometrician very challenging; they seem to work at the border of the impossible. Then various multivariate resolution methods are explained, which provide a linear model for individual component contributions using only the raw experimental measurements.

    The advancement of multivariate methods is continuous, fast and efficient. With the improvements in exploratory tools, the adaptability to the analysis of complex data, the quality control of herbal drugs and the assessment of the results, an important increase in the range of application of these methods, and a more generalized and standardized use of multivariate methods by the analytical community, can be envisioned.

    Acknowledgments

    We are grateful to Shri Parveen Garg (Chairman), I.S.F. College of Pharmacy, Ferozepur Road, Moga 142001, India, for the necessary support and motivation. The authors gratefully acknowledge PSCST of Punjab (project PSO/POS/80/9334) for the financial support and ISFCP, Moga, for supporting this work.

    References

    [1] M. Ortiz, L. Sarabia, J. Chromatogr. A 1158 (2007) 94–110.
    [2] L. Xu, L.J. Tang, C.B. Cai, H.L. Wu, G.L. Shen, R.Q. Yu, J.H. Jiang, Anal. Chim. Acta 613 (2008) 121–134.
    [3] A. Siouffi, R. Phan-Tan-Luu, J. Chromatogr. A 892 (2000) 75–106.
    [4] P. Vanbel, J. Pharm. Biomed. Anal. 21 (1999) 603–610.
    [5] Y.Z. Liang, P. Xie, K.C. Chan, J. Chromatogr. B 812 (2004) 53–70.
    [6] R. Kiralj, M. Ferreira, J. Chemometr. 20 (2006) 247–272.
    [7] D. Jing, H. Linfang, J. Med. Plants Res. 5 (2006) 4001–4008.
    [8] L.S. Ramos, K.R. Beebe, W.P. Carey, E. Sanchez, M. Erickson, B.E. Wilson, L.E. Wangen, B.R. Kowalski, Anal. Chem. 58 (1986) 294–315.
    [9] P.K. Hopke, Anal. Chim. Acta 500 (2003) 365–377.
    [10] E. Dinç, J. Pharm. Biomed. Anal. 33 (2003) 605–615.
    [11] K. Wiberg, Multivariate Spectroscopic Methods for the Analysis of Solutions, Stockholm, Sweden, 2004.
    [12] E. Dinç, A. Özdemir, Chem. Pharm. Bull. 53 (2005) 899–906.
    [13] E. Dinç, A. Özdemir, Die Pharmazie – Int. J. Pharm. Sci. 59 (2004) 700–705.
    [14] F.G.K. Baucke, R. Naumann, C. Alexander-Weber, Anal. Chem. 65 (1993) 3244–3251.
    [15] P. Geladi, Spectrochim. Acta B 58 (2003) 767–782.
    [16] S. Frans, J. Harris, Anal. Chem. 57 (1985) 2680–2684.
    [17] E.V. Thomas, Anal. Chem. 66 (1994) 795–804.
    [18] P. Geladi, B.R. Kowalski, Anal. Chim. Acta 185 (1986) 1–17.
    [19] E. Dinç, E. Baydan, M. Kanbur, F. Onur, Talanta 58 (2002) 579–594.
    [20] G. Musumarra, M. Fichera, Chemometr. Intell. Lab. 44 (1998) 363–372.
    [21] S. Wold, M. Sjöström, L. Eriksson, Chemometr. Intell. Lab. 58 (2001) 109–130.
    [22] D.M. Haaland, E.V. Thomas, Anal. Chem. 60 (1988) 1193–1202.
    [23] H. Swierenga, P. De Groot, A. De Weijer, M. Derksen, L. Buydens, Chemometr. Intell. Lab. 41 (1998) 237–248.
    [24] R. Manne, Chemometr. Intell. Lab. 2 (1987) 187–197.
    [25] U. Indahl, J. Chemometr. 19 (2005) 32–44.
    [26] D.S. Cao, Q.S. Xu, Y.Z. Liang, X. Chen, H.D. Li, J. Chemometr. 24 (2010) 584–595.
    [27] S. García-Muñoz, M. Polizzi, Chemometr. Intell. Lab. 114 (2012) 116–121.
    [28] S. Wold, K. Esbensen, P. Geladi, Chemometr. Intell. Lab. 2 (1987) 37–52.
    [29] R. Tauler, E. Casassas, J. Chemometr. 3 (1989) 151–161.
    [30] J. Camacho, J. Picó, A. Ferrer, Chemometr. Intell. Lab. 100 (2010) 48–56.
    [31] B.M. Wise, N.B. Gallagher, J. Proc. Control 6 (1996) 329–348.
    [32] A. Duarte, S. Capelo, J. Liq. Chromatogr. Relat. Technol. 29 (2006) 1143–1176.
    [33] U. Depczynski, V. Frost, K. Molt, Anal. Chim. Acta 420 (2000) 217–227.
    [34] J. Mandel, Am. Stat. 36 (1982) 15–24.
    [35] S. Wold, M. Sjöström, Chemometr. Intell. Lab. 44 (1998) 3–14.
    [36] I. Jolliffe, Principal Component Analysis, Wiley Online Library, U.K., 2005.
    [37] R. Bro, Chemometr. Intell. Lab. 38 (1997) 149–171.
    [38] B.D. Marx, P.H.C. Eilers, B. Li, Chemometr. Intell. Lab. 109 (2011) 120–130.
    [39] J.D. Carroll, J.J. Chang, Psychometrika 35 (1970) 283–319.
    [40] J.B. Kruskal, Linear Algebra Appl. 18 (1977) 95–138.
    [41] J.M.F. ten Berge, Least Squares Optimization in Multivariate Analysis, DSWO Press, Leiden University, Leiden, The Netherlands, 1993.
    [42] O. Abollino, M. Malandrino, A. Giacomino, E. Mentasti, Anal. Chim. Acta 688 (2011) 104–121.
    [43] D.L. Massart, B.G.M. Vandeginste, L. Buydens, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier Science, Amsterdam, The Netherlands, 1997.
    [44] B.G.M. Vandeginste, D.L. Massart, Handbook of Chemometrics and Qualimetrics: Part B, Elsevier Science, Amsterdam, The Netherlands, 1998.
    [45] K.R. Beebe, R.J. Pell, M.B. Seasholtz, Chemometrics: A Practical Guide.
    [46] R.G. Brereton, Chemometrics: Data Analysis for the Laboratory and Chemical Plant, Wiley, West Sussex, England, 2003.
    [47] S. Wold, Pattern Recognition 8 (1976) 127–139.
    [48] W.J. Dunn III, S. Wold, Chemometric Methods in Molecular Design, VCH Publishers, New York, 2008, pp. 179–193.
    [49] B. Mertens, M. Thompson, T. Fearn, Analyst 119 (1994) 2777–2784.
    [50] R. De Maesschalck, A. Candolfi, D. Massart, S. Heuerding, Chemometr. Intell. Lab. 47 (1999) 65–77.
    [51] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, CA, USA, 1990.
    [52] S. Axler, Linear Algebra Done Right, Springer, San Francisco, CA, USA, 1997.
    [53] A. De Juan, R. Tauler, Anal. Chim. Acta 500 (2003) 195–210.
    [54] W.H. Lawton, E.A. Sylvestre, Technometrics 13 (1971) 617–633.
    [55] E. Sylvestre, W. Lawton, M. Maggio, Technometrics 16 (1974) 353–368.
    [56] C. Ho, G. Christian, E. Davidson, Anal. Chem. 50 (1978) 1108–1113.
    [57] A. Lorber, Anal. Chem. 57 (1985) 2395–2397.
    [58] N. Faber, L. Buydens, G. Kateman, J. Chemometr. 8 (2005) 147–154.
    [59] E. Sanchez, B.R. Kowalski, Anal. Chem. 58 (1986) 496–499.
    [60] M. Maeder, Anal. Chem. 59 (1987) 527–530.
    [61] C.J. Xu, J.H. Jiang, Y.Z. Liang, Analyst 124 (1999) 1471–1476.
    [62] A.C. Whitson, M. Maeder, J. Chemometr. 15 (2001) 475–484.
    [63] H. Keller, D. Massart, Anal. Chim. Acta 246 (1991) 379–390.
    [64] J. Toft, O.M. Kvalheim, Chemometr. Intell. Lab. 19 (1993) 65–73.
    [65] O.M. Kvalheim, Y.Z. Liang, Anal. Chem. 64 (1992) 936–946.
    [66] Y.Z. Liang, O.M. Kvalheim, H.R. Keller, D.L. Massart, P. Kiechle, F. Erni, Anal. Chem. 64 (1992) 946–953.
    [67] R. Manne, H. Shen, Y. Liang, Chemometr. Intell. Lab. 45 (1999) 171–176.
    [68] R. Manne, Chemometr. Intell. Lab. 27 (1995) 89–94.
    [69] E.R. Malinowski, J. Chemometr. 6 (2005) 29–40.
    [70] W. Den, E.R. Malinowski, J. Chemometr. 7 (2005) 89–98.
    [71] R. Tauler, I. Marqués, E. Casassas, J. Chemometr. 12 (1998) 55–75.
    [72] R. Tauler, A. Smilde, B. Kowalski, J. Chemometr. 9 (2005) 31–58.
    [73] R. Tauler, Chemometr. Intell. Lab. 30 (1995) 133–146.
    [74] O.S. Borgen, B.R. Kowalski, Anal. Chim. Acta 174 (1985) 1–26.
    [75] B.G.M. Vandeginste, W. Derks, G. Kateman, Anal. Chim. Acta 173 (1985) 253–264.
    [76] S. Wold, Technometrics 20 (1978) 397–405.
    [77] P. Geladi, S. Wold, Chemometr. Intell. Lab. 2 (1987) 273–281.
    [78] F.C. Sanchez, J. Toft, B. Van den Bogaert, D. Massart, Anal. Chem. 68 (1996) 79–85.
    [79] F.C. Sanchez, B. Vandeginste, T. Hancewicz, D. Massart, Anal. Chem. 69 (1997) 1477–1484.
    [80] F.C. Sanchez, M. Khots, D. Massart, Anal. Chim. Acta 290 (1994) 249–258.
    [81] F.C. Sanchez, M. Khots, D. Massart, J. De Beer, Anal. Chim. Acta 285 (1994) 181–192.
    [82] Scopus, an abstract and citation database of research literature, http://www.scopus.com/, 2013.
    [83] E. Dinç, J. Pharm. Biomed. Anal. 33 (2003) 605–615.
    [84] E. Dinç, Ö. Üstündağ, Il Farmaco 58 (2003) 1151–1161.
    [85] E. Dinç, D. Baleanu, F. Onur, J. Pharm. Biomed. Anal. 26 (2001) 949–957.
    [86] E. Dinç, C. Yücesoy, F. Onur, J. Pharm. Biomed. Anal. 28 (2002) 1091–1100.
    [87] E. Dinç, İ.M. Palabıyık, Ö. Üstündağ, F. Yurtsever, F. Onur, J. Pharm. Biomed. Anal. 28 (2002) 591–600.
    [88] E. Dinç, Ö. Üstündağ, J. Pharm. Biomed. Anal. 29 (2002) 371–379.
    [89] A. El-Gindy, S. Emara, A. Mostafa, Il Farmaco 60 (2005) 269–278.
    [90] M.S. Cámara, C. Mastandrea, H.C. Goicoechea, J. Biochem. Biophys. Methods 64 (2005) 153–166.
    [91] P.C. Damiani, A.C. Moschetti, A.J. Rovetto, F. Benavente, A.C. Olivieri, Anal. Chim. Acta 543 (2005) 192–198.
    [92] G.M. Hadad, A. El-Gindy, W.M.M. Mahmoud, Spectrochim. Acta A 70 (2008) 655–663.
    [93] Y. Ni, Y. Wang, S. Kokot, Food Chem. 109 (2008) 431–438.
    [94] A. El-Gindy, F. El-Yazby, A. Mostafa, M.M. Maher, J. Pharm. Biomed. Anal. 35 (2004) 703–713.
    [95] A. El-Gindy, S. Emara, A. Mostafa, J. Pharm. Biomed. Anal. 41 (2006) 421–430.
    [96] Y. Ni, Y. Wang, S. Kokot, Spectrochim. Acta A 70 (2008) 1049–1059.
    [97] J. Ghasemi, A. Niazi, S. Ghobadi, J. Chin. Chem. Soc. Taipei 52 (2005) 1049–1054.
    [98] N. Ramos Martos, A. Molina Díaz, A. Navalón, I. De Orbe Payá, L. Capitán Vallvey, J. Pharm. Biomed. Anal. 23 (2000) 837–844.
    [99] D. Nagavalli, V. Vaidhyalingam, A. Santha, A.S.K. Sankar, O. Divya, Acta Pharm. 60 (2010) 141–152.
    [100] M.C.F. Ferraro, P.M. Castellano, T.S. Kaufman, J. Pharm. Biomed. Anal. 34 (2004) 305–314.
    [101] A. Ruiz Medina, M. Fernandez de Cordova, A. Molina-Diaz, J. Pharm. Biomed. Anal. 21 (1999) 983–992.
    [102] A.S.K. Sankar, T. Vetrichelvan, D. Venkappaya, Acta Pharm. 61 (2011) 283–296.
    [103] K. Vipul, M. Rajshree, Anal. Sci. 23 (2007) 445–451.
    [104] K. Kaur, A.K. Malik, B. Singh, M. Godarzi, Thai J. Pharm. Sci. 33 (2009) 123–136.
    [105] M. Luis, J. Fraga, F. Jiménez, A. Jiménez, J. Arias, Talanta 53 (2001) 761–770.
    [106] R. Bhaskar, M.K. Sagar, V. Saini, K. Bhat, J. Anal. Sci. Methods Instrum. 2 (2012) 161–166.
    [107] Y.H. Lai, Y.N. Ni, S. Kokot, Chin. Chem. Lett. 21 (2010) 213–216.
    [108] S. Guo, J. Duan, Y. Tang, S. Su, E. Shang, S. Ni, D. Qian, J. Pharm. Biomed. Anal. 49 (2009) 1296–1302.
    [109] S. Xu, L. Yang, R. Tian, Z. Wang, Z. Liu, P. Xie, Q. Feng, J. Chromatogr. A 1216 (2009) 2163–2168.
    [110] L. Peng, Y. Wang, H. Zhu, Q. Chen, Food Chem. 125 (2011) 1064–1071.
    [111] W.J. Kong, Y.L. Zhao, X.H. Xiao, C. Jin, Z.L. Li, Phytomedicine 16 (2009) 950–959.
    [112] Y. Chen, S.B. Zhu, M.Y. Xie, S.P. Nie, W. Liu, C. Li, X.F. Gong, Y.X. Wang, Anal. Chim. Acta 623 (2008) 146–156.
    [113] H.L. Ma, M.J. Qin, L.W. Qi, G. Wu, P. Shu, Biomed. Chromatogr. 21 (2007) 931–939.
    [114] M. Cirlini, A. Caligiani, L. Palla, G. Palla, Food Chem. 124 (2011) 1678–1683.
    [115] F. Yan, Z. Liang, C. Jianna, W. Zhengtao, X. Losahan, Z. Zhengxing, Talanta 53 (2001) 1155–1162.
    [116] A. Rohman, Y.B. Che Man, Vib. Spectrosc. 55 (2011) 141–145.
    [117] R. Pan, F. Guo, H. Lu, W. Feng, J. Pharm. Biomed. Anal. 55 (2011) 391–396.
    [118] A. Porfire, I. Tomuta, L. Tefas, S.E. Leucuta, M. Achim, J. Pharm. Biomed. Anal. 63 (2012) 87–94.
    [119] K. Wiberg, S.P. Jacobsson, Anal. Chim. Acta 514 (2004) 203–209.
    [120] K. Wiberg, J. Chromatogr. A 1108 (2006) 50–67.
    [121] R. Gimeno, E. Comas, R. Marcé, J. Ferré, F. Rius, F. Borrull, Anal. Chim. Acta 498 (2003) 47–53.
    [122] E. Comas, R.A. Gimeno, J. Ferré, R.M. Marcé, F. Borrull, F.X. Rius, J. Chromatogr. A 988 (2003) 277–284.
    [123] H. Abdollahi, A. Golshan, Anal. Chim. Acta 693 (2011) 26–34.
    [124] M. Bahram, M. Mabhooti, Anal. Chim. Acta 639 (2009) 19–28.
    [125] Y.-B. Wu, L.-J. Zheng, J. Yi, J.-G. Wu, T.-Q. Chen, J.-Z. Wu, Int. J. Mol. Sci. 14 (2013) 1999–2010.
    [126] C. Ho, G. Christian, E. Davidson, Anal. Chem. 52 (1980) 1071–1079.
    [127] C.N. Ho, G.D. Christian, E. Davidson, Anal. Chem. 53 (1981) 92–98.
    [128] R. Tauler, S. Lacorte, D. Barceló, J. Chromatogr. A 730 (1996) 177–183.
    [129] L. Wang, X. Wang, L. Kong, Biochem. Syst. Ecol. 40 (2012) 138–145.
    [130] K. De Braekeleer, A. De Juan, D. Massart, J. Chromatogr. A 832 (1999) 67–86.
    [131] E. Peré-Trepat, A. Hildebrandt, D. Barceló, S. Lacorte, R. Tauler, Chemometr. Intell. Lab. 74 (2004) 293–303.
    [132] J. Salau, M. Honing, R. Tauler, D. Barceló, J. Chromatogr. A 795 (1998) 3–12.
    [133] E. Peré-Trepat, S. Lacorte, R. Tauler, J. Chromatogr. A 1096 (2005) 111–122.
    [134] L. Wu, X. Jiang, L. Huang, S. Chen, PLoS ONE 8 (2013) e64178.
    [135] E. Comas, R.A. Gimeno, J. Ferré, R.M. Marcé, F. Borrull, F.X. Rius, J. Chromatogr. A 1035 (2004) 195–202.
    [136] S. Mas, G. Fonrodona, R. Tauler, J. Barbosa, Talanta 71 (2007) 1455–1463.
    [137] A.E. Sinha, J.L. Hope, B.J. Prazen, E.J. Nilsson, R.M. Jack, R.E. Synovec, J. Chromatogr. A 1058 (2004) 209–215.
    [138] F. Cuesta Sanchez, S. Rutan, M. Gil Garcia, D. Massart, Chemometr. Intell. Lab. 36 (1997) 153–164.
    [139] M. Wasim, R.G. Brereton, J. Chromatogr. A 1096 (2005) 2–15.
    [140] D. Bylund, R. Danielsson, G. Malmquist, K.E. Markides, J. Chromatogr. A 961 (2002) 237–244.
    [141] C.G. Fraga, J. Chromatogr. A 1019 (2003) 31–42.
    [142] H.A. Gad, S.H. El-Ahmady, M.I. Abou-Shoer, M.M. Al-Azizi, Phytochem. Anal. 24 (2012) 1–24.
    [143] C. Tistaert, B. Dejaegher, Y.V. Heyden, Anal. Chim. Acta 690 (2011) 148–161.
    [144] B. Lavine, J. Workman, Anal. Chem. 82 (2010) 4699–4711.
    [145] A. Niazi, A. Azizi, Turk. J. Chem.

    63 (2012) 8794.[119] K. Wiberg, S.P. Jacobsson, Anal. Chim. Acta 514 (2004) 203209.[120] K. Wiberg, J. Chromatogr. A 1108 (2006) 5067.[121] R. Gimeno, E. Comas, R. Marc, J. Ferr, F. Rius, F. Borrull, Anal. Chim. Acta 498

    (2003) 4753.[122] E. Comas, R.A. Gimeno, J. Ferr, R.M. Marc, F. Borrull, F.X. Rius, J. Chromatogr.

    A 988 (2003) 277284.[123] H. Abdollahi, A. Golshan, Anal. Chim. Acta 693 (2011) 2634.[124] M. Bahram, M. Mabhooti, Anal. Chim. Acta 639 (2009) 1928.[125] Y.-B. Wu, L.-J. Zheng, J. Yi, J.-G. Wu, T.-Q. Chen, J.-Z. Wu, Int. J. Mol. Sci. 14

    (2013) 19992010.[126] C. Ho, G. Christian, E. Davidson, Anal. Chem. 52 (1980) 10711079.[127] C.N. Ho, G.D. Christian, E. Davidson, Anal. Chem. 53 (1981) 9298.[128] R. Tauler, S. Lacorte, D. Barcel, J. Chromatogr. A 730 (1996) 177183.[129] L. Wang, X. Wang, L. Kong, Biochem. Syst. Ecol. 40 (2012) 138145.[130] K. De Braekeleer, A. De Juan, D. Massart, J. Chromatogr. A 832 (1999) 6786.[131] E. Per-Trepat, A. Hildebrandt, D. Barcel, S. Lacorte, R. Tauler, Chemometr.

    Intell. Lab. 74 (2004) 293303.[132] J. SalauM. HoningR. TaulerD. Barcelo, J. Chromatogr. A 795 (1998) 312.[133] E. Per-Trepat, S. Lacorte, R. Tauler, J. Chromatogr. A 1096 (2005) 111122.[134] L. Wu, X. Jiang, L. Huang, S. Chen, PLoS ONE 8 (2013) e64178.[135] E. Comas, R.A. Gimeno, J. Ferr, R.M. Marc, F. Borrull, F.X. Rius, J. Chromatogr.

    A 1035 (2004) 195202.[136] S. Mas, G. Fonrodona, R. Tauler, J. Barbosa, Talanta 71 (2007) 14551463.[137] A.E. Sinha, J.L. Hope, B.J. Prazen, E.J. Nilsson, R.M. Jack, R.E. Synovec,

    J. Chromatogr. A 1058 (2004) 209215.[138] F. Cuesta Sanchez, S. Rutan, M. Gil Garcia, D. Massart, Chemometr. Intell. Lab.

    36 (1997) 153164.[139] M. Wasim, R.G. Brereton, J. Chromatogr. A 1096 (2005) 215.[140] D. Bylund, R. Danielsson, G. Malmquist, K.E. Markides, J. Chromatogr. A 961

    (2002) 237244.[141] C.G. Fraga, J. Chromatogr. A 1019 (2003) 3142.[142] H.A. Gad, S.H. El-Ahmady, M.I. AbouShoer, M.M. AlAzizi, Phytochem. Anal.

    24 (2012) 124.[143] C. Tistaert, B. Dejaegher, Y.V. Heyden, Anal. Chim. Acta 690 (2011) 148161.[144] B. Lavine, J. Workman, Anal. Chem. 82 (2010) 46994711.[145] A. Niazi, A. Azizi, Turk. J. Chem.