22
Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras MG Brasil Jul/2014

Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

  • Upload
    vannhan

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

Chemoface

User guide

Cleiton A. Nunes

Universidade Federal de Lavras

Lavras – MG – Brasil

Jul/2014

Page 2: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

2

Contents

Installing 3 Installation on Microsoft Windows 3 Installation on Unix-like systems 3 Updating 3 Running 3 Experimental Design 4 Designing the experiment 4 Inserting the responses 5 Analyzing the design (not applicable to Mixture Design) 5 Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs) 5 Additional features 6 Pattern Recognition 7 Inserting data set 7 Data preprocessing 8 Performing PCA 8 Performing HCA 9 Additional features 10 Multivariate calibration 11 Inserting data set 11 Data preprocessing 11 Selecting test set (optional) 12 Performing leave-one-out cross-validation 12 Performing the calibration (obtaining the model) 12 Performing external validation using test set (optional) 14 Predicting y for new samples 14 Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA) 14 Additional features 14 Data Plot 15 Inserting data set 15 Data preprocessing 15 Plotting data 15 Additional features 16 Data Organization 17 Importing files (only .txt, .dat, .csv, .bmp) 17 Exporting the dataset 19 Additional features 19 Figure handling 20 Saving figure with high resolution 20 Tools on the figure window 21 References 22

Page 3: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

3

Installing

Chemoface1 was developed on the MATLAB

2 environment. It is a stand-alone application and does not

require a MATLAB license installation to run. Indeed, only MATLAB Compiler Runtime (MCR) is required to

be installed, which is freely available along with Chemoface. MCR is a set of shared libraries that provides

complete support for all the features of MATLAB.

Installation on Microsoft Windows

Download and install MCR.

Download and install Chemoface.

Installation on Unix-like systems

Install MCR and Chemoface using the Wine (www.winehq.org).

For better viewing, install the corefonts and enable fontsmooth.

Installing corefonts using Winetricks:

Select the default wineprefix > Install a font > check corefonts > OK

Enabling fontsmooth using Winetricks:

Select the default wineprefix > Change settings > check fontsmooth=rgb > OK

Updating

At Chemoface home screen, access menu Help > Check for update. If there is a new version, download and

install Chemoface as reported above. To update, MCR installation is not required.

Running

Access the modules from Chemoface home screen (Figure 1).

A screen resolution lower than 1024x768 is not recommended.

Chemoface uses dot (.) as decimal mark.

Figure 1. Chemoface home screen

Page 4: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

4

Experimental Design

Figure 2. Designing experiments

Designing the experiment

Select the design type (1).

Select the number of factors (2).

Select the number of replications for the response (dependent variable) (3).

If appropriate, add central point (4) and select the number of replications (5).

If Fractional Factorial is used, select the size of the fraction (6).

If Mixture Design is used, select the Simplex type (7).

Set the label of the response and the labels and levels for the factors (8).

Note: If Full, Fractional, Central Composite or Plackett-Burman is used, provide only values for -1 and +1;

additional levels (e.g. axial and central points) are automatically calculated. If Mixture Design is used,

provide values ranging from 0 to 1; constraints can be used, providing values >0 and/or <1. If Plackett-

Burman is used, dummy factors are just left empty.

Press Design button (9).

Note: It is possible to alternate between coded and uncoded factor levels using menu Edit > Alternate coded-

uncoed variables. The model will be fitted according to the provided levels (coded or uncoded).

1

2

3

4 5

6

7

8 9

10

11 12

13 14

18 19 20 21

15

17 16

22

Page 5: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

5

Inserting the responses

Insert the responses (dependent variable) typing it on the table (10) or paste from a spreadsheet using

menu File > Paste responses.

Analyzing the design (not applicable to Mixture Design)

Select the significance level (11).

Select the mode to calculate the error (12). If selected, error is calculated by replications on central

points (if available). If unselected, the error is calculated by replications in each assay (if available). All

errors are calculated as pure error. If Plackett-Burman is used and it is unselected, error is calculated by

dummy factors.

Press Effects button (13) to obtain the effects table (Figure 3A).

Press Pareto’s Chart button (14) to obtain a Pareto’s graph for the effects (Figure 3B).

A

B Figure 3. Table and graph for the effects

Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs)

Select the model type (15).

Select the significance level (16).

Press Coefficients Stats button (17) to obtain a table with the model coefficients (Figure 4A).

Choose if experimental points should be plotted (18) and if only significant coefficients (19) should be

used for model fit.

Select the graph type (20).

For design with more than two factors, select the factors to be plotted (21) and set the level of the no

plotted factor(s) (Figure 4B).

Press Surface Plot button (22) to obtain the graph and the statistics for the model (Figure 4C-D).

Page 6: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

6

A

B

C

D Figure 4. Tables and graph for the model

Additional features

All obtained tables can be copied using menu Edit > Copy…

A designed experiment can be saved using menu File > Save data. Later, this file can be opened using

menu File > Open data.

Experimental assays for a Mixture design can be plotted using menu +plot > Experimental points.

Page 7: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

7

Pattern Recognition

Figure 5. Pattern recognition by PCA and HCA

Inserting data set

Type data on the table (1) or paste it from a spreadsheet (or from Data Organization module) using

menu File > Paste data. Only numeric data should be inserted. Use sample in rows and variables in

columns. In the example, samples are propolis from different seasons and variables are specific m/z

signals from its mass spectra.3

Note: see Data Organization chapter to detail about how to import data set such as spectra, bitmaps, etc.

If appropriate, insert sample labels using menu Edit > Sample labels or classes. If there are sample

classes, it can be inserted similarly (Figure 6A).

If appropriated, insert variable labels using menu Edit > Variable labels. If data set is spectroscopic, use

> Spectroscopic data (Figure 6C) and insert the spectral range. For all other data set, use > Generic

data (Figure 6B).

Note: if labels are not provided, these are numbered.

1

2 3

4

5

6

7

8

10

16

8

11

12

13 14

15

15

17

Page 8: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

8

A

B

C

D

Figure 6. Inserting sample and variable labels

Data preprocessing

Select the preprocessing method (2).

Press Apply button (3).

Note: Multiple preprocessing methods can be applied just by selecting them and pressing Apply.

To return to the original data after a preprocessing, select “No preprocess” and press Apply.

Performing PCA

Press Run PCA button (4).

Select PC to be plotted on X and Y axis (5). If a 2D plot is used, select PC to be plotted on X and Y

axis and set Z axis in “none”. However, for a 3D plot (Figure 7D), select the PC to be plotted on all 3

axis: X, Y and Z.

Note: If sample classes are provided, they may be colored on the scores plot: select “Colored by sample

classes” (6). Optionally sample labels also can be inserted: just provide the labels when solicited (Figure

6D).

For scores graph, press Scores plot button (7) (Figure 7A).

For loadings graph (Figure 7B), select the plot type: points or vectors (8).

Press Loadings plot button (9)

For a scores and loadings graph (Figure 7C), press Biplot button (10)

Page 9: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

9

A

B

C

D

Figure 7. PCA plots

Performing HCA

Select the distance type (11).

Select the linkage type (12).

Note: Optionally PCA can be preciously applied to HCA: select Apply PCA to data (13) and select the

number of PC to be used when solicited.

Press Dendrogram button (14) to run HCA and obtain the dendrogram (Figure 8B).

Insert the threshold to color the similar groups (Figure 8A). If it is not applicable, press Cancel.

Page 10: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

10

A

B

Figure 8. HCA plot

Additional features

Additional columns and rows can be inserted on the table using + buttons (15).

Columns and rows can be deleted using menu Edit > Delete…

Data set can be transposed using menu Edit > Transpose.

Data set can be saved using menu File > Save data. Later, this file can be opened using menu File >

Open data.

Numeric values used to build the graphs may be copied by pressing Copy plot data button (16).

PC variance table can be copied by pressing Copy table data button (17).

Page 11: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

11

Multivariate calibration

Figure 9. Performing multivariate calibration

Inserting data set

To insert the independent variables (matrix X) paste it from a spreadsheet (or from Data Organization

module) using menu File > Paste X. Use sample in rows and variables in columns. In the example, the

matrix X is formed by transmittances from 4000 to 600 cm-1 for coffee samples (1).

4

To insert the dependent variable (vector y) paste it from a spreadsheet using menu File > Paste Y. In

the example, y is the content of husk (%) in coffee samples (2).

Note: In order to insert multiple y and build the respective models, provide Y as a matrix with samples in

rows and each y in a column.

Optionally, sample and variables labels for X (3), as well as, property label for y (4) can be inserted

using menu Edit >… similarly to Figure 6. If labels are not provided, they are numbered.

Data preprocessing

Select the preprocessing method (5).

Press Apply button (6).

Note: Multiple preprocessing methods (7) can be applied just by selecting them and pressing Apply.

To return to the original data after a preprocessing, select “No preprocess” and press Apply.

1 2

3 4

5 6 7

8

9 10

11 22 15 22 18 22

12 23 16 23 19 23

13 14 17

20

24

24 24

Page 12: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

12

Selecting test set (optional)

Samples can be selected and used as test set (external validation) by using menu Edit > Select sample

for test set and select the method (Figure 10A). The test set samples can be informed manually (Figure

10B) or automatically selected by Kennard-Stone algorithm5 (Figure 10C). Test samples are marked

with a “t” on the tables (1,2).

A

B

C Figure 10. Selecting samples for test set

Performing leave-one-out cross-validation

Select the regression method (PLS, PCR or MLR) (8).

Select the maximum number of latent variables (for PLS) or principal components (for PCR) to be tested

on the cross-validation (9).

Press Run CV button (10).

To build the graphs (such as RMSECV, R², and others) (Figure 11A), select the data to be plotted on X

and Y axis (11) and press Plot button (12).

Performing the calibration (obtaining the model)

If PLS or PCR is used, select the adequate number of LV or PC to be used in the model (13).

Press Fit button (14).

To build the graphs (such as measured x predicted and others), select the data to be plotted on X and Y

axis (15) and press Plot button (16).

To obtain the performance parameters of the model (such as RMSEC, R² and others) (Figure 11B),

select “Model stats” in X axis or Y axis (15) and press Plot button (16).

Note: For more details about statistical parameters for model performance, please see cited references.6-7

Page 13: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

13

A

B

C

D

Figure 11. Plots for multivariate calibration

Page 14: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

14

Performing external validation using test set (optional)

If samples for test set was previously provided (informing it manually or by Kennard-Stone algorithm5),

press Predict button (17).

If test set was not provided as above, insert the X and y (or Y for multiple y) for external validation using

menu File > Paste new X for prediction and menu File > Paste new Y for prediction test respectively.

To build the graphs (such as measured x predicted and others), select the data to be plotted on X and Y

axis (18) and press Plot button (19).

To obtain the performance parameters of the prediction, select “Pred stats” in X axis or Y axis (18) and

press Plot button (19).

Predicting y for new samples

Insert the new X to have the predicted y using menu File > Paste new X for prediction.

Press Predict button (17).

View the predicted values selecting “Samples” in X axis and “Predicted” in Y axis (18) and press Plot

button (19).

Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA)

If discriminant analysis is used, provide the classes as numbers in dependent variables (Y data, 2);

e.g.: Class A = 1, Class B = 2, Class C = 3, etc.

Enable “Classes (For DA)” (20).

Perform leave-one-out cross validation as previously described.

Perform calibration as previously described.

Perform external validation using test set as previously described (optional).

Predict the classes for new samples as previously described above in “Predicting y for new samples”.

Additional features

Outliers samples can be checked (Figure 11C). After performing the calibration (14), select “Sample

leverages” in X axis and “Studentized residuals” in Y axis (15) and press Plot button (16).

If multiple y is used, a property (22) should be selected before each plot.

Data plotted on the graphs can be copied and used out of Chemoface; by using Copy plot data buttons

(23).

Additional columns and rows can be inserted on the tables using + buttons (24).

Columns and rows can be deleted using menu Edit > Delete…

Data set can be transposed using menu Edit > Transpose.

Measured x Predicted values for cross-validation, calibration and test sets can be plotted in a single

graph (Figure 11D) using menu Multiplot > Measured x Predicted. Scores for PLS-DA or PCR-DA can

be plotted similarly.

Data set can be saved using menu File > Save data. Later, this file can be opened using menu File >

Open data.

Obtained model can be saved using menu File > Save data. Later, this file can be opened using menu

File > Open model and the calibrated model can be used for further predictions.

Page 15: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

15

Data Plot

Figure 12. Plotting data

Inserting data set

Type data on the table (1) or paste it from a spreadsheet (or from Data Organization module) using

menu File > Paste data. Only numeric data should be inserted. Use sample in rows and variables in

columns. In the example, the data set is formed by transmittances from 4000 to 600 cm-1 (1).

4

Note: see Data Organization chapter for more details about how to import data set such as spectra, bitmaps,

etc.

Optionally, samples and variables labels (2) can be inserted using menu Edit >… similarly to Figure 6. If

labels are not provided, they are numbered.

Data preprocessing

Select the preprocessing method (3).

Press Apply button (4).

Note: Multiple preprocessing methods (5) can be applied just by selecting them and pressing Apply.

To return to the original data after a preprocessing, select “No preprocess” and press Apply.

1

2

3 4 5

6 7 9

8

11

12

12

Page 16: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

16

Plotting data

Insert the X axis label (6).

Insert the Y axis label (7).

If appropriate, insert the Y axis label for the preprocessed graph (8).

If appropriate, set reverse X axis (9).

To plot no-preprocessed/preprocessed data (Figure 13), check “Plot data without preprocessing” (10).

Press Pot data button (11).

Figure 13. No preprocessed and preprocessed spectra

Additional features

Additional columns and rows can be inserted on the tables using + buttons (12).

Columns and rows can be deleted using menu Edit > Delete…

Data set can be transposed using menu Edit > Transpose.

Data set can be saved using menu File > Save data. Later, this file can be opened using menu File >

Open data.

Page 17: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

17

Data Organization

Data Organization module allows importing numerical data from .txt, .dat, .csv files, and images in .bmp.

Multiple files, such as spectra files, can be imported simultaneously. The process of importing images (.bmp)

is based on converting them in a three-way array containing the RGB values for each pixel. Then the values

of R, G, and B are summed to each pixel, resulting in a two-way array (matrix). Finally, this matrix is unfolded

to generate a vector. This is particularly useful to import molecular figures to be used as descriptors in MIA-

QSAR models.8 At the following example, files are MIR spectra for adulterated coffee.

4

Figure 14. Importing data set

Importing files (only .txt, .dat, .csv, .bmp)

Menu File > Import data.

Select the file type (only .txt, .dat, .csv or .bmp) (Figure 15A).

Select the files to be imported (use Shift or Ctrl keys to select multiple files) (Figure 15B)

Press Open.

Note:

Rows from respective file names are showed as sample labels (1).

If data set is too large, a screen wills pop-up asking to show data set on the table. (Figure 15C). If

“No” is selected, the files are imported and a message is showed (2) (Figure 15D).

1

Page 18: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

18

A

B

C

D Figure 15. Selecting files to be imported

2

Page 19: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

19

Exporting the dataset

To use the obtained dataset into Chemoface, use menu File > Copy data set. Then, paste it in the

Chemoface tables using menu File > Paste… of the used module, as previously described in “Inserting

data”. Data set also can be pasted in other software with support to spreadsheet.

Data set can also be saved as .txt (ASCII with space as column separator) using menu File > Save

data. This saved file cannot be opened using the Chemoface modules using menu File > Open…, but

the numerical data can be copied and pasted using menu File > Paste… in the Chemoface modules, as

previously described.

Additional features

Specific, odd, even or null standard deviation columns and rows can be deleted using menu Edit >

Delete…

Data set can be transposed using menu Edit > Transpose.

Columns and rows can be numbered using menu Edit > Numbered…

Page 20: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

20

Figure handling

Saving figure with high resolution

Access menu File > Export setup at the figure window.

At the setup window (Figure 16A), access Rendering (1).

Set the color to be saved, e.g., RGB color, gray scale, etc (2).

Set the resolution to be saved, e.g., 600 dpi (3).

Press Export button (4).

Set the file name (5) (Figure 16B)

Select the file type, e.g., .tif (6).

Press Save button (7)

Note: If .fig (MATLAB figure) is selected in file type, the file can be opened only by using menu File > Open

Figure in the Chemoface home screen or by Matlab software. Figures (.fig) can be edited (e.g., font size,

font color) only on the Matlab software.

A

B

Figure 16. Exporting figure

1 2

3 4

5 7

6

Page 21: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

21

Tools on the figure window

*To move legend, click with left mouse button and drag.

Save figure (low resolution)

Print figure

Zoon

Move

graph Rotate 3D graph

Read data Click over graph to read values

Enable/disable color bar

Enable/disable

Legend*

Page 22: Chemoface - ufla.br · Chemoface User guide Cleiton A. Nunes Universidade Federal de Lavras Lavras – MG – Brasil Jul/2014

22

References

1. NUNES, C. A.; FREITAS, M. P.; PINHEIRO, A. C. M.; BASTOS, S. C. Chemoface: a novel free user-

friendly interface for Chemometrics. Journal of the Brazilian Chemical Society, 23, 2003-2010, 2012.

2. MATLAB. The MathWorks, Inc.: Natick, MA, USA.

3. NUNES, C. A.; GUERREIRO, M. C. Characterization of Brazilian green propolis throughout the seasons

by Headspace-GC/MS and ESI-MS. Journal of the Science of Food and Agriculture, 92, 433-438, 2012.

4. TAVARES, K. M.; PEREIRA, R. G. F.; PINHEIRO, A. C. M.; NUNES, C. A.; GUERREIRO, M. C.;

RODARTE, M. P. Mid-infrared spectroscopy and sensory analysis applied to detection of adulteration in

roasted coffee by addition of coffee husks. Química Nova, 35, 1164-1168, 2012.

5. DE GROOT, P. J.; POSTMA, G. J.; MELSSEN, W. J.; BUYDENS, L. M. C. Selecting a representative

training set for the classification of demolition waste using remote NIR sensing. Analytica Chimica Acta,

392, 67-75, 1999.

6. KIRALJ R.; FERREIRA M. M. C. Basic validation procedures for regression models in QSAR and QSPR

studies: theory and application. Journal of the Brazilian Chemical Society, 20, 770-787, 2009.

7. ROY, P. P.; PAUL, S.; MITRA, I.; ROY, K. On Two Novel Parameters for Validation of Predictive QSAR

Models. Molecules, 14, 1660-1701, 2009.

8. NUNES, C. A.; FREITAS, M. P. Introducing new dimensions in MIA-QSAR: a case for chemokine receptor

inhibitors. European Journal of Medicinal Chemistry, 62, 297-300, 2013.