Definition and overview of chemometrics. Paul Geladi Head of Research NIRCE Chairperson NIR Nord...

Definition and overview of chemometrics

Paul Geladi

Head of Research NIRCE

Chairperson NIR Nord

Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå

Technobothnia, Vasa

paul.geladi @ btk.slu.se paul.geladi @ syh.fi

Project geography

Chemometrics

Mathematics

Statistics

Computer Science

In Chemistry

Similar fields

• Biometrics ±1900

• Psychometrics ±1930

• Econometrics ±1950

• Technometrics ±1960

Chemometrics

• Design of Experiments (DOE)

• Exploratory Data Analysis

• Classification

• Regression and Calibration

Design of Experiments

• Most important where possible

• Uses:

• ANOVA

• F-test

• t-test

• Plots

• Response Surfaces

Design of Experiments

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K + b_{11} x_1^2 + b_{22} x_2^2 + \dots + b_{KK} x_K^2 + b_{12} x_1 x_2 + \dots$$

Factors x1, x2,...xK changed systematically

Response y measured and modeled
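Once the factors have been varied according to a design, the quadratic model can be fitted by ordinary least squares. The following is a minimal sketch, not the author's procedure: it assumes K = 2 factors, a three-level full factorial design in coded units, and a simulated response. The coefficient values in `b_true` and the noise level are invented for illustration.

```python
import itertools
import numpy as np

# Three-level full factorial design for K = 2 factors in coded units.
levels = [-1.0, 0.0, 1.0]
design = np.array(list(itertools.product(levels, repeat=2)))  # 9 runs x 2 factors
x1, x2 = design[:, 0], design[:, 1]

# Model matrix with columns 1, x1, x2, x1^2, x2^2, x1*x2.
M = np.column_stack([np.ones(len(design)), x1, x2, x1**2, x2**2, x1 * x2])

# Hypothetical coefficients (b0, b1, b2, b11, b22, b12), used only to simulate y.
b_true = np.array([10.0, 2.0, -1.0, 0.5, 1.5, -0.8])
rng = np.random.default_rng(0)
y = M @ b_true + rng.normal(scale=0.05, size=len(design))

# Least-squares estimate of the coefficients and the model residuals.
b_hat, *_ = np.linalg.lstsq(M, y, rcond=None)
residuals = y - M @ b_hat
print(np.round(b_hat, 2))
```

With nine runs and six coefficients, three degrees of freedom remain for the residuals, which is what the ANOVA and F-test mentioned earlier would work with.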

Exploratory Data Analysis

• Design not possible

• Sampling situations

• Find structure

• Find groupings

• Find outliers

Classification

• Check for groupings = UNSUPERVISED

• Existing groupings = SUPERVISED

• Visualize groupings

• Classify

• Test

Regression / Calibration

• Two types of variables X / y

• Relationship linear / nonlinear

• Model

• Diagnostics

• Residual

[Figure: plot of y against x]
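The steps listed for regression and calibration (model, diagnostics, residual) can be sketched for the simplest case, a straight line between one x variable and y. The data values below are hypothetical, and a linear relationship is assumed; nothing here is taken from the original slides.

```python
import numpy as np

# Hypothetical calibration data: x = instrument signal, y = known concentration.
x = np.array([0.10, 0.21, 0.29, 0.42, 0.50, 0.61])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Model: straight line y = b0 + b1 * x fitted by least squares.
# np.polyfit returns the highest-degree coefficient first.
b1, b0 = np.polyfit(x, y, deg=1)

# Diagnostics: residuals and root-mean-square error of the fit.
y_fit = b0 + b1 * x
residuals = y - y_fit
rmse = np.sqrt(np.mean(residuals**2))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, RMSE = {rmse:.3f}")
```

Plotting `residuals` against `x` is a quick way to spot a nonlinear relationship that the straight line misses.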

Multivariate Data Analysis

• Sampled data and design with too many responses:

• Mining

• Hospitals

• Agriculture

• Food industry

• More

Nomenclature

• Samples are objects

• What is measured on the object is a variable

[Figure: examples of variables measured on samples: a single value (34.92) and a spectrum]

Vectors

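[Figure: a row vector with elements 1 to K and a column vector with elements 1 to I]

$$\begin{bmatrix} 12 \\ 3.6 \\ 11.1 \\ 5.9 \\ 34 \\ 0.5 \\ 1.4 \\ 17 \end{bmatrix}$$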

A vector is a collection of numbers.

It is always a column vector.

The transpose of a vector is a row vector.

Symbols for the transpose are ′ and T: a′ or aᵀ.

12 3.6 11.1 5.9 34 0.5 1.4 17
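The column and row conventions can be tried out directly. This sketch assumes NumPy and stores the eight numbers from the slide as a column vector named `a`, following the a′ notation used for the transpose.

```python
import numpy as np

# The eight numbers from the slide, stored as a column vector (8 rows, 1 column).
a = np.array([12, 3.6, 11.1, 5.9, 34, 0.5, 1.4, 17]).reshape(-1, 1)

# The transpose is a row vector (1 row, 8 columns).
a_T = a.T

print(a.shape)    # (8, 1)
print(a_T.shape)  # (1, 8)
```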

[Figure: Particle size, 1 sample]

[Figure: Small particles, 35 samples]

The Data Matrix

A data matrix is a vector of vectors

[Figure: data matrix with I rows (objects) and K columns (variables)]
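"A vector of vectors" can be made concrete by stacking one row vector per object. The sketch below assumes NumPy and uses invented measurements for I = 3 objects and K = 4 variables; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical measurements: I = 3 objects (samples), K = 4 variables each.
sample_1 = np.array([12.0, 3.6, 11.1, 5.9])
sample_2 = np.array([10.5, 4.1, 9.8, 6.3])
sample_3 = np.array([13.2, 3.0, 12.4, 5.1])

# Stacking the row vectors gives the I x K data matrix X.
X = np.vstack([sample_1, sample_2, sample_3])
I, K = X.shape

print(X.shape)   # (3, 4)
print(X[1])      # all K variables for object 2 (a row)
print(X[:, 0])   # variable 1 across all I objects (a column)
```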

[Figure: Size histograms, all samples]

[Figure: surviving labels are "Particle area", "NIR wavelengths" and "Times in batch reaction"]

Geometry of multivariate space

Problem

I and K can be large

Correlation

Univariate statistics does not apply

I patients

3 variables: blood oxygen, iron, hemoglobin

[Figures: plots in the three-dimensional space with axes O2, Fe and Hb]

Properties of multivariate space

Rotation

vectors unchanged / distance unchanged

Translation

vectors changed / distance unchanged

Rescaling / change units

vectors changed / distance changed
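The three properties can be checked numerically on the distances between objects. This is a sketch under stated assumptions: the five objects are random numbers, the rotation angle, the shift and the factor of 1000 (standing in for a change of units) are arbitrary, and `pairwise_distances` is a helper written for this example.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))                    # 5 hypothetical objects, 3 variables

# Rotation by 30 degrees about the third axis (an orthogonal matrix).
t = np.deg2rad(30.0)
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

X_rot = X @ R.T                                # rotate
X_shift = X + np.array([5.0, -2.0, 1.0])       # translate
X_scaled = X * np.array([1000.0, 1.0, 1.0])    # change the units of variable 1

D = pairwise_distances(X)
same_after_rotation = np.allclose(D, pairwise_distances(X_rot))
same_after_translation = np.allclose(D, pairwise_distances(X_shift))
same_after_rescaling = np.allclose(D, pairwise_distances(X_scaled))
print(same_after_rotation, same_after_translation, same_after_rescaling)
```

The first two checks should come out true and the third false: only the change of units alters the distances between objects.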

Consequences

• We can move the coordinate system around

• The relative distances between objects do not change

• We can rotate the coordinate system

• Scale changes are important

• Move coordinate system to center of data

• Scale properly
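The last two points, moving the coordinate system to the center of the data and scaling, are often carried out as mean centering followed by division by the standard deviation of each variable. The slides do not say which scaling is meant by "properly", so the unit-variance scaling below is an assumption, and the data are invented.

```python
import numpy as np

# Hypothetical data: 4 objects, 2 variables in very different units.
X = np.array([[1200.0, 0.51],
              [1350.0, 0.48],
              [ 980.0, 0.60],
              [1110.0, 0.55]])

# Move the coordinate system to the center of the data (mean centering).
X_centered = X - X.mean(axis=0)

# Scale each variable to unit standard deviation (one common choice).
X_scaled = X_centered / X.std(axis=0, ddof=1)

print(X_scaled.mean(axis=0))          # close to 0 for every variable
print(X_scaled.std(axis=0, ddof=1))   # 1 for every variable
```

After this step both variables contribute on a comparable scale, regardless of the units in which they were recorded.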

Vectors (physics)

$$\mathbf{x} = [\, x_1, x_2, x_3 \,]$$

$$\lVert \mathbf{x} \rVert = \left( x_1^2 + x_2^2 + x_3^2 \right)^{1/2}$$

Geometry

[Figure: right triangle with sides a and b and hypotenuse c]

$$c^2 = a^2 + b^2$$

Vectors (K dimensions)

$$\mathbf{x} = [\, x_1, x_2, \dots, x_K \,]$$

$$\lVert \mathbf{x} \rVert = \left( x_1^2 + x_2^2 + \dots + x_K^2 \right)^{1/2}$$
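The length formula is the same computation for 3 or for K dimensions. A short sketch, assuming NumPy and example vectors chosen for illustration:

```python
import numpy as np

x = np.array([3.0, 4.0, 12.0])           # K = 3

# Length from the definition: square root of the sum of squared elements.
length = np.sqrt(np.sum(x**2))
print(length)                             # 13.0

# The same formula works unchanged for any K.
x_K = np.arange(1.0, 11.0)                # K = 10, elements 1 to 10
print(np.isclose(np.linalg.norm(x_K), np.sqrt(np.sum(x_K**2))))  # True
```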

Problem

We cannot see in more than 3 dimensions

Paper, computer screen: 2-2.5 dimensions

[Figures: plots in the three-dimensional space with axes O2, Fe and Hb]

Projection

2D plane (screen, paper)

Many projections possible

Find a good one

Find a few good ones

What is good?
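A projection onto a 2D plane amounts to multiplying the centered data by two orthonormal direction vectors. The sketch below picks an arbitrary plane for random data. The slides leave "What is good?" open, so the fraction of variance retained is offered only as one assumed way to compare projections.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))      # 20 hypothetical objects in 3 dimensions
Xc = X - X.mean(axis=0)           # center the data first

# Any two orthonormal vectors span a plane; QR turns two random
# directions into an orthonormal pair (the columns of P).
P, _ = np.linalg.qr(rng.normal(size=(3, 2)))

# Coordinates of every object in that plane: 20 x 2, ready for a scatter plot.
T = Xc @ P

# One possible (assumed) measure of "good": fraction of total variance kept.
kept = (T**2).sum() / (Xc**2).sum()
print(T.shape, round(float(kept), 3))
```

Repeating this for different planes and comparing `kept` gives a simple way to rank the many possible projections.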
