1
Modelling procedures for directed network of data blocks
Agnar Höskuldsson, Centre for Advanced Data Analysis, Copenhagen
Data structures:
- Directed network of data blocks
- Input data blocks
- Output data blocks
- Intermediate data blocks
Methods
- Optimization procedures for each passage through the network
- Balanced optimization of fit and prediction (H-principle)
- Scores, loadings, loading weights, regression coefficients for each data block
- Methods of regression analysis applicable at each data block
- Evaluation procedures at each data block
- Graphic procedures at each data block
2
Chemometric methods

1. Regression estimation, X, Y.
Traditional presentation: Yest = XB, and standard deviations for B.
Latent structure:
X = TP' + X0. X0 is not used.
Y = TQ' + Y0. Y0 is not explained.
2. Fit and precision. Both fit and precision are controlled.
3. Selection of score vectors
- As large as possible
- Describe Y as well as possible
- Modelling stops when no more are found (cross-validation)
4. Graphic analysis of latent structure
- Score and loading plots
- Plots of weight (and loading weight) vectors
Chemometric methods
3
5. Covariance as measure of relationship
X'Y for scaled data measures strength.
X1'Y = 0 implies that X1 is removed from the analysis.
6. Causal analysis, T = XR
From score plots we can infer about the original measurement values.
Control charts for score values can be related to contribution charts.
7. Analysis of X
Most of the analysis time is devoted to understanding the structure of X.
Plots are marked by symbols to better identify points in score or loading plots.
8. Model validation
Cross-validation is used to validate the results.
Bootstrapping (re-sampling from data) is used to establish confidence intervals.
Chemometric methods
4
9. Different methods
Different types of data/situations may require different types of methods.
One is looking for interpretations of the latent structure found.
10. Theory generation
Results from analysis are used to establish views/theories on the data.
Results motivate further analysis (groupings, non-linearity, etc.).
5
Partitioning data, 1
[Diagram: measurement data blocks X1, X2, ..., XL, response data blocks Y1, Y2, and reference data blocks Z1, Z2, Z3]
6
Partitioning data, 2
- There is often a natural sub-division of data.
- It is often required to study the role of a sub-block.
- A data block with few variables may 'disappear' among one with many variables; e.g. optical instruments often give many variables.
[Diagram: instrumental data X = (X1 X2 X3), labelled engineering, chemical process and quality, and response data Y = (Y1 Y2), labelled chemical results]
7
Path diagram 1
[Path diagram: data blocks X1, ..., X7 connected by directed arrows]

Examples:
- Production process
- Organisational data
- Diagram for sub-processes
- Causal diagram
8
Path diagram 2, schematic application of modelling
[Path diagram: input blocks X1, X2, X3 lead to X4 and X5; X4 and X5 lead to X6; X6 leads to X7. New samples x10, x20, x30 enter at the input blocks.]

x10 is a new sample from X1, x20 is a new one from X2, x30 is a new one from X3.
How do they generate new samples for X4, X5, X6 and X7?
Resulting estimating equations:
X4,est = X1B14 + X2B24 + X3B34
X5,est = X1B15 + X2B25 + X3B35
X6,est = X4B46 + X5B56
X7,est = X6B67
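As an illustration (not part of the slides), the propagation of new input samples through the estimating equations can be sketched in NumPy. All coefficient matrices, block dimensions and samples below are random placeholders; in practice the B matrices come from fitting the network.

```python
import numpy as np

rng = np.random.default_rng(3)
# Illustrative coefficient matrices (in practice fitted from the data blocks);
# B14 maps the 4 variables of X1 to the 2 variables of X4, and so on.
B14, B24, B34 = rng.standard_normal((4, 2)), rng.standard_normal((3, 2)), rng.standard_normal((5, 2))
B15, B25, B35 = rng.standard_normal((4, 3)), rng.standard_normal((3, 3)), rng.standard_normal((5, 3))
B46, B56 = rng.standard_normal((2, 2)), rng.standard_normal((3, 2))
B67 = rng.standard_normal((2, 1))

# New samples from the input blocks (row vectors)
x10, x20, x30 = rng.standard_normal(4), rng.standard_normal(3), rng.standard_normal(5)

# Follow the estimating equations through the network
x40 = x10 @ B14 + x20 @ B24 + x30 @ B34
x50 = x10 @ B15 + x20 @ B25 + x30 @ B35
x60 = x40 @ B46 + x50 @ B56
x70 = x60 @ B67
print(x70.shape)   # (1,)
```

Each estimated sample is immediately reusable as input for the next layer of the network, which is the point of the directed structure.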
9
Path diagram 3
[Path diagram: the same network of blocks X1, ..., X7, with the input blocks aligned to time t1 and the later blocks to time t2]

Data blocks can be aligned to time. Modelling can start at time t2.
10
Notation and schematic illustrations
X: instrumental data; Y: response data

[Diagram: weight vector w gives the score vector t in X; loading vector q and Y-score vector u in Y]

w: weight vector (to be found)
t: score vector, t = Xw = w1x1 + ... + wKxK
q: loading vector, q = Y't = [(y1't), ..., (yM't)]
u: Y-score vector, u = Yq = q1y1 + ... + qMyM

Vectors are collected into matrices, e.g. T = (t1, ..., tA).

Adjustments:
X ← X − tp'/(t't)
Y ← Y − tq'/(t't)
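A minimal sketch of this notation in NumPy (not from the slides): the data are random, and the weight vector w is here taken proportional to X'y1 for the first response, a common PLS-type choice; the slides leave the optimisation of w to later sections.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # instrumental data: 20 samples, 5 variables
Y = rng.standard_normal((20, 2))   # response data: 2 response variables

# Weight vector w: PLS1-style choice w = X'y1, normalised to unit length
# (an assumption for this sketch)
w = X.T @ Y[:, 0]
w /= np.linalg.norm(w)

t = X @ w          # score vector, t = Xw = w1x1 + ... + wKxK
q = Y.T @ t        # loading vector, q = [(y1't), ..., (yM't)]
u = Y @ q          # Y-score vector, u = Yq = q1y1 + ... + qMyM

# Adjustments (deflation) before the next component is computed
p = X.T @ t
X = X - np.outer(t, p) / (t @ t)
Y = Y - np.outer(t, q) / (t @ t)

print(np.allclose(X.T @ t, 0))   # True: the adjusted X is orthogonal to t
```

After the adjustment, X and Y carry no information along t, so the next component is computed from what remains.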
11
Conjugate vectors 1

r: t = Xw, p = X't; pa'rb = 0 for a ≠ b.

r: t = Xq; qa'rb = 0 for a ≠ b.

r and s: t = Xw, p = X'v; pa'rb = 0 and ta'sb = 0 for a ≠ b.
12
Conjugate vectors 2
The conjugate vectors R = (r1, r2, ..., rA) satisfy T = XR.

Latent structure solution:

X = TP' + X0, where X0 is the part of X that is not used
Y = TQ' + Y0, where Y0 is the part of Y that could not be explained

Y = TQ' + Y0 = X(RQ') + Y0 = XB + Y0, for B = RQ'
The conjugate vectors are always computed together with the score vectors.
When regression on score vectors has been computed, the regression on the original variables is computed as shown.
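A small numerical sketch of these relations (random data, two components; as assumptions for the sketch, w is chosen as X'y1 and the loadings are normalised by t't so that X = TP' + X0 holds exactly). It confirms T = XR on the original X and B = RQ'.

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.standard_normal((30, 6))   # keep the original X for checking T = XR
Y0 = rng.standard_normal((30, 2))
X, Y = X0.copy(), Y0.copy()

R_list, P_list, Q_list, T_list = [], [], [], []
for a in range(2):                  # two components
    w = X.T @ Y[:, 0]               # illustrative weight choice
    w /= np.linalg.norm(w)
    t = X @ w
    p = X.T @ t / (t @ t)           # normalised loadings: X = TP' + X0 exactly
    q = Y.T @ t / (t @ t)
    # conjugate vector: t_a = X0 r_a holds on the *original* data
    r = w.copy()
    for r_b, p_b in zip(R_list, P_list):
        r -= r_b * (p_b @ w)
    X = X - np.outer(t, p)          # deflation
    Y = Y - np.outer(t, q)
    R_list.append(r); P_list.append(p); Q_list.append(q); T_list.append(t)

R, Q, T = map(np.column_stack, (R_list, Q_list, T_list))
print(np.allclose(X0 @ R, T))       # True: T = XR for the original X
B = R @ Q.T                         # regression on the original variables
print(np.allclose(X0 @ B, T @ Q.T)) # True: Y_est = XB = TQ'
```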
13
Optimization procedure, 1
Two data blocks X1, X2: find the weight vector w1 such that, with t1 = X1w1 and q2 = X2't1, |q2|² is maximized.

One data block X1: find the weight vector w1 such that |t1|², with t1 = X1w1, is maximized.
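Both criteria have closed-form solutions, which can be checked numerically (random data; the unit-length constraint on w1 is the usual convention, assumed here). The one-block criterion |t1|² is solved by the dominant eigenvector of X1'X1, the two-block criterion |q2|² by the dominant right singular vector of X2'X1.

```python
import numpy as np

rng = np.random.default_rng(6)
X1 = rng.standard_normal((30, 5))
X2 = rng.standard_normal((30, 4))

# One block: |t1|² = w'X1'X1w is maximised over unit-length w by the
# dominant eigenvector of X1'X1 (the first principal component direction)
eigvals, eigvecs = np.linalg.eigh(X1.T @ X1)
w_one = eigvecs[:, -1]               # eigenvector of the largest eigenvalue
t1 = X1 @ w_one
print(np.isclose(t1 @ t1, eigvals[-1]))    # True

# Two blocks: |q2|² = |X2'X1w|² is maximised over unit-length w by the
# dominant right singular vector of X2'X1
U, s, Vt = np.linalg.svd(X2.T @ X1)
w_two = Vt[0]
q2 = X2.T @ (X1 @ w_two)
print(np.isclose(q2 @ q2, s[0] ** 2))      # True
```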
14
Three data blocks
[Diagram: optimization passage through the network. Starting from |qz|² maximized for a reference block Z, weight vectors w give score vectors t and loading vectors q that are passed from block to block: X basis, Y estimated, Y basis, Z estimated.]

Adjustments:
t1 describes X1: X1 ← X1 − t1p1'/(t1't1), p1 = X1't1.
t1 describes X2: X2 ← X2 − t1q2'/(t1't1), q2 = X2't1.
q2 describes X3: X3 ← X3 − t3q2'/(q2'q2), t3 = X3q2.
t3 describes X4: X4 ← X4 − t3q4'/(t3't3), q4 = X4't3.
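The four adjustment steps can be sketched in NumPy. The data and the weight vector are random and the block sizes illustrative; the dimension assumptions are that X1 and X2 share objects, X2 and X3 share variables, and X3 and X4 share objects.

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.standard_normal((10, 4))   # input block, 10 objects
X2 = rng.standard_normal((10, 6))   # same objects as X1
X3 = rng.standard_normal((14, 6))   # same variables as X2, 14 objects
X4 = rng.standard_normal((14, 3))   # same objects as X3

w = rng.standard_normal(4)          # some weight vector for X1 (normally optimised)
t1 = X1 @ w

# t1 describes X1 and X2 (deflation along objects)
p1 = X1.T @ t1
q2 = X2.T @ t1
X1 = X1 - np.outer(t1, p1) / (t1 @ t1)
X2 = X2 - np.outer(t1, q2) / (t1 @ t1)

# q2 carries the information on to X3 (deflation along variables)
t3 = X3 @ q2
X3 = X3 - np.outer(t3, q2) / (q2 @ q2)

# t3 describes X4
q4 = X4.T @ t3
X4 = X4 - np.outer(t3, q4) / (t3 @ t3)

# Each block is now orthogonal to the vector that described it
print(np.allclose(X1.T @ t1, 0), np.allclose(X3 @ q2, 0), np.allclose(X4.T @ t3, 0))
```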
15
Optimization procedure, 2
Two input and two output data blocks:

[Diagram: input blocks X1, X2 with weight vectors w1, w2 and score vectors t1, t2; output blocks X3, X4 with loading vectors q13, q23, q14, q24]

Find w1 and w2: maximize |q13 + q23 + q14 + q24|²

Two input, one intermediate and one output data block:

[Diagram: input blocks X1, X2 with w1, w2 and t1, t2; intermediate block X3 with loading vectors q13, q23; output block X4 with loading vectors q134, q234]

Find w1 and w2: maximize |q134 + q234|²
16
Balanced optimization of fit and prediction (H-principle)

Linear regression
In linear regression we are looking for a weight vector w such that the resulting score vector t = Xw is good!

The basic measure of quality is the prediction variance for a sample x0. Under standard assumptions, and assuming negligible bias, it can be written as

F(w) = Var(y(x0)) = k[1 − (y't)²/(t't)][1 + t0²/(t't)].

It can be shown that F(cw) = F(w) for all c > 0. Choose c such that (t't) = 1. Then

F(w) = k[1 − (y't)²][1 + t0²].

To get a prediction variance as small as possible, it is natural to choose w such that (y't)² becomes as large as possible:

maximize (y't)² = maximize |q|² (PLS regression)
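A quick numerical check of the last step (random data; as an assumption, the maximisation is taken over unit-length w, the usual PLS convention, rather than unit-length t): w proportional to X'y maximises (y't)², by the Cauchy-Schwarz inequality applied to (X'y)'w.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((25, 4))
y = rng.standard_normal(25)

# PLS weight: w proportional to X'y maximises (y't)² = (y'Xw)² over
# unit-length w, since (y'Xw)² = ((X'y)'w)² <= |X'y|² |w|²
w = X.T @ y
w /= np.linalg.norm(w)
best = (y @ (X @ w)) ** 2

# No random unit-length weight vector does better
for _ in range(1000):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert (y @ (X @ v)) ** 2 <= best + 1e-12
```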
17
Optimization procedure, 3
Weighing along objects (rows) (the same algorithm, but using the transposes):

[Diagram: weight vector v1 along the rows of X1 gives the loading p1, which gives the score t2 in X2]
Task: find the weight vector v1 that maximizes |t2|².

[Diagram: as above, extended with a third block X3 and its loading q3]
Task: find the weight vector v1 that maximizes |q3|².
18
Optimization procedure, 4
[Diagram: w1 gives the score t1 in X1; the loading p1 is used as weight vector for X2, giving t2; q3 is the loading of X3]

Task: find the weight vector w1 that maximizes |q3|², where

q3 = X3't2 = X3'X2p1 = X3'X2X1't1 = X3'X2X1'X1w1
Regression equations:
X3,est = X2B23
X2,est = X1B12
X1,est = X1B11
If p1 is a good weight vector for X2, a good result may be expected.
Pre-processing may be needed to find variables in X1 and in X2 that are highly correlated to each other.
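The chain of substitutions can be verified numerically (random data; the dimension assumptions are that X1 and X2 share the same variables and that X2 and X3 share the same objects):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 5                                # X1 and X2 share the same K variables
X1 = rng.standard_normal((12, K))    # first block: 12 objects
X2 = rng.standard_normal((18, K))    # second block: 18 objects
X3 = rng.standard_normal((18, 3))    # third block: same objects as X2
w1 = rng.standard_normal(K)

t1 = X1 @ w1                         # score vector in X1
p1 = X1.T @ t1                       # loading of X1, used as weight vector for X2
t2 = X2 @ p1                         # score vector in X2
q3 = X3.T @ t2                       # loading of X3: the quantity to maximise

# The whole chain collapses to q3 = X3'X2X1'X1w1
print(np.allclose(q3, X3.T @ X2 @ X1.T @ X1 @ w1))   # True
```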
19
Three types of reports
Reports:
- How a data block is doing in a network
- How a data block can be described by the data blocks that lead to it
- How a data block can be described by one data block that leads to it

[Diagram: block Xi viewed in the full network, described by all of its predecessors Xi-1, Xi-2, Xi-3, or described by a single predecessor Xi-2]
20
Production data, 1
[Diagram: X1 and X2 lead to Y]

X1: Process parameters, 8 variables
X2: NIR data, 1560 variables (reduced to 120)

No   |X2|²    |Y|²     |X|²     |Y|²
 1   78,961   51,483   74,969   51,964
 2   91,538   67,559   86,786   69,553
 3   96,351   76,291   91,627   80,643
 4   97,942   81,383   95,373   85,058
 5   98,620   83,900   95,919   89,056
 6   98,967   85,705   97,054   90,050
 7   99,205   87,917   97,508   91,990
 8   99,294   90,472   97,990   93,455
 9   99,349   92,183   98,667   94,020
10   99,426   92,947   98,896   94,708
11   99,606   93,084   99,103   95,082
12   99,657   93,376   99,202   95,740

X1 'disappears' in the NIR data X2.
21
Production data, 2
At each step:

[Diagram: X1 and X2, with weight vectors w1, w2 and score vectors t1, t2, describe Y]

Results for X1, process parameters:
5 score vectors explain 11.92% of Y.

No  Step    |Y|²
 1    1    4,957
 2    2    9,315
 3    5   10,393
 4    6   10,929
 5    8   11,920

Results for X2, NIR data:
12 score vectors explain 84.141% of Y.

No  Step    |Y|²
 1    1   51,483
 2    2   69,121
 3    3   73,070
 4    4   76,506
 5    5   78,669
 6    6   80,923
 7    7   82,129
 8    8   82,552
 9    9   83,132
10   10   83,590
11   11   83,881
12   12   84,141

In total 96.06% = 11.920% + 84.14% of Y is explained.
At each step the score vectors are evaluated. Non-significant ones are excluded.
22
Production data, 3
[Plot of estimated versus observed quality variable using only score vectors for the process parameters; both axes run from −0.4 to 0.3. R² = 0.7512]

[Diagram: blocks X1, X2 and Y with R²-values 75.12%, 87.75% and 96.06%]

The process parameters contribute marginally, by 11.92%. But if only they were used, they would explain 75.12% of the variation of Y.
23
Directed network of data blocks
[Diagram: input blocks, intermediate blocks and output blocks connected by directed arrows]

Input blocks: give weight vectors for the initial score vectors.
Intermediate blocks: are described by previous blocks and give score vectors for the succeeding blocks.
Output blocks: are described by previous blocks.
24
Magnitudes computed between two data blocks
[Diagram: data blocks Xi → Xk in a path]

Ti: score vectors
Qi: loading vectors
Bi: regression coefficients
Measures of precision
Measures of fit
Etc.

Different views:
a) As a part of a path
b) If the results are viewed marginally
c) If only Xi → Xk is considered
25
Stages in batch processes
[Diagram: batches over time, divided into stages 1, 2, ..., K with data blocks X1, X2, ..., XK and the final quality Y]

Paths:
X1 → X2 → ... → XK → Y: given a sample x10, the path model gives estimated samples for the later blocks.
[X1 X2 X3] → X4 → Y: given values of (x10 x20 x30), estimates for the values of x4 and y are given.
[X1 X2 X3] → [X4 X5] → Y: given values of (x10 x20 x30), estimates for the values of (x4 x5) and y are given.
26
Schematic illustration of the modelling task for sequential processes

[Diagram: stages over time. X1: initial conditions; X2, X3: known process parameters up to now; X4: the next stage; later stages follow; Y: the response]
27
Plots of score vectors
[Diagram: score vectors t1, t2, ..., tL from the blocks X1, X2, ..., XL; plots of t1 versus t2 (X1 - X2) up to t1 versus tL (X1 - XL)]

The plots will show how the changes are relative to the first data block.
28
Graphic software to specify paths
[Diagram: blocks X1, X2, X3 connected to X4, X5, ..., XL on the screen]

Blocks are dragged onto the screen and the relationships are specified.
29
Pre-processing of data
• Centring. If desired, centring of the data is carried out.
• Scaling. In the computations all variables are scaled to unit length (or unit standard deviation if centred). It is checked whether scaling disturbs a variable, e.g. if it is constant except for two values, or if the variable is at the noise level. When the analysis has been completed, values are scaled back so that units are in original values.
• Redundant variable. It is investigated whether a variable does not contribute to the explanation of any of the variables that the present block leads to. If it is redundant, it is eliminated from the analysis.
• Redundant data block. It is investigated whether a data block can provide a significant description of the blocks that it is connected to later in the network. If it cannot contribute to the description of those blocks, it is removed from the network.
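The centring/scaling steps and the scaling back can be sketched as follows (random data; the noise-level threshold is an illustrative placeholder, not a value from the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
# Random data with different scales and offsets per variable
X = rng.standard_normal((20, 5)) * rng.uniform(0.5, 10, 5) + rng.uniform(-3, 3, 5)

mean = X.mean(axis=0)
Xc = X - mean                        # centring (if desired)

std = Xc.std(axis=0, ddof=1)
noise_level = 1e-8                   # illustrative threshold for near-constant variables
keep = std > noise_level             # variables at the noise level are flagged
Xs = Xc[:, keep] / std[keep]         # scaling to unit standard deviation

# ... analysis on Xs ...

# Scale back so that units are in original values
X_back = Xs * std[keep] + mean[keep]
print(np.allclose(X_back, X[:, keep]))   # True
```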
30
Post-processing of results
Score vectors computed in the passages through the network are evaluated in the analysis at each passage. Apart from the input blocks, the score vectors found between passages are not independent. The score vectors found in a relationship Xi → Xj are evaluated to see whether all are significant or whether some should be removed for this relationship.

Cross-validation as in standard regression methods.
Confidence intervals for parameters by a resampling technique.
31
International workshop on
Multi-block and Path Methods
24-30 May 2009, Mijas, Malaga, Spain