Multivariate Data Analysis (MVDA)
• Any investigation of a real process or system is based on data measurements.
• The data collected in science, technology and almost everywhere else are multivariate: multiple variables are measured on multiple samples or at multiple time points, and such data cannot be analyzed by simple graphs.
• More sophisticated, computer-based methods such as Principal Component Analysis (PCA) are required for a multivariate data set.
• MVDA can reveal new information and facts in the data.
• PCA is one of the most popular MVDA techniques.
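As a rough illustration of what a PCA computes, here is a minimal NumPy sketch (not the method of any particular software package) that autoscales a data table and obtains scores, loadings and explained variance from a singular value decomposition. The function name `pca`, the autoscaling choice and the random 104 x 34 table are assumptions made for illustration.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a data table X (rows = observations, columns = variables).

    Columns are autoscaled (mean-centred, unit variance) first, a common
    choice when variables have different units. Returns scores T,
    loadings P and the fraction of total variance explained per component.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD: Xs = U S Vt; the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T              # loadings, shape (K, A)
    T = Xs @ P                           # scores, shape (N, A)
    explained = S**2 / np.sum(S**2)      # variance fraction per component
    return T, P, explained[:n_components]


# Hypothetical example: 104 observations of 34 random variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(104, 34))
T, P, explained = pca(X, n_components=2)
print(T.shape, P.shape, explained)
```

The scores T place each observation in a low-dimensional plot, and the loadings P show how each variable contributes to the components.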
Convert data tables to plots
[Figure: a data table with 104 observations (rows), K = 34 X-variables (factors) and M = 1 Y-variable (response), converted into a plot with axes X1, X2, X3.]
PLS provides an overview of the relationships among all X-variables and Y-variables at the same time.
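To make the X-to-Y idea concrete for a table like this one (many X-variables, a single Y-variable), the following is a minimal sketch of PLS regression for one response (often called PLS1) using the NIPALS algorithm in NumPy. The function `pls1`, its parameters and the synthetic data are hypothetical; real PLS software adds scaling options, cross-validation and diagnostics that are omitted here.

```python
import numpy as np


def pls1(X, y, n_components=2):
    """PLS regression for a single response y (M = 1), NIPALS-style.

    Returns coefficients b for centred data plus the column means, so that
    predictions are y_hat = (X - x_mean) @ b + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # residuals of X and y
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                          # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                            # X-scores
        tt = t @ t
        p = E.T @ t / tt                     # X-loadings
        c = f @ t / tt                       # y-loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # coefficients in the original X space
    return b, x_mean, y_mean


# Hypothetical data shaped like the slide: 104 observations, 34 X-variables, 1 Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(104, 34))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=104)
b, x_mean, y_mean = pls1(X, y, n_components=2)
y_hat = (X - x_mean) @ b + y_mean
print(b.shape, np.corrcoef(y, y_hat)[0, 1])
```

With `n_components` equal to the number of linearly independent X-variables, this PLS1 fit reduces to ordinary least squares; with fewer components it keeps only the first few latent directions, each chosen for its covariance with y.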
Projection methods (PCA and PLS)
Principal Component Analysis (PCA): overview of data tables
• Overview and summary of one block of variables, X
• PCA models the correlation structure of a data set

Partial Least Squares Projections to Latent Structures (PLS): relating X to Y
• Relation between blocks of variables, X & Y: PLS analysis, PLS-DA (PLS discriminant analysis)
• PLS finds relationships between sets of multivariate data X and Y

How PLS differs from PCA
• PCA: a projection of X that is an optimal approximation of X (least-squares fit)
• PLS: a projection of X that both approximates X well and correlates well with Y
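The PCA/PLS difference described here can be checked numerically. In this hedged NumPy sketch on made-up data, the first PCA weight vector maximizes the variance of the scores Xw, while the first PLS weight vector (proportional to X^T y) maximizes their covariance with y. The data, seed and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: the largest-variance column of X (column 0) is
# unrelated to y, while y depends on a low-variance column (column 2).
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 1.0])
y = X[:, 2] + 0.1 * rng.normal(size=200)
Xc, yc = X - X.mean(axis=0), y - y.mean()

# PCA: unit weight vector maximising the variance of the scores Xc @ w.
w_pca = np.linalg.svd(Xc, full_matrices=False)[2][0]
# First PLS component: unit weight vector maximising the covariance with y.
w_pls = Xc.T @ yc / np.linalg.norm(Xc.T @ yc)

t_pca, t_pls = Xc @ w_pca, Xc @ w_pls
n = len(y)
print("variance    PCA %.2f  PLS %.2f" % (t_pca.var(), t_pls.var()))
print("|cov(t, y)| PCA %.2f  PLS %.2f" % (abs(t_pca @ yc) / (n - 1),
                                          abs(t_pls @ yc) / (n - 1)))
print("|corr(t, y)| PCA %.2f  PLS %.2f" % (abs(np.corrcoef(t_pca, y)[0, 1]),
                                           abs(np.corrcoef(t_pls, y)[0, 1])))
```

By construction the PCA score always has at least as much variance as the PLS score, and the PLS score always has at least as large a covariance with y; which one wins on correlation depends on the data.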
Case Study 2
Gini et al. (1999) used a back-propagation neural network to predict the carcinogenicity of aromatic nitrogen compounds, using 34 molecular descriptors for 104 molecules.
[Figure: QSAR model Y = f(X), relating the molecular descriptors (X) to carcinogenicity (Y).]
Gini, G., et al. (1999). Predictive carcinogenicity: a model for aromatic compounds with nitrogen-containing substituents, based on molecular descriptors using an artificial neural network. J. Chem. Inf. Comput. Sci., 1076-1080.
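For readers unfamiliar with the modelling approach in the case study, the following is a minimal NumPy sketch of a one-hidden-layer feed-forward network trained by back-propagation with full-batch gradient descent on mean-squared error. It is not Gini et al.'s network: the architecture (8 tanh hidden units, linear output), learning rate, epoch count and the synthetic 104 x 34 data are all assumptions chosen only to mirror the shape of the case-study table.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the case-study table: 104 molecules x 34 descriptors.
X = rng.normal(size=(104, 34))
y = np.tanh(X[:, :3].sum(axis=1, keepdims=True))   # hypothetical response, shape (104, 1)

n_hidden, lr = 8, 0.05
W1 = rng.normal(scale=0.1, size=(34, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(X):
    H = np.tanh(X @ W1 + b1)         # hidden layer activations
    return H, H @ W2 + b2            # linear output layer


losses = []
for epoch in range(500):
    H, y_hat = forward(X)
    err = y_hat - y
    losses.append(float(np.mean(err ** 2)))
    # Back-propagate the gradient of the mean-squared error.
    g_out = 2 * err / len(X)                    # dL/dy_hat
    g_W2 = H.T @ g_out
    g_b2 = g_out.sum(axis=0)
    g_Z = (g_out @ W2.T) * (1 - H ** 2)         # through the tanh non-linearity
    g_W1 = X.T @ g_Z
    g_b1 = g_Z.sum(axis=0)
    # Gradient-descent update (in place, so forward() sees the new weights).
    W1 -= lr * g_W1
    b1 -= lr * g_b1
    W2 -= lr * g_W2
    b2 -= lr * g_b2

print("MSE first epoch %.3f, last epoch %.3f" % (losses[0], losses[-1]))
```

Unlike PCA or PLS, such a network can model non-linear relationships, at the cost of less direct interpretability of how each descriptor contributes.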
Variables close to each other (in the loading plot) are correlated.
Observations close to each other (in the score plot) are similar.
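A small hedged demonstration of the loading-plot rule, on synthetic data: two strongly correlated variables end up close together in the plane of the first two PCA loadings, while an independent variable lies far from them. The variables `x0`, `x1`, `x2` and the helper `dist` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)    # strongly correlated with x0
x2 = rng.normal(size=n)               # independent of x0 and x1
X = np.column_stack([x0, x1, x2])

Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
loadings = Vt[:2].T   # row j = position of variable j in the loading plot


def dist(i, j):
    """Euclidean distance between variables i and j in the loading plot."""
    return np.linalg.norm(loadings[i] - loadings[j])


print("corr(x0, x1) = %.2f, loading distance = %.2f"
      % (np.corrcoef(x0, x1)[0, 1], dist(0, 1)))
print("corr(x0, x2) = %.2f, loading distance = %.2f"
      % (np.corrcoef(x0, x2)[0, 1], dist(0, 2)))
```

The same reasoning applies to observations in the score plot: rows with similar values across the variables receive similar scores.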