Bioinformatics
Other data reduction techniques
Kristel Van Steen, PhD, ScD ([email protected])
Université de Liege - Institut Montefiore
2008-2009


Acknowledgements

Material based on:
• work from Pradeep Mummidi
• class notes from Christine Steinhoff

Outline

• Intuition behind PCA
• Theory behind PCA
• Applications of PCA
• Extensions of PCA
• Multidimensional scaling MDS (not to be confused with MDR)

Intuition behind PCA

Introduction

Most scientific or industrial data are multivariate (very large data sets).

Is all the data useful?

If not, how do we quickly extract only the useful information?


Problem

When we use traditional techniques, it is not easy to extract useful information from multivariate data:

1) Many bivariate plots are needed.

2) Bivariate plots, however, mainly represent correlations between variables (not samples).


Visualization Problem

Not easy to visualize multivariate data:
- 1D: dot
- 2D: bivariate plot (i.e. X-Y plane)
- 3D: X-Y-Z plot
- 4D: ternary plot with a color code / tetrahedron
- 5D, 6D, etc.: ???

Visualization?

As the number of variables increases, the data space becomes harder to visualize.


Basics of PCA

PCA is useful when we need to extract useful information from multivariate data sets.

The technique works by reducing the dimensionality of the data.

Therefore, trends in multivariate data are easily visualized.

Variable Reduction Procedure

Principal component analysis is a variable reduction procedure. It is useful when you have obtained data on a number of variables (possibly a large number of variables), and believe that there is some redundancy in those variables.

Redundancy means that some of the variables are correlated with one another, possibly because they are measuring the same construct.

Because of this redundancy, you believe that it should be possible to reduce the observed variables to a smaller number of principal components (artificial variables) that will account for most of the variance in the observed variables.

What is a Principal Component?

A principal component can be defined as a linear combination of optimally weighted observed variables.

"Optimally weighted" refers to how subject scores on a principal component are computed.

7-Item Measure of Job Satisfaction


General Formula

Below is the general form of the formula to compute scores on the first component extracted (created) in a principal component analysis:

C1 = b11(X1) + b12(X2) + ... + b1p(Xp)

where

C1 = the subject's score on principal component 1 (the first component extracted)

b1p = the regression coefficient (or weight) for observed variable p, as used in creating principal component 1

Xp = the subject's score on observed variable p.

For example, assume that component 1 in the present study was the "satisfaction with supervision" component. You could determine each subject's score on principal component 1 by using the following fictitious formula:

C1 = .44 (X1) + .40 (X2) + .47 (X3) + .32 (X4) + .02 (X5) + .01 (X6) + .03 (X7)

Obviously, a different equation, with different regression weights, would be used to compute subject scores on component 2 (the satisfaction with pay component). Below is a fictitious illustration of this formula:

C2 = .01 (X1) + .04 (X2) + .02 (X3) + .02 (X4) + .48 (X5) + .31 (X6) + .39 (X7)
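
As a quick arithmetic check, the two fictitious weight vectors above can be applied to a subject's item scores directly. Below is a minimal R sketch; the 1-7 item responses are invented only for illustration.

R (sketch):
# Fictitious component weights from the slides (7 job-satisfaction items)
b1 <- c(.44, .40, .47, .32, .02, .01, .03)  # "satisfaction with supervision"
b2 <- c(.01, .04, .02, .02, .48, .31, .39)  # "satisfaction with pay"

# Hypothetical item scores for one subject (invented for illustration)
x <- c(6, 5, 6, 7, 3, 2, 4)

C1 <- sum(b1 * x)   # subject's score on component 1
C2 <- sum(b2 * x)   # subject's score on component 2
c(C1 = C1, C2 = C2)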

Number of Components Extracted

If a principal component analysis were performed on data from the 7-item job satisfaction questionnaire, one might get the impression that only two components are created. However, such an impression would not be entirely correct.

In reality, the number of components extracted in a principal component analysis is equal to the number of observed variables being analyzed.

However, in most analyses, only the first few components account for meaningful amounts of variance, so only these first few components are retained, interpreted, and used in subsequent analyses (such as in multiple regression analyses).

Characteristics of Principal Components

The first component extracted in a principal component analysis accounts for a maximal amount of total variance in the observed variables.

Under typical conditions, this means that the first component will be correlated with at least some of the observed variables. It may be correlated with many.

The second component extracted will have two important characteristics. First, this component will account for a maximal amount of variance in the data set that was not accounted for by the first component.

Under typical conditions, this means that the second component will be correlated with some of the observed variables that did not display strong correlations with component 1.

The second characteristic of the second component is that it will be uncorrelated with the first component. Literally, if you were to compute the correlation between components 1 and 2, that correlation would be zero.

The remaining components that are extracted in the analysis display the same two characteristics: each component accounts for a maximal amount of variance in the observed variables that was not accounted for by the preceding components, and is uncorrelated with all of the preceding components.

Generalization

A principal component analysis proceeds in this fashion, with each new component accounting for progressively smaller amounts of variance (this is why only the first few components are usually retained and interpreted).

When the analysis is complete, the resulting components will display varying degrees of correlation with the observed variables, but are completely uncorrelated with one another.


Theory behind PCA

Theory behind PCA: Linear Algebra

OUTLINE

What do we need from "linear algebra" for understanding principal component analysis?

• Standard deviation, variance, covariance
• The covariance matrix
• Symmetric matrices and orthogonality
• Eigenvalues and eigenvectors
• Properties

Motivation

Motivation

[Scatter plot: Protein 1 vs. Protein 2, measured for 200 patients]

Motivation

Microarray experiment: genes 1 to 22,000 measured for patients 1 to 200.

? Visualize ?

? Which genes are important ?

? For which subgroup of patients ?

Motivation

[Data matrix: genes 1 to 200 for patients 1 to 10]

Basics for Principal Component Analysis

• Orthogonal/orthonormal vectors
• Some theorems...
• Standard deviation, variance, covariance
• The covariance matrix
• Eigenvalues and eigenvectors

Standard Deviation

The average distance from the mean of the data set to a point.

Mean: mean = (x1 + x2 + ... + xn) / n

Example:
Measurement 1: 0, 8, 12, 20
Measurement 2: 8, 9, 11, 12

        M1      M2
Mean    10      10
SD      8.33    1.83

Variance

Example:
Measurement 1: 0, 8, 12, 20
Measurement 2: 8, 9, 11, 12

        M1      M2
Mean    10      10
SD      8.33    1.83
Var     69.33   3.33
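
These values use the sample (n-1) formulas and can be reproduced directly in R; a minimal sketch:

R (sketch):
m1 <- c(0, 8, 12, 20)
m2 <- c(8, 9, 11, 12)
mean(m1); mean(m2)   # both 10
sd(m1);   sd(m2)     # 8.33 and 1.83 (rounded)
var(m1);  var(m2)    # 69.33 and 3.33 (rounded)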

Covariance

Standard deviation and variance are 1-dimensional.

How much do the dimensions vary from the mean with respect to each other?

Covariance measures between 2 dimensions:

cov(X, Y) = 1/(n-1) Σ (Xi - mean(X)) (Yi - mean(Y))

We easily see that if X = Y we end up with the variance.
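
Continuing the same toy example, the covariance between the two measurements, and the fact that cov(X, X) reduces to var(X), can be checked in R; a minimal sketch:

R (sketch):
m1 <- c(0, 8, 12, 20)
m2 <- c(8, 9, 11, 12)
cov(m1, m2)          # covariance between the two measurements (14.67)
cov(m1, m1)          # equals var(m1) = 69.33
cov(cbind(m1, m2))   # the full 2 x 2 covariance matrix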

Covariance Matrix

Let X be a random vector. Then the covariance matrix of X, denoted by Cov(X), is the matrix whose (i, j) entry is cov(Xi, Xj).

The diagonals of Cov(X) are the variances var(Xi).

In matrix notation, Cov(X) = E[(X - E[X])(X - E[X])^t].

The covariance matrix is symmetric.

Symmetric Matrix

Let A = (a_ij) be a square matrix of size n x n. The matrix A is symmetric if a_ij = a_ji for all i, j (equivalently, A = A^t).

Orthogonality / Orthonormality

[Plot: the unit vectors v1 = (1, 0) and v2 = (0, 1) in the plane]

<v1, v2> = <(1 0), (0 1)> = 0

Two vectors v1 and v2 for which <v1, v2> = 0 holds are said to be orthogonal.

Unit vectors which are orthogonal are said to be orthonormal.

Eigenvalues / Eigenvectors

Let A be an n x n square matrix and x an n x 1 column vector. Then a (right) eigenvector of A is a nonzero vector x such that

A x = λ x

for some scalar λ (the eigenvalue); x is the corresponding eigenvector.

Procedure:
Finding the eigenvalues: solve det(A - λI) = 0 for the lambdas.
Then find the corresponding eigenvectors.

R: eigen(matrix)
Matlab: eig(matrix)
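
As a concrete illustration of eigen(), the sketch below uses a small symmetric matrix chosen only for the example and verifies the defining property A x = λ x.

R (sketch):
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)   # a small symmetric matrix
e <- eigen(A)
e$values    # eigenvalues: 3 and 1
e$vectors   # eigenvectors, one per column
# check the defining property for the first eigenpair (should be ~ the zero vector):
A %*% e$vectors[, 1] - e$values[1] * e$vectors[, 1]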

Some Remarks

If A and B are matrices whose sizes are such that the given operations are defined and c is any scalar, then:

(A^t)^t = A
(A + B)^t = A^t + B^t
(cA)^t = c A^t
(AB)^t = B^t A^t
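
These identities are easy to sanity-check numerically; a minimal R sketch with two small random matrices (the dimensions are arbitrary):

R (sketch):
set.seed(1)
A <- matrix(rnorm(6), nrow = 2)    # 2 x 3
B <- matrix(rnorm(12), nrow = 3)   # 3 x 4
all.equal(t(t(A)), A)                  # (A^t)^t = A
all.equal(t(3 * A), 3 * t(A))          # (cA)^t = c A^t
all.equal(t(A %*% B), t(B) %*% t(A))   # (AB)^t = B^t A^t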

Now, ...

We have enough definitions to go into the procedure of how to perform Principal Component Analysis.

Theory behind PCA: Linear Algebra Applied

OUTLINE

What is principal component analysis good for?

Principal Component Analysis (PCA):

• The basic idea of principal component analysis
• The idea of transformation
• How to get there? The mathematics part
• Some remarks
• Basic algorithmic procedure

Idea of PCA

• Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables

• We typically have a data matrix of n observations on p correlated variables x1, x2, ..., xp

• PCA looks for a transformation of the xi into p new variables yi that are uncorrelated

Idea

Data matrix X (genes x1, ..., xp in the rows; patients 1, ..., n in the columns): the dimension is high.

So how can we reduce the dimension?

Simplest way: take the first one, two, three variables, plot them and discard the rest.

Obviously a very bad idea.

Transformation

We want to find a transformation that involves ALL columns, not only the first ones.

So find a new basis, ordered such that the first component carries almost ALL the information of the whole data set.

We are looking for a transformation of the data matrix X (p x n) such that

Y = a^t X = a1 X1 + a2 X2 + ... + ap Xp

Transformation

Maximize the variance of the projection of the observations on the Y variables!

Find a such that Var(a^t X) is maximal.

The matrix C = Var(X) is the covariance matrix of the Xi variables.

What is a reasonable choice for the a?

Remember: we wanted a transformation that maximizes "information", that means: captures "variance in the data".
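
To make this concrete, the following R sketch simulates two correlated variables (all numbers are illustrative) and compares the variance of the projection onto the leading eigenvector of the covariance matrix with the variance along a few random unit directions; the eigenvector direction should come out largest.

R (sketch):
set.seed(42)
x1 <- rnorm(200)
x2 <- 0.8 * x1 + rnorm(200, sd = 0.5)    # two correlated variables
X  <- cbind(x1, x2)

a_pc <- eigen(cov(X))$vectors[, 1]       # leading eigenvector (unit length)
var(drop(X %*% a_pc))                    # variance of the projection on a_pc

# variance along a few random unit directions, for comparison
for (i in 1:3) {
  a <- rnorm(2); a <- a / sqrt(sum(a^2))
  print(var(drop(X %*% a)))
}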

Transformation

Can we intuitively see that in a picture?

[Two scatter plots with candidate projection directions, labelled "Good" and "Better"]

Transformation

[Scatter plot with the two orthogonal directions PC1 and PC2: orthogonality]

How do we get there?

Data matrix X: genes x1, ..., xp in the rows; patients 1, ..., n in the columns.

X is a real-valued p x n matrix.

Cov(X) is a real-valued p x p matrix or n x n matrix:
-> decide whether you want to analyse patient groups, or do you want to analyse gene groups?

How do we get there?

Let's decide for genes:

Cov(X) =
| v(x1)     c(x1,x2)  ...  c(x1,xp) |
| c(x2,x1)  v(x2)     ...  c(x2,xp) |
| ...       ...       ...  ...      |
| c(xp,x1)  c(xp,x2)  ...  v(xp)    |

How do we get there?

Some features of Cov(X):

• Cov(X) is a symmetric p x p matrix
• The diagonal terms of Cov(X) are the variances of the genes across patients
• The off-diagonal terms of Cov(X) are the covariances between gene vectors
• Cov(X) captures the correlations between all possible pairs of measurements
• In the diagonal terms, by assumption, large values correspond to interesting dynamics
• In the off-diagonal terms, large values correspond to high redundancy

How do we get there?

The principal components of X are the eigenvectors of Cov(X).

Assume we can "manipulate" X a bit: let's call this Y. Y should be manipulated in a way that it is a bit more optimal than X was.

What does optimal mean? That means: the variances Var(yi) should be LARGE, and the covariances Cov(yi, yj) for i ≠ j should be SMALL.

In other words: Cov(Y) should be diagonal, with large values on the diagonal.

How do we get there?

The manipulation is a change of basis with orthonormal vectors, ordered in a way that the most important one comes first (principal)...

How do we put this in mathematical terms? Find an orthonormal P such that

Y = P X

with Cov(Y) diagonalized.

Then the rows of P are the principal components of X.

How do we get there?

With Y = P X and Cov(Y) = 1/(n-1) Y Y^t (X is assumed to be centred), we get

Cov(Y) = 1/(n-1) (PX)(PX)^t
       = 1/(n-1) P X X^t P^t
       = 1/(n-1) P (X X^t) P^t
       = 1/(n-1) P A P^t,   with A := X X^t

How do we get there?

A is symmetric. Therefore there is a matrix E of eigenvectors and a diagonal matrix D such that

A = E D E^t

Now define P to be the transpose of the matrix E of eigenvectors: P := E^t.

Then we can write A as

A = P^t D P

How do we get there?

Now we can go back to our covariance expression:

Cov(Y) = 1/(n-1) P A P^t
       = 1/(n-1) P (P^t D P) P^t
       = 1/(n-1) (P P^t) D (P P^t)

How do we get there?

The inverse of an orthogonal matrix is its transpose (due to its definition): P^(-1) = P^t.

In our context that means:

Cov(Y) = 1/(n-1) (P P^(-1)) D (P P^(-1)) = 1/(n-1) D

How do we get there?

P diagonalizes Cov(Y), where P is the transpose of the matrix of eigenvectors of X X^t.

The principal components of X are the eigenvectors of X X^t (that is the same as the rows of P).

The i-th diagonal value of Cov(Y) is the variance of X along p_i (= along the i-th principal component).

Essentially we need to compute the EIGENVALUES and EIGENVECTORS of the covariance matrix of the original matrix X: the eigenvalues give the explained variance, the eigenvectors are the principal components.
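
The chain of equalities above can be checked numerically. The R sketch below uses simulated data (purely illustrative), builds P from the eigenvectors of the covariance matrix, and verifies that the covariance of the projected data Y = P X is (numerically) diagonal with the eigenvalues on the diagonal.

R (sketch):
set.seed(7)
X <- matrix(rnorm(3 * 100), nrow = 3)     # 3 variables x 100 observations
X <- X - rowMeans(X)                      # centre each variable (row)
C <- X %*% t(X) / (ncol(X) - 1)           # covariance matrix, 3 x 3
E <- eigen(C)$vectors                     # eigenvectors in the columns
P <- t(E)                                 # rows of P = principal components
Y <- P %*% X                              # projected data
round(Y %*% t(Y) / (ncol(X) - 1), 10)     # ~ diagonal, entries = eigenvalues
eigen(C)$values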

Some Remarks

• If you multiply one variable by a scalar you get different results.
• This is because PCA uses the covariance matrix (and not the correlation matrix).
• PCA should be applied to data that have approximately the same scale in each variable.
• The relative variance explained by each PC is given by eigenvalue / sum(eigenvalues) (see the sketch after this list).
• When to stop? For example: keep enough PCs to reach a cumulative variance explained of > 50-70%.
• Kaiser criterion: keep PCs with eigenvalues > 1.
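
Both stopping rules are one-liners once the eigenvalues are available; a minimal R sketch, using the correlation matrix of a small example data set (USArrests, shipped with base R):

R (sketch):
ev <- eigen(cor(USArrests))$values   # eigenvalues of a correlation matrix
prop <- ev / sum(ev)                 # relative variance explained by each PC
cumsum(prop)                         # cumulative variance explained (stop at > 50-70%)
which(ev > 1)                        # Kaiser criterion: PCs with eigenvalue > 1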


Some Remarks

If variables have very heterogeneous variances, we standardize them.

The standardized variables Xi*:

Xi* = (Xi - mean) / sqrt(variance)

The new variables all have the same variance, so each variable has the same weight.
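
In R this standardization is what scale() does (centre each variable and divide by its standard deviation); a minimal sketch on the toy data from the earlier example:

R (sketch):
X  <- cbind(a = c(0, 8, 12, 20), b = c(8, 9, 11, 12))
Xs <- scale(X)          # (Xi - mean) / sd, column by column
apply(Xs, 2, mean)      # ~ 0 for every variable
apply(Xs, 2, var)       # 1 for every variable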

REMARKS

• PCA is useful for finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low-variance features.

• PCA is only powerful if the biological question is related to the highest variance in the dataset.

Algorithm

Data = (Data.old - mean) / sqrt(variance)

Cov(Data) = 1/(N-1) Data * tr(Data)

Find the eigenvectors/eigenvalues of Cov(Data) (function eigen in R, eig in Matlab) and sort them by decreasing eigenvalue.

Eigenvectors: V; eigenvalues: D. Set P = tr(V).

Project the original data: Y = P * Data.

Plot as many components as necessary.
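
Putting the steps together, a minimal R sketch of this procedure on simulated data (5 variables in the rows, as in the slides; all numbers are illustrative):

R (sketch):
set.seed(123)
data.old <- matrix(rnorm(5 * 50), nrow = 5)   # 5 variables x 50 observations

# 1. standardize each variable (row): subtract the mean, divide by the sd
data <- t(scale(t(data.old)))

# 2. covariance matrix: 1/(N-1) Data * tr(Data)
C <- data %*% t(data) / (ncol(data) - 1)

# 3. eigenvectors/eigenvalues (eigen() already sorts by decreasing eigenvalue)
e <- eigen(C)
V <- e$vectors        # eigenvectors
D <- e$values         # eigenvalues
P <- t(V)             # rows = principal components

# 4. project the original data
Y <- P %*% data

# 5. plot as many components as necessary, e.g. the first two
plot(Y[1, ], Y[2, ], xlab = "PC1", ylab = "PC2")

For comparison, base R's prcomp(t(data.old), scale. = TRUE) performs essentially the same analysis (components may differ in sign).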

Applications of PCA

Applications

Include:

• Image processing
• Microarray experiments
• Pattern recognition

OUTLINE

Principal component analysis in bioinformatics

Example 1

Lefkovits et al.

Data matrix X: spots x1, ..., xp in the rows; clones 1, ..., n in the columns. X is a real-valued p x n matrix.

They want to analyse the relatedness of clones, so Cov(X) is a real-valued n x n matrix.

They take the correlation matrix (which is, on top of the covariance, the division by the standard deviations).


Example 2

Yang et al.

[PCA plots of the microarray experiments, coloured by experimental group: babo, tkv, control]

Ulloa-Montoya et al.

[PCA plot of cell types: multipotent adult progenitor cells, pluripotent embryonic stem cells, mesenchymal stem cells]


Yang et al.

But: we only see the different experiments.

If we do it the other way round, that is, analysing the genes instead of the experiments, we see a grouping of genes. But we never see both together.

So, can we somehow relate the experiments and the genes? That means, group genes whose expression might be explained by the respective experimental group (tkv, babo, control)?

This leads into "correspondence analysis".

Extensions of PCA

Difficult example

Non-linear PCA

Kernel PCA

(http://research.microsoft.com/users/Cambridge/nicolasl/papers/eigen_dimred.pdf)
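
The core computation behind kernel PCA, an eigendecomposition of a centred kernel matrix in place of a covariance matrix, fits in a few lines. The R sketch below is a minimal illustration with an RBF (Gaussian) kernel on simulated 2-D data; the kernel width sigma and the data are assumptions made only for the example.

R (sketch):
set.seed(1)
X <- matrix(rnorm(2 * 100), ncol = 2)                 # 100 points in 2-D
sigma <- 1                                            # RBF kernel width (illustrative)

D2 <- as.matrix(dist(X))^2                            # squared Euclidean distances
K  <- exp(-D2 / (2 * sigma^2))                        # RBF kernel matrix

n  <- nrow(K)
J  <- matrix(1 / n, n, n)
Kc <- K - J %*% K - K %*% J + J %*% K %*% J           # centre the kernel matrix in feature space

e  <- eigen(Kc)
Y  <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))  # projections onto the first 2 kernel PCs
plot(Y, xlab = "kPC1", ylab = "kPC2")

In practice an existing implementation such as kpca() in the kernlab package can be used instead.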

PCA in feature space

Side remark

Summary of kernel PCA

Multidimensional Scaling (MDS)

Common stress functions
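
One commonly used stress function is Kruskal's stress-1, stress = sqrt( Σ (d_ij - dhat_ij)^2 / Σ d_ij^2 ), where d_ij are the distances in the low-dimensional configuration and dhat_ij the fitted disparities; non-metric MDS minimizes such a stress function, while classical (metric) MDS has a closed-form solution via an eigendecomposition. A minimal R sketch of classical MDS with the base function cmdscale(), on an illustrative distance matrix:

R (sketch):
D   <- dist(scale(USArrests))     # a dissimilarity matrix (example data from base R)
fit <- cmdscale(D, k = 2)         # classical MDS into 2 dimensions
plot(fit, xlab = "MDS 1", ylab = "MDS 2")
# non-metric MDS, which minimizes a stress function, is available e.g. as isoMDS() in the MASS package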
