Non-linear Principal Manifolds: a Useful Tool in Bioinformatics and Medical Applications
Andrei Zinovyev
Institut des Hautes Études Scientifiques, France
Plan of the talk
Object of study
Definition of principal manifold (PM)
Constructing PMs: elastic maps
Examples of biomedical applications
Principal manifolds: elastic maps framework
[Diagram: principal manifolds placed among related data-mining methods. Labels: regression, approximation; supervised classification; SVM; clustering; K-means; SOM; visualization; multidimensional scaling; PCA; factor analysis; non-linear data-mining methods; LLE; ISOMAP]
Finite set of objects in $\mathbb{R}^N$: $X_i$, $i = 1, \dots, m$
IRIS database
Sepal height  Sepal width  Petal height  Petal width  Species
4.9           3            1.4           0.2          Iris-setosa
4.7           3.2          1.3           0.3          Iris-setosa
4.6           3.1          1.5           0.2          Iris-setosa
7             3.2          4.7           1.4          Iris-versicolor
6.4           3.2          4.5           1.5          Iris-versicolor
6.9           3.1          4.9           1.5          Iris-versicolor
6.3           3.3          6             2.5          Iris-virginica
5.8           2.7          X             1.9          Iris-virginica
7.1           3            5.9           2.1          Iris-virginica
6.3           2.9          5.6           1.8          Iris-virginica
Mean point
$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$
$\sum_{i=1}^{m} (X_i - \bar{X})^2 \to \min$
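The mean point is the point that minimizes the sum of squared distances to the data. A minimal sketch of that property, using the three complete setosa rows of the Iris table; the function names `mean_point` and `sum_sq_dist` are illustrative and not part of the talk.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of equal-length points."""
    m = len(points)
    dim = len(points[0])
    return [sum(p[k] for p in points) / m for k in range(dim)]


def sum_sq_dist(points, center):
    """Sum of squared Euclidean distances from every point to `center`."""
    return sum(
        sum((p[k] - center[k]) ** 2 for k in range(len(center)))
        for p in points
    )


# Three setosa rows from the Iris table.
data = [[4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.3], [4.6, 3.1, 1.5, 0.2]]
center = mean_point(data)

# Moving the center away from the mean can only increase the objective.
shifted = [c + 0.1 for c in center]
print(sum_sq_dist(data, center) < sum_sq_dist(data, shifted))
```

Any other candidate point can be substituted for `shifted`; the comparison should hold for all of them.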
K-means clustering
$\sum_{i=1}^{m} (X_i - Y_i^{\mathrm{closest}})^2 \to \min$
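The K-means objective replaces the single mean point by several centers and measures each data point against its closest one. A minimal sketch using the standard Lloyd iteration (assign each point to its closest center, then move each center to the mean of its points); the talk does not say which K-means variant it has in mind, and all names here are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, centers, n_iter=20):
    """Lloyd iteration starting from the given initial centers."""
    centers = [list(c) for c in centers]
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest center.
        labels = [closest(p, centers) for p in points]
        # Update step: each center moves to the mean of its points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [closest(p, centers) for p in points]
    return centers, labels


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels = kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
print(centers, labels)
```

The result depends on the initial centers, which is one reason the talk lists uniqueness among the requirements for a principal manifold.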
Principal “Object”
$\sum_{i=1}^{m} \mathrm{dist}^2(X_i, \text{Object}) \to \min$
Principal Component Analysis
[Figure: data cloud with the 1st principal axis along the direction of maximal dispersion and the 2nd principal axis]
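The first principal axis is the direction along which the data have maximal dispersion, that is, the leading eigenvector of the covariance matrix. A minimal sketch that finds it by power iteration; the method and the function names are choices made for this illustration, not something stated in the talk.

```python
import math


def covariance(points):
    """Covariance matrix of a list of equal-length points."""
    m, n = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / m for k in range(n)]
    centered = [[p[k] - mean[k] for k in range(n)] for p in points]
    return [
        [sum(c[a] * c[b] for c in centered) / m for b in range(n)]
        for a in range(n)
    ]


def first_principal_axis(points, n_iter=200):
    """Unit vector along the direction of maximal dispersion.

    Power iteration on the covariance matrix. It assumes the start vector
    is not orthogonal to the leading eigenvector.
    """
    cov = covariance(points)
    n = len(cov)
    v = [1.0 / (k + 1) for k in range(n)]  # deliberately asymmetric start
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Points lying exactly on the line y = 2x.
line_points = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
axis = first_principal_axis(line_points)
print(axis)
```

The 2nd principal axis can be obtained the same way after removing the component of the data along the first axis.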
Principal manifold
What do we want?
Non-linear surface (1D, 2D, 3D …)
Smooth and not twisted
The data model is unknown
Speed (time linear with Nm)
Uniqueness
Fast way to project datapoints
Metaphor of elasticity
[Figure: data points and graph nodes, labeled with the energy terms $U^{(Y)}$, $U^{(E)}$, $U^{(R)}$]
Constructing elastic nets
[Figure: a node $y$, an edge $E$ with end nodes $E(0)$ and $E(1)$, and a rib $R$ with nodes $R(1)$, $R(0)$, $R(2)$]
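Definition of elastic energy
$U = U^{(Y)} + U^{(E)} + U^{(R)}$
$U^{(Y)} = \frac{1}{N} \sum_{i=1}^{p} \sum_{X^{(j)} \in K^{(i)}} \| X^{(j)} - y^{(i)} \|^2$
$U^{(E)} = \sum_{i=1}^{s} \lambda_i \| E^{(i)}(1) - E^{(i)}(0) \|^2$
$U^{(R)} = \sum_{i=1}^{r} \mu_i \| R^{(i)}(1) + R^{(i)}(2) - 2 R^{(i)}(0) \|^2$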
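The three energy terms can be evaluated directly for the simplest net, a 1D chain of nodes. A minimal sketch under stated assumptions: edges join consecutive nodes, ribs are consecutive triples with the middle node as $R(0)$, every edge shares one stretching coefficient `lam` and every rib one bending coefficient `mu`, and the $1/N$ factor is read as normalization by the number of data points. The function names are illustrative and are not the elmap package API.

```python
def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


def elastic_energy(data, nodes, lam, mu):
    """Return (U_Y, U_E, U_R) for a chain of nodes.

    Assumptions: edges join consecutive nodes, ribs are consecutive
    triples, and all edges (ribs) share the coefficient lam (mu).
    """
    # U_Y: each data point is attached to its closest node; average the
    # squared distances over the data points.
    u_y = sum(
        min(sq_norm([x - y for x, y in zip(p, node)]) for node in nodes)
        for p in data
    ) / len(data)

    # U_E: stretching energy, squared length of every edge.
    u_e = sum(
        lam * sq_norm([b - a for a, b in zip(nodes[i], nodes[i + 1])])
        for i in range(len(nodes) - 1)
    )

    # U_R: bending energy, squared second difference R(1) + R(2) - 2 R(0).
    u_r = sum(
        mu * sq_norm([
            left + right - 2 * mid
            for left, mid, right in zip(nodes[i - 1], nodes[i], nodes[i + 1])
        ])
        for i in range(1, len(nodes) - 1)
    )
    return u_y, u_e, u_r


data = [[0.0, 1.0], [2.0, -1.0]]
straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
bent = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(elastic_energy(data, straight, lam=1.0, mu=1.0))
print(elastic_energy(data, bent, lam=1.0, mu=1.0))
```

A straight, evenly spaced chain has zero bending energy; bending the middle node upward makes $U^{(R)}$ positive and also lengthens the edges.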
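[Figure: a node $y$ with an attached data point $X_j$, an edge with end nodes $E(0)$, $E(1)$, and a rib with nodes $R(1)$, $R(0)$, $R(2)$; $\lambda_i = \lambda_0$, $\mu_i = \mu_0$]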
Elastic manifold
Global minimum and softening
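$\lambda_0, \mu_0 \approx 10^{3}$
$\lambda_0, \mu_0 \approx 10^{2}$
$\lambda_0, \mu_0 \approx 10^{1}$
$\lambda_0, \mu_0 \approx 10^{-1}$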
Adaptive algorithms
Growing net
Adaptive net
Refining net:
Idea of scaling:
Projection onto the manifold
Closest node of the net
Closest point of the manifold
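The two projection modes differ in whether a data point snaps to a node or may land between nodes. A minimal sketch for a 1D manifold, assuming the manifold is the piecewise-linear curve through consecutive nodes; for 2D nets the same idea would apply cell by cell. The function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_node(point, nodes):
    """Index of the net node closest to `point`."""
    return min(range(len(nodes)), key=lambda i: sq_dist(point, nodes[i]))


def closest_point_on_polyline(point, nodes):
    """Closest point on the piecewise-linear curve through `nodes`."""
    best, best_d = None, float("inf")
    for a, b in zip(nodes, nodes[1:]):
        ab = [bi - ai for ai, bi in zip(a, b)]
        denom = sum(c * c for c in ab)
        # Parameter of the orthogonal projection onto the segment's line,
        # clamped to [0, 1] so the result stays on the segment.
        if denom == 0:
            t = 0.0
        else:
            t = sum((pi - ai) * ci for pi, ai, ci in zip(point, a, ab)) / denom
        t = max(0.0, min(1.0, t))
        q = [ai + t * ci for ai, ci in zip(a, ab)]
        d = sq_dist(point, q)
        if d < best_d:
            best, best_d = q, d
    return best


nodes = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
point = [0.4, 1.0]
print(closest_node(point, nodes))
print(closest_point_on_polyline(point, nodes))
```

Projection to the closest node is cheaper but quantizes the data to the grid; projection to the closest point keeps positions continuous along the manifold.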
Colorings: visualize any function
Density visualization
Example: different topologies
[Figure panels: $\mathbb{R}^N$ and $\mathbb{R}^2$]
VIDAExpert tool and elmap C++ package
Regression and principal manifolds
regression, principal component
$\sum_i \| x_i - P x_i \|^2 \to \min$
$\sum_i \| x_i - F(x_i) \|^2 \to \min$
[Figure: axes $x$ and $F(x)$. Legend: data, gen. curve, grid]
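The contrast between regression and the principal component can be made concrete in two dimensions. A minimal sketch under the standard definitions, which are assumed here rather than taken from the slide: least-squares regression measures error along the vertical axis only, while the first principal axis minimizes orthogonal distances to the line. The variable names `xs` and `ys` and the function names are illustrative.

```python
import math


def covariance_2d(xs, ys):
    """Return (sxx, syy, sxy) for paired samples."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs) / m
    syy = sum((y - my) ** 2 for y in ys) / m
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    return sxx, syy, sxy


def regression_slope(xs, ys):
    """Slope of the least-squares line (vertical residuals)."""
    sxx, _, sxy = covariance_2d(xs, ys)
    return sxy / sxx


def principal_axis_slope(xs, ys):
    """Slope of the first principal axis (orthogonal residuals).

    Assumes sxy != 0, so the axis is neither horizontal nor vertical.
    """
    sxx, syy, sxy = covariance_2d(xs, ys)
    # Largest eigenvalue of the 2x2 covariance matrix.
    lam1 = 0.5 * ((sxx + syy) + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    # The corresponding eigenvector is (sxy, lam1 - sxx).
    return (lam1 - sxx) / sxy


xs = [0.0, 1.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0]
print(regression_slope(xs, ys), principal_axis_slope(xs, ys))
```

On these four points the two slopes differ: the regression line is flatter than the principal axis, because regression ignores errors in the horizontal direction.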
Image skeletonization or clustering around curves
Approximation of molecular surfaces
Application: economic data
[Map colorings: density, gross output, profit, growth rate]
Medical table: 1700 patients with myocardial infarction
128 indicators
[Map colorings: patients map (density), lethal cases, age, number of infarctions in anamnesis, stenocardia functional class]
Codon usage in all genes of one genome
Escherichia coli Bacillus subtilis
Majority of genes
Highly expressed genes
“Foreign” genes
“Hydrophobic” genes
Golub’s leukemia dataset: 3051 genes, 38 samples (ALL/B-cell, ALL/T-cell, AML)
[Figure labels: ALL sample, AML sample]
Map of genes. Legend: vote for ALL; vote for AML; used by T. Golub; used by W. Lie
Golub’s leukemia dataset, map of samples. Legend: AML; ALL/B-cell; ALL/T-cell; density
[Gene colorings: Cystatin C; Retinoblastoma binding protein P48; CA2 Carbonic anhydrase II; X-linked Helicase II]
Thank you for your attention!
Questions?