Object Oriented Data Analysis. J. S. Marron, Dept. of Statistics and Operations Research, University of North Carolina (UNC, Stat & OR). U. C. Davis, F. R. G. Workshop.

1 UNC, Stat & OR
U. C. Davis, F. R. G. Workshop
Object Oriented Data Analysis
J. S. Marron
Dept. of Statistics and Operations Research, University of North Carolina
February 21, 2016

2 Object Oriented Data Analysis, I
What is the atom of a statistical analysis?
- 1st Course: Numbers
- Multivariate Analysis Course: Vectors
- Functional Data Analysis: Curves
- More generally: Data Objects

3 Object Oriented Data Analysis, II
Examples:
- Medical Image Analysis
  - Images as Data Objects?
  - Shape Representations as Objects
- Micro-arrays
  - Just multivariate analysis?

4 Object Oriented Data Analysis, III
Typical Goals:
- Understanding population variation
  - Principal Component Analysis +
- Discrimination (a.k.a. Classification)
- Time Series of Data Objects

5 Object Oriented Data Analysis, IV
Major Statistical Challenge, I: High Dimension Low Sample Size (HDLSS)
- Dimension d >> sample size n
- Multivariate Analysis nearly useless
  - Can't normalize the data
- Land of Opportunity for Statisticians
  - Need for creative statisticians

6 Object Oriented Data Analysis, V
Major Statistical Challenge, II:
- Data may live in non-Euclidean space
  - Lie Group / Symmetric Spaces
  - Trees/Graphs as data objects
- Interesting Issues:
  - What is the mean (population center)?
  - How do we quantify population variation?

7 Statistics in Image Analysis, I
First Generation Problems:
- Denoising
- Segmentation
- Registration
(all about single images)

8 Statistics in Image Analysis, II
Second Generation Problems:
- Populations of Images
  - Understanding Population Variation
  - Discrimination (a.k.a. Classification)
- Complex Data Structures (& Spaces)
- HDLSS Statistics

9 HDLSS Statistics in Imaging
Why HDLSS (High Dim, Low Sample Size)?
- Complex 3-d objects are hard to represent
  - Often need d = 100s of parameters
- Complex 3-d objects are costly to segment
  - Often have n = 10s of cases

10 Object Representation
- Landmarks (hard to find)
- Boundary Representations (no correspondence)
- Medial representations
  - Find skeleton
  - Discretize as atoms called M-reps

11 3-d m-reps
- Bladder, Prostate, Rectum (multiple objects, J. Y. Jeong)
- Medial Atoms provide skeleton
- Implied Boundary from spokes: surface

12 Illuminating Viewpoint
- Object Space, Feature Space
- Focus here on collection of data objects
- Here conceptualize population structure via point clouds

13 PCA for m-reps, I
- Major issue: m-reps live in a space of (locations, radius and angles)
- E.g. average of: [example values not recoverable] = ???
- Natural Data Structure is: Lie Groups ~ Symmetric spaces (smooth, curved manifolds)

14 PCA for m-reps, II
- PCA on non-Euclidean spaces? (i.e. on Lie Groups / Symmetric Spaces)
- T. Fletcher: Principal Geodesic Analysis
- Idea: replace linear summary of data with geodesic summary of data

15-17 PGA for m-reps, Bladder-Prostate-Rectum
- Bladder, Prostate, Rectum, 1 person, 17 days
- Panels: PG 1, PG 2, PG 3
(analysis by Ja Yeon Jeong)

18 HDLSS Classification (i.e. Discrimination)
Background: Two Class (Binary) version:
- Using training data from Class +1 and from Class -1
- Develop a rule for assigning new data to a class
Canonical Example: Disease Diagnosis
- New patients are Healthy or Ill
- Determined based on measurements

19 HDLSS Classification (Cont.)
Ineffective Methods:
- Fisher Linear Discrimination
- Gaussian Likelihood Ratio
Less Useful Methods:
- Nearest Neighbors
- Neural Nets
(black boxes, no directions or intuition)

20 HDLSS Classification (Cont.)
Currently Fashionable Methods:
- Support Vector Machines
- Tree-Based Approaches
New High Tech Method:
- Distance Weighted Discrimination (DWD)
  - Specially designed for HDLSS data
  - Avoids data piling problem of SVM
  - Solves more suitable optimization problem

21 HDLSS Classification (Cont.)
Currently Fashionable Methods:
- Tree-Based Approaches
- Support Vector Machines:

22 HDLSS Classification (Cont.)
Comparison of Linear Methods (toy data):
- Optimal Direction: excellent, but need direction in dim = 50
- Maximal Data Piling (J. Y. Ahn, D. Peña): great separation, but generalizability???
- Support Vector Machine: more separation, generalizability, but some data piling?
- Distance Weighted Discrimination: avoids data piling, good generalizability, Gaussians?

23 Distance Weighted Discrimination
Maximal Data Piling

24 Maximal Data Piling
- Mind boggling?
- J. Y. Ahn has characterized
- Formula ~ FLD
- There are many
- Publishable???

25 Distance Weighted Discrimination
Based on Optimization Problem:
- Minimize the sum of inverse distances from the data to the separating hyperplane
- More precisely: work in appropriate penalty for violations
Optimization Method (Michael Todd):
- Second Order Cone Programming
- Still convex generalization of quadratic programming
- Fast greedy solution
- Can use existing software

26 Simulation Comparison
E.g. Above Gaussians:
- Wide array of dimensions
- SVM substantially worse
- MD: Bayes optimal
- DWD close to MD

27 Simulation Comparison
E.g. Outlier Mixture:
- Disaster for MD
- SVM & DWD much more solid
- Directions are robust
- SVM & DWD similar

28 Simulation Comparison
E.g. Wobble Mixture:
- Disaster for MD
- SVM less good
- DWD slightly better
Note: All methods come together for larger d ???

29 DWD in Face Recognition, I
- Face Images as Data (with M. Benito & D. Peña)
- Registered using landmarks
- Male vs. Female difference?
- Discrimination Rule?

30 DWD in Face Recognition, II
DWD Direction:
- Good separation
- Images make sense
- Garbage at ends? (extrapolation effects?)

31 DWD in Face Recognition, III
Interesting summary:
- Jump between means (in DWD direction)
- Clear separation of Maleness vs. Femaleness

32 DWD in Face Recognition, IV
Fun Comparison:
- Jump between means (in SVM direction)
- Also distinguishes Maleness vs. Femaleness
- But not as well as DWD

33 DWD in Face Recognition, V
Analysis of difference: Project onto normals
- SVM has small gap (feels noise artifacts?)
- DWD more informative (feels real structure?)

34 DWD in Face Recognition, VI
Current Work:
- Focus on drivers (regions of interest)
- Relation to Discrimination?
- Which is best?
- Lessons for human perception?

35 DWD & Microarrays for Gene Expression
- Skip due to time pressure
- Some have seen this
- DWD provides excellent tool for:
  - Combining Data Sets (caBIG funded)
  - Visualization of HDLSS data
  - HDLSS hypothesis testing
- Let's talk informally if you are interested

36 Discrimination for m-reps
- Classification for Lie Groups / Symmetric Spaces
- S. K. Sen & S. Joshi
- What is the separating plane (for SVM-DWD)?

37 Trees as Data Points, I
- Brain Blood Vessel Trees (E. Bullitt & H. Wang)
- Statistical Understanding of Population?
- Mean? PCA?
- Challenge: Very Non-Euclidean

38 Trees as Data Points, II
- Mean of Tree Population: Fréchet Approach
- PCA on Trees (based on tree lines)
- Theory in Place; Implementation?

39 HDLSS Asymptotics: Simple Paradoxes, I
For $d$-dimensional Standard Normal distribution, $Z \sim N_d(0, I)$:
Euclidean Distance to Origin (as $d \to \infty$): $\|Z\| = \sqrt{d} + O_p(1)$
- Data lie roughly on surface of sphere of radius $\sqrt{d}$
- Yet origin is point of highest density???
- Paradox resolved by: density w. r. t. Lebesgue Measure

40 HDLSS Asymptotics: Simple Paradoxes, II
For $d$-dimensional Standard Normal distribution: $Z_1$ independent of $Z_2$
Euclidean Distance between $Z_1$ and $Z_2$ (as $d \to \infty$):
- Distance tends to non-random constant: $\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$
- Can extend to $Z_1, \dots, Z_n$
- Where do they all go??? (we can only perceive 3 dimensions)

41 HDLSS Asymptotics: Simple Paradoxes, III
For $d$-dimensional Standard Normal distribution: $Z_1$ independent of $Z_2$
High-dimensional Angles (as $d \to \infty$): $\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_p(d^{-1/2})$
- Everything is orthogonal???
- Where do they all go??? (again our perceptual limitations)
- Again 1st order structure is non-random

42 HDLSS Asymptotics: Geometrical Representation, I
Assume $Z_1, \dots, Z_n \sim N_d(0, I)$, let $d \to \infty$
Study Subspace Generated by Data
a. Hyperplane through 0, of dimension $n$
b. Points are nearly equidistant to 0, & distance $\approx \sqrt{d}$
c. Within plane, can rotate towards $\sqrt{d} \times$ Unit Simplex
d. All Gaussian data sets are near Unit Simplex Vertices!!!
Randomness appears only in rotation of simplex
With P. Hall & A. Neeman

43 HDLSS Asymptotics: Geometrical Representation, II
Assume $Z_1, \dots, Z_n \sim N_d(0, I)$, let $d \to \infty$
Study Hyperplane Generated by Data
a. $n - 1$ dimensional hyperplane
b. Points are pairwise equidistant, distance $\approx \sqrt{2d}$
c. Points lie at vertices of regular $n$-hedron
d. Again randomness in data is only in rotation
e. Surprisingly rigid structure in data?

44 HDLSS Asymptotics: Geometrical Representation, III
Simulation View: shows rigidity after rotation

45 HDLSS Asymptotics: Geometrical Representation, III
Straightforward Generalizations:
- non-Gaussian data: only need moments
- non-independent: use mixing conditions
- Mild eigenvalue condition on theoretical covariance (with J. Ahn, K. Muller & Y. Chi)
All based on simple Laws of Large Numbers

46 HDLSS Asymptotics: Geometrical Representation, IV
Explanation of Observed (Simulation) Behavior: everything similar for very high d
- 2 populations are 2 simplices (i.e. regular n-hedrons)
- All are same distance from the other class
- i.e. everything is a support vector
- i.e. all sensible directions show data piling
- so sensible methods are all nearly the same
- Including 1-NN

47 HDLSS Asymptotics: Geometrical Representation, V
Further Consequences of Geometric Representation
1. Inefficiency of DWD for uneven sample size (motivates weighted version, work in progress)
2. DWD more stable than SVM (based on deeper limiting distributions) (reflects intuitive idea of feeling sampling variation) (something like mean vs. median)
3. 1-NN rule inefficiency is quantified.

48 The Future of Geometrical Representation?
- HDLSS version of optimality results?
- Contiguity approach? Parameters depend on d?
- Rates of Convergence?
- Improvements of DWD? (e.g. other functions of distance than inverse)
It is still early days

49 Some Carry Away Lessons
- Atoms of the Analysis: Object Oriented
- HDLSS contexts deserve further study
- DWD is attractive for HDLSS classification
- Randomness in HDLSS data is only in rotations (modulo rotation, have constant simplex shape)
- How to put HDLSS asymptotics to work?
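
A numerical check of the "simple paradoxes" (not part of the original slides)
The three HDLSS asymptotics claims can be checked with a small simulation. For independent draws from the standard normal distribution in dimension d, norms concentrate near sqrt(d), pairwise distances near sqrt(2d), and pairwise angles near 90 degrees. The sketch below is only an illustration under assumed settings: the function name `hdlss_geometry`, the defaults n = 5 and d = 2000, and the seed are all hypothetical choices, not values from the talk.

```python
import math
import random
from itertools import combinations


def hdlss_geometry(n=5, d=2000, seed=0):
    """Draw n points from N_d(0, I) and summarise their geometry.

    Returns three lists:
      - each point's norm divided by sqrt(d)
      - each pairwise distance divided by sqrt(2d)
      - each pairwise angle in degrees
    n, d and seed are illustrative defaults, not values from the slides.
    """
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    # Distance to the origin, rescaled so the limiting value is 1.
    norm_ratios = [norm(z) / math.sqrt(d) for z in points]

    dist_ratios = []
    angles_deg = []
    for a, b in combinations(points, 2):
        diff = [x - y for x, y in zip(a, b)]
        # Pairwise distance, rescaled so the limiting value is 1.
        dist_ratios.append(norm(diff) / math.sqrt(2 * d))
        # Angle between the two points, viewed as vectors from the origin.
        cos_ab = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
        angles_deg.append(math.degrees(math.acos(cos_ab)))
    return norm_ratios, dist_ratios, angles_deg


norm_ratios, dist_ratios, angles_deg = hdlss_geometry()
print("norm / sqrt(d):      ", [round(r, 3) for r in norm_ratios])
print("distance / sqrt(2d): ", [round(r, 3) for r in dist_ratios])
print("angles (degrees):    ", [round(a, 1) for a in angles_deg])
```

Both ratio lists should come out close to 1 and the angles close to 90 degrees, with the spread shrinking as d grows. That is the sense in which the data sit near the vertices of a regular simplex and the remaining randomness is essentially a rotation.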