Testing in unsupervised learning (ICA, really), with applications to brain imaging

Aapo Hyvarinen
Dept of Computer Science & Dept of Mathematics and Statistics, University of Helsinki, Finland
with Pavan Ramkumar (Aalto University, Finland)
Outline: Introduction · Consistency over datasets · Testing the mixing matrix · Testing independent components · Conclusion
Abstract
- The theory of independent component analysis (ICA) is almost exclusively about estimation
- Here, we propose fundamental testing methods
- Which independent components are reliable/significant?
- Tests can be about the mixing matrix or the component values
- We propose a null hypothesis based on the idea of intersubject consistency
Aapo Hyvarinen Testing in unsupervised learning (ICA, really), with applications to brain imaging
Independent component analysis
- ICA is widely used for analyzing neuroimaging data
- One of the main methods in resting-state analysis
- Easily finds resting-state networks in fMRI (Beckmann et al. 2005, Van de Ven 2005); recently similar results in MEG (Hyvarinen et al. 2010, Brookes et al. 2011)
- Decomposes the data matrix X into a mixing matrix A and a component matrix S:

  X = AS

- Estimated by maximizing the independence or non-gaussianity of the rows of S.
[Figure: resting-state network maps (Beckmann et al., 2005)]
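The decomposition X = AS can be illustrated end to end with a toy example. The following is a minimal sketch of FastICA-style estimation (whitening followed by symmetric fixed-point iteration with a tanh nonlinearity) on simulated data; it is an illustration of the model, not the implementation behind the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian (Laplacian) sources, mixed linearly: X = A S.
n = 5000
S = rng.laplace(size=(2, n))
A = np.array([[1.0, 0.5], [0.5, 1.0]])
X = A @ S

# Whiten X so that the remaining unknown is an orthogonal rotation.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
V = E @ np.diag(d ** -0.5) @ E.T      # whitening matrix
Z = V @ Xc

# FastICA fixed-point iteration with tanh nonlinearity.
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    W_new = G @ Z.T / n - np.diag((1 - G ** 2).mean(axis=1)) @ W
    # Symmetric decorrelation: W <- (W W^T)^(-1/2) W, via SVD.
    u, s, vt = np.linalg.svd(W_new)
    W = u @ vt
S_est = W @ Z

# Recovered sources should match the true ones up to sign and permutation.
C = np.abs(np.corrcoef(S_est, S)[:2, 2:])
print(C.max(axis=1))  # each estimated component correlates strongly with one true source
```

The point of the whitening step is that it reduces the estimation problem to finding an orthogonal rotation, which is also what the null hypothesis later in the talk is built on.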
Importance of testing independent components
- How do we know that an estimated component is not just a random effect?
- ICA algorithms give a fixed number of components and do not tell which ones are reliable (statistically significant)
- Algorithmic artifacts are also possible (local minima)
- In general, any estimation method should be complemented by a testing method
- Previously, testing zeros in the mixing matrix was proposed by Shimizu et al. (2006), but often zeros are not privileged.
Testing using intersubject or intersession consistency
- We propose the following approach:
  - Assume a number of similar datasets is available
  - Do ICA separately on each of them
  - A component is significant if it appears in two or more datasets in a sufficiently similar form
- Different datasets can come from different subjects, or sessions.
- Similarity can be measured between components in S or between columns of the mixing matrix A
- Key question: how to quantify the case of complete randomness, i.e. the null hypothesis
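The steps above can be sketched in a few lines. The patterns, the noise level, and the fixed similarity threshold below are all hypothetical; the actual method replaces the ad-hoc threshold with one derived from a null distribution, as described in the following slides:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_subj = 50, 4

# Hypothetical spatial patterns: one "real" pattern shared (noisily) across
# subjects, plus one unrelated random pattern per subject.
shared = rng.standard_normal(d)
patterns = []  # patterns[l][j] = pattern j of dataset l
for l in range(n_subj):
    noisy_shared = shared + 0.3 * rng.standard_normal(d)
    random_pat = rng.standard_normal(d)
    patterns.append(np.stack([noisy_shared, random_pat]))

def similarity(p, q):
    # Absolute cosine similarity between two patterns.
    return abs(p @ q) / (np.linalg.norm(p) * np.linalg.norm(q))

# A component counts as consistent if it is sufficiently similar to some
# component of another dataset (threshold purely illustrative).
thr = 0.7
consistent = [
    [any(similarity(patterns[l][j], patterns[m][i]) > thr
         for m in range(n_subj) if m != l for i in range(2))
     for j in range(2)]
    for l in range(n_subj)
]
print(consistent)  # the shared component is flagged; the random ones are not
```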
Testing the mixing matrix: Null hypothesis
- ICA is a rotation of the whitened data X: after whitening, we have

  X = US    (1)

- Assume all the subjects/sessions can be whitened using the same matrix.
- Under the null hypothesis, the spatial patterns of different subjects are "completely random" rotations in the PCA subspace (uniformly distributed in the set of orthogonal matrices).
- This models both the actual randomness in the data (differences in brain anatomy) and errors in ICA estimation.
Definition and significance of similarities
- Consider the columns p_jl of the mixing matrix, for component j and dataset l.
- Compute the similarities of spatial patterns using a Mahalanobis metric:

  γ_ik,jl = |p_ik^T M p_jl| / ( sqrt(p_ik^T M p_ik) · sqrt(p_jl^T M p_jl) )    (2)

  where M is the (stabilized) inverse of the covariance matrix of the p.
- Under the null hypothesis, the marginal distribution of γ can be obtained in closed form: e.g.

  t = γ sqrt(d − 1) / sqrt(1 − γ²)    (3)

  follows a Student's t-distribution with d − 1 degrees of freedom.
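Equations (2) and (3) can be sketched as follows. For simplicity M is taken as the identity here, whereas the method uses a stabilized inverse covariance; the two patterns are simulated so that one genuinely resembles the other:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d = 64

# Hypothetical mixing-matrix columns from two datasets; p_jl is built
# to be genuinely similar to p_ik.
p_ik = rng.standard_normal(d)
p_jl = 0.8 * p_ik + 0.6 * rng.standard_normal(d)

# M = identity for illustration; the method uses a (stabilized)
# inverse covariance of the patterns (Mahalanobis metric).
M = np.eye(d)

def gamma(p, q, M):
    # Eq. (2): absolute normalized Mahalanobis inner product.
    return abs(p @ M @ q) / np.sqrt((p @ M @ p) * (q @ M @ q))

g = gamma(p_ik, p_jl, M)

# Eq. (3): under the null, t follows Student's t with d - 1 DOF.
t = g * np.sqrt(d - 1) / np.sqrt(1 - g ** 2)
p_value = 2 * stats.t.sf(t, df=d - 1)   # two-sided, since gamma uses |.|
print(g, p_value)
```

For a genuinely similar pair like this one the p-value is tiny, while for unrelated random patterns γ stays near zero and the test does not reject.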
Clustering
- Once the significances have been computed, use them in clustering
- Prune connections (similarities) which are not significant
- Allow no more than one component per subject in a cluster
- Similar to hierarchical clustering ⇒ single-linkage vs. complete-linkage strategies

[Figure: components of Subjects 1, 2 and 3 linked into clusters by significant similarities]
Corrections for multiple testing
- We are testing over many connections, so false positive rates have to be corrected
- We use two different corrections
  - For the initial creation of a cluster: Bonferroni correction
    - Probability of having any false positive clusters < α
    - We don't want to have any false positive clusters
  - For adding more components to a cluster: false discovery rate
    - Percentage of false positive components < α
    - A few false positive components is not too serious, and we don't want to be too conservative
    - Can be computed by Simes' procedure, or using a simple formula:

      α_corr = α / (number of subjects)    (4)
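The two corrections can be sketched on illustrative p-values; the FDR step below uses the standard step-up (Simes / Benjamini-Hochberg) procedure:

```python
import numpy as np

# Hypothetical p-values from pairwise similarity tests.
pvals = np.array([1e-6, 0.003, 0.025, 0.2, 0.5])
alpha = 0.05
m = len(pvals)

# Bonferroni (cluster creation): compare each p-value to alpha / m.
bonf_reject = pvals < alpha / m

# Simes / Benjamini-Hochberg step-up (growing a cluster): find the largest
# k with p_(k) <= k * alpha / m and reject the k smallest p-values.
order = np.argsort(pvals)
ranked = pvals[order]
below = ranked <= np.arange(1, m + 1) * alpha / m
k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
fdr_reject = np.zeros(m, dtype=bool)
fdr_reject[order[:k]] = True

print(bonf_reject, fdr_reject)  # FDR rejects at least as many as Bonferroni
```

On these values Bonferroni rejects only the two smallest p-values while the FDR procedure also rejects the third, illustrating why the less conservative correction is used once a cluster already exists.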
Simulations on artificial data
[Figure: false positive rate (%) and false discovery rate (%) in five data scenarios, for n = 20 or 50 and r = 6 or 20]
False positive rates and false discovery rates for simulated data. The desired rates were set to 5%.
Testing ICs: results
[Figure: two significant MEG components (#8 and #3); for each, modulation by stimulation (negative z-score for beep, spch, v/hf, v/bd, tact) and Fourier spectrum (10-30 Hz)]
11 subjects, PCA dimension 64, α = 0.05, 43 clusters found
Testing independent components themselves
- Spatial ICA: scans at different time points are linear sums of "source images":

  scan_i = a_i1 s_1 + a_i2 s_2 + ... + a_in s_n

- Almost always used in fMRI
- Can also be useful with MEG (Ramkumar et al., 2011)
- The independent components (rows of S) are what is similar over datasets in X = AS
Empirical approach to null distribution
- The same basic approach:
  - ICA separately on multiple datasets k, l (subjects/sessions)
  - Compute similarities γ = S_k S_l^T
  - Null hypothesis: random orthogonal rotation
- But: independent components contain a lot of noise ⇒ similarities are necessarily small
- The null distribution is modelled empirically
- For random orthogonal matrices, we have

  γ² ~ Beta(1/2, β)    (5)

  where β equals the dimension of the space.
- Here, we estimate β by fitting to the empirical distribution
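Fitting the β parameter of equation (5) to an empirical null can be sketched with scipy. The null similarities below are simulated (squared cosines between random unit vectors), not the talk's MEG/fMRI data; the first shape parameter is held fixed at 1/2 and only β is estimated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
dim = 50

# Simulated null similarities: squared cosine between random unit vectors
# in a dim-dimensional space (a stand-in for the empirical null sample).
u = rng.standard_normal((2000, dim))
u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.standard_normal(dim)
v /= np.linalg.norm(v)
gamma2 = (u @ v) ** 2

# Fit Beta(1/2, beta): fix the first shape at 1/2 (fa), and loc/scale at 0/1,
# leaving only beta to be estimated by maximum likelihood.
a_fixed, beta_hat, loc, scale = stats.beta.fit(gamma2, fa=0.5, floc=0, fscale=1)
print(beta_hat)  # beta grows with the dimension of the space
```

Once β is estimated, p-values for observed similarities follow from the Beta survival function, e.g. `stats.beta.sf(g**2, 0.5, beta_hat)`.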
Resting-state networks on fMRI data (preliminary)
[Figure: resting-state network maps: somatomotor, cerebellar, default, attention (L), attention (R), control, auditory, three visual networks, and one unidentified]

- 11 subjects
- PCA dimension 75
- α = 0.001
- 56 clusters found
Computational complexity
- The main difficulty is memory: we need to store the similarity matrix
- This can be reduced by storing just the strongest similarity for each component
- We can handle 100-200 subjects with 100-200 components on a desktop computer
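The memory-saving idea above can be sketched as follows: process the similarity matrix one row at a time and keep only the strongest similarity (value and index) per component, so the full n × n matrix is never held in memory. The sizes and the random similarities are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(6)
n_total = 400   # e.g. 4 subjects x 100 components (illustrative)

# Full similarity matrix row: n_total float64 values;
# sparse summary per component: one value + one index.
full_row_bytes = n_total * 8
sparse_row_bytes = 2 * 8

best_val = np.empty(n_total)
best_idx = np.empty(n_total, dtype=int)
for i in range(n_total):
    # One row of (stand-in) similarities, computed and discarded on the fly.
    row = np.abs(rng.standard_normal(n_total))
    row[i] = 0.0                      # ignore self-similarity
    best_idx[i] = np.argmax(row)
    best_val[i] = row[best_idx[i]]

print(full_row_bytes // sparse_row_bytes)  # per-row memory reduction factor
```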
[Figure: computation and memory requirements: CPU time (log2 minutes) and memory use (log2 bytes) vs. number of subjects = PCA dimension, from 8 to 256]
Special bonus slide for Algodan
- The method could be applied in general unsupervised learning
- Assume the features live in a set of finite volume (compact), e.g. the unit sphere
- Then we can define the null hypothesis
- Consider e.g. clustering, where the data is normalized to the unit sphere
- Any dataset can be divided into n subsets, and learning can be performed on each subset
- Maybe you can apply this testing to your own method?
Conclusion
- We introduced methods for testing which independent components are reliable, i.e. statistically significant
- We can test the columns of the mixing matrix, or the values of the independent components themselves
- Based on doing ICA separately on many datasets, i.e. different subjects or sessions
- The null hypothesis is defined via orthogonal rotations in the whitened space
- The null distribution is obtained analytically in the mixing-matrix case; an empirical approximation is needed for the ICs
- Applications to MEG and fMRI are promising