
A New, Nonparametric Information-Splitting Image Analysis Technique

Mark Inlow, Jing Wan, Sungeun Kim, Kwansik Nho, Shannon Risacher, Andrew Saykin, Li Shen

Life as a Statistics Professor…

Image Analysis Setup

• Data: image value at each location for each subject.
• Question: Does the mean image value depend on a predictor at any location?
• Methods:
1. Parametric: Random Field Theory (RFT)
– Con: relies on parametric assumptions
2. Nonparametric: permutation
– Con: slow
3. New approach: weaker assumptions, faster?
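To make the cost of method 2 concrete, here is a minimal sketch of a standard max-|t| permutation test, assuming a single continuous predictor and a per-location simple linear regression. The function name `max_t_permutation` and its arguments are hypothetical, not taken from the talk. Each permutation recomputes all k location-wise statistics, so the work grows as (number of permutations) × k.

```python
import numpy as np

def max_t_permutation(Y, x, n_perm=1000, seed=0):
    """Max-|t| permutation test of each location's image values on predictor x.

    Y: (n_subjects, k_locations) array; x: (n_subjects,) predictor.
    Returns observed t statistics and family-wise adjusted p-values.
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    y_norm = np.sqrt((Yc ** 2).sum(axis=0))

    def t_stats(xv):
        # simple-regression t statistic via the correlation r at every location
        xc = xv - xv.mean()
        r = (xc @ Yc) / (np.sqrt(xc @ xc) * y_norm)
        return r * np.sqrt((n - 2) / (1 - r ** 2))

    t_obs = t_stats(x)
    max_null = np.array([np.abs(t_stats(rng.permutation(x))).max()
                         for _ in range(n_perm)])
    # adjusted p: share of permuted maxima at least as large as each |t_obs|
    exceed = (max_null[None, :] >= np.abs(t_obs)[:, None]).sum(axis=1)
    return t_obs, (1 + exceed) / (n_perm + 1)

# usage on synthetic data with a true effect at location 0 only
rng = np.random.default_rng(1)
x = rng.normal(size=50)
Y = rng.normal(size=(50, 200))
Y[:, 0] += 2 * x
t_obs, p_adj = max_t_permutation(Y, x, n_perm=200)
```

Taking the maximum over locations in every permutation is what gives family-wise error control without RFT's smoothness assumptions.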

Theoretical Basis

• One-sample case: test the null of zero mean at every location vs. the alternative of a nonzero mean for at least one location.

• Theorem 1 (New Result):
– Let be the t-test statistic for location .
– Let
– If is then and are independent under .

• Note: is an increasing function of .
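The symbols in Theorem 1 did not survive extraction, so the following is offered only as an example of the kind of fact involved, not as the theorem itself. For a sample y_1, …, y_n, the uncentered sum of squares splits as Σy² = (n-1)s² + nȳ², so nȳ²/Σy² = t²/(n-1+t²), an increasing function of |t|. Because t is unchanged when y is rescaled, it depends only on the direction y/‖y‖; for i.i.d. N(0, σ²) data that direction is independent of ‖y‖², so t is independent of Σy² under the zero-mean null. The sketch checks the identity numerically and estimates the correlation between t² and Σy² by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10

# identity: n * ybar^2 / sum(y^2) == t^2 / (n - 1 + t^2) for any sample
y = rng.normal(size=n)
t = np.sqrt(n) * y.mean() / y.std(ddof=1)
lhs = n * y.mean() ** 2 / np.sum(y ** 2)
rhs = t ** 2 / (n - 1 + t ** 2)

# Monte Carlo under H0 (mean 0, normal): t^2 and sum(y^2) should be uncorrelated
Y = rng.normal(size=(20000, n))
T = np.sqrt(n) * Y.mean(axis=1) / Y.std(axis=1, ddof=1)
SS = np.sum(Y ** 2, axis=1)
corr = np.corrcoef(T ** 2, SS)[0, 1]
print(np.isclose(lhs, rhs), round(corr, 3))
```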

Information Splitting

Suppose we have a continuous predictor:

1. Partition the sample into subsamples.
2. Let be the t-statistic for , subsample .
3. Define ,
4. If large, .
5. Compute and ; apply Theorem 1.
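Steps 1 and 2 can be sketched as follows. Several details are assumptions, because the slide's symbols are missing: subjects are split into m contiguous groups ordered by the predictor, and the statistic in each subsample is a one-sample t statistic of the image values. The name `subsample_t_stats` is hypothetical.

```python
import numpy as np

def subsample_t_stats(Y, x, m):
    """Split subjects into m subsamples by predictor rank and compute a
    one-sample t statistic at every location within each subsample.

    Y: (n_subjects, k_locations); x: (n_subjects,). Returns an (m, k) array.
    """
    groups = np.array_split(np.argsort(x), m)   # assumption: contiguous rank groups
    T = np.empty((m, Y.shape[1]))
    for j, idx in enumerate(groups):
        Yj = Y[idx]
        T[j] = np.sqrt(len(idx)) * Yj.mean(axis=0) / Yj.std(axis=0, ddof=1)
    return T

rng = np.random.default_rng(0)
Y = rng.normal(size=(60, 5))   # 60 subjects, 5 locations (synthetic)
x = rng.normal(size=60)        # continuous predictor (synthetic)
T = subsample_t_stats(Y, x, m=3)
```

Row j of `T` holds the k location-wise statistics for subsample j; steps 3 to 5 would then combine these rows.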

One Monotonic Recipe

1. else
2. Let = average of for smallest 1% of ; Let = average of for next smallest 1% of ; … Let = average of for largest 1% of .
3. Fit model .
4. Test using permutation.
5. If normal, use permutation t-test.
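The percentile-binning step and the permutation test can be sketched as below. The model in step 3 is not recoverable from the slide, so this sketch assumes a straight-line fit of the 100 bin averages against bin rank and a permutation test of its slope. All function names are hypothetical.

```python
import numpy as np

def binned_means(x, y, n_bins=100):
    """Average y within consecutive 1/n_bins fractions of x (smallest to largest)."""
    order = np.argsort(x)
    return np.array([y[idx].mean() for idx in np.array_split(order, n_bins)])

def slope(v):
    """Least-squares slope of v against bin rank 0, 1, ..., len(v) - 1."""
    g = np.arange(v.size)
    gc = g - g.mean()
    return (gc @ (v - v.mean())) / (gc @ gc)

def binned_slope_perm_test(x, y, n_bins=100, n_perm=999, seed=0):
    """Two-sided permutation p-value for the slope of the binned averages."""
    rng = np.random.default_rng(seed)
    b_obs = slope(binned_means(x, y, n_bins))
    null = np.array([slope(binned_means(x, rng.permutation(y), n_bins))
                     for _ in range(n_perm)])
    p = (1 + np.sum(np.abs(null) >= abs(b_obs))) / (n_perm + 1)
    return b_obs, p

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(size=500)   # synthetic monotone relationship
b, p = binned_slope_perm_test(x, y, n_perm=199)
```

Permuting y breaks any x-y link while keeping the binning fixed, so the null distribution reflects the same averaging as the observed slope.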

Hippocampus Surface Normal Data

• = surface-normal value for the left hippocampus at location for subject j

• = surface-normal value for the right hippocampus

• n = 582 subjects; k = 6611 locations

• Let (assume bilateral symmetry)

• Is there a relationship between (or ) and a given SNP at one or more locations?
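A minimal sketch of a location-wise (mass-univariate) regression of the left + right sum on an additively coded SNP is shown below. It is similar in spirit to the SurfStat T-maps later in the talk but is not SurfStat code; the data are synthetic and the function name is hypothetical.

```python
import numpy as np

def lr_sum_snp_tmap(left, right, dosage):
    """Location-wise t statistics for the slope of (left + right) on SNP dosage."""
    Y = left + right                     # LR sum; assumes bilateral symmetry
    n = Y.shape[0]
    xc = dosage - dosage.mean()
    Yc = Y - Y.mean(axis=0)
    sxx = xc @ xc
    beta = (xc @ Yc) / sxx               # least-squares slope at every location
    resid = Yc - np.outer(xc, beta)
    se = np.sqrt((resid ** 2).sum(axis=0) / (n - 2) / sxx)
    return beta / se

rng = np.random.default_rng(0)
n, k = 582, 50                           # 582 subjects as in the talk; k reduced from 6611
dosage = rng.integers(0, 3, size=n).astype(float)   # synthetic additive coding
left = rng.normal(size=(n, k))
right = rng.normal(size=(n, k))
tmap = lr_sum_snp_tmap(left, right, dosage)
```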

SA vs. P for the LR Hippo Sum (panels: APOE, BIN1)

New Approach vs. RFT Results

Table columns: Hippo Data, SNP, New Approach, RFT Peak Amplitude. Rows: Left, APOE; Left, BIN1; LR Sum, APOE; LR Sum, BIN1.

Permutation Distribution Normality

Panels: APOE, BIN1

SurfStat APOE T-Map for LR Sum

SurfStat BIN1 T-Map for LR Sum

Comments

• Information splitting: the information at each location is shared by two statistics that are independent under the null hypothesis.

• Performance/properties: seem favorable compared to RFT and permutation methods

• Going forward:
– Incorporate spatial information!
– Apply to larger images
– Do formal simulation studies

Acknowledgements

1. Andrew Saykin, Li Shen, and the Department of Radiology and Imaging Sciences, IU School of Medicine, who supported and financed my 2010-2011 sabbatical.

2. My main coauthor: Jing Wan, who did the SurfStat statistical analyses and data management.

3. My other coauthors/colleagues: Sungeun Kim, Kwansik Nho, and Shannon Risacher.

Hippocampus Surface Data

• FreeSurfer and Large Deformation Diffeomorphic Metric Mapping (FS+LDDMM) were used to segment hippocampal surfaces from MRI scans

• To remove the effect of head size, total intracranial volume (ICV) was adjusted to a constant and each hippocampus was scaled accordingly.

• Rigid body transformation was applied to register each hippocampus to a template.

• Surface signals at 6611 locations were extracted as the deformation along the surface normal direction of the template and were adjusted for baseline age, gender, education, and handedness.
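The slides do not say how the covariate adjustment was carried out. One common approach, assumed here, is to replace each location's signal with its residual from an ordinary least-squares fit on the covariates plus an intercept. The covariate values below are synthetic and the function name is hypothetical.

```python
import numpy as np

def adjust_for_covariates(Y, C):
    """Return residuals of each column of Y after OLS regression on C plus an intercept.

    Y: (n_subjects, k_locations) surface signals; C: (n_subjects, q) covariates.
    """
    X = np.column_stack([np.ones(C.shape[0]), C])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ coef

rng = np.random.default_rng(0)
n = 100
C = np.column_stack([
    rng.normal(75, 7, n),        # baseline age (synthetic)
    rng.integers(0, 2, n),       # gender, 0/1 (synthetic)
    rng.normal(16, 3, n),        # years of education (synthetic)
    rng.integers(0, 2, n),       # handedness, 0/1 (synthetic)
])
Y = rng.normal(size=(n, 10))
R = adjust_for_covariates(Y, C)
```

The residuals are orthogonal to every covariate and have mean zero at each location; some pipelines add the location mean back afterward.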

Genetic (SNP) Data

• Single Nucleotide Polymorphism (SNP): a DNA sequence position at which individuals differ by a single nucleotide, e.g., T vs. C or A vs. G.

• The SNP data were genotyped using the Human 610-Quad BeadChip.

• The top 23 SNPs from the AlzGene database and one SNP from the TOMM40 gene were considered.

• After quality control, 20 SNPs remained.

Random Field Theory

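• Suppose we want to test, for a given SNP, the global composite null $H_0$: no SNP effect at location $i$, for all $i = 1, \dots, k$.

• By the Bonferroni inequality, with $T_i$ the test statistic at location $i$:

$$P\Big(\max_{1 \le i \le k} T_i \ge t\Big) \le \sum_{i=1}^{k} P(T_i \ge t) = k\,P(T_1 \ge t)$$

(the last step uses the common null distribution of the $T_i$).

• Gaussian Random Field Theory (RFT) provides a much less conservative estimate:

$$P\Big(\max_{1 \le i \le k} T_i \ge t\Big) \approx \sum_{d=0}^{D} R_d\,\rho_d(t)$$

where the sum is over the dimensions $d = 0, \dots, D$ of the image (K. Worsley).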
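For comparison with the RFT approximation, the Bonferroni bound can be computed directly. This sketch treats each statistic as standard normal (a simplification; the talk's statistics may be t-distributed) and uses k = 6611 from the hippocampus data. The function names are hypothetical.

```python
import math

def gaussian_tail(t):
    """P(Z >= t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def bonferroni_max_pvalue(t, k):
    """Bonferroni upper bound on P(max_i Z_i >= t) over k locations, capped at 1."""
    return min(1.0, k * gaussian_tail(t))

print(bonferroni_max_pvalue(4.5, 6611))
```

Because the bound ignores the strong correlation between neighboring locations, it is the conservative baseline that RFT improves on.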

Random Field Theory, Cont.:

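• RFT p-value for the maximum statistic:

$$P\Big(\max_{i} T_i \ge t\Big) \approx \sum_{d=0}^{D} R_d\,\rho_d(t)$$

• $R_d$ is the number of $d$-dimensional resels (resolution elements); it depends on the smoothness (spatial correlation) of the image.

• $\rho_d(t)$ is the $d$-dimensional Euler characteristic (EC) density. For large $t$, the Euler characteristic of the excursion set $\{i : T_i \ge t\}$ is 0 or 1 depending on whether $T_i \ge t$ for any $i$, so its expected value approximates the p-value of the maximum.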
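A minimal sketch of the sum above for a unit-variance Gaussian (Z) field, using the standard Gaussian EC densities in resel units (Worsley et al.) for d = 0, …, 3. Analyses of t-fields, such as the SurfStat maps in this talk, use t-field EC densities instead, and the resel counts passed in below are hypothetical inputs, not values estimated from the hippocampus data.

```python
import math

FOUR_LN2 = 4 * math.log(2)

def gaussian_ec_density(d, t):
    """EC density rho_d(t) of a unit-variance Gaussian field in resel units, d = 0..3."""
    e = math.exp(-t * t / 2)
    if d == 0:
        return 0.5 * math.erfc(t / math.sqrt(2))    # P(Z >= t)
    if d == 1:
        return math.sqrt(FOUR_LN2) / (2 * math.pi) * e
    if d == 2:
        return FOUR_LN2 / (2 * math.pi) ** 1.5 * t * e
    if d == 3:
        return FOUR_LN2 ** 1.5 / (2 * math.pi) ** 2 * (t * t - 1) * e
    raise ValueError("d must be 0, 1, 2 or 3")

def rft_max_pvalue(t, resels):
    """Approximate P(max Z >= t) as sum_d R_d * rho_d(t), with resels[d] = R_d."""
    p = sum(r * gaussian_ec_density(d, t) for d, r in enumerate(resels))
    return min(1.0, p)

# hypothetical resel counts R_0, R_1, R_2 for a 2-D surface
print(rft_max_pvalue(4.5, [2.0, 0.0, 50.0]))
```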

Random Field Theory Varieties

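• Maximum Test Statistic: P-value $= P\big(\max_i T_i \ge t_{\text{obs}}\big)$, where $t_{\text{obs}}$ is the observed maximum.

• Spatial Extent of Suprathreshold $T_i$'s: P-value $= P(S \ge s)$

where $S$ is the number of connected suprathreshold $T_i$'s and $s$ is the observed number exceeding threshold $u$.

• Cluster Maximum and Spatial Extent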
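The spatial-extent statistic needs the sizes of connected suprathreshold regions. This sketch uses `scipy.ndimage.label` on a regular 2-D grid for simplicity; the hippocampal data live on a surface mesh, where connectivity would come from mesh edges instead. The function name and the toy map are illustrative.

```python
import numpy as np
from scipy import ndimage

def suprathreshold_cluster_sizes(stat_map, u):
    """Sizes of connected clusters of grid points where stat_map > u."""
    labels, n_clusters = ndimage.label(stat_map > u)   # default: edge (4-)connectivity in 2-D
    return np.bincount(labels.ravel(), minlength=n_clusters + 1)[1:]   # drop background

stat_map = np.array([[5.0, 5.0, 0.0, 0.0],
                     [0.0, 5.0, 0.0, 5.0],
                     [0.0, 0.0, 0.0, 5.0],
                     [5.0, 0.0, 0.0, 0.0]])
sizes = suprathreshold_cluster_sizes(stat_map, u=3.0)
```

The largest entry of `sizes` is the observed extent that would be compared with its null distribution.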

Left Spherical Distribution Theory

Theorem: Let be a matrix of -dimensional observations which is multivariate normal.
• Let be a -dimensional vector of weights determined uniquely by .
• Let .
• Let .
• Let .
• Then has a distribution.
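The symbols in this theorem were lost in extraction. As we understand the standard form of this result (Läuter's theory of left-spherical distributions): if the n × p data matrix X has i.i.d. N_p(0, Σ) rows and the weight vector w depends on X only through X'X, then the one-sample t statistic of the scores z = Xw has an exact t distribution with n - 1 degrees of freedom, whatever the correlation among the p variables. The sketch checks this by simulation with one illustrative weight choice, w_j = 1/√(X'X)_jj; the function name and the weight choice are assumptions, not taken from the talk.

```python
import numpy as np
from scipy import stats

def weighted_score_ttest(X):
    """One-sample t-test of H0: mean 0 on scores z = X w.

    The weights w_j = 1 / sqrt((X'X)_jj) depend on X only through X'X,
    which is the condition the left-spherical result requires.
    """
    w = 1.0 / np.sqrt(np.diag(X.T @ X))
    z = X @ w
    return stats.ttest_1samp(z, 0.0)

# Simulated type I error under H0 with correlated variables and p > n
rng = np.random.default_rng(0)
n, p, reps = 20, 50, 2000
A = rng.normal(size=(p, p)) / np.sqrt(p)    # fixed mixing matrix -> correlated columns
rejections = 0
for _ in range(reps):
    X = rng.normal(size=(n, p)) @ A          # rows i.i.d. N_p(0, A'A)
    rejections += weighted_score_ttest(X).pvalue < 0.05
rate = rejections / reps
print(rate)   # should be near the nominal 0.05
```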

Comparison of Maps

Panels: Information-Splitting map; Statistical Parametric Map

Materials

• 582 non-Hispanic Caucasian participants: 166 healthy controls (HCs), 287 with mild cognitive impairment (MCI), and 129 with Alzheimer's disease (AD)

• Magnetic resonance imaging (MRI) data

• 20 SNPs were selected from the AlzGene database and the TOMM40 gene and coded to test an additive genetic effect (i.e., a dose-dependent effect of the minor allele).
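Additive coding turns each genotype into a minor-allele count of 0, 1, or 2. This sketch assumes genotypes are stored as two-letter strings; real genotype files use other formats, and the function name is hypothetical.

```python
import numpy as np

def additive_code(genotypes, minor_allele):
    """Minor-allele dosage (0, 1 or 2) for genotypes written as two-letter strings."""
    return np.array([g.count(minor_allele) for g in genotypes], dtype=float)

dosage = additive_code(["CC", "CT", "TT", "TC"], minor_allele="T")
```

Entering this dosage as a single numeric predictor tests a linear (dose-dependent) effect of each additional copy of the minor allele.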
