Parameter Estimation in Autologistic Regression Models for Detection of Smoke in Satellite Images

Mark Wolters, Shanghai Center for Mathematical Sciences
Charmaine Dean, Western University

June 15, 2015, SSC 2015, Halifax
Outline
mwolters@fudan.edu.cn 2/19
1. Introduction
Application; goals
2. Modelling Framework
CRF approach; autologistic regression; challenges
3. Approach to Estimation
Independence model + optimal smoothing
4. Some Results
Simulated data; preliminary findings
5. Discussion
Possible improvements; estimation in a prediction setting
1. Introduction
Earth-orbiting satellites help study large-scale environmental phenomena.

Our interest: smoke from forest fires.

Data: MODIS images
- 1 per day, 143 days
- 1.2 Mp each
- Centered at Kelowna, BC
- Hand-drawn smoke areas

Goal: classify pixels into smoke/nonsmoke

Why?
- Health studies
- Model input or validation
- Monitoring & archiving
Data characteristics

- Hyperspectral images.
- Spectra at each pixel are covariates for predicting smoke.
- High-dimensional predictor space: 35 image planes + higher-order terms.
- Expect spatial association.
2. Modelling Framework
Joint PMF for an image: autologistic regression (ALR)

Pr(Y = y) ∝ exp( (Xβ)ᵀy + (λ/2) yᵀAy )

- y: class labels (n-vector), yᵢ ∈ {L, H}
- X: model matrix (spectral data, n × p)
- πᵢ: P(pixel i is smoke | neighbour pixels)
- G = (V, E): graph structure for dependence among pixels (regular 4-connected grid)

The linear predictors Xβ are the unary coefficients; λ is the pairwise association parameter; A is the adjacency matrix of G.
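The joint PMF above is easy to evaluate up to its normalizing constant. A minimal numpy sketch (the function names and the dense-matrix representation are mine, not from the talk; a real 10⁸-pixel problem would need sparse matrices):

```python
import numpy as np

def alr_log_potential(y, X, beta, lam, A):
    """Unnormalized ALR log-probability: (X beta)^T y + (lam/2) y^T A y."""
    unary = X @ beta                      # linear predictor for each pixel
    return unary @ y + 0.5 * lam * y @ (A @ y)

def grid_adjacency(nrow, ncol):
    """Dense adjacency matrix of a regular 4-connected nrow x ncol pixel grid."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:              # right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:              # down neighbour
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return A
```

With λ = 0 the pairwise term vanishes and only the unary (logistic) part remains, which is the independence special case used later in the plug-in proposal.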
Autologistic regression

Notes:

- Intractable normalizing constant.
- ALR is a conditional random field (CRF) model (Lafferty et al., 2001): given X, Y is a Markov random field.
- Conditional logit form:

log( πᵢ / (1 − πᵢ) ) = (H − L) ( xᵢᵀβ + λ ∑_{j∼i} yⱼ )

- logistic regression ⇐⇒ λ = 0
- Spatial effect is homogeneous, isotropic.
- For Yᵢ ∈ {0, 1}, the sum ∑ yⱼ increases the log-odds unless all neighbours are zero.
  ⇒ Estimates of β, λ are strongly coupled (Caragea and Kaiser, 2009; Hughes et al., 2011).
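The conditional logit form has a direct numerical translation. A minimal sketch (assuming numpy, a dense adjacency matrix A, and the default −1/+1 coding; the function name is mine):

```python
import numpy as np

def conditional_prob(i, y, X, beta, lam, A, L=-1.0, H=1.0):
    """P(Y_i = H | neighbours) from the ALR conditional logit:
    log-odds = (H - L) * (x_i^T beta + lam * sum of neighbour labels)."""
    eta = (H - L) * (X[i] @ beta + lam * (A[i] @ y))
    return 1.0 / (1.0 + np.exp(-eta))
```

Setting lam=0 recovers an ordinary logistic regression (up to the (H − L) scaling of the coefficients), illustrating the "logistic regression ⇐⇒ λ = 0" note above.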
Extensions

The centered ALR model (Caragea and Kaiser, 2009) aims to correct for the asymmetry of the pairwise term when Yᵢ ∈ {0, 1}:

logit(πᵢ) = xᵢᵀβ + λ ∑_{j∼i} (yⱼ − μⱼ)

where μⱼ = E[Yⱼ | λ = 0] is the independence expectation.

Claim: just use Yᵢ ∈ {−1, +1} to get the same effect.

Proposal: let λ = λᵢⱼ = λ(xᵢ, xⱼ) for adaptive smoothing. Then

logit(πᵢ) = 2 ( xᵢᵀβ + ∑_{j∼i} λ(xᵢ, xⱼ) yⱼ )
3. Approach to Estimation
Existing possibilities
1. Ignore spatial association (logistic regression, large n, large p).
2. Pseudolikelihood (PL): L(β, λ) ≈ ∏_img ∏_{i=1}^{n} Pr(yᵢ | {yⱼ : j ∼ i})
3. Monte Carlo ML
4. Bayesian approach
(Hughes et al. (2011) compare options 3 and 4 using perfect sampling; they recommend PL for large n.)

Problems

- We have ∼ 10⁸ pixels.
- We have thousands of predictors and need model selection.
- We're still developing models; rapid evaluation of candidates is beneficial.
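The PL objective factors over pixels, so it can be written down and minimized directly. A minimal numpy sketch of the negative log-pseudolikelihood for the −1/+1 coding (function name and parameter packing are mine; a real implementation would use sparse neighbour sums):

```python
import numpy as np

def neg_log_pl(params, y, X, A, L=-1.0, H=1.0):
    """Negative log-pseudolikelihood: sum over pixels of the conditional
    Bernoulli log-likelihood given the observed neighbour labels."""
    beta, lam = params[:-1], params[-1]
    eta = (H - L) * (X @ beta + lam * (A @ y))  # conditional log-odds, all pixels
    p = 1.0 / (1.0 + np.exp(-eta))              # P(Y_i = H | neighbours)
    z = (y == H).astype(float)                  # indicator of the high class
    eps = 1e-12                                 # guard against log(0)
    return -np.sum(z * np.log(p + eps) + (1 - z) * np.log(1 - p + eps))
```

This function could be handed to any generic optimizer over (β, λ); the per-evaluation cost is one neighbour sum per pixel, which is why PL scales better than Monte Carlo ML.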
Proposal: plug-in estimation

a) Use the independence (logistic) model to get β
   - including model selection
   - sample pixels to reduce n to a manageable size

b) Choose λ to optimize predictive power.

Rationale

- Treat λ as a smoothing parameter.
- Assuming independence, β captures how information in X can be used to predict Y.
- For fixed β, tuning λ will optimally reduce noise in the predicted probabilities.
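Step (b) is a one-dimensional tuning problem, so a grid search over candidate λ values suffices. A minimal sketch (names are mine; `predict_error` stands in for whatever validation-set error computation the pipeline uses, with β held fixed at its logistic estimate):

```python
import numpy as np

def choose_lambda(candidates, predict_error):
    """Plug-in step (b): pick the lambda that minimizes validation
    prediction error, with beta-hat held fixed inside predict_error."""
    errs = [predict_error(lam) for lam in candidates]
    best = int(np.argmin(errs))
    return candidates[best], errs[best]
```

Because β is frozen, each candidate λ costs only one round of prediction on the validation pixels, which matches the per-candidate timings reported in the results table.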
Proposed procedure:

[Flowchart. Images are split into training, validation, and test sets. Logistic regression with model selection on the training pixels yields the best independence model. λ is then chosen to minimize prediction error on the validation pixels, yielding the best ALR (autologistic regression) model. Final performance results are evaluated on the test images.]
4. Results
Simulated Images

- Predictors: R, G, B.
- Random ellipses = class 1 (smoke); background = class 2 (nonsmoke).
- 90 images at 3 sizes: 100, 200, and 400 pixels square.
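A generator in the spirit of this simulation design might look as follows (a hypothetical sketch assuming numpy; the talk does not give the actual simulation code, so the ellipse-size ranges and function name are my own choices):

```python
import numpy as np

def ellipse_image(size=100, n_ellipses=3, rng=None):
    """Label image: random filled ellipses are class 1 (smoke),
    background is class 2 (nonsmoke)."""
    rng = np.random.default_rng(rng)
    yy, xx = np.mgrid[0:size, 0:size]
    labels = np.full((size, size), 2, dtype=int)    # start as background
    for _ in range(n_ellipses):
        cx, cy = rng.uniform(0, size, 2)            # ellipse centre
        a, b = rng.uniform(size / 10, size / 4, 2)  # semi-axes (assumed range)
        inside = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0
        labels[inside] = 1
    return labels
```

Noisy R, G, B predictor planes could then be simulated conditionally on the label image to complete a test case.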
A) Plug-in vs. PL

pixels  method   R      G      B     λ     error rate (%)  time (min)
100²    plug-in  −2.20  −2.00  1.95  0.45  19.2            1.2*
        PL       −2.04  −1.99  2.06  0.99  23.3            0.9
200²    plug-in  −1.65  −1.36  1.71  0.5   20.8            5.0*
        PL       −1.61  −1.30  1.70  1.19  48.2            2.8
400²    plug-in  −2.00  −1.41  1.64  0.6   20.8            21.5*
        PL       −2.08  −1.40  1.68  1.36  50.5            16.1

* time per candidate λ value
Example predicted probabilities
Truth Plug-in PL
B) Effect of coding & centering

[Plot: prediction error (vertical axis, 0.20 to 0.50) vs. the pairwise parameter λ (horizontal axis, 0 to 1.2) for 200 × 200 images, comparing three variants: Centered (0,1), Standard (0,1), and Standard (−1,1).]

logit(πᵢ) = (H − L) ( xᵢᵀβ + λ ∑_{j∼i} (yⱼ − μⱼ) )
C) Preliminary smoke results

[Panels: RGB image; logistic model; autologistic model (ALR model with 50 predictors).]

1. Smoke-free areas: OK
2. Clouds vs. smoke: OK
3. Snow vs. smoke: OK
4. Spatial smoothing: OK
5. Smoke + cloud: problem
Preliminary smoke results (continued)

[Panels: RGB; Logistic; Autologistic.]

6. "Thin" smoke: problem
7. Original masks (training data): problem
5. Discussion
Advantages of working on a data-rich prediction problem

If you get low out-of-sample prediction error, the following are of little concern:
- the "truth" of your model
- the complexity or statistical efficiency of your model
- whether or not your parameters are statistically significant
- whether or not your parameter estimates are stable

Computational feasibility & run time become paramount.

How would things change if we're interested in interpretation?
- Trade predictive power for model simplicity.
- The "plug-in" estimation approach is no longer helpful.
- The issue of model centering and coding becomes critical.
Future plans: smoke
- Improve the base logistic regression model
- Address "true" label ambiguity

Future plans: models
- Computational improvements: low-level code, parallelization
- Revisit adaptive smoothing
- A beta CRF for direct modelling of probabilities
- Multi-class (autobinomial) extension
References
Caragea, P. C. and Kaiser, M. S. (2009), "Autologistic models with interpretable parameters," Journal of Agricultural, Biological, and Environmental Statistics, 14, 281–300.

Hughes, J., Haran, M., and Caragea, P. C. (2011), "Autologistic models for binary data on a lattice," Environmetrics, 22, 857–871.

Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001), "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," in Proceedings of the Eighteenth International Conference on Machine Learning, ICML '01, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., pp. 282–289.