Parameter Estimation in Autologistic Regression Models for Detection of Smoke in Satellite Images
Mark Wolters, Shanghai Center for Mathematical Sciences
Charmaine Dean, Western University
SSC 2015, Halifax, June 15, 2015
Outline
[email protected] 2/19
1. Introduction
Application; goals
2. Modelling Framework
CRF approach; autologistic regression; challenges
3. Approach to Estimation
Independence model + optimal smoothing
4. Some Results
Simulated data; preliminary findings
5. Discussion
Possible improvements; estimation in a prediction setting
1. Introduction
Earth-orbiting satellites help study large-scale environmental phenomena.
Our interest: smoke from forest fires.
Data: MODIS images
- 1 per day, 143 days
- 1.2 Mp each
- Centered at Kelowna, BC
- Hand-drawn smoke areas

Goal: classify pixels into smoke/nonsmoke

Why?
- Health studies
- Model input or validation
- Monitoring & archiving
Data characteristics
Hyperspectral images.
Spectra at each pixel arecovariates for predictingsmoke.
High-dimensional predictorspace.
Expect spatial association.
35 image planes + higher order terms
2. Modelling Framework
Joint PMF for an image: autologistic regression (ALR)
Pr(Y = y) ∝ exp( (Xβ)^T y + (λ/2) y^T A y )

where
- y: class labels (n-vector), y_i ∈ {L, H}
- X: model matrix (spectral data, n × p)
- π_i: P(pixel i is smoke | neighbour pixels)
- G = (V, E): graph structure for dependence among pixels (regular 4-connected grid)
- Xβ: linear predictors, forming the unary part; β: coefficients
- λ: pairwise association parameter
- A: adjacency matrix of G
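A minimal sketch (not the authors' code) of the model above: build the adjacency matrix of a regular 4-connected grid and evaluate the exponent (Xβ)^T y + (λ/2) y^T A y. The normalizing constant is left out, since it is intractable.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """Adjacency matrix A of a regular 4-connected nrow x ncol lattice."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:            # right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:            # bottom neighbour
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return A

def alr_log_potential(y, X, beta, lam, A):
    """(X beta)^T y + (lam/2) y^T A y: the log-PMF up to the
    intractable normalizing constant."""
    return float((X @ beta) @ y + 0.5 * lam * y @ A @ y)
```

The factor 1/2 compensates for each edge of G appearing twice in the symmetric A.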
Autologistic regression
Notes:
- Intractable normalizing constant
- ALR is a conditional random field (CRF) model (Lafferty et al., 2001): given X, Y is a Markov random field
- Conditional logit form:

  log( π_i / (1 − π_i) ) = (H − L) ( x_i^T β + λ Σ_{j∼i} y_j )

  logistic regression ⇐⇒ λ = 0
- Spatial effect is homogeneous, isotropic
- For Y_i ∈ {0, 1}, the sum Σ_{j∼i} y_j increases the log-odds unless all neighbours are zero
  ⇒ estimates of β and λ are strongly coupled (Caragea and Kaiser, 2009; Hughes et al., 2011)
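The conditional logit form translates directly into code. A sketch (assumed, not from the talk) of the conditional probability that pixel i is smoke, given its neighbours' labels; with λ = 0 it reduces to ordinary logistic regression:

```python
import numpy as np

def conditional_prob(x_i, beta, lam, neighbour_labels, H=1.0, L=-1.0):
    """P(Y_i = H | neighbours): inverse-logit of
    (H - L) * (x_i^T beta + lam * sum of neighbour labels)."""
    eta = (H - L) * (x_i @ beta + lam * np.sum(neighbour_labels))
    return 1.0 / (1.0 + np.exp(-eta))
```

With zero covariate effect and λ = 0 this returns 0.5, as it should; a positive λ pulls the probability toward the neighbours' majority class.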
Extensions
Centered ALR model (Caragea and Kaiser, 2009) aims to correct for the asymmetry of the pairwise term when Y_i ∈ {0, 1}:

logit(π_i) = x_i^T β + λ Σ_{j∼i} (y_j − μ_j)

where μ_j = E[Y_j | λ = 0] is the independence expectation.
Claim: just use Y_i ∈ {−1, +1} to get the same effect.

Proposal: let λ = λ_ij = λ(x_i, x_j) for adaptive smoothing.
Then

logit(π_i) = 2 ( x_i^T β + Σ_{j∼i} λ(x_i, x_j) y_j )
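To see the claim about the (−1, +1) coding, compare the pairwise contribution λ Σ_{j∼i} y_j under the two codings when all four neighbours are in the low class (a toy illustration, not from the talk):

```python
import numpy as np

lam = 1.0
neighbours_low_01 = np.zeros(4)    # standard (0,1) coding: low class = 0
neighbours_low_pm = -np.ones(4)    # (-1,+1) coding: low class = -1

# contribution lam * sum_j y_j to the log-odds of pixel i:
print(lam * neighbours_low_01.sum())   # 0.0: no pull toward the low class
print(lam * neighbours_low_pm.sum())   # -4.0: symmetric pull toward the low class
```

Under (0, 1) coding the pairwise term can only raise the log-odds, so low-class neighbours exert no pull; under (−1, +1) the push is symmetric in both directions, which is the effect centering tries to achieve.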
3. Approach to Estimation
Existing possibilities
1. Ignore spatial association (logistic regression, large n, large p).
2. Pseudolikelihood (PL): maximize L(β, λ) ≈ ∏_images ∏_{i=1}^n P(Y_i = y_i | neighbours)
3. Monte Carlo ML
4. Bayesian approach
Problems
We have ∼10^8 pixels
We have thousands of predictors, need model selection
We’re still developing models; rapid evaluation of candidates is beneficial
Hughes et al. use perfect sampling; recommend PL for large n.
Proposal: plug-in estimation
a) Use the independence (logistic) model to get β
   - Including model selection
   - Sample pixels to reduce n to a manageable size
b) Choose λ to optimize predictive power
Rationale
Treat λ as a smoothing parameter.
Assuming independence, β captures how information in X can be used to predict Y.
For fixed β, tuning λ will optimally reduce noise in the predicted probabilities.
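Step (b) amounts to a one-dimensional grid search. A minimal sketch, where `predict_labels` is a hypothetical helper (not from the talk) that returns predicted labels for a given (β, λ):

```python
import numpy as np

def choose_lambda(lams, beta, X_val, y_val, predict_labels):
    """Pick the lam minimizing validation misclassification, with beta
    held fixed at the independence fit."""
    errors = [np.mean(predict_labels(beta, lam, X_val) != y_val) for lam in lams]
    best = int(np.argmin(errors))
    return lams[best], errors[best]
```

Because β is fixed, each candidate λ costs only one round of prediction, which is what makes the plug-in approach fast relative to joint estimation.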
Proposed procedure:
1. Split the images into training, validation, and test sets.
2. Logistic regression: fit to training pixels (with model selection) and evaluate performance on validation pixels, giving the best independence model.
3. Autologistic regression: choose λ to minimize prediction error on the validation images, giving the best ALR model.
4. Report final performance results on the test images.
4. Results
Simulated Images
- Predictors: R, G, B
- Random ellipses = class 1 (smoke)
- Background = class 2 (nonsmoke)
- 90 images at 3 sizes: 100, 200, and 400 pixels square
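An assumed reconstruction of this simulation setup (the distributional details here are guesses, not the talk's actual parameters): a random ellipse of class-1 pixels on a class-0 background, with noisy RGB predictors whose means differ by class.

```python
import numpy as np

def simulate_image(size, rng):
    """One simulated image: labels (0/1) and an RGB predictor array."""
    yy, xx = np.mgrid[0:size, 0:size]
    cx, cy = rng.uniform(0.2, 0.8, size=2) * size     # ellipse centre
    a, b = rng.uniform(0.05, 0.2, size=2) * size      # semi-axes
    labels = ((((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2) <= 1.0).astype(int)
    # noisy R, G, B channels; class-1 pixels have a shifted mean
    rgb = rng.normal(loc=0.5 * labels[..., None], scale=0.3, size=(size, size, 3))
    return rgb, labels
```

The overlap of the two class-conditional predictor distributions controls how much spatial smoothing can help, which is the quantity of interest in the comparisons below.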
A) plug-in vs. PL
pixels  method   R      G      B     λ     error rate (%)  time (min)
100²    plug-in  −2.20  −2.00  1.95  0.45  19.2             1.2*
100²    PL       −2.04  −1.99  2.06  0.99  23.3             0.9
200²    plug-in  −1.65  −1.36  1.71  0.50  20.8             5.0*
200²    PL       −1.61  −1.30  1.70  1.19  48.2             2.8
400²    plug-in  −2.00  −1.41  1.64  0.60  20.8            21.5*
400²    PL       −2.08  −1.40  1.68  1.36  50.5            16.1

* time per candidate λ value
B) Effect of coding & centering
[Figure: prediction error vs. pairwise parameter λ (0 to 1.2) for 200 × 200 images; error rates roughly 0.20–0.50. Curves: Centered (0,1), Standard (0,1), Standard (−1,1).]

logit(π_i) = (H − L) ( x_i^T β + λ Σ_{j∼i} (y_j − μ_j) )
C) Preliminary smoke results
[Figures: RGB image; logistic model vs. autologistic model; ALR model with 50 predictors]

1. Smoke-free areas: OK
2. Clouds vs. smoke: OK
3. Snow vs. smoke: OK
4. Spatial smoothing: OK
5. Smoke + cloud: Problem
Preliminary smoke results (continued)
[Figures: RGB, logistic, autologistic]

6. “Thin” smoke: Problem
7. Original masks (training data): Problem
5. Discussion
Advantages of working on a data-rich prediction problem
If you get low out-of-sample prediction error, the following are of little concern:
- The “truth” of your model
- The complexity or statistical efficiency of your model
- Whether or not your parameters are statistically significant
- Whether or not your parameter estimates are stable
Computational feasibility & run time become paramount.
How would things change if we’re interested in interpretation?
- Trade predictive power for model simplicity.
- The “plug-in” estimation approach is no longer helpful.
- The issue of model centering and coding becomes critical.
Future plans: smoke
Improve the base logistic regression model
Address “true” label ambiguity
Future plans: models
Computational improvements:
- low-level code
- parallelization
Revisit adaptive smoothing
A beta CRF for direct modelling of probabilities
Multi-class (autobinomial) extension
References
Caragea, P. C. and Kaiser, M. S. (2009), “Autologistic models with interpretable parameters,” Journal of Agricultural, Biological, and Environmental Statistics, 14, 281–300.

Hughes, J., Haran, M., and Caragea, P. C. (2011), “Autologistic models for binary data on a lattice,” Environmetrics, 22, 857–871.

Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001), “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” in Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., pp. 282–289.