Parameter Estimation in Autologistic Regression Models for Detection of Smoke in Satellite Images
Mark Wolters, Shanghai Center for Mathematical Sciences
Charmaine Dean, Western University
SSC 2015, Halifax, June 15, 2015
Outline
[email protected] 2/19
1. Introduction
Application; goals
2. Modelling Framework
CRF approach; autologistic regression; challenges
3. Approach to Estimation
Independence model + optimal smoothing
4. Some Results
Simulated data; preliminary findings
5. Discussion
Possible improvements; estimation in a prediction setting
1. Introduction
Earth-orbiting satellites help study large-scale environmental phenomena.
Our interest: smoke from forest fires.
Data: MODIS images
- 1 per day, 143 days
- 1.2 Mp each
- Centered at Kelowna, BC
- Hand-drawn smoke areas

Goal: classify pixels into smoke/nonsmoke

Why?
- Health studies
- Model input or validation
- Monitoring & archiving
Data characteristics
Hyperspectral images.
Spectra at each pixel arecovariates for predictingsmoke.
High-dimensional predictorspace.
Expect spatial association.
35 image planes + higher order terms
2. Modelling Framework
Joint PMF for an image: autologistic regression (ALR)
Pr(Y = y) ∝ exp( (Xβ)^T y + (λ/2) y^T A y )

where
- y: class labels (n-vector), y_i ∈ {L, H}
- X: model matrix (spectral data, n × p)
- π_i: P(pixel i is smoke | neighbour pixels)
- G = (V, E): graph structure for dependence among pixels (regular 4-connected grid)
- Xβ: linear predictors, forming the unary part; β: coefficients
- λ: pairwise association parameter
- A: adjacency matrix of G
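A minimal sketch (not the authors' code) of the model above: build the adjacency matrix of a regular 4-connected grid and evaluate the exponent (Xβ)^T y + (λ/2) y^T A y. The normalizing constant is left out, since it is intractable.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """Adjacency matrix A of a regular 4-connected nrow x ncol lattice."""
    n = nrow * ncol
    A = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:            # right neighbour
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < nrow:            # bottom neighbour
                A[i, i + ncol] = A[i + ncol, i] = 1.0
    return A

def alr_log_potential(y, X, beta, lam, A):
    """(X beta)^T y + (lam/2) y^T A y: the log-PMF up to the
    intractable normalizing constant."""
    return float((X @ beta) @ y + 0.5 * lam * y @ A @ y)
```

The factor 1/2 compensates for each edge of G appearing twice in the symmetric A.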
Autologistic regression
Notes:
- Intractable normalizing constant
- ALR is a conditional random field (CRF) model (Lafferty et al., 2001): given X, Y is a Markov random field
- Conditional logit form:

  log( π_i / (1 − π_i) ) = (H − L) ( x_i^T β + λ Σ_{j∼i} y_j )

  logistic regression ⇐⇒ λ = 0
- Spatial effect is homogeneous, isotropic
- For Y_i ∈ {0, 1}, the sum Σ_{j∼i} y_j increases the log-odds unless all neighbours are zero
  ⇒ estimates of β and λ are strongly coupled (Caragea and Kaiser, 2009; Hughes et al., 2011)
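The conditional logit form translates directly into code. A sketch (assumed, not from the talk) of the conditional probability that pixel i is smoke, given its neighbours' labels; with λ = 0 it reduces to ordinary logistic regression:

```python
import numpy as np

def conditional_prob(x_i, beta, lam, neighbour_labels, H=1.0, L=-1.0):
    """P(Y_i = H | neighbours): inverse-logit of
    (H - L) * (x_i^T beta + lam * sum of neighbour labels)."""
    eta = (H - L) * (x_i @ beta + lam * np.sum(neighbour_labels))
    return 1.0 / (1.0 + np.exp(-eta))
```

With zero covariate effect and λ = 0 this returns 0.5, as it should; a positive λ pulls the probability toward the neighbours' majority class.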
Extensions
Centered ALR model (Caragea and Kaiser, 2009) aims to correct for the asymmetry of the pairwise term when Y_i ∈ {0, 1}:

logit(π_i) = x_i^T β + λ Σ_{j∼i} (y_j − μ_j)

where μ_j = E[Y_j | λ = 0] is the independence expectation.
Claim: just use Y_i ∈ {−1, +1} to get the same effect.

Proposal: let λ = λ_ij = λ(x_i, x_j) for adaptive smoothing.
Then

logit(π_i) = 2 ( x_i^T β + Σ_{j∼i} λ(x_i, x_j) y_j )
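To see the claim about the (−1, +1) coding, compare the pairwise contribution λ Σ_{j∼i} y_j under the two codings when all four neighbours are in the low class (a toy illustration, not from the talk):

```python
import numpy as np

lam = 1.0
neighbours_low_01 = np.zeros(4)    # standard (0,1) coding: low class = 0
neighbours_low_pm = -np.ones(4)    # (-1,+1) coding: low class = -1

# contribution lam * sum_j y_j to the log-odds of pixel i:
print(lam * neighbours_low_01.sum())   # 0.0: no pull toward the low class
print(lam * neighbours_low_pm.sum())   # -4.0: symmetric pull toward the low class
```

Under (0, 1) coding the pairwise term can only raise the log-odds, so low-class neighbours exert no pull; under (−1, +1) the push is symmetric in both directions, which is the effect centering tries to achieve.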
3. Approach to Estimation
Existing possibilities
1. Ignore spatial association (logistic regression, large n, large p).
2. Pseudolikelihood (PL): maximize L(β, λ) ≈ ∏_images ∏_{i=1}^n P(Y_i = y_i | neighbours)
3. Monte Carlo ML
4. Bayesian approach
Problems
We have ∼10^8 pixels
We have thousands of predictors, need model selection
We’re still developing models; rapid evaluation of candidates is beneficial
Hughes et al. use perfect sampling; recommend PL for large n.
Proposal: plug-in estimation
a) Use the independence (logistic) model to get β
   - Including model selection
   - Sample pixels to reduce n to a manageable size
b) Choose λ to optimize predictive power
Rationale
Treat λ as a smoothing parameter.
Assuming independence, β captures how information in X can be used to predict Y.
For fixed β, tuning λ will optimally reduce noise in the predicted probabilities.
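Step (b) amounts to a one-dimensional grid search. A minimal sketch, where `predict_labels` is a hypothetical helper (not from the talk) that returns predicted labels for a given (β, λ):

```python
import numpy as np

def choose_lambda(lams, beta, X_val, y_val, predict_labels):
    """Pick the lam minimizing validation misclassification, with beta
    held fixed at the independence fit."""
    errors = [np.mean(predict_labels(beta, lam, X_val) != y_val) for lam in lams]
    best = int(np.argmin(errors))
    return lams[best], errors[best]
```

Because β is fixed, each candidate λ costs only one round of prediction, which is what makes the plug-in approach fast relative to joint estimation.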
Proposed procedure:
1. Split the images into training, validation, and test sets.
2. Logistic regression: fit to training pixels (with model selection) and evaluate performance on validation pixels, giving the best independence model.
3. Autologistic regression: choose λ to minimize prediction error on the validation images, giving the best ALR model.
4. Report final performance results on the test images.
4. Results
Simulated Images
- Predictors: R, G, B
- Random ellipses = class 1 (smoke)
- Background = class 2 (nonsmoke)
- 90 images at 3 sizes: 100, 200, and 400 pixels square
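An assumed reconstruction of this simulation setup (the distributional details here are guesses, not the talk's actual parameters): a random ellipse of class-1 pixels on a class-0 background, with noisy RGB predictors whose means differ by class.

```python
import numpy as np

def simulate_image(size, rng):
    """One simulated image: labels (0/1) and an RGB predictor array."""
    yy, xx = np.mgrid[0:size, 0:size]
    cx, cy = rng.uniform(0.2, 0.8, size=2) * size     # ellipse centre
    a, b = rng.uniform(0.05, 0.2, size=2) * size      # semi-axes
    labels = ((((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2) <= 1.0).astype(int)
    # noisy R, G, B channels; class-1 pixels have a shifted mean
    rgb = rng.normal(loc=0.5 * labels[..., None], scale=0.3, size=(size, size, 3))
    return rgb, labels
```

The overlap of the two class-conditional predictor distributions controls how much spatial smoothing can help, which is the quantity of interest in the comparisons below.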
A) plug-in vs. PL
pixels  method   R      G      B     λ     error rate (%)  time (min)
100²    plug-in  −2.20  −2.00  1.95  0.45  19.2             1.2*
100²    PL       −2.04  −1.99  2.06  0.99  23.3             0.9
200²    plug-in  −1.65  −1.36  1.71  0.50  20.8             5.0*
200²    PL       −1.61  −1.30  1.70  1.19  48.2             2.8
400²    plug-in  −2.00  −1.41  1.64  0.60  20.8            21.5*
400²    PL       −2.08  −1.40  1.68  1.36  50.5            16.1

* time per candidate λ value
B) Effect of coding & centering
[Figure: prediction error vs. pairwise parameter λ (0 to 1.2) for 200 × 200 images; error rates roughly 0.20–0.50. Curves: Centered (0,1), Standard (0,1), Standard (−1,1).]

logit(π_i) = (H − L) ( x_i^T β + λ Σ_{j∼i} (y_j − μ_j) )
C) Preliminary smoke results
[Figures: RGB image; logistic model vs. autologistic model; ALR model with 50 predictors]

1. Smoke-free areas: OK
2. Clouds vs. smoke: OK
3. Snow vs. smoke: OK
4. Spatial smoothing: OK
5. Smoke + cloud: Problem
Preliminary smoke results (continued)
[Figures: RGB, logistic, autologistic]

6. “Thin” smoke: Problem
7. Original masks (training data): Problem
5. Discussion
Advantages of working on a data-rich prediction problem
If you get low out-of-sample prediction error, the following are of little concern:
- The “truth” of your model
- The complexity or statistical efficiency of your model
- Whether or not your parameters are statistically significant
- Whether or not your parameter estimates are stable
Computational feasibility & run time become paramount.
How would things change if we’re interested in interpretation?
- Trade predictive power for model simplicity.
- The “plug-in” estimation approach is no longer helpful.
- The issue of model centering and coding becomes critical.
Future plans: smoke
Improve the base logistic regression model
Address “true” label ambiguity
Future plans: models
Computational improvements:
- low-level code
- parallelization
Revisit adaptive smoothing
A beta CRF for direct modelling of probabilities
Multi-class (autobinomial) extension
References
Caragea, P. C. and Kaiser, M. S. (2009), “Autologistic models with interpretable parameters,” Journal of Agricultural, Biological, and Environmental Statistics, 14, 281–300.

Hughes, J., Haran, M., and Caragea, P. C. (2011), “Autologistic models for binary data on a lattice,” Environmetrics, 22, 857–871.

Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001), “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” in Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., pp. 282–289.