Parameter Estimation in Autologistic Regression Models for Detection of Smoke in Satellite Images

Mark Wolters, Shanghai Center for Mathematical Sciences
Charmaine Dean, Western University

SSC 2015, Halifax, June 15, 2015


Outline


1. Introduction

Application; goals

2. Modelling Framework

CRF approach; autologistic regression; challenges

3. Approach to Estimation

Independence model + optimal smoothing

4. Some Results

Simulated data; preliminary findings

5. Discussion

Possible improvements; estimation in a prediction setting


1. Introduction


Earth-orbiting satellites help study large-scale environmental phenomena.

Our interest: smoke from forest fires.

Data: MODIS images
1 per day, 143 days
1.2 Mp each
Centered at Kelowna, BC
Hand-drawn smoke areas

Goal: classify pixels into smoke/nonsmoke

Why?
Health studies
Model input or validation
Monitoring & archiving


Data characteristics

Hyperspectral images.

Spectra at each pixel are covariates for predicting smoke.

High-dimensional predictor space.

Expect spatial association.

35 image planes + higher-order terms


2. Modelling Framework


Joint PMF for an image: autologistic regression (ALR)

Pr(Y = y) ∝ exp( (Xβ)^T y + (λ/2) y^T A y )

y: class labels (n-vector), y_i ∈ {L, H}
X: model matrix (spectral data, n × p); the linear predictors Xβ are the unary coefficients
π_i: P(pixel i is smoke | neighbour pixels)
G = (V, E): graph structure for dependence among pixels (regular 4-connected grid)
λ: pairwise association parameter
A: adjacency matrix of G
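To make the joint PMF concrete, here is a minimal NumPy/SciPy sketch of its unnormalized exponent on a 4-connected grid. The function names and the sparse-matrix construction are illustrative, not from the talk.

```python
import numpy as np
from scipy.sparse import diags, identity, kron

def grid_adjacency(nrow, ncol):
    """Sparse adjacency matrix A of a regular 4-connected nrow-by-ncol pixel grid."""
    def path(n):
        off = np.ones(n - 1)
        return diags([off, off], [-1, 1])              # 1-D path graph
    return (kron(identity(nrow), path(ncol)) +         # horizontal neighbours
            kron(path(nrow), identity(ncol))).tocsr()  # vertical neighbours

def log_pmf_unnorm(y, X, beta, lam, A):
    """Exponent of the ALR joint PMF: (X beta)^T y + (lam/2) y^T A y.
    The normalizing constant is intractable, so only the unnormalized value is returned."""
    eta = X @ beta                                     # linear predictors (unary coefficients)
    return float(eta @ y + 0.5 * lam * (y @ (A @ y)))
```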


Autologistic regression

Notes:

Intractable normalizing constant

ALR is a conditional random field (CRF) model (Lafferty et al., 2001): given X, Y is a Markov random field.

Conditional logit form:

log( π_i / (1 − π_i) ) = (H − L) ( x_i^T β + λ Σ_{j∼i} y_j )

logistic regression ⇐⇒ λ = 0

Spatial effect is homogeneous, isotropic

For Y_i ∈ {0, 1}, the sum Σ_{j∼i} y_j increases the log-odds unless all neighbours are zero.

⇒ Estimates of β and λ are strongly coupled (Caragea and Kaiser, 2009; Hughes et al., 2011)

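A small sketch of the conditional logit form above, with illustrative names: it evaluates π_i for every pixel given the current neighbour labels, and reduces to ordinary logistic regression when λ = 0.

```python
import numpy as np
from scipy.special import expit   # the logistic function

def conditional_probs(y, X, beta, lam, A, H=1.0, L=0.0):
    """pi_i = P(Y_i = H | neighbours) for every pixel, from
    logit(pi_i) = (H - L) * (x_i^T beta + lam * sum_{j~i} y_j).
    Setting lam = 0 gives ordinary logistic regression probabilities."""
    neighbour_sum = A @ y                            # sum of neighbouring labels, per pixel
    return expit((H - L) * (X @ beta + lam * neighbour_sum))
```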


Extensions

The centered ALR model (Caragea and Kaiser, 2009) aims to correct for the asymmetry of the pairwise term when Y_i ∈ {0, 1}:

logit(π_i) = x_i^T β + λ Σ_{j∼i} (y_j − μ_j)

where μ_j = E[Y_j | λ = 0] is the independence expectation.

Claim: just use Y_i ∈ {−1, +1} to get the same effect (a one-step check is given below).

Proposal: let λ = λ_ij = λ(x_i, x_j) for adaptive smoothing.

Then

logit(π_i) = 2 ( x_i^T β + Σ_{j∼i} λ(x_i, x_j) y_j )

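As a quick check of the claim above (this derivation is not in the slides): with the symmetric coding y_i ∈ {−1, +1}, conditioning the joint PMF on the neighbours of pixel i gives

```latex
% terms of the joint exponent involving y_i:  (x_i^T beta) y_i + lambda * y_i * sum_{j ~ i} y_j
\frac{\Pr(Y_i = +1 \mid y_{j\sim i})}{\Pr(Y_i = -1 \mid y_{j\sim i})}
  = \exp\!\Big( \big(1-(-1)\big)\,\mathbf{x}_i^{\mathsf T}\boldsymbol\beta
      + \lambda \big(1-(-1)\big) \sum_{j\sim i} y_j \Big)
  = \exp\!\Big( 2 \big( \mathbf{x}_i^{\mathsf T}\boldsymbol\beta
      + \lambda \sum_{j\sim i} y_j \big) \Big),
\qquad \text{so} \qquad
\operatorname{logit}(\pi_i) = 2 \Big( \mathbf{x}_i^{\mathsf T}\boldsymbol\beta
      + \lambda \sum_{j\sim i} y_j \Big).
```

Each neighbour at +1 raises the log-odds and each at −1 lowers it by the same amount, which is the symmetry the centering adjustment is designed to restore under the (0, 1) coding.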

3. Approach to Estimation


Existing possibilities

1. Ignore spatial association (logistic regression, large n, large p).

2. Pseudolikelihood (PL): maximize L(β, λ) ≈ ∏_{images} ∏_{i=1}^{n} Pr(y_i | y_{j∼i}) (a sketch of this objective follows the list)

3. Monte Carlo ML

4. Bayesian approach
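A sketch of the PL objective for a single image, under the (0, 1) coding and with hypothetical argument names; the full objective sums this over images and could be handed to a generic optimizer such as scipy.optimize.minimize.

```python
import numpy as np
from scipy.special import expit

def neg_log_pseudolikelihood(params, X, y, A, H=1.0, L=0.0):
    """Negative log pseudolikelihood for one image: minus the sum over pixels of
    log Pr(y_i | y_{j~i}), with each conditional taken from the ALR logit form.
    `params` packs (beta, lam); labels y are assumed coded 0/1 here."""
    beta, lam = params[:-1], params[-1]
    p = expit((H - L) * (X @ beta + lam * (A @ y)))   # pi_i given the observed neighbours
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p))
```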

Problems

We have ∼10^8 pixels

We have thousands of predictors, need model selection

We’re still developing models; rapid evaluation of candidates is beneficial

Hughes et al. (2011) use perfect sampling and recommend PL for large n.


Proposal: plug-in estimation

a) Use the independence (logistic) model to get β
Including model selection
Sample pixels to reduce n to a manageable size

b) Choose λ to optimize predictive power

Rationale

Treat λ as a smoothing parameter.

Assuming independence, β captures how information in X can be used to predict Y.

For fixed β, tuning λ will optimally reduce noise in the predicted probabilities.

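A sketch of the plug-in idea under stated assumptions: the independence fit uses an off-the-shelf logistic regression (model selection would also happen at that stage), and predictions under the ALR are approximated by iterating the conditional probabilities from the independence fit, since the talk does not specify the prediction scheme. All names and grid values are illustrative.

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

def plugin_fit(X_train, y_train, X_val, y_val, A_val,
               lambda_grid=np.linspace(0.0, 1.2, 13), n_sweeps=20):
    """(a) Fit an independence (logistic) model for beta on a manageable sample of
    pixels; (b) choose lambda on a grid to minimize validation prediction error.
    Labels y_* are assumed coded 0/1."""
    # (a) independence fit
    base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    beta, b0 = base.coef_.ravel(), base.intercept_[0]

    def predict_probs(lam):
        p = expit(X_val @ beta + b0)            # start from independence probabilities
        for _ in range(n_sweeps):               # relax towards the spatial model
            # 2p - 1 recodes the expected labels to {-1, +1}, keeping the
            # neighbour term symmetric around zero
            p = expit(X_val @ beta + b0 + lam * (A_val @ (2.0 * p - 1.0)))
        return p

    # (b) grid search over the smoothing parameter
    errs = [np.mean((predict_probs(lam) > 0.5) != y_val) for lam in lambda_grid]
    best = int(np.argmin(errs))
    return beta, lambda_grid[best], errs[best]
```

Treating λ this way mirrors how a smoothing parameter is tuned in nonparametric regression: β is fixed by the independence fit, and λ only controls how much neighbouring predictions are pooled.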


Proposed procedure:

[Flowchart: the images are split into training, validation, and test images, with training and validation pixels sampled for fitting. Logistic regression fitting plus model selection on the training pixels gives the best independence model; choosing λ to minimize prediction error on the validation data gives the best autologistic regression model; performance evaluation on the test images gives the final results.]


4. Results


Simulated Images

Predictors: R, G, B

Random ellipses = class 1 (smoke)

Background = class 2 (nonsmoke)

90 images at 3 sizes: 100, 200, and 400 pixels square

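A toy generator in the spirit of the simulated data just described (random ellipses as class 1 on a class-2 background, with noisy R, G, B predictors); all constants are illustrative and not the settings used in the study.

```python
import numpy as np

def simulate_image(size=200, n_ellipses=5, seed=0):
    """Random-ellipse test image: ellipse pixels are class 1 ('smoke'),
    the background is class 2 ('nonsmoke'); R, G, B channels are the predictors."""
    rng = np.random.default_rng(seed)
    rr, cc = np.mgrid[0:size, 0:size]
    labels = np.zeros((size, size), dtype=int)
    for _ in range(n_ellipses):
        cy, cx = rng.uniform(0, size, 2)                  # ellipse centre
        a, b = rng.uniform(size / 20, size / 5, 2)        # semi-axes
        labels |= (((rr - cy) / a) ** 2 + ((cc - cx) / b) ** 2 <= 1.0).astype(int)
    # noisy RGB predictors whose means differ between the two classes
    rgb = rng.normal(loc=labels[..., None] * 0.5 + 0.25, scale=0.15,
                     size=(size, size, 3))
    return rgb, labels
```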


A) plug-in vs. PL

| pixels | method  | R     | G     | B    | λ    | error rate (%) | time (min) |
|--------|---------|-------|-------|------|------|----------------|------------|
| 100²   | plug-in | −2.20 | −2.00 | 1.95 | 0.45 | 19.2           | 1.2*       |
| 100²   | PL      | −2.04 | −1.99 | 2.06 | 0.99 | 23.3           | 0.9        |
| 200²   | plug-in | −1.65 | −1.36 | 1.71 | 0.5  | 20.8           | 5.0*       |
| 200²   | PL      | −1.61 | −1.30 | 1.70 | 1.19 | 48.2           | 2.8        |
| 400²   | plug-in | −2.00 | −1.41 | 1.64 | 0.6  | 20.8           | 21.5*      |
| 400²   | PL      | −2.08 | −1.40 | 1.68 | 1.36 | 50.5           | 16.1       |

* time per candidate λ value


Example predicted probabilities

[Figure: predicted probability maps; panels show the truth, the plug-in fit, and the PL fit.]


B) Effect of coding & centering

[Figure: prediction error rate vs. the pairwise parameter λ for 200 × 200 images, with curves for the Centered (0,1), Standard (0,1), and Standard (−1, 1) codings; error rates range from roughly 0.20 to 0.50 over λ ∈ [0, 1.2].]

logit(π_i) = (H − L) ( x_i^T β + λ Σ_{j∼i} (y_j − μ_j) )


C) Preliminary smoke results

[Figure: RGB image with predictions from the logistic model and the autologistic model; ALR model with 50 predictors.]

1. Smoke-free areas: OK
2. Clouds vs. smoke: OK
3. Snow vs. smoke: OK
4. Spatial smoothing: OK
5. Smoke + cloud: Problem


Preliminary smoke results (continued)

[Figure: RGB image, logistic prediction, autologistic prediction.]

6. “Thin” smoke: Problem
7. Original masks (training data): Problem


5. Discussion


Advantages of working on a data-rich prediction problem

If you get low out-of-sample prediction error, the following are of little concern:

The “truth” of your model
The complexity or statistical efficiency of your model
Whether or not your parameters are statistically significant
Whether or not your parameter estimates are stable

Computational feasibility & run time become paramount.

How would things change if we’re interested in interpretation?

Trade predictive power for model simplicity.
The “plug-in” estimation approach is no longer helpful.
The issue of model centering and coding becomes critical.


Future plans: smoke

Improve the base logistic regression model

Address “true” label ambiguity

Future plans: models

Computational improvements:

low-level code
parallelization

Revisit adaptive smoothing

A beta CRF for direct modelling of probabilities

Multi-class (autobinomial) extension


References


Caragea, P. C. and Kaiser, M. S. (2009), “Autologistic models with interpretable parameters,” Journal of Agricultural, Biological, and Environmental Statistics, 14, 281–300.

Hughes, J., Haran, M., and Caragea, P. C. (2011), “Autologistic models for binary data on a lattice,” Environmetrics, 22, 857–871.

Lafferty, J. D., McCallum, A., and Pereira, F. C. N. (2001), “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” in Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., pp. 282–289.
