A Fully Nonparametric Modeling Approach to Binary Regression

Maria De Yoreo

Department of Applied Mathematics and Statistics
University of California, Santa Cruz

SBIES, April 27-28, 2012

De Yoreo BNP Binary Regression

Outline

1 Introduction

2 Methodology
   Model Formulation
   Posterior Inference

3 Data Illustrations
   Simulation Example
   Atmospheric Measurements
   Credit Card Data

4 Discussion


Motivation

- binary responses along with covariates are present in many settings, including biometrics, econometrics, and social sciences

- Goal: determine the relationship between response and covariates

- examples: credit scoring, medicine, population dynamics, environmental sciences

- the response-covariate relationship is described by the regression function

- standard approaches involve linearity and distributional assumptions, e.g., GLMs


Bayesian Nonparametrics

- Bayesian nonparametrics can be used to relax common distributional assumptions, resulting in flexible regression models with proper uncertainty quantification

- rather than modeling the regression function directly, model the joint distribution of response and covariates using a nonparametric mixture model (West et al., 1994; Müller et al., 1996)

- this implies a form for the conditional response distribution, which is implicitly modeled nonparametrically

- involves random covariates

Latent Variable Formulation

- introduce latent continuous random variables z that determine the binary responses y, so that y = 1 if and only if z > 0 (e.g., Albert and Chib, 1993)

- estimate the joint distribution of latent responses and covariates f(z, x) using a nonparametric mixture model, to obtain flexible inference for the regression function pr(y = 1 | x)

- the latent variables may be of interest in some applications, containing more information than just a 0/1 observation

- in biology applications, these may be thought of as maturity, latent survivorship, or a measure of health


DP Mixture Model

The Dirichlet process (DP) (Ferguson, 1973) generates random distributions, and can be used as a prior for spaces of distribution functions.

- DP constructive definition (Sethuraman, 1994): if $G \sim \mathrm{DP}(\alpha, G_0)$, then it is almost surely of the form $\sum_{l=1}^{\infty} p_l \delta_{\nu_l}$
  → $\nu_l \overset{iid}{\sim} G_0$, $l = 1, 2, \ldots$
  → $z_r \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$, $r = 1, 2, \ldots$
  → define $p_1 = z_1$, and $p_l = z_l \prod_{r=1}^{l-1} (1 - z_r)$, for $l = 2, 3, \ldots$

- DP mixture model for the latent responses and covariates

\[ f(z, x; G) = \int N_{p+1}(z, x; \mu, \Sigma) \, dG(\mu, \Sigma) \]

\[ G \mid \alpha, \psi \sim \mathrm{DP}(\alpha, G_0(\mu, \Sigma; \psi)) \]
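The stick-breaking construction can be sketched in a few lines of Python. This is only an illustration, not the author's code: the truncation level `n_atoms`, the value of `alpha`, and the stand-in base distribution G0 = N(0, 1) are all assumptions made for the example.

```python
import random

def stick_breaking_weights(alpha, n_atoms, rng):
    """Return truncated stick-breaking weights p_1, ..., p_N."""
    weights = []
    remaining = 1.0  # mass of the stick not yet assigned
    for l in range(n_atoms):
        # z_l ~ Beta(1, alpha); the last break is set to 1 so the weights sum to 1
        z_l = 1.0 if l == n_atoms - 1 else rng.betavariate(1.0, alpha)
        weights.append(z_l * remaining)
        remaining *= 1.0 - z_l
    return weights

rng = random.Random(0)
p = stick_breaking_weights(alpha=2.0, n_atoms=25, rng=rng)
# atoms nu_l drawn i.i.d. from a hypothetical base distribution G0 = N(0, 1)
atoms = [rng.gauss(0.0, 1.0) for _ in p]
```

Setting the final break to 1 is one common way to make a truncated version a proper distribution; smaller `alpha` concentrates the mass on the first few atoms.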


Implied Conditional Regression

- From the constructive definition, the model has an a.s. representation as a countable mixture of MVNs

\[ f(z, x; G) = \sum_{l=1}^{\infty} p_l N_{p+1}(z, x; \mu_l, \Sigma_l) \]

- Binary regression functional: pr(y = 1 | x; G)
  → marginalize over z to obtain f(x; G) and f(y, x; G)

\[ f(x; G) = \sum_{l=1}^{\infty} p_l N_p(x; \mu_l^x, \Sigma_l^{xx}) \]

and the joint distribution

\[ f(y, x; G) = \sum_{l=1}^{\infty} p_l N_p(x; \mu_l^x, \Sigma_l^{xx}) \, \mathrm{Bern}\left(y; \Phi\left(\frac{\mu_l^z + \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} (x - \mu_l^x)}{(\Sigma_l^{zz} - \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} \Sigma_l^{xz})^{1/2}}\right)\right) \]


The Regression Function

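- implied regression function: $\mathrm{pr}(y = 1 \mid x; G) = \sum_{l=1}^{\infty} w_l(x) \pi_l(x)$, with covariate-dependent weights

\[ w_l(x) \propto p_l N(x; \mu_l^x, \Sigma_l^{xx}) \]

and probabilities

\[ \pi_l(x) = \Phi\left(\frac{\mu_l^z + \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} (x - \mu_l^x)}{(\Sigma_l^{zz} - \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} \Sigma_l^{xz})^{1/2}}\right) \]

- Notice that the probabilities have the probit form with component-specific intercept and slope parameters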
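As a rough sketch of how this regression function could be evaluated, the Python below handles a finite (truncated) mixture with a single covariate, so every covariance block is a scalar. The function name, the tuple layout, and the two example components are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def regression_function(x, components):
    """pr(y = 1 | x) for a finite mixture with one covariate.

    Each component is a tuple (p_l, mu_z, mu_x, s_zz, s_zx, s_xx).
    """
    std_normal = NormalDist()
    weights, probs = [], []
    for p_l, mu_z, mu_x, s_zz, s_zx, s_xx in components:
        # unnormalized covariate-dependent weight p_l * N(x; mu_x, s_xx)
        weights.append(p_l * NormalDist(mu_x, s_xx ** 0.5).pdf(x))
        # conditional mean and standard deviation of z given x in this component
        cond_mean = mu_z + s_zx / s_xx * (x - mu_x)
        cond_sd = (s_zz - s_zx ** 2 / s_xx) ** 0.5
        probs.append(std_normal.cdf(cond_mean / cond_sd))
    total = sum(weights)  # normalizing constant for the weights
    return sum(w * pi for w, pi in zip(weights, probs)) / total

# hypothetical two-component mixture, values chosen only for illustration
components = [
    (0.6, -1.0, 0.0, 1.0, 0.5, 1.0),
    (0.4, 1.0, 3.0, 1.0, -0.3, 1.5),
]
prob = regression_function(1.0, components)
```

Because the weights depend on x, the overall curve can be non-monotone even though each component contributes a monotone probit curve.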


Identifiability

Can the entire covariance matrix Σ be estimated?

- Probit regression: $z \sim N(x^T\beta, 1)$

- the binary responses are not able to inform about the scale of the latent responses

- retaining $\Sigma^{zx}$ is important: if we set it to 0, then $\pi_l(x)$ becomes just $\pi_l$

- We have shown that if $\Sigma^{zz}$ is fixed, the remaining parameters are identifiable in the kernel of the mixture model for y and x
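The scale problem can be seen directly in the parametric probit case. As a short worked step, suppose the latent variance were a free parameter $\sigma^2$ (an assumption made here for illustration; the probit model above fixes it at 1):

```latex
z \mid x \sim N(x^T\beta, \sigma^2)
\;\Longrightarrow\;
\mathrm{pr}(y = 1 \mid x) = \mathrm{pr}(z > 0 \mid x)
  = \Phi\!\left(\frac{x^T\beta}{\sigma}\right)
```

The pairs $(\beta, \sigma)$ and $(c\beta, c\sigma)$ give the same response probabilities for every $c > 0$, so only the ratio $\beta/\sigma$ can be learned from (y, x). Fixing the latent scale removes this ambiguity.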


Facilitating Identifiability

How to fix only one element of the covariance matrix?

- the usual inverse-Wishart distribution will not work

- square-root-free Cholesky decomposition of Σ uses the relationship $\Delta = \beta \Sigma \beta^T$, with Δ diagonal with all elements $\delta_i > 0$, and β lower triangular with 1 on its diagonal (Daniels and Pourahmadi, 2002; Webb and Forster, 2007)

- For $y = (y_1, \ldots, y_m) \sim N(\mu, \Sigma)$, with $\Delta = \beta \Sigma \beta^T$, the joint distribution for y can be expressed in a recursive form:

\[ y_1 \sim N(\mu_1, \delta_1), \qquad (y_k \mid y_1, \ldots, y_{k-1}) \sim N\left(\mu_k - \sum_{j=1}^{k-1} \beta_{k,j} (y_j - \mu_j), \; \delta_k\right), \quad k = 2, \ldots, m \]

  → useful for modeling longitudinal data and specifying conditional independence assumptions


- here, no natural ordering is present, but the parameterization has other useful properties which we exploit

- $\delta_1 = \Sigma^{zz}$
  → fix $\delta_1$, and mix on $\delta_2, \ldots, \delta_{p+1}$ and the $p(p+1)/2$ free elements of β, denoted by the vector $\tilde{\beta}$

Then the DP mixture model becomes

\[ f(z, x; G) = \int N_{p+1}(z, x; \mu, \beta^{-1} \Delta \beta^{-T}) \, dG(\mu, \beta, \Delta) \]

- computationally convenient: there exist conjugate prior distributions for $\tilde{\beta}$ and $\delta_2, \ldots, \delta_{p+1}$, which are MVN and (independent) inverse-gamma
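To see why fixing $\delta_1$ fixes $\Sigma^{zz}$, the sketch below writes out $\Sigma = \beta^{-1} \Delta \beta^{-T}$ by hand for the smallest case, one covariate (p = 1), where β has a single free element. The function name and the numerical values are assumptions made for the example.

```python
def covariance_from_cholesky(b, delta1, delta2):
    """Sigma = beta^{-1} Delta beta^{-T} for the 2x2 case (one covariate).

    beta = [[1, 0], [b, 1]] and Delta = diag(delta1, delta2).
    """
    # beta^{-1} = [[1, 0], [-b, 1]], so the product can be written out by hand
    s_zz = delta1                    # top-left entry is exactly delta1
    s_zx = -b * delta1               # covariance between z and x
    s_xx = b * b * delta1 + delta2   # variance of x
    return [[s_zz, s_zx], [s_zx, s_xx]]

# hypothetical values: delta1 held fixed at 1, the other two free
sigma = covariance_from_cholesky(b=-0.8, delta1=1.0, delta2=0.5)
```

The off-diagonal entry still varies freely through `b`, so $\Sigma^{zx}$ is retained while the latent scale is pinned down, and the result is positive definite whenever both δ values are positive.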



Hierarchical Model

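Blocked Gibbs sampler: truncate G to $G_N(\cdot) = \sum_{l=1}^{N} p_l \delta_{W_l}(\cdot)$, with $W_l = (\mu_l, \tilde{\beta}_l, \Delta_l)$, and introduce configuration variables $(L_1, \ldots, L_n)$ taking values in $\{1, \ldots, N\}$.

\[ y_i \mid z_i \overset{ind}{\sim} 1(y_i = 1) 1(z_i > 0) + 1(y_i = 0) 1(z_i \le 0), \quad i = 1, \ldots, n \]

\[ (z_i, x_i) \mid W, L_i \overset{ind}{\sim} N_{p+1}((z_i, x_i); \mu_{L_i}, \beta_{L_i}^{-1} \Delta_{L_i} \beta_{L_i}^{-T}), \quad i = 1, \ldots, n \]

\[ L_i \mid p \sim \sum_{l=1}^{N} p_l \delta_l(L_i), \quad i = 1, \ldots, n \]

\[ W_l \mid \psi \overset{ind}{\sim} N_{p+1}(\mu_l; m, V) \, N_q(\tilde{\beta}_l; \theta, cI) \prod_{i=2}^{p+1} \mathrm{IG}(\delta_{i,l}; \nu_i, s_i), \quad l = 1, \ldots, N \]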
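Under this hierarchy, the latent z_i given y_i, x_i and its assigned component follows a normal distribution truncated to z > 0 when y_i = 1 and to z ≤ 0 when y_i = 0. The sketch below shows one simple way to draw from such a distribution, by inverting the CDF. It is not taken from the slides: `cond_mean` and `cond_sd` stand for the conditional mean and standard deviation of z given x in the assigned component, and the numbers used are hypothetical.

```python
import random
from statistics import NormalDist

def sample_latent_z(y, cond_mean, cond_sd, rng):
    """Draw z ~ N(cond_mean, cond_sd^2) truncated to z > 0 if y == 1, else z <= 0."""
    dist = NormalDist(cond_mean, cond_sd)
    p0 = dist.cdf(0.0)  # pr(z <= 0) before truncation
    u = rng.random()
    # map a uniform draw into the allowed part of the CDF, then invert
    q = p0 + u * (1.0 - p0) if y == 1 else u * p0
    # keep inv_cdf away from 0 and 1; not robust when p0 is extremely close to 0 or 1
    q = min(max(q, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(q)

rng = random.Random(1)
draws_pos = [sample_latent_z(1, 0.3, 1.0, rng) for _ in range(100)]
draws_neg = [sample_latent_z(0, 0.3, 1.0, rng) for _ in range(100)]
```

Inverse-CDF sampling is used here only because it is short; a production sampler would likely need a method that stays accurate far in the tails.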

Posterior Inference

- Gibbs sampling may be used to simulate from the full posterior p(W, L, p, ψ, α, z | data), with the conditionally conjugate base distribution, and conjugate priors on ψ and α.

- The posterior for $G_N = (p, W)$ is imputed in the MCMC, enabling full inference for any functional of $f(z, x; G_N)$, now a finite sum

- Binary regression functional: for any covariate value $x_0$, at iteration r of the MCMC, calculate $\mathrm{pr}(y = 1 \mid x_0; G_N^{(r)})$
  → provides point estimate and uncertainty quantification for the regression function

- The same can be done for other functionals, such as the latent response distribution $f(z \mid x_0; G_N)$ at any covariate value $x_0$
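Once the functional has been evaluated at each MCMC iteration, a point estimate and an interval follow from the empirical distribution of those values. A minimal sketch, assuming the draws are already stored in a list; the five numbers are made up for illustration and the quantile rule is a deliberately simple one:

```python
def summarize_draws(draws, lower=0.025, upper=0.975):
    """Posterior mean and interval endpoints from MCMC draws of pr(y = 1 | x0)."""
    ordered = sorted(draws)
    n = len(ordered)

    def quantile(q):
        # simple empirical quantile: the order statistic at index floor(q * n)
        return ordered[min(n - 1, int(q * n))]

    return sum(ordered) / n, quantile(lower), quantile(upper)

# hypothetical values of the regression functional at x0 over five iterations
mean, lo, hi = summarize_draws([0.42, 0.55, 0.47, 0.51, 0.60])
```

Repeating this over a grid of x0 values gives a pointwise estimate of the regression curve with an uncertainty band.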


Simulated Data

- Data $\{(z_i, x_i) : i = 1, \ldots, n\}$ were simulated from a mixture of 3 bivariate normals, and y determined from z.

- compare inference from the binary regression model with data (y, x) to that from the model which views (z, x) as data

- a practical prior specification approach, appropriate when little is known about the problem, is applied here

- to specify priors on ψ, consider only one mixture component and use an approximate center and range of the data, as well as prior simulation to induce an approximate unif(−1, 1) prior on corr(z, x)
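A data set of this kind could be generated along the following lines. The slides do not report the mixture parameters or the sample size, so the three components and n = 200 below are hypothetical values chosen only to make the sketch runnable.

```python
import random

def simulate(n, components, rng):
    """Simulate (z_i, x_i, y_i) from a mixture of bivariate normals.

    Each component is a tuple (weight, mu_z, mu_x, s_zz, s_zx, s_xx).
    """
    weights = [c[0] for c in components]
    data = []
    for _ in range(n):
        _, mu_z, mu_x, s_zz, s_zx, s_xx = rng.choices(components, weights=weights)[0]
        x = rng.gauss(mu_x, s_xx ** 0.5)
        # z | x is normal with the usual conditional mean and variance
        cond_mean = mu_z + s_zx / s_xx * (x - mu_x)
        cond_sd = (s_zz - s_zx ** 2 / s_xx) ** 0.5
        z = rng.gauss(cond_mean, cond_sd)
        data.append((z, x, 1 if z > 0 else 0))  # y is determined by the sign of z
    return data

# hypothetical mixture of 3 bivariate normals
components = [
    (0.3, -1.5, -1.0, 1.0, 0.4, 0.5),
    (0.4, 0.5, 1.0, 1.0, -0.3, 0.6),
    (0.3, 1.5, 3.0, 1.0, 0.5, 0.8),
]
data = simulate(200, components, random.Random(0))
```

Keeping z alongside (y, x) is what allows the comparison described above: the binary model sees only (y, x), while the reference model treats (z, x) as observed.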

[Figure: two panels plotted against x; left panel Pr(z>0|x;G), right panel Pr(y=1|x;G).]

The inference for pr(z > 0 | x; G) (left) is compared to that for pr(y = 1 | x; G) (right) and the truth (solid line).

[Figure: two rows of three panels, f(z|x=x1), f(z|x=x2), and f(z|x=x3).]

Top row: Inference for f(z | x0; G) under the model which views z as observed, with true densities as dashed lines, at 3 values of x0. Bottom row: Inference from the binary regression model.


Ozone and Wind Speed

- 111 daily measurements of wind speed (mph) and ozone concentration (parts per billion) in NYC over a 4-month period

- objective: model the probability of exceeding a certain ozone concentration as a function of wind speed

- the model only sees whether or not there was an exceedance, but there is an actual ozone concentration underlying this 0/1 value

[Figure: two panels, each with wind speed on the horizontal axis.]

Left: The probability that ozone concentration (parts per billion) exceeds a threshold of 70 decreases with wind speed (mph). Right: For comparison, here are the actual non-discretized ozone measurements as a function of wind speed.

[Figure: four panels, each showing f(z|x0).]

Estimates for f(z | x0; G) at wind speed values of 5, 8, 10, and 15 mph.


Credit Cards and Income

- n = 100 subjects in a study were asked whether or not they owned a travel credit card, and their income was recorded (Agresti, 1996)

- In this situation, it is not clear that there is some meaningful interpretation of the latent continuous random variables, but we can still use the method for regression

- Does the probability of owning a credit card change with income?

[Figure: plot against income in thousands.]

Probability of owning a credit card appears to increase with income, with a slight dip or leveling off around income of 40-50, since all subjects in that region did not own a credit card.

Extensions to Ordinal Responses

- similar methodology, wider range of applications

- for an ordinal response with C categories, assume y = j if and only if $\gamma_{j-1} < z \le \gamma_j$, for $j = 1, \ldots, C$, and apply the same DP mixture of MVNs for (z, x)

- for fixed cut-off points γ, it can be shown that all of µ and Σ are identifiable in the induced kernel for the observables

- the C − 1 free cut-off points can be fixed to arbitrary increasing values (Kottas et al., 2005), which is an advantage computationally
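The mapping from a latent z to an ordinal category can be sketched with a binary search over the fixed cut-offs. The cut-off values below are arbitrary increasing numbers chosen for illustration, and the function name is hypothetical.

```python
from bisect import bisect_left

def ordinal_response(z, cutoffs):
    """Return y = j such that gamma_{j-1} < z <= gamma_j.

    `cutoffs` holds the C - 1 free cut-off points gamma_1 < ... < gamma_{C-1};
    gamma_0 = -infinity and gamma_C = +infinity are implicit.
    """
    # bisect_left counts the cut-offs strictly below z, which is j - 1
    return bisect_left(cutoffs, z) + 1

cutoffs = [-1.0, 0.0, 1.0]  # C = 4 categories
ys = [ordinal_response(z, cutoffs) for z in (-2.0, -1.0, -0.5, 0.0, 0.5, 2.0)]
```

With a single cut-off at 0 this reduces to the binary case, with categories labeled 1 and 2 in place of 0 and 1.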

Other Extensions

- multivariate ordinal responses: J ordinal responses associated with a vector of covariates for each subject, with $C_j$ categories associated with the jth response

- several applications, but limited existing methods for flexible inference

- y and z are vectors, and $y_j = l$ if and only if $\gamma_{j,l-1} < z_j \le \gamma_{j,l}$, for $j = 1, \ldots, J$ and $l = 1, \ldots, C_j$

- if $C_j > 2$ for all j, then no identifiability restrictions are needed

- if $C_j = 2$ for some j, then the (β, ∆) parameterization can be used, and fixing certain elements of δ provides the necessary restrictions

- mixed ordinal-continuous responses


Conclusions

- Binary responses measured along with covariates represent a simple setting, but the scope of problems which lie in this category is large.

- This framework allows flexible, nonparametric inference to be obtained for the regression relationship in a general binary regression problem.

- The methodology extends easily to larger classes of problems in ordinal regression, including multivariate responses and mixed responses, making the framework much more powerful, with utility in a wide variety of applications.
