A Fully Nonparametric Modeling Approach to Binary Regression

Maria De Yoreo

Department of Applied Mathematics and Statistics
University of California, Santa Cruz

SBIES, April 27-28, 2012

De Yoreo BNP Binary Regression

Outline

1 Introduction

2 Methodology: Model Formulation; Posterior Inference

3 Data Illustrations: Simulation Example; Atmospheric Measurements; Credit Card Data

4 Discussion

Motivation

- binary responses along with covariates are present in many settings, including biometrics, econometrics, and social sciences
- Goal: determine the relationship between response and covariates
- examples: credit scoring, medicine, population dynamics, environmental sciences
- the response-covariate relationship is described by the regression function
- standard approaches involve linearity and distributional assumptions, e.g., GLMs

Bayesian Nonparametrics

- Bayesian nonparametrics can be used to relax common distributional assumptions, resulting in flexible regression models with proper uncertainty quantification
- rather than modeling the regression function directly, model the joint distribution of response and covariates using a nonparametric mixture model (West et al., 1994; Müller et al., 1996)
- this implies a form for the conditional response distribution, which is implicitly modeled nonparametrically
- involves random covariates

Latent Variable Formulation

- introduce latent continuous random variables z that determine the binary responses y, so that y = 1 iff z > 0 (e.g., Albert and Chib, 1993)
- estimate the joint distribution of latent responses and covariates f(z, x) using a nonparametric mixture model, to obtain flexible inference for the regression function pr(y = 1|x)
- the latent variables may be of interest in some applications, containing more information than just a 0/1 observation
- in biology applications, these may be thought of as maturity, latent survivorship, or measure of health
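The latent-variable construction can be illustrated with a minimal Python sketch. It uses the simple parametric probit case, z ~ N(beta0 + beta1 x, 1), as a stand-in for the nonparametric model; the function names and the coefficient values are assumptions chosen only for illustration.

```python
import random
from statistics import NormalDist


def simulate_probit(x, beta0, beta1, rng):
    """Draw (z, y) at covariate value x under a latent-variable probit model."""
    z = rng.gauss(beta0 + beta1 * x, 1.0)  # latent continuous response, unit variance
    y = 1 if z > 0 else 0                  # y = 1 iff z > 0
    return z, y


def probit_prob(x, beta0, beta1):
    """pr(y = 1 | x) = pr(z > 0 | x) = Phi(beta0 + beta1 * x)."""
    return NormalDist().cdf(beta0 + beta1 * x)


rng = random.Random(0)
x0 = 0.5
draws = [simulate_probit(x0, -0.2, 1.0, rng) for _ in range(20000)]
empirical = sum(y for _, y in draws) / len(draws)  # Monte Carlo estimate of pr(y = 1 | x0)
exact = probit_prob(x0, -0.2, 1.0)
print(empirical, exact)
```

The empirical frequency of y = 1 should be close to the probit probability, which shows that thresholding z at 0 reproduces the regression function.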

DP Mixture Model

The Dirichlet process (DP) (Ferguson, 1973) generates random distributions, and can be used as a prior for spaces of distribution functions.

- DP constructive definition (Sethuraman, 1994): if $G \sim \mathrm{DP}(\alpha, G_0)$, then it is almost surely of the form $\sum_{l=1}^{\infty} p_l \delta_{\nu_l}$
  → $\nu_l \overset{iid}{\sim} G_0$, $l = 1, 2, \ldots$
  → $z_r \overset{iid}{\sim} \mathrm{Beta}(1, \alpha)$, $r = 1, 2, \ldots$
  → define $p_1 = z_1$, and $p_l = z_l \prod_{r=1}^{l-1} (1 - z_r)$, for $l = 2, 3, \ldots$
- DP mixture model for the latent responses and covariates:

$$f(z, x; G) = \int N_{p+1}(z, x; \mu, \Sigma)\, dG(\mu, \Sigma)$$

$$G \mid \alpha, \psi \sim \mathrm{DP}(\alpha, G_0(\mu, \Sigma; \psi))$$
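The stick-breaking construction can be sketched in a few lines of Python. This is only an illustration: the truncation at a finite number of atoms, the standard normal base distribution, and the names `stick_breaking` and `base_draw` are assumptions, not part of the slides.

```python
import random


def stick_breaking(alpha, n_atoms, base_draw, rng):
    """Return the first n_atoms weights and atoms of a DP(alpha, G0) draw.

    Also returns the stick mass not yet assigned, so that
    sum(weights) + remaining equals 1 up to rounding.
    """
    weights, atoms = [], []
    remaining = 1.0
    for _ in range(n_atoms):
        z = rng.betavariate(1.0, alpha)   # z_r ~ Beta(1, alpha)
        weights.append(z * remaining)     # p_l = z_l * prod_{r<l} (1 - z_r)
        remaining *= 1.0 - z
        atoms.append(base_draw(rng))      # nu_l ~ G0
    return weights, atoms, remaining


rng = random.Random(1)
# Hypothetical base distribution G0: a standard normal on a scalar atom.
weights, atoms, leftover = stick_breaking(2.0, 50, lambda r: r.gauss(0.0, 1.0), rng)
print(sum(weights), leftover)
```

Smaller values of alpha put most of the mass on the first few atoms, while larger values spread it over many components.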

Implied Conditional Regression

- From the constructive definition, the model has an a.s. representation as a countable mixture of MVNs:

$$f(z, x; G) = \sum_{l=1}^{\infty} p_l N_{p+1}(z, x; \mu_l, \Sigma_l)$$

- Binary regression functional: pr(y = 1|x; G)
  → marginalize over z to obtain f(x; G) and f(y, x; G):

$$f(x; G) = \sum_{l=1}^{\infty} p_l N_p(x; \mu_l^x, \Sigma_l^{xx})$$

and the joint distribution

$$f(y, x; G) = \sum_{l=1}^{\infty} p_l N_p(x; \mu_l^x, \Sigma_l^{xx})\, \mathrm{Bern}\left(y;\ \Phi\left(\frac{\mu_l^z + \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} (x - \mu_l^x)}{\left(\Sigma_l^{zz} - \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} \Sigma_l^{xz}\right)^{1/2}}\right)\right)$$

The Regression Function

- implied regression function: $\mathrm{pr}(y = 1 \mid x; G) = \sum_{l=1}^{\infty} w_l(x) \pi_l(x)$, with covariate-dependent weights

$$w_l(x) \propto p_l N_p(x; \mu_l^x, \Sigma_l^{xx})$$

and probabilities

$$\pi_l(x) = \Phi\left(\frac{\mu_l^z + \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} (x - \mu_l^x)}{\left(\Sigma_l^{zz} - \Sigma_l^{zx} (\Sigma_l^{xx})^{-1} \Sigma_l^{xz}\right)^{1/2}}\right)$$

- Notice that the probabilities have the probit form with component-specific intercept and slope parameters
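A minimal sketch of evaluating this regression function for a finite mixture with a single covariate (p = 1) follows. The dictionary layout, the function names, and the component values are hypothetical and serve only to show how the weights and probit probabilities combine.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return exp(-0.5 * (x - mean) ** 2 / var) / sqrt(2.0 * pi * var)


def regression_function(x, components):
    """pr(y = 1 | x) = sum_l w_l(x) * pi_l(x) for a finite mixture, scalar x."""
    phi = NormalDist().cdf
    # Unnormalized weights p_l * N(x; mu_x, s_xx).
    unnorm = [c["p"] * normal_pdf(x, c["mu_x"], c["s_xx"]) for c in components]
    total = sum(unnorm)
    prob = 0.0
    for w, c in zip(unnorm, components):
        cond_mean = c["mu_z"] + c["s_zx"] / c["s_xx"] * (x - c["mu_x"])
        cond_sd = sqrt(c["s_zz"] - c["s_zx"] ** 2 / c["s_xx"])
        prob += (w / total) * phi(cond_mean / cond_sd)  # w_l(x) * pi_l(x)
    return prob


# Hypothetical two-component mixture (values chosen only for illustration).
components = [
    {"p": 0.6, "mu_z": -0.5, "mu_x": -1.0, "s_zz": 1.0, "s_zx": 0.5, "s_xx": 1.0},
    {"p": 0.4, "mu_z": 1.0, "mu_x": 2.0, "s_zz": 1.0, "s_zx": -0.3, "s_xx": 0.8},
]
curve = [regression_function(x, components) for x in (-2.0, 0.0, 2.0, 4.0)]
print(curve)
```

Because the weights depend on x, the resulting curve need not be monotone even though each component contributes a monotone probit curve.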

Identifiability

Can the entire covariance matrix Σ be estimated?

- Probit regression: $z \sim N(x^T \beta, 1)$
- the binary responses are not able to inform about the scale of the latent responses
- retaining $\Sigma^{zx}$ is important: if we set it to 0, then $\pi_l(x)$ becomes just $\pi_l$
- We have shown that if $\Sigma^{zz}$ is fixed, the remaining parameters are identifiable in the kernel of the mixture model for y and x
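The scale problem can be checked numerically: multiplying the mean and the standard deviation of the latent response by the same positive constant leaves pr(z > 0) unchanged, so the binary data cannot distinguish the two. The sketch below uses hypothetical values.

```python
from statistics import NormalDist


def prob_positive(mean, sd):
    """pr(z > 0) for z ~ N(mean, sd^2)."""
    return 1.0 - NormalDist(mean, sd).cdf(0.0)


base = prob_positive(0.7, 1.0)
rescaled = prob_positive(0.7 * 3.0, 1.0 * 3.0)  # same latent model on a different scale
print(base, rescaled)
```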

Facilitating Identifiability

How to fix only one element of the covariance matrix?

- the usual inverse-Wishart distribution will not work
- the square-root-free Cholesky decomposition of Σ uses the relationship $\Delta = \beta \Sigma \beta^T$, with Δ diagonal with all elements $\delta_i > 0$, and β lower triangular with 1 on its diagonal (Daniels and Pourahmadi, 2002; Webb and Forster, 2007)
- For $y = (y_1, \ldots, y_m) \sim N(\mu, \Sigma)$, with $\Delta = \beta \Sigma \beta^T$, the joint distribution for y can be expressed in a recursive form:

$$y_1 \sim N(\mu_1, \delta_1), \qquad (y_k \mid y_1, \ldots, y_{k-1}) \sim N\left(\mu_k - \sum_{j=1}^{k-1} \beta_{k,j} (y_j - \mu_j),\ \delta_k\right), \quad k = 2, \ldots, m$$

  → useful for modeling longitudinal data and specifying conditional independence assumptions
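One way to compute β and Δ from a given Σ is through an LDL^T factorization, Σ = L D L^T with L unit lower triangular, which gives β = L^{-1} and Δ = D. The pure-Python sketch below takes that route; the helper names and the 3 x 3 example matrix are assumptions made for illustration.

```python
def ldl_decompose(sigma):
    """Return (lower, d) with sigma = lower * diag(d) * lower^T, lower unit lower triangular."""
    n = len(sigma)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = sigma[j][j] - sum(lower[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            acc = sum(lower[i][k] * lower[j][k] * d[k] for k in range(j))
            lower[i][j] = (sigma[i][j] - acc) / d[j]
    return lower, d


def invert_unit_lower(lower):
    """Invert a unit lower triangular matrix by forward substitution."""
    n = len(lower)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = 1.0 if i == col else 0.0
            inv[i][col] = rhs - sum(lower[i][k] * inv[k][col] for k in range(i))
    return inv


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Hypothetical positive definite covariance matrix.
sigma = [[1.0, 0.5, 0.2], [0.5, 2.0, 0.3], [0.2, 0.3, 1.5]]
lower, delta = ldl_decompose(sigma)
beta = invert_unit_lower(lower)                         # beta = L^{-1}
check = matmul(matmul(beta, sigma), transpose(beta))    # should equal diag(delta)
print(delta)
```

The first diagonal element of Δ equals the first diagonal element of Σ, because the first row of β is (1, 0, ..., 0).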

Facilitating Identifiability

- here, no natural ordering is present, but the parameterization has other useful properties which we exploit
- $\delta_1 = \Sigma^{zz}$
  → fix $\delta_1$, and mix on $\delta_2, \ldots, \delta_{p+1}$ and the $p(p+1)/2$ free elements of β, denoted by the vector $\tilde{\beta}$

Then the DP mixture model becomes

$$f(z, x; G) = \int N_{p+1}(z, x; \mu, \beta^{-1} \Delta \beta^{-T})\, dG(\mu, \beta, \Delta)$$

- computationally convenient: there exist conjugate prior distributions for $\tilde{\beta}$ and $\delta_2, \ldots, \delta_{p+1}$, which are MVN and (independent) inverse-gamma
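For a single covariate (p = 1), the map from (β, Δ) back to Σ = β^{-1} Δ β^{-T} has a simple closed form, which makes it easy to see that fixing δ_1 fixes only Σ^{zz}. The sketch below assumes this bivariate case; the function name and the numeric values are hypothetical.

```python
def sigma_from_beta_delta(b, delta1, delta2):
    """Covariance of (z, x) for beta = [[1, 0], [b, 1]] and Delta = diag(delta1, delta2).

    Since beta^{-1} = [[1, 0], [-b, 1]], the product beta^{-1} Delta beta^{-T} is
    [[delta1, -b*delta1], [-b*delta1, b^2*delta1 + delta2]].
    """
    return [
        [delta1, -b * delta1],
        [-b * delta1, b * b * delta1 + delta2],
    ]


fixed_delta1 = 1.0  # the single fixed element, equal to the latent variance
examples = [sigma_from_beta_delta(b, fixed_delta1, d2)
            for b, d2 in [(-0.8, 0.5), (0.0, 2.0), (1.5, 0.1)]]
for s in examples:
    print(s)
```

Every example has Σ^{zz} = 1, while Σ^{zx} and Σ^{xx} vary freely with the free element of β and with δ_2.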

Hierarchical Model

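Blocked Gibbs sampler: truncate G to $G_N(\cdot) = \sum_{l=1}^{N} p_l \delta_{W_l}(\cdot)$, with $W_l = (\mu_l, \tilde{\beta}_l, \Delta_l)$, and introduce configuration variables $(L_1, \ldots, L_n)$ taking values in $\{1, \ldots, N\}$.

$$y_i \mid z_i \overset{ind}{\sim} 1(y_i = 1)\, 1(z_i > 0) + 1(y_i = 0)\, 1(z_i \le 0), \quad i = 1, \ldots, n$$

$$(z_i, x_i) \mid W, L_i \overset{ind}{\sim} N_{p+1}\left((z_i, x_i);\ \mu_{L_i},\ \beta_{L_i}^{-1} \Delta_{L_i} \beta_{L_i}^{-T}\right), \quad i = 1, \ldots, n$$

$$L_i \mid p \sim \sum_{l=1}^{N} p_l \delta_l(L_i), \quad i = 1, \ldots, n$$

$$W_l \mid \psi \overset{ind}{\sim} N_{p+1}(\mu_l; m, V)\, N_q(\tilde{\beta}_l; \theta, cI) \prod_{i=2}^{p+1} \mathrm{IG}(\delta_{i,l}; \nu_i, s_i), \quad l = 1, \ldots, N$$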
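Under this hierarchy, each latent z_i given y_i, x_i, and its component is a normal restricted to z > 0 when y_i = 1 and to z ≤ 0 when y_i = 0. The sketch below draws from such a truncated normal by inverting the CDF. It is an assumption that the sampler updates z in this way (the slides do not spell out the step), and the mean and standard deviation passed in stand for the conditional moments of z given x_i under the assigned component.

```python
import random
from statistics import NormalDist


def sample_truncated_latent(y, mean, sd, rng):
    """Draw z ~ N(mean, sd^2) restricted to z > 0 if y == 1, else to z <= 0."""
    dist = NormalDist(mean, sd)
    p0 = dist.cdf(0.0)  # probability mass on z <= 0
    if y == 1:
        u = p0 + rng.random() * (1.0 - p0)  # uniform on [p0, 1)
    else:
        u = rng.random() * p0               # uniform on [0, p0)
    # Keep u strictly inside (0, 1) so inv_cdf is defined; this is a crude
    # numerical guard and is not accurate when the truncated region is extremely unlikely.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


rng = random.Random(2)
positives = [sample_truncated_latent(1, 0.3, 1.0, rng) for _ in range(1000)]
negatives = [sample_truncated_latent(0, 0.3, 1.0, rng) for _ in range(1000)]
print(min(positives), max(negatives))
```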

Posterior Inference

- Gibbs sampling may be used to simulate from the full posterior $p(W, L, p, \psi, \alpha, z \mid \text{data})$, with the conditionally conjugate base distribution, and conjugate priors on ψ and α.
- The posterior for $G_N = (p, W)$ is imputed in the MCMC, enabling full inference for any functional of $f(z, x; G_N)$, now a finite sum
- Binary regression functional: for any covariate value $x_0$, at iteration r of the MCMC, calculate $\mathrm{pr}(y = 1 \mid x_0; G_N^{(r)})$
  → provides point estimate and uncertainty quantification for the regression function
- Same can be done for other functionals, such as the latent response distribution $f(z \mid x_0; G_N)$ at any covariate value $x_0$
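Given the per-iteration values of the regression functional at a covariate value x0, the point estimate and interval can be read off the draws directly. The sketch below shows one simple way to do this; the draws are synthetic placeholders generated from a beta distribution, not output from the model, and the function name is hypothetical.

```python
import random


def summarize_draws(draws, level=0.95):
    """Posterior mean and equal-tail interval from a list of MCMC draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int((1.0 - level) / 2.0 * (n - 1))]
    upper = ordered[int((1.0 + level) / 2.0 * (n - 1))]
    return sum(ordered) / n, lower, upper


rng = random.Random(3)
# Placeholder values standing in for pr(y = 1 | x0; G_N^(r)), r = 1, ..., 2000.
placeholder_draws = [rng.betavariate(8.0, 4.0) for _ in range(2000)]
estimate, lower, upper = summarize_draws(placeholder_draws)
print(estimate, lower, upper)
```

Repeating this over a grid of x0 values gives a pointwise estimate and uncertainty band for the whole regression curve.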

Simulated Data

- Data $\{(z_i, x_i) : i = 1, \ldots, n\}$ were simulated from a mixture of 3 bivariate normals, and y determined from z.
- compare inference from the binary regression model with data (y, x) to that from the model which views (z, x) as data
- a practical prior specification approach, appropriate when little is known about the problem, is applied here
- to specify priors on ψ, consider only one mixture component and use an approximate center and range of the data, as well as prior simulation to induce an approximate unif(−1, 1) prior on corr(z, x)
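A data set of this kind can be generated as in the sketch below. The slides do not report the mixture parameters used in the talk, so the weights, means, standard deviations, and correlations here are hypothetical.

```python
import random
from math import sqrt

# Hypothetical components: (weight, mu_z, mu_x, sd_z, sd_x, corr).
COMPONENTS = [
    (0.4, -1.0, -1.0, 1.0, 0.8, 0.6),
    (0.3, 1.0, 1.0, 1.0, 0.7, -0.5),
    (0.3, 0.5, 3.0, 1.0, 0.6, 0.3),
]


def simulate(n, rng):
    """Draw n triples (z, x, y) from a 3-component bivariate normal mixture."""
    weights = [c[0] for c in COMPONENTS]
    data = []
    for _ in range(n):
        _, mu_z, mu_x, sd_z, sd_x, rho = rng.choices(COMPONENTS, weights=weights)[0]
        x = rng.gauss(mu_x, sd_x)
        # z | x within the chosen component is normal with these conditional moments.
        cond_mean = mu_z + rho * sd_z / sd_x * (x - mu_x)
        cond_sd = sd_z * sqrt(1.0 - rho ** 2)
        z = rng.gauss(cond_mean, cond_sd)
        y = 1 if z > 0 else 0  # binary response determined from z
        data.append((z, x, y))
    return data


rng = random.Random(4)
data = simulate(200, rng)
print(sum(y for _, _, y in data) / len(data))
```

The binary regression model would then be fit to the pairs (y, x) only, while the comparison model treats (z, x) as observed.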

[Figure: two panels, Pr(z>0|x;G) (left) and Pr(y=1|x;G) (right), each plotted against x.]

The inference for pr(z > 0|x; G) (left) is compared to that for pr(y = 1|x; G) (right) and the truth (solid line).

[Figure: two rows of three panels showing f(z|x=x1), f(z|x=x2), and f(z|x=x3), each plotted against z.]

Top row: Inference for f(z|x0; G) under the model which views z as observed, with true densities as dashed lines, at 3 values of x0. Bottom: Inference from the binary regression model.

Ozone and Wind Speed

- 111 daily measurements of wind speed (mph) and ozone concentration (parts per billion) in NYC over a 4-month period
- objective: model the probability of exceeding a certain ozone concentration as a function of wind speed
- the model only sees whether or not there was an exceedance, but there is an actual ozone concentration underlying this 0/1 value

[Figure: two panels plotted against wind speed: probability of ozone exceedance (left) and ozone concentration (right).]

Left: The probability that ozone concentration (parts per billion) exceeds a threshold of 70 decreases with wind speed (mph). Right: For comparison, here are the actual non-discretized ozone measurements as a function of wind speed.

[Figure: four panels of f(z|x0) plotted against z.]

Estimates for f(z|x0; G) at wind speed values of 5, 8, 10, and 15 mph.

Credit Cards and Income

- n = 100 subjects in a study were asked whether or not they owned a travel credit card, and their income was recorded (Agresti, 1996)
- In this situation, it is not clear that there is some meaningful interpretation of the latent continuous random variables, but we can still use the method for regression
- Does probability of owning a credit card change with income?

[Figure: Pr(y=1|x;G) plotted against income in thousands, with the observed data points.]

Probability of owning a credit card appears to increase with income, with a slight dip or leveling off around income of 40-50, since no subjects in that region owned a credit card.

Extensions to Ordinal Responses

- similar methodology, wider range of applications
- for an ordinal response with C categories, assume y = j iff $\gamma_{j-1} < z \le \gamma_j$, for j = 1, ..., C, and apply the same DP mixture of MVNs for (z, x)
- for fixed cut-off points γ, it can be shown that all of µ and Σ are identifiable in the induced kernel for the observables
- the C − 1 free cut-off points can be fixed to arbitrary increasing values (Kottas et al., 2005), which is an advantage computationally
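The ordinal mapping can be sketched with a sorted list of the C − 1 interior cut-offs, taking the outer cut-offs to be −∞ and +∞ (an assumption, since the slides do not state them). The function names and the cut-off values below are hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist


def ordinal_category(z, cutoffs):
    """Return j in 1..C such that gamma_{j-1} < z <= gamma_j.

    cutoffs holds the interior values gamma_1 < ... < gamma_{C-1}.
    """
    return bisect_left(cutoffs, z) + 1


def category_probs(mean, sd, cutoffs):
    """pr(y = j) for z ~ N(mean, sd^2), as differences of normal CDF values."""
    dist = NormalDist(mean, sd)
    cdf_vals = [0.0] + [dist.cdf(g) for g in cutoffs] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_vals, cdf_vals[1:])]


cutoffs = [-1.0, 1.0]  # C = 3 categories, cut-offs fixed at arbitrary increasing values
labels = [ordinal_category(z, cutoffs) for z in (-2.0, -1.0, 0.0, 1.0, 2.0)]
probs = category_probs(0.2, 1.0, cutoffs)
print(labels, probs)
```

With a single cut-off at 0, the same functions reduce to the binary case, where the second category corresponds to z > 0.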

Other Extensions

- multivariate ordinal responses: J ordinal responses associated with a vector of covariates for each subject, with $C_j$ categories associated with the jth response
- several applications, but limited existing methods for flexible inference
- y and z are vectors, and $y_j = l$ iff $\gamma_{j,l-1} < z_j \le \gamma_{j,l}$, for $j = 1, \ldots, J$ and $l = 1, \ldots, C_j$
- if $C_j > 2$ for all j, then no identifiability restrictions are needed
- if $C_j = 2$ for some j, then the (β, Δ) parameterization can be used, and fixing certain elements of δ provides the necessary restrictions
- mixed ordinal-continuous responses

Conclusions

- Binary responses measured along with covariates represent a simple setting, but the scope of problems that fall in this category is large.
- This framework allows flexible, nonparametric inference to be obtained for the regression relationship in a general binary regression problem.
- The methodology extends easily to larger classes of problems in ordinal regression, including multivariate responses and mixed responses, making the framework much more powerful, with utility in a wide variety of applications.
