A Prediction Interval for the Misclassification Rate
E.B. Laber & S.A. Murphy

Outline

– Review
– Three challenges in constructing PIs
– Combining a statistical approach with a learning theory approach to constructing PIs
– Relevance to confidence measures for the value of a dynamic treatment regime

Review

– X is the vector of features in R^q; Y is the binary label in {-1, 1}
– Misclassification rate: τ(f) = P( f(X) ≠ Y )
– Data: N iid observations of (Y, X)
– Given a space of classifiers, F, and the data, use some method to construct a classifier, f̂
– The goal is to provide a PI for τ(f̂)

Review

– Since the 0-1 loss is not smooth, one commonly uses a smooth surrogate loss to estimate the classifier
– Surrogate loss: L(Y, f(X))

Review

General approach to providing a PI:
– Estimate τ(f̂) using the data, resulting in an estimator τ̂
– Derive an approximate distribution for τ̂ − τ(f̂)
– Use this approximate distribution to construct a prediction interval for τ(f̂)

Review

A common choice for τ̂ is the resubstitution error or training error,

    τ̂(f) = (1/N) Σ_{i=1}^{N} 1{ f(X_i) ≠ Y_i },

evaluated at f = f̂; e.g. if f̂ minimizes the empirical surrogate risk (1/N) Σ_i L(Y_i, f(X_i)) over F, then the estimator is τ̂(f̂).
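To make these objects concrete, here is a minimal sketch in Python (the data-generating model and all names are illustrative, not from the talk): it fits a linear classifier by minimizing the least-squares surrogate and then computes its resubstitution error.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative data: N iid draws of (Y, X) with X in R^2
    N = 100
    X = rng.normal(size=(N, 2))
    Y = np.sign(X[:, 0] + 0.5 * rng.normal(size=N))

    # Fit f(x) = sign(b0 + x'b) via the least-squares surrogate (Y - b0 - x'b)^2
    Xd = np.column_stack([np.ones(N), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, Y, rcond=None)  # argmin sum_i (Y_i - x_i'beta)^2

    # Resubstitution (training) error: fraction of sign disagreements
    tau_hat = np.mean(np.sign(Xd @ beta) != Y)
    print(f"resubstitution error: {tau_hat:.3f}")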

Three challenges

1) F is too large, leading to over-fitting and τ̂(f̂) − τ(f̂) < 0 (negative bias)

2) τ̂ is a non-smooth function of f.

3) τ(f̂) may behave like an extreme quantity.

No assumption that f̂ is close to optimal.

A Challenge

2) τ̂ is non-smooth.

Example: The unknown optimal classifier has a quadratic decision boundary. We fit, by least squares, a linear decision boundary

    f(x) = sign(β0 + β1 x)

[Figure: density of τ̂(f̂) under the three-point distribution, for n = 30 and n = 100.]
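The jittery, non-normal shape of these densities is easy to reproduce in simulation. The sketch below uses a hypothetical three-point distribution (the support, probabilities, and noise level are my own choices, not the talk's): many datasets are drawn, a line is fit to each by least squares, and the training error τ̂(f̂) is recorded; its sampling distribution piles up on a few atoms rather than looking normal.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical three-point feature distribution (illustrative only)
    xs    = np.array([-1.0, 0.0, 1.0])
    probs = np.array([0.25, 0.25, 0.50])

    def tau_hat_one_dataset(n):
        x = rng.choice(xs, size=n, p=probs)
        best = np.where(np.abs(x) > 0.5, 1.0, -1.0)      # optimal rule is nonlinear
        y = best * np.where(rng.random(n) < 0.9, 1, -1)  # 10% label noise
        Xd = np.column_stack([np.ones(n), x])
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # least-squares fit
        return np.mean(np.sign(Xd @ beta) != y)          # training error

    taus = np.array([tau_hat_one_dataset(30) for _ in range(2000)])
    vals, counts = np.unique(np.round(taus, 3), return_counts=True)
    for v, c in zip(vals, counts):                        # distribution of tau_hat
        print(v, c)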

[Figure: bias of common estimators of τ(f̂) on the three-point example.]

[Figure: coverage of the bootstrap PI in the three-point example (goal = 95%).]

[Figure: coverage of the correctly centered bootstrap PI (goal = 95%).]

Coverage of 95% PI (Three Point Example):

Sample Size   Bootstrap Percentile   Yang CV   CUD-Bound
30            .79                    .75       .97
50            .79                    .62       .97
100           .78                    .46       .96
200           .78                    .35       .96

PIs from Learning Theory

Given a result of the form: for all N,

    P( sup_{f in F} | τ̂(f) − τ(f) | ≥ ε_{N,δ} ) ≤ δ,

where f̂ is known to belong to F, the interval

    [ τ̂(f̂) − ε_{N,δ}, τ̂(f̂) + ε_{N,δ} ]

forms a conservative 1 − δ PI.

Combine statistical ideas with learning theory ideas

Construct a prediction interval for

    sup_{f in F_N} τ(f),

where F_N is chosen to be small yet contain f̂:

--- from this PI, deduce a conservative PI for τ(f̂)
--- use the surrogate loss to perform estimation and to construct F_N

Construct a prediction interval for sup_{f in F_N} τ(f):

--- F_N should contain all f that are close to f̃
--- that is, all f whose empirical surrogate risk is within a small tolerance of the minimum (a brute-force illustration follows below)
--- f̃ is the “limiting value” of f̂
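To visualize this construction, here is a brute-force grid-search sketch (the tolerance, grid, and data model are hypothetical; the talk instead characterizes F_N analytically): collect every β whose empirical surrogate risk is within a tolerance of the minimum, then look at the spread of the training error over that set.

    import numpy as np

    rng = np.random.default_rng(6)
    N = 60
    x = rng.normal(size=N)
    y = np.where(rng.random(N) < 0.85, np.sign(x), -np.sign(x))
    Xd = np.column_stack([np.ones(N), x])

    def surrogate_risk(beta):                # least-squares surrogate risk
        return np.mean((y - Xd @ beta) ** 2)

    def tau_hat(beta):                       # training error of sign(x'beta)
        return np.mean(np.sign(Xd @ beta) != y)

    # Brute-force grid over beta = (b0, b1) (illustration only)
    grid = np.stack(np.meshgrid(np.linspace(-2, 2, 81),
                                np.linspace(-2, 2, 81)), axis=-1).reshape(-1, 2)
    risks = np.array([surrogate_risk(b) for b in grid])
    eps = 0.05                               # hypothetical tolerance defining F_N
    F_N = grid[risks <= risks.min() + eps]

    taus = np.array([tau_hat(b) for b in F_N])
    print(f"|F_N| = {len(F_N)}; tau_hat ranges over [{taus.min():.3f}, {taus.max():.3f}]")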

Prediction Interval

Construct a prediction interval for sup_{f in F_N} τ(f) by approximating the distribution of the deviation of the training error from the true misclassification rate, uniformly over F_N.

Bootstrap

We use the bootstrap to obtain an estimate of an upper percentile of the distribution of this uniform deviation, yielding b_U; similarly for b_L. The PI is then

    [ τ̂(f̂) − b_U/√N , τ̂(f̂) − b_L/√N ].
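For orientation, here is what the percentile-of-a-root bootstrap step looks like in code. This sketch uses the simple, non-uniform root √N(τ̂*(f̂*) − τ̂(f̂)), i.e. the naive bootstrap whose poor coverage the earlier tables document, rather than the talk's constrained uniform deviation, which requires the optimization machinery described next.

    import numpy as np

    rng = np.random.default_rng(2)

    def fit_beta(Xd, y):
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        return beta

    def train_error(Xd, y, beta):
        return np.mean(np.sign(Xd @ beta) != y)

    # Illustrative data
    N = 100
    x = rng.normal(size=N)
    y = np.where(rng.random(N) < 0.8, np.sign(x), -np.sign(x))
    Xd = np.column_stack([np.ones(N), x])

    beta_hat = fit_beta(Xd, y)
    tau_hat = train_error(Xd, y, beta_hat)

    # Bootstrap the root sqrt(N) * (tau_hat* - tau_hat); its percentiles
    # play the roles of b_L and b_U in the interval above.
    B = 1000
    roots = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, N, size=N)
        beta_star = fit_beta(Xd[idx], y[idx])
        roots[b] = np.sqrt(N) * (train_error(Xd[idx], y[idx], beta_star) - tau_hat)

    b_L, b_U = np.percentile(roots, [2.5, 97.5])
    lo, hi = tau_hat - b_U / np.sqrt(N), tau_hat - b_L / np.sqrt(N)
    print(f"naive bootstrap 95% PI for tau(f_hat): ({lo:.3f}, {hi:.3f})")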

Implementation

• Approximation space for the classifier is linear: F = { sign(β0 + x'β) : (β0, β) in R^{q+1} }
• Surrogate loss is least squares: L(Y, f(X)) = (Y − β0 − X'β)^2

Implementation

With these choices, the uniform deviation over F_N becomes an explicit optimization over the coefficients β, constrained by the empirical squared-error risk.

Implementation

• Bootstrap version: the same optimization with empirical quantities replaced by their bootstrap analogues
• E* denotes the expectation with respect to the bootstrap distribution

[Figure: CUD-bound level sets for the three-point distribution (n = 30).]

Computational Issues

• Partition R^q into equivalence classes defined by the 2N possible values of the first term.
• Each equivalence class can be written as a set of β satisfying linear constraints.
• The first term is constant on each equivalence class.

Computational Issues

Within an equivalence class, the remaining optimization can be written as a quadratic program in β: since g is non-decreasing, maximizing g(·) is equivalent to maximizing its argument.

Computational Issues

• This reduces the problem to the computation of at most 2N mixed integer quadratic programming problems.
• Using commercial solvers (e.g. CPLEX), the CUD bound can be computed for moderately sized data sets in a few minutes on a standard desktop (2.8 GHz processor, 2 GB RAM).
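The equivalence-class trick is easiest to see in one dimension. In the sketch below (my own illustration, not the talk's MIQP formulation), classifiers sign(x − t) are partitioned by where the threshold t falls among the sorted data points; every empirical 0-1 quantity is constant on each cell, so an optimum over all t is found by checking one representative t per cell.

    import numpy as np

    rng = np.random.default_rng(3)
    N = 30
    x = rng.normal(size=N)
    y = np.where(rng.random(N) < 0.85, np.sign(x - 0.3), -np.sign(x - 0.3))

    # One representative threshold per equivalence class: the training error
    # of sign(x - t) is constant for t between consecutive sorted data points.
    xs = np.sort(x)
    reps = np.concatenate([[xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1.0]])

    errors = np.array([np.mean(np.sign(x - t) != y) for t in reps])
    print(f"{len(reps)} equivalence classes; min training error "
          f"{errors.min():.3f} at t = {reps[errors.argmin()]:.3f}")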

Comparisons, 95% PI

Data      CUD   BS    M     Y
Magic     1.0   .92   .98   .99
Mamm.     1.0   .68   .43   .98
Ion.      1.0   .61   .76   .99
Donut     1.0   .88   .63   .94
3-Pt      .97   .83   .90   .75
Balance   .95   .91   .61   .99
Liver     1.0   .96   1.0   1.0

Sample size = 30 (1000 data sets)

Comparisons, Length of PI

Data      CUD   BS    M     Y
Magic     .60   .31   .28   .46
Mamm.     .46   .53   .32   .42
Ion.      .42   .43   .30   .50
Donut     .47   .59   .32   .41
3-Pt      .38   .48   .32   .46
Balance   .38   .09   .29   .48
Liver     .62   .37   .33   .49

Sample size = 30 (1000 data sets)

Intuition

In large samples, the deviation τ̂(f̂) − τ(f̂) behaves like the corresponding deviation evaluated at the limiting classifier f̃.

Intuition

The large sample distribution is the same as the distribution of a sum of two terms: a mean-zero normal contributed by points off the decision boundary of f̃, and a non-smooth function of Z contributed by points on the boundary, where Z is the limiting normal distribution of √N(β̂ − β̃).

Intuition

If no data mass falls on the decision boundary of f̃, then the distribution is approximately that of a mean-zero normal (limiting distribution for binomial, as expected).
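The binomial-to-normal step is standard: for a fixed classifier f, N τ̂(f) ~ Binomial(N, τ(f)), so √N(τ̂(f) − τ(f)) is approximately N(0, τ(f)(1 − τ(f))). A quick numerical check (with illustrative values):

    import numpy as np

    rng = np.random.default_rng(4)
    N, tau = 200, 0.3    # fixed classifier with true error rate tau
    draws = (rng.binomial(N, tau, size=50_000) / N - tau) * np.sqrt(N)
    print(draws.std(), np.sqrt(tau * (1 - tau)))   # nearly equal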

Intuition

If some data mass does fall on the decision boundary of f̃, the distribution is approximately that of the normal term plus a non-smooth (maximum-type) function of Z, where Z is again the limiting distribution of √N(β̂ − β̃); this second term produces the non-normal behavior seen earlier.

Discussion

• Further reduce the conservatism of the CUD bound.
  – Replace the constrained set by other quantities.
  – Other surrogates (exponential, logit, hinge).
• Construct a principled method for minimizing the length of the conservative PI?
• The real goal is to produce PIs for the Value of a policy.

The simplest dynamic treatment regime (i.e., policy) is a single decision rule, used when there is only one stage of treatment.

For each individual at the one stage:
– O: the observation available at that stage
– A: the action at that stage (usually a treatment)
– Y: the primary outcome

Goal:

Construct decision rules that input patient information and output a recommended action; these decision rules should lead to a maximal mean Y.

In the future, one selects the action A = d(O).

Single Stage

• Find a confidence interval for the mean outcome if a particular estimated policy (here, one decision rule) is employed.
• Treatment A is randomized in {-1, 1}.
• Suppose the decision rule is of the form d(o) = sign(o'β).
• We do not assume the optimal decision boundary is linear.

Single Stage

The mean outcome following this policy is

    V(d) = E[ Y · 1{A = d(O)} / p(A | O) ],

where p(a | o) is the randomization probability.
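A minimal sketch of the corresponding sample estimator (inverse probability weighting with a known randomization probability of 1/2; the data model and decision rule are illustrative, not from the talk):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 500
    O = rng.normal(size=n)                    # patient observation
    A = rng.choice([-1, 1], size=n)           # randomized treatment, p = 1/2
    Y = 1.0 + 0.5 * A * np.tanh(O) + rng.normal(scale=0.5, size=n)

    def value_ipw(d, O, A, Y, p=0.5):
        # V_hat(d) = sample mean of Y * 1{A = d(O)} / p(A|O)
        return np.mean(Y * (A == d(O)) / p)

    d = lambda o: np.sign(o)                  # a simple linear decision rule
    print(f"estimated value of d: {value_ipw(d, O, A, Y):.3f}")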

STAR*D ("Sequenced Treatment Alternatives to Relieve Depression")

[Diagram: after initial citalopram (CIT), patients in remission move to follow-up. Non-remitters who prefer augmentation are randomized to CIT + BUS or CIT + BUP-SR, then, if needed, to augmentation of the level-2 treatment with THY or LI; those who prefer a switch are randomized to BUP-SR, VEN, or SER, then, if needed, to a switch to MIRT or NTP.]

Oslin ExTENd

[Diagram: random assignment to an early or a late (8-week) trigger for nonresponse to naltrexone. Within each arm, responders are randomized to TDM + naltrexone or naltrexone alone, and nonresponders are randomized to CBI or CBI + naltrexone.]

This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/ColumbiaBioStat02.09.ppt

Email Eric or me with questions or if you would like a copy of the associated paper:
laber@umich.edu or samurphy@umich.edu

[Backup figure: bias of common estimators of τ(f̂) on the three-point example.]

Intuition

Consider the large sample variance of τ̂(f). For a fixed f, the variance is τ(f)(1 − τ(f))/N. If in place of a fixed f we put f̂ = sign(x'β̂), for which some margins Y X'β̂ are close to 0, then, due to the non-smoothness of the indicator 1{Y X'β ≤ 0} in β at such points, we can get jittering.
