Econometrics in Health Economics
Discrete Choice Modeling and
Frontier Modeling and Efficiency Estimation
Professor William Greene, Stern School of Business
New York University, September 2-4, 2007
Frontier and Efficiency Estimation
Session 5: Efficiency Analysis; Stochastic Frontier Model; Efficiency Estimation
Session 6: Panel Data Models and Heterogeneity; Fixed and Random Effects; Bayesian and Classical Estimation
Session 7: Efficiency Models (Stochastic Frontier and Data Envelopment Analysis); Student Presentation: Silvio Daidone and Francesco D’Amico
Session 8: Computer Exercises and Applications
The Production Function
“A single output technology is commonly described by means of a production function f(z) that gives the maximum amount q of output that can be produced using input amounts (z1,…,zL-1) > 0.”
“Microeconomic Theory,” Mas-Colell, Whinston, Green: Oxford, 1995, p. 129. See also Samuelson (1938) and Shephard (1953).
Thoughts on Inefficiency
Failure to achieve the theoretical maximum:
- Hicks (ca. 1935) on the benefits of monopoly
- Leibenstein (ca. 1966): X-inefficiency
- Debreu, Farrell (1950s) on management inefficiency
All related to firm behavior in the absence of market restraint: the exercise of market power.
A History of Empirical Investigation
- Cobb-Douglas (1927)
- Arrow, Chenery, Minhas, Solow (1963)
- Joel Dean (1940s, 1950s)
- Johnston (1950s)
- Nerlove (1960)
- Christensen et al. (1972)
Inefficiency in the “Real” World
Measurement of inefficiency in “markets”: heterogeneous production outcomes
- Aigner and Chu (1968)
- Timmer (1971)
- Aigner, Lovell, Schmidt (1977)
- Meeusen, van den Broeck (1977)
Production Functions
Production is a process of transformation of a set of K inputs, denoted x, into a set of M outputs, y.
Transformation of inputs to outputs is via the transformation function: T(y,x) = 0.
Defining the Production Set
Level set: $L(y) = \{x : (y, x) \text{ is producible}\}$
The production function is defined by the isoquant: $I(y) = \{x : x \in L(y) \text{ and } \lambda x \notin L(y) \text{ if } 0 \le \lambda < 1\}$
The efficient subset is defined in terms of the level sets: $ES(y) = \{x : x \in L(y) \text{ and } x' \notin L(y) \text{ for } x' \text{ when } x'_k \le x_k \ \forall k \text{ and } x'_j < x_j \text{ for some } j\}$
Isoquants and Level Sets
The Distance Function
Inefficiency
Production Function Model with Inefficiency
Cost Inefficiency
y* = f(x), C* = g(y*, w) (Samuelson-Shephard duality results)
Cost inefficiency: if y < f(x), then C must be greater than g(y, w). This implies the idea of a cost frontier:
ln C = ln g(y, w) + u, u > 0.
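Specification
Cobb-Douglas: $\ln y = \alpha + \sum_{k=1}^{K} \beta_k \ln x_k$
Translog: $\ln y = \alpha + \sum_{k=1}^{K} \beta_k \ln x_k + \frac{1}{2}\sum_{k=1}^{K}\sum_{m=1}^{K} \gamma_{km} \ln x_k \ln x_m$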
Box-Cox transformations to cope with zeros
Regularity Conditions: Monotonicity and Concavity
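Translog Cost Model
With K input prices w and L outputs y:
$\ln C = \alpha + \sum_{k=1}^{K} \beta_k \ln w_k + \frac{1}{2}\sum_{k=1}^{K}\sum_{m=1}^{K} \gamma_{km} \ln w_k \ln w_m + \sum_{s=1}^{L} \delta_s \ln y_s + \frac{1}{2}\sum_{s=1}^{L}\sum_{t=1}^{L} \phi_{st} \ln y_s \ln y_t + \sum_{k=1}^{K}\sum_{s=1}^{L} \theta_{ks} \ln w_k \ln y_s$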
Corrected Ordinary Least Squares
Modified OLS
An alternative approach that requires a parametric model of the distribution of u_i is modified OLS (MOLS). The OLS residuals, save for the constant displacement, are pointwise consistent estimates of their population counterparts, -u_i. Suppose that u_i has an exponential distribution with mean λ. Then the variance of u_i is λ², so the standard deviation of the OLS residuals is a consistent estimator of E[u_i] = λ. Since this is a one-parameter distribution, the entire model for u_i can be characterized by this parameter and functions of it. The estimated frontier function can now be displaced upward by this estimate of E[u_i].
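A minimal sketch of COLS and MOLS in Python with NumPy, assuming a log-linear deterministic frontier and exponential u_i. The function name `cols_mols` and the simulated values (intercept 1.0, slope 0.5, λ = 0.3) are hypothetical, chosen only for illustration. COLS shifts the OLS intercept up by the largest residual so that every observation lies on or below the frontier; MOLS shifts it by the estimated E[u_i], here the residual standard deviation, as described above.

```python
import numpy as np

def cols_mols(y, X):
    """Corrected (COLS) and modified (MOLS) OLS for y = a + X b - u, u >= 0.

    MOLS assumes u_i is exponential with mean lam, so the standard deviation
    of the OLS residuals estimates E[u_i] = lam.
    Returns (slopes, COLS intercept, MOLS intercept).
    """
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ coef                  # OLS residuals (mean zero)
    a_cols = coef[0] + e.max()        # COLS: the largest residual becomes zero
    a_mols = coef[0] + e.std()        # MOLS: shift up by the estimate of E[u_i]
    return coef[1:], a_cols, a_mols

# Simulated deterministic frontier (hypothetical values)
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 1))
u = rng.exponential(scale=0.3, size=n)
y = 1.0 + 0.5 * X[:, 0] - u
b_hat, a_cols, a_mols = cols_mols(y, X)
print(b_hat, a_cols, a_mols)
```

With no noise term, both shifted intercepts should land near the true value; once a two-sided v_i is added, COLS becomes hostage to the single largest residual, which is one of the motivations for the stochastic frontier.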
COLS and MOLS
Deterministic Frontier: Programming Estimators
Estimating Inefficiency
Statistical Problems with Programming Estimators
- They do correspond to MLEs.
- The likelihood functions are “irregular.”
- There are no known statistical properties: no estimable covariance matrix for the estimates.
- They might be “robust,” like LAD. No one knows for sure; this has never been demonstrated.
A Model with a Statistical Basis
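Gamma Frontier Model (Greene (1980))
$y_i = \alpha + \sum_{k=1}^{K} \beta_k x_{ki} + \varepsilon_i = \alpha + \beta' x_i - u_i$ (y and x in logs)
$h(u) = \frac{\theta^P}{\Gamma(P)} u^{P-1} e^{-\theta u}, \quad u \ge 0, \ \theta > 0, \ P > 2$
$\ln L(\alpha, \beta, P, \theta) = NP\ln\theta - N\ln\Gamma(P) + (P-1)\sum_{i=1}^{N}\ln u_i - \theta\sum_{i=1}^{N} u_i$
$u_i = \alpha + \beta' x_i - y_i > 0$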
Virtues: known statistical properties, regular likelihood, etc.
Flaws: completely unwieldy, impractical. (Nonetheless, it was used in several empirical studies.)
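A small sketch of the gamma frontier log-likelihood above, in Python with SciPy. The function name and the convention that y and X already hold logs are assumptions for illustration. The function returns minus infinity whenever some implied u_i = α + β'x_i - y_i is not positive: the support of the data depends on the parameters, which is why the slide restricts P > 2 for the likelihood to behave regularly.

```python
import numpy as np
from scipy.special import gammaln

def gamma_frontier_loglik(alpha, beta, P, theta, y, X):
    """ln L for the deterministic gamma frontier y_i = alpha + beta'x_i - u_i.

    u_i ~ Gamma(P, theta): h(u) = theta^P / Gamma(P) * u^(P-1) * exp(-theta*u).
    y and the columns of X are logs. Returns -inf if any implied u_i <= 0,
    because the data are then outside the support of the model.
    """
    u = alpha + X @ beta - y
    if np.any(u <= 0):
        return -np.inf
    N = len(y)
    return (N * P * np.log(theta) - N * gammaln(P)
            + (P - 1.0) * np.sum(np.log(u)) - theta * np.sum(u))

# Tiny hypothetical data set
X = np.array([[0.1], [0.5], [0.9]])
y = np.array([0.2, 0.4, 0.5])
print(gamma_frontier_loglik(1.0, np.array([0.5]), 3.0, 2.0, y, X))
```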
Extensions
- Cost frontiers, based on duality results: ln y = f(x) - u and ln C = g(y, w) + u', with u > 0, u' > 0.
- Economies of scale and allocative inefficiency blur the relationship.
- Corrected and modified least squares estimators based on the deterministic frontiers are easily constructed.
Data Envelopment Analysis
Methodological Problems
- Measurement error
- Outliers
- Specification errors
These make up the overall problem with the deterministic frontier approach.
Stochastic Frontier Models
Motivation:
- Factors not under control of the firm
- Measurement error
- Differential rates of adoption of technology
The frontier is randomly placed by the whole collection of stochastic elements that might enter the model outside the control of the firm.
Aigner, Lovell, Schmidt (1977), Meeusen, van den Broeck (1977)
Stochastic Frontier Model
$y_i = f(x_i)\,TE_i\,e^{v_i}$, with $TE_i = e^{-u_i}$
$\ln y_i = \alpha + \beta' x_i + v_i - u_i = \alpha + \beta' x_i + \varepsilon_i$
u_i > 0, but v_i may take any value. A symmetric distribution, such as the normal distribution, is usually assumed for v_i. Thus, the stochastic frontier is α + β'x_i + v_i and, as before, u_i represents the inefficiency.
Least Squares Estimation
Average inefficiency is embodied in the third moment of the disturbance ε_i = v_i - u_i.
So long as E[v_i - u_i] is constant, the OLS estimates of the slope parameters of the frontier function are unbiased and consistent. (The constant term estimates α - E[u_i].) The average inefficiency present in the distribution is reflected in the asymmetry of the distribution, which can be estimated using the OLS residuals:
$m_3 = \frac{1}{N}\sum_{i=1}^{N}\left(\hat\varepsilon_i - \hat{E}[\hat\varepsilon_i]\right)^3$
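The third-moment check takes only a few lines of NumPy. The simulation below uses hypothetical values (half-normal u_i with σ_u = 0.4, normal v_i with σ_v = 0.2) and is only meant to show that m_3 comes out negative for a production frontier. The comparison value -0.21801σ_u³ is the normal-half normal moment used on the method-of-moments slide later in this session.

```python
import numpy as np

def ols_residual_m3(y, X):
    """Third central moment m3 of the OLS residuals from regressing y on [1, X]."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ coef
    return np.mean((e - e.mean()) ** 3)

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 1))
v = rng.normal(scale=0.2, size=n)
u = np.abs(rng.normal(scale=0.4, size=n))   # half-normal inefficiency
y = 1.0 + 0.5 * X[:, 0] + v - u             # production frontier, eps = v - u
m3 = ols_residual_m3(y, X)
print(m3, -0.21801 * 0.4 ** 3)
```

For a cost frontier (ε = v + u) the expected sign flips, so the same diagnostic looks for positive skewness there.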
Application to Spanish Dairy Farms
Variable | Units | Mean | Std. Dev. | Minimum | Maximum
Milk | Milk production (liters) | 131,108 | 92,539 | 14,110 | 727,281
Cows | # of milking cows | 22.12 | 11.27 | 4.5 | 82.3
Labor | # man-equivalent units | 1.67 | 0.55 | 1.0 | 4.0
Land | Hectares of land devoted to pasture and crops | 12.99 | 6.17 | 2.0 | 45.1
Feed | Total amount of feedstuffs fed to dairy cows (tons) | 57,941 | 47,981 | 3,924.1 | 376,732
N = 247 farms, T = 6 years (1993-1998)
Example: Dairy Farms
[Figure: Kernel density estimate for EI; horizontal axis EI, vertical axis Density]
The Normal-Half Normal Model
Normal-Half Normal Variable
Decomposition
Standard Form
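Estimation: Least Squares/MoM
- The OLS estimator of β is consistent.
- E[u_i] = (2/π)^{1/2} σ_u, so the OLS constant estimates α - (2/π)^{1/2} σ_u.
- The second and third moments of the OLS residuals estimate
$m_2 = \sigma_v^2 + \left[\frac{\pi - 2}{\pi}\right]\sigma_u^2, \qquad m_3 = \left(\frac{2}{\pi}\right)^{1/2}\left[1 - \frac{4}{\pi}\right]\sigma_u^3 < 0$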
A Problem with Method of Moments
- The estimator of σ_u is [m_3 / (-0.21801)]^{1/3}.
- The theoretical m_3 is < 0.
- The sample m_3 may be > 0. If so, there is no admissible solution for σ_u (the cube root of a negative number would give σ_u < 0).
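A sketch of the method-of-moments estimator for the normal-half normal model in Python/NumPy, assuming e holds OLS residuals. It inverts the m_2 and m_3 expressions on the previous slide and returns None for the "wrong skew" case described above (and when the implied σ_v² is not positive, which can also happen in practice). The function name and simulated values are hypothetical.

```python
import numpy as np

C3 = np.sqrt(2.0 / np.pi) * (1.0 - 4.0 / np.pi)   # = -0.21801...

def mom_half_normal(e):
    """Method-of-moments (sigma_u, sigma_v) for the normal-half normal model.

    Uses m2 = sigma_v^2 + (1 - 2/pi) sigma_u^2 and m3 = C3 * sigma_u^3.
    Returns None when m3 >= 0 ('wrong' skew) or when the implied
    sigma_v^2 is not positive.
    """
    d = e - e.mean()
    m2, m3 = np.mean(d ** 2), np.mean(d ** 3)
    if m3 >= 0:
        return None                     # would need sigma_u <= 0
    sigma_u = (m3 / C3) ** (1.0 / 3.0)
    sigma_v2 = m2 - (1.0 - 2.0 / np.pi) * sigma_u ** 2
    if sigma_v2 <= 0:
        return None
    return sigma_u, np.sqrt(sigma_v2)

rng = np.random.default_rng(2)
n = 20000
e = rng.normal(scale=0.2, size=n) - np.abs(rng.normal(scale=0.4, size=n))
print(mom_half_normal(e))
print(mom_half_normal(-e))      # residuals with positive skew
```

The corrected constant then follows as the OLS intercept plus (2/π)^{1/2} times the estimated σ_u.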
Likelihood Function
Waldman (1982) result on skewness of OLS residuals: if the OLS residuals are positively skewed, rather than negatively skewed, then OLS (with σ_u = 0) maximizes the log likelihood, and there is no evidence of inefficiency in the data.
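A sketch of the normal-half normal (Aigner, Lovell, Schmidt) log-likelihood in Python with SciPy, written in the common (σ, λ) parameterization with σ² = σ_u² + σ_v² and λ = σ_u/σ_v. The function names are hypothetical. Setting λ = 0 collapses it to the normal linear regression likelihood, which is the OLS point that Waldman's result refers to.

```python
import numpy as np
from scipy.stats import norm

def logdens_half_normal(eps, sigma, lam):
    """ln f(eps_i) for eps = v - u, v ~ N[0, sigma_v^2], u half normal,
    sigma^2 = sigma_u^2 + sigma_v^2, lam = sigma_u / sigma_v."""
    return np.log(2.0 / sigma) + norm.logpdf(eps / sigma) + norm.logcdf(-eps * lam / sigma)

def loglik_half_normal(eps, sigma, lam):
    return np.sum(logdens_half_normal(eps, sigma, lam))

eps = np.array([-0.7, -0.1, 0.3])       # hypothetical residuals
# lam = 0 (sigma_u = 0): identical to the normal (OLS) log-likelihood
print(loglik_half_normal(eps, 0.5, 0.0), np.sum(norm.logpdf(eps, scale=0.5)))
```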
Alternative Model: Exponential
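Normal-Exponential Likelihood
$\ln L(\alpha, \beta, \sigma_v, \sigma_u; \text{data}) = \sum_{i=1}^{N}\left[-\ln\sigma_u + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \frac{y_i - \alpha - \beta' x_i}{\sigma_u} + \ln\Phi\left(\frac{-(y_i - \alpha - \beta' x_i)}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right)\right]$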
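The normal-exponential log-likelihood translates directly into code. A sketch in Python with SciPy, with hypothetical function names; ε_i stands for y_i - α - β'x_i, and u_i is exponential with mean σ_u. Working with Φ on the log scale (`norm.logcdf`) keeps the last term finite for large positive residuals.

```python
import numpy as np
from scipy.stats import norm

def logdens_normal_exponential(eps, sigma_v, sigma_u):
    """ln f(eps_i), eps_i = y_i - alpha - beta'x_i = v_i - u_i,
    v_i ~ N[0, sigma_v^2], u_i exponential with mean sigma_u."""
    return (-np.log(sigma_u) + 0.5 * (sigma_v / sigma_u) ** 2 + eps / sigma_u
            + norm.logcdf(-eps / sigma_v - sigma_v / sigma_u))

def loglik_normal_exponential(eps, sigma_v, sigma_u):
    return np.sum(logdens_normal_exponential(eps, sigma_v, sigma_u))

eps = np.array([-0.6, -0.1, 0.2])       # hypothetical residuals
print(loglik_normal_exponential(eps, 0.3, 0.5))
```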
Truncated Normal Model
Normal-Truncated Normal
Other Models
- Other parametric models (we will examine gamma later in the course)
- Semiparametric and nonparametric: the recent outer reaches of the theoretical literature
- Other variations, including heterogeneity in the frontier function and in the distribution of inefficiency
Estimating ui
No direct estimate of ui
Data permit estimation of yi – β’xi. Can this be used? εi = yi – β’xi = vi – ui
Indirect estimate of ui, using E[ui|vi – ui]
vi – ui is estimable with ei = yi – b’xi.
Fundamental Tool - JLMS
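$E[u_{it} \mid \varepsilon_{it}] = \frac{\sigma\lambda}{1 + \lambda^2}\left[\frac{\phi(a_{it})}{1 - \Phi(a_{it})} - a_{it}\right], \qquad a_{it} = \frac{\varepsilon_{it}\lambda}{\sigma}$
where σ² = σ_u² + σ_v² and λ = σ_u/σ_v.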
We can insert our maximum likelihood estimates of all parameters.
Note: This estimates E[u|vi – ui], not ui.
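Once the MLEs are in hand, the JLMS formula is one vectorized line per observation. A sketch in Python with SciPy; the residuals below are hypothetical. The Mills ratio φ(a)/(1 - Φ(a)) is evaluated on the log scale to avoid a 0/0 for large a.

```python
import numpy as np
from scipy.stats import norm

def jlms_half_normal(eps, sigma, lam):
    """JLMS E[u_i | eps_i] for a normal-half normal production frontier,
    eps_i = v_i - u_i, sigma^2 = sigma_u^2 + sigma_v^2, lam = sigma_u / sigma_v."""
    a = eps * lam / sigma
    scale = sigma * lam / (1.0 + lam ** 2)           # equals sigma_u * sigma_v / sigma
    mills = np.exp(norm.logpdf(a) - norm.logsf(a))   # phi(a) / (1 - Phi(a))
    return scale * (mills - a)

e = np.array([-0.8, -0.2, 0.0, 0.4])    # hypothetical residuals
print(jlms_half_normal(e, sigma=0.5, lam=2.0))
```

In applied work e would be y_i - α̂ - β̂'x_i at the MLEs; the output is an estimate of E[u_i | ε_i], which shrinks toward the mean and is not a consistent estimate of u_i itself.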
Other Distributions
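For the Normal-Truncated Normal Model:
$E[u_i \mid \varepsilon_i] = \mu_i^* + \sigma_*\frac{\phi(\mu_i^*/\sigma_*)}{\Phi(\mu_i^*/\sigma_*)}, \qquad \mu_i^* = \frac{\mu\sigma_v^2 - \varepsilon_i\sigma_u^2}{\sigma^2}, \quad \sigma_* = \frac{\sigma_u\sigma_v}{\sigma}$
For the Normal-Exponential Model:
$E[u_{it} \mid \varepsilon_{it}] = z_{it} + \sigma_v\frac{\phi(z_{it}/\sigma_v)}{\Phi(z_{it}/\sigma_v)}, \qquad z_{it} = -\varepsilon_{it} - \sigma_v^2/\sigma_u$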
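A sketch of the normal-exponential conditional mean in Python with SciPy (function name hypothetical). It has the same structure as JLMS: u_i given ε_i is a normal with mean z_i = -ε_i - σ_v²/σ_u and standard deviation σ_v, truncated at zero, and the formula is the mean of that truncated normal.

```python
import numpy as np
from scipy.stats import norm

def cond_mean_u_exponential(eps, sigma_v, sigma_u):
    """E[u_i | eps_i] for the normal-exponential production frontier, eps = v - u."""
    z = -eps - sigma_v ** 2 / sigma_u
    r = z / sigma_v
    # phi(r) / Phi(r) on the log scale for numerical stability
    return z + sigma_v * np.exp(norm.logpdf(r) - norm.logcdf(r))

e = np.array([-0.5, 0.0, 0.3])          # hypothetical residuals
print(cond_mean_u_exponential(e, sigma_v=0.3, sigma_u=0.5))
```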
Efficiency
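$E[\exp(-u_i) \mid \varepsilon_i] = \frac{\Phi(\mu_i^*/\sigma_* - \sigma_*)}{\Phi(\mu_i^*/\sigma_*)}\exp\left(-\mu_i^* + \frac{1}{2}\sigma_*^2\right)$
where
$\mu_i^* = \frac{\mu\sigma_v^2 - \varepsilon_i\sigma_u^2}{\sigma^2}, \quad \sigma_*^2 = \frac{\sigma_u^2\sigma_v^2}{\sigma^2}, \quad \sigma^2 = \sigma_u^2 + \sigma_v^2$
for the Normal-Truncated Normal Model. For the normal-half normal model, μ = 0.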
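A sketch of the efficiency estimator E[exp(-u_i) | ε_i] in Python with SciPy, assuming a production frontier (ε = v - u) and U ~ N[μ, σ_u²]; μ = 0 gives the half-normal case. The function name and inputs are hypothetical. The ratio of normal CDFs is computed on the log scale.

```python
import numpy as np
from scipy.stats import norm

def bc_efficiency(eps, sigma_u, sigma_v, mu=0.0):
    """E[exp(-u_i) | eps_i] for the normal-truncated normal production frontier."""
    s2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = (mu * sigma_v ** 2 - eps * sigma_u ** 2) / s2
    sig_star = sigma_u * sigma_v / np.sqrt(s2)
    r = mu_star / sig_star
    return np.exp(norm.logcdf(r - sig_star) - norm.logcdf(r)
                  - mu_star + 0.5 * sig_star ** 2)

e = np.array([-0.6, 0.0, 0.4])          # hypothetical residuals
print(bc_efficiency(e, sigma_u=0.4, sigma_v=0.2, mu=0.1))
```

Because u_i > 0, every value lies strictly between 0 and 1; note that this is not the same as exp(-E[u_i | ε_i]).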
Application: Electricity Generation
Estimated Translog Production Frontiers
Inefficiency Estimates
Estimated Inefficiency Distribution
[Figure: Kernel density estimate for TRNCNRML; vertical axis Density]
Confidence Region
Application (Based on Costs): Horrace/Schmidt Confidence Bounds for Cost Efficiency
[Figure: Horrace/Schmidt confidence bounds by FIRM; series EFFN, EFFUPPER, EFFLOWER; vertical axis E[exp(-u)|e]]
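The Horrace/Schmidt bounds come from the quantiles of the truncated normal distribution of u_i given ε_i. A minimal sketch in Python with SciPy, assuming μ*_i and σ_* have already been computed; the function name and inputs are hypothetical. For a cost frontier (ε = v + u), μ*_i is computed with the sign of ε_i reversed, and the rest is unchanged.

```python
import numpy as np
from scipy.stats import norm

def horrace_schmidt_bounds(mu_star, sig_star, alpha=0.05):
    """(lower, upper) 1 - alpha bounds for exp(-u_i), given that
    u_i | eps_i ~ N[mu_star, sig_star^2] truncated at zero."""
    P = norm.cdf(mu_star / sig_star)
    u_lo = mu_star + sig_star * norm.ppf(1.0 - (1.0 - alpha / 2.0) * P)  # alpha/2 quantile of u
    u_hi = mu_star + sig_star * norm.ppf(1.0 - (alpha / 2.0) * P)        # 1 - alpha/2 quantile
    return np.exp(-u_hi), np.exp(-u_lo)

lo, hi = horrace_schmidt_bounds(np.array([0.05, 0.2]), 0.1)
print(lo, hi)
```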
Multiple Output Frontier
The formal theory of production departs from the transformation function that links the vector of outputs, y, to the vector of inputs, x:
T(y, x) = 0.
As it stands, some further assumptions are obviously needed to produce the framework for an empirical model. By assuming homothetic separability, the function may be written in the form
A(y) = f(x).
Multiple Output Production Function
$\ln\left[\left(\sum_{m=1}^{M}\left(\alpha_m y_{i,t,m}\right)^q\right)^{1/q}\right] = \beta' x_{it} + v_{it} - u_{it}$
Inefficiency in this setting reflects the failure of the firm to achieve the maximum aggregate output attainable. Note that the model does not address the economic question of whether the chosen output mix is optimal with respect to the output prices and input costs. That would require a profit function approach. Berger (1993) and Adams et al. (1999) apply the method to a panel of U.S. banks: 798 banks, ten years.
Duality Between Production and Cost
$C(y, w) = \min_x \{w' x : f(x) \ge y\}$
Implied Cost Frontier Function
Stochastic Cost Frontier
Cobb-Douglas Cost Frontier
Translog Cost Frontier
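Cost frontier with K variable inputs, one fixed input (F), and output, y:
$\ln C = \alpha + \sum_{k=1}^{K}\beta_k\ln w_k + \beta_F\ln F + \beta_y\ln y + \frac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{K}\gamma_{kl}\ln w_k\ln w_l + \frac{1}{2}\gamma_{FF}\ln^2 F + \frac{1}{2}\gamma_{yy}\ln^2 y + \sum_{k=1}^{K}\gamma_{kF}\ln w_k\ln F + \sum_{k=1}^{K}\gamma_{ky}\ln w_k\ln y + \gamma_{Fy}\ln F\ln y + v_i + u_i$
Cost functions are fit subject to the theoretical restriction of homogeneity in prices, $\sum_{k=1}^{K}\partial\ln C/\partial\ln w_k = 1$. This is imposed by dividing C and all but one of the input prices by the "last" (numeraire) price.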
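Once C and the prices are divided by the numeraire, linear homogeneity holds by construction, whatever the coefficient values. The sketch below (Python/NumPy; the dictionary keys and coefficient values are hypothetical) evaluates the deterministic part of the translog cost frontier above in normalized form and checks that multiplying all K prices by a common factor t raises ln C by exactly ln t.

```python
import numpy as np

def translog_cost_normalized(lny, lnF, lnw, p):
    """ln C from a translog cost frontier fit in normalized form (deterministic part).

    lnw holds log prices of the K variable inputs; the last one is the numeraire.
    Price terms enter as r_k = ln(w_k / w_K), the fitted value is ln(C / w_K),
    so ln C = fitted value + ln w_K.
    """
    r = lnw[:-1] - lnw[-1]
    ln_c_over_wK = (p["a"] + p["b"] @ r + p["bF"] * lnF + p["by"] * lny
                    + 0.5 * r @ p["G"] @ r
                    + 0.5 * p["gFF"] * lnF ** 2 + 0.5 * p["gyy"] * lny ** 2
                    + (p["gkF"] @ r) * lnF + (p["gky"] @ r) * lny
                    + p["gFy"] * lnF * lny)
    return ln_c_over_wK + lnw[-1]

# Hypothetical coefficients for K = 3 variable inputs (two normalized price ratios)
p = {"a": 0.2, "b": np.array([0.3, 0.4]), "bF": 0.1, "by": 0.9,
     "G": np.array([[0.05, -0.02], [-0.02, 0.04]]),
     "gFF": 0.01, "gyy": 0.03, "gkF": np.array([0.01, -0.01]),
     "gky": np.array([0.02, 0.01]), "gFy": -0.02}
lnw = np.log(np.array([1.5, 2.0, 3.0]))
base = translog_cost_normalized(np.log(10.0), np.log(4.0), lnw, p)
scaled = translog_cost_normalized(np.log(10.0), np.log(4.0), lnw + np.log(2.5), p)
print(scaled - base, np.log(2.5))
```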
Restricted Translog Cost Function
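$\ln\frac{C}{PF} = \alpha + \beta_K\ln\frac{PK}{PF} + \beta_L\ln\frac{PL}{PF} + \beta_y\ln y + \frac{1}{2}\beta_{yy}\ln^2 y + \frac{1}{2}\gamma_{KK}\ln^2\frac{PK}{PF} + \frac{1}{2}\gamma_{LL}\ln^2\frac{PL}{PF} + \gamma_{KL}\ln\frac{PK}{PF}\ln\frac{PL}{PF} + \theta_{yK}\ln y\ln\frac{PK}{PF} + \theta_{yL}\ln y\ln\frac{PL}{PF} + v + u$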
Cost Application to C&G Data
Estimates of Economic Efficiency
Duality – Production vs. Cost
Multiple Output Cost Frontier
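$\ln\frac{C}{w_5} = \alpha + \sum_{m=1}^{M}\gamma_m\ln y_m + \frac{1}{2}\sum_{m=1}^{M}\sum_{l=1}^{M}\gamma_{ml}\ln y_m\ln y_l + \sum_{k=1}^{4}\beta_k\ln\frac{w_k}{w_5} + \text{second order terms} + \ldots + v + u$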
Allocative Inefficiency and Economic Inefficiency
Technical inefficiency: Off the isoquant.
Allocative inefficiency: Wrong input mix.
Cost Structure – Demand System
Cost Function
Cost = f(output, input prices) = C(y, w)
Shephard's Lemma Produces Input Demands
$\frac{\partial C^*(y, w)}{\partial w_k} = x_k^*(y, w)$ = cost-minimizing demands
Cost Frontier Model
Stochastic cost frontier:
$\ln C(y, w) = g(\ln y, \ln w) + v + u$, u = cost inefficiency
Factor demands in the form of cost shares:
$s_k = \frac{\partial\ln C(y, w)}{\partial\ln w_k} = h_k(\ln y, \ln w) + e_k$, e_k = allocative inefficiency
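For a translog cost function, the share equations produced by Shephard's lemma are linear in the logs, which makes the h_k(ln y, ln w) above concrete; observed shares would equal these plus e_k. A sketch in Python/NumPy for a single output with no fixed input; the function names and coefficients are hypothetical, chosen to satisfy the homogeneity restrictions so that the shares sum to one.

```python
import numpy as np

def translog_lnC(lnw, lny, b, G, gy, by, gyy, a=0.0):
    """Translog cost function (deterministic part), single output:
    ln C = a + b'lnw + 0.5 lnw'G lnw + by lny + 0.5 gyy lny^2 + (gy'lnw) lny."""
    return (a + b @ lnw + 0.5 * lnw @ G @ lnw + by * lny
            + 0.5 * gyy * lny ** 2 + (gy @ lnw) * lny)

def translog_shares(lnw, lny, b, G, gy):
    """Cost shares s_k = d lnC / d ln w_k = b_k + sum_m G_km ln w_m + gy_k ln y
    (Shephard's lemma in share form; G symmetric)."""
    return b + G @ lnw + gy * lny

# Hypothetical coefficients satisfying homogeneity:
# sum(b) = 1, rows (and columns) of G sum to 0, sum(gy) = 0.
b = np.array([0.5, 0.3, 0.2])
G = np.array([[0.10, -0.06, -0.04],
              [-0.06, 0.08, -0.02],
              [-0.04, -0.02, 0.06]])
gy = np.array([0.02, -0.01, -0.01])
lnw, lny = np.log(np.array([1.2, 0.8, 2.5])), np.log(5.0)
s = translog_shares(lnw, lny, b, G, gy)
print(s, s.sum())
```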
The Greene Problem
- Factor shares are derived from the cost function by differentiation. Where does e_k come from?
- Any nonzero value of e_k, which can be positive or negative, must translate into higher costs. Thus, u must be a function of e_1,…,e_K that increases with the magnitude of each e_k, i.e., ∂u/∂|e_k| > 0.
- No one had derived a complete, internally consistent equation system: the Greene problem.
- Solution: Kumbhakar, in several recent papers. Very complicated, near to impractical. Apparently not of interest to practitioners.
Observable Heterogeneity (as opposed to unobservable heterogeneity)
- Observe: y or C (outcome) and x or w (inputs or input prices)
- Firm characteristics z: not production or cost inputs, but they characterize the production process.
- Enter the production or cost function? Enter the inefficiency distribution? How?
Shifting the Outcome Function
$\ln y_{it} = f(x_{it}, z_{it}) + g(x_{it}, z_{it}) + h(t) + v_{it} - u_{it}$
Firm-specific heterogeneity can also be incorporated into the inefficiency model as follows, which modifies the mean of the truncated normal distribution:
y_i = β'x_i + v_i - u_i
v_i ~ N[0, σ_v²]
u_i = |U_i|, where U_i ~ N[μ_i, σ_u²], μ_i = μ_0 + μ_1 z_i
Heterogeneous Mean
Estimated Efficiency
One Step or Two Step
- 2 Step: Fit the half or truncated normal model, compute JLMS û_i, regress û_i on z_i.
  Airline example: fit the model without POINTS, LOADFACTOR, STAGE.
- 1 Step: Include z_i in the model, compute û_i including z_i.
  Airline example: include the 3 variables.
- Methodological issue: left-out variables in the two-step approach.
WHO Health Care Study
Application: WHO Data
One vs. Two Step
Unobservable Heterogeneity
Parameters vary across firms:
- Random variation (heterogeneity, not Bayesian)
- Variation partially explained by observable indicators
- Continuous variation: random parameter models, considered with panel data models
- Latent class: discrete parameter variation
A Latent Class Model
Latent Class Application: Banking Costs
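Heteroscedasticity in v and/or u
$\mathrm{Var}[v_i \mid h_i] = \sigma_v^2\,g_v(h_i, \delta) = \sigma_{vi}^2, \quad g_v(h_i, 0) = 1, \quad g_v(h_i, \delta) = [\exp(\delta' h_i)]^2$
$\mathrm{Var}[U_i \mid h_i] = \sigma_u^2\,g_u(h_i, \gamma) = \sigma_{ui}^2, \quad g_u(h_i, 0) = 1, \quad g_u(h_i, \gamma) = [\exp(\gamma' h_i)]^2$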
Application: WHO Data
A “Scaling” Model
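$u_i = h(\delta, z_i) \times u_i^*$, where $f(u_i^*)$ does not involve $z_i$; here $h(\delta, z_i) = \exp(\delta' z_i)$.
Scales both mean and variance of u.
$\ln L(\beta, \sigma_u, \sigma_v, \delta, \gamma) = -\frac{N}{2}\ln\frac{\pi}{2} - \sum_{i=1}^{N}\ln\sigma_i + \sum_{i=1}^{N}\ln\Phi\left(\frac{-\varepsilon_i\lambda_i}{\sigma_i}\right) - \frac{1}{2}\sum_{i=1}^{N}\left(\frac{\varepsilon_i}{\sigma_i}\right)^2$
where $\varepsilon_i = y_i - \beta' x_i$, $\sigma_{ui} = \sigma_u\exp(\delta' z_i)$, $\sigma_{vi} = \sigma_v\exp(\gamma' z_i)$, $\lambda_i = \sigma_{ui}/\sigma_{vi}$, $\sigma_i = \sqrt{\sigma_{ui}^2 + \sigma_{vi}^2}$, and $u_i^*$ is half normal.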
Model Extensions
- Simulation-based estimators: Normal-Gamma Frontier Model
- Bayesian estimation of stochastic frontiers
Similar model structures, similar estimation methodologies, similar results.
Normal-Gamma
- Very flexible model. VERY difficult log likelihood function.
- Bayesians love it: conjugate functional forms for other model parts.
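Normal-Gamma Model
$f(u_i) = \frac{\sigma_u^{-P}}{\Gamma(P)}\exp(-u_i/\sigma_u)\,u_i^{P-1}, \quad u_i \ge 0, \ P > 0$
$\ln L = \sum_{i=1}^{N}\left[-P\ln\sigma_u - \ln\Gamma(P) + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \frac{\varepsilon_i}{\sigma_u} + \ln\Phi\left(-\frac{\varepsilon_i}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right) + \ln q(P-1, \varepsilon_i)\right]$
$q(r, \varepsilon_i) = E[z^r \mid z > 0, \varepsilon_i], \quad z \sim N[-\varepsilon_i - \sigma_v^2/\sigma_u, \ \sigma_v^2]$
q(r, ε_i) is extremely difficult to compute.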
Normal-Gamma
Gamma Frontier Model
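Deterministic Frontier
$y = x'\beta - u$
$f(u) = \frac{\theta^P}{\Gamma(P)}e^{-\theta u}u^{P-1}, \quad u \ge 0$
Stochastic Frontier
$y = x'\beta + v - u = x'\beta + \varepsilon$
$f(v) = N[0, \sigma_v^2]$
$\ln L = N\left[P\ln\theta - \ln\Gamma(P) + \frac{1}{2}\theta^2\sigma_v^2\right] + \sum_{i=1}^{N}\left[\theta\varepsilon_i + \ln\Phi\left(\frac{\mu_i}{\sigma_v}\right)\right] + \sum_{i=1}^{N}\ln\frac{\int_0^\infty z^{P-1}\phi\left(\frac{z - \mu_i}{\sigma_v}\right)dz}{\int_0^\infty \phi\left(\frac{z - \mu_i}{\sigma_v}\right)dz}, \quad \mu_i = -\varepsilon_i - \theta\sigma_v^2$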
Simulating the Likelihood
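$\ln L_S(\beta, \sigma_v, \sigma_u, P) = \sum_{i=1}^{N}\left[-P\ln\sigma_u - \ln\Gamma(P) + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \frac{\varepsilon_i}{\sigma_u} + \ln\Phi\left(\frac{\mu_i}{\sigma}\right) + \ln\frac{1}{Q}\sum_{q=1}^{Q}\left(\mu_i + \sigma\Phi^{-1}\left(F_{iq} + (1 - F_{iq})P_L\right)\right)^{P-1}\right]$
where ε_i = y_i - β'x_i, μ_i = -ε_i - σ_v²/σ_u, σ = σ_v, P_L = Φ(-μ_i/σ), and F_iq is a draw from the continuous uniform(0,1) distribution.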
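A vectorized sketch of the simulated log-likelihood above in Python with SciPy. The function name is hypothetical, and the draws are plain pseudo-random uniforms (quasi-random sequences such as Halton draws could be substituted). The matrix F is held fixed while the parameters change, so the simulated function stays smooth in the parameters during maximization. With P = 1 the simulated term is identically zero and ln L_S reduces to the normal-exponential log-likelihood.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import norm

def simulated_loglik_normal_gamma(eps, sigma_v, sigma_u, P, F):
    """Simulated ln L_S for the normal-gamma production frontier (eps = v - u).

    eps : (N,) residuals y_i - beta'x_i
    F   : (N, Q) uniform(0,1) draws, held fixed across parameter values
    """
    mu = -eps - sigma_v ** 2 / sigma_u
    PL = norm.cdf(-mu / sigma_v)
    # Inverse-CDF draws from N[mu_i, sigma_v^2] truncated to z > 0
    z = mu[:, None] + sigma_v * norm.ppf(F + (1.0 - F) * PL[:, None])
    q = np.mean(z ** (P - 1.0), axis=1)       # simulates q(P - 1, eps_i)
    return np.sum(-P * np.log(sigma_u) - gammaln(P) + 0.5 * (sigma_v / sigma_u) ** 2
                  + eps / sigma_u + norm.logcdf(mu / sigma_v) + np.log(q))

rng = np.random.default_rng(3)
eps = np.array([-0.4, -0.1, 0.2])               # hypothetical residuals
F = rng.uniform(size=(eps.size, 20000))
print(simulated_loglik_normal_gamma(eps, 0.3, 0.5, 2.0, F))
```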
Application to C&G Data
This is the standard data set for developing and testing Exponential, Gamma, and Bayesian estimators.
Application to C&G Data
Bayesian Estimation
- Short history: first developed post 1995
- Range of applications
- Largely replicated existing classical methods
- Recent applications have extended received approaches
- Common features of the application
Bayesian Formulation of SF Model
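$\ln L(\alpha, \beta, \sigma_v, \sigma_u; \text{data}) = \sum_{i=1}^{N}\left[-\ln\sigma_u + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \frac{y_i - \alpha - \beta' x_i}{\sigma_u} + \ln\Phi\left(\frac{-(y_i - \alpha - \beta' x_i)}{\sigma_v} - \frac{\sigma_v}{\sigma_u}\right)\right]$
Normal-Exponential Model
v_i - u_i = y_i - α - β'x_i.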
Estimation proceeds (in principle) by specifying priors over Θ = (α, β, σ_v, σ_u), then deriving inferences from the joint posterior p(Θ|data). In general, the joint posterior for this model cannot be derived in closed form, so direct analysis is not feasible. Using Gibbs sampling and known conditional posteriors, it is possible to use Markov chain Monte Carlo (MCMC) methods to sample from the marginal posteriors and use that device to learn about the parameters and inefficiencies. In particular, for the model parameters, we are interested in estimating E[Θ|data] and Var[Θ|data] and, perhaps, even more fully characterizing the density f(Θ|data).
Estimating Inefficiency
One might, ex post, estimate E[u_i|data]; however, it is more natural in this setting to include (u_1,...,u_N) with Θ and estimate their conditional means along with those of the other parameters. The method is known as data augmentation.
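A minimal sketch of the data-augmentation step for the normal-exponential model, in Python with SciPy. Given α, β, σ_v, σ_u and the exponential distribution of u_i with mean σ_u, the conditional posterior of u_i is proportional to φ((ε_i + u_i)/σ_v)·exp(-u_i/σ_u) on u_i > 0, which is a normal with mean -ε_i - σ_v²/σ_u and variance σ_v², truncated at zero. The function name and values are hypothetical; a full Gibbs sampler would alternate this step with draws of α, β, σ_v and σ_u.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_u(eps, sigma_v, sigma_u, rng):
    """Draw u_i from its conditional posterior in the normal-exponential model,
    given eps_i = y_i - alpha - beta'x_i, sigma_v and sigma_u:
    u_i | . ~ N[-eps_i - sigma_v^2 / sigma_u, sigma_v^2] truncated to u_i > 0."""
    m = -eps - sigma_v ** 2 / sigma_u
    a = -m / sigma_v                    # standardized truncation point (u = 0)
    return truncnorm.rvs(a, np.inf, loc=m, scale=sigma_v, random_state=rng)

rng = np.random.default_rng(4)
eps = np.full(20000, 0.1)               # one hypothetical residual, repeated
draws = draw_u(eps, 0.3, 0.5, rng)
m = -0.1 - 0.3 ** 2 / 0.5
print(draws.mean(), truncnorm.mean(-m / 0.3, np.inf, loc=m, scale=0.3))
```

Averaging such draws over the MCMC iterations yields the posterior means of the u_i as a by-product of the sampler.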
Priors Over Parameters
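Diffuse priors are assumed for all of these:
$p(\alpha, \beta) \propto 1$ (uniform over the real "line", so p(·) = 1)
$p(1/\sigma_v^2) = \mathrm{Gamma}(1/\sigma_v^2 \mid \varphi_v, P_v) = \frac{\varphi_v^{P_v}}{\Gamma(P_v)}\exp\left[-\varphi_v(1/\sigma_v^2)\right](1/\sigma_v^2)^{P_v - 1}, \quad 1/\sigma_v^2 \ge 0$
$p(\sigma_u) \propto \exp(-\varphi_u\sigma_u)$, i.e., a gamma prior with $P_u = 1$, $\varphi_u > 0$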
Priors for Inefficiencies
Posterior
Gibbs Sampling: Conditional Posteriors
Bayesian Normal-Gamma Model
- Tsionas (2002): Erlang form (integer P); "random parameters"; applied to C&G
- River Huang (2004): fully general; applied (as usual) to C&G
Bayesian and Classical Results
Methodological Comparison: Bayesian vs. Classical
- Interpretation
- Practical results: Bernstein-von Mises theorem in the presence of diffuse priors
- Kim and Schmidt comparison (JPA, 2000)
- Important difference: tight priors over u_i in this context
- Conclusions?