ARTICLE IN PRESS
Journal of Econometrics 126 (2005) 355–384
www.elsevier.com/locate/econbase
Measuring technical and allocative inefficiency in the translog cost system: a Bayesian approach☆
Subal C. Kumbhakar^{a,*}, Efthymios G. Tsionas^{b}

^{a} Department of Economics, State University of New York, Binghamton, NY 13902, USA
^{b} Department of Economics, Athens University of Economics and Business, 76 Patission Street, 104 34 Athens, Greece
Available online 1 July 2004
Abstract
In this paper, we propose simulation-based Bayesian inference procedures in a cost system
that includes the cost function and the cost share equations augmented to accommodate
technical and allocative inefficiency. Markov chain Monte Carlo techniques are proposed and
implemented for Bayesian inferences on costs of technical and allocative inefficiency, input
price distortions and over- (under-) use of inputs. We show how to estimate a well-specified
translog system (in which the error terms in the cost and cost share equations are internally
consistent) in a random effects framework. The new methods are illustrated using panel data
on U.S. commercial banks.
© 2004 Elsevier B.V. All rights reserved.
JEL classification: C11; C13
Keywords: Technical efficiency; Translog cost function system; Markov chain Monte Carlo techniques;
Panel data; Nonlinear random effects models; Commercial banks
0304-4076/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.jeconom.2004.05.006
☆ This paper was presented at the conference on ''Current Developments in Productivity and Efficiency Measurement,'' University of Georgia, October 25–26, 2002.
* Corresponding author. Tel.: +1-607-777-4762; fax: +1-607-777-2681.
E-mail addresses: [email protected] (S.C. Kumbhakar), [email protected] (E.G. Tsionas).
1. Introduction
Empirical estimation of efficiency in the stochastic frontier (SF) models (developed by Aigner et al., 1977; Meeusen and van den Broeck, 1977) involves estimation of a parametric production/cost/profit function with a composed error term consisting of a two-sided disturbance term that reflects exogenous shocks and a one-sided term that captures technical inefficiency.^1 Although the theory is well developed to estimate a system of equations, either in the form of factor demand equations or of a cost function with cost share equations, the system approach is rarely applied in the efficiency literature.^2 The reason is that the error structure comprising noise, technical and allocative inefficiency complicates econometric estimation of the model. This is especially the case when one uses flexible functional forms to represent the underlying technology.^3 Joint estimation of technical and allocative inefficiency in a translog cost function presents a difficult problem (Greene, 1980).^4 The difficulty is that the cost function and the deviations of optimal shares from observed shares are complicated functions of allocative inefficiency. Although many attempts have been made, none has been entirely successful. Recently, Kumbhakar (1997) proposed a solution to the Greene problem using a translog cost system, but empirical estimation of this model has been restricted to panel data models in which technical and allocative inefficiency are either assumed to be fixed parameters or functions of the data and unknown parameters (Atkinson and Cornwell, 1994; Maietta, 2002). In this paper, we show that relatively simple econometric tools can be used to estimate technical and allocative inefficiency and to perform exact inference in this model without assuming technical and allocative inefficiency to be fixed parameters.
Thus, the main contribution of the paper is to show how to estimate a well-specified translog system (in which the error terms in the cost and cost share equations are internally consistent) in a random effects framework.

More specifically, we consider a Bayesian approach to address the Greene problem. Bayesian analysis of a stochastic frontier function was first proposed by van den Broeck et al. (1994). The Gibbs sampler was proposed as an effective numerical technique by Koop et al. (1995), who showed that Gibbs sampling has an advantage over importance sampling. Koop et al. (1997) proposed measuring technical inefficiency in panel data models where technical inefficiency is time-invariant. Fernandez et al. (2000) considered Bayesian estimation of a system of equations involving a multi-output production function without an explicit behavioral assumption (such as cost minimization or revenue/profit maximization).
^1 For a review of the efficiency literature see Bauer (1990), Greene (1993, 2001), Kumbhakar and Lovell (2000), and Koop and Steel (2001).
^2 On the contrary, estimation of a cost system is a common practice when measurement of input elasticities, returns to scale, productivity growth, etc., is sought (see, for example, Christensen and Greene, 1976; Diewert and Wales, 1987).
^3 A system approach with a self-dual production function is used in Schmidt and Lovell (1979) and Kumbhakar (1987), among many others.
^4 It is now labeled in the literature as the Greene problem (see Bauer, 1990). For a simpler functional form such as the Cobb–Douglas it is not a problem (see Schmidt and Lovell, 1979).
Here we consider a system approach that is derived from a translog cost function and the cost share equations. Thus, a cost minimization assumption is formally introduced in our model. We first propose a Bayesian approach to estimate the translog cost system with only technical inefficiency. This model is different from the single-equation cost function model of Koop et al. (1997). We then consider the cost system in which both technical and allocative inefficiency are present. Although the former model is nested in the latter, estimation of the latter model is not a trivial extension of the former; specialized numerical methods are needed to provide parameter inferences and measures of technical and allocative inefficiency.

We show that numerical analysis of the model from the Bayesian perspective can be facilitated using Markov chain Monte Carlo (MCMC) procedures. Posterior analysis of the model resembles many features of the standard posterior analysis in the context of multivariate regression models. Exact finite sample posterior distributions are provided without resorting to asymptotic approximations. To account for the parametric restrictions across equations, we construct a semi-informative prior that allows for differing degrees of ''correctness'' of the restrictions. We also provide tools for efficiency measurement in the models both with and without allocative inefficiency. Allocative inefficiency is modeled via price distortions, from which inferences are drawn on input over- (under-) use. In other words, we draw (firm-specific) inferences on both price distortions and input over- (under-) use along with technical efficiency. The new methods are illustrated using panel data on U.S. commercial banks.

The remainder of the paper is organized as follows. The model with only technical inefficiency is developed in Section 2. This is followed by the model in which both technical and allocative inefficiency are modeled jointly (Section 3). Section 4 deals with prior specification. The U.S. commercial banking data and empirical results are discussed in Section 5, while Section 6 concludes the paper.
2. A model with only technical inefficiency
We begin with cost minimizing behavior where firms are allocatively efficient. Assuming that panel data are available and technical inefficiency is time-invariant, the cost system can be written as (Kumbhakar and Lovell, 2000, p. 155)
$$\ln C^a_{it} = \ln C^0_{it} + v_{it} + u_i, \quad i = 1, \dots, n,\ t = 1, \dots, T \quad (1)$$

$$S^a_{j,it} = S^0_{j,it} + v_{j,it}, \quad j = 2, \dots, M \quad (2)$$

where $C^a_{it}$ is the actual/observed cost of firm $i$ in year $t$, $S^a_{j,it}$ is the observed cost share of input $j$ ($j = 1, \dots, M$), $C^0_{it}$ is the cost frontier (cost without technical and allocative inefficiencies), $S^0_{j,it}$ is the frontier cost share^5 of input $j$, $v_{it}$ are the noise components, and $u_i \geq 0$ is time-invariant technical inefficiency, which can be interpreted as the percentage increase in cost due to technical inefficiency.
^5 One cost share equation is dropped to avoid a singularity problem.
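The error structure in (1)–(2) is easy to simulate; the sketch below (all sizes and parameter values are made up) generates a toy panel in which observed log cost is the frontier plus noise plus a half-normal, time-invariant $u_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 50, 6                       # firms and years (made-up sizes)

sigma_u = 0.2                      # scale of the half-normal inefficiency
u = np.abs(rng.normal(0.0, sigma_u, size=n))   # u_i >= 0, time-invariant

v_cost = rng.normal(0.0, 0.05, size=(n, T))    # noise in the cost equation
v_share = rng.normal(0.0, 0.02, size=(n, T))   # noise in one retained share eq.

# Stand-ins for the frontier quantities ln C0_it and S0_it
lnC0 = rng.normal(1.0, 0.3, size=(n, T))
S0 = np.full((n, T), 0.4)

lnCa = lnC0 + v_cost + u[:, None]  # eq (1)
Sa = S0 + v_share                  # eq (2), a single share j shown

assert np.all(lnCa - lnC0 - v_cost >= 0.0)     # cost is inflated by u_i >= 0
```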
van den Broeck et al. (1994) and Koop et al. (1997) considered the cost function with time-invariant inefficiency, as modeled above, in a Bayesian framework. However, they used a single-equation approach and focused on estimating technical efficiency from the cost function alone. Another feature of the model in (1) and (2) is that it resembles a seemingly unrelated regression (SUR) model. A careful examination of the model reveals that it is also different from both the Fernandez et al. (2000) model and the SUR model. We extend the Koop et al. model to a system, and the SUR model to accommodate technical inefficiency. Neither the techniques proposed by Koop et al. (1997) and Fernandez et al. (2000) nor the standard Bayesian SUR technique (Griffiths, 2003) can be applied to estimate the model proposed above. Fernandez et al. (2000) present a system of equations associated with the distance function, but the formulation is ad hoc. Also, the numerical techniques presented here are different from those in Fernandez et al. (2000).

We rewrite the above cost system in a generic form (which is a panel version of the SUR equation system extended to include time-invariant technical inefficiency):
$$y_1 = X_1\beta_1 + v_1 + u \otimes 1_T$$
$$y_2 = X_2\beta_2 + v_2$$
$$\vdots$$
$$y_M = X_M\beta_M + v_M \quad (3)$$

where $y_m$ is an $nT \times 1$ vector of observations^6 for the $m$th dependent variable ($m = 1, \dots, M$), $X_m$ is an $nT \times k_m$ matrix of observations for the explanatory variables in the $m$th equation, $\beta_m$ is a $k_m \times 1$ parameter vector, $v_m$ is an $nT \times 1$ random vector, $u$ is an $n \times 1$ non-negative random vector representing time-invariant technical inefficiency, and $1_T$ is a $T \times 1$ unit vector. Thus, $n$ is the number of firms, and each of these firms is observed for $T$ time periods. The first equation in (3) is the translog cost function, and the remaining $M-1$ equations are the associated cost share equations. We rewrite (3) as

$$y = X\beta + v + \begin{bmatrix} u \otimes 1_T \\ 0_{nT(M-1)} \end{bmatrix} \quad (4)$$

where $0_{nT(M-1)}$ is an $nT(M-1) \times 1$ vector of zeros, and the notations $y$ and $X$ are obvious. Regarding the stochastic components we assume that
(i) $v \sim N_{nTM}(0_{nTM}, \Sigma \otimes I_{nT})$, where $\Sigma$ is an $M \times M$ contemporaneous covariance matrix;

(ii) $u_i \sim IN(0, \sigma_u^2)$, $u_i \geq 0$ ($i = 1, \dots, n$), i.e., $u$ follows a half-normal distribution;^7

(iii) $v$ and $u$ are mutually independent, as well as independent of $X$.

^6 It is straightforward to accommodate unbalanced panels. We assume technical inefficiency is time-invariant.
^7 Other distributions such as the exponential, truncated normal and gamma could be used. The relevance of these distributions in practical applications is an issue worth exploring in future research.
With the above distributional assumptions, the likelihood function of the model in (4) is given by

$$L(\beta, \Sigma^{-1}, \sigma_u; y, X) \propto \int_{\mathbb{R}^n_+} |\Sigma^{-1}|^{n/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\, A(\beta, u)\Sigma^{-1}\right\} p(u \mid \sigma_u)\, du \quad (5)$$

where

$$p(u \mid \sigma_u) = \left(\tfrac{\pi}{2}\sigma_u^2\right)^{-n/2} \exp\left\{-\frac{u'u}{2\sigma_u^2}\right\}, \quad u \in \mathbb{R}^n_+ \quad (6)$$

is the joint density function of $u$ from (ii), and

$$A(\beta, u) = \begin{bmatrix} (y_1 - X_1\beta_1 - u \otimes 1_T)'(y_1 - X_1\beta_1 - u \otimes 1_T) & \cdots & (y_1 - X_1\beta_1 - u \otimes 1_T)'(y_M - X_M\beta_M) \\ \vdots & \ddots & \vdots \\ (y_M - X_M\beta_M)'(y_1 - X_1\beta_1 - u \otimes 1_T) & \cdots & (y_M - X_M\beta_M)'(y_M - X_M\beta_M) \end{bmatrix}. \quad (7)$$
For a Bayesian analysis we need to choose the prior density function of the parameters, viz., $p(\beta, \Sigma^{-1}, \sigma_u)$. Here we choose the following conditional structure:

$$p(\beta, \Sigma^{-1}, \sigma_u) \propto p(\beta \mid \Sigma^{-1}, \sigma_u)\, p(\Sigma^{-1})\, p(\sigma_u) \propto p(\beta)\, p(\Sigma^{-1})\, p(\sigma_u) \quad (8)$$

where

$$p(\sigma_u) \propto \sigma_u^{-(\nu+1)} \exp\left\{-\frac{q}{2\sigma_u^2}\right\}, \quad \nu \geq 0,\ q \geq 0 \quad (9)$$

$$p(\Sigma^{-1}) \propto |\Sigma^{-1}|^{[\nu_\Sigma - (M+1)]/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\, A_\Sigma\Sigma^{-1}\right\}. \quad (10)$$

In (8) we assume, a priori, that $\beta$, $\Sigma$ and $\sigma_u$ are independent. The prior on $\sigma_u^2$ in (9) is inverted gamma with hyperparameters $\nu$ and $q$. The prior for $\Sigma^{-1}$ in (10) is Wishart with parameters $\nu_\Sigma$ and $A_\Sigma$; it reduces to the diffuse prior used by Zellner (1971, p. 242) when $\nu_\Sigma = 0$ and $A_\Sigma = 0_{M \times M}$. Regarding the prior on $\beta$, $p(\beta)$, we choose a form that can impose linear restrictions among the elements of $\beta$ (restrictions that are derived from mathematical properties of the cost function). A suitable candidate for this is the semi-informative prior of Geweke (1993), i.e.,

$$G\beta \sim N_q(g, H) \quad (11)$$

where $G$ is a $q \times k$ matrix (where $k = \sum_{m=1}^{M} k_m$ is the total number of parameters and the rank of $G$ is $q$), $g$ is a $q \times 1$ vector, and $H$ is a $q \times q$ matrix whose inverse exists. When $H \to 0_{q \times q}$, the prior in (11) allows exact imposition of the $q$ linear restrictions. This is, indeed, the case here because the cost function we are estimating satisfies some mathematical properties that are exact (see, for example, Diewert (1982) for the properties of the cost function). As the elements of $H$ diverge from $0_{q \times q}$, the prior becomes increasingly vague. As $\|H\| \to \infty$, the prior becomes improper.
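To illustrate how the prior (11) penalizes violations of linear restrictions, consider a toy restriction matrix imposing price homogeneity on a hypothetical five-element parameter vector (all values below are made up for illustration):

```python
import numpy as np

# Toy restriction G beta = g: impose a1 + a2 + a3 = 1 (price homogeneity)
# on a hypothetical 5-element parameter vector, via the prior (11).
k, q_r = 5, 1
G = np.zeros((q_r, k))
G[0, :3] = 1.0                     # picks out a1 + a2 + a3
g = np.array([1.0])

def restriction_log_kernel(beta, G, g, H):
    """Log kernel -(G beta - g)' H^{-1} (G beta - g) / 2 implied by (11)."""
    r = G @ beta - g
    return -0.5 * r @ np.linalg.solve(H, r)

beta = np.array([0.5, 0.3, 0.1, 2.0, -1.0])   # violates the restriction by 0.1

tight = restriction_log_kernel(beta, G, g, 1e-6 * np.eye(q_r))  # H near 0
vague = restriction_log_kernel(beta, G, g, 1e+2 * np.eye(q_r))  # large H

# A tight H punishes the violation heavily; a vague H barely notices it.
assert tight < vague < 0.0
```

As $H \to 0$ the kernel diverges for any violation, which is how exact restrictions are recovered.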
2.1. Bayesian inference
In sampling theory one starts by computing the multiple integral in (5) and then maximizes the likelihood function with respect to the parameters. Although the multiple integral in the likelihood function (5) can be computed analytically, it is likely that the log-likelihood function will be prone to numerical problems. The same problems will be encountered in the Bayesian analysis of the posterior density function when the latent variables $u$ are explicitly integrated out. To get around such problems, we consider the posterior density function (the product of (5) and (8)) augmented by the latent inefficiency variables $u$ (an approach known as data augmentation):
$$p(\beta, \Sigma^{-1}, \sigma_u, u \mid y, X) \propto |\Sigma^{-1}|^{(n-(M+1))/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\, A(\beta, u)\Sigma^{-1}\right\} \times \sigma_u^{-(n+\nu+1)} \exp\left\{-\frac{q + u'u}{2\sigma_u^2}\right\} p(\beta)\, p(\Sigma^{-1})\, p(\sigma_u). \quad (12)$$
In (12) the latent variables $u$ are treated as parameters in order to avoid the complicated likelihood function or posterior density function. This construction considerably facilitates the use of the MCMC techniques that will be used to perform inference in this model, as we explain later. In the sampling-theory framework, the same construction can be used to implement an EM algorithm.

Numerical Bayesian inference is performed using MCMC techniques,^8 especially the Gibbs sampler. The philosophy of the method is simple. Given a posterior density function $p(\theta \mid Y)$, where $\theta = [\theta_1, \dots, \theta_p]'$ is the parameter vector, the objective is to simulate random draws $\{\theta^{(s)},\ s = 1, \dots, S\}$ from the posterior. Once this is done the estimation problem is solved because (under quadratic loss) we can estimate $\theta$ by $\bar{\theta} = S^{-1}\sum_{s=1}^{S} \theta^{(s)}$, and we can compute second or other moments in a similar way, if they exist. We consider kernel densities of the individual elements of $\{\theta^{(s)},\ s = 1, \dots, S\}$ to form approximations to the marginal posterior density functions of the parameters. The same is true for any function $f(\theta)$ of the parameter vector, since we have the draws $\{f(\theta^{(s)}),\ s = 1, \dots, S\}$. To generate draws from the posterior density function $p(\theta \mid Y)$ we consider the conditional density functions $p(\theta_j \mid \theta_{-j}, Y)$ ($j = 1, \dots, p$), where $\theta_{-j}$ denotes all elements of $\theta$ except the $j$th element. The sequence $\{\theta^{(s)},\ s = 1, \dots, S\}$ so generated is called a Gibbs sampling sequence (Gelfand and Smith, 1990; Tanner and Wong, 1987), and it converges in distribution to the posterior under fairly mild conditions (Roberts and Smith, 1994). This means that if the number of draws is large, one can use the draws $\{\theta^{(s)},\ s = 1, \dots, S\}$ as a sample from the posterior density function.

In the present model we generate random drawings from the following conditionals: (i) $\beta \mid \Sigma, \sigma_u, u$, data; (ii) $\Sigma \mid \beta, \sigma_u, u$, data; (iii) $\sigma_u \mid \beta, \Sigma, u$, data; and (iv) $u \mid \beta, \Sigma, \sigma_u$, data. We repeat this cycle $S$ times to generate a sequence of length $S$ for each of these parameters. The draws so generated can be considered as a sample from the joint posterior density function of the parameters. The required conditional density functions to implement Gibbs sampling are as follows. For the regression parameters we have

^8 For a review of MCMC methods in econometrics, see Geweke (1999).
$$\beta \mid \Sigma^{-1}, \sigma_u, u, y, X \sim N_k(\bar{\beta}, \bar{V}) \quad (13)$$

where the conditional posterior mean is

$$\bar{\beta} = [X'(\Sigma^{-1} \otimes I_{nT})X + G'H^{-1}G]^{-1}[X'(\Sigma^{-1} \otimes I_{nT})(y - U) + G'H^{-1}g], \qquad U = \begin{bmatrix} u \otimes 1_T \\ 0_{nT(M-1)} \end{bmatrix}$$

and the conditional posterior covariance matrix is

$$\bar{V} = [X'(\Sigma^{-1} \otimes I_{nT})X + G'H^{-1}G]^{-1}.$$

The conditional posterior density function of $\Sigma^{-1}$ is

$$p(\Sigma^{-1} \mid \beta, \sigma_u, u, y, X) \propto |\Sigma^{-1}|^{(n+\nu_\Sigma-(M+1))/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\,[A_\Sigma + A(\beta, u)]\Sigma^{-1}\right\} \quad (14)$$

which is a Wishart density function. The conditional posterior density function of $\sigma_u$ is

$$p(\sigma_u \mid \beta, \Sigma^{-1}, u, y, X) \propto \sigma_u^{-(n+\nu+1)} \exp\left\{-\frac{u'u + q}{2\sigma_u^2}\right\} \quad (15)$$

which implies

$$\frac{u'u + q}{\sigma_u^2} \,\Big|\, \beta, \Sigma^{-1}, u, y, X \sim \chi^2_{n+\nu}. \quad (16)$$

It can be shown that the conditional posterior density function of the latent inefficiencies is given by

$$u_i \mid \beta, \Sigma^{-1}, \sigma_u, y, X \sim N_1(m^*_i, \sigma^2_*), \quad u_i \geq 0, \quad i = 1, \dots, n \quad (17)$$

where

$$m^*_i = T\sigma^2_* \sum_{m=1}^{M} \bar{e}_{mi}\, s^{1m}, \quad i = 1, \dots, n, \qquad \sigma^2_* = \frac{\sigma_u^2}{1 + Ts^{11}\sigma_u^2},$$

with $e_m = y_m - X_m\beta_m$, $\bar{e}_{mi} = T^{-1}\sum_{t=1}^{T} e_{m,it}$, and $\bar{e}_m = [\bar{e}_{m1}, \dots, \bar{e}_{mn}]'$ for $m = 1, \dots, M$. The inverse contemporaneous covariance matrix is expressed as $\Sigma^{-1} = [s^{ij}]$. Finally, the $u_i$'s are independent in (17). To derive this result, notice that from (12) we can decompose the posterior as $p(u \mid \beta, \Sigma^{-1}, \sigma_u, y, X)\, p(\beta, \Sigma^{-1}, \sigma_u \mid y, X)$, in which the second part is irrelevant. We follow the technique presented in Tsionas (1999) to draw from (17). This technique utilizes acceptance sampling based on an exponential blanketing density whose parameter is chosen to maximize the acceptance rate.
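A generic version of such an acceptance sampler for a normal distribution truncated to $[0, \infty)$ can be sketched as follows; the rate shown is the acceptance-maximizing choice for an exponential blanket in the spirit of the scheme cited in the text (the exact rule in Tsionas (1999) may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(2)

def trunc_norm_pos(m, s, rng):
    """Draw from N(m, s^2) truncated to [0, inf) by acceptance sampling
    with an exponential blanketing density on the standardized scale."""
    a = -m / s                     # truncation point in standard units
    if a <= 0.0:                   # mean already inside: naive resampling is cheap
        while True:
            z = rng.normal(m, s)
            if z >= 0.0:
                return z
    lam = 0.5 * (a + np.sqrt(a * a + 4.0))      # acceptance-maximizing rate
    while True:
        z = a + rng.exponential(1.0 / lam)      # proposal supported on [a, inf)
        if rng.uniform() <= np.exp(-0.5 * (z - lam) ** 2):
            return m + s * z

draws = np.array([trunc_norm_pos(-0.5, 1.0, rng) for _ in range(2000)])
assert np.all(draws >= 0.0)
```

The blanket branch remains efficient even when the mean $m$ is far below the truncation point, where naive resampling would reject almost every candidate.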
To set up the Gibbs sampler we draw random numbers from the conditional posterior density functions (13)–(17). This task is straightforward because these density functions belong to well-known families like the normal, truncated normal and Wishart. Therefore, the Gibbs sampler provides a straightforward numerical approach to Bayesian analysis of a translog cost system involving technical inefficiency.
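As an illustration of how the draws (13)–(17) combine into a sampler, here is a stripped-down sketch for the special case of a single cost equation ($M = 1$, so $\Sigma$ collapses to a scalar $\sigma_v^2$) with a flat prior on $\beta$; all data sizes and hyperparameter values are made up:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated single-equation special case of (3): y = X b + v + u (x) 1_T
n, T, k = 40, 5, 2
X = np.column_stack([np.ones(n * T), rng.normal(size=n * T)])
beta_true = np.array([1.0, 0.5])
u_true = np.abs(rng.normal(0.0, 0.3, size=n))
y = X @ beta_true + rng.normal(0.0, 0.1, size=n * T) + np.repeat(u_true, T)

nu_prior, q_prior = 2.0, 0.01          # hyperparameters of prior (9), made up
u, sig2_v, sig2_u = np.zeros(n), 1.0, 0.1

def trunc_norm_pos(m, s, rng):
    """N(m, s^2) truncated to [0, inf): naive resampling when cheap,
    exponential-blanket rejection otherwise."""
    a = -m / s
    if a <= 0.5:
        while True:
            z = rng.normal(m, s)
            if z >= 0.0:
                return z
    lam = 0.5 * (a + np.sqrt(a * a + 4.0))
    while True:
        z = a + rng.exponential(1.0 / lam)
        if rng.uniform() <= np.exp(-0.5 * (z - lam) ** 2):
            return m + s * z

for _ in range(200):
    # beta | rest, cf. (13) with a flat prior (the G'H^{-1}G terms drop out)
    yu = y - np.repeat(u, T)
    V = np.linalg.inv(X.T @ X / sig2_v)
    b = rng.multivariate_normal(V @ (X.T @ yu) / sig2_v, V)
    e = y - X @ b
    # sigma_v^2 | rest plays the role of the Wishart draw (14) when M = 1
    resid = e - np.repeat(u, T)
    sig2_v = (resid @ resid) / rng.chisquare(n * T)
    # (16): (u'u + q) / sigma_u^2 ~ chi^2_{n + nu}
    sig2_u = (u @ u + q_prior) / rng.chisquare(n + nu_prior)
    # (17): u_i | rest ~ N(m_i*, s*^2) truncated to u_i >= 0, s^{11} = 1/sig2_v
    s2_star = sig2_u / (1.0 + T * sig2_u / sig2_v)
    e_bar = e.reshape(n, T).mean(axis=1)
    m_star = T * s2_star * e_bar / sig2_v
    u = np.array([trunc_norm_pos(m, np.sqrt(s2_star), rng) for m in m_star])

assert np.all(u >= 0.0)
```

The full system version replaces the scalar variance steps with the multivariate draws (13) and (14); the cycle structure is unchanged.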
2.2. Efficiency measurement
In this section, we describe efficiency measurement based on the concept of posterior predictive efficiency developed by van den Broeck et al. (1994). Consider a yet unobserved firm for which the data on the dependent variables are in $y^0$ (an $MT \times 1$ vector) and the data on the explanatory variables are in $X^0$ (an $MT \times k$ matrix), i.e.,

$$y^0 = \begin{bmatrix} y^0_1 \\ \vdots \\ y^0_M \end{bmatrix}, \qquad X^0 = \begin{bmatrix} X^0_1 \\ \vdots \\ X^0_M \end{bmatrix}$$

where $y^0_m$ is a $T \times 1$ vector and $X^0_m$ is a $T \times k_m$ matrix ($m = 1, \dots, M$). Define

$$m^*_0 = T\sigma^2_* \sum_{m=1}^{M} \bar{e}_{0m}\, s^{1m}, \qquad \bar{e}_{0m} = T^{-1}\sum_{t=1}^{T} e_{0mt}, \quad m = 1, \dots, M,$$

and $e_0 = y^0 - X^0\beta = [e'_{01}, \dots, e'_{0M}]'$.

From (17), the conditional posterior density function of the latent inefficiency for the yet unobserved firm is

$$u_0 \mid \beta, \Sigma^{-1}, \sigma_u, y, X, X^0 \sim N(m^*_0, \sigma^2_*), \quad u_0 \geq 0. \quad (18)$$

Let $r_0 = \exp(-u_0)$ be the efficiency of the firm, $r_0 \in (0, 1)$. Then

$$p(r_0 \mid \beta, \Sigma^{-1}, \sigma_u, y, X, X^0) = (2\pi\sigma^2_*)^{-1/2}\, \Phi\!\left(\frac{m^*_0}{\sigma_*}\right)^{-1} r_0^{-1} \exp\left\{-\frac{(\ln r_0 + m^*_0)^2}{2\sigma^2_*}\right\}, \quad r_0 \in (0, 1). \quad (19)$$

It is necessary to integrate out the model parameters (viz., $\beta$, $\Sigma^{-1}$ and $\sigma_u$) to obtain the marginal density function of $r_0$. For this we write (19) as

$$p(r_0 \mid y, X, X^0) = \int p(r_0 \mid \beta, \Sigma^{-1}, \sigma_u, y, X, X^0)\, p(\beta, \Sigma^{-1}, \sigma_u \mid y, X)\, d\beta\, d\sigma_u\, d\Sigma^{-1}. \quad (20)$$

An approximation of (20) can be computed using the standard estimator

$$\hat{p}(r_0 \mid y, X, X^0) \propto S^{-1}\sum_{s=1}^{S} p(r_0 \mid \beta^{(s)}, \Sigma^{-1}_{(s)}, \sigma_{u,(s)}, y, X, X^0) \quad (21)$$

where $\{\beta^{(s)}, \Sigma^{-1}_{(s)}, \sigma_{u,(s)},\ s = 1, \dots, S\}$ is the set of posterior draws. The posterior predictive density function in (21) can be presented graphically to draw inferences
about the efficiency level of a yet unobserved firm, after normalization to make it a proper density function.

In practice, it is important to report efficiency measures for the observed firms as well. The density function of $u_i \mid \beta, \Sigma^{-1}, \sigma_u, y, X$ is given in (17). Thus, if

$$r_i = \exp(-u_i) \quad (22)$$

then a straightforward modification of (19) can be used to obtain the firm-specific efficiency density function. Moments of $r_i$ can be computed easily, and the density $p(r_i \mid y, X)$ can be approximated using the standard estimator based on the set of posterior draws. The mean and/or median of $r_i$ can be used to predict efficiency. We do this as follows: given the draw $u^{(s)}_i$ from the $s$th iteration of the Gibbs sampler, we compute $r^{(s)}_i = \exp(-u^{(s)}_i)$. Since $u^{(s)}_i$ is a draw from the conditional density function of $u_i \mid \beta, \Sigma^{-1}, \sigma_u, y, X$, it follows that

$$\bar{r}_i = S^{-1}\sum_{s=1}^{S} r^{(s)}_i \quad (23)$$

represents average firm-specific technical efficiency.
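Given stored Gibbs draws, the computation in (22)–(23) is immediate; the draws below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up Gibbs output: u_draws[s, i] stands in for the draw u_i^(s)
S, n = 1000, 8
u_draws = np.abs(rng.normal(0.1, 0.05, size=(S, n)))

r_draws = np.exp(-u_draws)       # eq (22) applied draw by draw
r_bar = r_draws.mean(axis=0)     # eq (23): posterior mean efficiency per firm

assert np.all((0.0 < r_bar) & (r_bar < 1.0))
```

A histogram of each column of `r_draws` approximates the firm-specific efficiency density described above.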
2.3. Prior elicitation
Given the functional forms of the prior density functions, in practice we have to choose the hyperparameters to match whatever prior knowledge we may have. Although it is difficult to have prior notions about parameters like $\beta$ or $\Sigma$ (other than the restrictions imposed by economic theory), it is sometimes possible to utilize prior information about inefficiency. Given the prior in (9) for the parameter $\sigma_u$, the objective in this section is to choose the hyperparameters $\nu$ and $q$ in some satisfactory way. Since $r_i = \exp(-u_i)$, $r_i \in (0, 1)$, we have

$$p(r_i \mid \sigma_u) = \left(\tfrac{\pi}{2}\sigma_u^2\right)^{-1/2} r_i^{-1} \exp\left\{-\frac{(\ln r_i)^2}{2\sigma_u^2}\right\}, \quad i = 1, \dots, n. \quad (24)$$

We can either treat $u_i$ as a model parameter and use the prior in (6), or consider it to be a part of the model. Both interpretations give the same posterior results. The prior of $\sigma_u$ in (9) depends on the hyperparameters $\nu$ and $q$, which may be elicited as follows. To facilitate prior elicitation, we used numerical quadrature to compute the mean $\bar{r}$ and variance $\bar{s}^2$ of the marginal prior for values in the ranges $\nu \in [1, 100]$ and $q \in [0.001, 5]$. Then we computed the following regressions (with 5000 observations) to approximate prior elicitation:^9

$$\ln(q) = 6.163 + 3.032 \ln \bar{r} + 1.090 \ln \bar{s}^2, \quad R^2 = 0.800 \quad (25)$$

$$\ln(\nu/q) = -1.157 + 2.022 \ln \bar{r} - 1.024 \ln \bar{s}^2, \quad R^2 = 0.999. \quad (26)$$
^9 If one needs highly precise priors then it is necessary to conduct the simulation that will give the exact priors.
For any desired prior mean efficiency and prior variance, these regressions can be used to obtain approximately the right values of the hyperparameters $\nu$ and $q$. More precise prior elicitation can be accomplished using exact quadrature methods with the implied prior probability density function for efficiency, $p(r)$. Alternatively, it can be approximated using simulation techniques: given a sample of values $\{\sigma_u^{(s)},\ s = 1, \dots, S\}$ from the prior, one could draw $u^{(s)} \mid \sigma_u^{(s)} \sim |N(0, \sigma_u^{2(s)})|$, compute $r^{(s)} = \exp(-u^{(s)})$, and approximate the marginal prior $p(r)$ using a histogram of the $r^{(s)}$.

It can be shown that the posterior density function is finitely integrable and that parameters and efficiency measures have finite first and second moments. The most important result is that with cross-sectional data and a flat prior on $\Sigma^{-1}$ the posterior does not exist, and we need a prior that keeps $\Sigma$ probabilistically ''away from zero''. Posterior moments exist under standard conditions. A technical Appendix detailing these statements is available upon request.
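The fitted approximations (25)–(26) can be inverted directly, and the implied prior for efficiency checked by the simulation just described; a sketch (the target mean and variance are arbitrary choices within the stated ranges):

```python
import numpy as np

def elicit_hyperparameters(r_mean, r_var):
    """Invert the fitted approximations (25)-(26): given a desired prior
    mean and variance of efficiency r, return (nu, q)."""
    ln_r, ln_s2 = np.log(r_mean), np.log(r_var)
    q = np.exp(6.163 + 3.032 * ln_r + 1.090 * ln_s2)        # eq (25)
    nu = q * np.exp(-1.157 + 2.022 * ln_r - 1.024 * ln_s2)  # eq (26)
    return nu, q

nu, q = elicit_hyperparameters(r_mean=0.85, r_var=0.01)

# Check by simulation: q / sigma_u^2 ~ chi^2_nu under (9), then u | sigma_u
# is half-normal and r = exp(-u); the simulated mean should be near 0.85.
rng = np.random.default_rng(6)
sig2 = q / rng.chisquare(nu, size=50000)
r = np.exp(-np.abs(rng.normal(0.0, np.sqrt(sig2))))
assert abs(r.mean() - 0.85) < 0.05
```

Since (25) has $R^2 = 0.800$, the recovered prior mean is only approximate; exact elicitation requires the quadrature or simulation route mentioned in the text.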
3. A model with both technical and allocative inefficiency
In this section, we consider a model that allows for both technical and allocative inefficiency. Perhaps the simplest way to deal with allocative inefficiency is to argue that the share equation residuals represent deviations from the first-order conditions and, therefore, represent allocative distortions. This modeling approach, however, fails to take into account the link between allocative inefficiency and its impact on cost. Here we follow Kumbhakar (1997) who, following the definition of allocative inefficiency in Schmidt and Lovell (1979), derived the exact relationship between allocative inefficiency and cost in the context of the translog cost function. This solves the Greene problem theoretically; no estimation technique is, however, offered there. And we are not aware of any application in which the Greene problem is solved using a flexible cost function while treating allocative inefficiency as random. In a sampling theory framework, empirical application of the Kumbhakar (1997) model is difficult because of the computational complexity of the model, especially when allocative inefficiency is represented by random variables à la Schmidt and Lovell (1979).

Assume $\xi_j$ represents (time-invariant) allocative inefficiency for the input pair $(j, 1)$, so that the relevant input price vector (often labeled the shadow price vector) facing the firm is $w^* = (w_1, w^*_2, \dots, w^*_M) = (w_1, w_2\exp(\xi_2), \dots, w_M\exp(\xi_M))$, where $\xi_2, \dots, \xi_M$ are random variables. Kumbhakar (1997) showed that the translog system (with a single output) can be written as follows:^10
$$\ln C^a_{it} = \ln C^*_{it} + \ln G_{it} + v_{it} + u_i, \quad i = 1, \dots, n,\ t = 1, \dots, T \quad (27)$$

$$S^a_{j,it} = S^0_{j,it} + \eta_{j,it} \quad (28)$$

where $C^a_{it}$, $S^a_{j,it}$, $S^0_{j,it}$, $v_{it}$ and $u_i$ are the same as defined in Section 2. The arguments of $C^*_{it}$ are $w^*_{it}$ and $y_{it}$, while those of $C^0_{it}$ (defined in Section 2) are $w_{it}$ and $y_{it}$. The $\eta_{j,it}$

^10 The multiple output generalization of this result is straightforward.
and $\ln G_{it}$ are functions of the allocative inefficiencies $\xi_2, \dots, \xi_M$ (defined below). We rewrite (27) as $\ln C^a_{it} = \ln C^0_{it} + \ln C^{AL}_{it} + v_{it} + u_i$, where $\ln C^{AL}_{it}\ (= \ln C^*_{it} - \ln C^0_{it} + \ln G_{it})$ can be interpreted as the percentage increase in cost due to allocative inefficiency.^11 For a translog functional form, $C^0_{it}$, $S^0_{j,it}$, $\ln C^{AL}_{it}$, $\ln G_{it}$ and $\eta_{j,it}$ are

$$\ln C^0_{it} = \alpha_0 + \sum_j \alpha_j \ln w_{j,it} + \gamma_y \ln y_{it} + \tfrac{1}{2}\gamma_{yy}(\ln y_{it})^2 + \tfrac{1}{2}\sum_j\sum_k \beta_{jk} \ln w_{j,it} \ln w_{k,it} + \sum_j \gamma_{jy} \ln w_{j,it} \ln y_{it} + \alpha_t t + \tfrac{1}{2}\alpha_{tt} t^2 + \beta_{yt}\, t \ln y_{it} + \sum_j \beta_{jt}\, t \ln w_{j,it} \quad (29)$$

$$S^0_{j,it} = \alpha_j + \sum_k \beta_{jk} \ln w_{k,it} + \gamma_{jy} \ln y_{it} + \beta_{jt} t \quad (30)$$

$$\ln C^{AL}_{it} = \ln G_{it} + \sum_j \alpha_j \xi_{j,i} + \sum_j\sum_k \beta_{jk} \xi_{j,i} \ln w_{k,it} + \tfrac{1}{2}\sum_j\sum_k \beta_{jk} \xi_{j,i}\xi_{k,i} + \sum_j \gamma_{jy} \xi_{j,i} \ln y_{it} + \sum_j \beta_{jt} \xi_{j,i}\, t \quad (31)$$

$$G_{it} = \sum_j S^*_{j,it} \exp(-\xi_{j,i}) \quad (32)$$

where

$$S^*_{j,it} = \alpha_j + \sum_k \beta_{jk} \ln w^*_{k,it} + \gamma_{jy} \ln y_{it} + \beta_{jt} t = S^0_{j,it} + \sum_k \beta_{jk}\xi_{k,i}. \quad (33)$$

Finally,

$$\eta_{j,it} = \frac{S^0_{j,it}\{1 - G_{it}\exp(\xi_{j,i})\} + \sum_k \beta_{jk}\xi_{k,i}}{G_{it}\exp(\xi_{j,i})}. \quad (34)$$
Thus, the $\eta_{j,it}$ are the deviations of the actual cost shares from their optimum values, and are nonlinear functions of the allocative inefficiencies $\xi_2, \dots, \xi_M$ and the data.

If we denote the vectors of all observations on log cost and the $M-1$ cost shares by $y_1, y_2, \dots, y_M$, the matrix of cost function regressors (observations on log prices, log output, their squares and interactions) by $X_1$, and the matrix of cost share equation regressors (observations on log prices and log output) by $X_2$, then we can write the translog cost system in (27) and (28) as

$$y_1 = X_1(\xi)\beta_1 + \ln G(\xi, \beta) + v_1 + u \otimes 1_T$$
$$y_j = X_2\beta_j + \eta_{j-1}(\xi, \beta) + v_j, \quad j = 2, \dots, M \quad (35)$$

^11 This is non-negative given strict concavity of the cost function.
where we have appended error terms $v_j$ ($j \geq 2$) to the share equations to capture, for example, measurement errors in the cost shares. The matrix $X_1(\xi)$ denotes $X_1$ with $w_{j,it}$ replaced by $w^*_{j,it} \equiv w_{j,it}\exp(\xi_{j,i})$, so that $X_1(0_{M-1}) = X_1$. Finally, $\beta$ denotes the entire parameter vector. The system in (35) is a nonlinear seemingly unrelated regression model with nonlinear random effects.

We continue to assume, as before, that $v \sim N_{nTM}(0, \Sigma \otimes I_{nT})$. Furthermore, we assume that $\xi_i = [\xi_{2,i}, \dots, \xi_{M,i}]' \sim N_{M-1}(0, \Omega)$,^12 $i = 1, \dots, n$. Then the above model represents a system of nonlinear regression equations with random effects. We write the system compactly as

$$y = X(\xi)\beta + f(\xi, \beta) + v + \begin{bmatrix} u \otimes 1_T \\ 0_{(M-1)nT} \end{bmatrix} \quad (36)$$

where

$$\beta = [\beta'_1, \dots, \beta'_M]', \qquad f(\xi, \beta) = \begin{bmatrix} \ln G(\xi, \beta) \\ \eta(\xi, \beta) \end{bmatrix}, \qquad X(\xi) = \begin{bmatrix} X_1(\xi) \\ X_2 \otimes 1_{M-1} \end{bmatrix}.$$
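The nonlinear pieces $G$ and $\eta$ entering $f(\xi, \beta)$ can be computed directly from (30) and (32)–(34); a small numerical sketch for a single observation, with made-up parameter values and the output and time terms dropped for brevity:

```python
import numpy as np

# Made-up translog parameters for M = 3 inputs, one observation,
# output and time terms omitted.
alpha = np.array([0.3, 0.4, 0.3])                  # a_j, summing to one
B = np.array([[ 0.10, -0.06, -0.04],
              [-0.06,  0.12, -0.06],
              [-0.04, -0.06,  0.10]])               # b_jk, rows summing to zero
ln_w = np.array([0.2, -0.1, 0.3])                   # log input prices
xi = np.array([0.0, 0.15, -0.10])                   # xi_1 = 0 by normalization

S0 = alpha + B @ ln_w                               # eq (30), truncated version
S_star = S0 + B @ xi                                # eq (33)
G = np.sum(S_star * np.exp(-xi))                    # eq (32)
eta = (S0 * (1.0 - G * np.exp(xi)) + B @ xi) / (G * np.exp(xi))  # eq (34)

# Actual shares S0 + eta still sum to one, as shares must.
assert np.isclose((S0 + eta).sum(), 1.0)
```

Note that when $\xi = 0$ one gets $S^* = S^0$, $G = 1$ and $\eta = 0$, so the system collapses to the technical-inefficiency-only model of Section 2.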
We assume that $v \sim N_{nTM}(0, \Sigma \otimes I_{nT})$ and $\xi \sim N_{n(M-1)}(0, \Omega \otimes I_n)$, and that they are independent of each other as well as of $X$. Regarding the priors we have

$$p(\beta, \Sigma^{-1}, \sigma_u, \Omega^{-1}) \propto p(\beta)\, p(\Sigma^{-1})\, p(\sigma_u)\, p(\Omega^{-1}).$$

The priors on $\sigma_u$, $\Sigma^{-1}$ and $\beta$ are the same as in (9), (10) and (11), and we choose a Wishart prior for $\Omega^{-1}$, viz.,

$$p(\Omega^{-1}) \propto |\Omega^{-1}|^{(\nu_\Omega - M)/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\, A_\Omega\Omega^{-1}\right\}$$

where $\nu_\Omega$ and $A_\Omega$ are parameters of the prior density function.
where nO and AO are parameters of the prior density function. The augmentedposterior density function of the model is
pðb;S�1;su; u; x;O�1 j y;X Þ / jS�1jðn�ðMþ1ÞÞ=2 exp �1
2tr A ðb; x; uÞS�1
� �
�sð�nþnþ1Þu exp �
q þ u0u
2s2u
� �jO�1jðn�mÞ=2 exp �
1
2tr QðxÞO�1
� �pðb;S�1; su;OÞ
(37)
where QðxÞ ¼Pn
i¼1 xix0i and Aðb; x; uÞ is similar to (7) except that we have the x terms
in it, viz.,
Aðb; x; uÞ
¼
ðe1ðx;b; uÞ � u � 1T Þ0ðe1ðx; b; uÞ � u � 1T Þ . . . e1ðx;b; uÞ
0eMðx;b; uÞ
..
. ...
eMðx;b; uÞ0ðe1ðx; b; uÞ � u � 1T Þ . . . eM ðx; b; uÞ0eM ðx; b; uÞ
26664
37775
12It is easy to assume a non-zero mean for the x’s. Allowing for a non-zero mean, say m, requiredsomewhat tight priors at least in our application. Although m is clearly identified from the share equation
constant terms, it does not appear that we have ‘‘proper empirical identification’’ to use a term suggested
to us by an anonymous referee. For that reason, we opt for setting m ¼ 0 in this application, reflecting our
prior notion that average allocative inefficiency is likely to be small.
where $e_m(\xi, \beta, u) = y_m - X_m(\xi)\beta_m - f_m(\xi, \beta)$, $m = 1, \dots, M$. The prior density functions for the parameters other than $\Omega$ are the same as before.

The kernel posterior density function of the parameters is

$$p(\beta, \Sigma^{-1}, \sigma_u, \Omega^{-1} \mid y, X) = \int_{\mathbb{R}^{n(M-1)}}\int_{\mathbb{R}^n_+} p(\beta, \Sigma^{-1}, \sigma_u, u, \xi, \Omega^{-1} \mid y, X)\, du\, d\xi$$

which does not have a closed-form analytical solution.^13 For this reason, inference in this model is a challenge.^14
3.1. Bayesian inference
To perform a Bayesian analysis of this model we utilize MCMC methods associated with the posterior distribution. In other words, we construct a Markov chain defined by the conditional density functions of the parameters, in which random draws are made from each posterior conditional distribution. The conditional posterior distributions required for implementing MCMC techniques are as follows.
3.2. Conditional posterior of $\beta$

The conditional posterior density function of $\beta$ is

$$p(\beta \mid \Sigma^{-1}, \sigma_u, u, \xi, \Omega^{-1}, y, X) \propto \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\, A(\beta, \xi, u)\Sigma^{-1}\right\} p(\beta). \quad (38)$$
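Since (38) does not belong to a standard family, draws can be obtained by the Metropolis–Hastings algorithm described below; a minimal, self-contained independence-chain sketch, with a made-up non-Gaussian target density standing in for the posterior conditional, is:

```python
import numpy as np

rng = np.random.default_rng(5)

# Made-up non-Gaussian target log-density standing in for log f(beta | ...)
def log_f(b):
    return -0.5 * np.sum(b ** 2) - 0.1 * np.sum(b ** 4)

# Gaussian proposal log-density standing in for log g(beta | ...)
def log_g(b):
    return -0.5 * np.sum(b ** 2)

def mh_independence(log_f, log_g, draw_g, b0, S, rng):
    """Independence-chain Metropolis-Hastings: accept candidate bt with
    probability min{1, [f(bt)/g(bt)] / [f(b)/g(b)]}."""
    b, draws, accepted = b0, [], 0
    for _ in range(S):
        bt = draw_g()
        log_alpha = (log_f(bt) - log_g(bt)) - (log_f(b) - log_g(b))
        if np.log(rng.uniform()) < log_alpha:
            b, accepted = bt, accepted + 1
        draws.append(b)
    return np.array(draws), accepted / S

draws, acc_rate = mh_independence(
    log_f, log_g, lambda: rng.normal(size=2), np.zeros(2), 5000, rng)
assert 0.0 < acc_rate <= 1.0
```

As the text notes, the closer the proposal $g$ is to the target $f$, the higher the acceptance rate, and the proposal covariance can be rescaled to tune it.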
This density function does not belong to any known family, so random number generation is difficult. One can, however, use the Metropolis–Hastings algorithm (Tierney, 1994). Suppose our objective is to generate random draws from a distribution with density $f(x)$, but direct sampling is not possible; there is, however, a density $g(x)$ from which random number generation is easy. We proceed as follows. Given an initial condition $x^{(0)}$, we generate a candidate $y \sim g(x)$. The next draw will be either $x^{(0)}$ or the candidate $y$. More specifically, we set $x^{(1)} = y$ with probability
$$\alpha(x^{(0)}, y) = \min\left\{1,\ \frac{f(y)/g(y)}{f(x^{(0)})/g(x^{(0)})}\right\},$$
else we set $x^{(1)} = x^{(0)}$. We continue this process until we obtain $S$ draws. Clearly, for this procedure to work, $g(x)$ must be a good approximation to $f(x)$; otherwise we will reject many candidates, meaning that effectively we will be unable to explore $f(x)$ as desired.

$^{13}$It is possible to integrate the augmented posterior explicitly with respect to $\Sigma$ and $\Omega$, but the resulting expressions are highly complicated and depend on the latent variables $u$ and $\xi$, so it is not clear how this could be useful in estimation.

$^{14}$Alternative computational approaches to estimating nonlinear random effect models are available in the statistics literature, although systems of nonlinear equations are hard to find. Pinheiro and Bates (1995) discuss the theory and computational techniques for nonlinear random effect models, and conclude that adaptive Gaussian quadrature is one of the best methods for approximating the integrated likelihood function (the integration is with respect to $\xi$). However, when $\xi_i$ is defined in more than one dimension, Gaussian quadrature is subject to the curse of dimensionality. The Laplace approximation is another alternative, and has been investigated by many authors, including Vonesh and Chinchilli (1997), Wolfinger (1993), and Wolfinger and Lin (1997). In the related literature on generalized nonlinear models with random effects, numerical quadrature techniques have been considered and analyzed by Longford (1994), McCulloch (1994), Liu and Pierce (1994), and Diggle et al. (1994). These techniques have, however, unknown degrees of accuracy with respect to approximating the likelihood function. It is clear that an error $\epsilon$ in the computation of integrals with respect to a given $\xi_j$ for a particular observation is magnified to $NJ\epsilon$ in the log-likelihood if we have $N$ observations and $J$ $\xi_j$'s, where $J = M - 1$; thus the error can be substantial. In addition, if $J$ is large the curse of dimensionality will interfere with our ability to use these techniques in practically relevant applications.

The method is applied in the current setting in the following manner. Let $f(\beta \mid \xi, u, \sigma_u, \Sigma, y, X)$ be the exact posterior conditional, and $g(\beta \mid \xi, u, \sigma_u, \Sigma, y, X)$ be the posterior corresponding to the multivariate Student-t proposal density$^{15}$ for $\beta$ when $\phi(\xi, \beta) = 0$. Suppose the current draw is $\beta^{(i)}$. Define
$$\alpha(\beta^{(i)}, \tilde\beta) = \min\left\{1,\ \frac{f(\tilde\beta \mid \xi, u, \sigma_u, \Sigma, y, X)\,/\,g(\tilde\beta \mid \xi, u, \sigma_u, \Sigma, y, X)}{f(\beta^{(i)} \mid \xi, u, \sigma_u, \Sigma, y, X)\,/\,g(\beta^{(i)} \mid \xi, u, \sigma_u, \Sigma, y, X)}\right\}.$$
Then, with probability $\alpha(\beta^{(i)}, \tilde\beta)$ we accept the proposal $\tilde\beta$; otherwise we retain the current draw $\beta^{(i)}$. If the acceptance rate of this proposal is not satisfactory, we can modify the proposal to some extent; for example, we can multiply its covariance matrix by a constant that can be tuned to maintain a satisfactory acceptance rate.

This proposal is attractive because it automatically allows for the imposition of all theoretical restrictions via the prior $p(\beta)$. In our application, the acceptance rate of this proposal is near 70%. We also impose the monotonicity and concavity restrictions at each sample point using the rejection method, exactly as in the model with only technical inefficiency.
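The accept/reject mechanics of this Metropolis–Hastings step can be sketched in a few lines. The sketch below is purely illustrative (a toy normal target and proposal rather than the paper's translog posterior); `log_f` and `log_g` play the roles of the exact conditional and the Student-t proposal:

```python
import math
import random

def mh_independence_step(x_curr, log_f, log_g, draw_g):
    """One independence-chain MH update: propose y ~ g, accept with
    probability min{1, [f(y)/g(y)] / [f(x)/g(x)]} (computed in logs)."""
    y = draw_g()
    log_alpha = (log_f(y) - log_g(y)) - (log_f(x_curr) - log_g(x_curr))
    if math.log(random.random()) < min(0.0, log_alpha):
        return y, True    # accept the candidate
    return x_curr, False  # retain the current draw

# Toy illustration: target f = N(0, 1), over-dispersed proposal g = N(0, 2^2).
log_f = lambda x: -0.5 * x * x
log_g = lambda x: -0.5 * (x / 2.0) ** 2
draw_g = lambda: random.gauss(0.0, 2.0)

random.seed(1)
x, n_accept, draws = 0.0, 0, []
for _ in range(5000):
    x, ok = mh_independence_step(x, log_f, log_g, draw_g)
    n_accept += ok
    draws.append(x)
```

In practice the proposal covariance would be scaled until the observed acceptance fraction is acceptable, mirroring the roughly 70% rate reported above.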
3.3. Conditional posterior of $\Sigma$
The posterior conditional density function of $\Sigma$ is given by
$$p(\Sigma^{-1} \mid \beta, \sigma_u, u, \xi, \Omega^{-1}, y, X) \;\propto\; |\Sigma^{-1}|^{(n + n_\Sigma - (M+1))/2} \exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\,[A_\Sigma + A(\beta,\xi,u)]\Sigma^{-1}\right), \qquad (39)$$
which is the density function of a Wishart distribution. Given $\beta$, $\xi$ and $u$, the matrix $A(\beta,\xi,u)$ is a known constant, so generating random draws from this density function is straightforward.
$^{15}$It is necessary to keep the acceptance probability bounded, so the degrees of freedom of the Student-t should be less than the degrees of freedom of the usual approximate Student-t posterior for $\beta$. In our empirical work we sample from a Student-t with 40 degrees of freedom, but have found that normal proposals produce only trivial changes.
3.4. Conditional posterior of $\Omega$
The posterior conditional density function of $\Omega$ is
$$p(\Omega^{-1} \mid \beta, \Sigma^{-1}, \sigma_u, u, \xi, y, X) \;\propto\; |\Omega^{-1}|^{(n + n_\Omega - M)/2} \exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\,[A_\Omega + Q(\xi)]\Omega^{-1}\right), \qquad (40)$$
which is also the density function of a Wishart distribution, since $A_\Omega + Q(\xi)$ is a given matrix of constants.
3.5. Conditional posterior of $\sigma_u$
The posterior conditional density function of $\sigma_u$ satisfies
$$\frac{q + u'u}{\sigma_u^2}\,\bigg|\, \beta, \Sigma^{-1}, \Omega^{-1}, u, \xi, y, X \;\sim\; \chi^2_{n+\bar n}, \qquad (41)$$
from which random number generation is simple.
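Conditional on $\beta$, $\xi$ and $u$, the updates in Sections 3.3–3.5 are draws from standard distributions. A numpy-only sketch (the dimensions, scale matrices and residual cross-product are made up for illustration; a Bartlett decomposition stands in for a library Wishart sampler):

```python
import numpy as np

def wishart_rvs(nu, S, rng):
    """Draw W ~ Wishart(nu, S) via the Bartlett decomposition."""
    p = S.shape[0]
    L = np.linalg.cholesky(S)
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(nu - i))
        for j in range(i):
            A[i, j] = rng.normal()
    LA = L @ A
    return LA @ LA.T

rng = np.random.default_rng(0)
M, n = 3, 50                    # illustrative dimensions
A_Sigma = 1e-5 * np.eye(M)      # prior scale matrix (cf. Section 4)
A_resid = n * np.eye(M)         # stand-in for A(beta, xi, u), the residual cross-product
nu = n + 1                      # posterior df: n + n_Sigma with n_Sigma = 1

# Sigma^{-1} | ... ~ Wishart(nu, [A_Sigma + A(beta, xi, u)]^{-1})
Sigma_inv = wishart_rvs(nu, np.linalg.inv(A_Sigma + A_resid), rng)

# sigma_u | ... : (q + u'u)/sigma_u^2 ~ chi^2, so draw the chi^2 variate and invert
q, u = 0.1, np.abs(rng.normal(size=n))
sigma_u2 = (q + u @ u) / rng.chisquare(n + 1)
```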
3.6. Conditional posterior of u
For latent technical inefficiency it can easily be shown, using the techniques developed in the previous section, that
$$u_i \mid \beta, \Sigma^{-1}, \Omega^{-1}, \sigma_u, \xi, y, X \;\sim\; N_1(m_i^*, \sigma_*^2), \quad u_i \ge 0, \quad i = 1,\ldots,n, \qquad (42)$$
where
$$m_i^* = T\sigma_*^2 \sum_{j=1}^{M} -\bar e_{ji}(\xi,\beta,u)\, s^{1j}, \quad i = 1,\ldots,n, \qquad \sigma_*^2 = \frac{\sigma_u^2}{1 + T s^{11}\sigma_u^2},$$
$$e_m(\xi,\beta,u) = y_m - X_m(\xi)\beta_m - f_m(\xi,\beta) = [e_{m1}(\xi,\beta,u),\ldots,e_{mT}(\xi,\beta,u)]', \quad m = 1,\ldots,M,$$
$$\bar e_m(\xi,\beta,u) = T^{-1}\sum_{t=1}^{T} e_{mt}(\xi,\beta,u), \quad m = 1,\ldots,M, \ \text{is an } n \times 1 \text{ vector}$$
with
$$\bar e_m(\xi,\beta,u) = [\bar e_{m1}(\xi,\beta,u),\ldots,\bar e_{mn}(\xi,\beta,u)]', \quad m = 1,\ldots,M,$$
and the inverse contemporaneous covariance matrix is written as $\Sigma^{-1} = [s^{ij}]$. This distribution of $u_i$ is truncated normal, and since the $u_i$'s are independent in their joint posterior conditional distribution, random draws can be generated sequentially for each $i = 1,\ldots,n$ using the inverse cumulative distribution function technique, so implementation is straightforward. An alternative is to use acceptance sampling as in Tsionas (1999); here, we have opted for the latter technique.
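Either sampler is only a few lines. A standard-library sketch of the inverse-CDF variant for one $u_i$ (the moments $m^*_i$ and $\sigma_*$ are set to illustrative values here, not computed from data):

```python
from statistics import NormalDist
import random

def trunc_normal_invcdf(m, s, rng):
    """Draw from N(m, s^2) truncated to [0, inf) via the inverse-CDF method:
    u ~ Uniform(F(0), 1), return F^{-1}(u), with F the untruncated normal CDF."""
    nd = NormalDist(m, s)
    lo = nd.cdf(0.0)
    u = lo + (1.0 - lo) * rng.random()
    return nd.inv_cdf(u)

# Illustrative moments m_i^* = -0.5, sigma_* = 1 (acceptance sampling would
# instead propose from a convenient envelope and reject as needed).
rng = random.Random(42)
draws = [trunc_normal_invcdf(-0.5, 1.0, rng) for _ in range(20000)]
```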
3.7. Conditional posterior of $\xi$
Finally, we consider the posterior conditional density function of $\xi$, given by
$$p(\xi \mid \beta, \Sigma^{-1}, \sigma_u, u, \Omega^{-1}, y, X) \;\propto\; \exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\,A(\beta,\xi,u)\Sigma^{-1}\right)\exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\,Q(\xi)\Omega^{-1}\right). \qquad (43)$$
Generating random draws from this joint density function is not straightforward because the density is not of any known form. One promising possibility for obtaining a reasonably good proposal density is to linearize the cost share equations (i.e., to make them linear in the $\xi$'s). The resulting approximate posterior of each $\xi_i$ is normal, so in practice we can use a Student-t proposal density$^{16}$ to keep the acceptance probability bounded. We can easily obtain a random draw from this proposal and use a Metropolis rule to maintain the correct posterior. The normal approximation is, in fact, very simple to use: the task is to linearize the cost function and share equations with respect to $\xi$ and use the approximation to obtain a multivariate normal or Student-t density that serves as the proposal for the Metropolis–Hastings step. In Appendix A we show that the partial derivatives of the cost function with respect to the $\xi$'s are exactly zero, so only the share equations are needed to derive the linear approximation; hence the normal approximation has a particularly simple structure for the translog cost system.$^{17}$

Let $f(\xi_i \mid y, X, \beta, \Sigma, \Omega)$ be the exact conditional density function of $\xi_i$ given the data and the parameters, and $g(\xi_i \mid y, X, \beta, \Omega)$ be the normal approximation, i.e., the pdf of the multivariate normal resulting from the linearization. Both density functions are available in closed form. A candidate $\xi_i \sim g(\xi_i \mid y, X, \beta, \Omega)$ will either be accepted or rejected in favor of the previous draw, say $\xi_i^{(0)}$, according to the following rule. Let
$$\alpha(\xi_i^{(0)}, \xi_i) = \min\left\{1,\ \frac{f(\xi_i \mid y, X, \beta, \Sigma, \Omega)\,/\,g(\xi_i \mid y, X, \beta, \Omega)}{f(\xi_i^{(0)} \mid y, X, \beta, \Sigma, \Omega)\,/\,g(\xi_i^{(0)} \mid y, X, \beta, \Omega)}\right\}.$$
We accept the candidate with probability $\alpha(\xi_i^{(0)}, \xi_i)$, or reject it and take $\xi_i^{(0)}$ as the draw. The overall acceptance rate of this procedure is over 85% in the data set we analyzed, which indicates a satisfactory approximation. It is necessary to ensure that the approximation to the exact conditional posterior of $\xi_i$ is satisfactory. We cannot claim that this will always be the case, but we suspect that when it is not, the approximation can be improved adaptively by linearizing around a point $\bar\xi_i$ (different from zero) that could be the current posterior mean of $\xi_i$.$^{18}$

$^{16}$We use a Student-t with 10 degrees of freedom, but we have found that the results remain the same when we use a normal proposal.

$^{17}$Following the sampling-theory literature, we could have used Gaussian quadrature to integrate the $\xi$'s out of the likelihood function or the posterior distribution, and then used the Metropolis–Hastings algorithm (MHA) to provide inferences for $\beta$, $\Sigma$, $\Omega$. The required posterior would not, of course, be available in closed form. We did not opt for this technique because the MHA does not take account of the special features of the problem, namely the linearity of the system conditional on the $\xi$'s. Moreover, for complicated posteriors the MHA can result in high autocorrelation of the draws, making reliable exploration of the posterior a troublesome task.
3.8. Joint measurement of technical and allocative inefficiency
The previous model is capable of providing measures of technical and allocative inefficiency for each firm, and for an as yet unobserved firm (in which case we make posterior predictive inferences, i.e., predictive inferences conditional on the observed data). Our problem here is as follows. Suppose $f(\theta, D)$ represents any function of the parameters $\theta$ (inclusive, perhaps, of latent variables like $u$ and $\xi$) and the data $D$. The objective is to estimate the posterior expectation
$$E f(\theta, D) = \int f(\theta, D)\, p(\theta \mid D)\, d\theta.$$
Given draws $\theta^{(s)}$, $s = 1,\ldots,S$, from the posterior density function $p(\theta \mid D)$, this expectation can be approximated by
$$E[f(\theta, D) \mid D] \approx S^{-1}\sum_{s=1}^{S} f(\theta^{(s)}, D).$$

Next, we describe the procedure for obtaining technical and allocative inefficiency measures. Measurement of technical inefficiency in the present model is exactly the same as in the model without allocative inefficiency presented in Section 2. Define $r_i = \exp(-u_i)$ to be the efficiency index of firm $i = 1,\ldots,n$. The firm-specific efficiency measure is provided by the mean of the posterior density function of $r_i$, viz., $E(r_i \mid y, X)$. This measure can be obtained by averaging the draws $r_i^{(s)}$, where $r_i^{(s)}$ denotes the $s$th draw for $r_i$; once a draw $u_i^{(s)}$ from the conditional posterior of $u_i$ becomes available, $r_i^{(s)}$ is computed easily. The posterior predictive density function can also be obtained easily: since $u_i \mid \sigma_u$ is half-normal, the density function of $r_i \mid \sigma_u$ is available in closed form, and the marginal density function of $r_i$ is given by
$$p(r_i) = \int_0^\infty p(r_i \mid \sigma_u)\, p(\sigma_u \mid \text{data})\, d\sigma_u \approx S^{-1}\sum_{s=1}^{S} p(r_i \mid \sigma_u^{(s)}),$$
where $\sigma_u^{(s)}$, $s = 1,\ldots,S$, denotes the
posterior draws for $\sigma_u$.

Regarding allocative inefficiency we follow a similar procedure. First, the departures of observed prices from shadow prices, namely the $\xi_j$'s, are of interest. Given the posterior draws for $\xi$, say $\xi_{j,i}^{(s)}$, a firm-specific measure is provided by $\bar\xi_{j,i} = S^{-1}\sum_{s=1}^{S}\xi_{j,i}^{(s)}$, where $S$ indicates the number of draws. This gives the percentage deviation of observed prices from shadow prices for input pair $(j,1)$ for firm $i$. The deviations themselves can be estimated from $\bar\lambda_{j,i} = S^{-1}\sum_{s=1}^{S}\lambda_{j,i}^{(s)}$, where $\lambda_{j,i}^{(s)} = \exp(\xi_{j,i}^{(s)})$, $s = 1,\ldots,S$.

Another measure of interest is the percentage increase in cost due to allocative
inefficiency for each firm. This is given by $\ln C^{AL}_{it}$ and depends on the data, the $\xi$'s, and the parameters in $\beta$. Clearly, this measure can be computed for each draw of $\beta$ and $\xi$, and averaged with respect to the draws to provide temporal and firm-specific measures of $\ln C^{AL}_{it}$ or $C^{AL}_{it}$. A posterior predictive density function for $\ln C^{AL}$, referring to an as yet unobserved (or a typical) firm, is computed as follows. Given a vector of prices and output for that firm (for example, the sample averages of these variables), $\ln C^{AL}$ is computed for each draw and then averaged with respect to the draws. This provides information regarding the distribution of allocative inefficiency for a typical firm conditional on the data, and can be used for predictive purposes. Parameter uncertainty is fully taken into account in these computations in standard Bayesian fashion, since these measures are averaged against the posterior distribution of the parameters. The implication, of course, is that we do not have to resort to asymptotic approximations of the "plug-in" variety. Similar principles are followed to compute the firm-specific measures of $\ln C^{AL}$.

$^{18}$We have tried this approximation in our application and found that the acceptance rate increased slightly. However, we decided not to use it in reporting the final results since the original proposal performed rather well. Another advantage of the algorithm is that it can be vectorized easily to generate all the $\xi_i$'s at once by exploiting properties of the normal distribution. Consequently, fitting the nonlinear translog random effect model is not considerably more time consuming than fitting the translog system with only technical inefficiency.

Further information can be obtained from Bayesian inferences regarding whether
certain inputs are under- or over-utilized. This information is not provided by the $\xi$'s alone. Since $\partial C^*/\partial w_j = x_j^{TE}$ and $\partial C^0/\partial w_j = x_j^{OP}$, where $x_j^{TE}$ and $x_j^{OP}$ denote the technically efficient and the optimal (both technically and allocatively efficient) quantities of input $x_j$ (with firm and time subscripts omitted) and $C^* = C^a|_{u=0}$, non-optimal use of input $x_j$ relative to input $x_1$ (for example) can be obtained from the formula$^{19}$
$$k_j = \frac{(x_j/x_1)^{TE}}{(x_j/x_1)^{OP}} = \exp(-\xi_j)\,\frac{1 + \eta_j/S^0_j}{1 + \eta_1/S^0_1} \quad \text{for } j = 2,\ldots,M.$$
Consequently, if $k_j > (<)\ 1$ then input $x_j$ is over- (under-) used relative to input $x_1$. Note that these measures are firm-specific and time-varying. The MCMC algorithm yields one draw of each $k_j$ per iteration, and each draw depends on all parameters of the system. The final measure is obtained by averaging across all draws using the standard estimator, an operation equivalent to integrating out parameter uncertainty.

It is clear that the model developed in this section provides all the information needed to evaluate technical and allocative inefficiency for each firm in the sample using a cost system (consisting of the cost function and the cost share equations). There are four basic advantages of the model proposed in this section. First, technical and allocative inefficiency are modeled in a way that is consistent with the cost minimization problem of standard microeconomic theory. Second, all inferences are for the given data, so no asymptotic approximations are used. Third, the systems approach ensures that we obtain more precise estimates of technical efficiency than a single-equation approach. Finally, it solves the Greene problem from an empirical point of view.$^{20}$
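The draw-averaging computations described in this section (the efficiency index $r_i = \exp(-u_i)$, the price distortions $\bar\xi_{j,i}$, and the relative input use $k_j$) reduce to simple Monte Carlo means. A sketch with made-up draws for a single firm-year (three inputs, input 1 as numeraire; the $k_j$ expression follows the main-text formula):

```python
import math
import random

def k_relative(xi, eta, S0):
    """Relative over-/under-use for one draw:
    k_j = exp(-xi_j)(1 + eta_j/S0_j)/(1 + eta_1/S0_1), j = 2,...,M
    (index 0 plays the role of the numeraire input 1)."""
    base = 1.0 + eta[0] / S0[0]
    return [math.exp(-xi[j]) * (1.0 + eta[j] / S0[j]) / base
            for j in range(1, len(S0))]

random.seed(5)
S = 500  # number of retained MCMC draws

# Stand-in posterior draws (values are made up purely for illustration).
u_draws = [abs(random.gauss(0.0, 0.3)) for _ in range(S)]
xi_draws = [[0.0, random.gauss(0.0, 0.02), random.gauss(0.0, 0.02)] for _ in range(S)]
eta_draws = [[random.gauss(0.0, 0.003) for _ in range(3)] for _ in range(S)]
S0 = [0.4, 0.35, 0.25]  # fitted cost shares, held fixed here for simplicity

# Firm-specific posterior means: efficiency r = exp(-u), distortions xi, and k_j.
r_hat = sum(math.exp(-u) for u in u_draws) / S
xi_bar = [sum(x[j] for x in xi_draws) / S for j in range(3)]
k_draws = [k_relative(x, e, S0) for x, e in zip(xi_draws, eta_draws)]
k_bar = [sum(k[j] for k in k_draws) / S for j in range(2)]
```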
$^{19}$Alternatively, one can define non-optimal use of input $x_j$ from $k_j = x_j^{TE}/x_j^{OP} = \exp(-\xi_j)(1 + \eta_j/S^0_j)\sum_k (S^0_k + \eta_k)\exp(-\eta_k)$ for $j = 1,\ldots,M$. Thus, if $k_j > (<)\ 1$ then input $x_j$ is over-used (under-used).
$^{20}$An anonymous referee mentioned that a fixed effects model would also have these advantages. While this is true, we argue that there are situations (for example, cross-sectional models) where one cannot use a fixed effects model. Furthermore, in a fixed effects model technical inefficiency is defined relative to the best firm in the sample, which is not the case in the present model.
4. Prior specifications
In our empirical application we use two priors (A and B). Both priors have $\bar n = 1$; prior A has $q = 0.1$ and prior B has $q = 1$, implying prior median efficiencies of 71% and 41.8%, respectively. We choose $\bar n = 1$ because $\bar n$ represents the number of observations in a fictitious experiment that provides a random sample $u_1,\ldots,u_{\bar n}$ with variance $q/\bar n$. Therefore, prior information exists but is not particularly precise.

For the semi-informative prior that accommodates the restrictions implied by economic theory, we assume $G\beta \sim N_q(0, \varepsilon^2 I_q)$ with $\varepsilon = 10^{-5}$. This prior is extremely tight and practically implies exact imposition of the restrictions; the exact form of the restrictions for this application is presented in Appendix B. Since these are mathematical restrictions to be satisfied by any cost function, we decided to impose them (essentially) exactly, although the proposed method allows one to use different degrees of tightness. We use informative priors for both $\Sigma$ and $\Omega$, which are inverted Wishart with $n_\Sigma = n_\Omega = 1$ degree of freedom and scale matrices $A_\Sigma = A_\Omega = 10^{-5} I$. These priors are proper but very diffuse.
5. Data and empirical results
5.1. Data
The data for this study are taken from the commercial bank and bank holding company database managed by the Federal Reserve Bank of Chicago. It is based on the Report of Condition and Income (Call Report) for all U.S. commercial banks that report to the Federal Reserve banks and the FDIC. In this paper we used the data for the years 1996–2000 and selected a random sample of 500 commercial banks.
The commercial banking industry is one of the largest and most important sectors of the U.S. economy. The structure of the banking industry has undergone rapid changes in the last two decades, mostly due to extensive consolidation, and justification of mergers and acquisitions is often provided in terms of economies of scale and efficiency. Here we focus on the efficiency arguments by estimating a flexible cost system. Previous banking efficiency studies based on cost function estimation (see the survey by Berger and Humphrey, 1997) have mostly focused on technical inefficiency, because estimation of the translog system with both technical and allocative inefficiency was not previously feasible. From this perspective, this is the first banking study in which a translog cost function system with both technical and allocative inefficiency is jointly estimated without making them deterministic functions of the data and unknown parameters.
outputs. Here we follow the intermediation approach (Kaparakis et al., 1994) inwhich banks are viewed as financial firms transforming various financial andphysical resources into loans and investments. The output variables are installment
ARTICLE IN PRESS
S.C. Kumbhakar, E.G. Tsionas / Journal of Econometrics 126 (2005) 355–384374
loans (to individuals for personal/household expenses) (y1), real estateloans (y2), business loans (y3), federal funds sold and securities purchasedunder agreements to resell (y4), other assets (assets that cannot be properly includedin any other asset items in the balance sheet) (y5). The input variables are labor (x1),capital (x2), purchased funds (x3), interest-bearing deposits in total transactionaccounts (x4) and interest-bearing deposits in total non-transactionaccounts (x5). For each input the price is obtained by dividing total expenses on itby the corresponding input quantity. Thus, for example, the price of labor (w1) isobtained from expenses on salaries and benefits divided by the number of full-timeemployees (x1). The same approach is used to obtain w2 through w5. Total cost isthen defined as the sum of the expenses on these five inputs. To impose thelinear homogeneity restrictions, we normalize total cost and all the prices withrespect to w5.
5.2. Empirical results
5.2.1. Technical efficiency
The model with technical inefficiency only and a single output is outlined in Eq. (1). Here we write it out fully for five outputs and five inputs using the translog functional form:
$$\begin{aligned}
\ln C^a_{it} = {}& \alpha_0 + \sum_j \alpha_j \ln w_{j,it} + \sum_m \gamma_m \ln y_{m,it} + \frac{1}{2}\sum_m\sum_q \gamma_{mq}\ln y_{m,it}\ln y_{q,it} \\
&+ \frac{1}{2}\sum_j\sum_k \beta_{jk}\ln w_{j,it}\ln w_{k,it} + \sum_j\sum_m \gamma_{jm}\ln w_{j,it}\ln y_{m,it} + \alpha_t t \\
&+ \frac{1}{2}\alpha_{tt}\,t^2 + \sum_m \beta_{mt}\ln y_{m,it}\,t + \sum_j \beta_{jt}\ln w_{j,it}\,t + v_{1,it} + u_i
\end{aligned}$$
and
$$S^a_{j,it} = \alpha_j + \sum_k \beta_{jk}\ln w_{k,it} + \sum_m \gamma_{jm}\ln y_{m,it} + \beta_{jt}\,t + v_{j+1,it},$$
$$m, q = 1,\ldots,5; \qquad j, k = 1,\ldots,4.$$
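The deterministic part of this system is straightforward to evaluate, and Shephard's lemma provides a built-in consistency check: the cost shares must equal the gradient of $\ln C$ with respect to the log input prices. A sketch with arbitrary made-up parameter values (not estimates from the paper):

```python
import numpy as np

def translog_cost_and_shares(lnw, lny, t, par):
    """Deterministic part of the translog cost function and cost shares.
    par holds a0, a (J,), g (Q,), G (Q,Q), B (J,J), Gjm (J,Q), at, att, bmt (Q,), bjt (J,)."""
    a0, a, g, G, B, Gjm, at, att, bmt, bjt = par
    lnC = (a0 + a @ lnw + g @ lny
           + 0.5 * lny @ G @ lny + 0.5 * lnw @ B @ lnw + lnw @ Gjm @ lny
           + at * t + 0.5 * att * t ** 2 + (bmt @ lny) * t + (bjt @ lnw) * t)
    shares = a + B @ lnw + Gjm @ lny + bjt * t   # S_j = dlnC / dlnw_j
    return lnC, shares

J, Q = 4, 5  # 4 normalized input prices, 5 outputs (prices normalized by w5)
rng = np.random.default_rng(3)
B = rng.normal(scale=0.01, size=(J, J)); B = 0.5 * (B + B.T)  # impose symmetry
G = rng.normal(scale=0.01, size=(Q, Q)); G = 0.5 * (G + G.T)
par = (0.1, np.full(J, 0.2), np.full(Q, 0.2), G, B,
       rng.normal(scale=0.01, size=(J, Q)), 0.01, 0.001,
       np.zeros(Q), np.zeros(J))
lnw0, lny0 = rng.normal(size=J), rng.normal(size=Q)
lnC, S = translog_cost_and_shares(lnw0, lny0, 2.0, par)
```

The returned shares coincide with $\partial \ln C/\partial \ln w_j$ by construction, which is a useful unit test for any implementation of the system.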
The assumptions on the noise components ($v_1,\ldots,v_5$) and on the technical inefficiency component $u$ are as before and are not repeated here.

In Fig. 1 we provide the posterior predictive technical efficiency distributions as well as kernel density estimates of the firm-specific technical efficiency measures for both priors A and B. We consider two models: (i) the translog cost system with only technical inefficiency, and (ii) the translog cost system with both technical and allocative inefficiency. Figs. 1a and b provide the posterior predictive technical efficiency distributions, which give the density function of technical efficiency for a typical or as yet unobserved firm. The results from the two systems (with and without allocative inefficiency) are quite similar. This does not mean that allocative inefficiency can be
Fig. 1. (a) posterior predictive technical efficiency distribution (prior A); (b) posterior predictive technical efficiency distribution (prior B); (c) technical efficiency distribution (prior A); (d) technical efficiency distribution (prior B).
ignored.$^{21}$ Since overall cost efficiency is the product of technical and allocative efficiency, i.e., $OE = TE \times AE$ (Farrell, 1957), and both TE and AE are less than unity, the estimated TE in the technical-inefficiency-only model (where $OE = TE$ by construction) is likely to be biased upward. In Figs. 1c and d we report kernel densities of the firm-specific efficiency measures. For each bank in the sample, its technical efficiency measure is the mean of the distribution of technical efficiency for that bank, conditional on its data and unconditional on the parameters; in other words, parameter uncertainty is accounted for in estimating technical efficiency. Results from both models show that efficiency values below 70% are highly improbable.$^{22}$
$^{21}$Note that ignoring allocative inefficiency, if present, makes the model misspecified and results in wrong parameter estimates, since the allocative inefficiency component in a translog cost function depends on outputs and input prices in a nonlinear fashion.

$^{22}$We examine convergence of our MCMC sampling schemes using Geweke's convergence diagnostics, as well as by running multiple chains from over-dispersed initial conditions and checking whether the final marginal posteriors are close. The models with and without allocative inefficiency seemed very robust to initial conditions and passed the convergence tests. Additionally, we have examined the autocorrelation functions of the posterior draws. To reduce the autocorrelation, we use the batching technique that is standard in the simulation literature; batch means display practically zero autocorrelations. Results in graphical form are available upon request.
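The batching technique mentioned in footnote 22 is simple: consecutive draws are grouped into batches whose means are far less autocorrelated than the raw chain, and whose dispersion yields a Monte Carlo standard error. A small sketch on a synthetic AR(1) chain (illustrative only):

```python
import random

def batch_means(chain, n_batches):
    """Split an MCMC chain into consecutive equal-size batches and return the
    batch means; these are much less autocorrelated than the raw draws."""
    b = len(chain) // n_batches
    return [sum(chain[i * b:(i + 1) * b]) / b for i in range(n_batches)]

# Synthetic AR(1) chain mimicking autocorrelated posterior draws.
random.seed(0)
x, chain = 0.0, []
for _ in range(10000):
    x = 0.9 * x + random.gauss(0.0, 1.0)
    chain.append(x)

means = batch_means(chain, 50)
overall = sum(means) / len(means)
# Monte Carlo standard error of the posterior mean from the batch means:
mcse = (sum((m - overall) ** 2 for m in means) / (len(means) - 1) / len(means)) ** 0.5
```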
Fig. 2. (a) comparison of efficiency rankings, prior A; (b) comparison of efficiency rankings, prior B.
In Fig. 2 we report the (technical) efficiency rankings of banks from the model with only technical inefficiency and the model with both technical and allocative inefficiency. Generally, the correlations between these rankings are fairly high, but for some specific banks large differences are observed. Thus, if the focus is individual bank efficiency, the high correlation of efficiency rankings between models may not be useful in choosing between models.
5.2.2. Allocative efficiency
We now discuss the results obtained from the system with both technical and allocative inefficiency. The density functions of allocative inefficiency (price distortion), the $\xi_j$'s, are reported in Figs. 3a and b, and some summary measures are reported in Table 1. Each graph$^{23}$ provides the kernel density estimate of the bank-specific average of $\xi_j$. It may be useful to describe again how these density functions arise. For the $i$th bank in the sample, the Gibbs sampler provides a random draw $\xi_{(i)j}^{(s)}$ during iteration $s = 1,\ldots,S$ for input pair $(j,1)$. Recall that this is a draw from the distribution of $\xi_{(i)j}$ conditional on the data and all other parameters of the model, whereas what we need is the density function of $\xi_{(i)j}$ conditional on the data only. To average out parameter uncertainty we compute $\bar\xi_{(i)j} = S^{-1}\sum_{s=1}^{S} \xi_{(i)j}^{(s)}$, which provides a price distortion measure for each input and each bank. Figs. 3a and b present kernel densities of
$^{23}$We do not report posterior predictive distributions for the allocative inefficiency measures since we are interested mostly in inferences regarding the banks in our sample.
Fig. 3. (a) distribution of price distortions, prior A; (b) distribution of price distortions, prior B; (c) distribution of allocative distortions, prior A; (d) distribution of allocative distortions, prior B.
Table 1
Posterior results for functions of interest$^a$

                                            Prior A            Prior B
Posterior predictive technical efficiency   0.963 (0.032)      0.963 (0.035)
Firm-specific technical efficiency          0.868 (0.247)      0.859 (0.257)
$\ln C^{AL}$                                0.099 (0.088)      0.098 (0.088)
$\xi_1$                                     -0.0016 (0.033)    -0.0016 (0.033)
$\xi_2$                                     -0.0056 (0.011)    -0.0055 (0.011)
$\xi_3$                                      0.0029 (0.024)     0.0028 (0.024)
$\xi_4$                                      0.0011 (0.008)     0.0009 (0.008)
$k_1$                                        1.0038 (0.059)     1.0038 (0.058)
$k_2$                                        1.017 (0.026)      1.017 (0.027)
$k_3$                                        0.991 (0.035)      0.991 (0.034)
$k_4$                                        0.996 (0.025)      0.996 (0.025)

$^a$The entries are posterior means; posterior standard deviations appear in parentheses.
$\bar\xi_j^{(i)}$ across banks for each input $j = 1,\ldots,4$. These density functions do not vary much with the prior on the technical efficiency parameter $\sigma_u$. They are concentrated around zero (so we can claim that banks, on average, do not have significant relative price distortions), but they differ in terms of spread and
Fig. 4. Distribution of cost of allocative inefficiency.
overall shape. For labor (input 1), relative price distortions can be as large as 8% in absolute value. For the other inputs the spread is much lower, and distortions range from minus 4% to plus 4%. The difference in spreads reflects the fact that for labor (relative to input 5) banks seem to misperceive prices to a greater extent than for the other inputs. Finally, the fact that these density functions are not particularly tight means that banks are quite heterogeneous in terms of allocative inefficiency.

The density functions of $k_j$ (relative over- or under-use of each input) are reported in Figs. 3c and d, and their summary measures are reported in Table 1. These density functions are kernel density estimates of the bank-specific measures derived from the formula given in the previous section. They are mostly centered around unity, meaning that, on average, banks do not make many allocative mistakes in using their inputs. The considerable spread, however, suggests that banks are highly heterogeneous in their relative input mis-allocation. More specifically, $k_1$ ranges from 0.836 to 1.182, suggesting that banks may under-utilize labor (relative to input 5) by as much as 16.4% or over-utilize it by as much as 18.2%. The remaining $k_j$ range roughly from 0.92 to 1.08, suggesting under- and over-utilization of up to 8%. Thus we observe the presence of considerable allocative inefficiency among the sample banks.

When banks fail to allocate their inputs properly, costs increase; we label this increase the cost of allocative inefficiency. In Fig. 4 we provide kernel densities of the bank-specific percentage increases in cost due to allocative inefficiency, $\ln C^{AL}$. The density functions of $\ln C^{AL}$ are highly skewed to the right. On average, allocative inefficiency increased cost by 10% (meaning that, on average, cost allocative efficiency is about 90%), although there are a few banks for which the increase is much higher. From a practical point of view, optimal use of inputs is driven by the motive of attaining high cost efficiency. Since some inputs are costlier than others, a relatively cheap (expensive) input can be over- (under-) used more than some other
inputs that are relatively costly. Here we have shown how to obtain information on allocative inefficiency (defined in terms of price distortions), on non-optimal input use, and on the cost of non-optimal input use, along with the cost of technical inefficiency.

We have tried several prior tightness parameters for the regression parameters $\beta$, viz., $\varepsilon$ ranging from $10^{-12}$ to 0.1, without noticing large differences in the final results for technical and allocative inefficiency. We have also tried larger values of the prior tightness parameter, and in these cases the results started to differ notably. This type of outcome is quite reasonable. With small values of the prior tightness we can interpret the model as a cost share system, at least in an approximate sense. We necessarily lose this interpretation as the prior tightness increases; the interpretation of technical efficiency then becomes ambiguous, so it should be expected that the results become more sensitive to the prior.
6. Conclusions
In this paper, we developed Bayesian tools for making inferences on firm-specific technical and allocative inefficiency using a system approach. The system considered here is based on the cost minimization behavior of producers. The main contribution of the paper is the estimation of a well-specified translog system (in which the error terms in the cost and cost share equations are internally consistent) in a random effects framework. This solves the Greene problem by using a model that is theoretically consistent and estimating it without treating the inefficiencies as parametric functions of the data and unknown parameters. First, we analyzed the model with only technical inefficiency; then we introduced allocative inefficiency. The model with only technical inefficiency is a standard seemingly unrelated regressions system conditional on the latent inefficiency variable. The model with both technical and allocative inefficiency is a nonlinear seemingly unrelated regression system with nonlinear random effects. We showed that simulation-based numerical Bayesian analysis can be used to provide inferences on the parameters and, more importantly, on functions of interest, viz., technical efficiency, allocative inefficiency, price distortions for each input, non-optimal use of each input, etc., for each firm. The new techniques are applied to a panel of U.S. banks. We compared the results obtained from the two system approaches, namely with and without allocative inefficiency, and the results show some important differences in efficiency estimates across the models.
Acknowledgements
We would like to thank Gary Koop, three anonymous referees, and the participants of the conference for their valuable comments. We alone are responsible for any remaining errors.
Appendix A. Random draws from the conditional posterior density function of $\xi$
In this appendix, we show how to derive the normal approximation to theconditional posterior of x, and then use a Metropolis update to maintain the correctposterior density function. We assume that u is known and we set it to zero withoutloss of generality (meaning, it is subtracted from y1). In general, any nonlinearsystem with random effects can be written as
y1;it ¼ f 1;itðxi; bÞ þ v1;it
..
.
yM ;it ¼ f M;itðxi; bÞ þ vM ;it; i ¼ 1; . . . ; n; t ¼ 1; . . . ;T
where xi � NM�1ð0;OÞ and vit � ½v1;it; . . . ; vM ;it�0 � NMð0;SÞ independently, and f m;it
denotes a given nonlinear function for the mth equation, firm i and year t. We cantake a first-order Taylor expansion with respect to x, consider all observations forthe ith firm, and write the system in obvious notation as
y1;i ¼ f 1;ið0; bÞ þ Z1;ixi þ v1;i
..
.
yM ;i ¼ f M ;ið0; bÞ þ ZM ;ixi þ vM;i
where ½v01;i; . . . ; v0M ;i�
0 � NMð0;S� IT Þ, and Z1;i; . . . ;ZM;i are matrices representingthe first derivatives with respect to x evaluated at the point of approximation.Therefore, Zm;i ¼ qf m;i=qxijx¼0 whose dimension is T � ðM � 1Þ. Apparently, theZm;i’s are functions of the data as well as b. If we combine this linear system with thedistributional assumption about xi we have
0M�1 ¼ IM�1xi þ ei; ei � NM�1ð0;OÞ:
Therefore, we can use standard results from mixed estimation to obtain

$$x_i \sim N_{M-1}\big(\tilde{x}_i, \; [Z_i'(\Sigma^{-1} \otimes I_T) Z_i + \Omega^{-1}]^{-1}\big),$$

where

$$\tilde{x}_i = [Z_i'(\Sigma^{-1} \otimes I_T) Z_i + \Omega^{-1}]^{-1} Z_i'(\Sigma^{-1} \otimes I_T) z_i, \qquad i = 1, \ldots, n,$$

with $z_i = [z_{1,i}', \ldots, z_{M,i}']'$, $z_{m,i} = y_{m,i} - f_{m,i}(0, \beta)$ for $m = 1, \ldots, M$, and $Z_i = [Z_{1,i}' \cdots Z_{M,i}']'$.
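The mixed-estimation mean and covariance above are straightforward to compute for each firm. The following is a minimal sketch; the function name, argument shapes, and NumPy usage are our own, not the paper's code:

```python
import numpy as np

def conditional_posterior(Z_i, z_i, Sigma, Omega, T):
    """Approximate conditional posterior of x_i for one firm.

    Z_i : (M*T, M-1) stacked derivative matrices [Z_1i', ..., Z_Mi']'
    z_i : (M*T,)     stacked residuals y_mi - f_mi(0, beta)
    Sigma : (M, M)   equation error covariance; Omega : (M-1, M-1)
    Returns the mean x_tilde and covariance V of the normal approximation.
    """
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))    # Sigma^{-1} (x) I_T
    P = Z_i.T @ W @ Z_i + np.linalg.inv(Omega)      # posterior precision
    V = np.linalg.inv(P)
    x_tilde = V @ (Z_i.T @ W @ z_i)                 # mixed-estimation mean
    return x_tilde, V
```

In the paper, this normal approximation is then replaced by a Student-t with 10 degrees of freedom as the Metropolis proposal.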
This represents the normal approximation to the posterior conditional density function of any nonlinear system with time-invariant random effects. In practice, instead of a normal density function we use a Student-t with 10 degrees of freedom. These general results must now be specialized to the translog cost share system we analyze. The task is to find the first derivatives of the cost function and share equations with respect to the $x$'s, and evaluate them at $x = 0_{M-1}$. Omitting $i, t$
subscripts and error terms for simplicity, the cost function is

$$\ln C^a = \ln C^0 + \ln C^{AL},$$
$$\ln C^{AL} = \ln G + \sum_j a_j x_j + \sum_j \gamma_{jy} x_j \ln y + \sum_j \sum_k \beta_{jk} x_j \ln w_k + \frac{1}{2} \sum_j \sum_k \beta_{jk} x_j x_k,$$
and $\ln C^0$ is the usual translog cost function. Assuming all restrictions implied by the theory are in place, we get

$$\frac{\partial \ln C^{AL}}{\partial x_j} = \frac{\partial \ln G}{\partial x_j} + a_j + \gamma_{jy} \ln y + \sum_k \beta_{jk} \ln w_k + \sum_k \beta_{jk} x_k.$$
Since

$$G = \sum_l S_l^* \exp(-x_l), \qquad \text{where } S_l^* = a_l + \sum_k \beta_{lk}(\ln w_k + x_k) + \gamma_{ly} \ln y,$$

we obtain

$$\frac{\partial \ln G}{\partial x_j} = G^{-1} \Big\{ \sum_l \beta_{lj} \exp(-x_l) - \exp(-x_j) S_j^* \Big\}.$$
Therefore, $\partial \ln C^{AL} / \partial x_j \,|_{x=0} = 0$. Since $\ln C^{AL}|_{x=0} = 0$, the cost function contributes nothing to the conditional posterior of $x$ up to a first order of approximation. This is particularly important because the cost function is the most complicated function of the system, and omitting the cost function from further consideration results in a computational gain. Next, we consider the share equations. These are given by

$$S_m^a = S_m^0 + \eta_m, \qquad \eta_m = \frac{S_m^0 \{1 - G \exp(x_m)\} + \sum_k \beta_{mk} x_k}{G \exp(x_m)} \quad \text{for } m = 1, \ldots, M-1.$$
Clearly, $\eta_l |_{x=0} = 0$, and $G|_{x=0} = \sum_k S_k^* |_{x=0} = 1$. Moreover, it is easy to show that

$$\frac{\partial [G \exp(x_m)]}{\partial x_j} \bigg|_{x=0} = (S_j^0)^2 + \beta_{jj} \ \text{ if } m = j, \quad \text{and} \quad S_m^0 (1 + S_j^0) + \beta_{mj} \ \text{ if } m \neq j.$$
After some algebra, the derivatives of the allocative inefficiency term with respectto x’s are
qZm
qxj
����x¼0
¼ �S0j ð1� S0
j Þ þ bjj if m ¼ j and S0mS0
j þ bmj if maj:
These partial derivatives are simple functions of the data and $\beta$, and can be computed very easily at no cost conditional on the $\beta$'s. Therefore, we can set up the matrices $Z_i$, and these matrices can, in turn, be used to obtain a draw from the approximate multivariate Student-t posterior conditional density function of the $x_i$'s.
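The resulting draw can be corrected by a Metropolis step so that the exact conditional posterior is maintained. The sketch below uses an independence proposal from a multivariate Student-t centred at the approximate mean; `log_post` stands for the (unnormalized) log conditional posterior of $x_i$, and all names here are our own:

```python
import numpy as np

def metropolis_step(x_cur, x_tilde, V, log_post, rng, df=10):
    """One independence-Metropolis update for x_i using a multivariate
    Student-t(df) proposal with location x_tilde and scale matrix V.
    Proposal constants cancel in the ratio, so only the t kernel is needed."""
    L = np.linalg.cholesky(V)
    def t_draw():
        z = rng.standard_normal(len(x_tilde))
        g = rng.chisquare(df) / df
        return x_tilde + (L @ z) / np.sqrt(g)   # scale mixture of normals
    def t_logkernel(x):
        d = np.linalg.solve(L, x - x_tilde)
        return -0.5 * (df + len(x)) * np.log1p(d @ d / df)
    x_prop = t_draw()
    log_alpha = (log_post(x_prop) - log_post(x_cur)
                 + t_logkernel(x_cur) - t_logkernel(x_prop))
    return x_prop if np.log(rng.uniform()) < log_alpha else x_cur
```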
Appendix B. Prior restrictions imposed by economic theory
Consider the translog cost function
$$\ln C = a_0 + a' \ln w + b' \ln q + \tfrac{1}{2} \ln w' A \ln w + \tfrac{1}{2} \ln q' B \ln q + \ln w' D \ln q + \gamma_t t + \tfrac{1}{2} \gamma_{tt} t^2 + \delta' \ln q \, t + \theta' \ln w \, t \tag{B.1}$$

where $\ln w$ is the $m \times 1$ vector of log prices, $\ln q$ is the $s \times 1$ vector of log outputs, $a_0$ is a constant, $a, \theta$ and $b, \delta$ are $m \times 1$ and $s \times 1$ vectors, respectively, and $A, B, D$ are matrices of dimensions $m \times m$, $s \times s$ and $m \times s$, respectively. The share equations will be in the form
$$S = \mu + F \ln w + G \ln q + \lambda t \tag{B.2}$$

where $S$ is the $m \times 1$ vector of input shares, $F, G$ are matrices of dimensions $m \times m$ and $m \times s$, respectively, and $\lambda$ is an $m \times 1$ vector. We impose exactly the restrictions that $A$ and $B$ are symmetric. Homogeneity
implies
$$1_m' A = 0, \qquad 1_m' D = 0, \qquad 1_m' a = 1 \tag{B.3}$$

where $1_m$ denotes the $m \times 1$ unit vector. Moreover, we have the cross-equation restrictions

$$\mu = a, \qquad F = A, \qquad G = D, \qquad \theta = \lambda. \tag{B.4}$$
In system (B.1), (B.2) we have a total of 110 unrestricted parameters; provided we impose exactly the symmetry restrictions on $A$ and $B$ as well as homogeneity, we have to account for the 44 cross-equation restrictions in (B.3) and (B.4). Given our conventions, if $\gamma$ denotes the $110 \times 1$ unrestricted parameter vector, the restrictions are as follows:

γ67 − γ2 = 0      γ89 − γ4 = 0
γ68 − γ7 = 0      γ90 − γ9 = 0
γ69 − γ8 = 0      γ91 − γ13 = 0
γ70 − γ9 = 0      γ92 − γ16 = 0
γ71 − γ10 = 0     γ93 − γ17 = 0
γ72 − γ11 = 0     γ94 − γ18 = 0
γ73 − γ42 = 0     γ95 − γ52 = 0
γ74 − γ43 = 0     γ96 − γ53 = 0
γ75 − γ44 = 0     γ97 − γ54 = 0
γ76 − γ45 = 0     γ98 − γ55 = 0
γ77 − γ46 = 0     γ99 − γ56 = 0
γ78 − γ3 = 0      γ100 − γ5 = 0
γ79 − γ8 = 0      γ101 − γ10 = 0
γ80 − γ12 = 0     γ102 − γ14 = 0
γ81 − γ13 = 0     γ103 − γ17 = 0
γ82 − γ14 = 0     γ104 − γ19 = 0
γ83 − γ15 = 0     γ105 − γ20 = 0
γ84 − γ47 = 0     γ106 − γ57 = 0
γ85 − γ48 = 0     γ107 − γ58 = 0
γ86 − γ49 = 0     γ108 − γ59 = 0
γ87 − γ50 = 0     γ109 − γ60 = 0
γ88 − γ51 = 0     γ110 − γ61 = 0

Since all restrictions are linear, they can be put in the form $G\gamma = g$, where $G$ is $44 \times 110$ and $g$ is $44 \times 1$. Based on this formulation, a semi-informative prior can be specified in the form $G\gamma \sim N_{44}(g, H)$, where $H$ is $44 \times 44$.
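Pairwise restrictions of this form are easy to encode mechanically. A small sketch with only three of the 44 pairs (indices follow the 1-based numbering in the list above; everything else is illustrative):

```python
import numpy as np

# A few of the 44 pairwise restrictions gamma_a - gamma_b = 0 from the list.
pairs = [(67, 2), (89, 4), (68, 7)]
K = 110                                    # length of the unrestricted vector
G = np.zeros((len(pairs), K))
for r, (a, b) in enumerate(pairs):
    G[r, a - 1], G[r, b - 1] = 1.0, -1.0   # row r encodes gamma_a - gamma_b
g = np.zeros(len(pairs))                   # right-hand side of G gamma = g

# Any gamma obeying the restrictions satisfies G @ gamma == g exactly:
gamma = np.arange(1.0, K + 1)
gamma[66], gamma[88], gamma[67] = gamma[1], gamma[3], gamma[6]
```

A semi-informative prior $G\gamma \sim N(g, H)$ then only requires choosing the $44 \times 44$ matrix $H$.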
References
Aigner, D.J., Lovell, C.A.K., Schmidt, P., 1977. Formulation and estimation of stochastic frontier
production function models. Journal of Econometrics 6, 21–37.
Atkinson, S.E., Cornwell, C., 1994. Parametric estimation of technical and allocative inefficiency with
panel data. International Economic Review 35, 231–244.
Bauer, P.W., 1990. Recent developments in the econometric estimation of frontiers. Journal of
Econometrics 46, 39–56.
Berger, A.N., Humphrey, D.B., 1997. Efficiency of financial institutions: international survey and
directions for future research. European Journal of Operational Research 98, 175–212.
van den Broeck, J., Koop, G., Osiewalski, J., Steel, M.F.J., 1994. Stochastic frontier models: a Bayesian
perspective. Journal of Econometrics 61, 273–303.
Christensen, L.R., Greene, W.H., 1976. Economies of scale in U.S. electric power generation. Journal of
Political Economy 84, 655–676.
Diewert, W.E., 1982. Duality approaches to microeconomic theory. In: Arrow, K.J., Intriligator, M.D.
(Eds.), Handbook of Mathematical Economics, Vol. II. North-Holland, Amsterdam.
Diewert, W.E., Wales, T.J., 1987. Flexible functional forms and global curvature conditions.
Econometrica 55, 43–68.
Diggle, P.J., Liang, K.Y., Zeger, S.L., 1994. Analysis of Longitudinal Data. Clarendon Press, Oxford.
Farrell, M.J., 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society,
Series A 120, 253–281.
Fernandez, C., Koop, G., Steel, M.F.J., 2000. A Bayesian analysis of multiple output stochastic frontiers.
Journal of Econometrics 98, 47–79.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling based approaches to calculating marginal densities.
Journal of the American Statistical Association 85, 398–409.
Geweke, J., 1993. Bayesian treatment of the independent Student-t linear model. Journal of Applied
Econometrics 8, S19–S40.
Geweke, J., 1999. Using simulation methods for Bayesian econometric models: inference, development
and communication (with discussion and rejoinder). Econometric Reviews 18, 1–126.
Greene, W.H., 1980. On the estimation of a flexible frontier production model. Journal of Econometrics 13
(1), 101–115.
Greene, W.H., 1993. The econometric approach to efficiency analysis. In: Fried, H.O., Lovell, C.A.K.,
Schmidt, S.S. (Eds.), The Measurement of Productive Efficiency: Techniques and Applications. Oxford
University Press, Oxford, pp. 68–119.
Greene, W.H., 2001. New Developments in the Estimation of Stochastic Frontier Models with Panel Data.
Department of Economics, Stern School of Business. New York University, NY.
Griffiths, W.E., 2003. Bayesian inference in the seemingly unrelated regressions models. In: Giles, D.,
(Ed.), Computer-aided Econometrics. Marcel Dekker, New York (forthcoming).
Kaparakis, E.I., Miller, S.M., Noulas, A., 1994. Short-run cost-inefficiency of commercial banks: a flexible
stochastic frontier approach. Journal of Money, Credit and Banking 26, 875–893.
Koop, G., Steel, M.F.J., 2001. Bayesian analysis of stochastic frontier models. In: Baltagi, B. (Ed.), A
Companion to Theoretical Econometrics. Blackwell, Oxford, pp. 520–573.
Koop, G., Steel, M.F.J., Osiewalski, J., 1995. Posterior analysis of stochastic frontier models using Gibbs
sampling. Computational Statistics 10, 353–373.
Koop, G., Osiewalski, J., Steel, M.F.J., 1997. Bayesian efficiency analysis through individual effects:
hospital cost frontiers. Journal of Econometrics 76, 77–105.
Kumbhakar, S.C., 1987. The specification of technical and allocative inefficiency in stochastic production
and profit frontiers. Journal of Econometrics 34, 335–348.
Kumbhakar, S.C., 1997. Modeling allocative inefficiency in a translog cost function and cost share
equations: an exact relationship. Journal of Econometrics 76, 351–356.
Kumbhakar, S.C., Lovell, C.A.K., 2000. Stochastic Frontier Analysis. Cambridge University Press,
New York.
Liu, Q., Pierce, D.A., 1994. A note on Gauss–Hermite quadrature. Biometrika 81, 624–629.
Longford, N.T., 1994. Logistic regression with random coefficients. Computational Statistics and Data
Analysis 17, 1–15.
Maietta, O.W., 2002. The decomposition of cost efficiency into technical and allocative efficiency with
panel data of Italian dairy farms. European Review of Agricultural Economics 27, 473–495.
McCulloch, C.E., 1994. Maximum likelihood variance components estimation for binary data. Journal of
the American Statistical Association 89, 330–335.
Meeusen, W., van den Broeck, J., 1977. Efficiency estimation from Cobb–Douglas production functions
with composed error. International Economic Review 18, 435–444.
Pinheiro, J.C., Bates, D.M., 1995. Approximations to the log-likelihood function in the nonlinear mixed-
effects model. Journal of Computational and Graphical Statistics 4, 12–35.
Roberts, G.O., Smith, A.F.M., 1994. Simple conditions for the convergence of the Gibbs sampler and
Metropolis–Hastings algorithms. Stochastic Processes and their Applications 49, 207–216.
Schmidt, P., Lovell, C.A.K., 1979. Estimating technical and allocative inefficiency relative to stochastic
production and cost frontiers. Journal of Econometrics 9, 343–366.
Tanner, M.A., Wong, W.H., 1987. The calculation of posterior distributions by data augmentation.
Journal of the American Statistical Association 82, 528–550.
Tierney, L., 1994. Markov chains for exploring posterior distributions (with discussion). Annals of
Statistics 22, 1701–1762.
Tsionas, E.G., 1999. Full likelihood inference in normal-gamma stochastic frontier models. Journal of
Productivity Analysis 13, 179–201.
Vonesh, E.F., Chinchilli, V.M., 1997. Linear and Nonlinear Models for the Analysis of Repeated
Measurements. Marcel Dekker, New York.
Wolfinger, R.D., 1993. Laplace’s approximation for nonlinear mixed models. Biometrika 80, 791–795.
Wolfinger, R.D., Lin, X., 1997. Two Taylor-series approximation methods for nonlinear mixed models.
Computational Statistics and Data Analysis 25, 465–490.
Zellner, A., 1971. Introduction to Bayesian Inference in Econometrics. Wiley, New York.