Upload
jake-daedalus
View
14
Download
0
Embed Size (px)
Citation preview
http://smr.sagepub.com/Research
Sociological Methods &
http://smr.sagepub.com/content/40/3/419The online version of this article can be found at:
DOI: 10.1177/0049124111415367
July 2011 2011 40: 419 originally published online 29Sociological Methods & Research
Robert M. O'BrienConstrained Estimators and Age-Period-Cohort Models
Published by:
http://www.sagepublications.com
can be found at:Sociological Methods & ResearchAdditional services and information for
http://smr.sagepub.com/cgi/alertsEmail Alerts:
http://smr.sagepub.com/subscriptionsSubscriptions:
http://www.sagepub.com/journalsReprints.navReprints:
http://www.sagepub.com/journalsPermissions.navPermissions:
http://smr.sagepub.com/content/40/3/419.refs.htmlCitations:
What is This?
- Jul 29, 2011Proof
- Aug 15, 2011Version of Record >>
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
Constrained Estimatorsand Age-Period-CohortModels
Robert M. O’Brien1
Abstract
If a researcher wants to estimate the individual age, period, and cohort coeffi-cients in an age-period-cohort (APC) model, the method of choice is con-strained regression, which includes the intrinsic estimator (IE) recentlyintroduced by Yang and colleagues. To better understand these constrainedmodels, the author shows algebraically how each constraint is associatedwith a specific generalized inverse that is associated with a particular solutionvector that (when the model is just identified under the constraint) producesthe least square solution to the APC model. The author then discusses the ge-ometry of constrained estimators in terms of solutions being orthogonal to con-straints, solutions to various constraints all lying on a line single line inmultidimensional space, the distance on that line between various solutions,and the crucial role of the null vector. This provides insight into what character-istics all constrained estimators share and what is unique about the IE. The firstpart of the article focuses on constrained estimators in general (including the IE),and the latter part compares and contrasts the properties of traditionally con-strained APC estimators and the IE. The author concludes with some cautionsand suggestions for researchers using and interpreting constrained estimators.
Keywords
age-period-cohort models, constrained model estimation, intrinsic estimator
Article
1University of Oregon, Eugene, OR, USA
Corresponding Author:
Robert M. O’Brien, 720 PLC University of Oregon, Eugene, OR 97403
Email: [email protected]
Sociological Methods & Research40(3) 419–452
ª The Author(s) 2011Reprints and permission:
sagepub.com/journalsPermissions.navDOI: 10.1177/0049124111415367
http://smr.sagepub.com
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
Constrained estimators are the most popular approach to estimating the indi-
vidual coefficients in age-period-cohort (APC) accounting models.1 Conven-
tionally, a single constraint is placed on these models to make them just
identified. Unfortunately, since each of the constrained models fits the
data equally well, the validity of the chosen constraint cannot be judged
from the model fit. Instead, in the conventionally constrained APC models,
the researcher must decide on the basis of theory or past research the appro-
priateness of the constraint used to identify the model. Setting the constraint
in a defensible manner, however, is the Achilles heel of this approach, and
the constraint chosen can greatly affect the estimated coefficients.
Almost 30 years ago, Kupper et al. (1983) discussed a constrained estima-
tor for age, period, and cohort effects that they called the principal compo-
nents estimator. This estimator produces coefficients identical to those of
the recently introduced intrinsic estimator (IE), which is also a constrained
estimator. Kupper et al. (1983) note, as proved by Yang, Fu, and Land
(2004) for the IE, that the principal components estimator has minimum var-
iance. Beyond this, however, Kupper et al. (1983) did not appear to believe it
had any special usefulness in the analysis of APC data, pointing out that us-
ing this estimator could lead to more bias (in the sense of differences between
the expected value of the estimates and the true underlying generating pa-
rameters) than the use of some other constraint (Kupper et al. 1983).
Because of its minimum variance property and other additional properties,
Yang and associates recommend using the IE in the analysis of APC data.
Yang et al. (2004) correctly note the important contributions of Fu (Fu
2000; Fu, Hall, and Rohan 2004; Knight and Fu 2000) in the development
of the IE and in investigating its properties. It is Fu who has most extensively
published work on the IE.
Yang et al. (2004) recently introduced the IE to sociologists. After the spe-
cific introduction of the IE to sociologists in Sociological Methodology in
2004, it was further described in an article in the American Journal of Soci-
ology (Yang et al. 2008). Yang and associates (2008:1699) view the IE as ‘‘a
general-purpose method of APC analysis with potentially wide applicability
in the social sciences.’’ They review several properties of the IE (many of
which, as I discuss in the following, are shared with other constrained
estimators).
Given the widespread use of constrained APC models and the introduc-
tion of a new and arguably improved constraint into the sociological litera-
ture, this article explicates and clarifies the basic properties of constrained
linear models as they are used in the APC context. This should help research-
ers in several ways: (1) understanding the process of choosing an appropriate
420 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
constraint, (2) comprehending the relationship of the results based on using
different constraints, (3) untangling which characteristics are unique to the IE
and which it shares with other constrained estimators, and (4) facilitating
a comparison of the IE to other constrained estimators.
I begin with a brief discussion of the general APC accounting model and
the identification problem and then move to the use of generalized inverses,
which implement constrained estimation. These inverses provide a set of sol-
utions even though the unconstrained model is not identified. I then show
how these constraints can be implemented in different ways.
Next I turn to the geometry of constrained estimation with the aim of de-
veloping a common framework for evaluating different constrained estima-
tors. I then address, among other issues, bias in constrained estimators, the
relationship of the constraint to the results it produces, the relationship of
the IE estimator to other constrained estimators, and some unique features
produced by the constraint used in the IE approach. That discussion is fol-
lowed by a simulation designed to show some of these properties and to dem-
onstrate how a series of different constraints works in practice. I conclude
with an evaluation of the potential advantages of the IE relative to other con-
strained estimators and one potentially helpful situation in which researchers
might gain confidence from using different constrained estimators.
In the next two sections I briefly describe the accounting (multiple clas-
sification) model utilized in this literature and the identification problem
that analysts using this model face.
Multiple Classification Analysis for APC Models
Age groups, periods, and cohorts are often coded with dummy variables by
APC analysts. In this article I will use a similar coding scheme: effect coding.
I employ this coding because it is used in the IE approach (Kupper et al.
1983, 1985; Yang et al. 2004, 2008). This coding (as well as dummy variable
coding) does not constrain the functional form of the relationship between
the dependent variable and the age groups, periods, and cohorts. I use the
multiple classification model represented in equation (1) throughout this
article,2
Yij ¼ mþ ai þ pj þ ca�iþj þ eij; ð1Þ
where Yij is the age-period–specific value of the dependent variable, m repre-
sents the intercept, ai represents the effect of the ith age group, pj denotes the
effects of the jth period, ca–i+j represents the effects of the (a–i+j)th cohort
O’Brien 421
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
(where a equals the number of age groups), and eij represents the error term
[E(eij) ¼ 0]. I use age as the row variable, period as the column variable,
and cohorts are on the diagonals (see Table 1 for an example age-period
data matrix).
Rank Deficient Matrix
The identification problem in APC analysis arises because of the linear depen-
dency between the age, period, and cohort variables. If we know a person’s age
group and the period, we can determine their birth cohort (C ¼ P – A); if we
know their age group and cohort, we can determine the period (P ¼ C + A);
and if we know their birth cohort and the period, we can determine their age (A
¼ P – C). This is the linear dependency that prevents a unique solution to the
regression of the dependent variable on the effect coded (or dummy coded)
variables for age groups, periods, and cohorts (even when we have one refer-
ence category for each set of effect coded [dummy coded] variables).
In matrix notation, we can represent the multiple classification equation
(1) as:
Y =Xb+ e; ð2Þ
where Y is an ap × 1 vector of the age-period–specific rates (counts, or some
other form of dependent variable). The X-matrix is an ap × 2(a + p) − 3
matrix containing ones in the first column and effect coded variables in
the remaining columns. The order of the columns can be schematized as
[1, (a – 1), (p – 1), (a + p – 2)], where a – 1 represents the number of age
effect coded variables, p – 1 represents the number of period effect coded
variables, and a + p – 2 represents the number of cohort effect coded variables.
Each of these effect coded variables has ap entries (zeros and ones) in its col-
umn (and minus one in the row corresponding to the reference category) and is
coded to correspond to the ‘‘cell’’ in which the age-period–specific value of Y
Table 1. Age-Period Table With Data Generated as Described in the Text
Period 1 Period 2 Period 3 Period 4 Period 5
Age 1 10.5 11.0 11.5 14.0 14.5Age 2 10 10.5 11.0 13.5 14.0Age 3 9.0 9.0 9.5 12.0 12.5Age 4 8.0 8.0 8.0 10.5 11.0Age 5 7.0 7.0 7.0 9.0 9.5
422 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
resides.3 The last categories of age, period, and cohort serve as the reference
categories in our representation.
When a regular inverse exists, solving an equation of the form of (2) for
a unique set of least square regression coefficients is trivial:
b= X 0Xð Þ�1X 0Y , ð3Þ
where X 0 is the transpose of X and the superscripted –1 indicates the inverse.
The problem with this solution in the APC context occurs because of the lin-
ear dependency in the X-matrix, which means that the standard inverse of
X 0X does not exist. If we label the number of columns in X as m [¼ 2(a +
p) – 3], only m – 1 of these columns are linearly independent.
This linear dependency means that there is a set of coefficients that when
multiplied times the columns of X produces a column vector of zeros (with
the restriction that not all of these multiplying coefficients are zero). This
vector of coefficients is called the null vector. In general, one and only
one such vector of coefficients exists for the APC model (this vector is
unique up to multiplication by a scalar). In the language of linear algebra,
we say that there is a nontrivial solution to the a × p homogeneous equa-
tions; and we can write Xv ¼ 0, where X is the ap × 2(a + p) – 3 design
matrix and v is a 2(a + p) – 3 × 1 vector (the null vector). This vector is
said to be in the null space of X. The existence of only one such vector indi-
cates that the rank of X is just one less than full column rank and that a single
linear constraint should allow for a solution to the identification problem.
Constraints and Generalized Inverses
Even with a rank deficient matrix, it is still possible to find a least squares
solution to (2) by using a generalized inverse (Searle 1971). Typically, the
generalized inverse is denoted with a superscripted minus rather than a super-
scripted minus 1. I have added the subscript c to denote that generalized in-
verses differ depending upon the constraint with which they are associated:
X 0Xð Þ�c . For the same reason I have subscripted the solution vector for b with
c. That is, a particular set of solutions is associated with the particular gen-
eralized inverse used to solve the equation and that inverse is associated with
a particular constraint on the solutions:
bc = X 0Xð Þ�c X 0Y : ð4Þ
This allows us to compute solutions in the constrained APC model, and
each solution is determined by the constraint. The solution produced is a least
O’Brien 423
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
squares solution; that is, each solutions (bc) associated with the particular
constraint produces the same predicted values of Y and these values minimize
the sum of the squared residuals. Thus, �(Y � Y )2 is the same no matter
which generalized inverse is used to solve the equation.4 It is well known
in APC modeling that the fit cannot be used to differentiate the validity of
different constraints (assuming the use of just one constraint so that the mod-
el is just identified).
The way that constrained estimation is typically employed in APC anal-
ysis is to place a single constraint on X so that the model is just identified.
I will use the symbol Gc to represent the generalized inverse associated
with a particular constraint, rather than the more awkward notation(X 0X )�c .
With the generalized inverse associated with a particular constraint in
hand, we multiply X 0Y in (3) by the generalized inverse to obtain a solution:
bc =GcX 0Y . This solution is determined by the constraint. Mazumdar, Li,
and Bryce (1980) explicate this correspondence between a constraint, its gen-
eralized inverse, and the solution. They show how to derive a specific gen-
eralized inverse from the constraint.
When a constraint is used in the APC context, the constraint convention-
ally involves setting two of the coefficients associated with age, period, or
cohort to be equal. The choice of the constraint then determines a generalized
inverse used to solve for the age group, period, and cohort coefficients. The
choice of a constraint is crucial, since it determines the set of estimated val-
ues (bc) and whether those estimated values are unbiased. The assumption is
that c0β= 0, where c is the m × 1 vector for the constraint and β is the m × 1
vector of population effect coefficients that generated the data.5 If c0b 6¼ 0,
the estimate under that constraint will be biased in that its expected value
will not equal the value of parameters that generated the data (for a further
explication of the bias associated with violating this assumption, see Kupper
et al. 1983). This is true for the conventionally used constraints and the con-
straint associated with the IE.
The constraint associated with the IE is that the solution must be perpen-
dicular to the null vector (the vector that when pre-multiplied by X produces
an m × 1 zero vector). Specifically, if v represents the null vector, the as-
sumption is that v0β= 0. Since v is a specific constraint, we can write
c0β= 0, where it will be clear from the context that c is the null vector or a dif-
ferent constraint. To the extent that this assumption is not true, the estimates
associated with this constraint will be biased in the sense that they are biased
estimates of the parameters that generated the age-period–specific rates. As I
discuss in the following, when Yang et al. (2004, 2008) rightly assert that
their estimate is unbiased, it is an unbiased estimate of the least squares
424 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
solution in the subspace that is perpendicular to the null vector. That does not
mean that the IE is an unbiased estimate of the parameters that generated the
age-period–specific rates, the generating parameters.
To make the discussion more concrete, Figure 1 presents three constraints.
These constraints are based on a five-by-five age by period matrix, where the
columns of the X matrix contain one intercept, four age group effect coded
variables, four period effect coded variables, and eight cohort effect coded
variables. The first two constraints are conventional constraints that might
be used in APC analysis. As previously argued in the literature (Kupper
et al. 1985; Mason et al. 1973; Smith 2004), the constraint should be based
on theory and/or empirical evidence that the constraint applies to the popu-
lation under consideration. The third constraint is the one associated with the
IE. The first two constraints fix two of the effect coefficients to be equal to
each other and the third uses the null vector of X as the constraint. The as-
sumption for unbiased estimates of the generating parameters is that the
dot product of the constraint vector and the vector of the parameter values
associated with the process that generated the Y-values is equal to zero
(the reference categories are omitted). In the first two cases this will be
true; respectively, if the population parameter values for age1 and age2 are
equal to each other or if the population parameter values for cohort7 equals
the value for cohort8. The assumption when using the constraint associated
with the null vector is more complicated (though the IE is easily solved for
using an ‘‘add-on’’ program for STATA; cited in Yang et al. 2008); specif-
ically, for this five-by-five age-period matrix, the assumption is that 0.000 ×bint + (− 0.267 × bage1) + (−0.134 × bage2) + . . . + (.0401 × bcoh8) ¼ 0.
We can write the assumptions associated with each of these restrictions as
c0(age1= age2) · β= 0, c0(coh7= coh8) · β= 0, and c0(nullvector) · β= 0. Each of these
places a linear constraint on the solution. The question is which one of these
constraints is best with respect to obtaining a substantively correct result (as-
suming that the researcher wants to determine the effects of age, period, and
cohort coefficients in generating the outcomes: e.g., suicide rates, homicide
rates, or marriage rates).
Mazumdar et al. (1980) implement the constraints in the following man-
ner: (1) Compute X 0X ; (2) replace the last row of X 0X with the constraint that
we chose to use; for example, c0(age1= age2)or c0(nullvector); (3) compute the in-
verse of this new matrix; and (4) replace the last column of this inverse
with zeros. The result is the generalized inverse (Gc) associated with the par-
ticular constraint. Using this process, we can find a set of least squares sol-
utions under a particular constraint: bc =GcX 0Y .6 When the null vector is
substituted for the aforementioned last row of X 0X, the generalized inverse
O’Brien 425
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
that results is one that yields a solution for bc that is the same as if we had
used the Moore-Penrose generalized inverse: the inverse used in the IE ap-
proach (Mazumdar et al. 1980; Yang et al. 2008). This result occurs because
the Moore-Penrose generalized inverse institutes this constraint, although the
Moore-Penrose generalized inverse has some other valuable properties for
those using matrix algebra.
Simple Implementations of Constrained Regression
Researchers wanting to implement the first two constraints in Figure 1 may
choose to do so using a simple recoding of their data. To implement the first
constraint based on age group 1 and age group 2 having the same effects, one
can simply create a new variable (newage1_2) that is the sum of age1 and
age2 and use it in the analysis rather than age1 and age2. This is equivalent
to creating a single column in X that is equal to the age1 column plus the age2
column and using this column in place of the columns for age1 and age2 in
the regression analysis. The coefficient associated with newage1_2 is the co-
efficient for both age1 and age2. To implement the second constraint, one
can add the variable for cohort7 and cohort8 to create the new variable (new-
cohort7_8) and use it in the analysis instead of cohort7 and cohort8. It is
cage1=age2
ccoh7=coh8
cnull vector
intercept 0.000 0.000 0.000age1 1.000 0.000 -0.267age2 -1.000 0.000 -0.134age3 0.000 0.000 0.000age4 0.000 0.000 0.134
period1 0.000 0.000 0.267period2 0.000 0.000 0.134period3 0.000 0.000 0.000period4 0.000 0.000 -0.134cohort1 0.000 0.000 -0.535cohort2 0.000 0.000 -0.401cohort3 0.000 0.000 -0.267cohort4 0.000 0.000 -0.134cohort5 0.000 0.000 0.000cohort6 0.000 0.000 0.134cohort7 0.000 1.000 0.267cohort8 0.000 -1.000 0.401
Figure 1. Three different constraints on the X-matrix
426 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
a straightforward extension to implement a constraint in which age1 ¼ 0.5
× age2. One can construct this constraint by adding 0.5 × age1 to age2
(newage1_2 ¼ age2 + 0.5 × age1). The new column generated by this
transformation contains 0.5, 1, 0, 0, -1.5, 0.5, 1, . . . (where –1.5 is for the
reference group, the oldest age group in our example), and this pattern repeats
until there are no more rows in this column for this newly created variable.
Then we can run a regular regression using newage1_2, age3, . . . coh7,
coh8 as the regressors. The coefficient for newage1_2 is the coefficient for
age2 and .5 times that coefficient is the coefficient for age1. Another approach
uses constrained regression, which employs the constraint: age1 ¼ 0.5 ×age2. Constrained regression can also be used with the first two examples.
Although almost all researchers will want to use the convenient ‘‘add-on’’
program for STATA (cited by Yang et al. 2008) to calculate the IE, it is
possible to implement the null vector constraint and produce the coefficient
estimates from the IE using constrained regression or by transforming the
columns of X using the strategy described in the previous paragraph. The
null vector in Figure 1 means that 0 ¼ 0.00intercept – 2.67a1 – .134a2 +
0.00a3 + .134a4 + . . . +.267c7 + .401c8. Adding 2.67a1 to both sides of the
equation and dividing both sides of the equation by 2.67 yields a1 ¼ 0.00in-
tercept –.50a2 + 0.00a3 + .50a4 + . . . – 1.00c7 – 1.50c8. In this particular rep-
resentation, we see that a1 is a constrained linear function of the columns of
the other independent variables in X. If we implement this constraint (that is,
age1 ¼ –.50age2 + 0.50age4 + . . . – 1.00coh7 – 1.50coh8) in a constrained
regression program, we obtain the same estimates produced by the STATA-
based IE program. A reviewer of this article suggested transforming the col-
umns of independent variables in the following manner (although noting that
the procedure is tedious): newage2 ¼ age2 – 0.5 × age1; newage3 ¼ age3;
newage4 ¼ age4 + 0.5 × age1; . . . ; newcoh7 ¼ coh7; newcoh8 ¼ coh8 +
1.5age1. Using these values as regressors implements the ‘‘null vector con-
straint’’ and produces coefficient estimates equivalent to those from using
the IE.
The Geometry of Constrained Regression
When X 0X is one less than full column rank, as it is in the APC situation, the
solution to the m × 1 b vector of coefficients is in an m dimensional solution
space. Specifically, it lies in an m – 1 subspace of that solution space that is
orthogonal to its constraint. We know from the previous sections how to
estimate the m elements of the b vector by using a generalized inverse asso-
ciated with a constraint, by using constrained regression, or by algebraically
O’Brien 427
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
applying the constraint using the columns of X. To provide intuition about
how constrained regression works geometrically, we use a simple illustration
in which X 0X is a 3 × 3 matrix. Many essential properties of constrained
regression can be illuminated using this approach.
In the example I use the following specific data for Xb ¼ Y:
1 4 2
1 2 1
1 4 2
1 2 1
2664
3775
b1
b2
b3
24
35=
19
11
19
11
2664
3775; ð5Þ
which produces the normal equations for (X 0X)b ¼ (X 0Y)
4 12 6
12 40 20
6 20 10
24
35
b1
b2
b3
24
35=
60
196
98
24
35: ð6Þ
Here X 0X is singular reflecting the linear dependency in X. The second col-
umn is twice the third and the null vector is (0, 1, –2),7 since 0 times the first
column of X plus 1 times the second column of X plus –2 times the third col-
umn of X results in a 3 × 1 column vector of zeros.
Figure 2 has three axes b1, b2, and b3 (the three-dimensional solution space)
with the b1 axis representing the intercept. The null vector (0, 1, –2) is repre-
sented by the stippled line through the origin of (b1, b2, b3) that lies on the b2-b3
plane (since its b1 value is zero). The slope of the null vector on the b2-b3 plane
is –2.0 (if b2 increases by 1.0, b3 decreases by 2.0). The line of solutions, the
line on which every constrained solution appears, is three units above the
b2-b3 plane. Its slope with respect to the b2-b3 plane is the same as that of
the null vector, –2.0. The line of solutions and the null vector are parallel to
each other. The line of solutions for our data is always three units above the
b2-b3 plane; the value of the intercept remains constant at 3.0 for all solu-
tions using the constrained regression approaches outlined earlier for the
data in the example. We can determine the line of solutions in this simple
example by using (6), if we bear in mind that the intercept (b1) ¼ 3. If b1 ¼3 and we set b3 to zero b2 must be 4, and if we set b2 to zero b3 must be 8. If
we connect these two points, (3, 4, 0) and (3, 0, 8), we have the line on
which all of the constrained solutions in this example must fall. This line
is labeled as the line of solutions in Figure 2. Another way to determine
this line is to find the intersection of the first two planes represented by
the (first two) normal equations in (6). The problem is that the remaining
428 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
normal equation represents a plane on which this line lies. Since this plane
does not intersect the line of solutions in a single point (because of the lin-
ear dependency), we do not know where on the line of solutions to chose
a solution. We can force this plane to intersect the line of solutions at a sin-
gle point by setting a constraint on the solution.
We have seen earlier that we cannot solve this problem (equation 6) using
a regular inverse. We can, however, use a generalized inverse to find a solu-
tion for the bs for our example data. Different generalized inverses are asso-
ciated with different constraints and yield different solutions—all lying on
the line of solutions that is parallel to the null vector. Using the system
b2
b3
4
8
b1
(0,0,0)
90o
Line of Solutionsbc = GcX’Y
(Values in our Example)
3
3
Null Vector of X’X
(0, 1, -2)
Figure 2. Geometric view of constrained regression in three dimensions
O’Brien 429
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
described earlier (Mazumdar et al. 1980) to obtain the generalized inverses
for the data in (6) and solving for the bc vectors under different constraints,
we can confirm that all of the solutions fall on this line of solutions depicted
in Figure 2. We can, of course, obtain the same estimates using constrained
regression or transformations of the columns of X to implement the con-
straint. For example, to obtain the solution for the constraint (0, –1, 1), which
implies that b2 ¼ b3, we substitute this constraint vector for the last row of
the X 0X; find the inverse of this new matrix; and then substitute a column of
zeros for the last column of this inverse matrix. Pre-multiplying X 0Y by this
generalized inverse yields the solution vector (3.0, 2.67, 2.67) 0.Figure 3 depicts this solution geometrically. The constraint passes through
the origin (0, 0, 0) and has a slope of –1.0 with the b2-b3 plane. The solution
plane must be perpendicular to the constraint (it must have a slope of 1.0 with
respect to the b2-b3 plane) and pass through the b1 axes. The plane depicted in
Figure 3 has just such a slope with respect to the b2-b3 plane, is perpendicular
to the b2-b3 plane, and passes through the origin. Where this solution plane
intersects the line of solutions (3.0, 2.67, 2.67) is the solution under this
constraint.
If we repeated this process to obtain the ‘‘Moore-Penrose solution,’’ the
difference is that this solution is orthogonal to the null vector.8 Therefore,
we substitute the null vector for the last row of the X 0X; find the inverse
of this new matrix; and then substitute a column of zeros for the last column
of this inverse matrix. Pre-multiplying X 0Y by this generalized inverse yields
the solution vector (3.0, 3.2, 1.6) 0. Geometrically, the slope of the null vector
(0, 1, –2) to the b2-b3 plane is –2.0, and the ‘‘slope’’ of the solution plane rel-
ative to the b2-b3 plane (which must be orthogonal to this constraint) is .5.
This solution is depicted in Figure 4 as occurring where the solution plane
intersects the line of solutions. This plane is orthogonal to the null vector
(its slope with respect to the b2-b3 plane is 0.5), and it intersects the line
of solutions at the point (3.0, 3.2, 1.6). It is easy to check that this solution
vector is perpendicular to the null vector: namely, (0, 1, –2) · (3.0, 3.2,
1.6) 0 ¼ 0.
With the constraint (0, –1, 0.5) the estimated coefficients are (3.0, 2.0,
4.0), and using the constraint (0, –1, 0.1) yields (3.0, 0.6667, 6.6667). The
constraints do not allow just any solution to emerge, but only solutions
that are on the single line (the line of solutions) that is parallel to the null vec-
tor. Note also that each solution is orthogonal to its constraint. For example,
(0, –1, 0.5) · (3.0, 2.0, 4.0) 0 ¼ 0 and (0, –1, 0.1) · (3.0, 0.6667, 6.6667) 0 ¼ 0.
If we substitute these solutions (bc) into (5), each one produces the same pre-
dicted values of Y. This is part of what it means to be on this line of solutions.
430 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
If we were careful geometers, we could produce the constrained solutions
for the data in (6) corresponding to any constraint by first constructing the
line of solutions, then the solution plane that is orthogonal to the constraint,
and then determining where this plane intersects the line of solutions. We
would always find that the intercept is 3.0, since the line of solutions (for
this data) is always 3.0 units above the b2-b3 plane.
b2
b3
4
8
b1
Constraint(0,1,-1)
2.67
2.67
b2 = b3
Null Vector
45o
Solution(b1=3.0, b2= 2.67, b3=2.67)
Line of Solutionsbc = GcX’Y
(Values in our Example)
Figure 3. Geometric view of constrained regression in three dimensions: b2 ¼ b3
O’Brien 431
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
We can characterize the differences between various constrained solu-
tions by means of the factor kv, where k is a scalar and v is the null vector.
For example, one of our solutions is (3.0, 2.6667, 2.6667) and another is
(3.0, 0.6667, 6.6667). We can move from the first to the second solution
by adding –2 × (0, 1, –2) to the first solution. This manipulation simply
moves us along the line of potential solutions, here moving from (3.0,
2.6667, 2.6667) to reach (3.0, 0.6667, 6.6667). This relationship keeps all
b2
b3
4
8
b1
(0,0,0) 3.20
1.60
b3 = ½ b2
Null Vector
22.5o
Line of Solutionsbc = GcX′Y
(Values in our Example)
Moore-Penrose Solution(Values in Our Example)(b1=3.0; b2=3.2; b3=1.6)
Figure 4. Geometric view of constrained regression in three dimensions: b3 ¼ 1⁄2 b2:The Moore-Penrose solution
432 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
of the possible solutions on the line of solutions. Note that k depends on how
we scale the null vector, since any scalar multiple of (0, 1, –2) is also the null
vector.
Two ways to view this movement are first, as a movement from one
solution to the other solution by moving parallel to the axes. To move
from the point (3.0, 2.6667, 2.6667) to (3.0, 0.6667, 6.6667), move 0.0 units
on the b1 axis; –2 units parallel to the b2 axis and up 4 units parallel to the b3
axis where we intercept the line of solutions at (3.0, 0.6667, 6.6667). The sec-
ond geometric interpretation is based on the length that must be traveled on
the line of solutions from one solution to the other. The length of the null vec-
tor is the square root of the sum of its squared components: In our example
the square root of (02 + 12 + –22) ¼ 2.236. The distance along the line of
solutions between (3.0, 2.6667, 2.6667) and (3.0, 0.6667, 6.6667) is twice
this distance: 4.472. The unit of measurement for k is the length of the
null vector. We must move 2 units [2 × sqrt(v 0v)] along the line of solutions
to reach (3.0, 0.6667, 6.6667) from (3.0, 2.6667, 2.6667). The difference be-
tween any two constrained estimates may be written as bc – bc) ¼ kv, where
c and c* are two different constraints.
When we move to a situation in which there are four or more dimensions,
the solutions for any one set of data using different constraints still lie on
a line that is parallel to the null vector. This (one dimensional) line is then
in an m dimensional space rather than in a three-dimensional space. The
line of solution is intersected by an m – 1 dimensional hyperplane that is or-
thogonal to the constraint. Where this hyperplane intersects, the line of sol-
utions provides the solution associated with the particular constraint. We can
reach all of the solutions for the constrained estimators on the line by taking
any solution (from one of the generalized inverses) and adding a scalar times
the null vector to it. The scalar, k, still tells us how far we must travel on the
line of solutions with the units of measurement for k being the length of the
null vector (which now has m elements). Each set of solutions is still perpen-
dicular to the constraint.
Relevance to APC Models
In the class of APC models that I examine, a single linear constraint is placed
on the model that makes the model just identified. In this class of models, the
solution is always orthogonal to any constraint. The intercept is always the
same no matter which linear constraint is chosen. All of the solutions using
a single linear constraint lie on a single line of solutions in multidimensional
space. All of these solutions fit the model equally well in terms of predicted
O’Brien 433
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
values. All of the solutions are related to each other by kv. The solution is
determined by the intersection of the line of solutions with an m – 1 hyper-
plane, and the direction of that hyperplane is orthogonal to the constraint.
Before leaving this introduction to the geometry of constrained regres-
sion, I should mention a unique property of using the null vector as a con-
straint. The solution that it produces is closer to its constraint than using
any other constraint. In Figure 4 the solution plane perpendicular to the
null vector has a slope of .50 with respect to the b2-b3 plane (a one-unit in-
crease on b2 is associated with a 1⁄2 -unit increase in X3). The length of the
solution vector is the distance from the b1, b2, b3 origin to the point of solu-
tion on the line of solutions. For example, the length of the vector from the
origin to the solution on the line of solutions using the null vector is 4.669 (¼sqrt(3.02, 3.22 + 1.62)). Using the constraint (0, 1, –1), the distance to the so-
lution is 4.819 (¼ sqrt(3.02, 2.6672 + 2.6672)). This is often cited as an ad-
vantage of using the Moore-Penrose solution: ‘‘If we want to single out
one particular member of this solution set of vectors as a representative,
we might want to pick the one with the smallest length’’ (Press et al.
1992:62). We can also see that it is representative in the sense that it is in
the ‘‘middle’’ of a line of solutions that stretches out in either direction. It
may be this property that inspired Smith’s (2004:116) comment: ‘‘There is
also a sense in which the IE is an average of CGLIM [Constrained General-
ized Linear Model] estimates.’’
Some Shared Characteristics of Constrained APCModels
In this section I discuss some of characteristics that are shared by all con-
strained APC models, including the IE. I also discuss the characteristics of
the IE that are different from other constrained models. The aim is to provide
researchers with a basis for: understanding how constraints are related to sol-
utions, comprehending how the estimated coefficients from the various con-
strained models are related, and judging how much confidence should be
placed in the estimates.
A Solution for All of the Effect Coefficients
Because the APC model without an additional constraint is not of full column
rank, it is underidentified and there is no solution that provides a unique vec-
tor of age, period, and cohort coefficients that corresponds to the generating
434 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
parameters for the data. There are instead an infinite number of solutions for
the age, period, and cohort coefficients that produce least square estimates.
Any constrained regression approach provides estimates of the unique ef-
fect coefficients for each age group, period, and cohort under the particular
constraint. They are unbiased estimates in the sense that they are unbiased
estimates under a particular constraint. The crucial question, from our per-
spective, is whether the parameter estimates are unbiased, in the sense that
their expected values equal the values of the parameters that generated Y:
Does, in fact, c0b= 0?
Solution Space Is Perpendicular to the Constraint
Yang et al. (2004:82) note that ‘‘the parameter space P can be decomposed as
P=N⊕Y, where ⊕ is the direct product of two linear spaces that are per-
pendicular to each other; N is the one-dimensional null space of X spanned by
the vector {sB0}, with real number s; and � is the complement subspace or-
thogonal to N.’’ In our notation B0 is the null vector (v) and s is the scalar by
which the null vector may be multiplied, since it is unique up to multiplica-
tion by a scalar. In general, I note that the parameter space can be decom-
posed as P=Bc ⊕Πc where Bc is a linear constraint spanned by the
vector {sc} with real numbers s and constraint vector c; and Πc is a comple-
ment subspace orthogonal to Bc. The linear constraint, like the null vector, is
unique up to multiplication by a scalar. The solution vector must be orthog-
onal to the constraint, since the solution vector is determined by a point on
the constrained hyperplane, and the hyperplane is orthogonal to the con-
straint. Specifically, this point is the intersection of this hyperplane (with
its direction constrained to be orthogonal to the constraint) and the line of
solutions.
As noted earlier, all of the solutions for the different constraints (with
a fixed set of data) must fall on a line parallel to the null vector. This restric-
tion on the solutions occurs because of the linear dependency in the original
data. For each constrained solution, the constraint determines the direction of
the solution plane/hyperplane.
Bias
Arguably the most important consideration in setting a linear restriction is the
degree of bias associated with that restriction. The constraint chosen by the
researcher determines the amount of bias associated with the particular con-
strained estimate. Since the effect coefficients are estimated under this
O’Brien 435
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
assumption, the estimated values will be orthogonal to the constraint:
c0bc = 0. Therefore, we cannot use this observed orthogonality to judge the
validity of the constraint. The crucial question is whether the constraint is
orthogonal to b.
A general definition of bias involves the difference between the expected
value of an estimator and the parameter that it estimates. In this article, I
use bias to refer to the differences between expected values of the estimates
and the values of the parameters (b) of the process that generated the outcome
values. Specifically, it is the difference between the expected values of the es-
timates in the vector of constrained estimates, E(bc), and b: bias ¼ E(bc)� b.
Using this conception of bias, Kupper et al. (1983) note that for APC models
kv=E(bc)� b, where kv measures the distance between the expected value of
the constrained solution and the population parameters that generated the data
(b). This is identical to the distance between any two constrained estimates on
the line of solutions, except that one of the solutions is the ‘‘true’’ solution (the
one representing the process that generated the outcomes). Kupper et al. (1983)
note that this value of k can be computed as: k =�c0b=v0c and that c 0b¼0 when the expected value of the constrained estimate equals the parameter val-
ues that generated the outcome: E(bc)=b.
A different use of the term bias involves the difference between the
expected values of the estimators and the values of the parameters under
a particular constraint: E(bc)=bc. This is an important property for any con-
strained estimator, but the focus of most of the literature has been on bias in
terms of the expected value of the estimated parameters and the parameters
that generated the data.
A careful reading of Yang et al. (2004, 2008) indicates that the IE is an
unbiased estimate of the solution to the APC model that lies in the subspace
that is orthogonal to the null vector. The IE is an unbiased estimator of the bie
parameter values associated with the null vector constraint: E(bie)= bie. The
IE would provide an unbiased estimate of the parameters that generated the
age-period–specific rates, if and only if v0b= 0; just as other constrained es-
timates would provide unbiased estimates of the parameters that generated
the age-period–specific rates if and only if c0b= 0.
Relationship of Constrained Solutions to Each Other
The problem confronting APC analysts who employ constrained regression
(whether using traditional constraints or the IE) is that they cannot find the
unique solution to the normal equations. This occurs because the X matrix
is rank deficient by one. Thus, there are an infinite number of possible
436 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
solutions; however, not just any solution is possible. All constrained solu-
tions are related to each other in that they lie on a single line of solutions
in multidimensional space. In a fundamental sense, the line of solutions is
the most tangible thing we know about the solution to this system of equa-
tions. Any solution to the normal equations must lie on this line and any
of the solutions to the normal equations are ordinary least squares (OLS) sol-
utions. Thus, if our standard assumptions (e.g., those underlying OLS multi-
ple regression) are correct, even though the X matrix is rank deficient by one,
we know that the solution corresponding to the parameters values that gen-
erated the outcome data must fall on this line.
The difference between any two solution vectors in the class of con-
strained estimators examined in this article is kv ¼ (-c01b2=v0c1) × v, where
c1 is one constraint and b2 is the solution under a different constraint. When
we have calculated a single solution, we could determine the difference
between that solution and any other solution generated using a different con-
straint or, of course, determine any other solution. The distance between any
two solutions on the line of solutions is k · ffiffiffiffiffiffiv0vp
), whereffiffiffiffiffiffiv0vp
is the length of
the null vector. Appropriately, the value of k ¼ 0 when the constraint for
a solution and the solution b2 are orthogonal c01b2 = 0. Thus, kv indicates
how much the coefficients based on the two different constraints vary: b2-
b1 ¼ kv.
Model Fit
The solutions for the normal equations associated with the APC model all lie
on the line of solutions and any solution to these normal equations is a least
squares solution. Thus, each of the constrained estimates generates a solution
(bc) that produces the same solutions for the values of the dependent variable.
I showed this for the 3 × 3 case using the data in equation (5), and it extends
to higher dimensions and accounts for the well-established finding that in
APC analysis one cannot distinguish between models on the basis of fit
when they are just identified. This is true for all of the constrained estimates
that we examine.
Setting the Constraint and Zero Influence
Many sources (e.g., Kupper et al. 1983; Rodgers 1982; Smith 2004) caution
against setting the constraint based on the values of the observed dependent
variable. An advantage of the IE approach is that it assures that the researcher
does not give in to this temptation—the constraint is based purely on the
O’Brien 437
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
X-matrix, and in this sense, the analyst does not choose which constraint to
use.
I suggest two cautions. First, one could argue that in some situations with
good theory/past research that it would be better to set the constraint on the
basis of theory and previous research than setting it on this basis of the null
vector of the X-matrix. Kupper et al. (1983:2803-804) conclude: ‘‘Given that
the observed Yij’s are of no help in determining c, the only remaining option
is to make use of any reasonably reliable a priori, data independent, knowl-
edge about the underlying age, period, and cohort effect parameters under
study.’’
The second caution is that having no choice in the constraint used does not
mean that the constraint used is unimportant or has no effect on the outcome
measures. Yang et al. (2008:1704) state: ‘‘The eigenvector B0 [the null
vector] does not depend on the observed rates Y, only on the design matrix
X, and thus is completely determined by the numbers of age groups and pe-
riod groups—regardless of the event rates. In other words, B0 has a specific
form that is a function of the design matrix.’’ This is correct, as noted earlier,
and a potential advantage of using this constraint. But it is not clear what the
following statements means: ‘‘The fact that the fixed vector B0 is indepen-
dent of the response variable Y suggests that it should not play any role in
the estimation of effect coefficients’’ (Yang et al. 2008:1705) or ‘‘Specifical-
ly, the IE imposes the constraint that the direction in parameter space defined
by the eigenvector B0 in the null space of the design matrix X have zero in-
fluence on the parameter vector b0 (i.e., on the specific parameterization of
the vector b that is estimated by the IE)’’ (Yang et al. 2008:1706). If these
statements are meant to convey that constraining the solution to be orthogo-
nal to the vector B0 (the null vector) does not affect the estimates, then this is
incorrect. The solution vector must be orthogonal to the null vector and there-
fore is affected by this constraint. If they mean that the Y does not affect the
constraint that is chosen, then they are correct.
Variance of the Estimators
Yang et al. (2008:1709) note that ‘‘for a fixed number of time periods of data,
the IE is more statistically efficient (has a smaller variance) than any CGLIM
estimator that is obtained from a nontrivial equality constraint on the uncon-
strained regression coefficient estimator.’’
This is an important statistical advantage of using the null vector as a con-
straint and is in agreement with Kupper et al. (1983:2797) who note that the
principle component estimator (a linear transformation of the IE) ‘‘deals with
438 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
the exact linear dependency by involving only the non-zero eigenvalues as-
sociated with the eigenvectors.’’ Because of this, it ‘‘should be preferred on
variance grounds.’’
While noting that this is an important statistical property, I am in agree-
ment with Kupper et al. (1983) that bias (and I and they are using bias in
the sense of estimating the generating mechanism) is likely to be the more
important factor. Kupper et al. (1983:2797) note that to the extent that: v0bdeparts from zero, the principal component estimator is more biased and
‘‘could lead to more bias than the use of some other constraint . . . , so
that the optimal method for choosing c should probably take into account
both bias and variance (i.e., mean square error) considerations. Of
course, when the squared multiple correlation coefficient R2 is fairly close
to 1, a result which seems to occur not infrequently in practice, the bias
becomes the main area of concern.’’
Most Representative Solutions
The constraint used for the IE (the null vector) results in a vector from the
origin to the solutions that is the shortest of all the constrained estimators.
This property can be used to argue that this estimate is a sort of average of
the estimates based on constrained estimation. From the discussion of the ge-
ometry of constrained estimation, it is possible to view this solution as the
balance point of a teeter-totter—with solutions extending in both directions
along the line of solution from this solution point. In a similar vein we can
view the Moore-Penrose solution (the IE solution) as a sort of ‘‘conventional
solution.’’ Given that any of the constrained solutions provide the same fit to
the data, is there one solution that could serve as the conventional solution so
that different researchers would report the same solution in the absence of
evidence of a ‘‘better’’ solution? Some might say yes and that the solution
should be one based on the Moore-Penrose generalized inverse, because of
its statistical properties: ‘‘closest to its solution,’’ being (in a sense) most rep-
resentative, and its variance characteristics.
Convergence
I showed earlier that the scalar k times the null vector (kv) represents the dif-
ference between any two constrained estimators when using the same data
set. This holds for the intrinsic estimator b= bie + kv, where any of the con-
strained estimates, b, equals the estimates based on the intrinsic estimator,
bie, plus k times the null vector. It also holds for the traditional constrained
O’Brien 439
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
estimators b= bc + kv, where any of the constrained estimators (including
the intrinsic estimator) equals any specific constrained estimator, bc, plus k
times the null vector. Each of these solution vectors is a biased estimate of
the parameter vector that generated the outcome data to the extent that it dif-
fers from the parameters values (the age, period, and cohort parameters) that
‘‘nature’’ used to generate the age-period–specific results.
The null vector is unique up to multiplication by a scalar, and Yang et al.
(2008:1709) choose to norm the length of the null vector to 1 in their presen-
tation. They later transform their solution using ‘‘the orthonormal matrix of
all eigenvectors to transform the coefficients of the principal components
regression model to regression coefficients of the estimator B [the intrinsic
estimator]’’ (Yang et al. 2008:1705). Their transformed parameter values
are the same as those that result from our use of constrained regression (or
using generalized inverses) to obtain the parameters of the IE. Since they
use this normed null vector that has a length of 1, they note that as the number
of elements increases (the number of age groups and periods increase) the
elements of the null vector become smaller and ‘‘converge elementwise to
zero’’ (Yang et al. 2008:1709).
They use the formula b= bie + k * v * (in our notation) to represent the
relationship of any constrained estimate (b) to the values estimated by the
intrinsic estimator (bie). I have used the asterisk on v* to emphasize that it
is the normed null vector and on k* to emphasize that this scalar is appropri-
ately scaled, given the normed null vector. I note that I can also write
b= bc + k * v * . Yang et al (2008) argue that the expected value of any con-
strained estimator converges in value to the expected value of (bie) as the
number of periods and/or age groups increases, since the expected value of
v * goes elementwise to zero with such increases. Given b= bc + k * v *
and that bie is a constrained estimator, I could just as easily argue that the ex-
pected value of the intrinsic estimator converges toward the expected value
of any of the constrained estimators as v * goes elementwise zero. I am skep-
tical of this argument because as the normed v goes elementwise to zero there
is no reason to assume that k * remains constant. I have found in simulations
that k * does not remain fixed as the number of periods increases for the
models I have investigated.
On the other hand, convergence can occur for other reasons. It might be
the case that there is a zero trend in the period effects in the long term.
For example, the unemployment rate across a 60-period span may fluctuate
up and down over short periods of time, but with no apparent linear trend in
the long term. If we set a zero linear trend (ZLT) constraint for periods over
a short time span, we may or may not get an accurate estimate of the data
440 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
generating age effect coefficients.9 If we set a ZLT constraint for periods
covering a longer span of time, say 30 years, we are likely to get a more
accurate measure of the generating parameters for age, and a 60-year span
will likely provide an even more accurate measure. This is not surprising, be-
cause the ZLT constraint for periods is more likely to be correct (or closer to
being correct) as the span of periods increases, assuming an up and down pat-
tern of unemployment with very little or no overall trend.10
Investigating the Effects of Different Constraints
It is straightforward to investigate with data what happens to the estimated
age, period, and cohort coefficients as the constraints change. I report the re-
sults using two data sets (one with 5 periods and one with 20 periods). I use
the data from the 5 × 5 age-period matrix in Table 1. I generated the cell
entries by setting the earliest cohort value (cohort 1) to 5 and keeping the co-
hort effect at 5 through the fourth cohort and then increasing the cohort effect
by .50 for each cohort through the ninth and final cohort. The period effect is
two for the first three periods and four for the next two periods. The age ef-
fect is three for the two youngest age groups, two for the next oldest, one of
the next oldest, and zero for the oldest. I begin the simulation by using the
data for the 5 × 5 age-period matrix in Table 1.
When I extend our analysis to 20 periods, I leave the cohort effects as they
are for the 5 × 5 table and continue with no increase or decrease in the co-
hort effect for the newly added cohorts that correspond to the newly added
periods. The period effects are two for the first 3 periods, four for the next
2 periods, two for the next 3 periods, and continue this up-and-down pattern
until we reach the 20th period. The age effects remain the same as they are
for the 5 × 5 table across all 20 periods. When we increase the number of
periods to 20: the number of age groups remains at five and the number of
cohorts increases to 24. Certainly, I do not claim that this data set is somehow
representative of all of the possible generating mechanisms. The number of
mechanisms is infinite, as are the number of possible constraints. The results
point to many important relationships and illustrate many of the points that I
discussed previously.
I expect that if a constraint is consistent with the way the data were gen-
erated, we should obtain results that are consistent with the way the data were
generated. If the constraint is not consistent with the way the data were gen-
erated, we will obtain some other set of results. But whatever the results, they
will fit the data equally well; they will be orthogonal to the constraint used,
they will all lie on a line of solutions, and the intercept will be the same no
O’Brien 441
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
matter what constraint is used. The quandary in APC analysis is to discover
a method that will yield results that are consistent with how ‘‘nature’’ gener-
ated the data.
To compute the constrained estimates, I used constrained regression anal-
ysis in STATA. To calculate the IE, I used the add-on program in STATA
(cited in Yang et al. 2008). Table 2 presents the results. The first four col-
umns show the results when there are five periods and five age groups (based
on the data displayed in Table 1) for four different constraints: age1 ¼ age2,
period3 ¼ period4, cohort7 ¼ cohort8, and the constraint associated with
the IE. The fifth column contains the null vector for the 5 × 5 age-period
matrix. The next four columns show the results for the same four constraints,
but for the case where there are five age groups and 20 periods. The data have
been coded using effect coding so that the sum of the effect coefficients is
zero; and I have used the last category of age groups, periods, and cohorts
as the reference category. We can therefore determine the coefficients for
these reference categories and have reported this in Table 2.11 The final col-
umn of Table 2 contains the null vector for the 5 × 20 age-period matrix.
I first note that the analyses produce results consistent with several prin-
ciples noted earlier. (1) For the same X and Y data the intercept is always the
same no matter which linear constraint is used to identify the models: 10.433
for the data with 5 periods and five age groups and 11.575 for the data with
five age groups and 20 periods. (2) Each of the solutions is perpendicular to
its constraint. This is easily verified by multiplying the transpose of the con-
straint vector times the solution vector. Note that the constraint vector
ignores the reference categories (since they are ‘‘dropped’’ from the analy-
sis); thus, we need to multiply only the constraint times the corresponding
elements of the solution vector. For example, setting age1 ¼ age2 corre-
sponds to a constraint vector of (1, –1, 0, 0, . . . , 0), which when multiplied
times the appropriate elements of the solution vector results in zero (here I
have placed the intercept as the final term to match the output). In the
5 × 5 case, the age1 and age2 coefficients both equal 1.20 so the dot product
[c 0· (solution vector)] is zero; this is also true in the 5 × 20 case. The solu-
tion vector is perpendicular to the constraint vector. Similarly, the transpose
of the null vector times the solution for the IE equals zero. For the 5 × 5
case: –.267 × age1 – .134 × age2 + 0.000 × age3 + . . . + 0.401 × cohort8
+ 0.000 × intercept ¼ 0. (3) All of the solutions lie on a single line of sol-
utions and thus differ from one another by kv. In the 5 × 5 case, to move
from the solution for age1 ¼ age2 to the solution for period3 ¼ period4,
k ¼ 14.967; to move from the solution for the age constraint to the solution
for cohort constraint, k ¼ –3.742; to move from the age constraint to the
442 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
Tab
le2.
Anal
ysis
ofth
eD
ata
Gen
erat
edin
Table
1fo
r5
Peri
ods
and
Exte
nded
to20
Peri
ods
asD
escr
ibed
inth
eTe
xt
Five
age
groups
and
5per
iods
Five
age
groups
20
per
iods
Effec
tsag
e1¼
age2
per
iod3¼
per
iod4
cohort
7¼
cohort
8in
trin
sic
estim
ator
Null
vect
or
age1¼
age2
per
iod3¼
per
iod4
cohort
7¼
cohort
8in
trin
sic
estim
ator
Null
vect
or
age1
1.2
00
–2.8
00
2.2
00
1.3
90
–0.2
67
1.2
00
–2.8
00
2.2
00
1.3
36
–0.0
50
age2
1.2
00
–0.8
00
1.7
00
1.2
95
–0.1
34
1.2
00
–0.8
00
1.7
00
1.2
68
–0.0
25
age3
0.2
00
0.2
00
0.2
00
0.2
00
0.0
00
0.2
00
0.2
00
0.2
00
0.2
00
0.0
00
age4
–0.8
00
1.2
00
–1.3
00
–0.8
95
0.1
34
–0.8
00
1.2
00
–1.3
00
–0.8
68
0.0
25
age5
–1.8
00
2.2
00
–2.8
00
–1.9
90
–1.8
00
2.2
00
–2.8
00
–1.9
36
per
iod1
–0.8
00
3.2
00
–1.8
00
–0.9
90
0.2
67
–0.8
00
18.2
00
–5.5
50
–1.4
44
0.2
38
per
iod2
–0.8
00
1.2
00
–1.3
00
–0.8
95
0.1
34
–0.8
00
16.2
00
–5.0
50
–1.3
76
0.2
13
per
iod3
–0.8
00
–0.8
00
–0.8
00
–0.8
00
0.0
00
–0.8
00
14.2
00
–4.5
50
–1.3
08
0.1
88
per
iod4
1.2
00
–0.8
00
1.7
00
1.2
95
–0.1
34
1.2
00
14.2
00
–2.0
50
0.7
60
0.1
63
per
iod5
1.2
00
–2.8
00
2.2
00
1.3
90
1.2
00
12.2
00
–1.5
50
0.8
27
0.1
38
per
iod6
–0.8
00
8.2
00
–3.0
50
–1.1
05
0.1
13
per
iod7
–0.8
00
6.2
00
–2.5
50
–1.0
37
0.0
88
per
iod8
–0.8
00
4.2
00
–2.0
50
–0.9
69
0.0
63
per
iod9
1.2
00
4.2
00
0.4
50
1.0
98
0.0
38
per
iod10
1.2
00
2.2
00
0.9
50
1.1
66
0.0
13
per
iod11
–0.8
00
–1.8
00
–0.5
50
–0.7
66
–0.0
13
per
iod12
–0.8
00
–3.8
00
–0.0
50
–0.6
98
–0.0
38
per
iod13
–0.8
00
–5.8
00
0.4
50
–0.6
31
–0.0
63
per
iod14
1.2
00
–5.8
00
2.9
50
1.4
37
–0.0
88
per
iod15
1.2
00
–7.8
00
3.4
50
1.5
05
–0.1
13
per
iod16
–0.8
00
–11.8
00
1.9
50
–0.4
27
–0.1
38
(co
nti
nu
ed)
443
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
Tab
le2.
(co
nti
nu
ed
)
Five
age
groups
and
5per
iods
Five
age
groups
20
per
iods
Effec
tsag
e1¼
age2
per
iod3¼
per
iod4
cohort
7¼
cohort
8in
trin
sic
estim
ator
Null
vect
or
age1¼
age2
per
iod3¼
per
iod4
cohort
7¼
cohort
8in
trin
sic
estim
ator
Null
vect
or
per
iod17
–0.8
00
–13.8
00
2.4
50
–0.3
60
–0.1
63
per
iod18
–0.8
00
–15.8
00
2.9
50
–0.2
92
–0.1
88
per
iod19
1.2
00
–15.8
00
5.4
50
1.7
76
–0.2
13
per
iod20
1.2
00
–17.8
00
5.9
50
1.8
44
cohort
1–0.8
33
–8.8
33
1.1
67
–0.4
52
–0.5
35
–1.8
75
–24.8
75
3.8
75
–1.0
96
–0.2
88
cohort
2–0.8
33
–6.8
33
0.6
67
–0.5
48
–0.4
01
–1.8
75
–22.8
75
3.3
75
–1.1
64
–0.2
63
cohort
3–0.8
33
–4.8
33
0.1
67
–0.6
43
–0.2
67
–1.8
75
–20.8
75
2.8
75
–1.2
31
–0.2
38
cohort
4–0.8
33
–2.8
33
–0.3
33
–0.7
38
–0.1
34
–1.8
75
–18.8
75
2.3
75
–1.2
99
–0.2
13
cohort
5–0.3
33
–0.3
33
–0.3
33
–0.3
33
0.0
00
–1.3
75
–16.3
75
2.3
75
–0.8
67
–0.1
88
cohort
60.1
67
2.1
67
–0.3
33
0.0
71
0.1
34
–0.8
75
–13.8
75
2.3
75
–0.4
35
–0.1
63
cohort
70.6
67
4.6
67
–0.3
33
0.4
76
0.2
67
–0.3
75
–11.3
75
2.3
75
–0.0
02
–0.1
38
cohort
81.1
67
7.1
67
–0.3
33
0.8
81
0.4
01
0.1
25
–8.8
75
2.3
75
0.4
30
–0.1
13
cohort
91.6
67
9.6
67
–0.3
33
1.2
86
0.6
25
–6.3
75
2.3
75
0.8
62
–0.0
88
cohort
10
0.6
25
–4.3
75
1.8
75
0.7
94
–0.0
63
cohort
11
0.6
25
–2.3
75
1.3
75
0.7
27
–0.0
38
cohort
12
0.6
25
–0.3
75
0.8
75
0.6
59
–0.0
13
cohort
13
0.6
25
1.6
25
0.3
75
0.5
91
0.0
13
cohort
14
0.6
25
3.6
25
–0.1
25
0.5
23
0.0
38
cohort
15
0.6
25
5.6
25
–0.6
25
0.4
56
0.0
63
cohort
16
0.6
25
7.6
25
–1.1
25
0.3
88
0.0
88
cohort
17
0.6
25
9.6
25
–1.6
25
0.3
20
0.1
13
(co
nti
nu
ed)
444
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
Tab
le2.
(co
nti
nu
ed
)
Five
age
groups
and
5per
iods
Five
age
groups
20
per
iods
Effec
tsag
e1¼
age2
per
iod3¼
per
iod4
cohort
7¼
cohort
8in
trin
sic
estim
ator
Null
vect
or
age1¼
age2
per
iod3¼
per
iod4
cohort
7¼
cohort
8in
trin
sic
estim
ator
Null
vect
or
cohort
18
0.6
25
11.6
25
–2.1
25
0.2
52
0.1
38
cohort
19
0.6
25
13.6
25
–2.6
25
0.1
85
0.1
63
cohort
20
0.6
25
15.6
25
–3.1
25
0.1
17
0.1
88
cohort
21
0.6
25
17.6
25
–3.6
25
0.0
49
0.2
13
cohort
22
0.6
25
19.6
25
–4.1
25
–0.0
19
0.2
38
cohort
23
0.6
25
21.6
25
–4.6
25
–0.0
86
0.2
63
cohort
24
0.6
25
23.6
25
–5.1
25
–0.1
54
inte
rcep
t10.4
33
10.4
33
10.4
33
10.4
33
0.0
00
11.4
75
11.4
75
11.4
75
11.4
75
0.0
00
445
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
solutions using the IE, k ¼ –.713. To move from the IE constraint to the age
constraint, k ¼ .713; from the IE constraint to the period constraint, k ¼15.679; and from the IE constraint to the cohort constraint, k ¼ –3.029.
(4) Each of the solutions fits the data equally well—this also occurs in unre-
ported analyses when I added random error to the cell entries. As long as the
same data are analyzed, the fit does not depend on the linear constraint used.
For this particular generating mechanism (age1 ¼ age2) the IE does bet-
ter than the period and cohort constraints that we used (k ¼ .713 times the
null vector for the absolute difference between the IE estimates and the
putative data generating parameters); however, the data could have been gen-
erated by the coefficients implied by any of the particular constraints used in
Table 2. For example, in the case of the period constraint (period3 ¼ peri-
od4), age group 1 (the youngest) is two units lower than age 2, which is one
unit lower than age 3, which is one unit lower than age 4, which is one unit
lower than the oldest group. This does not match the putative data generating
mechanism—but I could have used these parameters to generate the data
(and nature might have). If this were the generating mechanism for the
data in Table 1, then the absolute value of k, which determines the distance
between estimates, would have been 15.679 for the distance between the IE
estimate and this generating mechanism.
In terms of ‘‘substance,’’ I note that the different constraints produce quite
different interpretations concerning the generating parameters. In Table 2, the
only constraint to produce the putative data generating mechanism (age1 ¼age2) is the only constraint (among those used) that is consistent with the gen-
erating mechanism. It is consistent because for the data generating mechanism
the two youngest age groups have the same effect. I use the term putative to
emphasize that any of the mechanisms implied by any of the solutions could
have generated the data. This is the quandary associated with using these con-
strained regression approaches to APC analysis. If the data were generated by
‘‘nature’’ the way I generated it, then only the age1 ¼ age2 constraint (of the
constraints used) would have shown us how ‘‘nature’’ generated these data.
For this constraint, we see that each cohort is 0.50 greater than the earlier
one from cohort5 up to cohort9, that the first three periods are two less than
the next two, and that the age effect increases by one from the oldest to the
next oldest and again by one to the next oldest and then increases by two
for the two youngest age groups. This fits the putative data generating mech-
anism in both the 5 × 5 and 5 × 20 cases.
Comparing the results as we move from the analysis of data for a 5 × 5
age-period matrix to those based on the a 5 × 20 age-period matrix does
not allow us to examine patterns across the multitude of possible models
446 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
that can arise in APC analysis. I can, however, note some interesting pat-
terns, and as we will see, many of these patterns depend on the following
relationship: kv ¼ (-c01b2=v0c1) × v, where c1 is one constraint and b2 is
the solution under a different constraint. Here kv represents the difference
between two solution vectors, one based on c1 as a constraint and the other
(b2) based on c2.
I will use the phrase convergence as the number of periods increases in
a specific manner. It is the extent to which the values in a solution vector
based on a particular age-period matrix approach some set of fixed values
as more periods of data are added. Specifically, for our data I ask whether
the age-effects in the 5 × 5 age-period matrix (which remain constant across
all 20 periods) converge to some other values when there are 20 periods. Do
the cohort effects that changed only for cohorts 5 through 9 have estimates
that converge to some other value as we add periods? Do the period effect
estimates based on the 5 × 5 age-period matrix converge to some new val-
ues as we add periods? Is there some sense in which we find that with more
periods one of the constraints is more likely to tell us what nature was doing
in the period we were studying. This concern assumes that analysts are
actually interested in the age effects, period effects, and cohort effects on
rates during a particular era. For example, during any specific era, cohort ef-
fects or period effect may be highly associated with smoking behavior, yet
these very patterns are not likely to be associated with smoking behavior
in the same way in future eras (after the original cohorts are replaced).
What sorts of changes in the solutions do we find as we move from 5 to 20
periods? In both the 5 period and 20 period simulations we find that (1) when
the constraint is consistent with the generating mechanism, the generating
mechanism is estimated correctly. If I were to add error variance to these
simulations, we would find that the expected value of the estimated param-
eters would remain unbiased estimates of the data generating parameters. (2)
For each of the three traditional APC constraints, the estimated age effect co-
efficients are the same for both the 5 and 20 period cases. This means that
they do not converge to some other values. (3) The values of k do not remain
constant as we move from the 5 × 5 to the 5 × 20 situation. For example,
the value of k needed to transform the effect coefficient estimates for the
age1 ¼ age2 constraint to those obtained using the IE constraint is –.713
in the 5 × 5 case. In the 5 × 20 case, this value of k is –2.709 (about 3.8
times greater). (4) The age effect estimates using the IE are not the same
for the 5 period and the 20 period cases. This occurs because unlike the tra-
ditional constraints that remain the same in the 5 and 20 period cases, the null
vector (and thus the constraint for the IE solution) changes as the number of
O’Brien 447
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
periods changes. (5) Typically, I calculate k by taking the age constrained es-
timated coefficients minus the corresponding IE estimated coefficients and
dividing each of the resulting elements by the corresponding null vector el-
ement. We can, however, use k ¼ (-c01b2=v0c1) to calculate k in each of these
cases. For example if c1 is the constraint vector associated with the age con-
straint and b2 is the solution vector associated with the IE, we will obtain the
same k values as we obtained before: –.713 and –2.709.
If we had reason to believe that the period effects in the long run had
a zero linear trend, then we might well use a ZLT constraint for periods
(see note 9). To the extent that this constraint is consistent with the data gen-
erating mechanism, it should make the estimated age effects converge on the
data generating age effects. For the data in my simulation the data generating
period effect is (2,2,2,4,4) for the 5 period case and is
(2,2,2,4,4,2,2,2,4,4,2,2,2,4,4,2,2,2,4,4) for the 20 period case. Not surprising-
ly the ZLT period constraint is closer to the data generating mechanism in the
20 period case, and this is what we might expect in the long term for many
sets of data. With these data the slope of the data generating period effects
regressed on time is .60 for the 5 period case and only .04 for the 20 period
case. The age effects for the data generating mechanism are the same for the
two youngest age groups and then are one lower for each of the succeeding
age groups. Using the ZLT constraint for periods in the 5 period case, these
age effects are estimated to increase by .60 between age groups 1 and 2 and
then to drop by .4 for each of the succeeding age groups. Using the ZLT con-
straint for periods in the 20 period case, there is an increase of .036 between
the first and second age group and then drops of .964 for each of the succeed-
ing age groups. There is clear convergence to the data generating values for
age. This, however, depends on making a constraint that is consistent with
the data generating mechanism. The data could have been generated by
the mechanism implied by the constraint p3 ¼ p4 (see Table 2). If that
were the case, the assumption that there is no linear trend in periods would
be incorrect. The linear trend for that potential generating mechanism is
strongly negative. But theory and other data might convince us that the
zero linear trend for periods or some other linear trend in periods is much
more plausible. Note that we might well make similar arguments for
a ZLT constraint for cohorts.
Conclusion
Constrained estimation is the most common method used by those seeking to
estimate each of the age, period, and cohort coefficients in APC models.
448 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
Traditionally, equality constraints on two of the effect coefficients have been
used, but a different and more complex constraint is used for the IE. The use
of this constraint in APC models and some of its advantages were examined
by Kupper et al. (1983, 1985). Its development as a general method for esti-
mating the effect coefficients in APC models can be traced to Wenjiang Fu in
a series of article (e.g., Fu 2000; Fu et al. 2004; Fu and Hall 2006; Knight and
Fu 2000). This method has been highlighted for sociologists in two major ar-
ticles by Yang and associates (2004, 2008).
This article’s goal is to offer a better understanding of the properties of
constrained estimators in the APC context and to compare the conventional
constrained estimator approach with the IE approach.12 With this in mind, I
examined these estimators from several perspectives: (1) algebraically, as
constraints directly associated with specific generalized inverses (each con-
straint has its own associated generalized inverse); (2) geometrically, show-
ing many characteristics shared by all constrained estimators and some that
are not shared; and (3) using two small sets of data (one with 5 periods and
one with 20) to show how these relationships occur in practice and some pat-
terns that can change and some that remain constant as the number of periods
is increased (at least for the data I used).
Because the various traditional constraints, the ZLT constraint discussed in
this article, and the constraint imposed by the IE are all linear constraints ap-
plied to the X-matrix, it is not surprising that the solutions resulting from these
constraints share many characteristics in common. These common character-
istics include the following: With the constraint in place we can estimate each
of the age, period, and cohort effects separately; each set of estimates is subject
to bias when we define bias as the extent to which the parameter values that
generated the data are not the same as the expected value of the estimates un-
der the particular constraint used in the estimation; and the solutions are per-
pendicular to their constraints. When analyzing the same data set: All of the
just identified constrained linear solutions lie on a line that is parallel to the
null vector; each constrained solution has the same intercept; each constrained
solution fits the data equally well; and all of these solutions are related to each
other by a scalar times the null vector. There are some special characteristics
that the IE does not share with other constrained estimators. It uses the null
vector as a constraint (and this eliminates the temptation to examine the Y vari-
able in order to set the constraint). It is the solution with the shortest length
from the constraint’s origin; and related to this, the variances of the estimated
coefficients based on the IE are smaller than for other constrained solutions.
Additionally, in some senses, the IE could serve as the representative solution.
As I have emphasized, however, arguably the major consideration should be
O’Brien 449
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
how well the data generating parameters are estimated, and using the ‘‘wrong’’
constraints can lead to highly misleading estimates.
It is tempting to think that in some situations we can rely on theory and
previous literature to make an educated guess about the constraint, and this
may be the case in some situations. Perhaps the most compelling suggestion
is this from Kupper et al. (1983), who note that there may be situations in
which theory and previous literature allow the researcher to contemplate
more than one set of constraints. Then one can use these different constraints
separately: ‘‘If the separate sets of estimates obtained by applying each of
these various theoretically-based (a priori) choices for c are in close agree-
ment, then one could justifiably have some confidence in the accuracy of
this common set of estimated age, period, and cohort effects’’ (Kupper et
al. 1983:2804). Of course, these constraints should not be obtained by search-
ing the data for constraints that ‘‘work.’’ One possibility might be to impose
a ZLT for periods and a ZLT for cohorts when there are a large number of
periods and cohorts, and it is reasonable to assume that there are no particular
trends for periods or for cohorts. Then the researcher would obtain estimates
under these two different constraints and see if they agree. If we move out-
side of the tradition of constrained estimators, there is a literature on methods
that bypass the identification problem by not attempting to estimate the in-
dividual age, period, and cohort coefficients.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or pub-
lication of this article.
Notes
1. Mason et al. (1973) introduced this technique to sociologists in 1973.
2. The basic model can also be estimated as a Poisson regression or as a logistic
regression in a straightforward manner with a bit of notational change and, of
course, the appropriate software (see Yang et al. 2008). The substance of the dis-
cussion in this article applies to these maximum likelihood (ML) estimation tech-
niques as well as to the ordinary least squares (OLS) approach to estimation.
450 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
3. The only difference between dummy variable coding and effect coding is the
minus one coding for the reference category rather than a zero.
4. Analogously, if using Poisson or logistic regression, the model fit is the same no
matter which generalized inverse is used; that is, −2 × log-likelihood is the same
no matter which generalized inverse is used (here I assume that we are using just
one constraint so that the model is just identified).
5. I emphasize here that b represents the parameter vector associated with the pro-
cess that generated the data.
6. I arbitrarily chose to place the constraint in the last row and then replace the last
column of the resulting inverse with zeros. I could have placed the constraint in
the fourth row and after obtaining the inverse replaced the fourth column with
zeros (Mazumdar, Li, and Bryce 1980).
7. The null vector of (0, 1, –2) is unique up to multiplication by a scalar.
8. Interestingly, the Mazumdar et al. (1980) system does not yield the Moore-Pen-
rose matrix, but since it implements the same constraint it yields the same set of
solutions.
9. To produce a zero linear trend (ZLT) constraint for periods, we can use the fol-
lowing linear constraint: (n – 1) × period1 + (n – 2) × period2 + . . . + 1 ×period(n – 1) ¼ 0, where n is the number of periods and the nth period is used
as a reference category.
10. This is certainly the case for the situation in which the period effects across time is
of the form of a sine wave.
11. With effect coding, the sum of the coefficients for age groups, period, and cohorts
each equal zero, so that it is easy to calculate the effect for the reference category.
12. Yang et al. (2008:1706) certainly recognize that intrinsic estimator (IE) is con-
strained estimator: ‘‘Figure 1 also helps to illustrate geometrically that the IE
may in fact also be viewed as a constrained estimator.’’
References
Fu, Wenjiang J. 2000. ‘‘Ridge Estimator in Singular Design With Applications to
Age-Period-Cohort Analysis of Disease Rates.’’ Communications in Statistics—
Theory and Method 29:263-78.
Fu, Wenjiang J., and Peter Hall. 2006. ‘‘Asymptotic Properties of Estimators in Age-
Period-Cohort Analysis.’’ Statistics and Probability Letters 76:1925-129.
Fu, Wenjiang J., Peter Hall, and Thomas E. Rohan. 2004. ‘‘Age-Period-Cohort Anal-
ysis: Structure of Estimators, Estimability, Sensitivity and Asymptotics.’’ Techni-
cal Report. Michigan State University, Department of Epidemiology.
Knight, Keith, and Wenjiang Fu. 2000. ‘‘Asymptotics for Lasso-Type Estimators.’’
The Annals of Statistics 28:1356-378.
Kupper, Lawrence L., Joseph M. Janis, Azza Karmous, and Bernard G. Greenberg.
1985. ‘‘Statistical Age-Period-Cohort Analysis: A Review and Critique.’’ Journal
of Chronic Disease 38:811-30.
O’Brien 451
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from
Kupper, Lawrence L., Joseph M. Janis, Ibrahim A. Salama, Carl N. Yoshizawa, and
Bernard G. Greenberg. 1983. ‘‘Age-Period-Cohort Analysis: An Illustration of the
Problems in Assessing Interaction in One Observation Per Cell Data.’’ Communi-
cations in Statistics—Theory and Method 12:2779-807.
Mason, Karen Oppenheim, William M. Mason, H. H. Winsborough, and Kenneth W.
Poole. 1973. ‘‘Some Methodological Issues in Cohort Analysis of Archival Data.’’
American Sociological Review 38:242-58.
Mazumdar, S., C. C. Li, and G. R. Bryce. 1980. ‘‘Correspondence Between a Linear
Restriction and a Generalized Inverse in Linear Model Analysis.’’ The American
Statistician 34:103-05.
Press, William H., Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling.
1992. Numerical Recipes in C: The Art of Scientific Computing. New York: Cam-
bridge University Press.
Rodgers, Willard L. 1982. ‘‘Reply to Comments by Smith, Mason and Fineberg.’’
American Sociological Review 47:793-96.
Searle, S. R. 1971. Linear Models. New York: John Wiley & Sons.
Smith, Herbert L. 2004. ‘‘Response: Cohort Analysis Redux.’’ Pp. 111-19 in Socio-
logical Methodology, edited by R. M. Stolzenberg. Oxford. UK: Basil Blackwell.
Yang, Yang, Wenjiang J. Fu, and Kenneth C. Land. 2004. ‘‘A Methodological Com-
parison of Age-Period-Cohort Models: Intrinsic Estimator and Conventional Gen-
eralized Linear Models.’’ Pp. 75-110 in Sociological Methodology, edited by R.
M. Stolzenberg. Oxford, UK: Basil Blackwell.
Yang, Yang, Sam Schulehoffer-Wohl, Wenjiang J. Fu, and Kenneth C. Land. 2008.
‘‘The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How to
Use It.’’ American Journal of Sociology 113:1697-736.
Bio
Robert M. O’Brien is a Professor of Sociology at the University of Oregon. He spe-
cializes in criminology and quantitative methods. He has published extensively in
both areas. His recent publications include: ‘‘Can Cohort Replacement Explain
Changes in the Relationship Between Age and Homicide Offending?’’ (Journal of
Quantitative Criminology with Jean Stockard); ‘‘A Mixed Model Estimation of
Age, Period, and Cohort Effects’’ (Sociological Methods and Research with Ken
Hudson and Jean Stockard); ‘‘The Age-Period-Cohort Conundrum as Two Funda-
mental Problems’’ (Quality and Quantity); and ‘‘Still Separate and Unequal? A
City Level Analysis of the Black-White Gap in Homicide Arrest Since 1960’’ (Amer-
ican Sociological Review with Gary LaFree and Eric Baumer).
452 Sociological Methods & Research 40(3)
at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from