Sociological Methods & Research 2011 O Brien 419 52

http://smr.sagepub.com/Research

Sociological Methods &

http://smr.sagepub.com/content/40/3/419The online version of this article can be found at:

DOI: 10.1177/0049124111415367

July 2011 2011 40: 419 originally published online 29Sociological Methods & Research

Robert M. O'BrienConstrained Estimators and Age-Period-Cohort Models

Published by:

http://www.sagepublications.com

can be found at:Sociological Methods & ResearchAdditional services and information for

http://smr.sagepub.com/cgi/alertsEmail Alerts:

http://smr.sagepub.com/subscriptionsSubscriptions:

http://www.sagepub.com/journalsReprints.navReprints:

http://www.sagepub.com/journalsPermissions.navPermissions:

http://smr.sagepub.com/content/40/3/419.refs.htmlCitations:

What is This?

- Jul 29, 2011Proof

- Aug 15, 2011Version of Record >>

at William Paterson Univ of NJ on November 4, 2011smr.sagepub.comDownloaded from

http://smr.sagepub.com/

http://smr.sagepub.com/content/40/3/419

http://www.sagepublications.com

http://smr.sagepub.com/cgi/alerts

http://smr.sagepub.com/subscriptions

http://www.sagepub.com/journalsReprints.nav

http://www.sagepub.com/journalsPermissions.nav

http://smr.sagepub.com/content/40/3/419.refs.html

http://smr.sagepub.com/content/40/3/419.full.pdf

http://smr.sagepub.com/content/early/2011/07/23/0049124111415367.full.pdf

http://online.sagepub.com/site/sphelp/vorhelp.xhtml


Constrained Estimatorsand Age-Period-CohortModels

Robert M. O’Brien1

Abstract

If a researcher wants to estimate the individual age, period, and cohort coeffi-cients in an age-period-cohort (APC) model, the method of choice is con-strained regression, which includes the intrinsic estimator (IE) recentlyintroduced by Yang and colleagues. To better understand these constrainedmodels, the author shows algebraically how each constraint is associatedwith a specific generalized inverse that is associated with a particular solutionvector that (when the model is just identified under the constraint) producesthe least square solution to the APC model. The author then discusses the ge-ometry of constrained estimators in terms of solutions being orthogonal to con-straints, solutions to various constraints all lying on a line single line inmultidimensional space, the distance on that line between various solutions,and the crucial role of the null vector. This provides insight into what character-istics all constrained estimators share and what is unique about the IE. The firstpart of the article focuses on constrained estimators in general (including the IE),and the latter part compares and contrasts the properties of traditionally con-strained APC estimators and the IE. The author concludes with some cautionsand suggestions for researchers using and interpreting constrained estimators.

Keywords

age-period-cohort models, constrained model estimation, intrinsic estimator

Article

1University of Oregon, Eugene, OR, USA

Corresponding Author:

Robert M. O’Brien, 720 PLC University of Oregon, Eugene, OR 97403

Email: [email protected]

Sociological Methods & Research40(3) 419–452

ª The Author(s) 2011Reprints and permission:

sagepub.com/journalsPermissions.navDOI: 10.1177/0049124111415367

http://smr.sagepub.com



Constrained estimators are the most popular approach to estimating the indi-

vidual coefficients in age-period-cohort (APC) accounting models.1 Conven-

tionally, a single constraint is placed on these models to make them just

identified. Unfortunately, since each of the constrained models fits the

data equally well, the validity of the chosen constraint cannot be judged

from the model fit. Instead, in the conventionally constrained APC models,

the researcher must decide on the basis of theory or past research the appro-

priateness of the constraint used to identify the model. Setting the constraint

in a defensible manner, however, is the Achilles heel of this approach, and

the constraint chosen can greatly affect the estimated coefficients.

Almost 30 years ago, Kupper et al. (1983) discussed a constrained estima-

tor for age, period, and cohort effects that they called the principal compo-

nents estimator. This estimator produces coefficients identical to those of

the recently introduced intrinsic estimator (IE), which is also a constrained

estimator. Kupper et al. (1983) note, as proved by Yang, Fu, and Land

(2004) for the IE, that the principal components estimator has minimum var-

iance. Beyond this, however, Kupper et al. (1983) did not appear to believe it

had any special usefulness in the analysis of APC data, pointing out that us-

ing this estimator could lead to more bias (in the sense of differences between

the expected value of the estimates and the true underlying generating pa-

rameters) than the use of some other constraint (Kupper et al. 1983).

Because of its minimum variance property and other additional properties,

Yang and associates recommend using the IE in the analysis of APC data.

Yang et al. (2004) correctly note the important contributions of Fu (Fu

2000; Fu, Hall, and Rohan 2004; Knight and Fu 2000) in the development

of the IE and in investigating its properties. It is Fu who has most extensively

published work on the IE.

Yang et al. (2004) recently introduced the IE to sociologists. After the spe-

cific introduction of the IE to sociologists in Sociological Methodology in

2004, it was further described in an article in the American Journal of Soci-

ology (Yang et al. 2008). Yang and associates (2008:1699) view the IE as ‘‘a

general-purpose method of APC analysis with potentially wide applicability

in the social sciences.’’ They review several properties of the IE (many of

which, as I discuss in the following, are shared with other constrained

estimators).

Given the widespread use of constrained APC models and the introduc-

tion of a new and arguably improved constraint into the sociological litera-

ture, this article explicates and clarifies the basic properties of constrained

linear models as they are used in the APC context. This should help research-

ers in several ways: (1) understanding the process of choosing an appropriate

420 Sociological Methods & Research 40(3)



constraint, (2) comprehending the relationship of the results based on using

different constraints, (3) untangling which characteristics are unique to the IE

and which it shares with other constrained estimators, and (4) facilitating

a comparison of the IE to other constrained estimators.

I begin with a brief discussion of the general APC accounting model and

the identification problem and then move to the use of generalized inverses,

which implement constrained estimation. These inverses provide a set of sol-

utions even though the unconstrained model is not identified. I then show

how these constraints can be implemented in different ways.

Next I turn to the geometry of constrained estimation with the aim of de-

veloping a common framework for evaluating different constrained estima-

tors. I then address, among other issues, bias in constrained estimators, the

relationship of the constraint to the results it produces, the relationship of

the IE estimator to other constrained estimators, and some unique features

produced by the constraint used in the IE approach. That discussion is fol-

lowed by a simulation designed to show some of these properties and to dem-

onstrate how a series of different constraints works in practice. I conclude

with an evaluation of the potential advantages of the IE relative to other con-

strained estimators and one potentially helpful situation in which researchers

might gain confidence from using different constrained estimators.

In the next two sections I briefly describe the accounting (multiple clas-

sification) model utilized in this literature and the identification problem

that analysts using this model face.

Multiple Classification Analysis for APC Models

Age groups, periods, and cohorts are often coded with dummy variables by

APC analysts. In this article I will use a similar coding scheme: effect coding.

I employ this coding because it is used in the IE approach (Kupper et al.

1983, 1985; Yang et al. 2004, 2008). This coding (as well as dummy variable

coding) does not constrain the functional form of the relationship between

the dependent variable and the age groups, periods, and cohorts. I use the

multiple classification model represented in equation (1) throughout this

article,2

Yij ¼ mþ ai þ pj þ ca�iþj þ eij; ð1Þ

where Yij is the age-period–specific value of the dependent variable, m repre-

sents the intercept, ai represents the effect of the ith age group, pj denotes the

effects of the jth period, ca–i+j represents the effects of the (a–i+j)th cohort

O’Brien 421



(where a equals the number of age groups), and eij represents the error term

[E(eij) ¼ 0]. I use age as the row variable, period as the column variable,

and cohorts are on the diagonals (see Table 1 for an example age-period

data matrix).

Rank Deficient Matrix

The identification problem in APC analysis arises because of the linear depen-

dency between the age, period, and cohort variables. If we know a person’s age

group and the period, we can determine their birth cohort (C ¼ P – A); if we

know their age group and cohort, we can determine the period (P ¼ C + A);

and if we know their birth cohort and the period, we can determine their age (A

¼ P – C). This is the linear dependency that prevents a unique solution to the

regression of the dependent variable on the effect coded (or dummy coded)

variables for age groups, periods, and cohorts (even when we have one refer-

ence category for each set of effect coded [dummy coded] variables).

In matrix notation, we can represent the multiple classification equation

(1) as:

Y =Xb+ e; ð2Þ

where Y is an ap × 1 vector of the age-period–specific rates (counts, or some

other form of dependent variable). The X-matrix is an ap × 2(a + p) − 3

matrix containing ones in the first column and effect coded variables in

the remaining columns. The order of the columns can be schematized as

[1, (a – 1), (p – 1), (a + p – 2)], where a – 1 represents the number of age

effect coded variables, p – 1 represents the number of period effect coded

variables, and a + p – 2 represents the number of cohort effect coded variables.

Each of these effect coded variables has ap entries (zeros and ones) in its col-

umn (and minus one in the row corresponding to the reference category) and is

coded to correspond to the ‘‘cell’’ in which the age-period–specific value of Y

Table 1. Age-Period Table With Data Generated as Described in the Text

Period 1 Period 2 Period 3 Period 4 Period 5

Age 1 10.5 11.0 11.5 14.0 14.5Age 2 10 10.5 11.0 13.5 14.0Age 3 9.0 9.0 9.5 12.0 12.5Age 4 8.0 8.0 8.0 10.5 11.0Age 5 7.0 7.0 7.0 9.0 9.5




resides.3 The last categories of age, period, and cohort serve as the reference

categories in our representation.

When a regular inverse exists, solving an equation of the form of (2) for

a unique set of least square regression coefficients is trivial:

b= X 0Xð Þ�1X 0Y , ð3Þ

where X 0 is the transpose of X and the superscripted –1 indicates the inverse.

The problem with this solution in the APC context occurs because of the lin-

ear dependency in the X-matrix, which means that the standard inverse of

X 0X does not exist. If we label the number of columns in X as m [¼ 2(a +

p) – 3], only m – 1 of these columns are linearly independent.

This linear dependency means that there is a set of coefficients that when

multiplied times the columns of X produces a column vector of zeros (with

the restriction that not all of these multiplying coefficients are zero). This

vector of coefficients is called the null vector. In general, one and only

one such vector of coefficients exists for the APC model (this vector is

unique up to multiplication by a scalar). In the language of linear algebra,

we say that there is a nontrivial solution to the a × p homogeneous equa-

tions; and we can write Xv ¼ 0, where X is the ap × 2(a + p) – 3 design

matrix and v is a 2(a + p) – 3 × 1 vector (the null vector). This vector is

said to be in the null space of X. The existence of only one such vector indi-

cates that the rank of X is just one less than full column rank and that a single

linear constraint should allow for a solution to the identification problem.

Constraints and Generalized Inverses

Even with a rank deficient matrix, it is still possible to find a least squares

solution to (2) by using a generalized inverse (Searle 1971). Typically, the

generalized inverse is denoted with a superscripted minus rather than a super-

scripted minus 1. I have added the subscript c to denote that generalized in-

verses differ depending upon the constraint with which they are associated:

X 0Xð Þ�c . For the same reason I have subscripted the solution vector for b with

c. That is, a particular set of solutions is associated with the particular gen-

eralized inverse used to solve the equation and that inverse is associated with

a particular constraint on the solutions:

bc = X 0Xð Þ�c X 0Y : ð4Þ

This allows us to compute solutions in the constrained APC model, and

each solution is determined by the constraint. The solution produced is a least

O’Brien 423



squares solution; that is, each solutions (bc) associated with the particular

constraint produces the same predicted values of Y and these values minimize

the sum of the squared residuals. Thus, �(Y � Y )2 is the same no matter

which generalized inverse is used to solve the equation.4 It is well known

in APC modeling that the fit cannot be used to differentiate the validity of

different constraints (assuming the use of just one constraint so that the mod-

el is just identified).

The way that constrained estimation is typically employed in APC anal-

ysis is to place a single constraint on X so that the model is just identified.

I will use the symbol Gc to represent the generalized inverse associated

with a particular constraint, rather than the more awkward notation(X 0X )�c .

With the generalized inverse associated with a particular constraint in

hand, we multiply X 0Y in (3) by the generalized inverse to obtain a solution:

bc =GcX 0Y . This solution is determined by the constraint. Mazumdar, Li,

and Bryce (1980) explicate this correspondence between a constraint, its gen-

eralized inverse, and the solution. They show how to derive a specific gen-

eralized inverse from the constraint.

When a constraint is used in the APC context, the constraint convention-

ally involves setting two of the coefficients associated with age, period, or

cohort to be equal. The choice of the constraint then determines a generalized

inverse used to solve for the age group, period, and cohort coefficients. The

choice of a constraint is crucial, since it determines the set of estimated val-

ues (bc) and whether those estimated values are unbiased. The assumption is

that c0β= 0, where c is the m × 1 vector for the constraint and β is the m × 1

vector of population effect coefficients that generated the data.5 If c0b 6¼ 0,

the estimate under that constraint will be biased in that its expected value

will not equal the value of parameters that generated the data (for a further

explication of the bias associated with violating this assumption, see Kupper

et al. 1983). This is true for the conventionally used constraints and the con-

straint associated with the IE.

The constraint associated with the IE is that the solution must be perpen-

dicular to the null vector (the vector that when pre-multiplied by X produces

an m × 1 zero vector). Specifically, if v represents the null vector, the as-

sumption is that v0β= 0. Since v is a specific constraint, we can write

c0β= 0, where it will be clear from the context that c is the null vector or a dif-

ferent constraint. To the extent that this assumption is not true, the estimates

associated with this constraint will be biased in the sense that they are biased

estimates of the parameters that generated the age-period–specific rates. As I

discuss in the following, when Yang et al. (2004, 2008) rightly assert that

their estimate is unbiased, it is an unbiased estimate of the least squares




solution in the subspace that is perpendicular to the null vector. That does not

mean that the IE is an unbiased estimate of the parameters that generated the

age-period–specific rates, the generating parameters.

To make the discussion more concrete, Figure 1 presents three constraints.

These constraints are based on a five-by-five age by period matrix, where the

columns of the X matrix contain one intercept, four age group effect coded

variables, four period effect coded variables, and eight cohort effect coded

variables. The first two constraints are conventional constraints that might

be used in APC analysis. As previously argued in the literature (Kupper

et al. 1985; Mason et al. 1973; Smith 2004), the constraint should be based

on theory and/or empirical evidence that the constraint applies to the popu-

lation under consideration. The third constraint is the one associated with the

IE. The first two constraints fix two of the effect coefficients to be equal to

each other and the third uses the null vector of X as the constraint. The as-

sumption for unbiased estimates of the generating parameters is that the

dot product of the constraint vector and the vector of the parameter values

associated with the process that generated the Y-values is equal to zero

(the reference categories are omitted). In the first two cases this will be

true; respectively, if the population parameter values for age1 and age2 are

equal to each other or if the population parameter values for cohort7 equals

the value for cohort8. The assumption when using the constraint associated

with the null vector is more complicated (though the IE is easily solved for

using an ‘‘add-on’’ program for STATA; cited in Yang et al. 2008); specif-

ically, for this five-by-five age-period matrix, the assumption is that 0.000 ×bint + (− 0.267 × bage1) + (−0.134 × bage2) + . . . + (.0401 × bcoh8) ¼ 0.

We can write the assumptions associated with each of these restrictions as

c0(age1= age2) · β= 0, c0(coh7= coh8) · β= 0, and c0(nullvector) · β= 0. Each of these

places a linear constraint on the solution. The question is which one of these

constraints is best with respect to obtaining a substantively correct result (as-

suming that the researcher wants to determine the effects of age, period, and

cohort coefficients in generating the outcomes: e.g., suicide rates, homicide

rates, or marriage rates).

Mazumdar et al. (1980) implement the constraints in the following man-

ner: (1) Compute X 0X ; (2) replace the last row of X 0X with the constraint that

we chose to use; for example, c0(age1= age2)or c0(nullvector); (3) compute the in-

verse of this new matrix; and (4) replace the last column of this inverse

with zeros. The result is the generalized inverse (Gc) associated with the par-

ticular constraint. Using this process, we can find a set of least squares sol-

utions under a particular constraint: bc =GcX 0Y .6 When the null vector is

substituted for the aforementioned last row of X 0X, the generalized inverse

O’Brien 425



that results is one that yields a solution for bc that is the same as if we had

used the Moore-Penrose generalized inverse: the inverse used in the IE ap-

proach (Mazumdar et al. 1980; Yang et al. 2008). This result occurs because

the Moore-Penrose generalized inverse institutes this constraint, although the

Moore-Penrose generalized inverse has some other valuable properties for

those using matrix algebra.

Simple Implementations of Constrained Regression

Researchers wanting to implement the first two constraints in Figure 1 may

choose to do so using a simple recoding of their data. To implement the first

constraint based on age group 1 and age group 2 having the same effects, one

can simply create a new variable (newage1_2) that is the sum of age1 and

age2 and use it in the analysis rather than age1 and age2. This is equivalent

to creating a single column in X that is equal to the age1 column plus the age2

column and using this column in place of the columns for age1 and age2 in

the regression analysis. The coefficient associated with newage1_2 is the co-

efficient for both age1 and age2. To implement the second constraint, one

can add the variable for cohort7 and cohort8 to create the new variable (new-

cohort7_8) and use it in the analysis instead of cohort7 and cohort8. It is

cage1=age2

ccoh7=coh8

cnull vector

intercept 0.000 0.000 0.000age1 1.000 0.000 -0.267age2 -1.000 0.000 -0.134age3 0.000 0.000 0.000age4 0.000 0.000 0.134

period1 0.000 0.000 0.267period2 0.000 0.000 0.134period3 0.000 0.000 0.000period4 0.000 0.000 -0.134cohort1 0.000 0.000 -0.535cohort2 0.000 0.000 -0.401cohort3 0.000 0.000 -0.267cohort4 0.000 0.000 -0.134cohort5 0.000 0.000 0.000cohort6 0.000 0.000 0.134cohort7 0.000 1.000 0.267cohort8 0.000 -1.000 0.401

Figure 1. Three different constraints on the X-matrix




a straightforward extension to implement a constraint in which age1 ¼ 0.5

× age2. One can construct this constraint by adding 0.5 × age1 to age2

(newage1_2 ¼ age2 + 0.5 × age1). The new column generated by this

transformation contains 0.5, 1, 0, 0, -1.5, 0.5, 1, . . . (where –1.5 is for the

reference group, the oldest age group in our example), and this pattern repeats

until there are no more rows in this column for this newly created variable.

Then we can run a regular regression using newage1_2, age3, . . . coh7,

coh8 as the regressors. The coefficient for newage1_2 is the coefficient for

age2 and .5 times that coefficient is the coefficient for age1. Another approach

uses constrained regression, which employs the constraint: age1 ¼ 0.5 ×age2. Constrained regression can also be used with the first two examples.

Although almost all researchers will want to use the convenient ‘‘add-on’’

program for STATA (cited by Yang et al. 2008) to calculate the IE, it is

possible to implement the null vector constraint and produce the coefficient

estimates from the IE using constrained regression or by transforming the

columns of X using the strategy described in the previous paragraph. The

null vector in Figure 1 means that 0 ¼ 0.00intercept – 2.67a1 – .134a2 +

0.00a3 + .134a4 + . . . +.267c7 + .401c8. Adding 2.67a1 to both sides of the

equation and dividing both sides of the equation by 2.67 yields a1 ¼ 0.00in-

tercept –.50a2 + 0.00a3 + .50a4 + . . . – 1.00c7 – 1.50c8. In this particular rep-

resentation, we see that a1 is a constrained linear function of the columns of

the other independent variables in X. If we implement this constraint (that is,

age1 ¼ –.50age2 + 0.50age4 + . . . – 1.00coh7 – 1.50coh8) in a constrained

regression program, we obtain the same estimates produced by the STATA-

based IE program. A reviewer of this article suggested transforming the col-

umns of independent variables in the following manner (although noting that

the procedure is tedious): newage2 ¼ age2 – 0.5 × age1; newage3 ¼ age3;

newage4 ¼ age4 + 0.5 × age1; . . . ; newcoh7 ¼ coh7; newcoh8 ¼ coh8 +

1.5age1. Using these values as regressors implements the ‘‘null vector con-

straint’’ and produces coefficient estimates equivalent to those from using

the IE.

The Geometry of Constrained Regression

When X 0X is one less than full column rank, as it is in the APC situation, the

solution to the m × 1 b vector of coefficients is in an m dimensional solution

space. Specifically, it lies in an m – 1 subspace of that solution space that is

orthogonal to its constraint. We know from the previous sections how to

estimate the m elements of the b vector by using a generalized inverse asso-

ciated with a constraint, by using constrained regression, or by algebraically

O’Brien 427



applying the constraint using the columns of X. To provide intuition about

how constrained regression works geometrically, we use a simple illustration

in which X 0X is a 3 × 3 matrix. Many essential properties of constrained

regression can be illuminated using this approach.

In the example I use the following specific data for Xb ¼ Y:

1 4 2

1 2 1

1 4 2

1 2 1

2664

3775

b1

b2

b3

24

35=

19

11

19

11

2664

3775; ð5Þ

which produces the normal equations for (X 0X)b ¼ (X 0Y)

4 12 6

12 40 20

6 20 10

24

35

b1

b2

b3

24

35=

60

196

98

24

35: ð6Þ

Here X 0X is singular reflecting the linear dependency in X. The second col-

umn is twice the third and the null vector is (0, 1, –2),7 since 0 times the first

column of X plus 1 times the second column of X plus –2 times the third col-

umn of X results in a 3 × 1 column vector of zeros.

Figure 2 has three axes b1, b2, and b3 (the three-dimensional solution space)

with the b1 axis representing the intercept. The null vector (0, 1, –2) is repre-

sented by the stippled line through the origin of (b1, b2, b3) that lies on the b2-b3

plane (since its b1 value is zero). The slope of the null vector on the b2-b3 plane

is –2.0 (if b2 increases by 1.0, b3 decreases by 2.0). The line of solutions, the

line on which every constrained solution appears, is three units above the

b2-b3 plane. Its slope with respect to the b2-b3 plane is the same as that of

the null vector, –2.0. The line of solutions and the null vector are parallel to

each other. The line of solutions for our data is always three units above the

b2-b3 plane; the value of the intercept remains constant at 3.0 for all solu-

tions using the constrained regression approaches outlined earlier for the

data in the example. We can determine the line of solutions in this simple

example by using (6), if we bear in mind that the intercept (b1) ¼ 3. If b1 ¼3 and we set b3 to zero b2 must be 4, and if we set b2 to zero b3 must be 8. If

we connect these two points, (3, 4, 0) and (3, 0, 8), we have the line on

which all of the constrained solutions in this example must fall. This line

is labeled as the line of solutions in Figure 2. Another way to determine

this line is to find the intersection of the first two planes represented by

the (first two) normal equations in (6). The problem is that the remaining




normal equation represents a plane on which this line lies. Since this plane

does not intersect the line of solutions in a single point (because of the lin-

ear dependency), we do not know where on the line of solutions to chose

a solution. We can force this plane to intersect the line of solutions at a sin-

gle point by setting a constraint on the solution.

We have seen earlier that we cannot solve this problem (equation 6) using

a regular inverse. We can, however, use a generalized inverse to find a solu-

tion for the bs for our example data. Different generalized inverses are asso-

ciated with different constraints and yield different solutions—all lying on

the line of solutions that is parallel to the null vector. Using the system

b2

b3

4

8

b1

(0,0,0)

90o

Line of Solutionsbc = GcX’Y

(Values in our Example)

3

3

Null Vector of X’X

(0, 1, -2)

Figure 2. Geometric view of constrained regression in three dimensions

O’Brien 429



described earlier (Mazumdar et al. 1980) to obtain the generalized inverses

for the data in (6) and solving for the bc vectors under different constraints,

we can confirm that all of the solutions fall on this line of solutions depicted

in Figure 2. We can, of course, obtain the same estimates using constrained

regression or transformations of the columns of X to implement the con-

straint. For example, to obtain the solution for the constraint (0, –1, 1), which

implies that b2 ¼ b3, we substitute this constraint vector for the last row of

the X 0X; find the inverse of this new matrix; and then substitute a column of

zeros for the last column of this inverse matrix. Pre-multiplying X 0Y by this

generalized inverse yields the solution vector (3.0, 2.67, 2.67) 0.Figure 3 depicts this solution geometrically. The constraint passes through

the origin (0, 0, 0) and has a slope of –1.0 with the b2-b3 plane. The solution

plane must be perpendicular to the constraint (it must have a slope of 1.0 with

respect to the b2-b3 plane) and pass through the b1 axes. The plane depicted in

Figure 3 has just such a slope with respect to the b2-b3 plane, is perpendicular

to the b2-b3 plane, and passes through the origin. Where this solution plane

intersects the line of solutions (3.0, 2.67, 2.67) is the solution under this

constraint.

If we repeated this process to obtain the ‘‘Moore-Penrose solution,’’ the

difference is that this solution is orthogonal to the null vector.8 Therefore,

we substitute the null vector for the last row of the X 0X; find the inverse

of this new matrix; and then substitute a column of zeros for the last column

of this inverse matrix. Pre-multiplying X 0Y by this generalized inverse yields

the solution vector (3.0, 3.2, 1.6) 0. Geometrically, the slope of the null vector

(0, 1, –2) to the b2-b3 plane is –2.0, and the ‘‘slope’’ of the solution plane rel-

ative to the b2-b3 plane (which must be orthogonal to this constraint) is .5.

This solution is depicted in Figure 4 as occurring where the solution plane

intersects the line of solutions. This plane is orthogonal to the null vector

(its slope with respect to the b2-b3 plane is 0.5), and it intersects the line

of solutions at the point (3.0, 3.2, 1.6). It is easy to check that this solution

vector is perpendicular to the null vector: namely, (0, 1, –2) · (3.0, 3.2,

1.6) 0 ¼ 0.

With the constraint (0, –1, 0.5) the estimated coefficients are (3.0, 2.0,

4.0), and using the constraint (0, –1, 0.1) yields (3.0, 0.6667, 6.6667). The

constraints do not allow just any solution to emerge, but only solutions

that are on the single line (the line of solutions) that is parallel to the null vec-

tor. Note also that each solution is orthogonal to its constraint. For example,

(0, –1, 0.5) · (3.0, 2.0, 4.0) 0 ¼ 0 and (0, –1, 0.1) · (3.0, 0.6667, 6.6667) 0 ¼ 0.

If we substitute these solutions (bc) into (5), each one produces the same pre-

dicted values of Y. This is part of what it means to be on this line of solutions.




If we were careful geometers, we could produce the constrained solutions

for the data in (6) corresponding to any constraint by first constructing the

line of solutions, then the solution plane that is orthogonal to the constraint,

and then determining where this plane intersects the line of solutions. We

would always find that the intercept is 3.0, since the line of solutions (for

this data) is always 3.0 units above the b2-b3 plane.

b2

b3

4

8

b1

Constraint(0,1,-1)

2.67

2.67

b2 = b3

Null Vector

45o

Solution(b1=3.0, b2= 2.67, b3=2.67)

Line of Solutionsbc = GcX’Y


Figure 3. Geometric view of constrained regression in three dimensions: b2 ¼ b3

O’Brien 431



We can characterize the differences between various constrained solu-

tions by means of the factor kv, where k is a scalar and v is the null vector.

For example, one of our solutions is (3.0, 2.6667, 2.6667) and another is

(3.0, 0.6667, 6.6667). We can move from the first to the second solution

by adding –2 × (0, 1, –2) to the first solution. This manipulation simply

moves us along the line of potential solutions, here moving from (3.0,

2.6667, 2.6667) to reach (3.0, 0.6667, 6.6667). This relationship keeps all

b2

b3

4

8

b1

(0,0,0) 3.20

1.60

b3 = ½ b2

Null Vector

22.5o

Line of Solutionsbc = GcX′Y


Moore-Penrose Solution(Values in Our Example)(b1=3.0; b2=3.2; b3=1.6)

Figure 4. Geometric view of constrained regression in three dimensions: b3 ¼ 1⁄2 b2:The Moore-Penrose solution




of the possible solutions on the line of solutions. Note that k depends on how

we scale the null vector, since any scalar multiple of (0, 1, –2) is also the null

vector.

Two ways to view this movement are first, as a movement from one

solution to the other solution by moving parallel to the axes. To move

from the point (3.0, 2.6667, 2.6667) to (3.0, 0.6667, 6.6667), move 0.0 units

on the b1 axis; –2 units parallel to the b2 axis and up 4 units parallel to the b3

axis where we intercept the line of solutions at (3.0, 0.6667, 6.6667). The sec-

ond geometric interpretation is based on the length that must be traveled on

the line of solutions from one solution to the other. The length of the null vec-

tor is the square root of the sum of its squared components: In our example

the square root of (02 + 12 + –22) ¼ 2.236. The distance along the line of

solutions between (3.0, 2.6667, 2.6667) and (3.0, 0.6667, 6.6667) is twice

this distance: 4.472. The unit of measurement for k is the length of the

null vector. We must move 2 units [2 × sqrt(v 0v)] along the line of solutions

to reach (3.0, 0.6667, 6.6667) from (3.0, 2.6667, 2.6667). The difference be-

tween any two constrained estimates may be written as bc – bc) ¼ kv, where

c and c* are two different constraints.

When we move to a situation in which there are four or more dimensions,

the solutions for any one set of data using different constraints still lie on

a line that is parallel to the null vector. This (one dimensional) line is then

in an m dimensional space rather than in a three-dimensional space. The

line of solution is intersected by an m – 1 dimensional hyperplane that is or-

thogonal to the constraint. Where this hyperplane intersects, the line of sol-

utions provides the solution associated with the particular constraint. We can

reach all of the solutions for the constrained estimators on the line by taking

any solution (from one of the generalized inverses) and adding a scalar times

the null vector to it. The scalar, k, still tells us how far we must travel on the

line of solutions with the units of measurement for k being the length of the

null vector (which now has m elements). Each set of solutions is still perpen-

dicular to the constraint.

Relevance to APC Models

In the class of APC models that I examine, a single linear constraint is placed

on the model that makes the model just identified. In this class of models, the

solution is always orthogonal to any constraint. The intercept is always the

same no matter which linear constraint is chosen. All of the solutions using

a single linear constraint lie on a single line of solutions in multidimensional

space. All of these solutions fit the model equally well in terms of predicted

O’Brien 433



values. All of the solutions are related to each other by kv. The solution is

determined by the intersection of the line of solutions with an m – 1 hyper-

plane, and the direction of that hyperplane is orthogonal to the constraint.

Before leaving this introduction to the geometry of constrained regres-

sion, I should mention a unique property of using the null vector as a con-

straint. The solution that it produces is closer to its constraint than using

any other constraint. In Figure 4 the solution plane perpendicular to the

null vector has a slope of .50 with respect to the b2-b3 plane (a one-unit in-

crease on b2 is associated with a 1⁄2 -unit increase in X3). The length of the

solution vector is the distance from the b1, b2, b3 origin to the point of solu-

tion on the line of solutions. For example, the length of the vector from the

origin to the solution on the line of solutions using the null vector is 4.669 (¼sqrt(3.02, 3.22 + 1.62)). Using the constraint (0, 1, –1), the distance to the so-

lution is 4.819 (¼ sqrt(3.02, 2.6672 + 2.6672)). This is often cited as an ad-

vantage of using the Moore-Penrose solution: ‘‘If we want to single out

one particular member of this solution set of vectors as a representative,

we might want to pick the one with the smallest length’’ (Press et al.

1992:62). We can also see that it is representative in the sense that it is in

the ‘‘middle’’ of a line of solutions that stretches out in either direction. It

may be this property that inspired Smith’s (2004:116) comment: ‘‘There is

also a sense in which the IE is an average of CGLIM [Constrained General-

ized Linear Model] estimates.’’

Some Shared Characteristics of Constrained APCModels

In this section I discuss some of characteristics that are shared by all con-

strained APC models, including the IE. I also discuss the characteristics of

the IE that are different from other constrained models. The aim is to provide

researchers with a basis for: understanding how constraints are related to sol-

utions, comprehending how the estimated coefficients from the various con-

strained models are related, and judging how much confidence should be

placed in the estimates.

A Solution for All of the Effect Coefficients

Because the APC model without an additional constraint is not of full column

rank, it is underidentified and there is no solution that provides a unique vec-

tor of age, period, and cohort coefficients that corresponds to the generating




parameters for the data. There are instead an infinite number of solutions for

the age, period, and cohort coefficients that produce least square estimates.

Any constrained regression approach provides estimates of the unique ef-

fect coefficients for each age group, period, and cohort under the particular

constraint. They are unbiased estimates in the sense that they are unbiased

estimates under a particular constraint. The crucial question, from our per-

spective, is whether the parameter estimates are unbiased, in the sense that

their expected values equal the values of the parameters that generated Y:

Does, in fact, c0b= 0?

Solution Space Is Perpendicular to the Constraint

Yang et al. (2004:82) note that ‘‘the parameter space P can be decomposed as

P=N⊕Y, where ⊕ is the direct product of two linear spaces that are per-

pendicular to each other; N is the one-dimensional null space of X spanned by

the vector {sB0}, with real number s; and � is the complement subspace or-

thogonal to N.’’ In our notation B0 is the null vector (v) and s is the scalar by

which the null vector may be multiplied, since it is unique up to multiplica-

tion by a scalar. In general, I note that the parameter space can be decom-

posed as P=Bc ⊕Πc where Bc is a linear constraint spanned by the

vector {sc} with real numbers s and constraint vector c; and Πc is a comple-

ment subspace orthogonal to Bc. The linear constraint, like the null vector, is

unique up to multiplication by a scalar. The solution vector must be orthog-

onal to the constraint, since the solution vector is determined by a point on

the constrained hyperplane, and the hyperplane is orthogonal to the con-

straint. Specifically, this point is the intersection of this hyperplane (with

its direction constrained to be orthogonal to the constraint) and the line of

solutions.

As noted earlier, all of the solutions for the different constraints (with

a fixed set of data) must fall on a line parallel to the null vector. This restric-

tion on the solutions occurs because of the linear dependency in the original

data. For each constrained solution, the constraint determines the direction of

the solution plane/hyperplane.

Bias

Arguably the most important consideration in setting a linear restriction is the

degree of bias associated with that restriction. The constraint chosen by the

researcher determines the amount of bias associated with the particular con-

strained estimate. Since the effect coefficients are estimated under this

O’Brien 435



assumption, the estimated values will be orthogonal to the constraint:

c0bc = 0. Therefore, we cannot use this observed orthogonality to judge the

validity of the constraint. The crucial question is whether the constraint is

orthogonal to b.

A general definition of bias involves the difference between the expected

value of an estimator and the parameter that it estimates. In this article, I

use bias to refer to the differences between expected values of the estimates

and the values of the parameters (b) of the process that generated the outcome

values. Specifically, it is the difference between the expected values of the es-

timates in the vector of constrained estimates, E(bc), and b: bias ¼ E(bc)� b.

Using this conception of bias, Kupper et al. (1983) note that for APC models

kv=E(bc)� b, where kv measures the distance between the expected value of

the constrained solution and the population parameters that generated the data

(b). This is identical to the distance between any two constrained estimates on

the line of solutions, except that one of the solutions is the ‘‘true’’ solution (the

one representing the process that generated the outcomes). Kupper et al. (1983)

note that this value of k can be computed as: k =�c0b=v0c and that c 0b¼0 when the expected value of the constrained estimate equals the parameter val-

ues that generated the outcome: E(bc)=b.

A different use of the term bias involves the difference between the

expected values of the estimators and the values of the parameters under

a particular constraint: E(bc)=bc. This is an important property for any con-

strained estimator, but the focus of most of the literature has been on bias in

terms of the expected value of the estimated parameters and the parameters

that generated the data.

A careful reading of Yang et al. (2004, 2008) indicates that the IE is an

unbiased estimate of the solution to the APC model that lies in the subspace

that is orthogonal to the null vector. The IE is an unbiased estimator of the bie

parameter values associated with the null vector constraint: E(bie)= bie. The

IE would provide an unbiased estimate of the parameters that generated the

age-period–specific rates, if and only if v0b= 0; just as other constrained es-

timates would provide unbiased estimates of the parameters that generated

the age-period–specific rates if and only if c0b= 0.

Relationship of Constrained Solutions to Each Other

The problem confronting APC analysts who employ constrained regression

(whether using traditional constraints or the IE) is that they cannot find the

unique solution to the normal equations. This occurs because the X matrix

is rank deficient by one. Thus, there are an infinite number of possible




solutions; however, not just any solution is possible. All constrained solu-

tions are related to each other in that they lie on a single line of solutions

in multidimensional space. In a fundamental sense, the line of solutions is

the most tangible thing we know about the solution to this system of equa-

tions. Any solution to the normal equations must lie on this line and any

of the solutions to the normal equations are ordinary least squares (OLS) sol-

utions. Thus, if our standard assumptions (e.g., those underlying OLS multi-

ple regression) are correct, even though the X matrix is rank deficient by one,

we know that the solution corresponding to the parameters values that gen-

erated the outcome data must fall on this line.

The difference between any two solution vectors in the class of con-

strained estimators examined in this article is kv ¼ (-c01b2=v0c1) × v, where

c1 is one constraint and b2 is the solution under a different constraint. When

we have calculated a single solution, we could determine the difference

between that solution and any other solution generated using a different con-

straint or, of course, determine any other solution. The distance between any

two solutions on the line of solutions is k · ffiffiffiffiffiffiv0vp

), whereffiffiffiffiffiffiv0vp

is the length of

the null vector. Appropriately, the value of k ¼ 0 when the constraint for

a solution and the solution b2 are orthogonal c01b2 = 0. Thus, kv indicates

how much the coefficients based on the two different constraints vary: b2-

b1 ¼ kv.

Model Fit

The solutions for the normal equations associated with the APC model all lie

on the line of solutions and any solution to these normal equations is a least

squares solution. Thus, each of the constrained estimates generates a solution

(bc) that produces the same solutions for the values of the dependent variable.

I showed this for the 3 × 3 case using the data in equation (5), and it extends

to higher dimensions and accounts for the well-established finding that in

APC analysis one cannot distinguish between models on the basis of fit

when they are just identified. This is true for all of the constrained estimates

that we examine.

Setting the Constraint and Zero Influence

Many sources (e.g., Kupper et al. 1983; Rodgers 1982; Smith 2004) caution

against setting the constraint based on the values of the observed dependent

variable. An advantage of the IE approach is that it assures that the researcher

does not give in to this temptation—the constraint is based purely on the

O’Brien 437



X-matrix, and in this sense, the analyst does not choose which constraint to

use.

I suggest two cautions. First, one could argue that in some situations with

good theory/past research that it would be better to set the constraint on the

basis of theory and previous research than setting it on this basis of the null

vector of the X-matrix. Kupper et al. (1983:2803-804) conclude: ‘‘Given that

the observed Yij’s are of no help in determining c, the only remaining option

is to make use of any reasonably reliable a priori, data independent, knowl-

edge about the underlying age, period, and cohort effect parameters under

study.’’

The second caution is that having no choice in the constraint used does not

mean that the constraint used is unimportant or has no effect on the outcome

measures. Yang et al. (2008:1704) state: ‘‘The eigenvector B0 [the null

vector] does not depend on the observed rates Y, only on the design matrix

X, and thus is completely determined by the numbers of age groups and pe-

riod groups—regardless of the event rates. In other words, B0 has a specific

form that is a function of the design matrix.’’ This is correct, as noted earlier,

and a potential advantage of using this constraint. But it is not clear what the

following statements means: ‘‘The fact that the fixed vector B0 is indepen-

dent of the response variable Y suggests that it should not play any role in

the estimation of effect coefficients’’ (Yang et al. 2008:1705) or ‘‘Specifical-

ly, the IE imposes the constraint that the direction in parameter space defined

by the eigenvector B0 in the null space of the design matrix X have zero in-

fluence on the parameter vector b0 (i.e., on the specific parameterization of

the vector b that is estimated by the IE)’’ (Yang et al. 2008:1706). If these

statements are meant to convey that constraining the solution to be orthogo-

nal to the vector B0 (the null vector) does not affect the estimates, then this is

incorrect. The solution vector must be orthogonal to the null vector and there-

fore is affected by this constraint. If they mean that the Y does not affect the

constraint that is chosen, then they are correct.

Variance of the Estimators

Yang et al. (2008:1709) note that ‘‘for a fixed number of time periods of data,

the IE is more statistically efficient (has a smaller variance) than any CGLIM

estimator that is obtained from a nontrivial equality constraint on the uncon-

strained regression coefficient estimator.’’

This is an important statistical advantage of using the null vector as a con-

straint and is in agreement with Kupper et al. (1983:2797) who note that the

principle component estimator (a linear transformation of the IE) ‘‘deals with




the exact linear dependency by involving only the non-zero eigenvalues as-

sociated with the eigenvectors.’’ Because of this, it ‘‘should be preferred on

variance grounds.’’

While noting that this is an important statistical property, I am in agree-

ment with Kupper et al. (1983) that bias (and I and they are using bias in

the sense of estimating the generating mechanism) is likely to be the more

important factor. Kupper et al. (1983:2797) note that to the extent that: v0bdeparts from zero, the principal component estimator is more biased and

‘‘could lead to more bias than the use of some other constraint . . . , so

that the optimal method for choosing c should probably take into account

both bias and variance (i.e., mean square error) considerations. Of

course, when the squared multiple correlation coefficient R2 is fairly close

to 1, a result which seems to occur not infrequently in practice, the bias

becomes the main area of concern.’’

Most Representative Solutions

The constraint used for the IE (the null vector) results in a vector from the

origin to the solutions that is the shortest of all the constrained estimators.

This property can be used to argue that this estimate is a sort of average of

the estimates based on constrained estimation. From the discussion of the ge-

ometry of constrained estimation, it is possible to view this solution as the

balance point of a teeter-totter—with solutions extending in both directions

along the line of solution from this solution point. In a similar vein we can

view the Moore-Penrose solution (the IE solution) as a sort of ‘‘conventional

solution.’’ Given that any of the constrained solutions provide the same fit to

the data, is there one solution that could serve as the conventional solution so

that different researchers would report the same solution in the absence of

evidence of a ‘‘better’’ solution? Some might say yes and that the solution

should be one based on the Moore-Penrose generalized inverse, because of

its statistical properties: ‘‘closest to its solution,’’ being (in a sense) most rep-

resentative, and its variance characteristics.

Convergence

I showed earlier that the scalar k times the null vector (kv) represents the dif-

ference between any two constrained estimators when using the same data

set. This holds for the intrinsic estimator b= bie + kv, where any of the con-

strained estimates, b, equals the estimates based on the intrinsic estimator,

bie, plus k times the null vector. It also holds for the traditional constrained

O’Brien 439



estimators b= bc + kv, where any of the constrained estimators (including

the intrinsic estimator) equals any specific constrained estimator, bc, plus k

times the null vector. Each of these solution vectors is a biased estimate of

the parameter vector that generated the outcome data to the extent that it dif-

fers from the parameters values (the age, period, and cohort parameters) that

‘‘nature’’ used to generate the age-period–specific results.

The null vector is unique up to multiplication by a scalar, and Yang et al.

(2008:1709) choose to norm the length of the null vector to 1 in their presen-

tation. They later transform their solution using ‘‘the orthonormal matrix of

all eigenvectors to transform the coefficients of the principal components

regression model to regression coefficients of the estimator B [the intrinsic

estimator]’’ (Yang et al. 2008:1705). Their transformed parameter values

are the same as those that result from our use of constrained regression (or

using generalized inverses) to obtain the parameters of the IE. Since they

use this normed null vector that has a length of 1, they note that as the number

of elements increases (the number of age groups and periods increase) the

elements of the null vector become smaller and ‘‘converge elementwise to

zero’’ (Yang et al. 2008:1709).

They use the formula b= bie + k * v * (in our notation) to represent the

relationship of any constrained estimate (b) to the values estimated by the

intrinsic estimator (bie). I have used the asterisk on v* to emphasize that it

is the normed null vector and on k* to emphasize that this scalar is appropri-

ately scaled, given the normed null vector. I note that I can also write

b= bc + k * v * . Yang et al (2008) argue that the expected value of any con-

strained estimator converges in value to the expected value of (bie) as the

number of periods and/or age groups increases, since the expected value of

v * goes elementwise to zero with such increases. Given b= bc + k * v *

and that bie is a constrained estimator, I could just as easily argue that the ex-

pected value of the intrinsic estimator converges toward the expected value

of any of the constrained estimators as v * goes elementwise zero. I am skep-

tical of this argument because as the normed v goes elementwise to zero there

is no reason to assume that k * remains constant. I have found in simulations

that k * does not remain fixed as the number of periods increases for the

models I have investigated.

On the other hand, convergence can occur for other reasons. It might be

the case that there is a zero trend in the period effects in the long term.

For example, the unemployment rate across a 60-period span may fluctuate

up and down over short periods of time, but with no apparent linear trend in

the long term. If we set a zero linear trend (ZLT) constraint for periods over

a short time span, we may or may not get an accurate estimate of the data




generating age effect coefficients.9 If we set a ZLT constraint for periods

covering a longer span of time, say 30 years, we are likely to get a more

accurate measure of the generating parameters for age, and a 60-year span

will likely provide an even more accurate measure. This is not surprising, be-

cause the ZLT constraint for periods is more likely to be correct (or closer to

being correct) as the span of periods increases, assuming an up and down pat-

tern of unemployment with very little or no overall trend.10

Investigating the Effects of Different Constraints

It is straightforward to investigate with data what happens to the estimated

age, period, and cohort coefficients as the constraints change. I report the re-

sults using two data sets (one with 5 periods and one with 20 periods). I use

the data from the 5 × 5 age-period matrix in Table 1. I generated the cell

entries by setting the earliest cohort value (cohort 1) to 5 and keeping the co-

hort effect at 5 through the fourth cohort and then increasing the cohort effect

by .50 for each cohort through the ninth and final cohort. The period effect is

two for the first three periods and four for the next two periods. The age ef-

fect is three for the two youngest age groups, two for the next oldest, one of

the next oldest, and zero for the oldest. I begin the simulation by using the

data for the 5 × 5 age-period matrix in Table 1.

When I extend our analysis to 20 periods, I leave the cohort effects as they

are for the 5 × 5 table and continue with no increase or decrease in the co-

hort effect for the newly added cohorts that correspond to the newly added

periods. The period effects are two for the first 3 periods, four for the next

2 periods, two for the next 3 periods, and continue this up-and-down pattern

until we reach the 20th period. The age effects remain the same as they are

for the 5 × 5 table across all 20 periods. When we increase the number of

periods to 20: the number of age groups remains at five and the number of

cohorts increases to 24. Certainly, I do not claim that this data set is somehow

representative of all of the possible generating mechanisms. The number of

mechanisms is infinite, as are the number of possible constraints. The results

point to many important relationships and illustrate many of the points that I

discussed previously.

I expect that if a constraint is consistent with the way the data were gen-

erated, we should obtain results that are consistent with the way the data were

generated. If the constraint is not consistent with the way the data were gen-

erated, we will obtain some other set of results. But whatever the results, they

will fit the data equally well; they will be orthogonal to the constraint used,

they will all lie on a line of solutions, and the intercept will be the same no

O’Brien 441



matter what constraint is used. The quandary in APC analysis is to discover

a method that will yield results that are consistent with how ‘‘nature’’ gener-

ated the data.

To compute the constrained estimates, I used constrained regression anal-

ysis in STATA. To calculate the IE, I used the add-on program in STATA

(cited in Yang et al. 2008). Table 2 presents the results. The first four col-

umns show the results when there are five periods and five age groups (based

on the data displayed in Table 1) for four different constraints: age1 ¼ age2,

period3 ¼ period4, cohort7 ¼ cohort8, and the constraint associated with

the IE. The fifth column contains the null vector for the 5 × 5 age-period

matrix. The next four columns show the results for the same four constraints,

but for the case where there are five age groups and 20 periods. The data have

been coded using effect coding so that the sum of the effect coefficients is

zero; and I have used the last category of age groups, periods, and cohorts

as the reference category. We can therefore determine the coefficients for

these reference categories and have reported this in Table 2.11 The final col-

umn of Table 2 contains the null vector for the 5 × 20 age-period matrix.

I first note that the analyses produce results consistent with several prin-

ciples noted earlier. (1) For the same X and Y data the intercept is always the

same no matter which linear constraint is used to identify the models: 10.433

for the data with 5 periods and five age groups and 11.575 for the data with

five age groups and 20 periods. (2) Each of the solutions is perpendicular to

its constraint. This is easily verified by multiplying the transpose of the con-

straint vector times the solution vector. Note that the constraint vector

ignores the reference categories (since they are ‘‘dropped’’ from the analy-

sis); thus, we need to multiply only the constraint times the corresponding

elements of the solution vector. For example, setting age1 ¼ age2 corre-

sponds to a constraint vector of (1, –1, 0, 0, . . . , 0), which when multiplied

times the appropriate elements of the solution vector results in zero (here I

have placed the intercept as the final term to match the output). In the

5 × 5 case, the age1 and age2 coefficients both equal 1.20 so the dot product

[c 0· (solution vector)] is zero; this is also true in the 5 × 20 case. The solu-

tion vector is perpendicular to the constraint vector. Similarly, the transpose

of the null vector times the solution for the IE equals zero. For the 5 × 5

case: –.267 × age1 – .134 × age2 + 0.000 × age3 + . . . + 0.401 × cohort8

+ 0.000 × intercept ¼ 0. (3) All of the solutions lie on a single line of sol-

utions and thus differ from one another by kv. In the 5 × 5 case, to move

from the solution for age1 ¼ age2 to the solution for period3 ¼ period4,

k ¼ 14.967; to move from the solution for the age constraint to the solution

for cohort constraint, k ¼ –3.742; to move from the age constraint to the




Tab

le2.

Anal

ysis

ofth

eD

ata

Gen

erat

edin

Table

1fo

r5

Peri

ods

and

Exte

nded

to20

Peri

ods

asD

escr

ibed

inth

eTe

xt

Five

age

groups

and

5per

iods

Five

age

groups

20

per

iods

Effec

tsag

e1¼

age2

per

iod3¼

per

iod4

cohort

7¼

cohort

8in

trin

sic

estim

ator

Null

vect

or

age1¼

age2

per

iod3¼

per

iod4

cohort

7¼

cohort

8in

trin

sic

estim

ator

Null

vect

or

age1

1.2

00

–2.8

00

2.2

00

1.3

90

–0.2

67

1.2

00

–2.8

00

2.2

00

1.3

36

–0.0

50

age2

1.2

00

–0.8

00

1.7

00

1.2

95

–0.1

34

1.2

00

–0.8

00

1.7

00

1.2

68

–0.0

25

age3

0.2

00

0.2

00

0.2

00

0.2

00

0.0

00

0.2

00

0.2

00

0.2

00

0.2

00

0.0

00

age4

–0.8

00

1.2

00

–1.3

00

–0.8

95

0.1

34

–0.8

00

1.2

00

–1.3

00

–0.8

68

0.0

25

age5

–1.8

00

2.2

00

–2.8

00

–1.9

90

–1.8

00

2.2

00

–2.8

00

–1.9

36

per

iod1

–0.8

00

3.2

00

–1.8

00

–0.9

90

0.2

67

–0.8

00

18.2

00

–5.5

50

–1.4

44

0.2

38

per

iod2

–0.8

00

1.2

00

–1.3

00

–0.8

95

0.1

34

–0.8

00

16.2

00

–5.0

50

–1.3

76

0.2

13

per

iod3

–0.8

00

–0.8

00

–0.8

00

–0.8

00

0.0

00

–0.8

00

14.2

00

–4.5

50

–1.3

08

0.1

88

per

iod4

1.2

00

–0.8

00

1.7

00

1.2

95

–0.1

34

1.2

00

14.2

00

–2.0

50

0.7

60

0.1

63

per

iod5

1.2

00

–2.8

00

2.2

00

1.3

90

1.2

00

12.2

00

–1.5

50

0.8

27

0.1

38

per

iod6

–0.8

00

8.2

00

–3.0

50

–1.1

05

0.1

13

per

iod7

–0.8

00

6.2

00

–2.5

50

–1.0

37

0.0

88

per

iod8

–0.8

00

4.2

00

–2.0

50

–0.9

69

0.0

63

per

iod9

1.2

00

4.2

00

0.4

50

1.0

98

0.0

38

per

iod10

1.2

00

2.2

00

0.9

50

1.1

66

0.0

13

per

iod11

–0.8

00

–1.8

00

–0.5

50

–0.7

66

–0.0

13

per

iod12

–0.8

00

–3.8

00

–0.0

50

–0.6

98

–0.0

38

per

iod13

–0.8

00

–5.8

00

0.4

50

–0.6

31

–0.0

63

per

iod14

1.2

00

–5.8

00

2.9

50

1.4

37

–0.0

88

per

iod15

1.2

00

–7.8

00

3.4

50

1.5

05

–0.1

13

per

iod16

–0.8

00

–11.8

00

1.9

50

–0.4

27

–0.1

38

(co

nti

nu

ed)

443



Tab

le2.

(co

nti

nu

ed

)

Five

age

groups

and

5per

iods

Five

age

groups

20

per

iods

Effec

tsag

e1¼

age2

per

iod3¼

per

iod4

cohort

7¼

cohort

8in

trin

sic

estim

ator

Null

vect

or

age1¼

age2

per

iod3¼

per

iod4

cohort

7¼

cohort

8in

trin

sic

estim

ator

Null

vect

or

per

iod17

–0.8

00

–13.8

00

2.4

50

–0.3

60

–0.1

63

per

iod18

–0.8

00

–15.8

00

2.9

50

–0.2

92

–0.1

88

per

iod19

1.2

00

–15.8

00

5.4

50

1.7

76

–0.2

13

per

iod20

1.2

00

–17.8

00

5.9

50

1.8

44

cohort

1–0.8

33

–8.8

33

1.1

67

–0.4

52

–0.5

35

–1.8

75

–24.8

75

3.8

75

–1.0

96

–0.2

88

cohort

2–0.8

33

–6.8

33

0.6

67

–0.5

48

–0.4

01

–1.8

75

–22.8

75

3.3

75

–1.1

64

–0.2

63

cohort

3–0.8

33

–4.8

33

0.1

67

–0.6

43

–0.2

67

–1.8

75

–20.8

75

2.8

75

–1.2

31

–0.2

38

cohort

4–0.8

33

–2.8

33

–0.3

33

–0.7

38

–0.1

34

–1.8

75

–18.8

75

2.3

75

–1.2

99

–0.2

13

cohort

5–0.3

33

–0.3

33

–0.3

33

–0.3

33

0.0

00

–1.3

75

–16.3

75

2.3

75

–0.8

67

–0.1

88

cohort

60.1

67

2.1

67

–0.3

33

0.0

71

0.1

34

–0.8

75

–13.8

75

2.3

75

–0.4

35

–0.1

63

cohort

70.6

67

4.6

67

–0.3

33

0.4

76

0.2

67

–0.3

75

–11.3

75

2.3

75

–0.0

02

–0.1

38

cohort

81.1

67

7.1

67

–0.3

33

0.8

81

0.4

01

0.1

25

–8.8

75

2.3

75

0.4

30

–0.1

13

cohort

91.6

67

9.6

67

–0.3

33

1.2

86

0.6

25

–6.3

75

2.3

75

0.8

62

–0.0

88

cohort

10

0.6

25

–4.3

75

1.8

75

0.7

94

–0.0

63

cohort

11

0.6

25

–2.3

75

1.3

75

0.7

27

–0.0

38

cohort

12

0.6

25

–0.3

75

0.8

75

0.6

59

–0.0

13

cohort

13

0.6

25

1.6

25

0.3

75

0.5

91

0.0

13

cohort

14

0.6

25

3.6

25

–0.1

25

0.5

23

0.0

38

cohort

15

0.6

25

5.6

25

–0.6

25

0.4

56

0.0

63

cohort

16

0.6

25

7.6

25

–1.1

25

0.3

88

0.0

88

cohort

17

0.6

25

9.6

25

–1.6

25

0.3

20

0.1

13

(co

nti

nu

ed)

444



Tab

le2.

(co

nti

nu

ed

)

Five

age

groups

and

5per

iods

Five

age

groups

20

per

iods

Effec

tsag

e1¼

age2

per

iod3¼

per

iod4

cohort

7¼

cohort

8in

trin

sic

estim

ator

Null

vect

or

age1¼

age2

per

iod3¼

per

iod4

cohort

7¼

cohort

8in

trin

sic

estim

ator

Null

vect

or

cohort

18

0.6

25

11.6

25

–2.1

25

0.2

52

0.1

38

cohort

19

0.6

25

13.6

25

–2.6

25

0.1

85

0.1

63

cohort

20

0.6

25

15.6

25

–3.1

25

0.1

17

0.1

88

cohort

21

0.6

25

17.6

25

–3.6

25

0.0

49

0.2

13

cohort

22

0.6

25

19.6

25

–4.1

25

–0.0

19

0.2

38

cohort

23

0.6

25

21.6

25

–4.6

25

–0.0

86

0.2

63

cohort

24

0.6

25

23.6

25

–5.1

25

–0.1

54

inte

rcep

t10.4

33

10.4

33

10.4

33

10.4

33

0.0

00

11.4

75

11.4

75

11.4

75

11.4

75

0.0

00

445



solutions using the IE, k ¼ –.713. To move from the IE constraint to the age

constraint, k ¼ .713; from the IE constraint to the period constraint, k ¼15.679; and from the IE constraint to the cohort constraint, k ¼ –3.029.

(4) Each of the solutions fits the data equally well—this also occurs in unre-

ported analyses when I added random error to the cell entries. As long as the

same data are analyzed, the fit does not depend on the linear constraint used.

For this particular generating mechanism (age1 ¼ age2) the IE does bet-

ter than the period and cohort constraints that we used (k ¼ .713 times the

null vector for the absolute difference between the IE estimates and the

putative data generating parameters); however, the data could have been gen-

erated by the coefficients implied by any of the particular constraints used in

Table 2. For example, in the case of the period constraint (period3 ¼ peri-

od4), age group 1 (the youngest) is two units lower than age 2, which is one

unit lower than age 3, which is one unit lower than age 4, which is one unit

lower than the oldest group. This does not match the putative data generating

mechanism—but I could have used these parameters to generate the data

(and nature might have). If this were the generating mechanism for the

data in Table 1, then the absolute value of k, which determines the distance

between estimates, would have been 15.679 for the distance between the IE

estimate and this generating mechanism.

In terms of ‘‘substance,’’ I note that the different constraints produce quite

different interpretations concerning the generating parameters. In Table 2, the

only constraint to produce the putative data generating mechanism (age1 ¼age2) is the only constraint (among those used) that is consistent with the gen-

erating mechanism. It is consistent because for the data generating mechanism

the two youngest age groups have the same effect. I use the term putative to

emphasize that any of the mechanisms implied by any of the solutions could

have generated the data. This is the quandary associated with using these con-

strained regression approaches to APC analysis. If the data were generated by

‘‘nature’’ the way I generated it, then only the age1 ¼ age2 constraint (of the

constraints used) would have shown us how ‘‘nature’’ generated these data.

For this constraint, we see that each cohort is 0.50 greater than the earlier

one from cohort5 up to cohort9, that the first three periods are two less than

the next two, and that the age effect increases by one from the oldest to the

next oldest and again by one to the next oldest and then increases by two

for the two youngest age groups. This fits the putative data generating mech-

anism in both the 5 × 5 and 5 × 20 cases.

Comparing the results as we move from the analysis of data for a 5 × 5

age-period matrix to those based on the a 5 × 20 age-period matrix does

not allow us to examine patterns across the multitude of possible models




that can arise in APC analysis. I can, however, note some interesting pat-

terns, and as we will see, many of these patterns depend on the following

relationship: kv ¼ (-c01b2=v0c1) × v, where c1 is one constraint and b2 is

the solution under a different constraint. Here kv represents the difference

between two solution vectors, one based on c1 as a constraint and the other

(b2) based on c2.

I will use the phrase convergence as the number of periods increases in

a specific manner. It is the extent to which the values in a solution vector

based on a particular age-period matrix approach some set of fixed values

as more periods of data are added. Specifically, for our data I ask whether

the age-effects in the 5 × 5 age-period matrix (which remain constant across

all 20 periods) converge to some other values when there are 20 periods. Do

the cohort effects that changed only for cohorts 5 through 9 have estimates

that converge to some other value as we add periods? Do the period effect

estimates based on the 5 × 5 age-period matrix converge to some new val-

ues as we add periods? Is there some sense in which we find that with more

periods one of the constraints is more likely to tell us what nature was doing

in the period we were studying. This concern assumes that analysts are

actually interested in the age effects, period effects, and cohort effects on

rates during a particular era. For example, during any specific era, cohort ef-

fects or period effect may be highly associated with smoking behavior, yet

these very patterns are not likely to be associated with smoking behavior

in the same way in future eras (after the original cohorts are replaced).

What sorts of changes in the solutions do we find as we move from 5 to 20

periods? In both the 5 period and 20 period simulations we find that (1) when

the constraint is consistent with the generating mechanism, the generating

mechanism is estimated correctly. If I were to add error variance to these

simulations, we would find that the expected value of the estimated param-

eters would remain unbiased estimates of the data generating parameters. (2)

For each of the three traditional APC constraints, the estimated age effect co-

efficients are the same for both the 5 and 20 period cases. This means that

they do not converge to some other values. (3) The values of k do not remain

constant as we move from the 5 × 5 to the 5 × 20 situation. For example,

the value of k needed to transform the effect coefficient estimates for the

age1 ¼ age2 constraint to those obtained using the IE constraint is –.713

in the 5 × 5 case. In the 5 × 20 case, this value of k is –2.709 (about 3.8

times greater). (4) The age effect estimates using the IE are not the same

for the 5 period and the 20 period cases. This occurs because unlike the tra-

ditional constraints that remain the same in the 5 and 20 period cases, the null

vector (and thus the constraint for the IE solution) changes as the number of

O’Brien 447



periods changes. (5) Typically, I calculate k by taking the age constrained es-

timated coefficients minus the corresponding IE estimated coefficients and

dividing each of the resulting elements by the corresponding null vector el-

ement. We can, however, use k ¼ (-c01b2=v0c1) to calculate k in each of these

cases. For example if c1 is the constraint vector associated with the age con-

straint and b2 is the solution vector associated with the IE, we will obtain the

same k values as we obtained before: –.713 and –2.709.

If we had reason to believe that the period effects in the long run had

a zero linear trend, then we might well use a ZLT constraint for periods

(see note 9). To the extent that this constraint is consistent with the data gen-

erating mechanism, it should make the estimated age effects converge on the

data generating age effects. For the data in my simulation the data generating

period effect is (2,2,2,4,4) for the 5 period case and is

(2,2,2,4,4,2,2,2,4,4,2,2,2,4,4,2,2,2,4,4) for the 20 period case. Not surprising-

ly the ZLT period constraint is closer to the data generating mechanism in the

20 period case, and this is what we might expect in the long term for many

sets of data. With these data the slope of the data generating period effects

regressed on time is .60 for the 5 period case and only .04 for the 20 period

case. The age effects for the data generating mechanism are the same for the

two youngest age groups and then are one lower for each of the succeeding

age groups. Using the ZLT constraint for periods in the 5 period case, these

age effects are estimated to increase by .60 between age groups 1 and 2 and

then to drop by .4 for each of the succeeding age groups. Using the ZLT con-

straint for periods in the 20 period case, there is an increase of .036 between

the first and second age group and then drops of .964 for each of the succeed-

ing age groups. There is clear convergence to the data generating values for

age. This, however, depends on making a constraint that is consistent with

the data generating mechanism. The data could have been generated by

the mechanism implied by the constraint p3 ¼ p4 (see Table 2). If that

were the case, the assumption that there is no linear trend in periods would

be incorrect. The linear trend for that potential generating mechanism is

strongly negative. But theory and other data might convince us that the

zero linear trend for periods or some other linear trend in periods is much

more plausible. Note that we might well make similar arguments for

a ZLT constraint for cohorts.

Conclusion

Constrained estimation is the most common method used by those seeking to

estimate each of the age, period, and cohort coefficients in APC models.




Traditionally, equality constraints on two of the effect coefficients have been

used, but a different and more complex constraint is used for the IE. The use

of this constraint in APC models and some of its advantages were examined

by Kupper et al. (1983, 1985). Its development as a general method for esti-

mating the effect coefficients in APC models can be traced to Wenjiang Fu in

a series of article (e.g., Fu 2000; Fu et al. 2004; Fu and Hall 2006; Knight and

Fu 2000). This method has been highlighted for sociologists in two major ar-

ticles by Yang and associates (2004, 2008).

This article’s goal is to offer a better understanding of the properties of

constrained estimators in the APC context and to compare the conventional

constrained estimator approach with the IE approach.12 With this in mind, I

examined these estimators from several perspectives: (1) algebraically, as

constraints directly associated with specific generalized inverses (each con-

straint has its own associated generalized inverse); (2) geometrically, show-

ing many characteristics shared by all constrained estimators and some that

are not shared; and (3) using two small sets of data (one with 5 periods and

one with 20) to show how these relationships occur in practice and some pat-

terns that can change and some that remain constant as the number of periods

is increased (at least for the data I used).

Because the various traditional constraints, the ZLT constraint discussed in

this article, and the constraint imposed by the IE are all linear constraints ap-

plied to the X-matrix, it is not surprising that the solutions resulting from these

constraints share many characteristics in common. These common character-

istics include the following: With the constraint in place we can estimate each

of the age, period, and cohort effects separately; each set of estimates is subject

to bias when we define bias as the extent to which the parameter values that

generated the data are not the same as the expected value of the estimates un-

der the particular constraint used in the estimation; and the solutions are per-

pendicular to their constraints. When analyzing the same data set: All of the

just identified constrained linear solutions lie on a line that is parallel to the

null vector; each constrained solution has the same intercept; each constrained

solution fits the data equally well; and all of these solutions are related to each

other by a scalar times the null vector. There are some special characteristics

that the IE does not share with other constrained estimators. It uses the null

vector as a constraint (and this eliminates the temptation to examine the Y vari-

able in order to set the constraint). It is the solution with the shortest length

from the constraint’s origin; and related to this, the variances of the estimated

coefficients based on the IE are smaller than for other constrained solutions.

Additionally, in some senses, the IE could serve as the representative solution.

As I have emphasized, however, arguably the major consideration should be

O’Brien 449



how well the data generating parameters are estimated, and using the ‘‘wrong’’

constraints can lead to highly misleading estimates.

It is tempting to think that in some situations we can rely on theory and

previous literature to make an educated guess about the constraint, and this

may be the case in some situations. Perhaps the most compelling suggestion

is this from Kupper et al. (1983), who note that there may be situations in

which theory and previous literature allow the researcher to contemplate

more than one set of constraints. Then one can use these different constraints

separately: ‘‘If the separate sets of estimates obtained by applying each of

these various theoretically-based (a priori) choices for c are in close agree-

ment, then one could justifiably have some confidence in the accuracy of

this common set of estimated age, period, and cohort effects’’ (Kupper et

al. 1983:2804). Of course, these constraints should not be obtained by search-

ing the data for constraints that ‘‘work.’’ One possibility might be to impose

a ZLT for periods and a ZLT for cohorts when there are a large number of

periods and cohorts, and it is reasonable to assume that there are no particular

trends for periods or for cohorts. Then the researcher would obtain estimates

under these two different constraints and see if they agree. If we move out-

side of the tradition of constrained estimators, there is a literature on methods

that bypass the identification problem by not attempting to estimate the in-

dividual age, period, and cohort coefficients.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research,

authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or pub-

lication of this article.

Notes

1. Mason et al. (1973) introduced this technique to sociologists in 1973.

2. The basic model can also be estimated as a Poisson regression or as a logistic

regression in a straightforward manner with a bit of notational change and, of

course, the appropriate software (see Yang et al. 2008). The substance of the dis-

cussion in this article applies to these maximum likelihood (ML) estimation tech-

niques as well as to the ordinary least squares (OLS) approach to estimation.




3. The only difference between dummy variable coding and effect coding is the

minus one coding for the reference category rather than a zero.

4. Analogously, if using Poisson or logistic regression, the model fit is the same no

matter which generalized inverse is used; that is, −2 × log-likelihood is the same

no matter which generalized inverse is used (here I assume that we are using just

one constraint so that the model is just identified).

5. I emphasize here that b represents the parameter vector associated with the pro-

cess that generated the data.

6. I arbitrarily chose to place the constraint in the last row and then replace the last

column of the resulting inverse with zeros. I could have placed the constraint in

the fourth row and after obtaining the inverse replaced the fourth column with

zeros (Mazumdar, Li, and Bryce 1980).

7. The null vector of (0, 1, –2) is unique up to multiplication by a scalar.

8. Interestingly, the Mazumdar et al. (1980) system does not yield the Moore-Pen-

rose matrix, but since it implements the same constraint it yields the same set of

solutions.

9. To produce a zero linear trend (ZLT) constraint for periods, we can use the fol-

lowing linear constraint: (n – 1) × period1 + (n – 2) × period2 + . . . + 1 ×period(n – 1) ¼ 0, where n is the number of periods and the nth period is used

as a reference category.

10. This is certainly the case for the situation in which the period effects across time is

of the form of a sine wave.

11. With effect coding, the sum of the coefficients for age groups, period, and cohorts

each equal zero, so that it is easy to calculate the effect for the reference category.

12. Yang et al. (2008:1706) certainly recognize that intrinsic estimator (IE) is con-

strained estimator: ‘‘Figure 1 also helps to illustrate geometrically that the IE

may in fact also be viewed as a constrained estimator.’’

References

Fu, Wenjiang J. 2000. ‘‘Ridge Estimator in Singular Design With Applications to

Age-Period-Cohort Analysis of Disease Rates.’’ Communications in Statistics—

Theory and Method 29:263-78.

Fu, Wenjiang J., and Peter Hall. 2006. ‘‘Asymptotic Properties of Estimators in Age-

Period-Cohort Analysis.’’ Statistics and Probability Letters 76:1925-129.

Fu, Wenjiang J., Peter Hall, and Thomas E. Rohan. 2004. ‘‘Age-Period-Cohort Anal-

ysis: Structure of Estimators, Estimability, Sensitivity and Asymptotics.’’ Techni-

cal Report. Michigan State University, Department of Epidemiology.

Knight, Keith, and Wenjiang Fu. 2000. ‘‘Asymptotics for Lasso-Type Estimators.’’

The Annals of Statistics 28:1356-378.

Kupper, Lawrence L., Joseph M. Janis, Azza Karmous, and Bernard G. Greenberg.

1985. ‘‘Statistical Age-Period-Cohort Analysis: A Review and Critique.’’ Journal

of Chronic Disease 38:811-30.

O’Brien 451



Kupper, Lawrence L., Joseph M. Janis, Ibrahim A. Salama, Carl N. Yoshizawa, and

Bernard G. Greenberg. 1983. ‘‘Age-Period-Cohort Analysis: An Illustration of the

Problems in Assessing Interaction in One Observation Per Cell Data.’’ Communi-

cations in Statistics—Theory and Method 12:2779-807.

Mason, Karen Oppenheim, William M. Mason, H. H. Winsborough, and Kenneth W.

Poole. 1973. ‘‘Some Methodological Issues in Cohort Analysis of Archival Data.’’

American Sociological Review 38:242-58.

Mazumdar, S., C. C. Li, and G. R. Bryce. 1980. ‘‘Correspondence Between a Linear

Restriction and a Generalized Inverse in Linear Model Analysis.’’ The American

Statistician 34:103-05.

Press, William H., Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling.

1992. Numerical Recipes in C: The Art of Scientific Computing. New York: Cam-

bridge University Press.

Rodgers, Willard L. 1982. ‘‘Reply to Comments by Smith, Mason and Fineberg.’’

American Sociological Review 47:793-96.

Searle, S. R. 1971. Linear Models. New York: John Wiley & Sons.

Smith, Herbert L. 2004. ‘‘Response: Cohort Analysis Redux.’’ Pp. 111-19 in Socio-

logical Methodology, edited by R. M. Stolzenberg. Oxford. UK: Basil Blackwell.

Yang, Yang, Wenjiang J. Fu, and Kenneth C. Land. 2004. ‘‘A Methodological Com-

parison of Age-Period-Cohort Models: Intrinsic Estimator and Conventional Gen-

eralized Linear Models.’’ Pp. 75-110 in Sociological Methodology, edited by R.

M. Stolzenberg. Oxford, UK: Basil Blackwell.

Yang, Yang, Sam Schulehoffer-Wohl, Wenjiang J. Fu, and Kenneth C. Land. 2008.

‘‘The Intrinsic Estimator for Age-Period-Cohort Analysis: What It Is and How to

Use It.’’ American Journal of Sociology 113:1697-736.

Bio

Robert M. O’Brien is a Professor of Sociology at the University of Oregon. He spe-

cializes in criminology and quantitative methods. He has published extensively in

both areas. His recent publications include: ‘‘Can Cohort Replacement Explain

Changes in the Relationship Between Age and Homicide Offending?’’ (Journal of

Quantitative Criminology with Jean Stockard); ‘‘A Mixed Model Estimation of

Age, Period, and Cohort Effects’’ (Sociological Methods and Research with Ken

Hudson and Jean Stockard); ‘‘The Age-Period-Cohort Conundrum as Two Funda-

mental Problems’’ (Quality and Quantity); and ‘‘Still Separate and Unequal? A

City Level Analysis of the Black-White Gap in Homicide Arrest Since 1960’’ (Amer-

ican Sociological Review with Gary LaFree and Eric Baumer).




Documents

Sociological Methods & Research 2011 O Brien 419 52