University of Copenhagen
Institute of Public Health
Autumn 2003
Modelling regional variation of
first-time births in Denmark
1980-1994
Individual study programme
Submitted 1 October 2003
Lau Caspar Thygesen
Supervisor: Professor Niels Keiding
1. Introduction
The fertility pattern in Denmark changed dramatically during the last part of
the 20th century. The total fertility rate (TFR) was relatively high after the
Second World War (TFR = 2.5*). It decreased dramatically at the end of the 1960s
and less dramatically through the 1970s until 1983, when the absolute minimum
was reached (TFR = 1.377). From that point the TFR increased to 1.8 at the
start of the 1990s, after which it has remained stable [1].
This development conceals different trends for specific age groups over the
last 20 years. The age-specific fertility rates for the younger age groups (15-19
and 20-24 years) have been decreasing since 1982, the fertility rate for the
25-29-year-olds increased until 1994 and has decreased since, and the fertility
rates for the oldest age groups (30-34, 35-39 and 40-44 years) have been
increasing since 1982 [1;2]. These patterns could indicate that women give birth
to almost the same number of children but are older when giving birth.
Denmark is divided into 14 counties and 2 municipalities that are considered
counties (the municipalities of Copenhagen and Frederiksberg). The differences
in total fertility rates between these 16 counties are rather large: the
fertility rates of Copenhagen and Frederiksberg municipalities are low
compared with the counties in Jutland (particularly Viborg County, Ringkøbing
County, South Jutland County and Ribe County). These differences conceal large
differences between the age-specific fertility rates: Copenhagen and
Frederiksberg municipalities have lower fertility rates than the other counties
for the younger age groups (20-29 years) and higher rates for the older age
groups (30-39 years) [3]. This pattern also applies, to a lesser extent, to Aarhus
County. For some of the Jutland counties (e.g. Viborg, Ringkøbing, South
Jutland and Ribe counties) the fertility rate is high for the younger age groups
(20-29 years) and low for the older age groups (30-39 years) compared with the
average for Denmark. At the same time all counties in the period 1982-2002 have
experienced a decrease in the age groups younger than 24 years, stagnation
for women aged 25-29 years and an increase for the older age groups.
This pattern indicates that the development in which women give birth at ever
older ages first became pronounced in Copenhagen, Frederiksberg and, to a
lesser extent, Aarhus, whereas some of the counties in Jutland may come to
experience the same development.

* The total fertility rate can be interpreted as the average number of children a woman would
bear if she survived through the reproductive age span and experienced at each age the particular
set of age-specific fertility rates for that period (Preston, page 95).
Against this background the objective of this investigation is to analyse the
fertility of childless women, both for Denmark in general and in terms of the
regional variation of first-time births. By analysing nulliparous women it is
possible to describe one dimension of the phenomenon that women are
increasingly older when giving birth.
The standard method used by demographers to analyse fertility rates is to
tabulate key summary measures (e.g. the total fertility rate) by age, period
and/or cohort, for the whole population or for some well-defined subgroup (e.g.
by parity or marital status). This decomposition of fertility is then used to
show how subgroups of the female population influence the fertility pattern,
and the figures are aggregated into some measure of fertility [4]. These
approaches will not be of primary interest in this paper, where the aim is to
show an alternative way of analysing fertility.
The strategy in this paper is to analyse the fertility of childless women with a
statistical model called the age-period-cohort model. This model simultaneously
analyses the effects of age, calendar year and birth cohort. When several
time-related factors are in play in complex combinations, it becomes difficult
to discern clear patterns in the temporal variation of fertility rates. This can
be addressed by using statistical modelling techniques to separate the effects
of age, period (secular influences) and cohort (generational factors).
Analysing a complex phenomenon like fertility with statistical models makes it
possible to give a more concise description of the phenomenon by pointing out
the effects that have the most pronounced influence on the fertility rates.
But some drawbacks of model fitting have to be considered, since the parametric
specification of the model is often of great importance. In the models presented
below the parametric specification will be of central interest, because the
selection of parameters in the age-period-cohort model often includes an
arbitrary element.
Against this background the aim of this study is to analyse the fertility
pattern of childless women with respect to three time scales (age, period and
birth cohort) and to analyse the regional variation of fertility between the 16
counties of Denmark. Both investigations will apply the age-period-cohort model
to model the three time-related factors simultaneously.
2. The model
2.1. Effect of age, period and cohort
In this study the fertility rate of nulliparous women will be analysed according to
three time-scales: The age of the woman giving birth, the period of giving birth
and the birth cohort of the woman giving birth. These effects will be termed age,
period and cohort, respectively. The effect of the 16 counties will be introduced at
the end of this section.
Age refers to the age of the woman giving birth in full years. I have data for
the ages 13-49 years (A=37), indexed a (a=13,…,49). Period refers to the year in
which the woman gives birth to her first child. I have data for the period
1980-1994, both years included (P=15), indexed p (p=1980,…,1994). Cohort refers
to the year in which the woman was born. A cohort is defined as the aggregate of
all units that experience a particular demographic event during a specific time
interval, in this investigation being born in the same year [4]. The number of
cohorts is C (C=A+P-1=51), and the relation is cohort = period – age. The oldest
cohort consists of women who are 49 years of age in 1980 and are therefore from
the 1931 birth cohort. The youngest cohort consists of women who are 13 years of
age in 1994 (the 1981 cohort). Of course I cannot know whether a woman giving
birth to her first child in 1985 at the age of 25 is from the 1960 or the 1959
cohort, because I have no information on whether she had had her birthday in
1985 when giving birth. This problem cannot be solved with the present material,
and I have therefore used the above equation.
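
The bookkeeping above can be checked with a minimal Python sketch. It only uses the age range, the period range and the relation cohort = period – age stated in the text; the names `AGES`, `PERIODS` and `cohort` are illustrative and not taken from the original analysis.

```python
# Illustrative sketch: enumerate the birth cohorts implied by
# cohort = period - age for the age and period ranges given in the text.
AGES = range(13, 50)          # ages 13..49, A = 37
PERIODS = range(1980, 1995)   # years 1980..1994, P = 15


def cohort(period: int, age: int) -> int:
    """Approximate birth cohort; the true birth year may be one year earlier."""
    return period - age


cohorts = sorted({cohort(p, a) for p in PERIODS for a in AGES})

n_ages, n_periods, n_cohorts = len(AGES), len(PERIODS), len(cohorts)
print(n_ages, n_periods, n_cohorts)   # 37 15 51
print(cohorts[0], cohorts[-1])        # 1931 1981
```

The count of distinct cohorts agrees with C = A+P-1, and a first birth at age 25 in 1985 is assigned to the 1960 cohort, as in the example in the text.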
In the paper I will analyse the fertility rate of nulliparous women. The fertility rate
is estimated as the ratio of a count (the number of first births) to the ‘risk’ time (the
time during which women are at risk of giving birth to their first child). One class of
models is appropriate for analysing this kind of data: generalized linear models.
2.2. Generalized linear models
With this broad class of models it is possible to analyse different types of
response variables. The class includes ordinary regression and analysis of
variance for continuous response variables as well as different models for
categorical response variables. All generalized linear models (subsequently called
GLMs) are defined by three components. The random component identifies the
response variable, Y, and assigns a probability distribution to it. The systematic
component specifies the explanatory variables used as predictors in the model.
Finally, the link describes the functional relationship between the systematic
component and the expected value of the random component. The GLM relates a
function of that expected value to the systematic component through a prediction
equation having a linear form. I briefly introduce the three components in order
to develop a satisfactory statistical model for the fertility data in this investigation.
2.2.1. The random component
For a sample of size N, denote the observations on the response variable by Y1, …,
YN. The observations Y1, …, YN are assumed to be independent. The random
component of a GLM consists of identifying the response variable and selecting a
probability distribution for Y1, …, YN.
In this investigation the response variable is a count (the number of first
births in a given year and at a given age), which is a non-negative integer.
In the analysis of non-negative count observations a Poisson distribution for the
random component could be assumed. See appendix I for an argument for using
the Poisson distribution in the analysis of survival data, where we model the
intensity of first birth†. A random variable Y is said to have a Poisson distribution
with parameter µ if it takes integer values y = 0, 1, 2, … with probability

$$P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}$$

for µ > 0. The mean and variance of the Poisson distribution can be shown to be
E(Y) = var(Y) = µ
Since the mean is equal to the variance, any factor that affects one will also
affect the other. If the actual variance of the observed number of births is
larger than expected under the Poisson assumption, the data are said to exhibit
over-dispersion. This is not uncommon when counts are large, because the
heterogeneity of the population becomes pronounced when the amount of data is
large. The consequence of over-dispersion is that the estimated parameters of
the model will have standard errors that are too small, so significance tests
will tend to be significant too often. Several methods have been developed to
take account of this problem [5], but when analysing counts as large as in this
investigation (counts from a whole population) the standard errors will in any
case be so small that tests of significance will almost always be significant,
whether or not these methods for adjusting the standard errors are used.
Therefore simple significance tests of single parameter estimates will not be
used in this investigation; instead two other methods for checking the adequacy
of the model will be used. These are presented below in part 2.3.
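
As a numerical check of the Poisson probability and of the equality of mean and variance stated above, the following Python sketch may be useful. The value of `mu` is hypothetical and chosen only for illustration, and the infinite sums are truncated at a point where the remaining tail is negligible.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) = exp(-mu) * mu**y / y!, evaluated on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


mu = 4.2                 # hypothetical mean count, for illustration only
support = range(0, 200)  # truncation point; the tail beyond it is negligible

total = sum(poisson_pmf(y, mu) for y in support)
mean = sum(y * poisson_pmf(y, mu) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in support)

print(round(total, 6), round(mean, 6), round(variance, 6))
```

Observed counts whose empirical variance is clearly larger than their mean would point to the over-dispersion discussed above.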
2.2.2. The systematic component
The systematic component of a GLM specifies the explanatory variables. These
enter linearly as predictors on the right side of the model equation. The systematic
component specifies the variables that play the roles of xi in the formula
α + β1x1 + … + βkxk,
where i = 1, …, k and α and βi are the regression coefficients of the systematic
component. This linear combination of the explanatory variables is called the
linear predictor. Some of the xi may be functions of other explanatory variables,
e.g. x3 = x1x2 to allow for interaction between x1 and x2.
2.2.3. The link
The link between the random and systematic component of the GLM specifies
how µ relates to the explanatory variables in the linear predictor. The model
formula states
g(µ) = α + β1x1 + … + βkxk,
where g(.) is called the link function. I could suppose a simple linear model, but
this has the disadvantage that the linear predictor on the right side can assume any
real value, whereas the Poisson mean on the left side, which represents an expected
count, has to be non-negative. A straightforward solution to this problem is to model
the logarithm of the mean using a linear model. Thus, I consider a
generalized linear model with the log link: g(µ) = log(µ). A GLM using the log link
is called a log-linear model and has the form
log(µ) = α + β1x1 + … + βkxk. (2.1)
2.3. Generalized linear models for rate data: Poisson regression
As indicated above, the Poisson assumption can be used in the analysis of count
data. The data investigated in this paper are not pure count data but rather
rate data: when the events investigated occur over time, it is reasonable to
take the time under risk into account. The formula for the rate r is
r = µ / t,
where µ is the expected count and t is the time under risk, measured in
person-years under risk (subsequently p-yrs). A log-linear model for rate data
can be written
log(µ / t) = α + β1x1 + … + βkxk, (2.2)
which has the equivalent representation
log(µ) – log(t) = α + β1x1 + … + βkxk. (2.3)
The adjustment term, – log(t), to the log link of the mean is called an offset.
Standard GLM software (e.g. SAS proc genmod) can fit models with offsets.
For model (2.3) the expected number of outcomes (e.g. births) satisfies
µ = t exp(α + β1x1 + … + βkxk). (2.4)
This means that the mean µ is proportional to t, with a proportionality constant
that depends on the values of the explanatory variables. If the values of
x1, …, xk are fixed, doubling the population size (doubling the p-yrs) also
doubles the expected number of outcomes. This is also called the assumption of
piecewise constant intensity: when x1, …, xk are fixed, the rate is constant.
This assumption is fundamental to Poisson regression and should be carefully
considered in any concrete analysis. Normally the assumption is taken to be
valid when the intervals of the variables included in the model are small. As
will become clear below, I have data for age in whole years and for each year in
the period 1980-1994. Intervals of this size are normally regarded as small.
Although the assumption cannot be tested in this investigation, it seems
plausible.

† The analysis in this investigation can be interpreted as a survival analysis, because we are
interested in analysing the time to first birth for nulliparous women. Time can be expressed as
a function of age, period or cohort.
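
The role of the offset in equation (2.4) can be sketched in a few lines of Python. The coefficient values and person-years below are hypothetical and serve only to show that doubling t doubles the expected count while the rate µ / t stays the same; `expected_count` is an illustrative helper, not part of any GLM package.

```python
import math


def expected_count(person_years: float, alpha: float, betas, xs) -> float:
    """Equation (2.4): mu = t * exp(alpha + sum of beta_i * x_i)."""
    linear_predictor = alpha + sum(b * x for b, x in zip(betas, xs))
    return person_years * math.exp(linear_predictor)


# Hypothetical values, for illustration only.
alpha, betas, xs = -3.0, [0.4, -0.2], [1.0, 1.0]

mu_single = expected_count(10_000.0, alpha, betas, xs)
mu_double = expected_count(20_000.0, alpha, betas, xs)

rate_single = mu_single / 10_000.0
rate_double = mu_double / 20_000.0

print(mu_single, mu_double)      # the second is twice the first
print(rate_single, rate_double)  # identical rates
```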
The model (2.4) can be rewritten in the following way
µ = t ⋅ exp(α) ⋅ exp(β1x1) ⋅ … ⋅ exp(βkxk), (2.5)
which is the reason why this model is also called a multiplicative model. In this
case e.g. exp(βkxk) represents the relative risk of disease for exposure at level k relative
to a baseline at level 1 (exp(β1x1) = 1). It should be emphasized that this specification
is not necessarily the correct one and should be evaluated carefully in the course of
model selection.
When fitting data of enormous sample size, like this study of the fertility
pattern of all fertile women in Denmark over 15 years, almost any significance
test of specific parameters or of whole models will be highly significant,
because the large sample provides small standard errors. But a statistically
significant effect need not be important in a practical sense. With huge samples
it is crucial to focus on estimation rather than hypothesis testing; simpler
models are also easier to summarize. Two methods will be used in this
investigation to assess the adequacy of the estimated models. The primary method
is to compare the fertility rates estimated from the model with the observed
fertility rates in the dataset. This gives an impression of the model's
predictive ability. The second and less important method is to compare the
deviance of the model with its degrees of freedom. If these two numbers are
close to each other, the fit of the model is assumed to be satisfactory [6]. The
deviance is also used to compare two different but nested models, because the
difference between the two models’ deviances is approximately chi-squared
distributed, with degrees of freedom equal to the number of additional
non-redundant parameters that are in the larger model but not in the smaller,
nested model.
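
The deviance comparison can be illustrated with a short Python sketch. It uses the standard Poisson deviance, D = 2 Σ [y log(y / µ) − (y − µ)], which is an assumption about the exact form intended here since the text does not spell it out. The observed counts and fitted means are hypothetical and do not come from fitted models.

```python
import math


def poisson_deviance(observed, fitted) -> float:
    """D = 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) taken as 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total


# Hypothetical counts and fitted means standing in for two nested models.
observed = [12, 30, 45, 28, 9]
fitted_small = [15.0, 27.0, 40.0, 31.0, 11.0]   # model with fewer parameters
fitted_large = [12.5, 29.0, 45.5, 27.5, 9.5]    # model with more parameters

dev_small = poisson_deviance(observed, fitted_small)
dev_large = poisson_deviance(observed, fitted_large)
deviance_difference = dev_small - dev_large

print(dev_small, dev_large, deviance_difference)
```

In a real analysis each deviance would be compared with its residual degrees of freedom, and `deviance_difference` with a chi-squared distribution whose degrees of freedom equal the number of additional parameters in the larger model.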
2.4. Poisson regression with effects of age, period and cohort
The multiplicative model just described can include many types of explanatory
variables. In this investigation four explanatory variables will be included: the
effects of age, period, cohort and county.
In the first model, without the county effect, the expected count (µ)
for a given age (a), period (p) and cohort (c) is obtained by the following
formula
µ = t ⋅ exp(α + βaxa + βpxp + βcxc), (2.6)
where βa, βp and βc are the regression coefficients of age, period and
cohort, respectively, and α is the intercept (the log-rate when βa=βp=βc=0).
This model is equivalent to
µ / t = exp(α) ⋅ exp(βaxa) ⋅ exp(βpxp) ⋅ exp(βcxc). (2.7)
In this model the antilogs of the effects βa, βp and βc are to be interpreted as
adjusted rate ratios relative to the reference categories for a, p and c.
If the constant exp(α) is multiplied by exp(βaxa), the age-specific rate for the
reference period and the reference cohort is obtained.
There is a simple linear relationship between a, p and c
p = c + a, (2.8)
which induces a linear constraint between the three factors. This is a fundamental
problem in interpreting parameter estimates from the full age-period-cohort
model, because with this constraint there is no unique solution [7;8]. In its
most general form the model is overparameterized, since the mathematical relation
between age, period and cohort allows the same model to be written in infinitely
many ways, which leads to a problem of identification. The problem is that the
model has more parameters than can be estimated from the data.
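
The identification problem can be made concrete with a small Python sketch. The effect values are hypothetical; the point is only that, because c = p − a, a linear trend can be moved between the age, period and cohort effects without changing any fitted log-rate.

```python
AGES = range(13, 50)
PERIODS = range(1980, 1995)


def linear_predictor(age_eff, period_eff, cohort_eff, a, p):
    """Age + period + cohort contribution to the log-rate, with c = p - a."""
    return age_eff[a] + period_eff[p] + cohort_eff[p - a]


# Hypothetical effects, for illustration only.
age_eff = {a: -0.01 * (a - 27) ** 2 for a in AGES}
period_eff = {p: 0.02 * (p - 1983) for p in PERIODS}
cohort_eff = {c: 0.005 * (c - 1956) for c in range(1931, 1982)}

k = 0.3  # arbitrary amount of linear trend shifted between the three factors
age_alt = {a: v + k * a for a, v in age_eff.items()}
period_alt = {p: v - k * p for p, v in period_eff.items()}
cohort_alt = {c: v + k * c for c, v in cohort_eff.items()}

# k*a - k*p + k*(p - a) = 0, so both parameter sets give the same log-rates.
max_difference = max(
    abs(linear_predictor(age_eff, period_eff, cohort_eff, a, p)
        - linear_predictor(age_alt, period_alt, cohort_alt, a, p))
    for a in AGES for p in PERIODS
)
print(max_difference)  # zero up to floating-point rounding
```

Since `k` is arbitrary, there are infinitely many parameter sets with identical fitted values, and the data alone cannot choose between them.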
One solution is to find a parameterization in which exp(βaxa) represents fitted
age-specific rates, by choosing a reference period and a reference cohort and
setting their relative rates to 1 (exp(βp)=1 and exp(βc)=1, i.e. βp=0 and βc=0).
This would leave as unknowns A age parameters, (P-1) period parameters and
(C-1) cohort parameters. The total number of parameters would then be A+P+C-2.
Unfortunately this does not solve the problem of identification, because of the
problem of drift [7;8]. Drift is a variation of the rates that cannot be
attributed specifically to period or cohort influences. The term can be
understood as a single parameter describing the same change in the log-rate per
unit of time over the whole scale.
In an age log-linear drift model the number of parameters to estimate is A + 1
(the age parameters plus one drift parameter). In an age-period model, (P-2)
extra parameters expressing irregular period effects are added to the age-drift
model, and in an age-cohort model (C-2) parameters are added.
Therefore the whole age-period-cohort model includes three components in
addition to the age parameters: the drift component, a non-drift period
component and a non-drift cohort component. That is 1+(P-2)+(C-2) parameters
in addition to the age parameters (A), which sums to A+P+C-3. When I estimate
model (2.7) I try to estimate A+(P-1)+(C-1) = A+P+C-2 parameters. In other
words, the model includes one parameter for the period drift and one parameter
for the cohort drift. But that is not possible, because the data do not allow
these two effects to be distinguished [7;8].
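
The parameter counts above can be written out for the dimensions used in this paper (A=37, P=15, C=51). The Python sketch below is plain arithmetic; the variable names are illustrative.

```python
A, P = 37, 15          # numbers of age groups and periods from section 2.1
C = A + P - 1          # number of cohorts

age_drift = A + 1                      # age parameters plus one drift parameter
age_period = age_drift + (P - 2)       # adds the non-drift period parameters
age_cohort = age_drift + (C - 2)       # adds the non-drift cohort parameters
full_apc = A + 1 + (P - 2) + (C - 2)   # identifiable parameters: A + P + C - 3
naive_apc = A + (P - 1) + (C - 1)      # reference categories only: A + P + C - 2

print(age_drift, age_period, age_cohort, full_apc, naive_apc)
# 38 51 87 100 101
```

The difference of one between `naive_apc` and `full_apc` is the extra drift parameter that cannot be estimated.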
One obvious way to solve this problem is to impose an extra constraint. Because
age is an important determining factor for the fertility of nulliparous women, I
impose one constraint on the period or cohort variable and two on the other. This
means that the model can be estimated, but the problem with this solution is that
the two constraints on the period or cohort variable are arbitrarily selected.
Therefore another solution to the identification problem has been suggested,
in which the ratios of two adjacent relative risks are contrasted [8]. The method
derives from a consideration of what defines non-drift effects. If this method is
used for a period effect, the relative risk of period 3 versus period 2 is
contrasted with the relative risk of period 2 versus period 1:

$$\frac{e^{\beta_3}/e^{\beta_2}}{e^{\beta_2}/e^{\beta_1}}$$

This estimate can be interpreted as a measure of the acceleration of the period
trend around period 2. If the ratio is higher than 1, the change in the period
effect from period 2 to period 3 is larger than the change from period 1 to
period 2. On the logarithmic scale the following relation holds:
(β3 - β2) - (β2 - β1) = β3 - 2β2 + β1
This is a measure of curvature, where a negative value indicates a concave
relationship, a positive value indicates a convex relationship and a value of
zero corresponds to a straight line. The method can therefore illustrate the
development over small intervals.
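
The second differences, and the fact that they are unaffected by drift, can be sketched in Python as follows. The period effects are hypothetical values on the log scale, and `second_differences` is an illustrative helper.

```python
def second_differences(effects):
    """Return beta[i+1] - 2*beta[i] + beta[i-1] for each interior index i."""
    return [effects[i + 1] - 2 * effects[i] + effects[i - 1]
            for i in range(1, len(effects) - 1)]


# Hypothetical period effects on the log scale, for illustration only.
beta = [0.00, -0.05, -0.08, -0.09, -0.07, -0.03]
curvature = second_differences(beta)

# Adding any linear trend (drift) leaves the second differences unchanged.
drift = 0.4
beta_with_drift = [b + drift * i for i, b in enumerate(beta)]
curvature_with_drift = second_differences(beta_with_drift)

print(curvature)
print(curvature_with_drift)
```

All second differences in this example are positive, which corresponds to a convex period trend. Because they do not change when a linear trend is added, they are identifiable even though the drift itself cannot be assigned to period or cohort.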
In this investigation I will use both solutions to get the best description of the
data.
The effect of the 16 counties will be modelled within the same framework. One
obvious solution is to introduce the county effect into (2.7):
µ / t = exp(α) ⋅ exp(βaxa) ⋅ exp(βpxp) ⋅ exp(βcxc) ⋅ exp(βcountyxcounty). (2.9)
This model has the same problem of identification as described above, but the
county estimates can be interpreted as relative risks (antilogs of the
estimates) for the reference categories of the age, period and cohort effects. The
analysis will show the interpretation of the model.
After this presentation of the APC-model, the data for this investigation will be
introduced and described.
3. Materials
The materials in this investigation originate from the Fertility Database (the
formal name is the Statistical Register of Fertility Research). The register contains
information on women and men of fertile age resident in Denmark on 1 January
of the calendar year in question. It also gives information on the
children of these women and men. A range of information, collected
annually, covers the education and employment situation, family and housing
conditions, etc. of all adults. The content of the register therefore makes it
possible to analyse men's and women’s fertility and to analyse parental and
familial relationships (mother-child, father-child, parents of the same child) [9].
In this investigation I will analyse the rate at which women have their first
child with respect to age, year and county. I therefore have information on age
and year in 1-year groups and on county in 16 groups (see table 3.1). With this
information it is possible to approximate the cohorts (cohort = year - age).
Table 3.1: Counties and municipalities of Denmark
 1. Copenhagen Municipality       9. Funen County
 2. Frederiksberg Municipality   10. South Jutland County
 3. Copenhagen County            11. Ribe County
 4. Frederiksborg County         12. Vejle County
 5. Roskilde County              13. Ringkøbing County
 6. Western Zealand County       14. Aarhus County
 7. Storstrøm County             15. Viborg County
 8. Bornholm County              16. North Jutland County
I have data about the 15 years from 1980 until 1994 and for the 36 age groups
from 13 until 48 years of age. The Fertility Database covers the whole fertile age
range (13-49 years of age), but because no nulliparous women in this period gave
birth at the age of 49 I excluded this age group. The absolute numbers of childless
women giving birth for all counties are reproduced in appendix II.
Figure 3.1. The data in a Lexis diagram
In figure 3.1 the data are illustrated in a Lexis diagram [4]. The figure clearly
illustrates that the oldest cohorts in this investigation are observed as the
oldest women early in the period and that the youngest cohorts are observed as
the youngest women at the end of the period. The figure also shows that the
youngest and oldest cohorts consist of only a few observations, which will
result in more uncertain statistical inference about these cohorts.
To calculate the first-birth rate for the 15 years and the 36 age groups it is
also necessary to know how many women in each age and period group were at
risk, that is, the number of women who have not yet given birth to any children,
because a woman can only have her first child once. This number is approximated
by the average of the number of childless women on 1 January of the same year
and the number of childless women on 1 January of the following year. In the
last year (1994) the number at risk is approximated by the number of childless
women on 1 January 1994. The number of childless women for each calendar year
and age is reproduced in appendix III, while the age- and period-specific
fertility rates can be found in appendix IV.
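
The approximation of the number at risk described above can be sketched in Python. The counts below are hypothetical and are not taken from the Fertility Database; the sketch also assumes that the two 1 January counts refer to the same age group, which the text does not state explicitly.

```python
def person_years(childless_jan1, year, last_year):
    """Mid-year approximation of the number of women at risk in `year`."""
    if year == last_year:
        # No count for the following year: use the 1 January count alone.
        return float(childless_jan1[year])
    return (childless_jan1[year] + childless_jan1[year + 1]) / 2


# Hypothetical counts for one age group (not real data).
childless_jan1 = {1992: 21_000, 1993: 20_400, 1994: 20_100}
first_births = {1992: 2_070, 1993: 2_025, 1994: 1_990}

rates = {year: first_births[year] / person_years(childless_jan1, year, 1994)
         for year in childless_jan1}

for year, rate in sorted(rates.items()):
    print(year, round(rate, 4))
```

The resulting rate for each age and year is the number of first births divided by the approximated person-years, which is the quantity modelled through the offset in part 2.3.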
4. Descriptive presentation of data
In this part I will describe the fertility rates for different age groups, periods,
cohorts and counties. This part is only a description of the data; the
models presented in part 2 will be applied in parts 5 and 6.
The association between age and the rate of having the first child is
presented in Figure 4.1.
Figure 4.1. Observed rates of age
[Figure not reproduced. x-axis: age, tick labels 13-47 years; y-axis: observed first-birth rate, 0.00-0.12.]
Figure 4.1 shows a strong association between age and the rate of having the first
child. The rate increases sharply from age 21 until age 27, after which it
decreases sharply.
Figure 4.2. Observed rates by period
[Figure not reproduced. x-axis: year, 1980-1994; y-axis: observed first-birth rate, approximately 0.03-0.05.]
Figure 4.2 shows that the first-birth rate fell until 1983, after which it
increased modestly through the rest of the period.
Figure 4.3. Observed age-specific rates
[Figure not reproduced. x-axis: year, 1980-1994; y-axis: observed first-birth rate, 0.00-0.14; one curve each for ages 20, 25, 30, 35 and 40 years.]
Figure 4.3 demonstrates that the pattern shown in figure 4.1 and figure 4.2 hides
a very heterogeneous development for the different age groups. The first-birth
rate of the younger age groups shows a uniform decrease through the
period, while the older age groups show an increase through the same period.
This pattern can also be seen in figure 4.4, where the effect of age on the
first-birth rate is shown for the 15 periods.
Figure 4.4. Observed age-specific fertility rates for each year 1980-1994
[Figure not reproduced. x-axis: age, tick labels 13-47 years; y-axis: observed first-birth rate, 0.00-0.14; one curve for each year 1980-1994.]
In figure 4.4 it is possible to see the change in the age distribution for each
year through the period. The decrease in the fertility rate for the younger age
groups is most pronounced early in the period, while the increase for the older
age groups is most pronounced later in the period. This pattern could be
interpreted to mean that the women who did not give birth to their first child
early in the period waited until some later point.
Figure 4.5. Observed rates of counties
[Figure not reproduced. x-axis: the counties and municipalities listed in table 3.1; y-axis: observed first-birth rate, 0.035-0.050.]
Figure 4.5 illustrates the fertility rate of first child for women for the 16 counties.
The figure shows that the fertility rate is lowest for Copenhagen, Frederiksborg
and Roskilde counties and highest in Ribe County and Copenhagen Municipality.
This result indicates that the different age distributions in the counties may
influence the results because it was expected that the fertility rate would be lowest
in the more urban areas of Denmark and higher in the rural areas of Zealand and
Jutland. This dimension will be further analysed in part 6.
5. Modelling
After this preliminary presentation of the data I will now construct a model for the
effects of age, period and cohort. The regional variation will be included in
part 6.
As described earlier in this paper, the fertility rate has changed
dramatically over the last decades. Both the age-specific fertility rates and the
period-specific rates have been changing. But what remains fundamental is that age
is the main explanatory factor for fertility. It is therefore natural to introduce
it as the first factor. See model 1.
log(µ) = log(t) + α + βagexage (model 1)
This model consists of the offset (log(t)), an effect for the reference age
group (α) and the effect of age (βage). The model therefore implies absence of
temporal change in the age-specific fertility rates. This model is not of primary
interest and will only be used as the reference model for the models introduced
below.
Next, the period effect is added to model 1 as a continuous variable to investigate
if the different age specific curves show a common constant linear slope or drift
over time. See model 2.
log(µ) = log(t) + α + βagexage + δp(p-p0) (model 2)
This model states that there has been a linear change of the logarithmically
transformed fertility rate when taking account of the effect of age. As seen in the
last part this model is probably not a good description of the data because the
effect of period is different for the specific age groups. See figure 5.1.
Figure 5.1. Age-drift model
[Figure not reproduced. x-axis: year, 1980-1994; y-axis: first-birth rate, 0.00-0.14; model and observed curves for ages 25, 30 and 35 years.]
Figure 5.1 illustrates that the heterogeneous development for the different age
groups cannot be captured by the age-drift model. For the 25-year-old women the
model is a satisfactory description, but for the older age groups, with an
increasing fertility rate throughout this period, the model is not a good
description.
As a note it should be emphasized that the drift parameter in the age-(period)drift
model is equal to the drift parameter in the age-(cohort)drift model [7]. The
age-(cohort)drift model would give the same linear relationship as model 2: a
negative parameter estimate.
If the age groups express different linear relationships, it may imply that there is a
non-linear effect of period. One simple way to express this relationship would be
an age-period model in which the period parameters allow a non-linear
relationship. If the AP-model fitted well, the age-specific log-fertility curves
plotted against period would be parallel (but not necessarily linear). The model
can be formulated in the following way. See model 3a.
log(µ) = log(t) + α + βage xage + βperiod xperiod (model 3a)
As already shown in Figure 4.3, the age-specific curves were not
parallel but rather showed a decreasing trend for the younger women and an
increasing trend for women above 30 years of age. These divergent developments
cannot be expected to be fitted well by an AP-model, which is confirmed in
figure 5.2 and figure 5.3.
Figure 5.2. AP-model
[Figure not reproduced. x-axis: year, 1980-1994; y-axis: first-birth rate, 0.00-0.14; model and observed curves for ages 20 and 25 years.]
Figure 5.3. AP-model
[Figure not reproduced. x-axis: year, 1980-1994; y-axis: first-birth rate, 0.00-0.14; model and observed curves for ages 30 and 35 years.]
As can be seen the AP-model predicts that the development for every age group is
a decrease until 1983 and thereafter a stagnation of the log-fertility rates. This
prediction does not fit well with the observed data for all age groups. For the
youngest age groups the model does not capture the decreasing trend through the
period and for the older age groups the model gives a completely erroneous
description of the data.
Instead of modelling the age-period model an alternative model could be the age-
cohort model that would model the age and non-linear cohort effects. See model
3b.
log(µ) = log(t) + α + βage xage + βcohort xcohort (model 3b)
The fit of this model is shown in figure 5.4.
Figure 5.4. AC-model
[Figure not reproduced. x-axis: year, 1980-1994; y-axis: first-birth rate, 0.00-0.14; model and observed curves for ages 25, 30 and 35 years.]
At first the cohort effect may seem relatively less important than the age and
period effects, because one would expect certain period and age effects, while it
may be more difficult to posit cohort effects, particularly differences between
adjacent or nearly adjacent cohorts. On the contrary, figure 5.4 shows that model
3b captures the development for the oldest age group (35 years of age) accurately
and more or less captures the trends for the two other age groups. Although the
overall trend is captured, there are some divergences between the observed rates
and the model predictions.
This underscores that the simple age-period and age-cohort models are not able to
describe the heterogeneous fertility development in Denmark in the period 1980-
1994. A further extension of the model is necessary to capture this development.
The method used in this investigation is the introduction of both the period and
cohort variables at the same time.
What the cohort effect adds to the age-period model is the possibility of different age-specific period trends in the fertility rates. With this extension the oldest cohorts will have a higher fertility rate for younger women, while the youngest cohorts will have a higher fertility rate for older women. This interaction between age and period could possibly be described by the cohort effect.
Therefore I estimate the full age-period-cohort model. See model 4a.
$\log(\mu) = \log(t) + \alpha + \beta_{age} x_{age} + \beta_{period} x_{period} + \beta_{cohort} x_{cohort}$ (model 4a)
The age-period-cohort model allows for non-parallel age-specific fertility curves as a function of cohort or period. But the model has the problem of identification as described in part 2.
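The identification problem can be illustrated with a short Python sketch. All effect values are hypothetical, and cohort is taken to be period minus age. Because of that linear relation, a linear trend can be moved freely between the three effects without changing any fitted value, so the data alone cannot determine the linear components.

```python
# Hypothetical effects (illustration only). Cohort is defined as period - age.
ages = [20, 25, 30]
periods = [1980, 1985, 1990]
age_eff = {20: -0.5, 25: 0.0, 30: -0.3}
per_eff = {1980: 0.2, 1985: 0.0, 1990: 0.1}
coh_eff = {p - a: 0.0001 * ((p - a) - 1960) ** 2 for a in ages for p in periods}

def linear_predictor(a_eff, p_eff, c_eff, age, period):
    """Sum of the age, period and cohort effects for one age-period cell."""
    return a_eff[age] + p_eff[period] + c_eff[period - age]

# Move a linear trend between the effects: since cohort = period - age,
# c*age - c*period + c*cohort = 0 in every cell.
c = 0.07
age2 = {a: v + c * a for a, v in age_eff.items()}
per2 = {p: v - c * p for p, v in per_eff.items()}
coh2 = {k: v + c * k for k, v in coh_eff.items()}

max_diff = max(
    abs(linear_predictor(age_eff, per_eff, coh_eff, a, p)
        - linear_predictor(age2, per2, coh2, a, p))
    for a in ages for p in periods
)
print(max_diff)  # zero up to floating-point rounding
```

Any value of c gives the same fit, which is why additional constraints are needed before individual parameters can be reported.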
This has been solved in two ways in this investigation. The first solution is to impose an additional constraint on the cohort parameter, which has been set to zero for both cohort 1945 and cohort 1967. With one constraint on the period effect and two on the cohort effect, the problem of identification is eliminated. The observed and expected rates for this model are plotted in Figure 5.5 and Figure 5.6.
[Figure 5.5. Age-period-cohort model. Model and observed rates by year, 1980-1994, for women aged 20 and 25 years.]
[Figure 5.6. Age-period-cohort model. Model and observed rates by year, 1980-1994, for women aged 30 and 35 years.]
This model seems to fit the data much better‡. The model predicts that the slope differs between the age groups, so that it is decreasing for the younger groups and increasing for the older. The minor deviations between the model and the observed rates for the four age groups presented in the two figures could not easily be explained. The cohort effect is illustrated in Figure 5.7 as the relative risk compared with the two reference cohorts: 1945 and 1967.
‡ When estimating model 4a using the SAS procedure proc genmod, the convergence was questionable. This problem was solved by excluding the four oldest cohorts in 1980 (equivalent to the three oldest in 1981, the two oldest in 1982 and the oldest in 1983) and the youngest cohort in 1994. The convergence problem was caused by there being no observations in these 11 cells. Despite the convergence problem, the estimates from the model with all observations did not differ from those obtained with the five cohorts excluded.
[Figure 5.7. Cohort effect in age-period-cohort model. Vertical axis: relative risk (ref: 1945 and 1967); horizontal axis: birth cohort, 1932-1980.]
For the four oldest cohorts and for the youngest cohort no births were observed, and therefore the standard errors of these parameters are relatively large. Therefore the confidence intervals have not been included in the figure.
The concrete estimates of the cohort effect in Figure 5.7 are not identifiable, while the shape of the curve is. The shape indicates that the cohorts between the two reference groups have a higher ‘risk’ of fertility, while the oldest and youngest cohorts have a lower ‘risk’. For the oldest cohorts the lower risk might be explained by few first-time births, which could mean that these cohorts do not give birth to their first child in the age groups included in the data. The confidence intervals also indicate that the estimates are very uncertain. For the youngest cohorts the decreased risk could mean that these cohorts postpone their first birth. The cohorts after 1967 are younger than 27 years of age at the time of this investigation and could therefore be influenced by the steadily decreasing fertility of the younger age groups later in the period of observation. This lowered risk might have been different if I had had data covering the whole fertile age span for these cohorts.
The higher fertility rate for the cohorts between 1945 and 1967 expresses that these cohorts have a higher fertility rate than the reference cohorts when the age and period effects have been taken into account.
[Figure 5.8. Period effect in APC-model. Relative risk by year, 1980-1994.]
Figure 5.8 illustrates the parameters for the period effect expressed as the relative risk of fertility compared with the fertility rate of 1987. The figure clearly underscores that the fertility rate fell until 1983, after which it was stable until 1985 and increased for the rest of the period. This is the estimate for the reference groups for the age and cohort effects, which means that it is the fertility rate of the 25-year-old women from one of the two reference cohorts.
To compare the models introduced above, a deviance analysis has been carried out. See Table 5.1.
Table 5.1. Deviance analysis of models 1-4
Nr.  Model      (1) Deviance (D)  (2) df  (3) Comp. model  (4) ∆D    (5) ∆df  (6) p   (7) (1)/(2)
0    Null       430895.3          8630    -                -         -        -       49.9299
1    A          30768.5           8595    0                400126.8  35       <0.001  3.5798
2    A-drift    30257.4           8594    1                83.1      1        <0.001  3.5208
3a   AP         29183.0           8581    2                1502.4    13       <0.001  3.4009
3b   AC         26386.3           8546    2                4299.1    48       <0.001  3.0876
4a   APC #1¹    23481.5           8533    3a               5701.5    48       <0.001  2.7518
4b   APC #2²    23481.5           8533    3b               2904.8    13       <0.001  2.7518
Column (7) is the model vs. observed data ratio, i.e. the deviance divided by the degrees of freedom.
¹ Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967.
² Year is constrained twice for year = 1982 and 1992 and cohort is constrained for cohort = 1967.
The table shows that the full age-period-cohort models (model 4a and 4b) give the best fit of the data (column (7)), which was also obvious above in the figures. The age effect has an immense explanatory effect compared with the null-model, and each inclusion of a variable in the models gives a better fit of the model (columns (4)-(6)).
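The arithmetic behind columns (4), (5) and (7) can be sketched in a few lines of Python. The deviances and degrees of freedom are those reported in Table 5.1 for models 3a, 3b and 4a; the function names are illustrative only. The p-value in column (6) would additionally require the chi-square distribution, which is left out of this sketch.

```python
# Deviance and residual degrees of freedom as reported in Table 5.1.
models = {
    "3a": (29183.0, 8581),   # AP
    "3b": (26386.3, 8546),   # AC
    "4a": (23481.5, 8533),   # APC
}

def compare(smaller, larger):
    """Return (delta deviance, delta df) for two nested models."""
    d0, df0 = models[smaller]
    d1, df1 = models[larger]
    return round(d0 - d1, 1), df0 - df1

def deviance_per_df(name):
    """Column (7): deviance divided by degrees of freedom."""
    d, df = models[name]
    return round(d / df, 4)

print(compare("3a", "4a"))    # (5701.5, 48)
print(compare("3b", "4a"))    # (2904.8, 13)
print(deviance_per_df("4a"))  # 2.7518
```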
The age-period-cohort model gives the best fit of the above models, but still has the disadvantage that no unique parameterisation is possible. In fact, inevitably, many solutions exist. This was solved by introducing additional constraints on the period and cohort effects. This solution is satisfactory if the constraints can be justified, but it should be emphasized that the constraints in many investigations in general, and in this investigation in particular, are chosen arbitrarily. To avoid this problem another solution has been suggested, in which one contrasts the ratio of two adjacent relative risks⁸.
The great strength of this estimate is that it is unaffected by the parameterisation of the age, period and cohort effects. See Figure 5.9 and Figure 5.10.
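A small Python sketch shows why this contrast does not depend on the chosen constraints. The log effects below are hypothetical. On the log scale the contrast is the second difference b[k+1] - 2*b[k] + b[k-1], and adding any linear trend to the effects (which is what a change of parameterisation amounts to) leaves it unchanged.

```python
import math

def second_differences(log_effects):
    """exp(b[k+1] - 2*b[k] + b[k-1]): ratio of two adjacent relative risks."""
    b = log_effects
    return [math.exp(b[k + 1] - 2 * b[k] + b[k - 1]) for k in range(1, len(b) - 1)]

# Hypothetical log period effects for five consecutive years (illustration only).
effects = [0.10, 0.02, -0.03, -0.02, 0.04]

# A different parameterisation of the same fit adds an arbitrary linear trend.
shifted = [b + 0.5 - 0.07 * k for k, b in enumerate(effects)]

original = second_differences(effects)
reparam = second_differences(shifted)
print([round(x, 4) for x in original])
print([round(x, 4) for x in reparam])
```

A value above 1 corresponds to an acceleration of the trend at that point and a value below 1 to a deceleration.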
[Figure 5.9. Second differences cohort effect (without cohorts 1932-42 and 1978-81). Horizontal axis: birth cohort, 1942-1975.]
[Figure 5.10. Second differences period effect. Horizontal axis: year, 1981-1993.]
Figure 5.9 shows minor deviations from 1 in the cohorts between 1942 and 1978. The deviations that can be seen have no overall trend and the differences between single years show no consistent pattern.
Figure 5.10 shows that the second order differences for the periods are very small. The trend is that there is a little acceleration in the fertility rates at the start of the period and a more heterogeneous development at the end of the period. The acceleration around 1981 indicates a slowing down of the general decrease at the start of the 1980s, while the acceleration around 1983 illustrates the change from a decreasing trend until 1983 to a modest increase from 1983 to 1984. The second differences are very close to 1 and should be interpreted cautiously.
To get a more overall interpretation of the second order differences it may be
necessary to smooth out the estimates to get an impression of an overall trend.
This has not been done in this investigation, because it is outside the scope of the
statistical inference applied in this investigation, but it could give a more coherent
and interpretable impression of the secular and generational effects on the fertility
rates.
After this statistical analysis of the general fertility in Denmark I will now analyse
the regional variation.
6. Regional variation
Denmark is divided into 14 counties and the municipalities of Copenhagen and Frederiksberg. The variation between the counties is large when analysing the total and age-specific fertility rates³, but to my knowledge it has not previously been analysed for nulliparous women. The purpose of this investigation is to set up a statistical model that gives estimates for the counties and describes the differences between the counties in a concise way.
The basis for the analyses will be model 4a described in the last section with an additional parameter for the counties. See model 5.
$\log(\mu) = \log(t) + \alpha + \beta_{age} x_{age} + \beta_{period} x_{period} + \beta_{cohort} x_{cohort} + \beta_{county} x_{county}$ (model 5)
Model 5 has the same problem of identification as described above. I therefore impose an additional constraint on the cohort parameter (for the cohorts 1945 and 1967). The parameter estimates for the counties in model 5 are reproduced in Textbox 6.1.
The exponentiated estimates can be interpreted as the relative risk of birth for nulliparous women in the counties in relation to Aarhus County (reference category) when taking account of age, period and cohort. The estimates state that Aarhus County has the lowest fertility rate of nulliparous women, while Bornholm County has the highest. The municipalities of Copenhagen and Frederiksberg have surprisingly high estimates, while most of the rural counties have rather low fertility rates. Overall the pattern of the estimates is rather surprising.
Textbox 6.1. Parameter estimates for the APC-county-model
Parameter                     Estimate  Standard error  exp(estimate)
Bornholm County               0.3257    0.0189          1.38
Frederiksberg Municipality    0.1874    0.0120          1.21
Frederiksborg County          0.1359    0.0080          1.15
Funen County                  0.1335    0.0071          1.14
Copenhagen Municipality       0.1903    0.0064          1.21
Copenhagen County             0.0715    0.0067          1.01
North Jutland County          0.1979    0.0070          1.01
Ribe County                   0.2433    0.0090          1.28
Ringkøbing County             0.2093    0.0085          1.01
Roskilde County               0.1586    0.0093          1.17
Storstrøm County              0.2022    0.0090          1.01
South Jutland County          0.2437    0.0088          1.01
Vejle County                  0.2148    0.0079          1.24
West Zealand County           0.2303    0.0084          1.01
Viborg County                 0.2512    0.0092          1.29
Aarhus County                 0.0000    0.0000          1.00 (ref)
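The step from a parameter estimate to a relative risk can be sketched as follows, using the estimate and standard error reported for Viborg County. The approximate 95 % Wald interval (estimate ± 1.96 standard errors, exponentiated) is an addition made here for illustration and is not reported in the textbox.

```python
import math

def relative_risk(estimate, std_error, z=1.96):
    """Return exp(estimate) with an approximate 95 % Wald interval."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

# Estimate and standard error for Viborg County as reported in Textbox 6.1.
rr, lower, upper = relative_risk(0.2512, 0.0092)
print(f"{rr:.2f} ({lower:.2f}-{upper:.2f})")
```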
If we compare the fertility rates estimated by the model with the observed fertility rates, it becomes obvious that the model does not fit well. Figures 6.1, 6.2 and 6.3 give an impression of the disparity between the model and the observed rates.
[Figure 6.1. Ringkøbing and Funen Counties - APC-county-model. Model and observed rates by year, 1980-1994, for 25-year-old women.]
[Figure 6.2. Ribe County - APC-county-model. Model and observed rates by year, 1980-1994, for women aged 25 and 30 years.]
[Figure 6.3. Copenhagen municipality - APC-county-model. Model and observed rates by year, 1980-1994, for women aged 25, 30 and 35 years.]
As can be seen from the figures, the fit of the model is not impressive. Although the deviance has decreased markedly compared with model 4a (see row 2 in Table 6.1) and the fit is more or less satisfactory for some of the counties (e.g. Ringkøbing and Funen counties for 25-year-old women), the overall fit of the model is not good. The model estimates for Ribe County show the correct trend but differ somewhat from the observed fertility rates, and for the municipality of Copenhagen the model estimates the fertility rate too high for the young age group (25 years) and generally too low for the old age group (35 years).
This pattern may suggest that the trend in Copenhagen Municipality differs from the general trend in Denmark. The same pattern is also seen for the municipality of Frederiksberg (data not shown). This finding could suggest that the analysis should perhaps differentiate between the municipalities of Copenhagen and Frederiksberg on the one side and the rest of the counties on the other side.
This suggestion has been followed by analysing the fertility rate in two different datasets. Model 6a, which has the same parameters as model 5, includes all counties except the municipalities of Copenhagen and Frederiksberg, while model 6b includes only these two municipalities. It is not possible to compare these models with the previously estimated models because the included data are not the same, but the ratio of deviance to degrees of freedom (column (7)) for models 6a and 6b has decreased markedly (see Table 6.1).
Table 6.1. Deviance analysis of models 4-6
Nr.  Model      (1) Deviance (D)  (2) df  (3) Comp. model  (4) ∆D    (5) ∆df  (6) p   (7) (1)/(2)
4a   APC #1¹    23481.5           8533    3a               5701.5    48       <0.001  2.7518
5    APCC¹      14729.5           8299    4a               8752.0    234      <0.001  1.7749
6a   APCC²      9256.3            7249    -                -         -        -       1.2769
6b   APCC³      1147.3            958     -                -         -        -       1.1976
Column (7) is the model vs. observed data ratio, i.e. the deviance divided by the degrees of freedom.
¹ Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967.
² Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967. The dataset is reduced by excluding the municipalities of Copenhagen and Frederiksberg.
³ Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967. The dataset is reduced by only including the municipalities of Copenhagen and Frederiksberg.
This solution offers a satisfactory description of the data as seen in figure 6.4-6.6.
[Figure 6.4. Ringkøbing and Funen Counties - APC-county-model - reduced dataset. Model and observed rates by year, 1980-1994, for 25-year-old women.]
[Figure 6.5. Ribe County - APC-county-model - reduced dataset. Model and observed rates by year, 1980-1994, for women aged 25 and 30 years.]
[Figure 6.6. Copenhagen municipality - APC-county model with reduced dataset. Model and observed rates by year, 1980-1994, for women aged 25, 30 and 35 years.]
The problems seen for model 5 above are no longer as obvious. The fertility rates for both the municipality of Copenhagen and the other counties are more or less satisfactorily captured. For Ringkøbing, Funen and Ribe counties the model gives the right description of the observed rates, with some disparities between the model and the observed rates. For the municipality of Copenhagen model 6b gives a very satisfactory description with only minor differences between observed and model rates.
The disadvantage of this solution is that two separate models have to be estimated. The results are therefore not as simple to report as parameter estimates from a single model.
As the previous analyses have indicated, the age-specific fertility rates of the different counties diverge, with the greatest difference between the municipalities of Copenhagen and Frederiksberg and the rest of the country. The fertility rates of the younger age groups seem to be lower in the municipalities of Copenhagen and Frederiksberg than in the rest of the country, while the rates of the older age groups are higher than in the other counties of Denmark.
This pattern could indicate that an interaction between the specific counties and age could solve the problems indicated above. See model 7.
$\log(\mu) = \log(t) + \alpha + \beta_{age} x_{age} + \beta_{period} x_{period} + \beta_{cohort} x_{cohort} + \beta_{county} x_{county} + \beta_{age,county} x_{county} x_{age}$ (model 7)
Model 7 implies that the age-specific fertility rates will not be the same for different counties. This might potentially give a good fit. See Table 6.2.
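What the interaction term adds can be sketched with hypothetical numbers (the values below are not estimates from model 7). In model 5 the rate ratio between a county and the reference county is the same at every age; with the age-by-county term the ratio is allowed to change with age, for example from below 1 for young women to above 1 for older women.

```python
import math

# Hypothetical parameters (illustration only, not estimates from model 7).
beta_county = {"reference": 0.0, "urban": -0.6}
# Interaction: extra age-specific effect for the urban county.
beta_age_county = {("urban", 20): 0.0, ("urban", 30): 0.5, ("urban", 35): 0.8}

def log_rate_ratio(county, age):
    """Log rate ratio of a county versus the reference at a given age.

    Age, period and cohort main effects cancel in the ratio, leaving the
    county main effect plus the age-by-county interaction term.
    """
    return beta_county[county] + beta_age_county.get((county, age), 0.0)

ratios = {age: math.exp(log_rate_ratio("urban", age)) for age in (20, 30, 35)}
print({age: round(r, 2) for age, r in ratios.items()})
```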
Table 6.2. Deviance analysis of models 5-7
Nr.  Model          (1) Deviance (D)  (2) df  (3) Comp. model  (4) ∆D  (5) ∆df  (6) p  (7) (1)/(2)
5    APCC¹          14729.5           8299    -                -       -        -      1.7749
6a   APCC²          9256.3            7249    -                -       -        -      1.2769
6b   APCC³          1147.3            958     -                -       -        -      1.1976
7    APCC-int.ac⁴   7276.6            6609    -                -       -        -      1.1010
Column (7) is the model vs. observed data ratio, i.e. the deviance divided by the degrees of freedom.
¹ Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967.
² Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967. The dataset is reduced by excluding the municipalities of Copenhagen and Frederiksberg.
³ Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967. The dataset is reduced by including only the municipalities of Copenhagen and Frederiksberg.
⁴ Year is constrained for year = 1987 and cohort is constrained twice for cohort = 1945 and 1967. The model includes an interaction between age and county. For the model to converge it was necessary to exclude observations for women younger than 15 years and older than 44 years because of few observations in these age groups. This exclusion made it impossible to make comparisons with the other models.
Table 6.2 shows that the model describes the data well. Figures 6.7 and 6.8 show the fitted rates and the observed rates for the municipality of Copenhagen and Ribe County.
[Figure 6.7. Copenhagen municipality - APC-county model with interaction between age and county. Model and observed rates by year, 1980-1994, for women aged 25, 30 and 35 years.]
[Figure 6.8. Ribe County - APC-county model with age-county interaction. Model and observed rates by year, 1980-1994, for women aged 25, 30 and 35 years.]
It is obvious that the model fits the fertility rates of Copenhagen much better than model 5 and that the model fits the rates of Ribe County satisfactorily. This model seems to be able to encompass both model 6a and 6b.
This more advanced model seems to capture the rather diverging fertility rates of the counties strikingly well. The different age-specific fertility rates of the counties, with lower rates for younger and higher rates for older women in the municipalities of Copenhagen and Frederiksberg, are captured by the model, while the rates of the more rural counties are captured satisfactorily.
The model specifies different age structures for the individual counties. Figures 6.9 and 6.10 illustrate the age-specific fertility rates for the reference period (1987) and reference cohorts (1945 and 1967) for Storstrøm County and Copenhagen Municipality (Figure 6.9) and North Jutland County and Frederiksberg Municipality (Figure 6.10).
[Figure 6.9. Age-specific fertility rates - Storstrøm County and Copenhagen Municipality. Model and observed rates by age.]
[Figure 6.10. Age-specific fertility rates - North Jutland County and Frederiksberg Municipality. Model and observed rates by age.]
The figures clearly show that the APC-county model with interaction between age and county gives a very good description of the observed fertility rates, both for counties with high age-specific rates and for the municipalities of Copenhagen and Frederiksberg, where the fertility rates are lower for younger women and then approach the level of the other more rural counties for older women.
This pattern is also illustrated in Figures 6.11 and 6.12, where the age-specific rates estimated by model 7 are shown for the 14 counties and 2 municipalities.
[Figure 6.11. Age-specific model rates - Eastern Denmark. Model rates by age for Copenhagen Municipality, Frederiksberg Municipality and the counties of Copenhagen, Frederiksborg, Roskilde, West Zealand, Storstrøm and Bornholm.]
[Figure 6.12. Age-specific model rates - Western Denmark. Model rates by age for the counties of Funen, South Jutland, Ribe, Vejle, Ringkøbing, Aarhus, Viborg and North Jutland.]
The figures clearly show that the fertility rates of Copenhagen and Frederiksberg are lower for the younger age groups, but also that the rates of the counties of Aarhus, Copenhagen and to some extent Funen are lower than those of the remainder. These three counties and the two municipalities are illustrated in Figure 6.13.
[Figure 6.13. Age-specific model rates - urban regions of Denmark. Model rates by age for Frederiksberg, Funen County, Copenhagen Municipality, Copenhagen County and Aarhus County.]
This figure of the age-specific fertility rates of the most urban areas of Denmark shows that Copenhagen and Frederiksberg municipalities have the lowest fertility rates, but also that the fertility rate of the youngest age groups in Copenhagen differs from that in Frederiksberg. In Copenhagen the childless women around the age of 20 have a higher fertility rate than in the other counties, but after this rather high fertility rate the rates are consistently the lowest until the age of 35. The three counties have almost consistently higher fertility rates than the two municipalities and reach their highest fertility rate at an earlier age than the municipalities. As also seen in Figure 6.12, the fertility rates of Funen County approach the rates of other more rural counties. This pattern may illustrate that Funen County is composed of both the urban area around Odense and the smaller, more rural towns on the island.
This examination of the APC-county model with an interaction between age and county shows that the model gives a satisfactory description of the regional variation in Denmark in the period from 1980 until 1994. The interaction between age and county offers a rather simple formulation of the complex demographic phenomenon of different age structures of fertility in the regions of Denmark.
7. Conclusion
This investigation has analysed the fertility of childless women in the period 1980-1994. The investigation has had two phases: analysing the fertility rate for the whole country and analysing the regional differences between the 14 counties and 2 municipalities in Denmark. The purpose of the investigation was to illustrate how the age-period-cohort model could describe these phenomena and whether this multiple regression technique could give knowledge on how the three connected time-scales have influenced the fertility rates.
The analyses have shown that a simple model introducing only age and period effects gives an imperfect description of the data. The full age-period-cohort model is able to describe the heterogeneous development of fertility rates for different ages, with a decreasing trend through the period for the youngest age groups and an increasing trend for the older age groups. The cohort effect captures that the birth cohorts from the 1950s and the start of the 1960s have a higher fertility rate than younger and older cohorts.
When analysing the regional variation by an APC-model with a categorical effect of the different regions, it becomes obvious that this model gives an erroneous description of the data. For the municipalities of Copenhagen and Frederiksberg the model overestimates the fertility rates of younger women and underestimates those of older women. For the other counties the model is more or less satisfactory. This finding led to an analysis in which it was hypothesized that the different effect of age in the specific counties could be modelled by introducing an interaction between age and county. This model gave a very satisfactory description of the data.
Throughout the investigation I have been using the terms age, period (year) and cohort effect. It should be emphasized that it is not possible to attribute any causal interpretation to age, period or cohort. These effects are proxy variables for factors that are not directly observed and that may be part of the physical and social environment. The factors could range from very basic and obvious factors such as availability of contraceptives to more subtle factors such as the connotations of giving birth for younger (or older) women. The slower changing cohort effect is not as straightforward to interpret as the effects of age and period. The necessary condition for a cohort effect is that the impact of some period effect has a permanent effect on particular cohorts. It is not adequate to think of a cohort effect as occurring as fast as e.g. the period effect of legalisation of contraceptives, but rather as slower changes over a longer time span. Because of these difficulties in interpreting the three effects, APC-models could be used as an analysis at the start of a scientific process, to make an advanced description of a phenomenon occurring over a time span.
This study has extended our knowledge of first-time fertility in Denmark by simultaneously analysing the effects of age, period and cohort. The study has shown that the APC-model could adequately describe the combined effect of these three time-scales on the demographic phenomenon of fertility and could describe the regional variation between counties in Denmark.
References
1. Statistics Denmark. statistikbanken.dk. www.statistikbanken.dk. 27-7-2003.
2. Statistics Denmark. Statistiske efterretninger 1997:3. Statistics Denmark, 1997.
3. Statistics Denmark. Befolkningens bevægelser 1999. Statistics Denmark, 2000.
4. Preston SH, Heuveline P, Guillot M. Demography: Measuring and Modelling Population Processes. Blackwell Publishers, 2000.
5. Agresti A. An Introduction to Categorical Data Analysis. John Wiley & Sons, Inc, 1996.
6. Arbyn M, Van Oyen H, Sartor F, Tibaldi F, Molenberghs G. Description of the influence of age, period and cohort effects on cervical cancer mortality by loglinear Poisson models (Belgium, 1955-94). Archives of Public Health 2002;60:73-100.
7. Clayton D, Schifflers E. Models for temporal variation in cancer rates. I: Age-period and age-cohort models. Statistics in Medicine 1987;6:449-67.
8. Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: Age-period-cohort models. Statistics in Medicine 1987;6:469-81.
9. Statistics Denmark. Fertilitetsdatabasen - Vejledning i udtræk fra Fertilitetsdatabasen. Statistics Denmark, 1996.
Appendix I – the relationship between survival analysis and Poisson regression
Consider the age at first birth as a survival time. The actual survival time of a woman
‘under risk’ of giving birth to her first child, t, can be regarded as the value of a variable
T, which can take any non-negative value. The values of T have a probability distribution
and can be regarded as a random variable associated with the survival time. If the random
variable T has a probability distribution with the underlying probability density function
f(t), the distribution function of T is given by
$F(t) = P(T \le t) = \int_0^t f(u)\,du$, (I.1)
which represents the probability that the survival time is less than or equal to t.
The survival function, S(t), is defined as the probability that the survival time is greater than t:
$S(t) = P(T > t) = 1 - F(t) = \int_t^{\infty} f(u)\,du$. (I.2)
Another central function is the hazard function, h(t), which describes the risk that the woman gives birth at time t, conditional on her not having given birth before that time. The hazard function represents the instantaneous birth rate for a childless woman at time t. The formal definition of the hazard function is:
$h(t) = \lim_{\delta t \to 0} \left\{ \frac{P(t \le T < t + \delta t \mid T \ge t)}{\delta t} \right\}$. (I.3)
The numerator is the probability that the random variable associated with a woman’s survival time, T, lies between t and t + δt, conditional on T being greater than or equal to t. The hazard function is then the limiting value of this probability divided by the time interval, δt, as δt tends to 0.
The integrated hazard function, H(t), is defined as follows:
$H(t) = \int_0^t h(u)\,du$. (I.4)
From these definitions it is possible to show that
$h(t) = \frac{f(t)}{S(t)}$. (I.5)
It can also be shown that
$S(t) = \exp(-H(t)) = \exp\left(-\int_0^t h(u)\,du\right)$. (I.6)
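Both results follow in a few steps from the definitions. The derivation below is the standard one for a continuous survival time, sketched here for completeness; it uses that P(T ≥ t) = S(t) for continuous T and that S(0) = 1.

```latex
\begin{align*}
h(t) &= \lim_{\delta t \to 0} \frac{P(t \le T < t+\delta t)}{\delta t \, P(T \ge t)}
      = \frac{f(t)}{S(t)}, \\
f(t) &= -\frac{d}{dt} S(t)
      \;\Rightarrow\; h(t) = -\frac{d}{dt} \log S(t), \\
H(t) &= \int_0^t h(u)\,du = -\log S(t) + \log S(0) = -\log S(t)
      \;\Rightarrow\; S(t) = \exp(-H(t)).
\end{align*}
```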
The likelihood, L, of sample data is the joint probability of the observed data, regarded as a function of the unknown parameters in the assumed model. The survival time is denoted X.
If the time of birth is $X = t_i$ then:
$L = f(t_i) = h(t_i) S(t_i)$. (I.7)
If the time of birth is $X > t_i$, which happens with right censoring, then
$L = S(t_i)$. (I.8)
The combined likelihood can then be expressed the following way:
$L = h(t_i)^{D_i} S(t_i)$, (I.9)
where $D_i = 1$ if woman i has given birth and $D_i = 0$ if woman i is censored. Using (I.6),
$L = h(t_i)^{D_i} \exp\left(-\int_0^{t_i} h(u)\,du\right) = h(t_i)^{D_i} \exp\left(-\int_0^{\tau} h(u) Y_i(u)\,du\right)$,
where $Y_i(u) = I(t_i \ge u)$ indicates whether woman i is still childless and under observation at time u in the time span from time 0 to time τ. $Y_i(u)$ takes the value 1 as long as the woman remains childless and under observation, and the value 0 after she has given birth to her first child or has been censored.
Consider now the likelihood for the i’th woman, $L_i$, where i = 1, …, n indexes independent observations. The likelihood for all women, L, is then:
$L = \prod_{i=1}^{n} L_i = \left[\prod_{i=1}^{n} h(t_i)^{D_i}\right] \exp(-E)$, (I.10)
where $E = \int_0^{\tau} h(u) \sum_i Y_i(u)\,du$, which is the expected number of births: h(u) is the birth rate at time u and $\sum_i Y_i(u)$ is the number of women still childless at time u during the time of observation 0 - τ.
If I at this point assume that the hazard function is constant (h(t) = h), the following likelihood function emerges:
$L = h^{D} \exp\left(-h \int_0^{\tau} Y(u)\,du\right)$, (I.11)
where $D = \sum_{i=1}^{n} D_i$ and $Y(u) = \sum_{i=1}^{n} Y_i(u)$. This can be written
$L = h^{D} \exp(-hT)$, (I.12)
where $T = \int_0^{\tau} Y(u)\,du$ is the person-years at risk: Y(u) is the number of childless women at time u, which is integrated over the time span 0 - τ. Taking logarithms gives
$\log L = D \log h - hT$, (I.13)
where hT is the expected number of births.
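A consequence of (I.13) that may help intuition: setting the derivative D/h - T to zero gives the estimate h = D/T, i.e. births divided by person-years at risk. The Python sketch below checks this numerically for a hypothetical cell (the counts are invented for illustration).

```python
import math

def log_likelihood(h, births, person_years):
    """Equation (I.13): log L = D log h - h T."""
    return births * math.log(h) - h * person_years

# Hypothetical cell: 120 first births in 2000 person-years (illustration only).
D, T = 120, 2000.0
h_hat = D / T   # setting d(log L)/dh = D/h - T to zero gives h = D/T

# The log-likelihood is lower on either side of the occurrence/exposure rate.
for h in (0.9 * h_hat, h_hat, 1.1 * h_hat):
    print(round(h, 4), round(log_likelihood(h, D, T), 3))
```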
This result is equivalent to the Poisson regression model. If I assume that T is fixed and D is Poisson distributed with the expected value hT, the likelihood is
$\frac{(hT)^{D}}{D!} \exp(-hT) = \text{constant} \cdot h^{D} \exp(-hT)$. (I.14)
This means that the Poisson regression model can be used for survival analysis, but that the person-years at risk (T) should be interpreted cautiously, because T is not a fixed quantity but a random variable.
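The proportionality in (I.14) can be checked numerically. In the minimal sketch below (hypothetical counts, Python standard library only) the Poisson log-probability and the survival log-likelihood (I.13) differ by the same amount for every value of h, namely D log T - log D!, so both are maximised by the same h.

```python
import math

def poisson_log_pmf(d, mean):
    """log of mean**d * exp(-mean) / d!; lgamma(d + 1) equals log(d!)."""
    return d * math.log(mean) - mean - math.lgamma(d + 1)

def survival_log_likelihood(h, d, t):
    """Equation (I.13): log L = D log h - h T."""
    return d * math.log(h) - h * t

# Hypothetical cell (illustration only).
D, T = 120, 2000.0
differences = [
    poisson_log_pmf(D, h * T) - survival_log_likelihood(h, D, T)
    for h in (0.03, 0.06, 0.12)
]
# Each difference equals D*log(T) - log(D!), which does not involve h.
print([round(x, 6) for x in differences])
```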