Department of Economics
Internal teaching material (Internt undervisningsmateriale) K no. 44
Guidelines for Writing Papers in Descriptive Economics
Prepared by
Hans Linderoth and
Jan Bentzen
2007
Foreword
Guidelines for Writing Papers in Descriptive Economics is based upon a rewriting and updating of data in Guidelines for Writing Empirical Papers in the Social Sciences Using Statistical Material (HHÅ, 1996). We thank cand. rer. soc. Birgit Nahrstedt for updating some of the data and secretaries Ann-Marie Gabel and Bodil Rasmussen for making the publication ready for press.
Hans Linderoth, Docent, cand. oecon., ph.d.
Jan Bentzen, Lektor, cand. oecon.
1. Introduction
2. Phases of the work
3. Formulating the statement of the problem
3.1. Base material
3.2. Comparative material
3.3. Explanatory material
A. Correlation
B. Background factors
C. The direction of causation
D. The cause and effect mechanism
E. Short and long run
F. The selection of explanatory factors − final comments
3.4. Examples of the selection of explanatory material
3.4.1. Economic growth
3.4.2. Regional differences in income
3.4.3. Space heating in private households
4. Collecting data and other material
5. Working with the material
5.1. Table construction
5.2. Figure construction
5.2.1. Technical criteria
5.2.2. Logarithmic scales
5.2.3. Bar and pie diagrams
5.2.4. Scatter diagrams
5.2.5. Lorenz curves
5.3. Using comparative and explanatory material
5.4. Standardizing means
5.4.1. Price and quantity (volume) indexes
5.5. Analyzing time series data
5.5.1. The elements of a time series
5.5.2. Moving averages
5.5.3. Seasonal correction
5.5.4. Trends and cycles
6. Making commentaries
7. Construction of the report
References
1. Introduction
The following guidelines are intended for the student writing an empirical paper on a subject in
descriptive economics. The guidelines present the means and methods to be used in the
preparation of that paper and are applicable to analyses of subjects for which statistical material
is the primary source. The techniques presented here are relatively elementary and are intended
for students in their first year of study. However, in many cases, these techniques are relevant for
the writing of all papers assigned throughout the course of study.
The empirical paper has a form that is typically used in short reports that are delivered to, and
received from, public authorities or private management. Such reports define, describe, and
analyse issues and problems on the basis of collected statistical material and similar sources.
The following pages contain a description of the work phases one goes through in producing
such a report. The content focuses on the methods, based on both technical and analytical
principles, for producing reports. The technical principles include techniques for calculating
statistical data, for producing appropriate tables, etc., and the analytical principles include
techniques for formulating the problem, conclusions, etc. It should be emphasized that not all
methods presented here are relevant for all subjects tackled in a paper. The student must decide
for him or herself which methods and techniques are appropriate for a successful result.
2. Phases of the work
The empirical paper starts with the idea, the issue and/or the problem one wishes to investigate.
These ideas/issues/problems must be precisely defined and set out in the form of a statement of
the problem, which in a detailed and concrete manner forms the agenda for the ensuing
investigation. Such an agenda demands that all concepts and terms used in the investigation are
also precisely defined, that all relevant questions or sub-problems that are to be answered in the
investigation are formulated, and that the method to be used in arriving at the solution is
presented. A paper in descriptive economics is, for the most part, to be based on statistical
material published by institutions in the public sector. The method entails, therefore, an
identification of the statistical material that is to be collected.
For example, if the issue or problem is related to the state's future ability to provide for the
elderly, the first step in working toward the statement of the problem must be to define the
concept "ability to provide". The ability to provide might be defined or measured, for example,
as public sector disbursements to the elderly in relation to total public sector disbursements or in
relation to tax revenues.
In the process of formulating the statement of the problem, one is often led to a narrowing of
the issues and problems in the investigation. For example, in referring to the example above, one
can choose to ignore service-based public disbursements (disbursements to homes for the
elderly, hospitals, etc.) so that the investigation only considers income transfers to the elderly.
The definitions or delimitations selected will suggest the formulation of sub-problems: How
many persons will, in the future, be counted in the older age groups? How many of the elderly
will receive various types of income transfers, etc? These sub-problems indicate a concrete need
for material in the form of population projections, pension payments, early retirement benefits,
etc.
In general, working toward the statement of the problem requires a kind of brainstorming in
which one dissects the problem, sets the delimitations, and presents a method for solving the
problem. Formulating the statement of the problem (phase 1) is treated in-depth in the next
section.
When the statement of the problem has been formulated, the collection phase can begin. Then
the collected material is processed, analysed and commented on. Finally, all the work is strung
together into one report. It is important to take the work phases in the order sketched in Figure 1
and, for example, not to begin immediately with collection of the material without first having
the steering mechanism in place. If the order is not followed, there is a danger that you will
collect and waste time on a great deal of irrelevant material and that your paper will consist of
loose, unrelated parts. It is, however, a good idea to read relevant chapters in, for example, The
Danish Economy before working out the statement of the problem, because knowledge of your
subject is a prerequisite for being able to formulate the statement.
Figure 1. Phases of work for the report.
Idea/issue/problem
→ Formulating the statement of the problem (Section 3)
→ Collecting the material (Section 4)
→ Working with the material (Section 5)
→ Making commentaries (Section 6)
→ Construction of the report (Section 7)
When collecting data and information (phase 2), you might encounter previously unnoticed
material and points of view that can be relevant to the analysis and which make it necessary to
revise the statement of the problem. Therefore, you should be prepared to differentiate between a
draft statement of the problem, which results from preliminary investigations during the
beginning phase of the work, and the definitive statement of the problem, which gets decided
after a substantial amount of material has been collected. You might also encounter new material
along the way that suggests a narrowing or broadening of the focus on the issue or problem being
investigated.
In processing and manipulating the data and information (phase 3), you might feel compelled
to work again with earlier phases of the work. For example, if a calculated annual growth rate
reveals that a noticeable change has occurred in a particular year, it might be necessary to find
relevant explanatory material pertaining to that year. The same kind of need may arise when
working in the analysis phase (phase 4). Because the process of discovery and preparation is not
necessarily linear, you may wind up cycling around several times between the phases of the
work. These various phases will be described in-depth in the following sections.
3. Formulating the statement of the problem
In the previous section, it was indicated that the formulation of the statement of the problem
results in a list that serves to identify materials (statistics, legislation, analyses, etc.) needed to
carry out the investigation. The formulation itself could also contain a general description of how
the issues are to be described, analysed, and judged, given these materials.
The following sub-sections will describe three categories of data and information, all of which
must be included in producing the report. Base material comprises the data and information that
the title of the report directly reflects. If the topic is the deficit in the state budget, basic data and
information will comprise statistics that measure this deficit. If the topic is U.S. oil imports, the
statistical material will, of course, include quantity and value of these imports.
Comparative material comprises the data and information that is used as a standard or scale
against which the base material is compared. For example, the deficit in the state budget could be
compared with the state's revenues, the deficit in other countries, and/or the GDP, all of which
can be used for evaluating the seriousness of this deficit.
The third category is explanatory material, which comprises data and information that is used
to deepen the analysis by explaining the course of development, movement, and/or changes that
have been demonstrated in the base and comparative material. For example, why has the deficit,
measured in relation to revenues, risen by a particular amount in a particular period? The
following section will treat these categories more thoroughly.
3.1. Base material
If there is any doubt about the meaning of the terms used in the title of the paper, these terms
must be clearly and precisely defined. A distinction should be made between theoretical terms
and operational terms. Theoretical terms can be defined more precisely using other, more well-
known terms, while operational terms must be defined more precisely using measurement
methods. Often, operational definitions are provided in the explanations of terms in the texts in
which the data is found. Theoretically, an unemployed person can be defined as a person without
work who is willing and able to work for the wage normally paid to persons with similar
qualifications. If one uses statistics that only include individuals eligible to receive
unemployment benefits, however, the theoretical and operational definitions will not agree
because there will always be a group of out-of-work individuals who would and could work, but
who are not eligible to receive unemployment benefits. In such a case, the operational term is not
an adequate measure of the theoretical term.
If writing a paper titled "Market Sensitivity of the Textile Industry", you must decide how to
measure market sensitivity. It would be reasonable to use a measure of production that could be
related to measures of production used in other industries and/or other branches. Such a measure
could be used to determine whether the textile industry is more or less sensitive than other
branches to market fluctuations.
A paper titled "Kuwait's Economy" also demands theoretical considerations. It must be
decided exactly how to define and measure "economy". Data and information regarding the
national accounts, balance of payments, government budgets, etc. might be part of that
definition. But in a 15-page paper, there is not enough space for a comprehensive economic
description of any country. One must choose the economic factors considered most important for
the country of interest and be ready to justify that choice.
GDP per capita in constant prices is the term usually used to indicate development in a
country's economy. It is problematic to use the term in certain cases because GDP per capita in
constant prices can fall during a period, or the growth rate can fall, even if the country has
obviously become much more prosperous. For example, this might be the case for an oil-
exporting country after a distinct oil price rise coupled with reduced oil exports. The reduced oil
exports will, ceteris paribus, reduce GDP in constant prices, but the increased oil revenues can be
used to increase consumption via increased imports. In this case, it would at least be natural to
supplement GDP per capita with consumption per capita as a measure of economic welfare.
The Gulf War has also been used as a seminar topic, from an economic perspective. This topic
demands a precise discussion of which economic factors might be the most important for the
subject and should, therefore, be included.
Clarification of the topic "Fuel Oil Consumption" demands neither theoretical nor operational
considerations as it deals with the consumption of a well-defined good, which is reported
quarterly in the statistics. A long list of examples could be presented here for which the
clarification of terms is not necessary. It is the responsibility of the author to decide if the paper
title contains problem words requiring clarification.
In addition to clarifying the meaning of the terms used in the base material, clarification should
also be made with respect to the choice of time period and the degree of detail. Deciding the time
period involves not only the choice of the year in which the investigation begins, but also the
interval of years used throughout. If the paper focuses on a 10-year period, it is not in all cases
necessary to include material from all 10 years. It can sometimes be a good idea to divide the
whole period up into sub-periods, for example. Regarding the end point in the time period, it is
ultimately important to use the most recently available material.
The degree of detail concerns the division of the base material into sub-groups. For a subject
involving age distribution, you must decide how many age groups and what range of ages to use.
For a subject involving Denmark's energy consumption, you must decide whether to divide
consumption by energy products or consumption sectors. Should consumption, thereafter, be
divided into all different forms of energy products, such as petrol, fuel oil, coal, brown coal, etc.,
or only divided into oil products, solid fuels, etc.? For a subject involving industrial structure,
should the manufacturing industry be treated according to its sub-industries, or should the
industry be treated as an aggregate? The division of material into sub-groups is usually an
essential part of the analysis.
The optimal degree of detail is determined by the objective of the paper. Students often use
an unnecessary amount of detail, which results in a paper with large, unclear tables in which
the patterns of interest in the material are difficult to discern.
Considerations about the use and definition of terms, time periods, and degree of detail are
relevant to not only the base material, but also to comparative and explanatory material. This
applies as well to consistency between the three material categories. Students often mistakenly
use inconsistent time periods when discussing base material, comparative material, and
explanatory material − either the intervals are different and/or the beginning and ending dates do
not match. As a rule, this does not work, especially because the comparative and explanatory
material must relate directly to the base material.
3.2. Comparative material
It was earlier mentioned that comparative material should be used as a context for the base
material. For example, an analysis of the wages of primary and lower secondary school teachers
might also be compared to the wages of individuals in other occupational groups. Or, an analysis
of the history of employment in a particular sector might be related to employment in other
sectors and/or to employment in that sector in other countries. Without comparative material, it is
not possible to judge if the base material reflects a high, an average, or a low value in a given
year, or if a growth rate measuring development, changes, or movement is high, average, or low.
In a paper written about copper, information indicated that the total amount of copper in
manganese deposits on the sea floor had been estimated at 3 billion tons. Such information
cannot stand alone. It must be related to copper consumption and/or quantities of copper from
other sources. In Meadows (1972), it is stated that:
Given present resource consumption rates and the projected increase in these rates, the great majority of the currently important non-renewable resources will be extremely costly 100 years from now. ... The price of mercury, for example, has gone up 500 percent in the last 20 years; the price of lead has increased 300 percent in the last 30 years.
It is clear that the last sentence is included to make more credible the prediction of a steep rise in
the future price of raw materials, given that the prices of some raw materials have already begun
to rise sharply. However, the question is whether the prices of mercury and lead really have risen
very much. That cannot be concluded without comparative material in the form of price movements
of other related products, for example. Besides, a price rise of 300% over the course of 30 years
is equivalent to an annual rate of about 4¾%, which is hardly more than the rise in prices for many
other products.
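The conversion from a cumulative price rise to an equivalent annual rate can be checked with a short calculation. The sketch below is illustrative only: the function name annual_rate is our own, and it assumes that a "rise of 300 percent" means the price has become four times its original level and that growth is compounded once a year.

```python
def annual_rate(total_rise_pct: float, years: float) -> float:
    """Constant annual growth rate (in %) equivalent to a cumulative rise.

    Assumption: a rise of total_rise_pct percent means the end price is
    (1 + total_rise_pct / 100) times the start price.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    growth_factor = 1 + total_rise_pct / 100
    return (growth_factor ** (1 / years) - 1) * 100


# The two price rises quoted from Meadows (1972)
lead = annual_rate(300, 30)     # 300 percent over 30 years
mercury = annual_rate(500, 20)  # 500 percent over 20 years

print(f"Lead:    {lead:.2f}% per year")
print(f"Mercury: {mercury:.2f}% per year")
```

Expressing both rises as annual rates puts them on the same scale, so they can be compared directly with each other and with the annual price rise of other products.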
In another paper, it was mentioned that cultivated land area in Iraq increased by less than 2%
between 1979 and 1989. The paper concluded by saying "there had been very little change in the
size of cultivated land". That seems reasonable given that 2% is a modest number in many cases.
But cultivated land area changes very little over a decade, and in most countries, this area
actually decreases in size. Therefore, seen in the context of world-wide changes, a 2% increase is
relatively large.
In a third paper, it was stated that grain was the most important agricultural product. This
conclusion was based on the total quantity of production in tons. However, the production of
grain should be compared in value terms with the production of other agricultural products if the
intention is to identify the most important agricultural product. It would actually be best to use
value added as a measure for value.
Comparative material forms the context for evaluation. The better this context, the more in-
depth an analysis of the base material can be made, thus leading to a greater understanding of the
issue under investigation.
3.3. Explanatory material
An analysis including only base material, supplemented possibly by comparative material, can
only answer how, when, and what questions. The purpose is to map out the objects of the
analysis using a certain amount of information. For example, in 2006, there were x unemployed
persons on average per week, of which y were ... etc. For another example, in the period 1972-
2006, oil consumption fell by x PJ of which y PJ is due to a fall in oil consumption used for
space heating, z PJ is due to a fall in oil consumption in the utility sector, etc. Such a
decomposition of the total can be said to explain some of the development in the total number.
In this way you can answer the "why" question, but only to a certain degree.
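A decomposition of this kind can be sketched in a few lines. The sector names and all figures below are hypothetical and chosen only to illustrate the arithmetic; they are not taken from any energy statistics, and the function name decompose_change is our own.

```python
# Hypothetical oil consumption by sector in PJ (illustrative numbers only)
start_pj = {"space heating": 300.0, "utility sector": 200.0, "transport": 150.0}
end_pj = {"space heating": 120.0, "utility sector": 60.0, "transport": 170.0}


def decompose_change(start, end):
    """Split the change in a total into the contribution of each component.

    Returns the total change and, per component, a tuple of
    (absolute change, share of the total change in percent).
    """
    total_change = sum(end.values()) - sum(start.values())
    if total_change == 0:
        raise ValueError("total did not change; shares are undefined")
    contributions = {}
    for name in start:
        change = end[name] - start[name]
        contributions[name] = (change, change / total_change * 100)
    return total_change, contributions


total_change, contributions = decompose_change(start_pj, end_pj)
print(f"Total change: {total_change:.0f} PJ")
for name, (change, share) in contributions.items():
    print(f"{name}: {change:+.0f} PJ ({share:.1f}% of the total change)")
```

A component that moves against the total (here the hypothetical transport sector) receives a negative share, which is a useful signal that it deserves a separate comment.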
A deeper analysis of "why", however, requires a cause and effect (or causal) analysis.
A causal analysis provides the greatest knowledge about the objects of the analysis. The material
that supplements the base and comparative material in a causal analysis is called the explanatory
material. The purpose of a causal analysis is to establish a factor C as the cause of a particular
effect E.
E can be the number of unemployed individuals divided into groups by characteristics at
different points in time. For example, in a causal analysis, the purpose might be to explain why
unemployment is higher in North Jutland than in other parts of Denmark, or why unemployment
is higher among women than among men. The purpose might also be to establish a causal
relationship between occupation and mortality, between marital status and mortality, or between
income and mortality.
C (independent, explanatory factor) → E (dependent, explained factor)
The following sub-sections discuss the considerations one should take into account in making
a causal analysis. These considerations are relevant both to the choice of explanatory material
and to the conclusion phase of the paper (cf. Section 6).
A. Correlation
The material should reveal a pattern between C and E. If C is occupation and E is mortality, the
pattern might consist of a large difference in mortality rates among the occupational groups.
Which occupations are hazardous and which are not? If there is no difference,
occupation is not an explanatory factor in an analysis of mortality.
Occupation is an example of a qualitative variable, the value of which cannot be measured or
expressed in numbers. Other examples of qualitative variables include gender, municipality,
country, and marital status.
As opposed to qualitative variables, quantitative variables can be expressed in numbers.
Examples of quantitative variables include age, height, product, and income. Instead of using
occupation as an explanatory factor in the analysis of mortality, one can choose income, as
mentioned earlier.
These guidelines do not contain a discussion of the statistical tests used to determine if two or
more variables are correlated. You must be content to compare the variables using a sketched
figure based on the respective variables' values or by listing these values in a table and looking
for the pattern in the material.
If large values of C correspond to large values of E, then the correlation is positive. If large
values of C correspond to small values of E, then the correlation is negative. For example, if
unemployment (E) is larger this year while economic growth (C) is lower, then the correlation is
negative. As a rule, there is a positive correlation between consumption and income and a
negative correlation between consumption and the price of a good.1 Looking for positive and
negative correlations is, of course, only meaningful in an analysis of relationships between
quantitative variables.
1 For normal goods, income elasticity is positive and price elasticity is negative.
The degree of the relationship or correlation between two variables can be measured by R²,
which indicates the degree of linearity between the variables in question. In scatter diagrams,
Excel can display R² values on charts. The higher the value of R², the stronger the linear
relationship between the variables. If R² is one, there is a perfect linear relationship between the
variables. A value equal to zero means there is no linear relationship at all.
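For readers who want to see what lies behind the R² value, the sketch below computes it by hand. It assumes a straight-line fit with an intercept, for which R² equals the square of the correlation coefficient r; the sign of r shows whether the correlation is positive or negative. The function name and the income and consumption figures are hypothetical and serve only as illustration.

```python
from math import sqrt


def correlation_and_r_squared(x, y):
    """Return (r, R²) for two equally long lists of numbers.

    For a straight-line fit with an intercept, R² equals r squared.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no correlation")
    r = sxy / sqrt(sxx * syy)
    return r, r * r


# Hypothetical income and consumption figures (illustrative only)
income = [10, 20, 30, 40, 50]
consumption = [9, 17, 27, 33, 44]

r, r2 = correlation_and_r_squared(income, consumption)
print(f"r = {r:.3f}, R² = {r2:.3f}")
```

Note that R² only measures linearity: two variables can be closely related along a curve and still show a low R², so the scatter diagram should always be inspected as well.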
B. Background factors
The pattern or correlation that is revealed under point A is not necessarily a sign of causality.
Correlation can be found among a number of variables for which no causality is present. The
correlation can be due to the fact that both C and E are causally connected to a common
cause C1 (cf. case 1 in the following figure). Income per capita is correlated with a number of
variables that are not necessarily causally connected themselves. For example, there is a positive
correlation between GDP per capita and women's participation in the labour force and between
GDP per capita and alcohol consumption. The positive correlation between women's
participation in the labour force and alcohol consumption that results from these two
relationships hardly expresses a causal relationship. At the very least, a causal claim would
require a demonstration that women in the labour force, ceteris paribus, drink more alcohol than
women who remain at home.
The usual problem in causal analysis is that factors other than C have significance for E. These
other factors are called background factors (B, contributory causes), cf. case 2.
In the example of occupation and mortality, the background factors could be age, gender,
inclination to smoke, eating habits, alcohol consumption, marital status, etc. In another example,
not only is fertility dependent on income, but it is also dependent on occupation, religion, marital
status, and residence, among other factors.
Case 1: Situation with common causes: C1 → C and C1 → E
Case 2: Situation with contributory causes: C → E and B → E
Case 3: Situation with intermediate causes: C → B → E
Case 3 treats intermediate causes. As an example, alcohol consumption could be an
intermediate cause between occupation and mortality. In certain occupations, there may be a
tradition of relatively high alcohol consumption. That is, it is not the work itself that is
dangerous.
Globally, a negative correlation between income and fertility can be displayed, cf. Figure 3.1.
Maybe this correlation is based on a positive correlation between income and the mother's level
of education and a negative correlation between the mother's level of education and fertility. If
this is the correct relationship, fertility will not fall as income rises unless women's level of
education rises as well.
Figure 3.1. Correlation between fertility and GNI per capita, 2004.
[Scatter diagram with fitted curve, R² = 0.4641. Horizontal axis: GNI per capita, PPP (0 to 60,000); vertical axis: fertility rate, total (0 to 7). Labelled points: Saudi Arabia, Kuwait, Israel, Luxembourg, Hong Kong (China), Denmark, Russia, China.]
Source: World Development Indicators, 2006.
Figure 3.1 shows a significant spread around the drawn curve. It shows, for example, that the
point for China lies well below the curve. This is partly explained by China's explicit policy of
limiting fertility, and is also presumably partly explained by the high level of education women
receive in China relative to the level of income. In contrast, points for countries in the Middle
East lie above the drawn curve, presumably because of the low level of education for women
relative to the level of income. And the relatively low level of education for women in the Middle
East can possibly be explained on the basis of religious and cultural background. One must
remember, however, that income level in the Middle East has increased tremendously over a
short period of time as a consequence of developments in the oil market.
Danmarks Statistik2 has shown that the risk of an accident, and of resulting personal injury,
associated with private cars that are 8-11 years old is approximately double that for cars that are
only 0-3 years old.3 These numbers seem to indicate a clear causal relationship between the age
of a car and the risk of an accident. But maybe a substantial part of this relationship can be
explained by the age of the driver. It has been documented that drivers under 25, and over 65,
years of age run, respectively, 4 times and 2½ times the accident risk of drivers between 35 and
64 years of age. And for economic reasons, drivers of older cars are principally under 25, and
over 64, years of age! Therefore, it may be the driver's age, and not that of the car, that is so
decisive for accident risk.
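The effect of such a background factor can be made visible by splitting the material into groups according to that factor. The sketch below uses invented counts (they are not figures from Danmarks Statistik, and the driver groups are simplified to two) in which car age has no effect at all within each driver group, yet older cars still appear to be about twice as risky in the undivided material.

```python
# Hypothetical counts (illustrative only, not Danmarks Statistik data):
# (car age group, driver age group) -> (number of cars, number of accidents)
counts = {
    ("0-3 years", "35-64"): (8000, 80),
    ("0-3 years", "under 25 or over 64"): (2000, 80),
    ("8-11 years", "35-64"): (2000, 20),
    ("8-11 years", "under 25 or over 64"): (8000, 320),
}


def accident_rate(cells):
    """Accidents per car for a collection of (cars, accidents) pairs."""
    cells = list(cells)
    cars = sum(c for c, _ in cells)
    accidents = sum(a for _, a in cells)
    return accidents / cars


def crude_rate(car_age):
    """Accident rate for a car age group, ignoring the driver's age."""
    return accident_rate(v for (car, _), v in counts.items() if car == car_age)


def stratified_rate(car_age, driver_age):
    """Accident rate for a car age group within one driver age group."""
    return accident_rate([counts[(car_age, driver_age)]])


crude_ratio = crude_rate("8-11 years") / crude_rate("0-3 years")
print(f"Old cars vs new cars, all drivers: {crude_ratio:.2f} times the risk")

for driver_age in ("35-64", "under 25 or over 64"):
    old = stratified_rate("8-11 years", driver_age)
    new = stratified_rate("0-3 years", driver_age)
    print(f"Drivers {driver_age}: old cars {old:.1%}, new cars {new:.1%}")
```

In this constructed example the whole difference between old and new cars comes from who drives them. With real data the difference within each driver group would have to be examined before drawing any conclusion.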
In all, it can be said that one faces a complicated network of relationships,4 where a factor can
be explained by a series of other factors which themselves can be explained by a series of other
factors, etc. These kinds of networks are called causal chains. This involves explanations of
explanations.
If the purpose of the paper is not to illustrate the relationship between E and a particular
explanatory factor C, then the distinction between C and B has no meaning in the formulation of
the statement of the problem, in which one takes into consideration only those explanatory
factors that should be brought into the analysis. On the other hand, where the relationship
between E and C is important, this distinction has great significance for the comments on the
correlation between the two variables. If an important background factor is not accounted for in
the analysis, the conclusion will most likely be completely off track.
2 This is the name for Denmark's statistical office, Statistics Denmark.
3 News from Danmarks Statistik (NYT), No. 321, 1993.
4 The economist attempts to account for this network of relationships by constructing models that build in the causal relationships among a range of economic variables.
C. The direction of causation
Does an occupation result in a particular mortality rate, or does a particular mortality rate lead to
a particular occupation? Should the arrow (the direction of causation) be turned around? There is
hardly any doubt that some occupations require a particular standard of health and thereby are
connected to mortality. Often the timing between factors is not clear. Do increased wages lead to
increased prices or vice versa? Has the increased mechanization in agriculture led to the
increased exodus of workers or vice versa? Does increased income per capita lead to increased
levels of education or vice versa? There is a negative correlation between income per capita and
agriculture's share of GDP at factor prices. This is not the same as saying that "increased income
is the cause of a fall in agriculture's share of GDP" because the increased income may possibly
be based on a transfer of labour from agriculture to other sectors where the wages to factors of
production are higher than in agriculture. If this is the case, a fall in the share of GDP is a
contributory cause to an increased GDP per capita. On the other hand, high economic growth in
general encourages the migration of workers from agriculture because high economic growth
creates relatively good employment possibilities in the manufacturing sector, for example. This
transfer of a factor of production to other sectors reduces the agricultural sector's share of GDP,
ceteris paribus.
When two factors influence each other (C ↔ E), there is mutual causality. One cannot
maintain that one factor causes the other. Economic growth and agriculture's decreasing share of
GDP are mutually related. In the context of mutual causality, the actual issue being investigated
can be decisive for which factor should be treated as the dependent factor (E) and which factor
should be treated as the independent factor (C).
In statistical analyses, the aim is often to test the explanatory power of a factor. For example,
this can be done by investigating if the change in an explanatory factor (C) takes effect before a
change in the explained factor (E). That is, the data is analyzed closely to determine if potential
changes in C typically lead (in time) to subsequent changes in E.
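One simple way of looking at timing, not necessarily the test the statistician would choose, is to compare the correlation between C and E at different lags. The sketch below uses invented annual changes in C, and E is constructed to repeat C one year later. The function names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(c, e, lag):
    """Correlation between C in period t and E in period t + lag.

    A positive lag asks whether C leads E; a negative lag asks whether
    E leads C.
    """
    if lag >= 0:
        return pearson(c[:len(c) - lag], e[lag:])
    return pearson(c[-lag:], e[:len(e) + lag])


# Hypothetical changes in C; E is built to follow C with a one-year delay.
c = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 7.0, 5.0, 8.0]
e = [2.0] + c[:-1]

for lag in (-1, 0, 1):
    print(f"lag {lag:+d}: correlation {lagged_correlation(c, e, lag):.2f}")
```

The correlation is highest when C is compared with E one year later, which is what one would expect if changes in C precede changes in E. Such a pattern supports, but does not prove, a causal direction.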
D. The cause and effect mechanism
The relationship between C and E might be based on a sequence of events which can be
described in more or less detail. Industry x is characterized by work taking place in shifts and
involving hazardous substances, etc. By supplementing the investigation with an explanation of
the causal mechanism, one can further establish whether the correlation between C and E is of a
causal type.
To explain the relationship between economic variables, you must use economic theories. In
reality, economic theories are brought in as the first step in trying to decide what the explanatory
material should consist of, in that these theories point towards material that is meaningful in a
given relationship. When the analysis concerns the consumption of a good, it is natural to bring
in disposable income as an explanatory factor, as well as other factors. Disposable income is, in
itself, dependent on tax policy. Investments are dependent on interest rate movements and
economic development in general, etc.
In a paper, the terms of trade (export price index/import price index) entered as an important
economic growth factor. It was hypothesized that improved terms of trade during a period had
led to increased growth. The correctness of this hypothesis depends on why the terms of trade
had improved. If it had improved because of increased domestic wages, competitive ability
would have become worse, ceteris paribus, which would have influenced the quantity of exports
negatively. Improved terms of trade based on domestic increases in costs is, therefore, growth
reducing. On the other hand, the terms of trade could have improved as a consequence of
increased demand for the country's export goods, and this increases growth.
Throughout the first two oil crises, the terms of trade fell for a range of industrial countries as
a consequence of the increased import prices for energy. Because energy consumption was/is
very price inelastic, at least in the short term, an increased share of income had to be used on
energy consumption, which of course reduced the demand for other goods and services, and this
reduced growth. Growth was negative in the wake of the energy crises.
To summarize, one should be able to justify the choice of explanatory material. The causal
connection between C and E must be made plausible.
E. Short and long run
Many examples can be found in economic theory where the effect is first felt after a period of
time has passed. A permanent increase in income leads normally to increased consumption, but
the full effect is first felt after consumers become used to the higher income. Higher oil prices
lead to an increased demand for other energy sources, but the increase in demand for energy
products other than oil is greater in the long run than in the short run because substitutability is
greater in the long run than in the short run.
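The difference between the short run and the long run effect can be sketched with a simple adjustment process. The sketch assumes, purely for illustration, that consumers close a fixed share of the remaining gap each year after a permanent increase in income; the numbers and the adjustment speed are hypothetical.

```python
def adjustment_path(old_level, new_level, speed, periods):
    """Level of consumption in each period after a permanent rise in income.

    Assumes that a fixed share `speed` of the remaining gap between the
    current level and the new long run level is closed in every period.
    """
    path = []
    level = old_level
    for _ in range(periods):
        level += speed * (new_level - level)
        path.append(level)
    return path


# Hypothetical numbers: consumption eventually rises from 100 to 110,
# and 30% of the remaining gap is closed each year.
path = adjustment_path(old_level=100.0, new_level=110.0, speed=0.3, periods=10)
short_run_effect = path[0] - 100.0   # effect after one year
long_run_effect = 110.0 - 100.0      # effect once adjustment is complete

print(f"Effect after 1 year:   {short_run_effect:.1f}")
print(f"Effect after 10 years: {path[-1] - 100.0:.1f}")
print(f"Long run effect:       {long_run_effect:.1f}")
```

Data covering only the first year or two would, in this example, register less than half of the long run effect.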
If one is interested in the effects in the long run, one cannot be satisfied with data that registers
effects in the short run. An incorrect registration with respect to time can result in an incorrect
conclusion concerning the direction of correlation (positive/negative) between C and E as
well as the strength of the correlation. If the correlation is negative in the short run and positive
in the long run, and one can only determine the short run effects, the chances are high that
incorrect conclusions will be drawn.
In the example about occupation and mortality, the following time-related sources might
incorrectly be drawn in:
Occupation x is the hazardous occupation which a worker leaves after a few years. As a
consequence of this hazardous occupation, he or she either retires with disability payments or is
so ill that he or she chooses the less hazardous occupation "u" after the illness period. In using
the correct data to determine the relationship, one can see that a hypothesis relating mortality and
occupation should be rejected; too simple a model overlooks the intervening time variables.
A paper on economic growth included a section on basic growth factors in the long run. One
discussion centred on increases in factors of production, such as capital, labour, and productivity.
But the paper included data for only a few years and was limited to a description of short-term
fluctuations in GDP. In reality, the long run explanatory factors were not of interest, since the
paper very clearly used only data that referred to the business cycle.
In another paper, it was mentioned that increased economic growth implied increased public
expenditures because greater growth increased government revenues and consequently the
possibility for committing to larger expenditures. This positive correlation between economic
growth and public expenditures applies in the long run in that a rich country generally makes
greater public expenditures than a less rich country does. In the short run, however, increased
economic growth will result in a fall in public transfer payments to unemployment benefits and
welfare; that is, the correlation is negative in the short run. And since this paper had data for
only a small number of years, neither the long run relationship nor the included hypothesis was
relevant.
[Diagrams for the occupation and mortality example above. First diagram: occupation x, death. Second diagram: occupation x, death, illness, occupation u, retirement due to disability.]
F. The selection of explanatory factors − final comments
The objective of the paper determines which explanatory material to use. In a descriptive
investigation, there is need for only little or no explanatory material. On the other hand, in
making a causal analysis, one should not strive to pack in as many explanatory factors as
possible, but only to select the presumably most important factors, which can be treated more
thoroughly as a result. In the selection of these presumably most important factors, the
distinction between short run and long run explanatory factors is especially important. If the
analysis is to examine the change in base and comparative data over the short term, the factors
having great explanatory power will not be the same as those having explanatory power for long
run effects (cf. examples discussed earlier).
Often the base material is divided up into groups. As examples, industry in general can be
divided up into industry branches, and energy consumption can be divided into sectors as well
as energy products. In the selection of explanatory material, one should not select material
having great explanatory power for only a small sub-group. This will unbalance the paper.
However, in most cases, an analysis will be strengthened if special attention is paid to include
explanatory material relevant to analyses of those periods where the changes are distinct.
Finally, it should be mentioned that finding a reasonable argument for causality between two
variables does not mean that one has proved causality. More sophisticated tests are required.
But it can be said that causality is likely, given the arguments made in the paper.
3.4. Examples of the selection of explanatory material
3.4.1. Economic growth
As mentioned earlier, a socio-economic issue most often involves a network of relationships. An
analysis of the basis for economic growth can therefore be very complicated, involving a wide
range of explanatory variables, cf. Figure 3.4.1.
Figure 3.4.1. Causal network of economic growth
Primarily, growth can be explained by the development in factor inputs and the development in
output per unit of factor input. The larger the increase in the effort of production factors, the
greater is growth, and the greater output per unit of factor input, the greater is growth. To make
the matter more complicated, the included factors or variables are not independent of each other.
The development of technological progress is not independent of capital effort and educational
level. It is also clear that the development in employment is dependent on economic growth rate
and vice versa. It is, in general, difficult to isolate the contribution of the individual factors, so
some debatable calculated assumptions must be used. These will not be treated here.
The variables discussed above are proximate sources of growth. So-called ultimate sources of
growth reflect basic relationships in an economy, such as culture (tradition for education, etc.),
demography, history, institutional relationships, economic policy, etc.
Even in a comprehensive analysis, which is much more than what is expected in a paper, you
can select only a limited number of explanatory variables and be satisfied with that. The choice
of variables will depend on the actual issues under investigation. Under all circumstances, it is an
advantage to be able to recognize the larger causal network when concluding the paper.
3.4.2. Regional differences in income
It is apparent that average income for the active worker is a function of a region's distribution of
industries and industry-determined wages. A region can have a relatively low average income
because the region has relatively many employed in industries where the wages, in general, are
low. Alternatively, average income may be low because wages themselves are relatively low in the
region for a given industry.
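The two explanations can be separated by a simple calculation, sketched below in Python. Average income is written as the sum over industries of employment share times wage. The shares, wages, and industry names are hypothetical, and the split into a structure effect and a wage effect is one common convention among several.

```python
# Hypothetical employment shares and average wages (1000 DKK) by industry,
# given as industry -> (share of employment, wage).
national = {
    "agriculture": (0.10, 250.0),
    "manufacturing": (0.30, 320.0),
    "services": (0.60, 350.0),
}
region = {
    "agriculture": (0.30, 240.0),
    "manufacturing": (0.30, 310.0),
    "services": (0.40, 340.0),
}


def average_income(area):
    """Average income as the share-weighted sum of industry wages."""
    return sum(share * wage for share, wage in area.values())


# Structure effect: the region's industry mix valued at national wages.
structure_effect = sum(
    (region[i][0] - national[i][0]) * national[i][1] for i in national
)
# Wage effect: the region's wage differences weighted by its own industry mix.
wage_effect = sum(
    region[i][0] * (region[i][1] - national[i][1]) for i in national
)
gap = average_income(region) - average_income(national)

print(f"Income gap:       {gap:.1f}")
print(f"Structure effect: {structure_effect:.1f}")
print(f"Wage effect:      {wage_effect:.1f}")
```

In this invented example, two thirds of the gap comes from the distribution of industries and one third from lower wages within each industry.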
It would, therefore, be natural to start by investigating the distribution of industries and
industry-determined wages by establishing the explanatory material, as in Figure 3.4.2.
Figure 3.4.2. Explanatory material of regional differences in active workers' incomes.
The regional distribution of industries is determined by the resource base, among other things.
The resource base includes agricultural land, fish stocks, tourist attractions, etc. The distribution
of industries is also determined by the age and gender structure in the population. For example,
not only is the distribution of jobs held by young women different from those held by older
women, but the rate of participation also differs between the two groups. It must be
remembered, however, that the opportunities for working influence the age and gender
distribution found in the working population. For example, poor employment opportunities lead
to an emigration of young people, especially.
The international business cycle influences industry income fluctuations to different degrees.
Industries that produce investment goods for export are especially sensitive to these business
cycles. EU's agricultural prices influence earnings in the agricultural sector. The justification for
all the arrows in Figure 3.4.2 will not be given here. It is left as an exercise for the student to
work out the justification for these sketched relationships, as well as to suggest others.
As mentioned in the last section under point F, one should select only those explanatory
factors considered most important in relation to the chosen time horizon, among other things. In
Figure 3.4.2, one would consider resource base, age, and gender to be long run factors; one
would consider agricultural prices and the international business cycle to be short run factors.
3.4.3. Space heating in private households
In Figure 3.4.3, income, prices, temperature, housing area, etc. appear as explanatory factors.
Naturally, income influences energy consumption. The higher the income, the higher is
energy consumption. High energy consumption, caused by a low outdoor temperature, among
other things, will increase insulation activities (the arrow from consumption to insulation) and
will lead to a reduced indoor temperature (arrow from consumption to indoor temperature
setting) because high consumption means that the proportion of energy consumption in total
consumption is high. Desired indoor temperature should, therefore, be seen in relation to prices
and income.
Changes in outdoor temperature will influence consumption somewhat in the short run, while
housing area will influence consumption in the long run. If one is interested in explaining the
per capita use of energy for space heating among selected countries, relevant factors will
include income, price, housing area, and degree of insulation. Further explanation of Figure
3.4.3 is left to the reader.
Figure 3.4.3. Explanatory elements for energy used for space heating in private households.
Often, one must do one's best without relevant explanatory factors because information about
these factors cannot be located or is missing. For example, it would most likely be difficult to
obtain information about insulation standards in the majority of countries.
4. Collecting data and other material
When the formulation of the statement of the problem is finalized, one knows to a great extent
which material needs to be collected in the libraries. While the library staff can help with search
techniques, these guidelines will focus on problems that might occur in the process of collecting
data and information.
As indicated in Table 4.1, different statistical sources often report different results for
presumably identical terms. Several sources are often used when one is working with a period
where the oldest data must be taken from one source and the newest data must be taken from
another source. One can check to see if the shift in sources creates problems by comparing data
for the same year in the two sources. If there are significant differences, one ought to carefully
read the explanation/definition of terms usually accompanying statistical material. In the
conclusion, then, one can draw attention to the divergence in the statistics and provide an
assessment as to whether this divergence is significant for the analysis.
Table 4.1. Denmark's total energy requirements and final energy consumption in 2001, as assessed by various organizations (PJ).

Statistics Denmark
   Gross energy consumption               787
   Gross energy consumption, adjusted     815
Danish Energy Agency
   Total gross energy consumption         829
   Total final energy consumption         642
   Gross energy consumption, adjusted     831
BP                                        779
OECD
   Total supply                           828
   Total final consumption                635

Sources: Statistical Ten Year Review 2003 (Statistics Denmark), The Danish Energy Agency (2002), Statistical Review of World Energy 2001 (BP, www.bp.com), OECD (2003).
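The size of the divergence in Table 4.1 can be quantified in a few lines of Python. The sketch below takes the first total reported by each organization and expresses it relative to the Statistics Denmark figure. The choice of benchmark is arbitrary, and treating the four totals as comparable is an assumption made only for illustration.

```python
# Totals for Denmark in 2001 (PJ), copied from Table 4.1.
total_energy = {
    "Statistics Denmark": 787,
    "Danish Energy Agency": 829,
    "BP": 779,
    "OECD (total supply)": 828,
}


def percent_deviation(value, reference):
    """Deviation of `value` from `reference`, in per cent of the reference."""
    return 100.0 * (value - reference) / reference


reference_source = "Statistics Denmark"  # arbitrary choice of benchmark
reference = total_energy[reference_source]
deviations = {
    source: percent_deviation(value, reference)
    for source, value in total_energy.items()
}
spread = max(total_energy.values()) - min(total_energy.values())

for source, value in total_energy.items():
    print(f"{source:22s} {value:5d} PJ  {deviations[source]:+.1f}%")
print(f"Spread between highest and lowest figure: {spread} PJ")
```

A calculation of this kind shows whether the divergence is small enough to ignore or whether it must be discussed in the paper.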
Several sources might also be used when one wishes to compare energy consumption in several
sectors, for example. That information may not necessarily be found in one source. It should be
noted that energy statistics, especially, are plagued by a lack of consistency among sources. In
most statistics, international efforts to work out the inconsistencies found in term definition and
structure, etc., have been so comprehensive that many comparisons today can be carried out
without a problem.
The most typical cases of inconsistency in data arise when you rely on statistical material
found in books. The material is often incomplete, and term definition is often lacking. It must be
emphasized that you should collect data and information from primary statistical sources and not
from books, to as great an extent as possible. Not only may books often be filled with errors,
they will not contain the most up-to-date material either.
One must also pay close attention to the continuous updating of, and corrections made to,
statistical data. For example, the numbers in the national accounts are issued in several versions
at different periods of time because the primary material used in creating the national accounts is
available at different periods of time. To the extent possible, therefore, one must use numbers
from the most recent sources.
Breaks in the data will also occur when the methods for constructing that data change. These
kinds of breaks occur often in a time series. One must assess, then, if the break in the data has
significance for the analysis. If this is the case, then the break in the data must be discussed in the
paper.
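When the old and the new version of a series share at least one year, the size of the break can be measured directly. The sketch below uses two invented series. It also shows one possible way of linking them, by rescaling the old series proportionally to the level of the new one; this proportional linking is an assumption that these guidelines do not prescribe, and it is only reasonable when the break is a pure level shift.

```python
# Hypothetical series: the same variable under an old and a new method,
# with the year 2000 reported under both.
old_series = {1998: 100.0, 1999: 104.0, 2000: 108.0}
new_series = {2000: 113.4, 2001: 117.0, 2002: 121.0}


def break_size(old, new):
    """Percentage gap between the two versions in the years they share."""
    overlap = sorted(set(old) & set(new))
    if not overlap:
        raise ValueError("the series have no year in common")
    return {y: 100.0 * (new[y] - old[y]) / old[y] for y in overlap}


def spliced(old, new):
    """Rescale the old series to the new level in the first shared year."""
    overlap = set(old) & set(new)
    if not overlap:
        raise ValueError("the series have no year in common")
    link_year = min(overlap)
    factor = new[link_year] / old[link_year]
    linked = {y: v * factor for y, v in old.items() if y < link_year}
    linked.update(new)
    return linked


gaps = break_size(old_series, new_series)
linked_series = spliced(old_series, new_series)

print({y: round(g, 1) for y, g in gaps.items()})
print({y: round(v, 1) for y, v in sorted(linked_series.items())})
```

Whether a gap of this size matters depends on the analysis; in any case the linking method should be stated in the paper.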
A data break can be caused by changes in administrative structure. For example, the reform of
local government structure in 1970, like the recent reform (2007), resulted in a significant
change in the number of municipalities and counties. This made statistical comparisons of data
collected before and after 1970 either very difficult or virtually impossible. Further, a data break
can be caused by a change in the definition of industries and branches.
OPEC,5 EU and EFTA have represented a different number of countries at different points in
time. This means that you should not only be aware of data breaks in the data issued by these
organizations, but also in the data issued by other organizations where similar changes may have
taken place.
In summary, it is very important that you are aware of significant changes occurring in a time
series due to breaks in the data. The student must closely read, and be familiar with, all
footnotes, notes, etc. that accompany data. Warnings about breaks in the data, term redefinition,
etc. will usually be found in footnotes, notes, etc.
5 Organization of the Petroleum Exporting Countries.
Thus, data should be collected only from primary statistical sources such as the national statistical
bureaus, OECD, ECB, etc. and not from a general search on the internet. Most of the
information and data found at different web-sites are not of a quality similar to that of the
before-mentioned sources and cannot generally be recommended for use in empirical papers,
with exceptions, of course. A problem with the electronic data sources is the limited amount of
information directly available when accessing the databases, compared to the printed
statistical material, and it is often necessary to search for more information about the data,
definitions, etc., e.g. on the OECD homepage (or SourceOECD), where a lot of reports etc. are
available along with huge amounts of data. Finally, be aware of the different use of 'comma' and
'period' as separators in the databases, where e.g. SourceOECD would list a number as
1,280.00, which would appear as 1280,00 (or 1.280,00) if Statistics Denmark were to report
such a number.
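When data from both kinds of sources are read into a program, the separators must be handled explicitly. The following is a minimal sketch in Python; the function name parse_number and its decimal_sep argument are illustrative and do not belong to any particular database interface.

```python
def parse_number(text, decimal_sep):
    """Convert a formatted number to a float.

    `decimal_sep` states which character marks the decimals in the source
    ('.' or ','); the other character is treated as a thousands separator.
    """
    if decimal_sep not in (".", ","):
        raise ValueError("decimal_sep must be '.' or ','")
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = text.strip().replace(" ", "").replace(thousands_sep, "")
    return float(cleaned.replace(decimal_sep, "."))


oecd_style = parse_number("1,280.00", decimal_sep=".")
danish_style = parse_number("1280,00", decimal_sep=",")
danish_style_grouped = parse_number("1.280,00", decimal_sep=",")

print(oecd_style, danish_style, danish_style_grouped)
```

The separator has to be supplied by the user because a string such as 1,280 is ambiguous on its own: it is one thousand two hundred and eighty in one convention and a little over one in the other.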
List of www-addresses
http://www.dst.dk/ (Statistics Denmark)
http://www.statistikbanken.dk/ (Databank at Statistics Denmark)
http://www.sfi.dk/ (National Institute of Social Research)
http://www.akf.dk/ (Institute of Local Government Studies - Denmark)
http://www.fm.dk/ (Ministry of Finance)
http://www.skm.dk/ (Ministry of Taxation)
http://www.sm.dk/ (Ministry of Social Affairs)
http://www.retsinfo.dk/ (Information on Danish Laws)
http://www.oecd.org/ (OECD)
http://www.ssb.no/ (Statistics Norway)
http://www.scb.se/ (Statistics Sweden)
http://www.ae-dk.dk/ (Economic Council of the Labor Movement)
http://www.di.dk/ (Danish Industry)
http://www.dors.dk/ (Economic Council, Denmark)
http://www.ecb.int/ (European Central Bank)
http://www.imf.org/ (IMF)
http://www.nationalbanken.dk/ (The central bank of Denmark)
http://www.undp.org/ (UNDP)
http://www.who.int/ (WHO)
http://www.doe.gov/ (The American Energy Administration)
http://www.iea.org/ (International Energy Agency (IEA))
http://www.iisd.ca/ (International Institute for Sustainable Development)
http://www.ipcc.ch/ (The UN Intergovernmental Panel on Climate Change (IPCC))
http://www.ens.dk/ (Danish Energy Agency)
http://www.da.dk/ (Danish Employers Confederation)
http://www.danmark.dk/ (Rules, transfer income and eligibility)
http://www.saf.se/ (Swedish Employers Confederation)
http://www.europa.int/comm/eurostat (Eurostat)
http://www.europa.eu.int/ (EU)
http://www.wto.org/ (WTO)
http://www.bis.org/ (BIS)
http://www.finansraadet.dk/ (Danish Bankers Association)
http://www.forsikringenshus.dk/ (Danish Insurance Association)
http://www.ftnet.dk/ (Danish Financial Supervisory Authority)
http://www.realkreditraadet.dk/ (The Association of Danish Mortgage Banks)
http://www.xcse.dk/ (Copenhagen Stock Exchange)
http://www.em.dk/ (Danish Ministry of Business and Industry)
http://www.fao.org/ (FAO)
http://www.fvm.dk/ (Danish Ministry of Food, Agriculture and Fisheries)
http://www.landbrug.dk/ (Links to Danish agricultural institutions etc.)
http://www.min.dk/ (Danish Ministry of the Environment)
http://www.fedstats.gov/ (Links to US federal agencies)
http://www.ks.dk/ (Danish Competition Authority)
http://www.worldbank.org/ (The World Bank)
http://www.worldwatch.org/ (Worldwatch Institute)
http://www.wri.org/ (World Resources Institute)
http://www.ilo.org/ (International Labour Organization)
fmwww.bc.edu/ec/data.html (Economic and Financial Data)
http://www.econlinks.com/ (Economics News and Data)
http://www.economagic.com/ (Economic Time Series)
5. Working with the material
Working with the material means producing tables and figures as well as calculating relevant
indexes and other data, etc. This chapter covers the techniques for achieving just that. The first
two sections treat the techniques for table and figure construction. The sections following apply
to how you use comparative and explanatory material as well as how to standardize means and
analyze time series data.
5.1. Table construction
The main purpose in using tables is to present statistical material in a clear and succinct form. A
text filled with a lot of numbers is difficult to wade through. By creating a clear presentation of
the data in a table, the text is no longer cluttered with numbers.
A table consists of data presented in a special frame containing all the necessary information
for understanding what the data stands for and which sources have been used. Such a frame for
a table is illustrated below.
The table number is used in the text to refer back to the table. You can use consecutive
numbering throughout or use consecutive numbering within each section, as is done here. The
title must precisely state what information is found in the table and will most likely consist of
three elements. The first element is the statistical unit being counted. The composite whole, also
called the population, makes up the sum of all the units in the data. The whole can be, for
example, the population of Denmark, and the corresponding unit would be one Dane. The title
would begin with "Number of persons in Denmark" or "Population of Denmark".
Table 5.1.1. Title

Heading for the row variables     Headings for the column variables
Row variables                     Data
Total
Notes, if any:
Footnotes, if any:
Sources:
The second element is often an identifying variable(s) associated with the units in the table.
Identifying variables associated with Danes could be age, gender, income, marital status, etc. If
the included variables are age and gender, then, as the second element, they appear in the title
as "by age and gender". Note that the categories representing the identifying variables (in the
case for gender, these categories are female and male) are not mentioned in the title. The third
element included in the title is the time period. As in Table 5.1.2 below, the chosen time period
is January 1, 1970 and 2006. The title in this table thus becomes "Number of persons in
Denmark, by gender and age, January 1, 1970 and 2006."
The categories representing the identifying variables appear in the column and row headings.
In this case, there are two variables plus the time period. With three dimensions, it becomes
necessary to assign two of the dimensions to either the column or the row heading. If only two
dimensions were included, then there would be no need to divide either the column or the row
heading. If four dimensions were included, then it would be necessary to divide up both the
column and row headings unless either the row or the column heading could be divided up into
three. The student is warned against using more than four dimensions in a table because this
would create only confusion, not the clarity that the table is supposed to achieve.
Table 5.1.2. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.

                         Female           Male            Total
                      1970    2006    1970    2006    1970    2006
                      ---------------- 1000 persons ----------------
0 - 19 years           743     648     780     682    1523    1330
20 - 39 years          667     699     689     713    1357    1412
40 - 59 years          593     754     575     768    1169    1523
60 years and over      472     640     388     523     859    1163
Total                 2474    2740    2432    2686    4907    5427

Source: www.statistikbanken.dk

Labels are used in both the column and row headings to ease the interpretation of the table.
Note that nothing is gained by writing "gender" above the headings "female and male" because
it is already clear that the feature is gender. Correspondingly, there is no gain in writing
"country" in the heading above if Denmark, Sweden, Norway, etc. appear, or "year" where
1970, 1971, etc. appear. In the selected example, however, it could be of value to include a
heading for the included age groups even if you could figure out what the variable is all about
just by reading the title. The reason is that it is not immediately clear what these groupings
represent.
Next, a table requires mention of the measure used in the data material. Is the measure being
made in millions of persons, thousands of units, GJ (billion Joules) or something else? In
addition, the measure can be placed in a number of places. If there is only one measure
represented in the table, it can be placed last in the title or just above where the data is located.
The latter is often preferred (see Table 5.1.2). If there are several measures, these must be placed
outside or inside the areas of the respective columns or rows. The measures must not be placed
in a footnote or a note where they can easily be overlooked.
Note in the first column of Table 5.1.2 that, when all the numbers are added up, they do not
match the total given below. This occurs because of the practice of rounding off the individual
numbers for presentation while the total is based on the sum of the numbers before they are
rounded off. As a result, there is often a mismatch between the sum of the numbers in a column
and the total given for that column.
Rounding off is practiced because it is not always necessary to include all the digits in the data
to give a reasonable presentation of the numbers. For example, it is rarely necessary to write the
population of Denmark with seven digits. In Table 5.1.2, only 4 digits are used and the measure
is in 1000 persons.
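The rounding mismatch is easy to reproduce. In the sketch below, the unrounded counts are hypothetical: they are chosen so that they round to the first column of Table 5.1.2, since the true unrounded figures are not given here.

```python
# Hypothetical unrounded counts (persons), chosen so that they round to the
# first column of Table 5.1.2 (female, 1970).
persons = {
    "0 - 19 years": 742_600,
    "20 - 39 years": 666_800,
    "40 - 59 years": 592_600,
    "60 years and over": 471_700,
}


def in_thousands(n):
    """Round a count of persons to the nearest 1000 persons."""
    return round(n / 1000)


rounded_rows = {group: in_thousands(n) for group, n in persons.items()}
sum_of_rounded = sum(rounded_rows.values())          # add up the rounded rows
rounded_total = in_thousands(sum(persons.values()))  # round the exact total

for group, value in rounded_rows.items():
    print(f"{group:18s} {value:5d}")
print(f"{'Sum of rounded rows':18s} {sum_of_rounded:5d}")
print(f"{'Total as reported':18s} {rounded_total:5d}")
```

The reported total is the rounded exact sum, so it can differ by one or two units from the sum of the rounded rows, as in the first column of the table.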
Note the use of lines in the table − the data are not placed in separate windows, but appear
instead as a body. There is room in the table for the sum of the columns; and notes and footnotes
are placed under the last line of the table, if there are any. Notes are used to provide any
supplementary information about the statistical unit being measured or the table in general;
footnotes are used to provide supplementary information about individual elements in the table.
For example, there might be a need to discuss the definition of a term or the technique used in
calculating some of the numbers in a footnote.
The table is made complete by a clear citation of the sources used. When a table is used in a
report, you need to identify the source for the information under that table with just enough
information that the reader can easily locate the full information for that source in the reference
list at the end of the report. The information and form requirements for the reference list are
discussed in Section 7. You can identify a source by the author and year of publication, for
example, Andersen, et al. (2006), if one has used The Danish Economy as the source. If the
author appears several times in the reference list for the same year, one should provide a suffix
to the year, such as a, b, etc. (for example, 2006a). This suffix must also be used in the report's
reference list. In Table 5.1.2, the source is a statistical database. Including the name of the
variable (www.statistikbanken.dk/BEF1A) will make it easier to find the data.
Diverse organizations, both public and private, often issue reports. When these reports are
used as the source for a table, citation of the source should contain either the title and year or
number of publication (for example, World Development Report, 2006 which is issued by the
World Bank) or the name of the organization and year (for example, The World Bank (2006)).
The citation of the source in the table is made complete by a page or table number. A
reference to the pages used makes it easier for the opponent, or other interested readers, to
reproduce the material. It is a general requirement for technical reports that the included
material can be reproduced. Without proper reference to pages or tables, it can be especially
difficult to locate the actual material when books are used as sources.
The rules discussed for citing sources for information used in tables also apply for material
that is used in the text. References to sources for material used in the text are placed either in the
text (for example, Andersen (2006), p. 25), or as a footnote located at the end of the page. The
latter is often preferred in that a series of references to sources in the text can make the text
cumbersome.
It may be necessary to include numbers/data in the text when the amount to be used is too
small to establish a table. For example, one would hardly construct a table to present two or
three numbers. These numbers can be mentioned in the text without making the text
cumbersome. Footnotes can also be used to annotate the text or to make side comments that are
not central to the subject, but which can be interesting to the reader anyway.
In Table 5.1.2, only the actual numbers are included. It can be seen that the population
increased from 1970 to 2006 and that this increase took place only in the age groups 20 years
and over. But the structure in the material becomes clearer if the numbers are presented as
percentages, as in Table 5.1.3.
Table 5.1.3. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.
                        Female          Male           Total
                      1970   2006    1970   2006    1970   2006
                      -------------------- % --------------------
0 - 19 years            30     24      32     25      31     25
20 - 39 years           27     25      28     31      28     26
40 - 59 years           24     28      24     26      24     28
60 years and over       19     23      16     18      17     21
Total                  100    100     100    100     100    100
1000 persons          2474   2704    2432   2686    4906   5427
Source: As in Table 5.1.2.
In Table 5.1.3, the proportion of the population in each age group is shown by gender for 1970
and 2006. The actual numbers are included in the last row, distributed by gender and year,
making it possible for the reader to calculate back to the original numbers found in Table
5.1.2. The table also reveals the development in the population for each gender and age group.
If the focus is to be on growth in the population for each gender and age group, the numbers
should be presented as in Table 5.1.4. In this table, the change in the population for each gender
and age group can be seen directly by comparing the index numbers in 2006 with the index
number for each group in 1970, which is 100. Instead of index numbers, changes in per cent could
have been used. In this case, in the location "0-19 years / 2006 / female", −13% would replace the
index number 87. The most useful form of presentation depends on the issue being investigated.
But it should be emphasized that in most cases, you will have to rework the data
you obtain in order to present them in the form you want.
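The two transformations discussed above, per cent distributions and index numbers, can be sketched in a few lines of Python. This is only an illustration: the function names are assumptions made here, the 1970 counts for women are the ones reported in Table 5.1.4, and the totals are the "1000 persons" figures from Table 5.1.3.

```python
def percentage_shares(counts):
    """Return each count as a per cent of the column total."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def index_number(value, base_value):
    """Index with the base period set to 100."""
    return 100 * value / base_value


def percent_change_from_index(index):
    """An index of 87 (base 100) corresponds to a change of -13 per cent."""
    return index - 100


# Women in 1970 by age group, 1000 persons (Table 5.1.4)
female_1970 = [743, 667, 593, 472]
shares = [round(s) for s in percentage_shares(female_1970)]

# Total population, 1000 persons, 1970 and 2006 (Table 5.1.3)
total_index = round(index_number(5427, 4906))

print(shares)                          # per cent distribution by age group
print(total_index)                     # index for 2006, 1970 = 100
print(percent_change_from_index(87))   # change in per cent
```

Whether shares or index numbers are the better choice depends, as the text says, on whether structure or growth is the issue being investigated.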
Table 5.1.4. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.
                              1970                      2006
                      Female   Male   Total     Female   Male   Total
                      -------- 1000 --------    ----- 1970 = 100 -----
0 - 19 years             743    780    1523         87     87      87
20 - 39 years            667    689    1356        105    103     104
40 - 59 years            593    576    1169        127    133     130
60 years and over        472    388     859        136    135     135
Total                   2474   2432    4907        111    110     111
Source: As in Table 5.1.2.
In summary, a good table fulfils two requirements. First, it meets the technical criteria just
described. Second, it presents the material in such a way that the relevant patterns become
obvious. This second point concerns not only the calculation of per cent values and/or index
numbers, but refers also to an optimal degree of detail. In the previous tables, the material was
divided into four age groups. If you were interested in the dependence of public sector expenditures
on shifts in the age distribution of the population, another division of age groups might be better.
However, you should be careful not to use too much detail; relevant patterns can drown in detail.
A good table is clear.
5.2. Figure construction
Above all, it should be pointed out that a good figure, like a good table, presents the material
clearly. The strength of a figure is that the pattern in the material appears more obvious than in
a table, in many cases. This applies especially if one is working with a data series extending
over a long period of time. If one wants to compare several data series that only cover 10 years,
for example, a figure will also provide greater clarity than a table, as a rule.
The next section presents the technical criteria for a figure, while the following sections
present various figure configurations.
5.2.1. Technical criteria
In an empirical paper, the curve is the most popular type of figure used for picturing one or
several sets of data. That is why the curve will be used as the example for describing the
technical criteria associated with figures.
The figure must have a title and source identification analogous to that used for tables. The
title can, of course, be placed in other locations, but the placement on the top of the figure is
most often used. Any notes to be added are usually placed under the axis of the abscissa (x-
axis).
Axis labels must be included unless what is being measured on the axes is obvious. For
example, one does not need to include the word "year" when 1980, 1981, etc. appears on the
axis (this is most often the x-axis). Axis labels often contain a scale, for example, 1000 persons
or millions of $. Also, too many zeros on the axes are cumbersome. Instead of
writing 100,000, 200,000, etc. on the axis, it is better to use 100, 200, etc. and include 1000 in
the axis label. The scale can also be placed in the title.
Labels to the curves indicate what the individual curves represent (here A and B) and can be
placed in several places. The placement at the end of the respective curves is preferred in most
cases. In some cases, the curves run nearly together at the end, and so a placement at the end of
the curves can be problematic. You can, instead, place a symbol on the curves at a place where
the curves are clearly separate from each other. Labels to the curves can also be placed under the
x-axis. However, this placement reduces the clarity a bit, especially if the figure includes several
curves. One needs to remember the symbols for these representations to be able to read the
figure.
Figure 5.2.1 consists of two figures showing the development in the number of persons
employed in Denmark, where the ordinate axis (y-axis) begins at zero in the upper figure and at
2300 in the lower figure. The curves clearly look very different in the two cases. You
might be inclined, therefore, to comment on the two curves differently. In the upper figure, you
might note the smaller swings in employment, but would focus more on the rising trend, while
in the lower figure, you might be caught by the swings in the curve, for example, the increasing
employment from 1983 to 1987 as well as from 1993 to 2001. Which presentation is best
depends on the issue one is investigating. Normally, it is best to include the point of origin so as
not to exaggerate the swings in the data. But in certain cases, even small swings can be
essential for the issue under investigation. If this is the case, these swings should of course be
brought out by ignoring the point of origin.
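How much a truncated axis magnifies a swing can be checked with simple arithmetic. The sketch below is an illustration only: the function name is an assumption, the swing of 2500 to 2700 (1000 persons) is a hypothetical value, and the axis ranges 0 to 3000 and 2300 to 2900 are the ones used in the two parts of Figure 5.2.1.

```python
def share_of_axis(low, high, axis_min, axis_max):
    """Fraction of the plotted axis height taken up by a swing from low to high."""
    return (high - low) / (axis_max - axis_min)


# Hypothetical swing in employment, 1000 persons
swing_low, swing_high = 2500, 2700

from_zero = share_of_axis(swing_low, swing_high, 0, 3000)      # axis starts at zero
truncated = share_of_axis(swing_low, swing_high, 2300, 2900)   # axis starts at 2300

# How many times larger the same swing looks on the truncated axis
exaggeration = truncated / from_zero

print(from_zero, truncated, exaggeration)
```

The same swing fills about one fifteenth of the axis when the origin is included and one third when it is not, which is why the two parts of the figure invite different comments.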
Figure 5.2.1. Number of persons employed in Denmark, 1966-2006, 1000 persons.
[Figure: two line charts, x-axis 1966-2006. Upper panel: y-axis from 0 to 3000. Lower panel: y-axis from 2300 to 2900.]
Source: www.statistikbanken.dk/NAT18.
A figure must not be overcrowded, cf. the demand for clarity. In Figure 5.2.2, the world's oil
reserves are indicated by country or country group in a bar diagram. But this representation
makes it difficult to get an overview of the material. And the overview is not much better in
Figure 5.2.3, where the bars from Figure 5.2.2 for the countries for each year are stacked on top of
one another. This means that the development associated with those countries in the middle of the
columns is difficult to read.
Figure 5.2.2. World oil reserves, by country or country group, 1982-2005, end of year.
[Figure: grouped bar diagram, x-axis 1982-2004, y-axis in % from 0 to 35. Series: Saudi A., Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC, Other OPEC countries.]
Source: www.opec.org – Annual Statistical Bulletin.
Figure 5.2.3. World oil reserves, by country or country group, 1982-2005, end of year.
[Figure: stacked bar diagram, x-axis 1982-2004, y-axis from 0% to 100%. Series: Saudi A., Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC, Other OPEC countries.]
Source: As in Figure 5.2.2.
Figure 5.2.4 shows the development in labour productivity as measured in eight sectors. The
figure appears overloaded. It does not help that the rather complicated key to the curves has to be
placed under the x-axis because of space limitations. This placement makes the figure even less
clear.
Figure 5.2.4. Labour productivity (GDP in 2000 basic prices per worker) for the main sectors of the economy, 1966-2006.
[Figure: line chart, x-axis 1966-2006, y-axis in 1000 DKK per worker from 0 to 900. Series: All sectors; Agriculture, fishing and quarrying; Manufacturing; Construction; Trade, hotels, restaurants; Transport, storage and communication; Financial intermediation, business activities; Public and personal services.]
Source: www.statistikbanken.dk/NATo7 and NAT18.
5.2.2. Logarithmic scales
In the previous sections, examples of curves were used to illustrate the development of one or
several data series (time series), where individual observations were connected into straight
lines.6 As mentioned earlier, curves are the figure used most often. Curves drawn with a
logarithmic scale on one axis (also called semi-logarithmic) are discussed in this sub-section.
The use of the logarithmic scale implies that equal linear distances on the axis correspond to
equal percentage changes on a normal scale. This implies that curves with the same slope have
the same growth rate, as measured in per cent.
Figure 5.2.5 illustrates three cases of varying slopes, where each case represents two curves
drawn with the same orientation. In all three cases, the curves run parallel, which means that the
two time series depicted by the curves have the same percentage growth rate. In the first case,
the curves are straight lines, have constant slopes, and have, therefore, constant growth rates.
This is, in other words, an exponential function growing by a constant per cent from period to
period. In many cases, it can be expedient to calculate the growth rates and include them in the
accompanying discussion.7 Seen over the long run, many time series follow approximately an
exponential course, which can be depicted on a semi-logarithmic scale.
Figure 5.2.5. Logarithmic scale and growth rates.
[Figure: three panels, each with a logarithmic y-axis (Log) and time t on the x-axis, each showing two parallel curves. Case 1: growth rates identical and constant. Case 2: growth rates identical and increasing. Case 3: growth rates identical and decreasing.]
In the second case, the increasing slopes of the curves mean increasing growth rates; and in the
third case, the growth rates fall with time. It is apparent that the logarithmic depiction is
especially useful when you wish to compare the growth rates of different time series.
A second advantage with semi-logarithmic scales is illustrated in Figure 5.2.6. In the upper
figure, a normal scale for GNI per capita is used; in the lower figure, a semi-logarithmic scale
for GNI per capita is used. In the upper figure, selected countries are labelled using text boxes.
In many cases, the location of countries will be of interest. A trendline, the equation of the
trendline, and the corresponding R² value are displayed on the chart.
6 Data can be represented either as flow variables or as stock variables. Flow variables concern a period of time, for example, the number of births during 2006, while stock variables concern a value as of a certain point in time, e.g., population January 1, 2006. Actually, when plotting the point for the number of births in 2006 in a figure, the value for 2006 ought to be identified with the middle of the year, i.e., July 1st, but this is not the custom. More usually, the number of births for all of 2006 is indicated on the x-axis. Corresponding indications are used for stock variables in that one as a rule includes the correct date in the title. You could write "Population of Denmark, January 1, 2006" and plot the population value on the x-axis.
7 $y_t = y_0 (1+r)^t$ indicates that $y$ grows exponentially with the growth rate $r$. $y_t$ is the value at time $t$, $y_0$ is the value at time 0, and $t$ is the number of time periods between 0 and $t$. When $y_t$, $y_0$ and $t$ are known, $r$ can of course be determined from the equation by isolating $r$. It is, however, more normal to use a PC or calculator, where $r$ is found simply by keying in the three known values. The known rules for logarithms transform the previous equation to $\log y_t = \log y_0 + t \log(1+r)$, the equation for the straight line represented in the first case, where $\log y_0$ is the intercept on the y-axis and $\log(1+r)$ is the slope.
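As a sketch of the growth relation given in footnote 7, the average growth rate r can be isolated as (y_t / y_0)^(1/t) − 1, and the slope of the corresponding straight line on a logarithmic scale is log(1 + r). The function names below are assumptions made for this illustration; the population totals 4906 and 5427 (1000 persons) for 1970 and 2006 are taken from Table 5.1.3.

```python
import math


def average_growth_rate(y_0, y_t, periods):
    """Solve y_t = y_0 * (1 + r) ** periods for r."""
    return (y_t / y_0) ** (1 / periods) - 1


def log_scale_slope(r):
    """Slope per period of the straight line on a base-10 logarithmic scale."""
    return math.log10(1 + r)


# Total population of Denmark, 1000 persons, 1970 and 2006 (Table 5.1.3)
r = average_growth_rate(4906, 5427, 2006 - 1970)
print(f"Average annual growth: {100 * r:.2f} per cent")
print(f"Slope on log10 scale:  {log_scale_slope(r):.5f}")

# A series growing from 100 to 121 in two periods grows by 10 per cent per period
simple_case = average_growth_rate(100, 121, 2)
```

Two series with the same value of r have the same slope on the logarithmic scale, which is the property illustrated by the parallel curves in Figure 5.2.5.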
Figure 5.2.6. Total fertility rate as related to GNP per capita world-wide, 2004.
[Figure: two scatter diagrams with "Fertility rate, total" on the y-axis (0 to 7). Upper panel: x-axis "GNI per capita, PPP" on a normal scale from 0 to 60000; labelled countries: Saudi Arabia, Kuwait, Israel, Luxembourg, Hong Kong (China), Denmark, Russia, China; R² = 0.4641. Lower panel: x-axis "GNI $ (PPP) per capita (log scale)" from 100 to 100000; trendline y = -0.8131 ln(x) + 9.6942, R² = 0.4641.]
Source: The World Bank: World Development Indicators.
Many countries have a very small per capita income compared with that of western
industrialized countries. In a plot of observations of per capita income using a normal scale,
those for the poorer countries will lie in a large clump close to the y-axis. If the plot is instead
made using a logarithmic scale, the clump will dissolve, and the material will stand out more
distinctly.
Therefore, the semi-logarithmic scale is often better than a normal scale when comparing
numbers that differ by orders of magnitude. This advantage is illustrated even more clearly in
Figure 5.2.7. The curve of employment in the electricity, gas, and water sector is nearly one with
the x-axis in the upper figure, while the corresponding curve is clearly represented in the lower
figure and lies distinctly separate from the x-axis.
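The effect can be illustrated by computing where a value lands on a normal axis and on a logarithmic axis. In the sketch below, the function names are assumptions and the two employment values (15 and 2700, in 1000 persons) are hypothetical, chosen only to differ by orders of magnitude; the axis ranges 0 to 3000 and 1 to 10000 are those of Figure 5.2.7.

```python
import math


def linear_position(value, axis_min, axis_max):
    """Relative height (0 to 1) of a value on a normal scale."""
    return (value - axis_min) / (axis_max - axis_min)


def log_position(value, axis_min, axis_max):
    """Relative height (0 to 1) of a value on a logarithmic scale."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


# Hypothetical employment figures, 1000 persons
small_sector, whole_economy = 15, 2700

small_linear = linear_position(small_sector, 0, 3000)   # hugs the x-axis
small_log = log_position(small_sector, 1, 10000)        # clearly above the x-axis

# Equal ratios give equal distances on the logarithmic scale
step_a = log_position(200, 1, 10000) - log_position(100, 1, 10000)
step_b = log_position(2000, 1, 10000) - log_position(1000, 1, 10000)

print(small_linear, small_log, step_a, step_b)
```

The price of this compression is that a change of, say, 10 per cent in the large series moves its curve only slightly, so small but important swings become hard to see on the logarithmic axis.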
The lower part of Figure 5.2.7 gives the impression, in addition, that total employment was
constant after 1966. However, this constant level of employment does not appear in the upper
figure. This illustrates the weakness of the semi-logarithmic scale. It is often not useful for
illustrating smaller changes that can be important in the investigation of certain issues.
Figure 5.2.7. Employment in the electricity, gas, and water sector and in Denmark in general, 1966-2006.
[Figure: two line charts, x-axis 1966-2006, series "Total" and "Electricity, gas and water". Upper panel: y-axis in 1000 persons from 0 to 3000. Lower panel: y-axis in 1000 persons (log scale) from 1 to 10000.]
Source: As in Figure 5.2.4.
5.2.3. Bar and pie diagrams
In bar diagrams, data is represented by a column or part of a column. There are two types of bar
diagrams. One type uses qualitative variables and has no scale. Municipalities, gender,
countries, etc., are qualitative variables, where specific values for gender are male/female,
specific values for countries are Saudi Arabia, Iran, etc. Normally, a ratio (e.g., male/female) is
measured along the y-axis and the qualitative variable is measured along the x-axis. Because
there is no scale along the x-axis, the bars can be placed freely, and one normally leaves room
between the bars to create clarity. The bars can also be placed as extensions of each other or
next to each other.
The other type of bar diagram is called a histogram and is used for illustrating quantitative
variables. Quantitative variables are age, income, length of marriage, company profits, size of
farms, etc. The quantitative variable is divided up into intervals placed on the x-axis only after
the scale has been determined. Next, the unit intervals must be chosen. This can be seen in the
following example.
Table 5.2.1. Number of divorces, by length of marriage, 2005.
Length of marriage      Number of divorces
Under 1 year                    169
1 year                          568
2 years                         872
3 years                        1088
4 years                        1277
5 years                        1107
6-7 years                      1763
8-9 years                      1416
10-14 years                    2816
15-19 years                    1832
20-24 years                    1008
25 years and over              1383
Total                         15299
1) Excluding 1 case for which duration of marriage was not given.
Source: www.statistikbanken.dk/SKI107.
In Table 5.2.1, the intervals have different widths. This is not accounted for in the upper part of
Figure 5.2.8 which gives the impression that a large number of marriages ended in divorce after
10-14 years of marriage.8 This is not the case. The problem is that the width of that interval is not
consistent with the unit interval (which is one year) and therefore overrepresents the importance
of divorces for the time period. By plotting instead the number of divorces per year of marriage
as in the lower part of Figure 5.2.8, the correct picture of the relationship between the two
variables will be made. It can be seen that the greatest number of divorces occurs after four years
of marriage.9
8 The interval measures marriages that have lasted 14.999 years; that is, the interval measures up to, but does not include, the 15th year.
Figure 5.2.8. Number of divorces, by length of marriage, 2005.
[Figure: two histograms, x-axis "Duration of marriage, years" from 0 to 30. Upper panel: y-axis "Number of divorces" from 0 to 3000. Lower panel: y-axis "No. of divorces pr. year of marriage" from 0 to 1400, with a separate rectangle labelled "25 years and over".]
Source: As in Table 5.2.1.
9 This conclusion should be taken with reservation because it is not clear how many marriages make up the basis for the divorce data. In other words, one must know the divorce rate for the individual years of the marriage's duration to be able to determine how many years of marriage are associated with the breaking up of the greatest number of marriages.
The frequency in a histogram is determined by the area of the bar; the number of divorces in the
interval 10-14 years is the column height (563.2) multiplied by the ratio of the interval width to
the unit interval (5/1) which is equal to 2816. The column height can be calculated as the
frequency divided by the number of times the interval width is larger than the unit interval.10
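The rule for column heights can be checked against the figures in Table 5.2.1. The sketch below is an illustration under the assumption of a unit interval of one year; the function and variable names are chosen here, and the open interval "25 years and over" is left out because its width is not defined.

```python
def column_height(frequency, interval_width, unit_interval=1):
    """Histogram column height: frequency per unit interval."""
    return frequency / (interval_width / unit_interval)


# (label, interval width in years, number of divorces), from Table 5.2.1
divorces_2005 = [
    ("Under 1 year", 1, 169),
    ("1 year", 1, 568),
    ("2 years", 1, 872),
    ("3 years", 1, 1088),
    ("4 years", 1, 1277),
    ("5 years", 1, 1107),
    ("6-7 years", 2, 1763),
    ("8-9 years", 2, 1416),
    ("10-14 years", 5, 2816),
    ("15-19 years", 5, 1832),
    ("20-24 years", 5, 1008),
]

heights = {label: column_height(freq, width) for label, width, freq in divorces_2005}
tallest = max(heights, key=heights.get)

for label, height in heights.items():
    print(f"{label:<14}{height:>8.1f}")
print("Tallest column:", tallest)
```

The raw frequencies point to the interval 10-14 years, while the heights per year of marriage point to four years, which is the difference between the upper and lower parts of Figure 5.2.8.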
It is clear that changing the widths of intervals should not change a figure completely when
wishing to illustrate the relationship between two variables. The figure should look like what one
would have if all the information about the number of divorces occurring for each length of
marriage were available. This ideal is not entirely fulfilled when the distribution is skewed.
Figure 5.2.8 is skewed to the right (the tail is on the right). It must, then, be assumed that there
are more observations in the first half of an interval (after the peak) than in the last half of an
interval.
The frequency for the interval at 25 years and over is expressed as a line in the upper part of
Figure 5.2.8 without closure to indicate that the interval is open. One could choose to close the
interval at, for example, 50 years and then calculate the height of the column by dividing the
number of divorces by 25. The open interval can also be represented as a rectangle placed
appropriately in the figure, as in the lower part of Figure 5.2.8, where a rectangle is drawn in
corresponding to 1383 divorces. This area can be immediately compared with the other areas in
the figure.
Another example is the number of tax-paying persons, arranged according to size of taxable
income, as in Table 5.2.2. Income is a quantitative variable, so the frequency will also be plotted
by unit interval, chosen to be 25,000 DKK in Figure 5.2.9.
10 The various interval widths make it a little difficult to use a graphics programme. One can use a scatter diagram which represents the relationship between two variables with points, cf. Section 5.2.4. The points are used to draw the columns and the points are erased after the columns are drawn in.
Table 5.2.2. Number of tax-paying persons, by size of taxable income, 2005.
Income, DKK          1000      Total income,  No. of      No. of persons,  Income,   Area,
                     persons   mill. DKK      persons, %  accumulated %    acc., %   0.5 X_i (Y_i + (Y_i + G_i))
< 25,000               312         1,975         7.1           7.1           0.2         0.71
25,000 - 49,999        125         4,625         2.9          10.0           0.8         1.45
50,000 - 74,999        177        11,150         4.0          14.1           2.2         6.00
75,000 - 99,999        355        31,463         8.1          22.2           6.1        33.62
100,000 - 124,999      502        57,243        11.5          33.7          13.2       110.98
125,000 - 149,999      495        67,801        11.3          45.0          21.7       197.19
150,000 - 174,999      426        69,008         9.8          54.8          30.3       254.80
175,000 - 199,999      381        71,500         8.7          63.5          39.1       301.89
200,000 - 224,999      365        77,477         8.4          71.8          48.8       369.18
225,000 - 249,999      309        73,302         7.1          78.9          57.9       378.79
250,000 - 299,999      414       112,722         9.5          88.4          71.9       616.55
300,000 - 349,999      202        64,966         4.6          93.0          80.0       349.37
350,000 - 399,999      107        39,904         2.5          95.4          85.0       206.25
400,000 and over       199       120,968         4.6         100.0         100.0       425.50
Total                 4369       804,103       100.0                                  3252.26
Source: Statistikbanken.dk/IF13 and IF23.
There are a number of persons whose taxable income equals zero. This is without doubt the most
typical income if the material is divided into very small income intervals. Here, an
open interval is used for income levels under 25,000 DKK.
The interval 250,000 to 299,999 DKK is two times the unit interval. This means that the
column height in Figure 5.2.9 is only 207, rather than 414, as is shown in the table. The figure
shows a sharp fall in the column height after 250,000 DKK. There is hardly any doubt that there are
more taxpayers between 250,000 and 274,999 than between 275,000 and 299,999, so the figure
is drawn somewhat incorrectly, cf. the discussion of the examples regarding length of marriage
and divorce. Therefore, material should be reported with as small an interval width as possible. If
the same interval width is used throughout, it implies that the unit interval is equal to the width
of the interval. There is no need to list the unit interval on the axis label in this case. The
remaining data in the table will be used in section 5.2.5.
Figure 5.2.9. Number of tax-paying persons, by size of taxable income, 2005.
[Figure: histogram, x-axis "Taxable income in 1000 DKK" from 0 to 450, y-axis "1000 persons, unit interval 25,000 DKK" from 0 to 600.]
Source: As in Table 5.2.2.
Population pyramids are a special form of bar diagrams, where the frequency is placed on the x-
axis instead of on the y-axis. Circle diagrams are used for illustrating per cent distributions.
Figure 5.2.10 uses a circle diagram to show the distribution of global oil reserves among the
eight countries or country groups used in Figures 5.2.2 and 5.2.3.
For clarity's sake, it is recommended that labels for the individual sections of the circle be
written in proximity to the respective areas, as in the upper part of the figure, instead of written
elsewhere, as in the lower part of the figure. In addition, you are warned against using too much
detail in the circle diagram and against using too many of them. Using 10 or more circle
diagrams in a paper to illustrate the development in global oil reserves since 1960 is clearly not
sensible. One table or two curve diagrams (four countries/country groups in each diagram)
would be far preferable.
Figure 5.2.10. World oil reserves, by country, 2005, in per cent.
[Figure: two circle diagrams with the sections Saudi Arabia, Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC and Other OPEC countries. Upper diagram: labels placed next to the sections. Lower diagram: labels listed separately from the sections.]
Source: www.opec.org – Annual Statistical Bulletin.
5.2.4. Scatter diagrams
As mentioned earlier in the section on formulating the statement of the problem, one normally
includes explanatory material in a paper. This is often done by comparing the base material (the
explained variable) and the explanatory variable in a curve, where time is on the x-axis. One
looks for a pattern in the material that indicates a relationship between the two variables.
In many cases, this pattern can be illustrated using a plot of the values (most usually values
from the same year) of the explained variable and the explanatory variable in a so-called scatter
diagram.
The scatter diagram is used in Figure 5.2.6 with GNI per capita as the explanatory variable
and fertility as the explained variable. Of course, a lot of variables are correlated with income.
5.2.5. Lorenz curves
GNI per capita differs significantly among the countries of the world. This variation or
skewness in global income distribution can be illustrated using a histogram, where GNI per
capita is divided up into intervals, and the number of countries falling within the individual
intervals is, then, the value that determines the size of the columns. One could choose to let the
value for each country be determined by population size and assume that all persons in a given
country have an income corresponding to the respective country's GNI per capita. In this case,
China, with a population approximately 225 times that of Denmark, would be represented by a
column that is 225 times larger than that of Denmark.
The skewness in a distribution can also be illustrated in a Lorenz curve, as in Figure 5.2.11.
The Lorenz curve is constructed in the following way: First, the countries are arranged
according to size of GNI per capita. Next the countries' per cent share of world GNI and world
population are calculated, respectively, and, after that, accumulated. Finally, the two
accumulated per cent shares are plotted in a scatter diagram. From this, one can read how large a
share of the world's GNI accrues altogether to the poorest 50% of world population. The dotted
line from 50% on the x-axis to the curve indicates on the y-axis that the poorest 50% receive
approximately 5% of the world's GNI. One can also read from the curve that the poorest 80% of
the population receive about 16% of the world's GNI. This means that the distribution of the
world's GNI is very much skewed. Note that no consideration is taken for the spread of income
within the individual countries.
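The construction steps described above (sort by GNI per capita, compute shares, accumulate, plot) can be sketched as follows. The three "countries" are hypothetical and serve only to show the mechanics; the function name is an assumption, and every person in a country is assigned that country's GNI per capita, as in the text.

```python
from itertools import accumulate


def lorenz_points(units):
    """Return accumulated (% of population, % of income) points.

    units: list of (population, income_per_capita) pairs in any order.
    """
    ordered = sorted(units, key=lambda unit: unit[1])       # poorest first
    populations = [pop for pop, _ in ordered]
    incomes = [pop * per_capita for pop, per_capita in ordered]
    total_population = sum(populations)
    total_income = sum(incomes)
    cum_population = [100 * c / total_population for c in accumulate(populations)]
    cum_income = [100 * c / total_income for c in accumulate(incomes)]
    return [(0.0, 0.0)] + list(zip(cum_population, cum_income))


# Hypothetical data: (population in millions, GNI per capita in $)
countries = [(20, 30000), (50, 1000), (30, 5000)]

points = lorenz_points(countries)
for population_share, income_share in points:
    print(f"{population_share:6.1f} % of population -> {income_share:6.2f} % of GNI")
```

In this made-up example the poorest 50% of the population receive 6.25% of total income, a reading of the same kind as the one made from the dotted line in Figure 5.2.11.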
Figure 5.2.11. GNI, by world population, 2004, Lorenz Curve.
[Figure: Lorenz curve, x-axis "% of population, accumulated" from 0 to 100, y-axis "% of GNI, accumulated" from 0 to 100, with reference lines labelled "Totally even distribution" and "Totally uneven distribution".]
Source: The World Bank: World Development Indicators.
A totally even income distribution means that the per cent share of GNI and population are the
same over the entire curve, as illustrated by the straight line (totally even distribution) in Figure
5.2.11. A totally uneven income distribution means that the person with the highest income has,
in fact, all the income. This distribution is represented by a horizontal curve following the x-axis
up to 100% and the vertical line from 100% to the top of the diagram. The closer the Lorenz
curve lies to the line illustrating totally even distribution, the more equal the distribution. By
drawing Lorenz curves for several years of data, one can see at a glance whether the
direction of change has been toward greater global equality or the opposite.
Table 5.2.2 showed the distribution of taxable income for taxpayers and Figure 5.2.9 showed
a histogram of this distribution. The data can also be represented in a Lorenz curve, as in Figure
5.2.12.
Figure 5.2.12. Taxable income, by size of taxable income, 2005, Lorenz Curve.
[Figure: Lorenz curve, x-axis "% of taxpayers, acc." from 0 to 100, y-axis "% of taxable income, acc." from 0 to 100, with the corner points marked A, B and C.]
Source: As in Table 5.2.2.
It is apparent that the distribution of data in Figure 5.2.12 is not nearly as skewed as the
distribution shown in Figure 5.2.11. The 80% of taxpayers with the lowest income received
approximately 60% of taxable income. In Figure 5.2.11, the corresponding amount was
approximately 16%.
It seems as if world income distribution is more skewed than income distribution in Denmark.
Such a conclusion should be treated with caution given that different definitions of income
have most likely been used in calculating income distribution. In addition, the different income
definitions each have weaknesses that affect a true calculation of income distribution. A
substantial criticism can, in any case, be made against using GNI as a measure of prosperity, and
taxable income can be reduced when deductions are taken into account. The value of taxable
income does not take into consideration that income varies over lifetimes. Two persons who
have the same lifetime income will most likely receive very different taxable incomes for a
given year.
The skewness in the distribution can also be illustrated using the Gini coefficient, which
measures the ratio of the area bounded by the line AB (totally even distribution) and the Lorenz
curve to the area of the triangle ABC = 0.5 * 100 * 100 = 5000. The larger the Gini
coefficient, the larger the skewness in the distribution. It is apparent that an equal income
distribution results in a Gini coefficient of 0 and a totally unequal income distribution results in a
coefficient of 1. The Gini coefficient for the curve in Figure 5.2.12 is calculated to be 0.35.
Calculation of the Gini coefficient.
[Diagram: the area under the Lorenz curve for income interval i, with width $X_i$, height $Y_i$ at the start of the interval and increase in height $G_i$ over the interval.]
Start by calculating the area between the Lorenz curve and ACB (totally uneven distribution).
Area of income interval $i$: $Y_i X_i + 0.5\,X_i G_i = 0.5\,X_i\,(Y_i + (Y_i + G_i))$.
Income interval 75,000 - 99,999 DKK: $0.5 \cdot 8.1 \cdot (2.2 + 6.1) = 33.62$, cf. Table 5.2.2.
Total area is then the sum of the areas for all the income intervals = 3252.26, cf. Table 5.2.2.
The area bounded by AB and the Lorenz curve = 5000 - 3252.26 = 1747.74.
Gini coefficient = 1747.74/5000 = 0.35.
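The calculation can be reproduced from the columns "No. of persons, %" and "Income, acc., %" in Table 5.2.2. The sketch below follows the trapezoid rule described in the box; the function and variable names are assumptions made for this illustration.

```python
# Width of each income interval: No. of persons, % (Table 5.2.2)
person_shares = [7.1, 2.9, 4.0, 8.1, 11.5, 11.3, 9.8, 8.7, 8.4, 7.1, 9.5, 4.6, 2.5, 4.6]

# Height of the Lorenz curve at the end of each interval: Income, acc., % (Table 5.2.2)
income_accumulated = [0.2, 0.8, 2.2, 6.1, 13.2, 21.7, 30.3, 39.1, 48.8, 57.9, 71.9, 80.0, 85.0, 100.0]


def area_under_lorenz(widths, heights_at_end):
    """Sum of the trapezoids 0.5 * X_i * (Y_i + (Y_i + G_i)) under the Lorenz curve."""
    area = 0.0
    height_at_start = 0.0                  # the curve starts at the origin
    for width, height_at_end in zip(widths, heights_at_end):
        area += 0.5 * width * (height_at_start + height_at_end)
        height_at_start = height_at_end
    return area


TRIANGLE_ABC = 0.5 * 100 * 100             # area under the line of totally even distribution

area = area_under_lorenz(person_shares, income_accumulated)
gini = (TRIANGLE_ABC - area) / TRIANGLE_ABC

print(f"Area under the Lorenz curve: {area:.2f}")
print(f"Gini coefficient:            {gini:.2f}")
```

Because the curve is approximated by straight lines between the interval limits, the result depends somewhat on how finely the income intervals are divided.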
Deciles and quartiles are often used in income and wealth statistics. As with the construction
of Lorenz curves, taxpayers are arranged according to size of income. The first decile (10%
fractile) is the value of income below which 10% of the observations lie. The second decile
(20% fractile) is the income below which 20% of the observations lie. The first quartile
corresponds to the 25% fractile and the upper quartile corresponds to the 75% fractile. The 50%
fractile is also called the median. The median is equal to the taxable income of the person who
is located just in the middle of the distribution when the taxpayers are arranged according to size
of taxable income.
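Deciles, quartiles and the median can be computed with Python's standard `statistics` module, as in the sketch below. The incomes are hypothetical (99 taxpayers with incomes of 1, 2, ..., 99 thousand DKK). Note that `statistics.quantiles` interpolates between observations, and that software packages differ slightly in how they do this, so results for small samples may not agree exactly across programmes.

```python
import statistics

# Hypothetical taxable incomes in 1000 DKK, one value per taxpayer
incomes = list(range(1, 100))

deciles = statistics.quantiles(incomes, n=10)    # nine cut points: 10%, 20%, ..., 90% fractiles
quartiles = statistics.quantiles(incomes, n=4)   # three cut points: 25%, 50%, 75% fractiles
median = statistics.median(incomes)              # the 50% fractile

print("First decile:  ", deciles[0])
print("Lower quartile:", quartiles[0])
print("Median:        ", median)
print("Upper quartile:", quartiles[2])
```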
5.3. Using comparative and explanatory material
In discussing the formulation of the statement of the problem, it was seen that it is of great value
to put the base material in a context using comparative material, in many cases. Comparisons
can be carried out by assembling the base and comparative data in the same table or figure and
by using an accompanying text to highlight the differences in the two.
In an analysis of the industry structure in the County of Ringkøbing, it would be natural to
compare that with the industry structure in Denmark in general. This comparison could be made
based on the per cent distribution of employment within each industry, as in Table 5.3.1. But the
use of a simple calculation can often be an advantage when comparing distributions. In Table
5.3.1, the coefficients in the last column are calculated by dividing the per cent value for
Ringkøbing by the corresponding value for the whole country. If the coefficient is greater than
1, a relatively large number of persons are employed in the respective industry in the County of
Ringkøbing. The table shows that relatively many persons are employed in agriculture, fishing and
manufacturing (especially the manufacture of textiles and leather) in the County of Ringkøbing
compared to Denmark as a whole.
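The relative coefficient is a single division, as the sketch below shows for four of the industries in Table 5.3.1. The function name is an assumption made here, and the industry labels are shortened. Because the published percentages are rounded to two decimals, coefficients recomputed from them can differ slightly from the table for very small industries.

```python
def relative_coefficient(region_share, national_share):
    """Share of employment in the region divided by the share in the whole country."""
    return region_share / national_share


# Industry: (Denmark %, Ringkøbing %), from Table 5.3.1
employment_shares = {
    "Agriculture, horticulture and forestry": (3.12, 6.01),
    "Textiles and leather": (0.37, 2.51),
    "Construction": (6.27, 6.15),
    "Business activities": (9.74, 5.72),
}

coefficients = {
    industry: round(relative_coefficient(ringkobing, denmark), 2)
    for industry, (denmark, ringkobing) in employment_shares.items()
}

for industry, coefficient in coefficients.items():
    marker = "over-represented" if coefficient > 1 else "not over-represented"
    print(f"{industry:<42}{coefficient:>6.2f}  {marker}")
```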
In this example, two distributions of the same kind are compared, that is, employment
distributed by industry. But these relative coefficients can also be used with advantage when
relating distributions of different kinds to each other. In energy analyses, relative energy
intensities can be calculated for different industries by dividing the share of energy consumption
for the industry (as a proportion of that for all industries) by the share of production for the
industry. This results in a measure of the relative energy demand for the production of
individual industries.
No further comment on Table 5.3.1 will be made here. It should be pointed out, however, that
all material used in the paper must be processed and worked on so that the relevant comparisons
appear clearly in the report.
50
Table 5.3.1. Employment, by industry, County of Ringkøbing and Denmark, January 1, 2005.

Industry                                                  Denmark (%)  Ringkøbing (%)  Relative coefficient
Agriculture, horticulture and forestry                           3.12            6.01                  1.93
Fishing                                                          0.16            0.69                  4.44
Mining and quarrying                                             0.14            0.05                  0.39
Manufactory of food, beverages and tobacco                       2.72            4.36                  1.61
Manufactory of textiles and leather                              0.37            2.51                  6.78
Manufactory of wood products, printing and publication           2.10            3.85                  1.83
Manufactory of chemicals and plastic products                    1.87            1.45                  0.77
Manufactory of other non-metallic mineral products               0.57            0.62                  1.09
Manufactory of basic metals and fab. of metal products           6.19            9.78                  1.58
Manufactory of furniture, manufacturing n.e.c.                   0.99            1.94                  1.96
Electricity, gas and water supply                                0.53            0.43                  0.81
Construction                                                     6.27            6.15                  0.98
Sale and repair of motor vehicles, sales of auto. fuel           2.25            2.37                  1.05
Wholesale except of motor vehicles                               5.80            6.48                  1.12
Retail trade and repair work exc. of m. vehicles                 6.96            6.93                  1.00
Hotels and restaurants                                           3.07            2.43                  0.79
Transport                                                        4.28            3.36                  0.79
Post and telecommunications                                      1.87            1.13                  0.60
Finance and insurance                                            2.71            1.96                  0.73
Letting and sale of real estate                                  1.67            1.25                  0.75
Business activities                                              9.74            5.72                  0.59
Public administration                                            5.46            3.89                  0.71
Education                                                        7.54            6.37                  0.85
Human health activities                                          5.81            4.57                  0.79
Social institutions etc.                                        12.08           11.27                  0.93
Associations, culture and refuse disposal                        5.29            4.07                  0.77
Activity not stated                                              0.44            0.37                  0.83
Total                                                           100.0           100.0                   1.0

Source: www.statistikbanken.dk
In the previous discussion on causal analysis (and in that on formulating the statement of the
problem), it was shown that an event or an effect is normally caused by a long series of factors.
If the wish is to determine the relationship between C and E, it is apparent that the background
factors B should be seriously considered. An example was given where C was occupation and E
was mortality. The B's were, then, other factors that had contributed to death, such as gender,
age, alcohol misuse, etc.
The background factors must be taken into account when clustering the data. That is, a
comparison is made of groups that are equivalent with respect to background factors. In the
example with mortality, you would, then, compare mortality between occupations u and x for
those groups that are equivalent with respect to gender, age, or alcohol misuse. If the most
important B's are included in the clustering, the remaining difference in mortality can probably
be ascribed to occupation.
Clustering is based simply on a division of the total into groups. This could be called an
additive analytical method. Deaths are divided up into groups in which different values of the
characteristics relevant for an analysis of mortality are represented. If you are to analyze energy
consumption, it would be natural to divide this consumption up into sectors or purposes
consisting of sub-sectors or sub-purposes in which development in consumption is dependent on
the same explanatory factors. Industries' energy consumption depends on factors other than
those affecting energy consumption in the heating of private households. Industries' energy
consumption is, to a great extent, dependent on industry production, while private households'
energy consumption is very much dependent on explanatory factors such as disposable income,
the relative price of energy, etc.
The multiplicative analytical method can be used to advantage for other problems being
investigated. If you are analyzing petrol consumption, it is natural to include the following
explanatory factors: petrol consumption/mile, miles/car, number of cars/GDP (constant prices)
and GDP (constant prices). All these factors multiplied by each other result in petrol
consumption. The first factor is a measure of energy intensity for petrol-driven cars. This factor
is dependent on petrol prices, among other things. The higher the price of petrol, the higher is
the willingness to drive in cars that get high mileage per gallon.
The second factor is a measure of how much the car has been used. This factor can also be
assumed to be dependent on petrol prices and disposable income. The third factor links the
number of cars together with the current measure for economic development. An increasing GDP
will, ceteris paribus, result in a larger fleet of cars. These explanatory factors can be compared in
a table or figure with data for a series of years, and one can calculate the individual factor's
contribution to the change in petrol consumption.
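The calculation of each factor's contribution can be sketched as follows. Because the factors multiply to petrol consumption, the logarithm of the total change equals the sum of the logarithms of the individual factor changes; using log changes as the contribution measure is an assumption of this sketch, not a method prescribed in the text. All figures are hypothetical and serve only to illustrate the arithmetic.

```python
import math

# Hypothetical illustrative figures (not taken from any statistics) for two years.
# Factors: petrol per mile, miles per car, cars per unit of real GDP, real GDP.
year0 = {"petrol_per_mile": 0.10, "miles_per_car": 9000.0, "cars_per_gdp": 2.0, "gdp": 1000.0}
year1 = {"petrol_per_mile": 0.09, "miles_per_car": 9500.0, "cars_per_gdp": 2.1, "gdp": 1100.0}

def consumption(factors):
    """Petrol consumption is the product of all four factors."""
    return math.prod(factors.values())

def log_contributions(f0, f1):
    """Log change of each factor; the contributions sum to the total log change."""
    return {name: math.log(f1[name] / f0[name]) for name in f0}

contributions = log_contributions(year0, year1)
total_change = math.log(consumption(year1) / consumption(year0))

for name, value in contributions.items():
    print(f"{name}: {100 * value:.1f} log points")
print(f"total: {100 * total_change:.1f} log points")
```

In this made-up example the fall in petrol per mile pulls consumption down, while the other three factors pull it up.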
The multiplicative method of analysis is used also in the calculation of the standardized mean,
which is discussed in the next sub-section.
5.4. Standardizing means
Let us say the result, E, can be explained by the multiplication of two factors, B1 and B2, and one
wishes to know how much one factor influences the result when the other factor is held constant
(i.e., the calculation is standardized).
In a note to Table 5.4.1, it is indicated that the proportion of women employed in manufacturing
in 1979 was somewhat larger in the County of Ringkøbing than in the rest of Denmark. The
differences in the proportion of women employed can partly be explained by the differences in
manufacturing structure and partly by the differences in the proportion of women in the various
manufacturing branches. The data can be clustered using the proportion of women employed in
the individual manufacturing branches for Denmark as the standard, as in Table 5.4.1. It is
calculated, then, how many women would be employed in the County of Ringkøbing if the
manufacturing groups in Ringkøbing employed the same proportion of women that the
manufacturing groups in Denmark employ in general. This calculation yields 11,635 women,
corresponding to a proportion of women equal to 32.9 (100 x 11,635/35,415). But the actual
share of women was 29.9.¹¹ That is, the proportion of women employed in the individual
branches of manufacturing in the County of Ringkøbing was lower than that for those branches
in the rest of Denmark. It is interesting to note, however, that the County of Ringkøbing is the
industrial centre for industries in which the share of women employed was (is) high, for
example, the textile and leather industry.
11 Statistisk Årbog 2006, SÅ 2005 Table 114.
Components of the standardized mean: B1 x B2 = E (result)
Table 5.4.1. Proportion of women employed in manufacturing, by sector, 2005.

Sector                                       Proportion of      Total number employed   Calculated number of
                                             women, Denmark     in the County of        employed women in the
                                             (1) (%)            Ringkøbing              County of Ringkøbing (2)
Mining and quarrying                                   13.0                      78                       10
Food, beverages and tobacco                            40.6                    6287                     2550
Textiles and leather                                   54.0                    3626                     1959
Wood products, printing and publication                32.3                    5545                     1792
Chemicals and plastic products                         41.6                    2088                      869
Other non-metallic mineral products                    18.5                     898                      166
Basic metals and fabricated metal products             23.8                   14095                     3360
Furniture, manufacturing n.e.c.                        33.2                    2798                      929
Total                                                  31.4                   35415                    11635

1. Number of employed women x 100/total employed.
2. The share of women employed in all of Denmark multiplied by the number of men and women employed in Ringkøbing/100.
Source: Statistikbanken.dk
An analogous example could be made for the changes in energy intensity. Using data from 1973
through 2005, energy intensity is measured by energy consumption divided by GDP in constant
basic prices. The change in the total intensity can partly be explained by changes in the intensity
for various industries and partly by the shift in the relative significance of industries, as measured
by their contribution to GDP at basic prices. GDP at basic prices, distributed by industries in
1973, can be chosen as the standard. By multiplying this standard by the energy intensities in
2005, the energy consumption can be calculated as if there had not been changes in industries'
contribution to GDP at basic prices from 1973 to 2005. The calculated energy consumption for
2005 is then compared with the actual energy consumption for 1973, and if the calculated
consumption is less than the actual, the energy intensity for industries has fallen during the
period. The calculated consumption can be expressed as a percentage of the actual consumption, and
the conclusion could be that energy consumption has fallen by x% as a result of the industries'
falling energy intensity.
It might be easier to sketch the explanatory factors and the calculated results in the following:

Diagram for standardized means

                                                  Energy intensity
                                                  1973            2005
GDP at basic prices distributed       1973        Actual          Calculated
by industries                         2005        Calculated      Actual

By using the above diagram, it should be clear what the differences are and which factors are
responsible for these differences. When comparing the results in the upper row, the standard is
GDP at basic prices distributed by industries in 1973. The difference in results should, then, be
assigned to the other factor, here energy intensity for the various industries. Energy intensity
could also be used as the standard and a calculation could be made for how much energy
consumption changed as a result of the development in GDP at basic prices.
A corresponding diagram can be worked out for the proportion of women employed in
industry in the County of Ringkøbing and in Denmark in general.
An analogous calculation of standardized means can be made using population statistics. The
total fertility rate is dependent on both the inclination of women to give birth as well as the
number of women in the child-bearing years. When measuring women's fertility, you would need
to neutralize the age structure. Put another way, the age structure must be standardized when
illustrating the development in fertility. This is done by calculating total fertility for 1000 women
going through the child-bearing years. This measure is, then, independent of the number of
women in the child-bearing years.
The national accounts contain a standardized share of wages. The total wage share (total
wages/GDP at basic prices) is dependent on both the wage share for the individual industries as
well as the industry structure. By standardizing the industry structure, the development in the
wage share can be illustrated.
There are many examples in which the final value is the result of the multiplication of two
factors. In all of these examples, standardized calculations can be used to advantage.
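The standardized calculation for the County of Ringkøbing can be sketched in a few lines of Python. The figures are those of Table 5.4.1; the function and variable names are assumptions made for illustration. Because the table rounds each calculated sector figure, the unrounded total in the sketch differs by a few persons from the 11,635 reported in the table.

```python
# Proportion of women in each manufacturing sector in Denmark (per cent), Table 5.4.1.
share_women_denmark = [13.0, 40.6, 54.0, 32.3, 41.6, 18.5, 23.8, 33.2]
# Total number employed in the same sectors in the County of Ringkøbing, Table 5.4.1.
employed_ringkobing = [78, 6287, 3626, 5545, 2088, 898, 14095, 2798]

def standardized_count(standard_shares_pct, structure):
    """Apply the standard shares (Denmark) to another structure (Ringkøbing)."""
    return sum(share * n / 100 for share, n in zip(standard_shares_pct, structure))

expected_women = standardized_count(share_women_denmark, employed_ringkobing)
total_employed = sum(employed_ringkobing)
standardized_share = 100 * expected_women / total_employed

print(total_employed)                 # 35415
print(round(standardized_share, 1))   # close to the 32.9 per cent quoted in the text
```

The standardized share is then compared with the actual share: the difference between the two is attributed to the factor that was not held constant.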
5.4.1. Price and quantity (volume) indexes
The most frequently used techniques for standardization appear in connection with the
calculation of price and volume indexes. Movements in values depend on both price and quantity
changes. It is interesting to know, for example, if increased domestic consumption is caused by
both increased consumer prices and increased consumption in terms of quantity. If you wish to
isolate the pure price movement, the quantities must be used as a standard. In this section, index
and other related calculations will be demonstrated.
The price index represents a total expression of the movement in prices for several goods or
services. In the following, the discussion is limited to goods only. The problem with index
calculations is how to determine the weights that appropriately represent price movements for
individual goods used in the summary price index. The weight problem is solved in different
ways in the following three most popular index formulas.
Laspeyres price index:

$$P^{LA}_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,o}}{\sum_i p_{i,o} \times q_{i,o}} \cdot 100$$

The budget method:

$$P^{LA}_{t:o} = \sum_i \frac{p_{i,t}}{p_{i,o}} \cdot B_{i,o} \cdot 100, \qquad B_{i,o} = \frac{p_{i,o} \times q_{i,o}}{\sum_i p_{i,o} \times q_{i,o}}$$

where i = 1,...,m, p = prices, q = quantity, and B_{i,o} = the budget share for good i in year o.
The numerator indicates the expenditure on the quantities of m goods bought in the index base
year (year o) valued at the prices in the final year (year t). When this is expressed in relation to
the same quantities valued at the prices of the base year of the index (the denominator), the result
is the price increase for the m goods from year o to year t.¹² The Laspeyres index uses the
quantities of the base year of the index as the standard, and this means that this index measures
price movements for a fixed goods combination.
The index can also be calculated using the (equivalent) budget method, in which the price
increase for the individual good is weighted by the share of the expenditure on that good in the
budget for the base year of the index. The greater the weight given the good consumed, the
stronger is the representation of the price increase of this good in the consumer price index.
Paasche price index:

$$P^{PA}_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,t}}{\sum_i p_{i,o} \times q_{i,t}} \cdot 100$$
The Paasche price index uses up-to-date weights, which means the weights derive from the
current time period. The calculation of the Laspeyres and Paasche indexes is illustrated in Table
5.4.2.
Table 5.4.2. Calculation of the beer and wine index.

            Beer            Wine
Year      q      p        q      p      P^LA(t:o)   P^PA(t:o)
0       100      4       50     10         100         100
1        75      4       75     10         100         100
2       100      4       50     12         111         111
3       135      4       30     12         111         107
4        50      4      100     12         111         117
Table 5.4.2 shows that the two indexes do not react to pure quantity changes, e.g., in the first year.
The table also shows that the two indexes are identical when no quantity changes have taken
place relative to the base year, e.g., in the second year. This is evident from the formulas: the two
indexes coincide when qo and qt are identical. The table further shows that the price increase calculated by the
Laspeyres index is larger than that calculated by the Paasche index when consumption of the
good that has become relatively cheaper increases (beer consumption increases relative to wine
consumption; the beer price has fallen relative to the wine price), e.g., the third year.
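The figures in Table 5.4.2 can be reproduced with a short Python sketch. The function names are assumptions made for illustration; the quantities and prices are those of the table, with year 0 as the base year. The third function applies the budget method and is included only to show that it gives the same result as the Laspeyres formula.

```python
# Quantities and prices for (beer, wine) in years 0-4, from Table 5.4.2.
QUANTITIES = {0: (100, 50), 1: (75, 75), 2: (100, 50), 3: (135, 30), 4: (50, 100)}
PRICES = {0: (4, 10), 1: (4, 10), 2: (4, 12), 3: (4, 12), 4: (4, 12)}

def laspeyres(t, o=0):
    """Current prices and base prices, both weighted with base-year quantities."""
    num = sum(p * q for p, q in zip(PRICES[t], QUANTITIES[o]))
    den = sum(p * q for p, q in zip(PRICES[o], QUANTITIES[o]))
    return 100 * num / den

def paasche(t, o=0):
    """Current prices and base prices, both weighted with current-year quantities."""
    num = sum(p * q for p, q in zip(PRICES[t], QUANTITIES[t]))
    den = sum(p * q for p, q in zip(PRICES[o], QUANTITIES[t]))
    return 100 * num / den

def laspeyres_budget(t, o=0):
    """Budget method: price relatives weighted with base-year budget shares."""
    spending = [p * q for p, q in zip(PRICES[o], QUANTITIES[o])]
    total = sum(spending)
    return 100 * sum((pt / po) * (s / total)
                     for pt, po, s in zip(PRICES[t], PRICES[o], spending))

la = [round(laspeyres(t)) for t in range(5)]
pa = [round(paasche(t)) for t in range(5)]
print(la)  # [100, 100, 111, 111, 111]
print(pa)  # [100, 100, 111, 107, 117]
```

In year 3 the Paasche index lies below the Laspeyres index, and in year 4 the two "exchange places", as described in the text.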
Normally, consumption of a good will increase relative to the consumption of all other goods
when the price of that good decreases relative to the prices of all other goods. The Laspeyres
index does not take into consideration this substitution that takes place when relative prices
change. This results in a numerator that is too high because prices and quantities relate to
different years. Therefore, the index overestimates the real price increase.
In contrast, the Paasche index underestimates the real price increase (the denominator is too
high). Table 5.4.2 shows, however, that the two indexes "exchange places" when consumption
increases relatively for the good for which the price has increased relatively, e.g., the fourth year.
12 The calculation can, of course, be made on a period less than 1 year. The time dimension given in the example is used for pedagogical reasons.
It should be stressed that the Laspeyres index only overestimates, and the Paasche index only
underestimates, the real price increase when normal substitution takes place, i.e., away from the
good that has become relatively expensive.
Table 5.4.3 shows the calculation of the price movement from 1996 to 2006 for three
components of the consumer price index, using the Laspeyres formula.
Table 5.4.3. Calculation of the "housing index".

                                                    Weight distribution    Consumer price index (1)
                                                    2003, %                1996          2006
Rent housing                                        22.47                    91           116
Electricity and fuel                                 7.50                    81           121
Furniture, furnishings, household services, etc.     6.22                    93           109

1) Year 2000=100.

$$P^{LA}_{2006:1996} = \left[ \frac{22.47}{22.47+7.50+6.22} \cdot \frac{116}{91} + \frac{7.50}{22.47+7.50+6.22} \cdot \frac{121}{81} + \frac{6.22}{22.47+7.50+6.22} \cdot \frac{109}{93} \right] \cdot 100 = 130.25$$

Source: www.statistikbanken.dk
In an empirical paper, it can often be necessary to calculate partial indexes of the price index.
The calculation of these partial indexes is normally made using the budget formula of the
Laspeyres index because the prices are available as indexes and the budget shares are provided. These
elements are sufficient for calculating the price index. In the budget formula, you only need to
know the relative price (pt/po) and not the individual prices in the two years.
In Table 5.4.3, the weights are not taken from the base year of the index, the year in which the
index equals 100. The weights derive from the values in 2003. When the year from which the
weights are taken for the index lies between the base year and the most current year in the data
series, it is not possible to claim that the index overestimates or underestimates the real price rise
when substitution takes place.
Danmarks Statistik changes the weight basis used in calculating the Laspeyres index on a
continuous basis, and they also change the base year of the index once in a while. If you use a
price index in a paper covering a longer period of time, a linkage between the indexes will often
be necessary. This type of linking is illustrated in Table 5.4.4.
If you want the price index for 2006, using 1990 as the base year of the index, you must first
calculate the index for the year in which the link is being made (2003), with 1990 set equal to
100. Next, the index for 2006 is calculated, with 2003 set equal to 100. Finally, the two indexes
are multiplied, yielding the price rise from 1990 to 2006.
Table 5.4.4. Linking consumer price indexes.

                 1990       2003
1980=100        177.4      234.6
1990=100        100        132.3    (234.6*100/177.4)

                 2003       2006
2000=100        107.0      112.3
2003=100        100        104.9    (112.3*100/107.0)

                 1990       2006
1990=100        100        138.8    (132.3*104.9/100)

Source: www.statistikbanken.dk
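The linking in Table 5.4.4 can be sketched in Python as follows. The function name is an assumption made for illustration; the index values are those of the table. Since the sketch does not round the intermediate results, the individual links may differ from the table in the last decimal.

```python
def rebase(index_value, index_in_new_base_year):
    """Express an index value with the new base year set equal to 100."""
    return index_value * 100 / index_in_new_base_year

# Step 1: 2003 with 1990 = 100, from the series with 1980 = 100.
link_1990_2003 = rebase(234.6, 177.4)
# Step 2: 2006 with 2003 = 100, from the series with 2000 = 100.
link_2003_2006 = rebase(112.3, 107.0)
# Step 3: multiply the two links to get 2006 with 1990 = 100.
index_2006_base_1990 = link_1990_2003 * link_2003_2006 / 100

print(round(index_2006_base_1990, 1))  # 138.8
```

The same three steps apply whenever two index series with different base years overlap in at least one year.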
Since the Laspeyres index normally overestimates the real price rise and the Paasche index
normally underestimates it, it seems natural to calculate an index that lies between the two
indexes. One such intermediate index is the Fisher index, which calculates a geometric average
of the two other indexes.
Fisher price index:

$$P^{FI}_{t:o} = \sqrt{P^{LA}_{t:o} \times P^{PA}_{t:o}}$$
The Fisher index is used for calculating export and import price indexes in trade statistics. Data
from these statistics are used for creating Table 5.4.5.
Table 5.4.5. Denmark's import of new petrol-driven cars from Germany, by motor size, 2000-2005.

                              2000                              2005                  p05 * q00   p00 * q05
                  Quantity    Value      Price      Quantity    Value      Price      (1 * 6)     (3 * 4)
                              1000 DKK   (DKK)                  1000 DKK   (DKK)      1000 DKK    1000 DKK
                  1           2          3          4           5          6
≤ 1000 cm3              88      4,115     46,762         131      3,637     27,770       2,444       6,126
1000-1500 cm3        4,127    214,668     52,015       9,493    492,911     51,924     214,289     493,782
≥ 1500 cm3             446    153,552    344,287       1,071    389,033    363,242     162,006     368,731
Total                4,661    372,335     79,883      10,695    885,582     82,803     378,739     868,640

$$P^{LA}_{05:00} = \frac{\sum p_{05} \cdot q_{00}}{\sum p_{00} \cdot q_{00}} \cdot 100 = \frac{378{,}739}{372{,}335} \cdot 100 = 101.72$$

$$P^{PA}_{05:00} = \frac{\sum p_{05} \cdot q_{05}}{\sum p_{00} \cdot q_{05}} \cdot 100 = \frac{885{,}582}{868{,}640} \cdot 100 = 101.95$$

$$P^{FI}_{05:00} = \sqrt{P^{LA}_{05:00} \cdot P^{PA}_{05:00}} = \sqrt{101.72 \cdot 101.95} = 101.84$$

Source: www.statistikbanken.dk
The table shows a price increase for cars with motor sizes above 1500 cm3 and a price decrease
elsewhere. If normal substitution occurred during the period, the import of cars with motor sizes
above 1500 cm3 would fall relatively. This is not the case. The import of cars with motor sizes
above 1500 cm3 makes up about 10% of the import measured in quantities in 2000 as well as in
2005. The import of small cars decreased from 1.8% to 1.2% of the import measured in
quantities even though the price decreased relatively much. An abnormal substitution has taken
place. The Paasche index, therefore, increased just a little bit more than the Laspeyres index.
A price index based solely on the import of new cars in total can be calculated using the
numbers from Table 5.4.5: 82,803 x 100/ 79,883 = 103.66. This index shows a larger price rise
than the other indexes, which analytically are the best. An "in total" price index is actually not a
proper price index because it is influenced also by quantity changes. The index is based on an
average price (total import value/quantity of imports) for one year divided by average price for
another year. Therefore, the quantities, as well as the prices, are from two different years.
This price rise for cars with motor size above 1500 cm3 may not be real. Given the product
groups in the trade statistics, there has, perhaps, been a shift toward the most luxurious cars. In
other words, no account has been taken of a shift within the individual product groups. The table
illustrates the quality problem in index calculations based on these statistics. The price rise can
be based on both price increases and quality changes.
Totally analogous to these price indexes, there are three corresponding quantity or volume
indexes. In these indexes, the prices are standardized:
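$$Q^{LA}_{t:o} = \frac{\sum_i p_{i,o} \times q_{i,t}}{\sum_i p_{i,o} \times q_{i,o}} \cdot 100, \qquad Q^{PA}_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,t}}{\sum_i p_{i,t} \times q_{i,o}} \cdot 100, \qquad Q^{FI}_{t:o} = \sqrt{Q^{LA}_{t:o} \times Q^{PA}_{t:o}}$$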
Using base year prices is a problem for periods far apart from the base year. Therefore, one
may give preference to chain indices as a measure of real changes in quantities.
Chain Laspeyres' volume index:

$$Q^{LA}_{t:0} = Q^{LA}_{1:0} \times Q^{LA}_{2:1} \times \dots \times Q^{LA}_{t:t-1}$$
The quantity indexes show the real changes in quantities, or the changes given constant prices.
Using the various index formulas, it can easily be shown that:
$$V_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,t}}{\sum_i p_{i,o} \times q_{i,o}} \cdot 100 = P^{PA}_{t:o} \times Q^{LA}_{t:o} = P^{LA}_{t:o} \times Q^{PA}_{t:o} = P^{FI}_{t:o} \times Q^{FI}_{t:o}$$
V is a value index that relates the value in year t to the value in year o. If you know V as well as
a price index, the quantity index can be easily calculated. For example, the Fisher price index is
calculated in Table 5.4.5 to be 101.84. V can be calculated in the following way: 885,582 x 100 /
372,335 = 237.85. The Fisher quantity index is then: 237.85 x 100 / 101.84 = 233.55. If you find
the quantity change by considering only the number of cars (each car counting as 1), the result is:
10,695 x 100 / 4,661 = 229.46, which lies below the result measured using the quantity index.
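The relationship between the value index, the price indexes and the quantity indexes can be checked with the totals of Table 5.4.5. The following Python sketch is only an illustration; the variable names are assumptions, and because the intermediate results are not rounded, the last decimal can differ slightly from the figures quoted in the text. Note that indexes scaled to 100 must be divided by 100 when they are multiplied or divided by each other.

```python
import math

# Column totals from Table 5.4.5 (1000 DKK).
sum_p00_q00 = 372_335   # value in 2000
sum_p05_q05 = 885_582   # value in 2005
sum_p05_q00 = 378_739   # 2000 quantities at 2005 prices
sum_p00_q05 = 868_640   # 2005 quantities at 2000 prices

price_laspeyres = 100 * sum_p05_q00 / sum_p00_q00
price_paasche = 100 * sum_p05_q05 / sum_p00_q05
price_fisher = math.sqrt(price_laspeyres * price_paasche)

value_index = 100 * sum_p05_q05 / sum_p00_q00

# Quantity index found by dividing the value index by the price index.
quantity_fisher = 100 * value_index / price_fisher

# The same quantity index found directly from the quantity formulas.
quantity_laspeyres = 100 * sum_p00_q05 / sum_p00_q00
quantity_paasche = 100 * sum_p05_q05 / sum_p05_q00
quantity_fisher_direct = math.sqrt(quantity_laspeyres * quantity_paasche)

print(round(price_laspeyres, 2), round(price_paasche, 2), round(price_fisher, 2))
print(round(value_index, 2), round(quantity_fisher, 2))
```

Both routes to the Fisher quantity index give the same number, which illustrates that V = P x Q holds for the Fisher pair as well as for the mixed Laspeyres and Paasche pairs.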
In the system of national accounts, the material is reported in both constant as well as current
prices. Dividing the value index by the quantity index produces the (implicit) price index. You
can calculate this implicit price index for many of the indexes presented in the national accounts.
If you know the value index and a price index, the quantity index can be calculated by
dividing the value index by the price index. Such a calculation is called deflating the index.
There is often a need to deflate in a paper because it is the real change or movement that is of
interest. If you have, for example, hourly earnings, income, or private consumption in current
prices and a Laspeyres price index, real quantity movements can be calculated, as in Table 5.4.6.
Table 5.4.6. Index of average hourly earnings in Danish manufacturing: nominal and real changes, 1989-2005.

                                                      1989      2005
Index of average hourly earnings:
  1980=100                                             181       323
  1989=100                                             100       178
Consumer price index:
  1900=100                                            4142      5790
  1989=100                                             100       140
Index of real hourly earnings (quantity index):
  1989=100                                             100       127 (1)

1) 178x100/140.
Source: www.statistikbanken.dk
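The deflation in Table 5.4.6 can be sketched in Python as follows. The function name is an assumption made for illustration; the input series are the rows with 1989=100 from the table, so both series share the same base year before the division.

```python
def deflate(nominal_index, price_index):
    """Divide a nominal index by a price index with the same base year."""
    return {year: 100 * nominal_index[year] / price_index[year]
            for year in nominal_index}

# Rows with 1989 = 100 from Table 5.4.6.
hourly_earnings = {1989: 100, 2005: 178}
consumer_prices = {1989: 100, 2005: 140}

real_hourly_earnings = deflate(hourly_earnings, consumer_prices)
print(round(real_hourly_earnings[2005]))  # 127
```

The result says that hourly earnings rose by about 27% in real terms from 1989 to 2005, given that the consumer price index is the relevant deflator.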
Deflating must be done with careful thought! It is necessary, when deflating, to use a price index
that is relevant for the given relationship. The deflating used in empirical papers is often
unsuitable. For example, export values in the trade statistics are deflated using the consumer
price index. The calculated result cannot be interpreted meaningfully because consumer prices
are influenced by the price movements of goods and services that are not at all included in the
export of goods and services. Consumer prices are influenced also by indirect taxes, which
do not affect export goods.
If the weight of goods in the consumption basket of employees in manufacturing industries
differs substantially from the weights used in the consumer price index, deflating with the
consumer price index can be a problem. If the prices for the goods weighted heavily in the
consumption basket of employees in manufacturing industries have risen relatively strongly,
deflation with the consumer price index will overestimate the movement in the real hourly wage
since the division uses a price index that has risen too little with respect to the consumption
choices of employees in manufacturing industries. A corresponding problem applies to retired
individuals. If the value of goods consumed by retired individuals is deflated using the consumer
price index, the result will most likely be incorrect since retired individuals have another
consumption pattern than that of the population in general.
Often, a price index will be used to deflate another price index to illustrate the relative or real
price movement. For example, a price index for oil can be deflated with a price index for
exported manufactures. Such an index can be interpreted meaningfully in that it shows the
movement in purchasing power for a barrel of oil measured in manufactured goods.
5.5. Analyzing time series data
Many economic indicators, such as GDP, are reported for a given time period. When these
values are available over several time periods, a time series is produced: observations over time
for a given variable, where the time distance between observations is identical. For example,
GDP is often discussed as if it were only available on a yearly basis, that is, that the time series
consisted of only one observation per year. However, some time series are available on a
quarterly, monthly, weekly, and daily basis, depending on the frequency with which the data is
collected.
For some economic data, the activities behind the data are carried out over a period of time,
and the measurement of the data concerns the activity for that entire period. GDP is one example
of this type of indicator. There will often be a lower bound with respect to the length of period.
For example, Statistics Denmark publishes quarterly data for GDP in addition to the annual data.
For other variables, you can imagine that observations relate to a particular point in time, for
example, bond interest rates, currency rates, etc. where the price formation through "electronic
trades" occurs continuously. Data for economic variables, such as currency rates, will typically
appear as daily data. That is, an average of the day's prices or a price at a particular time (for
example, the currency rate at 12:00 p.m.) might form the basis for the respective observation
value.
5.5.1. The elements of a time series
In a time series, there is often dependency between the observed value in the current time period,
Xt, and the value in the previous period, Xt-1. This dependence must be analyzed when the
movement of a given economic variable is estimated over time. In general, it is likely that
fluctuations in economic time series can result from movements in one or more of the following
components:
Trend (T): Long-term movements in the respective variable. It can be either positive or
negative, i.e., the values of the variable are either increasing or decreasing in general.
Cycle (C): Movement over the course of the business cycle, i.e., normally over more than one
year, where peaks and troughs in the business cycle cause cyclical swings in a number of
economic variables. These swings are not necessarily 'even', i.e., the swings are not necessarily
identical in magnitude or in duration. Swings that typically last several years can be difficult to
distinguish from a possible trend (you can work with a trend/cycle component; refer to the
following section on seasonal correction).
Seasonal swings (S): Movements that repeat themselves within a given period (typically a
year), i.e., a particular pattern in variation emerges over a time period that is observed in other
periods too. A pattern observed in a time series based on monthly data will repeat itself every
year. For example, sales of certain vegetables are always greatest in the summer months, car
sales are largest in the spring months, etc. For these examples, you could possibly work with
quarterly or monthly data and still observe the seasonal variations over the year.
Moreover, you should consider the number of work days per year when economic conditions
are being analyzed. For example, if you have a time series with monthly data, you might choose
further to correct for the uneven number of (work) days in each month, given that the number of
Sundays and holidays, etc. are unevenly distributed over the months.
Irregular swings (I): Coincidental swings (noise) that appear after consideration is made for
the T, C, and S components and get allocated to residual variation. These stochastic fluctuations
are unpredictable and can be due to political-economical interference in the economy, natural
catastrophes, etc.
A time series (Y) can be modelled with the help of the T, C, S, and I components in two ways:
Multiplicative formula: Y = T x C x S x I
Additive formula: Y = T + C + S + I
In analyzing economic data, the multiplicative model is often used. For example, when the trend
is increasing, seasonal swings of 10% in a given month mean that the absolute swing will
become larger and larger over time. This can be totally reasonable considering that the increasing
trend implies increasing levels for the respective variable. On the other hand, the additive model
implies identical, absolute seasonal swings. This can be reasonable in certain cases, for example,
with the seasonal correction of unemployment numbers.
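The difference between the two models can be illustrated with a small Python sketch. The trend and the seasonal pattern are hypothetical numbers chosen for illustration, and the C and I components are left out (set to 1 in the multiplicative model and to 0 in the additive model) to keep the example short.

```python
# Hypothetical rising trend for 8 quarters (two years); not real data.
trend = [100.0 + 10.0 * t for t in range(8)]

# Hypothetical seasonal patterns for quarters 1-4.
seasonal_factor = [1.10, 0.95, 0.90, 1.05]   # multiplicative model: Y = T x S
seasonal_amount = [10.0, -5.0, -10.0, 5.0]   # additive model: Y = T + S

multiplicative = [T * seasonal_factor[t % 4] for t, T in enumerate(trend)]
additive = [T + seasonal_amount[t % 4] for t, T in enumerate(trend)]

# Absolute seasonal swing = observed value minus trend.
swing_multiplicative = [y - T for y, T in zip(multiplicative, trend)]
swing_additive = [y - T for y, T in zip(additive, trend)]

# First quarter of year 1 (index 0) compared with first quarter of year 2 (index 4).
print(round(swing_multiplicative[0], 1), round(swing_multiplicative[4], 1))
print(round(swing_additive[0], 1), round(swing_additive[4], 1))
```

With the multiplicative model the 10% first-quarter swing grows in absolute terms as the trend rises, whereas the additive model gives the same absolute swing in both years.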
The purpose of time series analysis is to estimate the dynamic or time structure in the data of
interest, i.e., to divide the time series up into the above stated, possible components. It is sensible
to start with a graphical analysis of the time series, i.e., construct a figure with the observed
values as a function of time, and to make a first assessment. If a given time series is valued in
current prices, the data must be deflated, because it is normally the real change in the data that is
of interest.
The following Section 5.5.2 presents a discussion of the calculation technique for the so-called
moving average, which can be used in connection with the determination of the trend component
in the above-mentioned models. The moving average is used also as a central element in the
seasonal correction of data, which is treated in Section 5.5.3. Finally, the analysis ends with a
discussion of the T and C components of the model in Section 5.5.4.
5.5.2. Moving averages
A method for smoothing time series consists of the calculation of a so-called moving average,
where the idea is to modify a given period's observation using an average of the time-related
observations just before and after the one in focus. Using this method can make it easier to determine
a possible trend in the time series because shorter-term swings, including coincidental ones, are
smoothed out. The method of calculation can be illustrated using numbers for GDP at factor
prices (1966-1980) and a moving average that here is based on five terms and calculated as:
Yt' = (Yt-2 + Yt-1 + Yt + Yt+1 + Yt+2) / 5
The first value in the 5-term moving average can be calculated for 1968 (average of 1966-1970),
the value for 1969 becomes the average of the five periods 1967-1971, etc., and the last calculation
taken is for 1978. If only data for the period 1966-1980 is available, values in the beginning and
at the end of that period will be missing. In the case here, data for the years after 1980 is
available, and therefore the values for 1979-1980 can be calculated. Figure 5.5.1 shows the result
when the period is extended from 1966 to 2002.
Table 5.5.1. Agriculture's contribution to GDP at factor prices (in millions of 1995-DKK) and the 5-term moving average, 1966-1980.

Year    Original time series Yt    5-term moving average
1966            13662
1967            13473
1968            13469                     13283
1969            13919                     13224
1970            11894                     13281
1971            13363                     13183
1972            13760                     13426
1973            12981                     13775
1974            15133                     13551
1975            13636                     13683
1976            12247                     14038
1977            14416                     13919
1978            14756                     14201
1979            14540
1980            15048

Source: Statistikbanken.dk/NAT07 (Statistics Denmark).
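The 5-term moving average in Table 5.5.1 can be reproduced with the following Python sketch. The function name is an assumption made for illustration; the data are the original time series from the table.

```python
# Agriculture's contribution to GDP at factor prices, millions of 1995-DKK (Table 5.5.1).
gdp_agriculture = {
    1966: 13662, 1967: 13473, 1968: 13469, 1969: 13919, 1970: 11894,
    1971: 13363, 1972: 13760, 1973: 12981, 1974: 15133, 1975: 13636,
    1976: 12247, 1977: 14416, 1978: 14756, 1979: 14540, 1980: 15048,
}

def centred_moving_average(series, window=5):
    """Unweighted moving average centred on each year; window must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd; even windows need extra centring")
    half = window // 2
    years = sorted(series)
    result = {}
    # The first and last `half` years have too few neighbours and are left out.
    for i in range(half, len(years) - half):
        block = [series[y] for y in years[i - half:i + half + 1]]
        result[years[i]] = sum(block) / window
    return result

moving_average = centred_moving_average(gdp_agriculture)
print(round(moving_average[1968]))  # 13283
print(round(moving_average[1978]))  # 14201
```

As in the table, no values are produced for 1966-1967 and 1979-1980 when only the data for 1966-1980 are used.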
With yearly data, like that used in Figure 5.5.1, a 5-term moving average will smooth out
swings within those 5 years, and this results in a clearer picture of the long-term trend in
the time series. For example, the development in the agricultural sector around the time Denmark
joined the EC (now the EU) in 1973 can be clearly seen; a prior falling trend in agriculture's contribution to
GDP at factor prices was reversed rather strongly.
Figure 5.5.1. Agriculture's contribution to GDP at factor prices (in billions of 1995-DKK) and the 5-year moving average, 1966-2002.
[Line chart. Vertical axis: billion kr., 8 to 36. Horizontal axis: 1966 to 2002. Series: GDP and GDP95.]
Note: GDP95 is a 5-term moving average of real GDP (1995-DKK).
Source: Statistikbanken.dk/NAT07 (Statistics Denmark).
Individual observations can strongly influence a moving average; for example, the large fall in
agriculture's contribution to GDP in 1969-1970 is included in the calculations for all the years
1968-1972. This can be clearly seen in the figure. A correction for this could be to use a
weighted moving average where different weights are used for the yearly values.¹³ That is, the
greatest weight is given to Yt, and declining weights are given to the remaining values.¹⁴ For
example, an extension of the calculation period to a 7-year moving average would in this case
not greatly change the already shown 5-year average.
When using an equal number of periods in the moving average, you must use a technique that
13 For the earlier shown average all the yearly values have identical weights (0.2 in this case).
66
centres the calculated average around the statistic in focus. Suppose you wish to smooth out a
time series based on quarterly data for the period 1990-1992. A 4-term moving average seems
most reasonable, since one observation from each of the four quarters will be used in the
calculation of a given value in the moving average,15 e.g., an average where the calculation uses
data from the third quarter 1990 through the second quarter 1991. In this
example, the calculated value will reflect a value for the middle of the calculation
period, which is January 1, 1991. To obtain a value in the middle of a quarter, however, the
calculation can be made using 5 terms and letting the first and last periods enter with a weight
equal to 0.5. That is, the value for the first quarter 1991 is calculated using observations from the third quarter
1990 through the third quarter 1991 (using the weights 0.5, 1, 1, 1, 0.5 and dividing the weighted sum by 4). This results
in a centred moving average.
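Both the weighted moving average and the centred 4-term average described above are special cases of one calculation: a weighted sum over an odd number of neighbouring observations, divided by the sum of the weights. The sketch below assumes Python; the function name, the quarterly series, and the declining weights 0.1, 0.2, 0.4, 0.2, 0.1 are hypothetical and chosen only for illustration.

```python
def weighted_moving_average(values, weights):
    """Weighted moving average with an odd number of weights.

    The weighted sum is divided by the sum of the weights, and the result
    for position i is centred on values[i]. Positions too close to either
    end of the series are set to None.
    """
    if len(weights) % 2 == 0:
        raise ValueError("the number of weights must be odd")
    total = sum(weights)
    half = len(weights) // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(w * v for w, v in zip(weights, window)) / total
    return result


# Hypothetical quarterly series, 1990:1-1992:4: a straight-line trend
# plus a fixed seasonal pattern (+4, -2, -6, +4) that sums to zero.
trend = [100 + 2 * t for t in range(12)]
season = [4, -2, -6, 4] * 3
quarterly = [t + s for t, s in zip(trend, season)]

# Centred 4-term average: five terms with half weight on the end quarters.
centred4 = weighted_moving_average(quarterly, [0.5, 1, 1, 1, 0.5])

# Weighted 5-term average with the greatest weight on the middle value.
# These particular weights are illustrative only.
weighted5 = weighted_moving_average(quarterly, [0.1, 0.2, 0.4, 0.2, 0.1])
```

In this constructed example the centred 4-term average returns the straight-line trend exactly, because each window contains every quarter with a total weight of one. The weighted 5-term average does not, since it does not weight the four quarters equally.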
5.5.3. Seasonal correction
In using time series data where the distance between the observations is less than one year, for
example, quarterly, monthly, or daily data, it may be necessary to further process the data to
estimate the potential seasonal elements (S-components from the earlier used model). You can
carry out seasonal correction using a reasonably simple calculation technique, which will be
illustrated in the following sub-section. The purpose is to remove the more or less systematic
swings, for example, over the year when they often are irrelevant for an analysis of fundamental
long-term trends. On the other hand, the purpose can also be to establish the pattern of seasons.
In a seasonal correction over the months of the year, the value of a given month is calculated
as if it were a normal month. Seasonally corrected data will be a big help
in judging actual business cycle developments. An overview of the time series that
Statistics Denmark seasonally corrects and publishes can be found via its home page
(www.dst.dk).
14 The low values for agriculture's GDP in 1970 will then enter with a smaller weight in the calculations for the surrounding periods, but concerning 1970, the observation will enter with larger weight than before.
15 The length of the calculation period implies here that all swings within a year (four quarters) are smoothed out, that is, the seasonal movements are eliminated which is why the method is often used in connection with seasonal corrections. The length of the period can in this way be uniquely determined from the formula (e.g., seasonal correction), where the length of the moving average in other contexts (e.g., the five terms in Figure 5.5.1) must be determined from more subjective considerations.
Year to year comparisons and moving averages
A very simple and often used method for assessing, for example, a given monthly value is to
compare that value with the value from the same month the previous year. By comparing values
from the same month over the period of analysis, the seasonal element ought to be removed. But
the method is very crude and sensitive to incidental movements in the months under
consideration and changing growth rates in the trend.
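The year-to-year comparison can be sketched as follows, assuming Python. The function name and the quarterly figures are hypothetical; they are not Statistics Denmark data.

```python
def year_over_year_change(values, periods_per_year):
    """Percentage change relative to the same period one year earlier.

    The first year has no earlier value to compare with, so those
    positions are set to None.
    """
    changes = [None] * len(values)
    for i in range(periods_per_year, len(values)):
        changes[i] = (values[i] / values[i - periods_per_year] - 1) * 100
    return changes


# Hypothetical quarterly figures for two consecutive years.
quarterly = [230.0, 240.0, 236.0, 250.0, 232.3, 244.8, 238.36, 255.0]
yoy = year_over_year_change(quarterly, 4)
```

Each result depends on only two observations, which is why an incidental movement in either of them carries over directly into the comparison.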
This can be illustrated with data for GDP and the seasonally-corrected GDP (quarterly figures,
calculated by Statistics Denmark, DS), cf. Figure 5.5.2, which shows these two time series for
1992-1994. If you assess GDP in the third quarter 1993 against the corresponding
quarter from the year before, you would conclude that there is a decreasing trend in GDP. If you
look, however, at the development from the second to the third quarters 1993 in the seasonally-
corrected series, you would reach another conclusion, namely a stable development in the data.
Without the seasonally-corrected time series, therefore, you would wind up drawing the wrong
conclusions in certain cases.
Figure 5.5.2. GDP and seasonally-corrected GDP (billion 1995-DKK), 1992-1994.
[Figure not reproduced: line chart of the quarterly series GDP95, GDP95S and GDP95(4), 92:1-94:4; vertical axis in billion kr.]
Note: GDP95S is the seasonally corrected series of the original data (GDP95). GDP95(4) is a 4-term centred moving
average.
Source: Statistikbanken.dk/NAT07 (Statistics Denmark).
Another simple method that can eliminate or reduce seasonal swings is the calculation of the
moving average, as earlier described. With quarterly data, a 4-term moving average (12 terms
for monthly data) will smooth out swings over a year, that is, remove the seasonal fluctuations.
To illustrate this, a calculation is made on the indicated GDP figures. Here, quarterly data for
GDP are available for a period longer than 1992-1994, which is why a 4-term centred moving
average can be easily calculated for all the quarters in the designated period, cf. GDP95(4) in
Figure 5.5.2. In this case, a 4-term moving average apparently leads to a smoothing of both
seasonal and irregular swings that is more powerful than the seasonal corrections made by
Statistics Denmark.
Seasonal indices and the X-11 procedure
A seasonal index for the 12 months of the year states the seasonal swings over the year in index form
and is calculated so that the average value of the index is 100. A value for July equal to 96 means
that, for that month, the observations are expected to lie 4% below the level implied by the trend and cycle
components.
To establish a seasonal index, you must first estimate the seasonal component in the time
series, which is not so easy and can only be approximately determined, as ought to be obvious
from the previous discussion. In the following, a multiplicative relationship is assumed among
the T, C, S, and I components, and seasonal correction, etc. will be illustrated using monthly
data on retailers' sales of food, beverages, and tobacco.16 This is a quantity index, which is why
deflating is not necessary. The calculations are made for the period 1990:01-2003:10, and to
make the method easier to follow, individual data from the time
series as well as some of the results from the calculations are shown in Table 5.5.2.
16 Statistics Denmark's seasonal correction of this time series is based also on an assumption of a multiplicative relationship.
Table 5.5.2. Calculation of the seasonal index for food, beverages, and tobacco.
                1990                    1991            …            2002                    2003            Monthly ave.  Seasonal
Month       Yt     Yt'     YSI      Yt     Yt'     YSI          Yt     Yt'     YSI      Yt     Yt'     YSI        of YSI     index
Jan      90.20       -       -   92.70  100.99   91.79       99.50  107.69   92.40  104.92  109.43   95.87         92.00     92.02
Feb      86.00       -       -   87.60  101.38   86.41       93.80  107.95   86.89   97.04  109.57   88.56         88.06     88.08
Mar      98.80       -       -  103.40  101.53  101.84      110.53  108.10  102.25  103.73  109.53   94.70         99.42     99.44
Apr      99.10       -       -   97.80  101.70   96.16      104.92  108.32   96.86  110.74  109.63  101.01         99.07     99.09
May     106.70       -       -  109.20  101.89  107.17      114.74  108.58  105.67  115.17       -       -        104.01    104.03
Jun     102.90       -       -  101.70  101.98   99.73      107.07  108.60   98.59  107.83       -       -        100.64    100.67
Jul     101.70  100.10  101.59  108.10  102.08  105.89      110.74  108.78  101.80  113.98       -       -        103.80    103.82
Aug     102.30  100.28  102.02  105.10  102.27  102.77      114.52  109.14  104.93  114.63       -       -        102.15    102.17
Sep      93.80  100.53   93.30   94.70  101.96   92.88      102.43  108.99   93.98  101.46       -       -         94.64     94.66
Oct      97.50  100.67   96.85  100.80  101.80   99.02      109.13  108.95  100.16  112.47       -       -         98.48     98.50
Nov     100.00  100.72   99.28  101.20  101.95   99.27      110.96  109.22  101.60       -       -       -         98.40     98.43
Dec     121.00  100.78  120.07  121.80  102.05  119.36      124.34  109.26  113.80       -       -       -        119.06    119.08
Sum                                                                                                              1199.73   1200.00
Note: Yt indicates the quantity index for sales of food, beverages, and tobacco (1990=100). Yt' is a centred 12-month moving average and YSI = (Yt / Yt') x 100.
Source: Statistikbanken.dk/DETA2 (Statistics Denmark).
The original time series, as well as the 12-term centred moving average, which is assumed to
smooth out seasonal swings, is shown in Figure 5.5.3.
There is a clear seasonal pattern in retail sales of food, beverages, and tobacco, where the
largest sales occur in December. This pattern disappears totally in the moving average values,
which show the course of the trend and cycle.
Figure 5.5.3. Quantity index and the 12-term moving average for sales of food, beverages, and tobacco, January 1990-December 1995.
[Figure not reproduced: line chart of the monthly series Q and Q(12), from 90:1 onwards; vertical axis from 80 to 140.]
Note: Q(12) is a centred 12-term moving average of the quantity index of the sales of food etc. (Q).
Source: Statistikbanken.dk/DETA2 (Statistics Denmark).
If you assume that the moving average only contains the trend and cycle components, the
following division17 of the components of the series in Figure 5.5.3 can be shown as:
YSI = (T x C x S x I) / (T x C) = S x I
The total index is divided by the moving average.18 The result becomes an index (time series),
cf. YSI in Table 5.5.2, that − apart from the irregular components − only contains the seasonal
component. Because the trend and cycle swings are, for the most part, eliminated in the new index,
the average for the 12 months in each of the years is approximately 100.
Here the calculations are carried out for January 1990 to October 2003, which gives up to thirteen
observations of YSI for each month. Because YSI can contain irregular elements (I), an average is
computed on the basis of these observations for each month so that the final result becomes an
index that only represents the S component;19 cf., the seasonal index in Table 5.5.2 and Figure
5.5.4.
Figure 5.5.4. Seasonal index of sales of food, beverages, and tobacco (constant prices).
[Figure not reproduced: chart of the seasonal index by month, Jan-Dec; vertical axis from 80 to 130.]
Source: Table 5.5.2.
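The steps of the ratio-to-moving-average method can be sketched in code. The following is a minimal illustration in Python, not the procedure used by Statistics Denmark; the function names are chosen here for illustration. The input data are copied from Table 5.5.2. Only 1990 and 1991 are used for the moving average, so the sketch reproduces the Yt' and YSI columns from July 1990 to June 1991, while the level adjustment uses the monthly averages given in the table.

```python
# Quantity index Yt for 1990 and 1991, copied from Table 5.5.2.
y_1990 = [90.20, 86.00, 98.80, 99.10, 106.70, 102.90,
          101.70, 102.30, 93.80, 97.50, 100.00, 121.00]
y_1991 = [92.70, 87.60, 103.40, 97.80, 109.20, 101.70,
          108.10, 105.10, 94.70, 100.80, 101.20, 121.80]
series = y_1990 + y_1991


def centred_12_month_average(values):
    """Centred 12-month moving average (13 terms, half weight on the ends)."""
    result = [None] * len(values)
    for i in range(6, len(values) - 6):
        window = values[i - 6:i + 7]
        total = 0.5 * window[0] + sum(window[1:12]) + 0.5 * window[12]
        result[i] = total / 12
    return result


def ratio_to_moving_average(values, smoothed):
    """YSI = (Yt / Yt') x 100 wherever the moving average exists."""
    return [None if s is None else v / s * 100
            for v, s in zip(values, smoothed)]


def level_adjust(monthly_means):
    """Scale the twelve monthly means so that they sum to 1200."""
    factor = 1200 / sum(monthly_means)
    return [m * factor for m in monthly_means]


smoothed = centred_12_month_average(series)   # Yt'
ysi = ratio_to_moving_average(series, smoothed)

# Monthly averages of YSI over all years, copied from Table 5.5.2.
monthly_means = [92.00, 88.06, 99.42, 99.07, 104.01, 100.64,
                 103.80, 102.15, 94.64, 98.48, 98.40, 119.06]
seasonal_index = level_adjust(monthly_means)
```

Because the monthly averages in the table are already rounded to two decimals, the level-adjusted values can differ from the printed seasonal index in the last decimal.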
17 An additive relationship between T, C, S, and I means using an addition/subtraction similar to the procedure here.
18 You can multiply by 100 throughout the calculation if you want to maintain an index level in that form. The sketched method is called the "ratio-to-moving-average method".
19 The average for the year comes close to 100 but can deviate a little (due to rounding of numbers and incomplete smoothing of all non-season determined components), in which case the index is level-adjusted as in Table 5.5.2, where each monthly average of YSI is multiplied by 1200 / 1199.7.
Given this seasonal index, the original time series can now be seasonally corrected. By dividing
the total index by the seasonal index, the series is cleared of seasonal swings. The result is
shown in Figure 5.5.5, together with Statistics Denmark's published seasonally-corrected
quantity index for the same data.
Figure 5.5.5. Seasonally-corrected quantity index of sales of food, beverages, and tobacco, "calculated" and DS, 2000-2001.
[Figure not reproduced: line chart of the monthly series Original data, DS and Calculated, January 2000-December 2001.]
Note: "Calculated" denotes the seasonally-corrected index computed as described in the text. Statistics Denmark's
seasonally-corrected index (DS) is shown as well as the original uncorrected time series.
Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and Table 5.5.2.
For clarity's sake, only the values for 2000-01 are shown. The calculated seasonally-corrected
index deviates a little from that of Statistics Denmark's published index, partly because the
method for calculation is relatively simple, but also because the calculations here are carried out
on data that only covers the period 1990-2003.
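The correction itself is a single division per month. As a minimal sketch in Python, using the 2002 values of Yt and the seasonal index from Table 5.5.2 (the function name is chosen here for illustration):

```python
# Quantity index Yt for January-December 2002, copied from Table 5.5.2.
y_2002 = [99.50, 93.80, 110.53, 104.92, 114.74, 107.07,
          110.74, 114.52, 102.43, 109.13, 110.96, 124.34]

# Seasonal index for January-December, copied from Table 5.5.2.
seasonal_index = [92.02, 88.08, 99.44, 99.09, 104.03, 100.67,
                  103.82, 102.17, 94.66, 98.50, 98.43, 119.08]


def seasonally_correct(values, index):
    """Divide each monthly value by that month's seasonal index (x 100)."""
    return [v / s * 100 for v, s in zip(values, index)]


corrected = seasonally_correct(y_2002, seasonal_index)

# Compare the spread between the highest and lowest month before and after.
spread_raw = max(y_2002) - min(y_2002)
spread_corrected = max(corrected) - min(corrected)
```

The December value, for example, falls from 124.34 to about 104.4 once the December index of 119.08 is divided out, and the spread between the highest and lowest month of the year becomes much smaller.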
Seasonal correction is made at Statistics Denmark (and usually also at other statistical
agencies) with the help of a programme called X-11, developed by the U.S. Bureau of the Census
in the 1960s; X-12 is the latest version of the programme. The programme is capable of
seasonally correcting quarterly and monthly data. It separates a time series into a
trend and cycle component (TC), a seasonal component (S), and an irregular component (I).
Differences in the number of working and trading days over the year can also be corrected for.
The calculation procedure builds on the moving-average technique, using a centred 12-month
moving average to establish the T-C components. This is used, as was shown earlier, to
obtain a first estimate of the S-I components. Going through several iterations of calculations,
using various forms of the moving average, yields an adjusted (final) estimate of the seasonal
component. In this connection, an attempt is made to isolate the I component, and extreme
values (outliers) are given less weight so that their influence is reduced. Given the output
possibilities in X-11, the original time series can be divided up into a trend-cyclical component,
a seasonal component, the irregular component, and of course, a seasonally-corrected time
series.
To illustrate the last point, the time series used in Figure 5.5.3 is seasonally corrected with the
help of X-11. Only the period 1990-2003 is used, and a correction has not been made for the
number of work days, which is why the results will deviate from the earlier shown seasonally-
corrected numbers from Statistics Denmark. For clarity's sake, only the results for 2000-01 are
presented again, cf. Figure 5.5.6.
Figure 5.5.6. Seasonally-corrected quantity index of sales of food, beverages, and tobacco, "calculated"
and X-11, 2000-01.
[Figure not reproduced: line chart of the monthly series X11 and Calculated, January 2000-December 2001.]
Note: The calculations are carried out using the time series programme SAS/ETS. The index "calculated" is as
shown in Figure 5.5.5., and X-11 is calculated on data covering the period 1990-2003 (corresponding to the data set
that the original 12-month average was computed from).
Source: Statistikbanken.dk/DETA2 (Statistics Denmark), and the X11 procedure.
There is close agreement between the result from X-11 and the earlier manually-computed index
− the differences have no practical significance. Correspondingly, the seasonal index produced
by X-11 (not exhibited) is nearly identical to that presented in Figure 5.5.4.
5.5.4. Trends and cycles
An example of a trend is seen in Figure 5.5.7, where GDP in constant factor prices is shown
from 1900 to the present. For certain sub-periods, the course is somewhat smoothly increasing,
that is, at a constant growth rate over time. As shown earlier in Section 5.5.2, a development
that can be described by an exponential function − as that in Figure 5.5.7 resembles
approximately − will have a constant growth rate. It is seen in the figure here that, by applying a
logarithmic scale to GDP, the graph becomes partly linear − the slope is determined by the GDP
growth rate. For the indicated period, there are cyclical swings and reactions to certain events,
such as wars and oil price shocks. When yearly data is used, seasonal variation will be
eliminated (which can be an advantage in that there is one less component to isolate in a given
time series).
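The reason a logarithmic scale turns exponential growth into a straight line can be written out in one step. In the sketch below, Y_0 (the value in the first year), g (the constant yearly growth rate), and t (the number of years since the first year) are symbols introduced here for illustration:

```latex
Y_t = Y_0 (1 + g)^t
\quad\Longrightarrow\quad
\ln Y_t = \ln Y_0 + t \,\ln(1 + g)
```

The logarithm of the series is a linear function of time with slope ln(1 + g), which is approximately equal to g when the growth rate is small.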
The trend in a time series can be of different types − the exponential function has already
been mentioned. A second type is a simple, linear relationship over time. A third type is the
logistic curve, which has an S-shape.
Figure 5.5.7. GDP at factor prices, 1900-2002 (in billions of 1995-kroner).
[Figure not reproduced: two line charts of GDP, 1900-2000 on the horizontal axis. The first has a linear vertical axis in kr. billion; the second has a logarithmic vertical axis in kr. billion.]
Source: Sv. Aa. Hansen: Økonomisk vækst i Danmark (Economic Growth in Denmark) Copenhagen, 1974;
Adam's databank; Statistikbanken.dk/NAT07.
If a time series consists of, for example, monthly data for which a seasonal index has been
computed, the trend and cycle components can be established based on this seasonal index. This
can be illustrated with data used earlier for sales of food, beverages, and tobacco. In order to
make a judgement about the trend, the time period must be relatively long, and in the present
case data from 1990 has been used.
First, a seasonal correction is made on the entire series by dividing the monthly values for the
individual years by the seasonal index, as shown earlier in Figure 5.5.4. The result appears in
Figure 5.5.8.
The seasonally-corrected values exhibit significantly fewer fluctuations than the original
series. At the same time, it can be seen that the projection of a trend forward from the period
1990-2003 appears less favourable since the increasing trend here does not seem to apply to the
future periods (the upper part of Figure 5.5.8).
The seasonally-corrected curve contains only T, C, and I components. The second step in the
analysis is to evaluate the trend, which here has led to the assumption of a constant growth rate
over the whole period (exponential growth). With the help of regression analysis,20 this trend is
determined from the seasonally-corrected values, which are sketched in the lower diagram of
Figure 5.5.8.
20 Under the assumption about exponential growth, the trend is determined as the curve for which the sum of the squared distance between the observations and the curve is minimized.
Figure 5.5.8. Quantity index of sales of food, beverages, and tobacco, January 1990 - October 2003 (1990=100).
[Figure not reproduced: two line charts, 90:1-03:1 on the horizontal axis. Upper panel: Original data. Lower panel: Seasonally-corrected data.]
Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and own calculations.
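One common way to estimate a trend with a constant growth rate is to fit a straight line to the logarithm of the seasonally-corrected values. The sketch below assumes Python and uses a hypothetical series; the function name is chosen here for illustration. Note that this variant minimizes the squared distances in logarithms, which is an assumption made here and is not necessarily the exact criterion used for Figure 5.5.8.

```python
import math


def fit_exponential_trend(values):
    """Fit Y_t = a * (1 + g)**t by least squares on ln(Y_t).

    Returns (a, g), where t = 0, 1, 2, ... indexes the observations
    and g is the growth rate per period.
    """
    n = len(values)
    t = list(range(n))
    logs = [math.log(v) for v in values]
    t_mean = sum(t) / n
    log_mean = sum(logs) / n
    slope = (sum((ti - t_mean) * (li - log_mean) for ti, li in zip(t, logs))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = log_mean - slope * t_mean
    return math.exp(intercept), math.exp(slope) - 1


# Hypothetical seasonally-corrected monthly index that grows by exactly
# 0.1% per month over 166 months (the length of 1990:01-2003:10).
series = [100 * 1.001 ** t for t in range(166)]

a, g = fit_exponential_trend(series)
trend = [a * (1 + g) ** t for t in range(len(series))]
annual_growth = (1 + g) ** 12 - 1
```

With real data, the deviations of the seasonally-corrected values from the fitted trend would then be interpreted as the cycle and irregular components.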
A similar exercise can be done with the data from Figure 5.5.3, where a 12-term centred
moving average seemed to smooth out the seasonal pattern. With data for the period 1990-
2003, the (seasonally) adjusted data and the trend are exhibited in Figure 5.5.9. One interpretation
of the cycles or fluctuations around the linear trend is that they illustrate the business cycle
component of the original series.
Figure 5.5.9. Trend and cyclical components of the sales of food, beverages, and tobacco (Index,
1990=100).
[Figure not reproduced: line chart, 1990-2003 on the horizontal axis; vertical axis from 100 to 112.]
Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and own calculations.
With this, the decomposition of the original time series is finished. But you must still remember
that behind the calculations are some (self chosen) assumptions, which is why the final
decomposition may not represent the "true" picture. This means that the trend in the last figure
can be rightfully criticized, because it does not explain much of the variation in the time series.
A corresponding trend with a constant growth rate would be a much better fit for the GDP data
in Figure 5.5.7.
6. Making commentaries
All tables and figures included in the paper must be discussed in the paper. The purpose of this
section is to provide some guidelines as to what types of comments are appropriate.
The paper must be written in clear language that is easily read and does not use complicated
and twisted sentence construction. Banal language such as slang, catchwords, or clichés is to be
avoided, as are the "I" and "we" forms and similar diction. The
following sentences have been taken (and translated) from previous empirical papers as
examples of what has not worked, mainly because of the lack of substance in the words:
o Everyone knows that sales have been nothing, but ideal until now.
o It has always been a popular, vogue, fashionable, favourite phenomenon to compare us
with each other here in Scandinavia.
o Now it all hangs together ...
o One thing is imports and exports − a relatively positive experience − but what about
interest rates?
In addition, you must avoid exaggeration and assertions that are not covered in the material
used in the paper. The following sentences have been taken (and translated) from previous
papers as examples of what has not worked, mainly because of the lack of documentation
behind the statements:
o We don't have to go far into the future before energy becomes a scarce commodity.
o It is a known fact that youth are more environmentally-minded than the elderly.
o Bank failure on the Faro Islands has been an everyday affair.
o The development has been full of swindles and diverse allegations.
With the help of a few well-chosen sentences, comments must point toward the patterns that the
tables and figures reflect, without repeating the data itself in the text. An example of an
appropriate comment concerning Table 5.1.3 follows:
Table 5.1.3 shows first that, in the period 1970-2006, a shift occurred in age distribution resulting in relatively more elderly individuals and relatively fewer children and young adults, both male and female. Second, the table shows that, during this period, the population increased. And third, the table shows that there are more females than males among the elderly and fewer females than males among the children and young adults.
The comment is short and precise. The type of material used to analyze the issues under
investigation influences the type of commentaries that should be made. If the focus of the paper
has been, for example, on public sector expenditures, comments should be directed toward the
effect the stated changes in age distribution could have on these public sector expenditures. So
the comment should not just be short and correct, it must be relevant to the problem under
investigation.
An important aspect in the comment might be to point toward what might be lacking in terms
of available material (or material adequacy) in light of the chosen formulation of the statement
of the problem. The material is adequate for the statement of the problem when it allows a
substantial analysis of the problem.
Material inadequacy can also result from a lack of material concerning explanatory factors. If
a decisive explanatory factor has not been accounted for in the analysis, you will most likely
draw incorrect conclusions, as earlier mentioned. Therefore, if material for an important
explanatory factor is lacking, you should say so in the comment. In general, the points
in the previous discussion of causal analysis are all relevant for evaluating the adequacy of the
material used in the paper.
In conclusion, a good commentary
1. is written in precise and concise language − remember to use correct punctuation.
2. highlights the patterns in the material without a long-winded discussion of the individual
elements.
3. to a greater or lesser extent, addresses the adequacy of the material with respect to both
conceptual understanding and a lack of material.
4. contains assessments of the explanatory value of the included material seen in the context
of the statement of the problem and therefore contributes to continuity between the
sections.
7. Construction of the report
This section covers the formal demands, not previously mentioned, for preparing the empirical
paper.
The paper begins with a title page. After the title page comes a table of contents, which
overviews the sections included in the paper, presenting the section title, number and the page
on which the respective section begins, cf., the table of contents for these guidelines.
The first section is called the introduction and is used for a discussion of, and a justification
for, the chosen statement of the problem and of the chosen delimitations. A start in setting the
delimitations might be the defining of the central concepts. It should be pointed out that you
should not bother to define concepts that the audience is expected to be familiar with. The
introduction should also be used to point out aspects of the problem that could possibly be
relevant/interesting, but which are not to be treated.
The introduction binds the succeeding sections together in that these sections present relevant
material that is first introduced in the introduction. As mentioned, the statement of the problem
is the control mechanism for the succeeding phases of work. The introduction is, therefore, the
control mechanism for all succeeding sections of the report. It should be emphasized that the
introduction must not be a verbalization of the table of contents, and you should not start by
saying that the purpose of the paper is to give an account of that which stands in the title. This
ought to be obvious and is, therefore, unnecessary to mention. Finally, it should be mentioned
that data material does not normally appear in the introduction.
In a paper 15 pages in length, where the choice of method, etc., does not require an in-depth
discussion, the introduction will typically fill one page maximum.
In the sections following the introduction, the statement of the problem is addressed using the
collected data and information. Normally, the base material is located in the second section. The
remaining sections are used, then, to account for the development in this material. Both sections
and sub-sections can be used. Every section treats a sub-problem of the statement of the problem
and will, as a rule, comprise at least one-half of a page. The sections must follow each other in a
logical order and with a reasonable weight, determined relative to the problem at hand. The ordering
and weighting of the sections are given considerable weight in the evaluation of the paper. A
sensible weighting also involves the choice of which material is to be visualized and in what
form it is to be visualized. Note that data already presented in one visual form
are only exceptionally presented again in another. For example, rather than treating the same data in both figure and
table forms, you might instead include additional explanatory material and thereby reach a
greater depth in the analysis.
Use short and precise section titles and avoid having tables and/or figures follow immediately
after each other. Comments should be used, instead, to "encircle" the tables or figures that are
being referred to. A section should not, under normal circumstances, begin with a table or
figure, but rather a text. Text and tables and figures are separated with an extra line so there is
space between, for example, a table's source and the surrounding text. However, to avoid large
empty spaces (typically at the bottom of the page), it might be necessary to separate comments
and tables or figures from each other in the text.
The final section in the paper is called the conclusion and is used to summarize the most
important conclusions reached in the text. Reading just the introduction and the conclusion
should be enough to give the reader the essence of the report − this is a great advantage for the
busy reader. New aspects or information must not be treated in the conclusion, and all clichés
about what the future might bring should be avoided − they are subjective predictions about
future development that have no basis in the material. The sections must be numbered, and the
section titles must be marked clearly, for example, with underlining or by using bold or italic
type.
After the conclusion comes the reference list, which appears on a separate page and presents
an unambiguous overview of the utilized sources. That the reference list is unambiguous means
that each source must be cited with enough information to uniquely
identify the source.
The following rules should be used: books are written with author(s), title, location of publisher,
publisher, year − for example, Andersen, T. M. et al.: The Danish Economy, 2nd edition, DJØF
Publishing, Copenhagen, 2006. Note that the author's name is written last name first and that, as
in the example given, only one author's name is written, followed by 'et al.', when more than three
authors are associated with a given text. With two or three authors, the first name appears as
noted above, and the following names appear with first name first. Note further that the title of
the work might appear in italic type (although style may dictate that it appears in bold).
Periodicals are written with author(s), title of work, title of periodical, year, volume (if any),
issue, and page number(s) − for example, Bentzen, J.: An empirical analysis of gasoline demand
in Denmark using cointegration techniques, Energy Economics, Vol. 16, No. 2, 1994, pp. 139-
143.
Statistics are referenced by organization issuing the statistics, title, year, number – for
example, OECD: Annual National Accounts – Volume 1 – Main aggregates, 2007.
Normally, homepage and database addresses are placed at the end of the reference list.
If there are appendices, they come after the reference list. The appendices contain the raw data
and other information that was used to establish the base, comparative, and explanatory material
used and presented in the tables and figures in the text. The appendices should not include
copies of tables from the statistical sources. The appendices might also contain an account of the
calculations used to obtain further representations of the data than those used in the text. Presentation
of these results gives the reader the chance for replicating the presented material. A legal text
can also be included in the appendix material. In general, the appendices should be used for the
material that ties the material used in the text back to the original form of the data and for that
material which was not directly used in analyzing the problem. In this vein, the appendix
material is not directly discussed in the text, but perhaps referred to.
References
Adam's databank, Statistics Denmark
Andersen, T.M. et al.: The Danish Economy, DJØF Publishing, Copenhagen, 2006.
Danmarks Statistik: NYT, No. 321, 1993.
Danmarks Statistik: Statistisk tiårsoversigt (Statistical ten-year review) (STO).
Danmarks Statistik: Statistisk årbog (Statistical yearbook) (SÅ), 2006.
Danish Energy Agency: Energistatistik (Energy statistics), 2002.
Hansen, Sv.Aa.: Økonomisk vækst i Danmark (Economic Growth in Denmark) Copenhagen,
1974.
Meadows, D, et al.: The Limits to Growth, The New American Library, Inc., 1972. Middle East
OECD: Energy Balances of OECD Countries, 2003 Edition.
The World Bank: World Development Report, 2006.
The World Bank: World Development Indicators.
www.statistikbanken.dk
www.opec.org – Annual Statistical Bulletin 2006.
www.bp.com – Review of World Energy. 2001.