Department of Economics Internt undervisningsmateriale K nr. 44 Guidelines for Writing Papers in Descriptive Economics Prepared by Hans Linderoth and Jan Bentzen 2007




Foreword

Guidelines for Writing Papers in Descriptive Economics is based upon a rewriting and updating of data in Guidelines for Writing Empirical Papers in the Social Sciences Using Statistical Material (HHÅ, 1996). We thank cand. rer. soc. Birgit Nahrstedt for updating some of the data and secretaries Ann-Marie Gabel and Bodil Rasmussen for making the publication ready for press.

Hans Linderoth, Docent, cand. oecon., ph.d.
Jan Bentzen, Lektor, cand. oecon.


1. Introduction
2. Phases of the work
3. Formulating the statement of the problem
   3.1. Base material
   3.2. Comparative material
   3.3. Explanatory material
        A. Correlation
        B. Background factors
        C. The direction of causation
        D. The cause and effect mechanism
        E. Short and long run
        F. The selection of explanatory factors - final comments
   3.4. Examples of the selection of explanatory material
        3.4.1. Economic growth
        3.4.2. Regional differences in income
        3.4.3. Space heating in private households
4. Collecting data and other material
5. Working with the material
   5.1. Table construction
   5.2. Figure construction
        5.2.1. Technical criteria
        5.2.2. Logarithmic scales
        5.2.3. Bar and pie diagrams
        5.2.4. Scatter diagrams
        5.2.5. Lorenz curves
   5.3. Using comparative and explanatory material
   5.4. Standardizing means
        5.4.1. Price and quantity (volume) indexes
   5.5. Analyzing time series data
        5.5.1. The elements of a time series
        5.5.2. Moving averages
        5.5.3. Seasonal correction
        5.5.4. Trends and cycles
6. Making commentaries
7. Construction of the report
References


1. Introduction

The following guidelines are intended for the student writing an empirical paper on a subject in

descriptive economics. The guidelines present the means and methods to be used in the

preparation of that paper and are applicable to analyses of subjects for which statistical material

is the primary source. The techniques presented here are relatively elementary and are intended

for students in their first year of study. However, in many cases, these techniques are relevant for

the writing of all papers assigned throughout the course of study.

The empirical paper has a form that is typically used in short reports that are delivered to, and

received from, public authorities or private management. These reports contain issues and

problems that are also defined, described, and analysed on the basis of collected statistical

material, etc.

The following pages contain a description of the work phases one goes through in producing

such a report. The content focuses on the methods, based on both technical and analytical

principles, for producing reports. The technical principles include techniques for calculating

statistical data, for producing appropriate tables, etc., and the analytical principles include

techniques for formulating the problem, conclusions, etc. It should be emphasized that not all

methods presented here are relevant for all subjects tackled in a paper. The student must decide

for himself or herself which methods and techniques are appropriate for a successful result.

2. Phases of the work

The empirical paper starts with the idea, the issue and/or the problem one wishes to investigate.

These ideas/issues/problems must be precisely defined and set out in the form of a statement of

the problem, which in a detailed and concrete manner forms the agenda for the ensuing

investigation. Such an agenda demands that all concepts and terms used in the investigation are

also precisely defined, that all relevant questions or sub-problems that are to be answered in the

investigation are formulated, and that the method to be used in arriving at the solution is

presented. A paper in descriptive economics is, for the most part, to be based on statistical

material published by institutions in the public sector. The method entails, therefore, an

identification of the statistical material that is to be collected.


For example, if the issue or problem is related to the state's future ability to provide for the

elderly, the first step in working toward the statement of the problem must be to define the

concept "ability to provide". The ability to provide might be defined or measured, for example,

as public sector disbursements to the elderly in relation to total public sector disbursements or in

relation to tax revenues.

In the process of formulating the statement of the problem, one is often led to a narrowing of

the issues and problems in the investigation. For example, in referring to the example above, one

can choose to ignore service-based public disbursements (disbursements to homes for the

elderly, hospitals, etc.) so that the investigation only considers income transfers to the elderly.

The definitions or delimitations selected will suggest the formulation of sub-problems: How

many persons will, in the future, be counted in the older age groups? How many of the elderly

will receive various types of income transfers, etc? These sub-problems indicate a concrete need

for material in the form of population projections, pension payments, early retirement benefits,

etc.

In general, working toward the statement of the problem requires a kind of brainstorming in

which one dissects the problem, sets the delimitations, and presents a method for solving the

problem. Formulating the statement of the problem (phase 1) is treated in depth in the next

section.

When the statement of the problem has been formulated, the collection phase can begin. Then

the collected material is processed, analysed and commented on. Finally, all the work is strung

together into one report. It is important to take the work phases in the order sketched in Figure 1

and, for example, not to begin immediately with collection of the material without first having

the steering mechanism in place. If the order is not followed, there is a danger that you will collect

and waste time on a great deal of irrelevant material and that your seminar paper will consist of loose,

unrelated parts. It is, however, a good idea to read relevant chapters in, for example, The Danish

Economy before working out the statement of the problem in that knowledge of your subject is a

prerequisite for being able to formulate the statement.


Figure 1. Phases of work for the report.

Idea/issue/problem
→ Formulating the statement of the problem (Section 3)
→ Collecting the material (Section 4)
→ Working with the material (Section 5)
→ Making commentaries (Section 6)
→ Construction of the report (Section 7)

When collecting data and information (phase 2), you might encounter previously unnoticed

material and points of view that can be relevant to the analysis and which make it necessary to

revise the statement of the problem. Therefore, you should be prepared to differentiate between a

draft statement of the problem, which results from preliminary investigations during the

beginning phase of the work, and the definitive statement of the problem, which gets decided

after a substantial amount of material has been collected. You might also encounter new material

along the way that suggests a narrowing or broadening of the focus on the issue or problem being

investigated.

In processing and manipulating the data and information (phase 3), you might feel compelled

to work again with earlier phases of the work. For example, if a calculated annual growth rate

reveals that a noticeable change has occurred in a particular year, it might be necessary to find

relevant explanatory material pertaining to that year. The same kind of need may arise when

working in the analysis phase (phase 4). Because the process of discovery and preparation is not

necessarily linear, you may wind up cycling around several times between the phases of the

work. These various phases will be described in depth in the following sections.


3. Formulating the statement of the problem

In the previous section, it was indicated that the formulation of the statement of the problem

results in a list that serves to identify materials (statistics, legislation, analyses, etc.) needed to

carry out the investigation. The formulation itself could also contain a general description of how

the issues are to be described, analysed, and judged, given these materials.

The following sub-sections will describe three categories of data and information, all of which

must be included in producing the report. Base material comprises the data and information that

the title of the report directly reflects. If the topic is the deficit in the state budget, basic data and

information will comprise statistics that measure this deficit. If the topic is U.S. oil imports, the

statistical material will, of course, include quantity and value of these imports.

Comparative material comprises the data and information that is used as a standard or scale

against which the base material is compared. For example, the deficit in the state budget could be

compared with the state's revenues, the deficit in other countries, and/or the GDP, all of which

can be used for evaluating the seriousness of this deficit.

The third category is explanatory material, which comprises data and information that is used

to deepen the analysis by explaining the course of development, movement, and/or changes that

have been demonstrated in the base and comparative material. For example, why has the deficit,

measured in relation to revenues, risen by a particular amount in a particular period? The

following section will treat these categories more thoroughly.

3.1. Base material

If there is any doubt about the meaning of the terms used in the title of the paper, these terms

must be clearly and precisely defined. A distinction should be made between theoretical terms

and operational terms. Theoretical terms can be defined more precisely using other, more well-

known terms, while operational terms must be defined more precisely using measurement

methods. Often, operational definitions are provided in the explanations of terms in the texts in

which the data is found. Theoretically, an unemployed person can be defined as a person without

work who wants to work, and is able to work, for a wage that is normally paid to persons with similar

qualifications. If one uses statistics that only include individuals eligible to receive


unemployment benefits, however, the theoretical and operational definitions will not agree

because there will always be a group of out-of-work individuals who would and could work, but

who are not eligible to receive unemployment benefits. In such a case, the operational term is not

considered adequate for the theoretical term.

If writing a paper titled "Market Sensitivity of the Textile Industry", you must decide how to

measure market sensitivity. It would be reasonable to use a measure of production that could be

related to measures of production used in other industries and/or other branches. Such a measure

could be used to determine whether the textile industry is more or less sensitive than other

branches to market fluctuations.

A paper titled "Kuwait's Economy" also demands theoretical considerations. It must be

decided exactly how to define and measure "economy". Data and information regarding the

national accounts, balance of payments, government budgets, etc. might be part of that

definition. But in a 15-page paper, there is not enough space for a comprehensive economic

description of any country. One must choose the economic factors considered most important for

the country of interest and be ready to justify that choice.

GDP per capita in constant prices is the term usually used to indicate development in a

country's economy. It is problematic to use the term in certain cases because GDP per capita in

constant prices can fall during a period, or the growth rate can fall, even if the country has

obviously become much more prosperous. For example, this might be the case for an oil-

exporting country after a distinct oil price rise coupled with reduced oil exports. The reduced oil

exports will, ceteris paribus, reduce GDP in constant prices, but the increased oil revenues can be

used to increase consumption via increased imports. In this case, it would at least be natural to

supplement GDP per capita with consumption per capita as a measure of economic welfare.

The Gulf War has also been used as a seminar topic, from an economic perspective. This topic

demands a precise discussion of which economic factors might be the most important for the

subject and should, therefore, be included.

Clarification of the topic "Fuel Oil Consumption" demands neither theoretical nor operational

considerations as it deals with the consumption of a well-defined good, which is reported

quarterly in the statistics. A long list of examples could be presented here for which the

clarification of terms is not necessary. It is the responsibility of the author to decide if the paper

title contains problem words requiring clarification.


In addition to clarifying the meaning of the terms used in the base material, clarification should

also be made with respect to the choice of time period and the degree of detail. Deciding the time

period involves not only the choice of the year in which the investigation begins, but also the

interval of years used throughout. If the paper focuses on a 10-year period, it is not in all cases

necessary to include material from all 10 years. It can sometimes be a good idea to divide the

whole period up into sub-periods, for example. Regarding the end point in the time period, it is

important to use the most recent material available.

The degree of detail concerns the division of the base material into sub-groups. For a subject

involving age distribution, you must decide how many age groups and what range of ages to use.

For a subject involving Denmark's energy consumption, you must decide whether to divide

consumption by energy products or consumption sectors. Should consumption, thereafter, be

divided into all different forms of energy products, such as petrol, fuel oil, coal, brown coal, etc.,

or only divided into oil products, solid fuels, etc.? For a subject involving industrial structure,

should the manufacturing industry be treated according to its sub-industries, or should the

industry be treated as an aggregate? The division of material into sub-groups is usually an

essential part of the analysis.

The optimal degree of detail is determined by the objective of the paper. The student often uses

an unnecessary amount of detail, and this results in a paper with large, unclear tables in which

patterns in the material of interest are difficult to figure out.

Considerations about the use and definition of terms, time periods, and degree of detail are

relevant to not only the base material, but also to comparative and explanatory material. This

applies as well to consistency between the three material categories. Students often mistakenly

use inconsistent time periods when discussing base material, comparative material, and

explanatory material − either the intervals are different and/or the beginning and ending dates do

not match. As a rule, this does not work, especially because the comparative and explanatory

material must relate directly to the base material.

3.2. Comparative material

It was mentioned earlier that comparative material should be used as a context for the base

material. For example, an analysis of the wages of primary and lower secondary school teachers


might also be compared to the wages of individuals in other occupational groups. Or, an analysis

of the history of employment in a particular sector might be related to employment in other

sectors and/or to employment in that sector in other countries. Without comparative material, it is

not possible to judge if the base material reflects a high, an average, or a low value in a given

year, or if a growth rate measuring development, changes, or movement is high, average, or low.

In a paper written about copper, information indicated that the total amount of copper in

manganese deposits on the sea floor had been estimated at 3 billion tons. Such information

cannot stand alone. It must be related to copper consumption and/or quantities of copper from

other sources. In Meadows (1972), it is stated that:

Given present resource consumption rates and the projected increase in these rates, the great majority of the currently important non-renewable resources will be extremely costly 100 years from now. ... The price of mercury, for example, has gone up 500 percent in the last 20 years; the price of lead has increased 300 percent in the last 30 years.

It is clear that the last sentence is included to make more credible the prediction of a steep rise in

the future price of raw materials given that the prices of some raw materials have already begun

to rise sharply. However, the question is whether the prices of mercury and lead really have risen very

much. That cannot be concluded without comparative material in the form of price movements

of other related products, for example. Besides, a price rise of 300% over the course of 30 years

is equivalent to an annual rate of about 4¾%, which is hardly more than the rise in prices for many other

products.
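The conversion from a total rise to an annual rate can be checked with a few lines of code. The sketch below assumes that the rise compounds at a constant yearly rate; the function name annual_rate is hypothetical and chosen only for this illustration.

```python
def annual_rate(total_rise_percent, years):
    """Constant annual growth rate (in percent) that compounds to the given total rise."""
    # A rise of 300% means the price ends at 4 times its starting level.
    growth_factor = 1 + total_rise_percent / 100
    return (growth_factor ** (1 / years) - 1) * 100


# The two price rises quoted from Meadows (1972)
lead_rate = annual_rate(300, 30)      # 300% over 30 years
mercury_rate = annual_rate(500, 20)   # 500% over 20 years

print(f"Lead:    {lead_rate:.2f}% per year")
print(f"Mercury: {mercury_rate:.2f}% per year")
```

Under this assumption, the 300% rise in the price of lead corresponds to roughly 4.7% per year and the 500% rise in the price of mercury to roughly 9.4% per year. Both figures still need comparative material, such as the general rise in prices over the same years, before they can be called high or low.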

In another paper, it was mentioned that cultivated land area in Iraq increased by less than 2%

between 1979 and 1989. The paper concluded by saying "there had been very little change in the

size of cultivated land". That seems reasonable given that 2% is a modest number in many cases.

But cultivated land area changes very little over a decade, and in most countries, this area

actually decreases in size. Therefore, seen in the context of world-wide changes, a 2% increase is

relatively large.

In a third paper, it was stated that grain was the most important agricultural product. This

conclusion was based on the total quantity of production in tons. However, the production of

grain should be compared in value terms with the production of other agricultural products if the

intention is to identify the most important agricultural product. It would actually be best to use


value added as a measure for value.

Comparative material forms the context for evaluation. The better this context, the more in-

depth an analysis of the base material can be made, thus leading to a greater understanding of the

issue under investigation.

3.3. Explanatory material

An analysis including only base material, supplemented possibly by comparative material, can

only answer how, when, and what questions. The purpose is to map out the objects of the

analysis using a certain amount of information. For example, in 2006, there were x unemployed

persons on average per week, of which y were ... etc. For another example, in the period 1972-

2006, oil consumption fell by x PJ of which y PJ is due to a fall in oil consumption used for

space heating, z PJ is due to a fall in oil consumption in the utility sector, etc. Such a

decomposition of the total can be said to explain some of the development in the total number.

You can, but only to a certain degree, respond to the "why" question.
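A decomposition of this kind is simple arithmetic and can be sketched as follows. The sector names and all PJ figures below are invented purely for illustration and are not taken from any energy statistics; the function name decompose_change is likewise hypothetical.

```python
# Hypothetical oil consumption by use, in PJ (illustrative numbers only)
consumption_start = {"space heating": 200.0, "utility sector": 150.0, "transport": 100.0}
consumption_end = {"space heating": 80.0, "utility sector": 40.0, "transport": 130.0}


def decompose_change(start, end):
    """Split the change in the total into the change contributed by each sub-group."""
    changes = {group: end[group] - start[group] for group in start}
    total_change = sum(changes.values())
    return total_change, changes


total_change, changes = decompose_change(consumption_start, consumption_end)

print(f"Total change: {total_change:+.0f} PJ")
for group, change in changes.items():
    # Share of the total change; a negative share means the group moved against the total.
    share = change / total_change * 100
    print(f"  {group}: {change:+.0f} PJ ({share:.0f}% of the total change)")
```

The output shows which sub-groups account for the change in the total, but not why each of them changed. That question is left to the causal analysis.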

A deeper analysis of "why", however, requires a cause and effect (or causal) analysis.

A causal analysis provides the greatest knowledge about the objects of the analysis. The material

that supplements the base and comparative material in a causal analysis is called the explanatory

material. The purpose of a causal analysis is to establish a factor C as the cause of a particular

effect E.

E can be the number of unemployed individuals divided into groups by characteristics at

different points in time. For example, the purpose of a causal analysis might be to explain why

unemployment is larger in North Jutland than in other parts of Denmark, or why unemployment

is larger among women than among men. The purpose might be to establish a causal relationship

between occupation and mortality, between marital status and mortality, or between income and

mortality.

C (independent, explanatory factor) → E (dependent, explained factor)


In the following sub-sections, the discussion covers the considerations one should take

into account in making a causal analysis. These considerations are relevant both to the choice of

explanatory material and to the conclusion phase of the paper (cf. Section 6).

A. Correlation

The material should reveal a pattern between C and E. If C is occupation and E is mortality, the

pattern might consist of a large difference in mortality rates among the occupational groups.

Which occupations are hazardous and which are not? To the extent there is no difference,

occupation is not an explanatory factor in an analysis of mortality.

Occupation is an example of a qualitative variable, the value of which cannot be measured or

expressed in numbers. Other examples of qualitative variables include gender, municipalities,

countries, and marital status.

As opposed to qualitative variables, quantitative variables can be expressed in numbers.

Examples of quantitative variables include age, height, product, and income. Instead of using

occupation as an explanatory factor in the analysis of mortality, one can choose income, as

mentioned earlier.

These guidelines do not contain a discussion of the statistical tests used to determine if two or

more variables are correlated. You must be content to compare the variables using a sketched

figure based on the respective variables' values or by listing these values in a table and looking

for the pattern in the material.

If large values of C correspond to large values of E, then the correlation is positive. If large

values of C correspond to small values of E, then the correlation is negative. For example, if

unemployment (E) is larger this year while economic growth (C) is lower, then the correlation is

negative. As a rule, there is a positive correlation between consumption and income and a

negative correlation between consumption and the price of a good.¹ Looking for positive and

negative correlations is, of course, only meaningful in an analysis of relationships between

quantitative variables.

The degree of the relationship or correlation between two variables can be measured by R²,

which indicates the degree of linearity between the variables in question. In scatter diagrams,

Excel can display R² values on charts. The higher the value of R², the stronger is the relationship

between the variables. If R² is one, there is a perfect linear relationship between the variables. A

value equal to zero means there is no linear relationship at all.

¹ For normal goods, income elasticity is positive and price elasticity is negative.
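For a straight-line fit between two variables, R² is the square of the ordinary (Pearson) correlation coefficient r, and the sign of r shows whether the correlation is positive or negative. The following sketch computes both from scratch. The function name and the five data pairs are invented for illustration and are not taken from any statistical source.

```python
def correlation_and_r2(x, y):
    """Return (r, R2) for a straight-line relationship between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    return r, r * r


# Hypothetical observations: C (for example an income level) and E (for example a fertility rate)
c_values = [1, 2, 3, 4, 5]
e_values = [6.0, 4.5, 3.5, 2.5, 2.0]

r, r2 = correlation_and_r2(c_values, e_values)
print(f"r = {r:.3f} (negative sign: large C goes with small E)")
print(f"R2 = {r2:.3f}")
```

The R² that Excel reports for a linear trendline should correspond to the second value returned here. A high value describes only the pattern in the data and says nothing by itself about causality.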

B. Background factors

The pattern or correlation that is revealed under point A is not necessarily a sign of causality.

Correlation can be found among a number of variables for which no causality is present. The

correlation can be due to the condition that both C and E are causally connected to a common

cause C1 (cf. case 1 in the following figure). Income per capita is correlated with a number of

variables that are not necessarily causally connected themselves. For example, there is a positive

correlation between GDP per capita and women's participation in the labour force and between

GDP per capita and alcohol consumption. The positive correlation between women's

participation in the labour force and alcohol consumption that results from these two

relationships hardly expresses a causal relationship. At least, this requires a demonstration that

women in the labour force, ceteris paribus, drink more alcohol than women who remain at home.
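The common-cause situation can be illustrated with simulated data. In the sketch below, which is only a thought experiment, two variables are each driven by a shared third variable plus independent noise and have no direct link to each other. The variable names, the sample size, and the use of standardised random numbers are all assumptions made for the illustration; no real country data are involved.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """What is left of y after removing its least-squares linear dependence on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n_obs = 2000

# Common cause C1 (simulated "income"), and two variables that each depend on it
income = [rng.gauss(0, 1) for _ in range(n_obs)]
participation = [c + rng.gauss(0, 1) for c in income]  # no direct link to alcohol
alcohol = [c + rng.gauss(0, 1) for c in income]        # no direct link to participation

raw_corr = pearson(participation, alcohol)
# Correlation once the influence of the common cause has been removed from both variables
partial_corr = pearson(residuals(participation, income), residuals(alcohol, income))

print(f"Correlation between the two variables:       {raw_corr:.2f}")
print(f"Correlation after removing the common cause: {partial_corr:.2f}")
```

The first correlation is clearly positive even though neither variable affects the other, while the second is close to zero. This is the pattern one would expect when a correlation is produced entirely by a common cause.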

The usual problem in causal analysis is that factors other than C have significance for E. These

other factors are called background factors (B, contributory causes), cf. case 2.

In the example of occupation and mortality, the background factors could be age, gender,

inclination to smoke, eating habits, alcohol consumption, marital status, etc. In another example,

not only is fertility dependent on income, but it is also dependent on occupation, religion, marital

status, and residence, among other factors.

Case 1: Situation with common causes: C1 → C and C1 → E

Case 2: Situation with contributory causes: C → E and B → E

Case 3: Situation with intermediate causes: C → B → E


Case 3 treats intermediate causes. As an example, alcohol consumption could be an

intermediate cause between occupation and mortality. In certain occupations, there may be a

tradition of relatively high alcohol consumption. That is, it is not the work itself that is

dangerous.

Globally, a negative correlation between income and fertility can be displayed, cf. Figure 3.1.

Maybe this correlation is based on a positive correlation between income and the mother's level

of education and a negative correlation between the mother's level of education and fertility. If

this is the correct relationship, fertility will not fall as income rises if women's level of education

does not rise as well when income rises.

Figure 3.1. Correlation between fertility and GNI per capita, 2004.

[Scatter diagram with a fitted curve, R² = 0.4641. Horizontal axis: GNI per capita, PPP (0 to 60,000). Vertical axis: Fertility rate, total (0 to 7). Labelled points: Saudi Arabia, Kuwait, Israel, Luxembourg, Hong Kong (China), Denmark, Russia, China.]

Source: World Development Indicators, 2006.

Figure 3.1 shows a significant spread around the drawn curve. It shows, for example, that the

point for China lies significantly under the curve. This is partly explained by a distinct policy

China has for limiting fertility, and is also presumably partly explained by the high level of

education women receive in China relative to the level of income. In contrast, points for


countries in the Middle East lie above the drawn curve, presumably because of the low level of

education for women relative to the level of income. And the relatively low level of education

for women in the Middle East can possibly be explained on the basis of religious and cultural

background. One must remember, however, that income level in the Middle East has increased

tremendously over a short period of time as a consequence of the development in the oil market.

Danmarks Statistik² has shown that the risk of an accident, and of resulting personal injury,

associated with private cars that are 8-11 years old is approximately double that for cars that are

only 0-3 years old.³ These numbers seem to indicate a clear causal relationship between the age of a car

and the risk of an accident. But maybe a substantial part of this relationship can be explained by

the age of the driver. It has been documented that drivers under 25 and drivers over 65 years of age run,

respectively, 4 times and 2½ times the accident risk of drivers between 35 and 64

years of age. And for economic reasons, drivers of older cars are principally under 25 or

over 64 years of age! Therefore, it may be the driver's age, and not that of the car, that is decisive for

accident risk.

In all, it can be said that one faces a complicated network of relationships,⁴ where a factor can

be explained by a series of other factors which themselves can be explained by a series of other

factors, etc. These kinds of networks are called causal chains. This involves explanations of

explanations.

If the purpose of the paper is not to illustrate the relationship between E and a particular

explanatory factor C, then the distinction between C and B has no meaning in the formulation of

the statement of the problem, in which one takes into consideration only those explanatory

factors that should be brought into the analysis. On the other hand, where the relationship

between E and C is important, this distinction has great significance for how the

correlation between two variables is interpreted. If an important background factor is not accounted for in the

analysis, the conclusion will most likely be completely off track.

2 This is the name for Denmark's statistical office, Statistics Denmark.

3 News from Danmarks Statistik (NYT), No. 321, 1993.

4 The economist attempts to account for this network of relationships by constructing models that build in the causal relationships among a range of economic variables.

C. The direction of causation

Does an occupation result in a particular mortality rate, or does a particular mortality rate lead to

a particular occupation? Should the arrow (the direction of causation) be turned around? There is

hardly any doubt that some occupations require a particular standard of health and thereby are

connected to mortality. Often the timing between factors is not clear. Do increased wages lead to

increased prices or vice versa? Has the increased mechanization in agriculture led to the

increased exodus of workers or vice versa? Does increased income per capita lead to increased

levels of education or vice versa? There is a negative correlation between income per capita and

agriculture's share of GDP at factor prices. This is not the same as saying that "increased income

is the cause of a fall in agriculture's share of GDP" because the increased income may possibly

be based on a transfer of labour from agriculture to other sectors where the returns to factors of

production are higher than in agriculture. If this is the case, a fall in the share of GDP is a

contributory cause to an increased GDP per capita. On the other hand, high economic growth in

general encourages the migration of workers from agriculture because high economic growth

creates relatively good employment possibilities in the manufacturing sector, for example. This

transfer of a factor of production to other sectors reduces the agricultural sector's share of GDP,

ceteris paribus.

When two factors influence each other (C ↔ E), there is mutual causality. One cannot

maintain that one factor causes the other. Economic growth and agriculture's decreasing share of

GDP are mutually related. In the context of mutual causality, the actual issue being investigated

can be decisive for which factor should be treated as the dependent factor (E) and which factor

should be treated as the independent factor (C).

In statistical analyses, the aim is often to test the explanatory power of a factor. For example,

this can be done by investigating if the change in an explanatory factor (C) takes effect before a

change in the explained factor (E). That is, the data is analyzed closely to determine if potential

changes in C typically lead (in time) to subsequent changes in E.
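The timing check described above can be sketched as a comparison of correlations at different lags. The example below is a minimal sketch in Python with invented series, in which E is constructed to repeat the change in C one period later. The function names are chosen for this illustration only. It is a simple descriptive check and not one of the formal tests used in statistical analyses.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (variation_x * variation_y) ** 0.5


def lagged_correlation(c, e, lag):
    """Correlation between C in period t and E in period t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(c, e)
    return pearson(c[:-lag], e[lag:])


# Hypothetical period-to-period changes in the explanatory factor C.
c_changes = [1.0, -0.5, 2.0, 0.0, -1.0, 1.5, 0.5, -2.0]
# Hypothetical changes in E, built so that E follows C with a one-period delay.
e_changes = [0.0] + c_changes[:-1]

same_period = lagged_correlation(c_changes, e_changes, 0)
one_period_later = lagged_correlation(c_changes, e_changes, 1)

print("Correlation, same period:", round(same_period, 2))
print("Correlation, E one period after C:", round(one_period_later, 2))
```

A markedly higher correlation when E is shifted one period forward is consistent with changes in C preceding changes in E. It does not in itself prove causality, since a background factor could drive both series with different delays.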

D. The cause and effect mechanism

The relationship between C and E might be based on a sequence of events which can be

described in more or less detail. Industry x is characterized by work taking place in shifts and

involving hazardous substances, etc. By supplementing the investigation with an explanation of

the causal mechanism, one can further establish whether the correlation between C and E is of a

causal type.

To explain the relationship between economic variables, you must use economic theories. In

reality, economic theories are brought in as the first step in trying to decide what the explanatory

material should consist of, in that these theories point towards material that is meaningful in a

given relationship. When the analysis concerns the consumption of a good, it is natural to bring

in disposable income as an explanatory factor, as well as other factors. Disposable income is, in

itself, dependent on tax policy. Investments are dependent on interest rate movements and

economic development in general, etc.

In a paper, the terms of trade (export price index/import price index) entered as an important

economic growth factor. It was hypothesized that improved terms of trade during a period had

led to increased growth. The correctness of this hypothesis depends on why the terms of trade

had improved. If it had improved because of increased domestic wages, competitive ability

would have become worse, ceteris paribus, which would have influenced the quantity of exports

negatively. Improved terms of trade based on domestic increases in costs is, therefore, growth

reducing. On the other hand, the terms of trade could have improved as a consequence of

increased demand for the country's export goods, and this increases growth.

Throughout the first two oil crises, the terms of trade fell for a range of industrial countries as

a consequence of the increased import prices for energy. Because energy consumption was/is

very price inelastic, at least in the short term, an increased share of income had to be used on

energy consumption, which of course reduced the demand for other goods and services, and this

reduced growth. Growth was negative in the wake of the energy crises.

To summarize, one should be able to justify the choice of explanatory material. The causal

connection between C and E must be made plausible.

E. Short and long run

Many examples can be found in economic theory where the effect is first felt after a period of

time has passed. A permanent increase in income leads normally to increased consumption, but

the full effect is first felt after consumers become used to the higher income. Higher oil prices

lead to an increased demand for other energy sources, but the increase in demand for energy

products other than oil is greater in the long run than in the short run because substitutability is

greater in the long run than in the short run.

If one is interested in the effects in the long run, one cannot be satisfied with data that registers

effects in the short run. An incorrect registration with respect to time can result in an incorrect

conclusion concerning the direction of correlation (positive/negative) between C and E as

well as the strength of the correlation. If the correlation is negative in the short run and positive

in the long run, and one can only determine the short run effects, the chances are high that

incorrect conclusions will be drawn.

In the example about occupation and mortality, the following time-related sources might

incorrectly be drawn in:

Occupation x is the hazardous occupation which a worker leaves after a few years. As a

consequence of this hazardous occupation, he or she either retires with disability payments or is

so ill that he or she chooses the less hazardous occupation "u" after the illness period. In using

the correct data to determine the relationship, one can see that a hypothesis relating mortality and

occupation should be rejected; too simple a model overlooks the intervening time variables.

A paper on economic growth included a section on basic growth factors in the long run. One

discussion centred on increases in factors of production, such as capital, labour, and productivity.

But the paper included data for only a few years and was limited to a description of short-term

fluctuations in GDP. In reality, the long run explanatory factors were not of interest, since the

paper very clearly used only data that referred to the business cycle.

In another paper, it was mentioned that increased economic growth implied increased public

expenditures because greater growth increased government revenues and consequently the

possibility for committing to larger expenditures. This positive correlation between economic

growth and public expenditures applies in the long run in that a rich country generally makes

greater public expenditures than a less rich country does. In the short run, however, increased

economic growth will result in a fall in public transfer payments to unemployment benefits and

[Diagram belonging to the occupation and mortality example. Simple model: occupation x, death. Fuller model: occupation x, illness, occupation u, retirement due to disability, death.]

welfare; that is, the correlation is negative in the short run. And since this paper had only data for

a small number of years, neither the long run relationship nor the included hypothesis was

relevant.

F. The selection of explanatory factors − final comments

The objective of the paper determines which explanatory material to use. In a descriptive

investigation, there is need for only little or no explanatory material. On the other hand, in

making a causal analysis, one should not strive to pack in as many explanatory factors as

possible, but only to select the presumably most important factors, which can be treated more

thoroughly as a result. In the selection of these presumably most important factors, the

distinction between short run and long run explanatory factors is especially important. If the

analysis is to examine the change in base and comparative data over the short term, the factors

having great explanatory power will not be the same as those having explanatory power for long

run effects (cf. examples discussed earlier).

Often the base material is divided up into groups. As examples, industry in general can be

divided up into industry branches, and energy consumption can be divided into sectors as well

as energy products. In the selection of explanatory material, one should not select material

having great explanatory power for only a small sub-group. This will unbalance the paper.

However, in most cases, an analysis will be strengthened if special attention is paid to include

explanatory material relevant to analyses of those periods where the changes are distinct.

Finally, it should be mentioned that finding a reasonable argument for causality between two

variables does not mean that one has proved causality. More sophisticated tests are required.

But it can be said that causality is likely, given the arguments made in the paper.

3.4. Examples of the selection of explanatory material

3.4.1. Economic growth

As mentioned earlier, a socio-economic issue most often involves a network of relationships. An

analysis of the basis for economic growth can therefore be very complicated, involving a wide

range of explanatory variables, cf. Figure 3.4.1.

Figure 3.4.1. Causal network of economic growth

Primarily, growth can be explained by the development in factor inputs and the development in

output per unit of factor input. The larger the increase in the effort of production factors, the

greater is growth, and the greater output per unit of factor input, the greater is growth. To make

the matter more complicated, the included factors or variables are not independent of each other.

The development of technological progress is not independent of capital effort and educational

level. It is also clear that the development in employment is dependent on economic growth rate

and vice versa. It is, in general, difficult to isolate the contribution of the individual factors, so

some debatable calculated assumptions must be used. These will not be treated here.

The variables discussed above are proximate sources of growth. So-called ultimate sources of

growth reflect basic relationships in an economy, such as culture (tradition for education, etc.),

demography, history, institutional relationships, economic policy, etc.

Even in a comprehensive analysis, which is much more than what is expected in a paper, you

can select only a limited number of explanatory variables and be satisfied with that. The choice

of variables will depend on the actual issues under investigation. Under all circumstances, it is an

advantage to be able to recognize the larger causal network when concluding the paper.

3.4.2. Regional differences in income

It is apparent that average income for the active worker is a function of a region's distribution of

industries and industry-determined wages. A region can have a relatively low average income

because the region has relatively many employed in industries where the wages, in general, are

low. Or average income may be low because wages, themselves, are relatively low for the

region for a given industry.

It would, therefore, be natural to start by investigating the distribution of industries and

industry-determined wages by establishing the explanatory material, as in Figure 3.4.2.

Figure 3.4.2. Explanatory material of regional differences in active workers' incomes.

The regional distribution of industries is determined by the resource base, among other things.

The resource base includes agricultural land, fish stocks, tourist attractions, etc. The distribution

of industries is also determined by the age and gender structure in the population. For example,

not only is the distribution of jobs held by young women different from those held by older

women, but the rate of participation also differs between the two groups. It must be

remembered, however, that the opportunities for working influence the age and gender

distribution found in the working population. For example, poor employment opportunities lead

to an emigration of young people, especially.

The international business cycle influences industry income fluctuations to different degrees.

Industries that produce investment goods for export are especially sensitive to these business

cycles. EU's agricultural prices influence earnings in the agricultural sector. The justification for

all the arrows in Figure 3.4.2 will not be given here. It is left as an exercise for the student to

work out the justification for these sketched relationships, as well as to suggest others.

As mentioned in the last section under point F, one should select only those explanatory

factors considered most important in relation to the chosen time horizon, among other things. In

Figure 3.4.2, one would consider resource base, age, and gender to be long run factors; one

would consider agricultural prices and the international business cycle to be short run factors.

3.4.3. Space heating in private households

In Figure 3.4.3, income, prices, temperature, housing area, etc. appear as explanatory factors.

Naturally, income influences energy consumption. The higher the income, the higher is

energy consumption. High energy consumption, caused by a low outdoor temperature, among

other things, will increase insulation activities (the arrow from consumption to insulation) and

will lead to a reduced indoor temperature (arrow from consumption to indoor temperature

setting) because high consumption means that the proportion of energy consumption in total

consumption is high. Desired indoor temperature should, therefore, be seen in relation to prices

and income.

Changes in outdoor temperature will influence consumption somewhat in the short run, while

housing area will influence consumption in the long run. If one is interested in explaining the

per capita use of energy for space heating among selected countries, relevant factors will

include income, price, housing area, and degree of insulation. Further explanation of Figure

3.4.3 is left to the reader.

Figure 3.4.3. Explanatory elements for energy used for space heating in private households.

Often, one must do one's best without relevant explanatory factors because information about

these factors cannot be located or is missing. For example, it would most likely be difficult to

obtain information about insulation standards in the majority of countries.

4. Collecting data and other material

When the formulation of the statement of the problem is finalized, one knows to a great extent

which material needs to be collected in the libraries. While the library staff can help with search

techniques, these guidelines will focus on problems that might occur in the process of collecting

data and information.

As indicated in Table 4.1, different statistical sources often report different results for

presumably identical terms. Several sources are often used when one is working with a period

where the oldest data must be taken from one source and the newest data must be taken from

another source. One can check to see if the shift in sources creates problems by comparing data

for the same year in the two sources. If there are significant differences, one ought to carefully

read the explanation/definition of terms usually accompanying statistical material. In the

conclusion, then, one can draw attention to the divergence in the statistics and provide an

assessment as to whether this divergence is significant for the analysis.

Table 4.1. Denmark's total energy requirements and final energy consumption in 2001, as assessed by various organizations (PJ).

Statistics Denmark
  Gross energy consumption                 787
  Gross energy consumption, adjusted       815
Danish Energy Agency
  Total gross energy consumption           829
  Total final energy consumption           642
  Gross energy consumption, adjusted       831
BP                                         779
OECD
  Total supply                             828
  Total final consumption                  635

Sources: Statistical Ten Year Review 2003 (Statistics Denmark), The Danish Energy Agency (2002), Statistical Review of World Energy 2001 (BP, www.bp.com), OECD (2003).
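The check of whether a shift in sources creates problems can be sketched as a simple comparison of figures for the same year. The example below is a minimal sketch in Python that uses the gross or total figures for 2001 from Table 4.1. The 2 percent tolerance and the function names are assumptions made for this illustration. What counts as a significant divergence depends on the analysis.

```python
# Gross energy consumption or total supply for Denmark in 2001 (PJ), from Table 4.1.
gross_energy_2001_pj = {
    "Statistics Denmark": 787,
    "Danish Energy Agency": 829,
    "BP": 779,
    "OECD": 828,
}

# Assumption for this sketch: differences above 2 percent are flagged.
TOLERANCE = 0.02


def relative_difference(a, b):
    """Absolute difference between two figures as a share of their average."""
    return abs(a - b) / ((a + b) / 2)


def divergent_pairs(figures, tolerance):
    """List the pairs of sources whose figures differ by more than the tolerance."""
    names = sorted(figures)
    pairs = []
    for position, first in enumerate(names):
        for second in names[position + 1:]:
            difference = relative_difference(figures[first], figures[second])
            if difference > tolerance:
                pairs.append((first, second, round(difference * 100, 1)))
    return pairs


flagged = divergent_pairs(gross_energy_2001_pj, TOLERANCE)
for first, second, percent in flagged:
    print(f"{first} vs {second}: {percent} percent apart")
```

Pairs that are flagged in this way are the ones where the explanation and definition of terms in the two sources ought to be read carefully before the series are combined.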

Several sources might also be used when one wishes to compare energy consumption in several

sectors, for example. That information may not necessarily be found in one source. It should be

noted that energy statistics, especially, are plagued by a lack of consistency among sources. In

most statistics, international efforts to work out the inconsistencies found in term definition and

structure, etc., have been so comprehensive that many comparisons today can be carried out

without a problem.

The most typical cases of inconsistency in data arise when you rely on statistical material

found in books. The material is often incomplete, and term definition is often lacking. It must be

emphasized that you should collect data and information from primary statistical sources and not

from books, to as great an extent as possible. Not only may books often be filled with errors,

they will not contain the most up-to-date material either.

One must also pay close attention to the continuous updating of, and corrections made to,

statistical data. For example, the numbers in the national accounts are issued in several versions

at different periods of time because the primary material used in creating the national accounts is

available at different periods of time. To the extent possible, therefore, one must use numbers

from the most recent sources.

Breaks in the data will also occur when the methods for constructing that data change. These

kinds of breaks occur often in a time series. One must assess, then, if the break in the data has

significance for the analysis. If this is the case, then the break in the data must be discussed in the

paper.

A data break can be caused by changes in administrative structure. For example, the reform of

local government structure in 1970, like the recent reform (2007), resulted in a significant

change in the number of municipalities and counties. This made statistical comparisons of data

collected before and after 1970 either very difficult or virtually impossible. Further, a data break

can be caused by a change in the definition of industries and branches.

OPEC,5 EU and EFTA have represented a different number of countries at different points in

time. This means that you should not only be aware of data breaks in the data issued by these

organizations, but also in the data issued by other organizations where similar changes may have

taken place.

In summary, it is very important that you are aware of significant changes occurring in a time

series due to breaks in the data. The student must closely read, and be familiar with, all

footnotes, notes, etc. that accompany data. Warnings about breaks in the data, term redefinition,

etc. will usually be found in footnotes, notes, etc.

Thus, data should be collected only from primary statistical sources such as the national statistical

bureaus, OECD, ECB, etc., and not from a general search on the internet. Most of the

information and data found on other websites is not of the same quality as that from the

sources mentioned above and cannot generally be recommended for use in empirical papers,

with exceptions, of course. A problem with electronic data sources is that less information is

directly available when accessing the databases than in the printed statistical material.

It is therefore often necessary to search for more information about the data,

definitions, etc., for example on the OECD homepage (or SourceOECD), where many reports are

available alongside huge amounts of data. Finally, be aware that 'comma' and

'period' are used differently as separators in the databases: SourceOECD, for example, would list a number as

1,280.00, which would appear as 1280,00 (or 1.280,00) if Statistics Denmark were to report

such a number.

5 Organization of the Petroleum Exporting Countries.
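When data from both kinds of sources are typed or copied into the same spreadsheet or program, the separators must be converted consistently. The following is a minimal sketch in Python. The function name and its argument are assumptions made for this illustration and do not belong to any database's own tools.

```python
def parse_number(text, decimal_separator):
    """Convert a formatted number to a float.

    decimal_separator is "." for the convention 1,280.00
    and "," for the convention 1.280,00 (or 1280,00).
    """
    if decimal_separator not in (".", ","):
        raise ValueError("decimal_separator must be '.' or ','")
    thousands_separator = "," if decimal_separator == "." else "."
    cleaned = text.strip().replace(thousands_separator, "")
    cleaned = cleaned.replace(decimal_separator, ".")
    return float(cleaned)


# The same number written in the two conventions.
print(parse_number("1,280.00", "."))
print(parse_number("1.280,00", ","))
print(parse_number("1280,00", ","))

# Reading a Danish-style number with the wrong convention gives a value
# that is wrong by a factor of 1000, without any error message.
print(parse_number("1.280,00", "."))
```

The last line shows why the convention should be stated explicitly for each source rather than guessed from the number itself.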

List of www-addresses

http://www.dst.dk/ (Statistics Denmark)
http://www.statistikbanken.dk/ (Databank at Statistics Denmark)
http://www.sfi.dk/ (National Institute of Social Research)
http://www.akf.dk/ (Institute of Local Government Studies - Denmark)
http://www.fm.dk/ (Ministry of Finance)
http://www.skm.dk/ (Ministry of Taxation)
http://www.sm.dk/ (Ministry of Social Affairs)
http://www.retsinfo.dk/ (Information on Danish Laws)
http://www.oecd.org/ (OECD)
http://www.ssb.no/ (Statistics Norway)
http://www.scb.se/ (Statistics Sweden)
http://www.ae-dk.dk/ (Economic Council of the Labor Movement)
http://www.di.dk/ (Danish Industry)
http://www.dors.dk/ (Economic Council, Denmark)
http://www.ecb.int/ (European Central Bank)
http://www.imf.org/ (IMF)
http://www.nationalbanken.dk/ (The central bank of Denmark)
http://www.undp.org/ (UNDP)
http://www.who.int/ (WHO)
http://www.doe.gov/ (The American Energy Administration)
http://www.iea.org/ (International Energy Agency (IEA))
http://www.iisd.ca/ (International Institute for Sustainable Development)
http://www.ipcc.ch/ (The UN Intergovernmental Panel on Climate Change (IPCC))
http://www.ens.dk/ (Danish Energy Agency)
http://www.da.dk/ (Danish Employers Confederation)
http://www.danmark.dk/ (Rules, transfer income and eligibility)
http://www.saf.se/ (Swedish Employers Confederation)
http://www.europa.int/comm/eurostat (Eurostat)
http://www.europa.eu.int/ (EU)
http://www.wto.org/ (WTO)
http://www.bis.org/ (BIS)
http://www.finansraadet.dk/ (Danish Bankers Association)
http://www.forsikringenshus.dk/ (Danish Insurance Association)
http://www.ftnet.dk/ (Danish Financial Supervisory Authority)
http://www.realkreditraadet.dk/ (The Association of Danish Mortgage Banks)
http://www.xcse.dk/ (Copenhagen Stock Exchange)
http://www.em.dk/ (Danish Ministry of Business and Industry)
http://www.fao.org/ (FAO)
http://www.fvm.dk/ (Danish Ministry of Food, Agriculture and Fisheries)
http://www.landbrug.dk/ (Links to Danish agricultural institutions etc.)

http://www.min.dk/ (Danish Ministry of the Environment)
http://www.fedstats.gov/ (Links to US federal agencies)
http://www.ks.dk/ (Danish Competition Authority)
http://www.worldbank.org/ (The World Bank)
http://www.worldwatch.org/ (Worldwatch Institute)
http://www.wri.org/ (World Resources Institute)
http://www.ilo.org/ (International Labour Organization)
fmwww.bc.edu/ec/data.html (Economic and Financial Data)
http://www.econlinks.com/ (Economics News and Data)
http://www.economagic.com/ (Economic Time Series)

5. Working with the material

Working with the material means producing tables and figures as well as calculating relevant

indexes and other data, etc. This chapter covers the techniques for achieving just that. The first

two sections treat the techniques for table and figure construction. The sections following apply

to how you use comparative and explanatory material as well as how to standardize means and

analyze time series data.

5.1. Table construction

The main purpose in using tables is to present statistical material in a clear and succinct form. A

text filled with a lot of numbers is difficult to wade through. By creating a clear presentation of

the data in a table, the text is no longer cluttered with numbers.

A table consists of data presented in a special frame containing all the necessary information

for understanding what the data stands for and which sources have been used. Such a frame for

a table is illustrated below.

The table number is used in the text to refer back to the table. You can use consecutive

numbering throughout or use consecutive numbering within each section, as is done here. The

title must precisely state what information is found in the table and will most likely consist of

three elements. The first element is the statistical unit being counted. The composite whole, also

called the population, makes up the sum of all the units in the data. The whole can be, for

example, the population of Denmark, and the corresponding unit would be one Dane. The title

would begin with "Number of persons in Denmark" or "Population of Denmark".

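Table 5.1.1. Title

--------------------------------------------------------------------
Heading for the row variables   | Headings for the column variables
--------------------------------------------------------------------
Row variables                   | Data
Total                           |
--------------------------------------------------------------------
Notes, if any:
Footnotes, if any:
Sources: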

The second element is often an identifying variable(s) associated with the units in the table.

Identifying variables associated with Danes could be age, gender, income, marital status, etc. If

the included variables are age and gender, then, as the second element, they appear in the title

as "by age and gender". Note that the categories representing the identifying variables (in the

case for gender, these categories are female and male) are not mentioned in the title. The third

element included in the title is the time period. As in Table 5.1.2 below, the chosen time period

is January 1, 1970 and 2006. The title in this table thus becomes "Number of persons in

Denmark, by gender and age, January 1, 1970 and 2006."

The categories representing the identifying variables appear in the column and row headings.

In this case, there are two variables plus the time period. With three dimensions, it becomes

necessary to assign two of the dimensions to either the column or the row heading. If only two

dimensions were included, then there would be no need to divide either the column or the row

heading. If four dimensions were included, then it would be necessary to divide up both the

column and row headings unless either the row or the column heading could be divided up into

three. The student is warned against using more than four dimensions in a table because this

would create confusion rather than the clarity that the table is supposed to achieve.

Table 5.1.2. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.

Source: www.statistikbanken.dk

Labels are used in both the column and row headings to ease the interpretation of the table.

Note that nothing is gained by writing "gender" above the headings "female and male" because

it is already clear that the feature is gender. Correspondingly, there is no gain in writing

"country" in the heading above if Denmark, Sweden, Norway, etc. appear, or "year" where

1970, 1971, etc. appear. In the selected example, however, it could be of value to include a

heading for the included age groups even if you could figure out what the variable is all about

just by reading the title. The reason is that it is not immediately clear what these groupings

represent.

Next, a table requires mention of the measure used in the data material. Is the measure being

made in millions of persons, thousands of units, GJ (billion Joules) or something else? In

addition, the measure can be placed in a number of places. If there is only one measure

represented in the table, it can be placed last in the title or just above where the data is located.

The latter is often preferred (see Table 5.1.2). If there are several measures, these must be placed

outside or inside the areas of the respective columns or rows. The measures must not be placed

in a footnote or a note where they can easily be overlooked.

Note in the first column of Table 5.1.2 that, when all the numbers are added up, they do not

match the total given below. This occurs because of the practice of rounding off the individual

numbers for presentation while the total is based on the sum of the numbers before they are

rounded off. As a result, there is often a mismatch between the sum of the numbers in a column

and the total given for that column.
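The mismatch can be verified directly from the printed figures. The following is a minimal sketch in Python that adds up the rounded age-group figures of Table 5.1.2 column by column and compares each sum with the printed total. The variable and function names are chosen for this illustration only.

```python
# Printed (rounded) figures from Table 5.1.2, in 1000 persons.
columns = [
    "Female 1970", "Female 2006",
    "Male 1970", "Male 2006",
    "Total 1970", "Total 2006",
]
age_groups = {
    "0 - 19 years": [743, 648, 780, 682, 1523, 1330],
    "20 - 39 years": [667, 699, 689, 713, 1357, 1412],
    "40 - 59 years": [593, 754, 575, 768, 1169, 1523],
    "60 years and over": [472, 640, 388, 523, 859, 1163],
}
printed_totals = [2474, 2740, 2432, 2686, 4907, 5427]


def rounding_gaps(rows, totals, names):
    """Sum of the printed rows minus the printed total, for each column."""
    gaps = {}
    for position, name in enumerate(names):
        column_sum = sum(values[position] for values in rows.values())
        gaps[name] = column_sum - totals[position]
    return gaps


gaps = rounding_gaps(age_groups, printed_totals, columns)
for name, gap in gaps.items():
    print(f"{name}: sum of rows minus printed total = {gap}")
```

A gap of one unit in the last digit, as in the first column, is what rounding normally produces. A larger gap would suggest a typing error or a missing category rather than rounding.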

Rounding off is practiced because it is not always necessary to include all the digits in the data

to give a reasonable presentation of the numbers. For example, it is rarely necessary to write the

population of Denmark with seven digits. In Table 5.1.2, only 4 digits are used and the measure

is in 1000 persons.

Table 5.1.2 (data):

                        Female         Male          Total
                       1970   2006   1970   2006   1970   2006
                    -------------- 1000 persons --------------
0 - 19 years            743    648    780    682   1523   1330
20 - 39 years           667    699    689    713   1357   1412
40 - 59 years           593    754    575    768   1169   1523
60 years and over       472    640    388    523    859   1163
Total                  2474   2740   2432   2686   4907   5427

Note the use of lines in the table − the data are not placed in separate windows, but appear

instead as a body. There is room in the table for the sum of the columns; and notes and footnotes

are placed under the last line of the table, if there are any. Notes are used to provide any

supplementary information about the statistical unit being measured or the table in general;

footnotes are used to provide supplementary information about individual elements in the table.

For example, there might be a need to discuss the definition of a term or the technique used in

calculating some of the numbers in a footnote.

The table is made complete by a clear citation of the sources used. When a table is used in a

report, you need to identify the source for the information under that table with just enough

information that the reader can easily locate the full information for that source in the reference

list at the end of the report. The information and form requirements for the reference list are

discussed in Section 7. You can identify a source by the author and year of publication, for example, Andersen et al. (2006), if you have used The Danish Economy as the source. If the author appears several times in the reference list for the same year, add a suffix to the year, such as a, b, etc. (for example, 2006a). The same suffix must also be used in the report's reference list. In Table 5.1.2, the source is a statistical database. Including the name of the variable (www.statistikbanken.dk/BEF1A) makes it easier to find the data.

Various organizations, both public and private, often issue reports. When these reports are

used as the source for a table, citation of the source should contain either the title and year or

number of publication (for example, World Development Report, 2006 which is issued by the

World Bank) or the name of the organization and year (for example, The World Bank (2006)).

The citation of the source in the table is made complete by a page or table number. A

reference to the pages used makes it easier for the opponent, or other interested readers, to

reproduce the material. It is a general requirement for technical reports that the included

material can be reproduced. Without proper reference to pages or tables, it can be especially

difficult to locate the actual material when books are used as sources.

The rules discussed for citing sources for information used in tables also apply for material

that is used in the text. References to sources for material used in the text are placed either in the

text (for example, Andersen (2006), p. 25), or as a footnote located at the end of the page. The

latter is often preferred in that a series of references to sources in the text can make the text

cumbersome.

It may be necessary to include numbers/data in the text when the amount to be used is too

small to establish a table. For example, one would hardly construct a table to present two or

three numbers. These numbers can be mentioned in the text without making the text

cumbersome. Footnotes can also be used to annotate the text or to make side comments that are

not central to the subject, but which can be interesting to the reader anyway.

In Table 5.1.2, only the actual numbers are included. It can be seen that the population

increased from 1970 to 2006 and that this increase took place only in the age groups 20 years

and over. But the structure in the material becomes clearer if the numbers are presented as

percentages, as in Table 5.1.3.

Table 5.1.3. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.

                          Female            Male           Total
                        1970    2006    1970    2006    1970    2006
                    ----------------------- % ----------------------
0 - 19 years              30      24      32      25      31      25
20 - 39 years             27      25      28      31      28      26
40 - 59 years             24      28      24      26      24      28
60 years and over         19      23      16      18      17      21
Total                    100     100     100     100     100     100
1000 persons            2474    2740    2432    2686    4906    5427

Source: As in Table 5.1.2.

In Table 5.1.3, the proportion of the population in each age group is shown by gender for 1970 and 2006. The actual numbers are included in the last row, distributed by gender and year, making it possible for the reader to calculate back to the original numbers found in Table 5.1.2. The table also reveals the development in the population for each gender and age group.

If the focus is to be on growth in the population for each gender and age group, the numbers should be presented as in Table 5.1.4. In this table, the change in the population for each gender and age group can be seen directly by comparing the index numbers in 2006 with the index number for each group in 1970, which is 100. Instead of index numbers, changes in per cent could have been used. In this case, in the location "0-19 years / 2006 / female", −13% would replace the index number 87. The most useful form of presentation depends on the issue being investigated. But it should be emphasized that in most cases, you will have to work on the data you obtain in order to present them in the form you want.
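The conversions behind Tables 5.1.3 and 5.1.4 are easy to script. The following Python sketch is only an illustration and not part of the original guidelines: it recomputes the percentage shares for 1970 and the index numbers for 2006 (1970 = 100) for the female columns of Table 5.1.2. The function names are invented for this example.

```python
# Female population of Denmark, 1000 persons (values from Table 5.1.2).
AGE_GROUPS = ["0-19", "20-39", "40-59", "60+"]
FEMALE_1970 = [743, 667, 593, 472]
FEMALE_2006 = [648, 699, 754, 640]


def percent_shares(counts):
    """Return each count as a rounded percentage of the column total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


def index_numbers(base, current):
    """Return current values as index numbers with the base year = 100."""
    return [round(100 * c / b) for b, c in zip(base, current)]


shares_1970 = percent_shares(FEMALE_1970)
index_2006 = index_numbers(FEMALE_1970, FEMALE_2006)

for group, share, index in zip(AGE_GROUPS, shares_1970, index_2006):
    # An index of 87 corresponds to a change of 87 - 100 = -13 per cent.
    print(f"{group:>6}: {share}% of women in 1970, index 2006 = {index}")
```

Keeping the raw numbers in one place and deriving shares and index numbers from them makes it easy to switch presentation form when the issue under investigation changes.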

Table 5.1.4. Number of persons in Denmark, by gender and age, January 1, 1970 and 2006.

                              1970                    2006
                      Female    Male   Total  Female    Male   Total
                     --------- 1000 --------  ----- 1970 = 100 -----
0 - 19 years             743     780    1523      87      87      87
20 - 39 years            667     689    1356     105     103     104
40 - 59 years            593     576    1169     127     133     130
60 years and over        472     388     859     136     135     135
Total                   2474    2432    4907     111     110     111

Source: As in Table 5.1.2.

In summary, a good table fulfils two requirements. First, it meets the technical criteria just described. Second, it presents the material in such a way that the relevant patterns become obvious. This second point concerns not only the calculation of per cent values and/or index numbers, but also the choice of an optimal degree of detail. In the previous tables, the material was divided into four age groups. If you were interested in the dependence of public sector expenditures on shifts in the age distribution of the population, another division of age groups might be better. However, you should be careful not to use too much detail; relevant patterns can drown in detail. A good table is clear.

5.2. Figure construction

Above all, it should be pointed out that a good figure, like a good table, presents the material clearly. In many cases, the strength of a figure is that the pattern in the material appears more obvious than in a table. This applies especially if one is working with a data series extending over a long period of time. If one wants to compare several data series that only cover 10 years, for example, a figure will also provide greater clarity than a table, as a rule.

The next section presents the technical criteria for a figure, while the following sections present various figure configurations.

5.2.1. Technical criteria

In an empirical paper, the curve is the most popular type of figure used for picturing one or

several sets of data. That is why the curve will be used as the example for describing the

technical criteria associated with figures.

The figure must have a title and source identification analogous to that used for tables. The

title can, of course, be placed in other locations, but the placement on the top of the figure is

most often used. Any notes to be added are usually placed under the abscissa (x-axis).

Axis labels must be included unless what is being measured on the axes is obvious. For

example, one does not need to include the word "year" when 1980, 1981, etc. appears on the

axis (this is most often the x-axis). Axis labels often contain a scale, for example, 1000 persons

or millions of $. Also, it is cumbersome to include too many zeros on the axes. Instead of

writing 100,000, 200,000, etc. on the axis, it is better to use 100, 200, etc. and include 1000 on

the axis label. The scale can also be placed in the title.
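As a small illustration of moving the scale into the axis label, the Python sketch below divides the raw tick values by 1000 and states the unit once in the label. The names `scaled_ticks` and `axis_label` are invented for this example.

```python
def scaled_ticks(values, factor=1000):
    """Divide raw tick values by `factor` so the axis shows fewer zeros."""
    return [v // factor if v % factor == 0 else v / factor for v in values]


raw_ticks = [100_000, 200_000, 300_000]
ticks = scaled_ticks(raw_ticks)
axis_label = "1000 persons"  # the scale is stated once, in the axis label

print(ticks, axis_label)
```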

Labels to the curves indicate what the individual curves represent (here A and B) and can be

placed in several places. The placement at the end of the respective curves is preferred in most

cases. In some cases, the curves run nearly together at the end, and so a placement at the end of

the curves can be problematic. You can, instead, place a symbol on the curves at a place where

the curves are clearly separate from each other. Labels to the curves can also be placed under the

x-axis. However, this placement reduces the clarity a bit, especially if the figure includes several

curves. One needs to remember the symbols for these representations to be able to read the

figure.

Figure 5.2.1 consists of two figures showing the development in the number of persons

employed in Denmark, where the ordinate axis (y-axis) begins at zero in the upper figure and at

2300 in the lower figure. The curves clearly look very different in the two cases, and you

might be inclined, therefore, to comment on them differently. In the upper figure, you

might note the smaller swings in employment, but would focus more on the rising trend, while

in the lower figure, you might be caught by the swings in the curve, for example, the increasing

employment from 1983 to 1987 as well as from 1993 to 2001. Which presentation is best

depends on the issue one is investigating. Normally, it is best to include the point of origin so as

not to exaggerate the swings in the data. But in certain cases, even smaller swings can yet be

essential for the issue under investigation. If this is the case, these swings of course should be

brought out by ignoring the point of origin.
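How strongly the choice of axis origin affects the visual impression can be estimated before drawing anything. The Python sketch below computes the share of the axis height that a given swing occupies for the two axis ranges used in Figure 5.2.1. The swing values are hypothetical and are not read from the figure; `height_fraction` is a name chosen for this example.

```python
def height_fraction(low, high, axis_min, axis_max):
    """Fraction of the axis height taken up by a swing from low to high."""
    return (high - low) / (axis_max - axis_min)


# Hypothetical swing in employment (1000 persons); not read from the figure.
swing_low, swing_high = 2500, 2700

full_axis = height_fraction(swing_low, swing_high, 0, 3000)     # origin included
cut_axis = height_fraction(swing_low, swing_high, 2300, 2900)   # origin ignored

print(f"axis from 0:    {full_axis:.0%} of the height")
print(f"axis from 2300: {cut_axis:.0%} of the height")
```

With these axis ranges, the same swing takes up five times as much of the figure when the point of origin is left out.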

Figure 5.2.1. Number of persons employed in Denmark, 1966-2006, 1000 persons.

[Figure: two line charts of the number of persons employed, 1966-2006. In the upper chart the y-axis runs from 0 to 3000; in the lower chart it runs from 2300 to 2900.]

Source: www.statistikbanken.dk/NAT18.

A figure must not be overcrowded, cf. the demand for clarity. In Figure 5.2.2, the world's oil reserves are indicated by country or country group in a bar diagram. But this representation makes it difficult to get an overview of the material. And the overview is not much better in Figure 5.2.3, where the bars from Figure 5.2.2 for each year are stacked on top of one another. This means that the development for the countries in the middle of the columns is difficult to read.

Figure 5.2.2. World oil reserves, by country or country group, 1982-2005, end of year.

[Figure: bar diagram with one bar per country or country group for each year, 1982 onwards; y-axis in per cent (0 to 35). Series: Saudi A., Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC, Other OPEC countries.]

Source: www.opec.org – Annual Statistical Bulletin.

Figure 5.2.3. World oil reserves, by country or country group, 1982-2005, end of year.

[Figure: stacked bar diagram with one column per year, 1982 onwards; y-axis from 0% to 100%. Series: Saudi A., Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC, Other OPEC countries.]

Source: As in Figure 5.2.2.

Figure 5.2.4 shows the development in labour productivity as measured in eight sectors. The

figure appears overloaded. It does not help that the rather complicated key to the curves has to be

placed under the x-axis because of space limitations. This placement makes the figure even less

clear.

Figure 5.2.4. Labour productivity (GDP in 2000 basic prices per worker) for the main sectors of the economy, 1966-2006.

[Figure: line chart, 1966-2006, with the y-axis in 1000 DKK per worker (0 to 900). Curves: All sectors; Agriculture, fishing and quarrying; Manufacturing; Construction; Trade, hotels, restaurants; Transport, storage and communication; Financial intermediation, business activities; Public and personal services. The key to the curves is placed under the x-axis.]

Source: www.statistikbanken.dk/NATo7 and NAT18.

5.2.2. Logarithmic scales

In the previous sections, examples of curves were used to illustrate the development of one or several data series (time series), where individual observations were connected by straight lines.⁶ As mentioned earlier, curves are the figure type used most often. Curves drawn with a logarithmic scale on one axis (also called semi-logarithmic) are discussed in this sub-section. The use of the logarithmic scale implies that equal linear distances on the axis correspond to equal percentage changes on a normal scale. This implies that curves with the same slope have the same growth rate, as measured in per cent.

Figure 5.2.5 illustrates three cases of varying slopes, where each case represents two curves drawn with the same orientation. In all three cases, the curves run parallel, which means that the two time series depicted by the curves have the same percentage growth rate. In the first case, the curves are straight lines, have constant slopes, and have, therefore, constant growth rates. This is, in other words, an exponential function growing by a constant per cent from period to period. In many cases, it can be expedient to calculate the growth rates and include them in the accompanying discussion.⁷ Seen over the long run, many time series follow approximately an exponential course, which can be depicted on a semi-logarithmic scale.

Figure 5.2.5. Logarithmic scale and growth rates.

[Figure: three panels with a logarithmic y-axis (Log) and time (t) on the x-axis, each showing two parallel curves. Case 1: growth rates identical and constant. Case 2: growth rates identical and increasing. Case 3: growth rates identical and decreasing.]

In the second case, the increasing slopes of the curves mean increasing growth rates; and in the third case, the growth rates fall with time. It is apparent that the logarithmic depiction is especially useful when you wish to compare the growth rates of different time series.

A second advantage with semi-logarithmic scales is illustrated in Figure 5.2.6. In the upper figure, a normal scale for GNI per capita is used; in the lower figure, a semi-logarithmic scale for GNI per capita is used. In the upper figure, individual countries are identified with text boxes. In many cases, the location of countries will be of interest. A trendline, the equation of the trendline, and the corresponding R² value are displayed on the chart.

6 Data can be represented either as flow variables or as stock variables. Flow variables concern a period of time, for example, the number of births during 2006, while stock variables concern a value as of a certain point in time, e.g., population January 1, 2006. Actually, when plotting the point for the number of births in 2006 in a figure, the value for 2006 ought to be identified with the middle of the year, i.e., July 1st, but this is not the custom. More usually, the number of births for all of 2006 is indicated at 2006 on the x-axis. Corresponding indications are used for stock variables, in that one as a rule includes the correct date in the title. You could write "Population of Denmark, January 1, 2006" and plot the population value at 2006 on the x-axis.

7 $y_t = y_0(1+r)^t$ indicates that $y$ grows exponentially with the growth rate $r$. $y_t$ is the value at time $t$, $y_0$ is the value at time 0, and $t$ is the number of time periods between 0 and $t$. When $y_t$, $y_0$ and $t$ are known, $r$ can of course be determined from the equation by isolating $r$. It is, however, more normal to use a PC or calculator, where $r$ is found simply by keying in the three known values. The known rules for logarithms transform the previous equation to $\log y_t = \log y_0 + t\log(1+r)$, the equation for the straight line represented in the first case, where $\log y_0$ is the intercept on the y-axis and $\log(1+r)$ is the slope.
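The growth-rate relation in footnote 7 can be tried out directly. The following Python sketch isolates r in the exponential growth equation and checks the logarithmic form against it, using the total population of Denmark in 1970 and 2006 from Table 5.1.2 as input. The function names are invented for this example, and treating the 36 years as 36 equal periods of constant growth is an assumption made for illustration.

```python
import math


def growth_rate(y0, yt, t):
    """Solve y_t = y_0 * (1 + r)**t for the constant growth rate r."""
    return (yt / y0) ** (1 / t) - 1


def log_line(y0, r, t):
    """Value of log y_t on the straight line log y_0 + t * log(1 + r)."""
    return math.log(y0) + t * math.log(1 + r)


# Total population of Denmark, 1000 persons, 1970 and 2006 (Table 5.1.2).
y0, yt, periods = 4907, 5427, 36

r = growth_rate(y0, yt, periods)
print(f"average annual growth: {r:.2%}")

# The logarithmic form leads back to the same end value.
print(round(math.exp(log_line(y0, r, periods))))
```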

Figure 5.2.6. Total fertility rate as related to GNI per capita world-wide, 2004.

[Figure: two scatter diagrams with "Fertility rate, total" on the y-axis (0 to 7). Upper panel: x-axis "GNI per capita, PPP" on a normal scale (0 to 60000); the labelled countries are Saudi Arabia, Kuwait, Israel, Luxembourg, Hong Kong (China), Denmark, Russia and China; R² = 0.4641. Lower panel: x-axis "GNI $ (PPP) per capita (log scale)" (100 to 100000); trendline y = -0.8131 ln(x) + 9.6942, R² = 0.4641.]

Source: The World Bank: World Development Indicators.

Many countries have a very small per capita income compared with that of western industrialized countries. In a plot of observations of per capita income using a normal scale, those for the poorer countries will lie in a large clump close to the y-axis. If the plot is instead made using a logarithmic scale, the clump will dissolve, and the material will stand out more distinctly.

Therefore, the semi-logarithmic scale is often better than a normal scale when comparing numbers that differ by orders of magnitude. This advantage is illustrated even more clearly in Figure 5.2.7. The curve of employment in the electricity, gas, and water sector nearly coincides with the x-axis in the upper figure, while the corresponding curve is clearly represented in the lower figure and lies distinctly separate from the x-axis.

The lower part of Figure 5.2.7 illustrates, in addition, that total employment was constant after

1966. However, this constant level of employment does not appear in the upper figure. This

illustrates the weakness of the semi-logarithmic scale. It is often not useful for illustrating

smaller changes that can be important in the investigation of certain issues.
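The way a logarithmic scale dissolves the clump of small values can be shown with a few numbers. The Python sketch below computes the relative position of each observation along an axis, first on a normal scale and then on a logarithmic scale. The axis ranges are those of Figure 5.2.6; the incomes are hypothetical and chosen only to show the effect, and `axis_position` is a name invented for this example.

```python
import math


def axis_position(value, axis_min, axis_max, log_scale=False):
    """Relative position (0 to 1) of a value along an axis."""
    if log_scale:
        value, axis_min, axis_max = (
            math.log10(v) for v in (value, axis_min, axis_max)
        )
    return (value - axis_min) / (axis_max - axis_min)


# Hypothetical per capita incomes in $; chosen only to show the effect.
incomes = [500, 1_000, 2_000, 40_000]

linear = [axis_position(v, 0, 60_000) for v in incomes]
logarithmic = [axis_position(v, 100, 100_000, log_scale=True) for v in incomes]

for v, lin, log in zip(incomes, linear, logarithmic):
    print(f"{v:>6}: normal scale {lin:.3f}, log scale {log:.3f}")
```

On the normal scale the three low incomes fall within the first few per cent of the axis, while on the logarithmic scale they are spread over a much larger part of it. The same compression explains why small changes in large values are hard to see on a logarithmic scale.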

Figure 5.2.7. Employment in the electricity, gas, and water sector and in Denmark in general, 1966-2006.

[Figure: two line charts, 1966-2006, each with the curves "Total" and "Electricity, gas and water". Upper panel: y-axis in 1000 persons on a normal scale (0 to 3000). Lower panel: y-axis in 1000 persons on a log scale (1 to 10000).]

Source: As in Figure 5.2.4.

5.2.3. Bar and pie diagrams

In bar diagrams, data is represented by a column or part of a column. There are two types of bar

diagrams. One type uses qualitative variables and has no scale. Municipalities, gender,

countries, etc., are qualitative variables, where specific values for gender are male/female,

specific values for countries are Saudi Arabia, Iran, etc. Normally, a ratio (e.g., male/female) is

measured along the y-axis and the qualitative variable is measured along the x-axis. Because

there is no scale along the x-axis, the bars can be placed freely, and one normally leaves space between the bars to create clarity. The bars can also be stacked on top of each other or placed next to each other.

The other type of bar diagram is called a histogram and is used for illustrating quantitative

variables. Quantitative variables are age, income, length of marriage, company profits, size of

farms, etc. The quantitative variable is divided up into intervals placed on the x-axis only after

the scale has been determined. Next, the unit intervals must be chosen. This can be seen in the

following example.

Table 5.2.1. Number of divorces, by length of marriage, 2005.

Length of marriage      Number of divorces
Under 1 year                           169
1 year                                 568
2 years                                872
3 years                               1088
4 years                               1277
5 years                               1107
6-7 years                             1763
8-9 years                             1416
10-14 years                           2816
15-19 years                           1832
20-24 years                           1008
25 years and over                     1383
Total                                15299

1) Excluding 1 case for which duration of marriage was not given.
Source: www.statistikbanken.dk/SKI107.

In Table 5.2.1, the intervals have different widths. This is not accounted for in the upper part of Figure 5.2.8, which gives the impression that a large number of marriages ended in divorce after 10-14 years of marriage.⁸ This is not the case. The problem is that the width of that interval is not consistent with the unit interval (which is one year), so the bar overrepresents the importance of divorces for that time period. By plotting instead the number of divorces per year of marriage, as in the lower part of Figure 5.2.8, the correct picture of the relationship between the two variables is obtained. It can be seen that the greatest number of divorces occurs after four years of marriage.⁹

8 The interval covers marriages that have lasted up to 14.999 years; that is, the interval measures up to, but does not include, the 15th year.

Figure 5.2.8. Number of divorces, by length of marriage, 2005.

[Figure: two histograms with "Duration of marriage, years" on the x-axis (0 to 30). Upper panel: y-axis "Number of divorces" (0 to 3000). Lower panel: y-axis "No. of divorces per year of marriage" (0 to 1400), with a separate rectangle labelled "25 years and over".]

Source: As in Table 5.2.1.

9 This conclusion should be taken with reservation because it is not clear how many marriages make up the basis for the divorce data. In other words, one must know the divorce rate for the individual years of the marriage's duration to be able to determine how many years of marriage are associated with the breaking up of the greatest number of marriages.

The frequency in a histogram is determined by the area of the bar; the number of divorces in the

interval 10-14 years is the column height (563.2) multiplied by the ratio of the interval width to

the unit interval (5/1) which is equal to 2816. The column height can be calculated as the

frequency divided by the number of times the interval width is larger than the unit interval.¹⁰

It is clear that changing the widths of intervals should not change a figure completely when

wishing to illustrate the relationship between two variables. The figure should look like what one

would have if all the information about the number of divorces occurring for each length of

marriage were available. This ideal is not entirely fulfilled when the distribution is skewed.

Figure 5.2.8 is skewed to the right (the tail is on the right). It must, then, be assumed that there

are more observations in the first half of an interval (prior to the peak) than in the last half of an

interval.

The frequency for the interval at 25 years and over is expressed as a line in the upper part of

Figure 5.2.8 without closure to indicate that the interval is open. One could choose to close the

interval at, for example, 50 years and then calculate the height of the column by dividing the

number of divorces by 25. The open interval can also be represented as a rectangle placed

appropriately in the figure, as in the lower part of Figure 5.2.8, where a rectangle is drawn in

corresponding to 1383 divorces. This area can be immediately compared with the other areas in

the figure.
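The column heights for a histogram with unequal interval widths can be computed as follows. This Python sketch uses the closed intervals of Table 5.2.1 with a unit interval of one year. It also shows the height obtained if the open interval is closed at 50 years, as suggested above. The names `INTERVALS` and `column_height` are invented for this example.

```python
UNIT = 1  # unit interval: one year of marriage

# (interval start, interval end, number of divorces) from Table 5.2.1.
# The open interval "25 years and over" is left out here.
INTERVALS = [
    (0, 1, 169), (1, 2, 568), (2, 3, 872), (3, 4, 1088), (4, 5, 1277),
    (5, 6, 1107), (6, 8, 1763), (8, 10, 1416), (10, 15, 2816),
    (15, 20, 1832), (20, 25, 1008),
]


def column_height(start, end, frequency, unit=UNIT):
    """Frequency per unit interval, so that the bar area equals the frequency."""
    return frequency / ((end - start) / unit)


heights = {(s, e): column_height(s, e, f) for s, e, f in INTERVALS}
tallest = max(heights, key=heights.get)

print(heights[(10, 15)])   # height for the interval 10-14 years
print(tallest)             # interval with the most divorces per year

# Closing the open interval at 50 years, as the text suggests:
print(column_height(25, 50, 1383))
```

The tallest column is the one for marriages that have lasted four years, in line with the lower part of Figure 5.2.8.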

Another example is the number of tax-paying persons, arranged according to size of taxable

income, as in Table 5.2.2. Income is a quantitative variable, so the frequency will also be plotted

by unit interval, chosen to be 25,000 DKK in Figure 5.2.9.

10 The various interval widths make it a little difficult to use a graphics programme. One can use a scatter diagram which represents the relationship between two variables with points, cf. Section 5.2.4. The points are used to draw the columns and the points are erased after the columns are drawn in.

Table 5.2.2. Number of tax-paying persons, by size of taxable income, 2005.

Income, DKK              1000  Total income     No. of  No. of persons    Income   0.5B(A+(A+C))
                      persons     mill. DKK persons, %  accumulated, %   acc., %
< 25,000                  312         1,975        7.1             7.1       0.2            0.71
25,000 - 49,999           125         4,625        2.9            10.0       0.8            1.45
50,000 - 74,999           177        11,150        4.0            14.1       2.2            6.00
75,000 - 99,999           355        31,463        8.1            22.2       6.1           33.62
100,000 - 124,999         502        57,243       11.5            33.7      13.2          110.98
125,000 - 149,999         495        67,801       11.3            45.0      21.7          197.19
150,000 - 174,999         426        69,008        9.8            54.8      30.3          254.80
175,000 - 199,999         381        71,500        8.7            63.5      39.1          301.89
200,000 - 224,999         365        77,477        8.4            71.8      48.8          369.18
225,000 - 249,999         309        73,302        7.1            78.9      57.9          378.79
250,000 - 299,999         414       112,722        9.5            88.4      71.9          616.55
300,000 - 349,999         202        64,966        4.6            93.0      80.0          349.37
350,000 - 399,999         107        39,904        2.5            95.4      85.0          206.25
400,000 and over          199       120,968        4.6           100.0     100.0          425.50
Total                    4369       804,103      100.0                                   3252.26

Source: Statistikbanken.dk/IF13 and IF23.

There are a number of persons whose taxable income equals zero. This would without doubt be the most typical income if the material were divided up by very small income intervals. Here, an open interval is used for income levels under 25,000 DKK.

The interval 250,000 to 299,999 DKK is twice the unit interval. This means that the column height in Figure 5.2.9 is only 207, rather than 414, as is shown in the table. The figure shows a sharp fall in the column height after 250,000 DKK. There is hardly any doubt that there are more taxpayers between 250,000 and 274,999 than between 275,000 and 299,999, so the figure is drawn somewhat incorrectly, cf. the discussion of the example regarding length of marriage and divorce. Therefore, material should be reported with as small an interval width as possible. If the same interval width is used throughout, the unit interval is equal to the width of the interval. There is no need to list the unit interval on the axis label in this case. The remaining data in the table will be used in Section 5.2.5.

Figure 5.2.9. Number of tax-paying persons, by size of taxable income, 2005.

[Figure: histogram with "Taxable income in 1000 DKK" on the x-axis (0 to 450) and "1000 persons, unit interval 25,000 DKK" on the y-axis (0 to 600).]

Source: As in Table 5.2.2.

Population pyramids are a special form of bar diagram, where the frequency is placed on the x-axis instead of on the y-axis. Circle diagrams (pie diagrams) are used for illustrating per cent distributions.

Figure 5.2.10 uses a circle diagram to show the distribution of global oil reserves among the

eight countries or country groups used in Figures 5.2.2 and 5.2.3.

For clarity's sake, it is recommended that labels for the individual sections of the circle be

written in proximity to the respective areas, as in the upper part of the figure, instead of written

elsewhere, as in the lower part of the figure. In addition, you are warned against using too much

detail in the circle diagram and against using too many of them. It is apparent that 10 circle

diagrams or more in a paper to illustrate the development in global oil reserves since 1960 is not

at all sensible. One table or two curve diagrams (four countries/country groups in each diagram)

would be much better.

Figure 5.2.10. World oil reserves, by country, 2005, in per cent.

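[Figure: two circle diagrams with the sections Saudi Arabia, Iran, Iraq, Kuwait, The Emirates, Venezuela, Non-OPEC and Other OPEC countries. In the upper diagram the labels are written next to the respective sections; in the lower diagram they are listed separately.]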

Source: www.opec.org – Annual Statistical Bulletin.

5.2.4. Scatter diagrams

As mentioned earlier in the section on formulating the statement of the problem, one normally

includes explanatory material in a paper. This is often done by comparing the base material (the

explained variable) and the explanatory variable in a curve, where time is on the x-axis. One

looks for a pattern in the material that indicates a relationship between the two variables.

In many cases, this pattern can be illustrated using a plot of the values (most usually values

from the same year) of the explained variable and the explanatory variable in a so-called scatter

diagram.

The scatter diagram is used in Figure 5.2.6 with GNI per capita as the explanatory variable

and fertility as the explained variable. Of course, a lot of variables are correlated with income.
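A trendline of the kind shown in Figure 5.2.6 can be fitted by ordinary least squares on the logarithm of the explanatory variable. The Python sketch below is a minimal illustration under stated assumptions: the five observations are hypothetical and are not the World Bank data behind the figure, and `fit_log_trend` is a name invented for this example.

```python
import math


def fit_log_trend(xs, ys):
    """Least-squares fit of y = a*ln(x) + b; returns (a, b, r_squared)."""
    lx = [math.log(x) for x in xs]
    n = len(xs)
    mean_x, mean_y = sum(lx) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(lx, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical observations (GNI per capita in $, total fertility rate).
gni = [800, 2_500, 6_000, 15_000, 35_000]
fertility = [5.1, 3.9, 2.6, 2.0, 1.7]

a, b, r2 = fit_log_trend(gni, fertility)
print(f"y = {a:.4f} ln(x) + {b:.4f}, R2 = {r2:.4f}")
```

A negative slope and a high R² describe the pattern in the plot. They do not by themselves establish the direction of causation.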

5.2.5. Lorenz curves

GNI per capita differs significantly among the countries of the world. This variation or

skewness in global income distribution can be illustrated using a histogram, where GNI per

capita is divided up into intervals, and the number of countries falling within the individual

intervals is, then, the value that determines the size of the columns. One could choose to let the

value for each country be determined by population size and assume that all persons in a given

country have an income corresponding to the respective country's GNI per capita. In this case,

China, with a population approximately 225 times that of Denmark, would be represented by a

column that is 225 times larger than that of Denmark.

The skewness in a distribution can also be illustrated in a Lorenz curve, as in Figure 5.2.11.

The Lorenz curve is constructed in the following way: First, the countries are arranged

according to size of GNI per capita. Next the countries' per cent share of world GNI and world

population are calculated, respectively, and, after that, accumulated. Finally, the two

accumulated per cent shares are plotted in a scatter diagram. From this, one can read how large a

share of the world's GNI accrues altogether to the poorest 50% of world population. The dotted

line from 50% on the x-axis to the curve indicates on the y-axis that the poorest 50% receive

approximately 5% of the world's GNI. One can also read from the curve that the poorest 80% of

the population receive about 16% of the world's GNI. This means that the distribution of the

world's GNI is very much skewed. Note that no account is taken of the spread of income

within the individual countries.
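The construction steps for the Lorenz curve described above can be sketched in Python as follows. The four countries and their figures are hypothetical and serve only to show the procedure; `lorenz_points` is a name invented for this example.

```python
# Hypothetical countries: (name, population in millions, GNI in billion $).
COUNTRIES = [
    ("A", 1000, 1500),
    ("B", 50, 2000),
    ("C", 300, 900),
    ("D", 150, 600),
]


def lorenz_points(countries):
    """Return accumulated (% of population, % of GNI) pairs, poorest first."""
    ranked = sorted(countries, key=lambda c: c[2] / c[1])  # by GNI per capita
    total_pop = sum(c[1] for c in ranked)
    total_gni = sum(c[2] for c in ranked)
    points, acc_pop, acc_gni = [(0.0, 0.0)], 0.0, 0.0
    for _, pop, gni in ranked:
        acc_pop += 100 * pop / total_pop
        acc_gni += 100 * gni / total_gni
        points.append((acc_pop, acc_gni))
    return points


for pop_share, gni_share in lorenz_points(COUNTRIES):
    print(f"{pop_share:6.1f}% of population -> {gni_share:6.1f}% of GNI")
```

Plotting these pairs in a scatter diagram and connecting them gives the Lorenz curve. Because the countries are ranked by GNI per capita, the curve never rises above the diagonal.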

Figure 5.2.11. GNI, by world population, 2004, Lorenz Curve.

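[Figure: Lorenz curve with "% of population, accumulated" on the x-axis and "% of GNI, accumulated" on the y-axis, both from 0 to 100. Two reference lines are labelled "Totally even distribution" and "Totally uneven distribution".]

Source: The World Bank: World Development Indicators.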

A totally even income distribution means that the per cent share of GNI and population are the

same over the entire curve, as illustrated by the straight line (totally even distribution) in Figure

5.2.11. A totally uneven income distribution means that the person with the highest income has,

in fact, all the income. This distribution is represented by a horizontal curve following the x-axis

up to 100% and the vertical line from 100% to the top of the diagram. The closer the Lorenz

curve lies to the line illustrating totally even distribution, the more equal the distribution. By

drawing a Lorenz curve for several years of data, one can determine with a picture if the

direction of change has been toward greater global equality or the opposite.

Table 5.2.2 showed the distribution of taxable income for taxpayers and Figure 5.2.9 showed a histogram of this distribution. The data can also be represented in a Lorenz curve, as in Figure 5.2.12.

Figure 5.2.12. Taxable income, by size of taxable income, 2005, Lorenz Curve.

[Figure: Lorenz curve with "% of taxpayers, acc." on the x-axis and "% of taxable income, acc." on the y-axis, both from 0 to 100. The points A, B and C are marked on the figure.]

Source: As in Table 5.2.2.

It is apparent that the distribution of data in Figure 5.2.12 is not nearly as skewed as the distribution shown in Figure 5.2.11. The 80% of taxpayers with the lowest income received approximately 60% of taxable income. In Figure 5.2.11, the corresponding amount was approximately 16%.

It seems as if world income distribution is more skewed than income distribution in Denmark. Such a conclusion should be treated with caution, given that different definitions of income have most likely been used in calculating income distribution. In addition, the different income definitions each have weaknesses that affect a true calculation of income distribution. A substantial criticism can, in any case, be made against using GNI as a measure of prosperity, and taxable income can be reduced when deductions are taken into account. The value of taxable income does not take into consideration that income varies over lifetimes. Two persons who have the same lifetime income will most likely receive very different taxable incomes for a given year.

The skewness in the distribution can also be illustrated using the Gini coefficient, which is the area bounded by the line AB (totally even distribution) and the Lorenz curve, divided by the area of the triangle ABC = 0.5*100*100 = 5000. The larger the Gini coefficient, the larger the skewness in the distribution. It is apparent that an equal income distribution results in a Gini coefficient of 0 and a totally unequal income distribution results in a coefficient of 1. The Gini coefficient for the curve in Figure 5.2.12 is calculated to be 0.35.

Calculation of the Gini coefficient.

[Diagram: the area under the Lorenz curve for income interval i, with width $X_i$, height $Y_i$ at the start of the interval, and an increase of $G_i$ over the interval.]

Start by calculating the area between the Lorenz curve and ACB (totally uneven distribution). Here $X_i$ is the per cent of persons in income interval $i$, $Y_i$ is the accumulated per cent of income at the start of the interval, and $G_i$ is the interval's own per cent of income.

Area of income interval $i$: $Y_i X_i + 0.5\,X_i G_i = 0.5\,X_i\,(Y_i + (Y_i + G_i))$.

Income interval 75,000 - 99,999 DKK: $0.5 \times 8.1 \times (2.2 + 6.1) = 33.62$, cf. Table 5.2.2.

Total area is then the sum of the areas for all the income intervals = 3252.26, cf. Table 5.2.2.

The area bounded by AB and the Lorenz curve = 5000 - 3252.26 = 1747.74.

Gini coefficient = 1747.74/5000 = 0.35.
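The calculation can be reproduced with a few lines of Python. The sketch below takes the per cent of persons and the accumulated per cent of income from Table 5.2.2, sums the area under the Lorenz curve interval by interval, and derives the Gini coefficient. The names `ROWS` and `area_under_lorenz` are invented for this example.

```python
# (% of persons, accumulated % of income) per interval, from Table 5.2.2.
ROWS = [
    (7.1, 0.2), (2.9, 0.8), (4.0, 2.2), (8.1, 6.1), (11.5, 13.2),
    (11.3, 21.7), (9.8, 30.3), (8.7, 39.1), (8.4, 48.8), (7.1, 57.9),
    (9.5, 71.9), (4.6, 80.0), (2.5, 85.0), (4.6, 100.0),
]


def area_under_lorenz(rows):
    """Sum the areas 0.5 * X_i * (Y_i + (Y_i + G_i)) over all intervals."""
    area, previous = 0.0, 0.0
    for width, accumulated in rows:
        area += 0.5 * width * (previous + accumulated)
        previous = accumulated
    return area


TRIANGLE = 0.5 * 100 * 100  # area of the triangle ABC
area = area_under_lorenz(ROWS)
gini = (TRIANGLE - area) / TRIANGLE

print(f"area under curve: {area:.2f}, Gini coefficient: {gini:.2f}")
```

Because the table uses fairly wide income intervals, the straight-line segments lie slightly above a smooth Lorenz curve, so a calculation of this kind tends to understate the Gini coefficient a little.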

Deciles and quartiles are often used in income and wealth statistics. As with the construction

of Lorenz curves, taxpayers are arranged according to size of income. The first decile (10%

fractile) is the value of income below which 10% of the observations lie. The second decile

(20% fractile) is the income below which 20% of the observations lie. The first quartile

corresponds to the 25% fractile and the upper quartile corresponds to the 75% fractile. The 50%

fractile is also called the median. The median is equal to the taxable income of the person who

is located just in the middle of the distribution when the taxpayers are arranged according to size

of taxable income.
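As a small illustration of deciles, quartiles and the median, the following sketch uses Python's standard `statistics` module. The income figures are invented for the example, and statistical packages differ slightly in how they interpolate fractiles between two observations, so the cut points should be read as one possible convention.

```python
import statistics

# Hypothetical taxable incomes (1000 DKK); not taken from the text
incomes = [120, 95, 310, 180, 240, 60, 150, 205, 400, 175, 135]

# Arrange the taxpayers according to size of income
ordered = sorted(incomes)

deciles = statistics.quantiles(ordered, n=10)   # 9 cut points: 10%, ..., 90% fractiles
quartiles = statistics.quantiles(ordered, n=4)  # 3 cut points: 25%, 50%, 75% fractiles
median = statistics.median(ordered)             # income of the person in the middle

print(deciles[0], quartiles, median)
```

With an odd number of taxpayers, the median is simply the income of the middle person in the ordered list, which is also the second quartile.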

5.3. Using comparative and explanatory material

In discussing the formulation of the statement of the problem, it was seen that it is often of great

value to put the base material in context using comparative material. Comparisons

can be carried out by assembling the base and comparative data in the same table or figure and

by using an accompanying text to highlight the differences between the two.

In an analysis of the industry structure in the County of Ringkøbing, it would be natural to

compare that with the industry structure in Denmark in general. This comparison could be made

based on the per cent distribution of employment within each industry, as in Table 5.3.1. But the

use of a simple calculation can often be an advantage when comparing distributions. In Table

5.3.1, the coefficients in the last column are calculated by dividing the per cent value for

Ringkøbing by the corresponding value for the whole country. If the coefficient is greater than

1, a relatively large number of persons are employed in the respective industry in the County of

Ringkøbing. The table shows that there are relatively many employed in agriculture, fishing and

manufacturing (especially manufacture of textiles and leather) in the County of Ringkøbing

compared to Denmark as a whole.
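The relative coefficient is a simple ratio of two percentage shares. A minimal sketch, using three rows from Table 5.3.1 (the function name is a hypothetical choice for the example):

```python
def relative_coefficient(region_share, country_share):
    """Share in the region divided by the corresponding share in the whole country."""
    return region_share / country_share


# (Denmark %, Ringkøbing %) for three industries in Table 5.3.1
shares = {
    "Agriculture, horticulture and forestry": (3.12, 6.01),
    "Textiles and leather": (0.37, 2.51),
    "Construction": (6.27, 6.15),
}

coefficients = {
    industry: round(relative_coefficient(ringkobing, denmark), 2)
    for industry, (denmark, ringkobing) in shares.items()
}

for industry, coefficient in coefficients.items():
    print(industry, coefficient)
```

A coefficient above 1 marks an industry that is relatively large in the county; a coefficient below 1 marks one that is relatively small.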

In this example, two distributions of the same kind are compared, that is, employment

distributed by industry. But these relative coefficients can also be used with advantage when

relating distributions of different kinds to each other. In energy analyses, relative energy

intensities can be calculated for different industries by dividing the share of energy consumption

for the industry (as a proportion of that for all industries) by the share of production for the

industry. This results in a measure of the relative energy demand for the production of

individual industries.

No further comment on Table 5.3.1 will be made here. It should be pointed out, however, that

all material used in the paper must be processed and worked on so that the relevant comparisons

appear clearly in the report.

Table 5.3.1. Employment, by industry, County of Ringkøbing and Denmark, January 1, 2005.

| Industry | Denmark (%) | Ringkøbing (%) | Relative coefficient |
| --- | --- | --- | --- |
| Agriculture, horticulture and forestry | 3.12 | 6.01 | 1.93 |
| Fishing | 0.16 | 0.69 | 4.44 |
| Mining and quarrying | 0.14 | 0.05 | 0.39 |
| Manufacture of food, beverages and tobacco | 2.72 | 4.36 | 1.61 |
| Manufacture of textiles and leather | 0.37 | 2.51 | 6.78 |
| Manufacture of wood products, printing and publication | 2.10 | 3.85 | 1.83 |
| Manufacture of chemicals and plastic products | 1.87 | 1.45 | 0.77 |
| Manufacture of other non-metallic mineral products | 0.57 | 0.62 | 1.09 |
| Manufacture of basic metals and fab. of metal products | 6.19 | 9.78 | 1.58 |
| Manufacture of furniture, manufacturing n.e.c. | 0.99 | 1.94 | 1.96 |
| Electricity, gas and water supply | 0.53 | 0.43 | 0.81 |
| Construction | 6.27 | 6.15 | 0.98 |
| Sale and repair of motor vehicles, sales of auto. fuel | 2.25 | 2.37 | 1.05 |
| Wholesale except of motor vehicles | 5.80 | 6.48 | 1.12 |
| Retail trade and repair work exc. of m. vehicles | 6.96 | 6.93 | 1.00 |
| Hotels and restaurants | 3.07 | 2.43 | 0.79 |
| Transport | 4.28 | 3.36 | 0.79 |
| Post and telecommunications | 1.87 | 1.13 | 0.60 |
| Finance and insurance | 2.71 | 1.96 | 0.73 |
| Letting and sale of real estate | 1.67 | 1.25 | 0.75 |
| Business activities | 9.74 | 5.72 | 0.59 |
| Public administration | 5.46 | 3.89 | 0.71 |
| Education | 7.54 | 6.37 | 0.85 |
| Human health activities | 5.81 | 4.57 | 0.79 |
| Social institutions etc. | 12.08 | 11.27 | 0.93 |
| Associations, culture and refuse disposal | 5.29 | 4.07 | 0.77 |
| Activity not stated | 0.44 | 0.37 | 0.83 |
| Total | 100.0 | 100.0 | 1.0 |

Source: www.statistikbanken.dk

In the previous discussion on causal analysis (and in that on formulating the statement of the

problem), it was shown that an event or an effect is normally caused by a long series of factors.

If the wish is to determine the relationship between C and E, it is apparent that the background

factors B should be seriously considered. An example was given where C was occupation and E

was mortality. The B's were, then, other factors that had contributed to death, such as gender,

age, alcohol misuse, etc.

The background factors must be taken into account when clustering the data. That is, a

comparison is made of groups that are equivalent with respect to background factors. In the

example with mortality, you would, then, compare mortality between occupations u and x for

those groups that are equivalent with respect to gender, age, or alcohol misuse. If the most

important B's are included in the clustering, the remaining difference in mortality can probably

be ascribed to occupation.

Clustering is based simply on a division of the total into groups. This could be called an

additive analytical method. Deaths are divided up into groups in which different values of the

characteristics relevant for an analysis of mortality are represented. If you are to analyze energy

consumption, it would be natural to divide this consumption up into sectors or purposes

consisting of sub-sectors or sub-purposes in which development in consumption is dependent on

the same explanatory factors. Industries' energy consumption is dependent on factors other than

factors affecting energy consumption in the heating of private households. Industries' energy

consumption is, to a great extent, dependent on industry production, while private households'

energy consumption is very much dependent on explanatory factors such as disposable income,

the relative price of energy, etc.

The multiplicative analytical method can be used to advantage for other problems being

investigated. If you are analyzing petrol consumption, it is natural to include the following

explanatory factors: petrol consumption/mile, miles/car, number of cars/GDP (constant prices)

and GDP (constant prices). All these factors multiplied by each other result in petrol

consumption. The first factor is a measure of energy intensity for petrol-driven cars. This factor

is dependent on petrol prices, among other things. The higher the price of petrol, the higher is

the willingness to drive in cars that get high mileage per gallon.

The second factor is a measure of how much the car has been used. This factor can also be

assumed to be dependent on petrol prices and disposable income. The third factor links the

number of cars together with the current measure for economic development. An increasing GDP

will, ceteris paribus, result in a larger fleet of cars. These explanatory factors can be compared in

a table or figure with data for a series of years, and one can calculate the individual factor's

contribution to the change in petrol consumption.
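One way to calculate each factor's contribution is sketched below. It rests on an assumption not made in the text: contributions are measured as logarithmic changes, which has the convenient property that the contributions of the four factors add up exactly to the log change in total petrol consumption. All figures are hypothetical.

```python
import math

# Hypothetical values of the four factors in two years; none of these come from the text
year_a = {"petrol_per_mile": 0.050, "miles_per_car": 9000.0, "cars_per_gdp": 1.50, "gdp": 1000.0}
year_b = {"petrol_per_mile": 0.045, "miles_per_car": 9500.0, "cars_per_gdp": 1.55, "gdp": 1200.0}


def petrol_consumption(factors):
    """Petrol consumption is the product of the four explanatory factors."""
    return math.prod(factors.values())


def log_contributions(start, end):
    """Log change of each factor; these sum to the log change of the product."""
    return {name: math.log(end[name] / start[name]) for name in start}


contributions = log_contributions(year_a, year_b)
total_change = math.log(petrol_consumption(year_b) / petrol_consumption(year_a))

for name, value in contributions.items():
    print(f"{name}: {100 * value / total_change:.1f}% of the total change")
```

In this invented example, falling petrol use per mile pulls consumption down, while the other three factors push it up.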

The multiplicative method of analysis is used also in the calculation of the standardized mean,

which is discussed in the next sub-section.

5.4. Standardizing means

Let us say the result, E, can be explained by the multiplication of two factors, B1 and B2, and one

wishes to know how much one factor influences the result when the other factor is held constant

(i.e., the calculation is standardized).

In a note to Table 5.4.1, it is indicated that the proportion of women employed in manufacturing

in 1979 was somewhat larger in the County of Ringkøbing than in the rest of Denmark. The

differences in the proportion of women employed can partly be explained by the differences in

manufacturing structure and partly by the differences in the proportion of women in the various

manufacturing branches. The data can be clustered using the proportion of women employed in

the individual manufacturing branches for Denmark as the standard, as in Table 5.4.1. It is

calculated, then, how many women would be employed in the County of Ringkøbing if the

manufacturing groups in Ringkøbing employed the same proportion of women that the

manufacturing groups in Denmark employ in general. This calculation yields 11,635 women,

corresponding to a proportion of women of 32.9% (100 × 11,635/35,415). But the actual

share of women was 29.9% (see footnote 11). That is, the proportion of women employed in the individual

branches of manufacturing in the County of Ringkøbing was lower than that for those branches

in the rest of Denmark. It is interesting to note, however, that the County of Ringkøbing is the

industrial centre for industries in which the share of women employed was (is) high, for

example, the textile and leather industry.
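The standardized calculation can be reproduced approximately from the figures in Table 5.4.1. The sketch below uses the published, rounded percentage shares, so the result lands a few persons below the 11,635 reported in the table, which presumably rests on unrounded shares; the variable names are illustrative only.

```python
# Proportion of women in each manufacturing branch, all of Denmark (per cent), Table 5.4.1
dk_share_women = [13.0, 40.6, 54.0, 32.3, 41.6, 18.5, 23.8, 33.2]

# Total employment (men and women) in the same branches, County of Ringkøbing
ringkobing_employed = [78, 6287, 3626, 5545, 2088, 898, 14095, 2798]

# Women who would be employed in Ringkøbing if each branch had the Danish proportion
standardized_women = sum(
    share * employed / 100 for share, employed in zip(dk_share_women, ringkobing_employed)
)

# Standardized proportion of women in Ringkøbing manufacturing (per cent)
standardized_share = 100 * standardized_women / sum(ringkobing_employed)

print(round(standardized_women), round(standardized_share, 1))
```

Comparing `standardized_share` with the actual share isolates the effect of the proportion of women within the branches, because the manufacturing structure is the same in both figures.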

11 Statistisk Årbog 2006, SÅ 2005 Table 114.

[Figure: Components of the standardized mean (the factors B1 and B2 combine to give the result E).]

Table 5.4.1. Proportion of women employed in manufacturing, by sector, 2005.

| Sector | Proportion of women, Denmark (1) | Total numbers employed in the County of Ringkøbing | Calculated number of employed women in the County of Ringkøbing (2) |
| --- | --- | --- | --- |
| Mining and quarrying | 13.0 | 78 | 10 |
| Food, beverages and tobacco | 40.6 | 6287 | 2550 |
| Textiles and leather | 54.0 | 3626 | 1959 |
| Wood products, printing and publication | 32.3 | 5545 | 1792 |
| Chemicals and plastic products | 41.6 | 2088 | 869 |
| Other non-metallic mineral products | 18.5 | 898 | 166 |
| Basic metals and fabricated metal products | 23.8 | 14095 | 3360 |
| Furniture, manufacturing n.e.c. | 33.2 | 2798 | 929 |
| Total | 31.4 | 35415 | 11635 |

1. Number of employed women x 100/total employed.

2. The share of women employed in all of Denmark multiplied by the number of men and women employed in Ringkøbing/100.

Source: Statistikbanken.dk

An analogous example could be made for the changes in energy intensity. Using data from 1973

through 2005, energy intensity is measured by energy consumption divided by GDP in constant

basic prices. The change in the total intensity can partly be explained by changes in the intensity

for various industries and partly by the shift in the relative significance of industries, as measured

by their contribution to GDP at basic prices. GDP at basic prices, distributed by industries in

1973, can be chosen as the standard. By multiplying this standard by the energy intensities in

2005, the energy consumption can be calculated as if there had not been changes in industries'

contribution to GDP at basic prices from 1973 to 2005. The calculated energy consumption for

2005 is then compared with the actual energy consumption for 1973, and if the calculated

consumption is less than the actual, the energy intensity for industries has fallen during the

period. The calculated consumption can be expressed in per cent of the actual consumption, and

the conclusion could be that energy consumption has fallen by x% as a result of the industries'

falling energy intensity.

It might be easier to sketch the explanatory factors and the calculated results in the following:

By using the above diagram, it should be clear what the differences are and which factors are

responsible for these differences. When comparing the results in the upper row, the standard is

GDP at basic prices distributed by industries in 1973. The difference in results should, then, be

assigned to the other factor, here energy intensity for the various industries. Energy intensity

could also be used as the standard and a calculation could be made for how much energy

consumption changed as a result of the development in GDP at basic prices.
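The logic of the diagram can be illustrated with a deliberately small, invented economy of two industries. None of the numbers below come from the energy statistics; they only show how the "actual" and "calculated" cells are formed and compared.

```python
# Hypothetical GDP at basic prices by industry (constant prices)
gdp = {
    1973: {"manufacturing": 400.0, "services": 600.0},
    2005: {"manufacturing": 500.0, "services": 1500.0},
}

# Hypothetical energy intensities (energy consumption per unit of GDP)
intensity = {
    1973: {"manufacturing": 2.0, "services": 0.5},
    2005: {"manufacturing": 1.5, "services": 0.4},
}


def energy(gdp_year, intensity_year):
    """Energy consumption for a given combination of GDP structure and intensities."""
    return sum(
        gdp[gdp_year][industry] * intensity[intensity_year][industry]
        for industry in gdp[gdp_year]
    )


actual_1973 = energy(1973, 1973)            # upper row, actual
calculated = energy(1973, 2005)             # upper row, calculated: 1973 GDP, 2005 intensities
intensity_effect_pct = 100 * calculated / actual_1973 - 100

print(actual_1973, calculated, round(intensity_effect_pct, 1))
```

Swapping the roles, `energy(2005, 1973)` compared with `actual_1973` would show the effect of the development in GDP with the intensities held at their 1973 level.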

A corresponding diagram can be worked out for the proportion of women employed in

industry in the County of Ringkøbing and in Denmark in general.

An analogous calculation of standardized means can be made using population statistics. The

total fertility rate is dependent on both the inclination of women to give birth as well as the

number of women in the child-bearing years. When measuring women's fertility, you would need

to neutralize the age structure. Said in another way, the age structure must be standardized when

illustrating the development in fertility. This is done by calculating total fertility for 1000 women

going through the child-bearing years. This measure is, then, independent of the number of

women in the child-bearing years.

The national accounts contain a standardized share of wages. The total wage share (total

wages/GDP at basic prices) is dependent on both the wage share for the individual industries as

well as the industry structure. By standardizing the industry structure, the development in the

wage share can be illustrated.

There are many examples in which the final value is the result of the multiplication of two

factors. In all of these examples, standardized calculations can be used to advantage.

Diagram for standardized means

| GDP at basic prices distributed by industries | Energy intensity 1973 | Energy intensity 2005 |
| --- | --- | --- |
| 1973 | Actual | Calculated |
| 2005 | Calculated | Actual |

5.4.1. Price and quantity (volume) indexes

The most frequently used techniques for standardization appear in connection with the

calculation of price and volume indexes. Movements in values depend on both price and quantity

changes. It is interesting to know, for example, whether increased domestic consumption is caused by

increased consumer prices, increased consumption in terms of quantity, or both. If you wish to

isolate the pure price movement, the quantities must be used as a standard. In this section, index

and other related calculations will be demonstrated.

The price index represents a total expression of the movement in prices for several goods or

services. In the following, the discussion is limited to goods only. The problem with index

calculations is how to determine the weights that appropriately represent price movements for

individual goods used in the summary price index. The weight problem is solved in different

ways in the following three most popular index formulas.

Laspeyres price index:

$$P^{LA}_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,o}}{\sum_i p_{i,o} \times q_{i,o}} \cdot 100$$

The budget method:

$$P^{LA}_{t:o} = \sum_i \frac{p_{i,t}}{p_{i,o}} \cdot B_{i,o} \cdot 100, \qquad B_{i,o} = \frac{p_{i,o} \times q_{i,o}}{\sum_i p_{i,o} \times q_{i,o}}$$

where i = 1, ..., m, p = prices, q = quantity, and B_{i,o} = the budget share for good i in year o.

The numerator indicates the expenditure on the quantities of m goods bought in the index base

year (year o) valued at the prices in the final year (year t). When this is expressed in relation to

the same quantities valued at the prices of the base year of the index (the denominator), the result

is the price increase for the m goods from year o to year t (see footnote 12). The Laspeyres index uses the

quantities of the base year of the index as the standard, and this means that this index measures

price movements for a fixed goods combination.

The index can also be calculated using the (equivalent) budget method, in which the price

increase for the individual good is weighted by the share of the expenditure on that good in the

budget for the base year of the index. The greater the weight given the good consumed, the

stronger is the representation of the price increase of this good in the consumer price index.

Paasche price index:

$$P^{PA}_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,t}}{\sum_i p_{i,o} \times q_{i,t}} \cdot 100$$

The Paasche price index uses up-to-date weights, which means the weights derive from the

current time period. The calculation of the Laspeyres and Paasche indexes is illustrated in Table

5.4.2.

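Table 5.4.2. Calculation of the beer and wine index.

| Year | Beer q | Beer p | Wine q | Wine p | $P^{LA}_{t:o}$ | $P^{PA}_{t:o}$ |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 100 | 4 | 50 | 10 | 100 | 100 |
| 1 | 75 | 4 | 75 | 10 | 100 | 100 |
| 2 | 100 | 4 | 50 | 12 | 111 | 111 |
| 3 | 135 | 4 | 30 | 12 | 111 | 107 |
| 4 | 50 | 4 | 100 | 12 | 111 | 117 |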
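The index values in Table 5.4.2 can be checked with a short sketch. The function names and the data layout are choices made for this illustration; the quantities and prices are those of the table, with year 0 as the base year.

```python
def laspeyres(p0, q0, pt, qt):
    """Laspeyres price index: base-year quantities as weights (qt is not used)."""
    return 100 * sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0))


def paasche(p0, q0, pt, qt):
    """Paasche price index: current-year quantities as weights (q0 is not used)."""
    return 100 * sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt))


# (beer, wine) quantities and prices by year, from Table 5.4.2
quantities = {0: (100, 50), 1: (75, 75), 2: (100, 50), 3: (135, 30), 4: (50, 100)}
prices = {0: (4, 10), 1: (4, 10), 2: (4, 12), 3: (4, 12), 4: (4, 12)}

table = {
    year: (
        round(laspeyres(prices[0], quantities[0], prices[year], quantities[year])),
        round(paasche(prices[0], quantities[0], prices[year], quantities[year])),
    )
    for year in quantities
}

for year, (la, pa) in table.items():
    print(year, la, pa)
```

Year 3 shows the normal case (Laspeyres above Paasche), and year 4 the reverse case in which consumption shifts toward the good that has become relatively more expensive.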

Table 5.4.2 shows that the two indexes do not react to pure quantity changes, e.g., in the first year.

The table also shows that the two indexes are identical when no quantity changes have taken

place relative to the base year, e.g., in the second year. This follows directly from the formulas: the two indexes are identical

when qo and qt are identical. The table further shows that the price increase calculated by the

Laspeyres index is larger than that calculated by the Paasche index when consumption of the

good that has become relatively cheaper increases (beer consumption increases relative to wine

consumption; the beer price has fallen relative to the wine price), e.g., the third year.

Normally, consumption of a good will increase relative to the consumption of all other goods

when the price of that good decreases relative to the prices of all other goods. The Laspeyres

index does not take into consideration this substitution that takes place when relative prices

change. This results in a numerator that is too high because prices and quantities relate to

different years. Therefore, the index overestimates the real price increase.

In contrast, the Paasche index underestimates the real price increase (the denominator is too

high). Table 5.4.2 shows, however, that the two indexes "exchange places" when consumption

increases relatively for the good for which the price has increased relatively, e.g., the fourth year.

12 The calculation can, of course, be made for a period shorter than 1 year. The time dimension given in the example is used for pedagogical reasons.

It should be stressed that the Laspeyres index only overestimates, and the Paasche index only

underestimates, the real price increase when normal substitution takes place, i.e., away from the

good that has become relatively expensive.

Table 5.4.3 shows the calculation of the price movement from 1996 to 2006 using three

components of the consumer price index, using the Laspeyres formula.
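The calculation reported in Table 5.4.3 follows the budget formula: each component's price relative is weighted by its share of the combined weight. A minimal sketch using the weights and index values of that table (the dictionary layout and variable names are illustrative assumptions):

```python
# (weight 2003 in per cent, index 1996, index 2006), index values with year 2000 = 100
components = {
    "Rent housing": (22.47, 91, 116),
    "Electricity and fuel": (7.50, 81, 121),
    "Furniture, furnishings, household services, etc.": (6.22, 93, 109),
}

total_weight = sum(weight for weight, _, _ in components.values())

# Budget method: weighted sum of the relative prices p_2006 / p_1996
housing_index = 100 * sum(
    weight / total_weight * index_2006 / index_1996
    for weight, index_1996, index_2006 in components.values()
)

print(round(housing_index, 2))
```

Only the relative prices and the budget shares enter the calculation; the individual prices in the two years are not needed.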

Table 5.4.3. Calculation of the "housing index".

| Component | Weight distribution 2003, % | Consumer price index 1996 (1) | Consumer price index 2006 (1) |
| --- | --- | --- | --- |
| Rent housing | 22.47 | 91 | 116 |
| Electricity and fuel | 7.50 | 81 | 121 |
| Furniture, furnishings, household services, etc. | 6.22 | 93 | 109 |

1) Year 2000=100.

$$P^{LA}_{2006:1996} = \left[ \frac{22.47}{22.47+7.50+6.22} \cdot \frac{116}{91} + \frac{7.50}{22.47+7.50+6.22} \cdot \frac{121}{81} + \frac{6.22}{22.47+7.50+6.22} \cdot \frac{109}{93} \right] \cdot 100 = 130.25$$

Source: www.statistikbanken.dk

In an empirical paper, it can often be necessary to calculate partial indexes of the price index.

The calculation of these partial indexes is normally made using the budget formula of the

Laspeyres index in that the prices are indexed and the share of the budget is provided. These

elements are sufficient for calculating the price index. In the budget formula, you only need to

know the relative price (pt/po) and not the individual prices in the two years.

In Table 5.4.3, the weights are not taken from the base year of the index, the year in which the

index equals 100. The weights derive from the values in 2003. When the year from which the

weights are taken for the index lies between the base year and the most current year in the data

series, it is not possible to claim that the index overestimates or underestimates the real price rise

when substitution takes place.

Danmarks Statistik changes the weight basis used in calculating the Laspeyres index on a

continuous basis, and they also change the base year of the index once in a while. If you use a

price index in a paper covering a longer period of time, a linkage between the indexes will often

be necessary. This type of linking is illustrated in Table 5.4.4.

If you want the price index for 2006, using 1990 as the base year of the index, you must first

calculate the index for the year in which the link is being made (2003), with 1990 set equal to

100. Next, the index for 2006 is calculated, with 2003 set equal to 100. Finally, the two indexes

are multiplied, yielding the price rise from 1990 to 2006.

Table 5.4.4. Linking consumer price indexes.

| Index base | 1990 | 2003 | 2006 | Calculation |
| --- | --- | --- | --- | --- |
| 1980=100 | 177.4 | 234.6 | | |
| 1990=100 | 100 | 132.3 | | 234.6 × 100/177.4 |
| 2000=100 | | 107.0 | 112.3 | |
| 2003=100 | | 100 | 104.9 | 112.3 × 100/107.0 |
| 1990=100 | 100 | | 138.8 | 132.3 × 104.9/100 |

Source: www.statistikbanken.dk
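The linking in Table 5.4.4 amounts to two rebasings followed by a multiplication. The sketch below uses the index values of the table; the helper name `rebase` is hypothetical, and because the code keeps full precision, intermediate values can differ in the last decimal from the rounded figures printed in the table.

```python
def rebase(index_value, index_at_new_base):
    """Express an index value with the new base year set equal to 100."""
    return 100 * index_value / index_at_new_base


# Step 1: index for the link year 2003 with 1990 = 100 (from the 1980 = 100 series)
cpi_2003_base_1990 = rebase(234.6, 177.4)

# Step 2: index for 2006 with 2003 = 100 (from the 2000 = 100 series)
cpi_2006_base_2003 = rebase(112.3, 107.0)

# Step 3: link the two indexes (divide by 100 to keep the base at 100)
cpi_2006_base_1990 = cpi_2003_base_1990 * cpi_2006_base_2003 / 100

print(round(cpi_2006_base_1990, 1))
```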

Since the Laspeyres index normally overestimates the real price rise and the Paasche index

normally underestimates it, it seems natural to calculate an index that lies between the two

indexes. One such intermediate index is the Fisher index, which calculates a geometric average

of the two other indexes.

Fisher price index:

$$P^{FI}_{t:o} = \sqrt{P^{LA}_{t:o} \times P^{PA}_{t:o}}$$

The Fisher index is used for calculating export and import price indexes in trade statistics. Data

from these statistics are used for creating Table 5.4.5.

Table 5.4.5. Denmark's import of new petrol-driven cars from Germany, by motor size, 2000-2005.

| Motor size | 2000 Quantity (1) | 2000 Value, 1000 DKK (2) | 2000 Price, DKK (3) | 2005 Quantity (4) | 2005 Value, 1000 DKK (5) | 2005 Price, DKK (6) | p05 × q00 = (1) × (6), 1000 DKK | p00 × q05 = (3) × (4), 1000 DKK |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ≤ 1000 cm3 | 88 | 4,115 | 46,762 | 131 | 3,637 | 27,770 | 2,444 | 6,126 |
| 1000-1500 cm3 | 4,127 | 214,668 | 52,015 | 9,493 | 492,911 | 51,924 | 214,289 | 493,782 |
| ≥ 1500 cm3 | 446 | 153,552 | 344,287 | 1,071 | 389,033 | 363,242 | 162,006 | 368,731 |
| Total | 4,661 | 372,335 | 79,883 | 10,695 | 885,582 | 82,803 | 378,739 | 868,640 |

$$P^{LA}_{05:00} = \frac{\sum p_{05} \cdot q_{00}}{\sum p_{00} \cdot q_{00}} \cdot 100 = \frac{378{,}739}{372{,}335} \cdot 100 = 101.72$$

$$P^{PA}_{05:00} = \frac{\sum p_{05} \cdot q_{05}}{\sum p_{00} \cdot q_{05}} \cdot 100 = \frac{885{,}582}{868{,}640} \cdot 100 = 101.95$$

$$P^{FI}_{05:00} = \sqrt{P^{LA}_{05:00} \cdot P^{PA}_{05:00}} = \sqrt{101.72 \cdot 101.95} = 101.84$$

Source: www.statistikbanken.dk
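The three price indexes for the car imports can be recomputed from the "Total" row of Table 5.4.5. The variable names below are illustrative; the four value sums (in 1000 DKK) are taken from the table.

```python
import math

# Value sums in 1000 DKK from the "Total" row of Table 5.4.5
sum_p00_q00 = 372_335   # actual import value 2000
sum_p05_q00 = 378_739   # 2000 quantities at 2005 prices
sum_p00_q05 = 868_640   # 2005 quantities at 2000 prices
sum_p05_q05 = 885_582   # actual import value 2005

laspeyres = 100 * sum_p05_q00 / sum_p00_q00
paasche = 100 * sum_p05_q05 / sum_p00_q05
fisher = math.sqrt(laspeyres * paasche)   # geometric average of the two indexes

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

Because the Fisher index is a geometric average, it always lies between the Laspeyres and Paasche values.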

The table shows a price increase of cars with motor sizes above 1500 cm3 and a price decrease

elsewhere. If normal substitution occurred during the period, the import of cars with motor sizes

above 1500 cm3 would fall relatively. This is not the case. The import of cars with motor sizes

above 1500 cm3 makes up about 10% of the import measured in quantities in 2000 as well as in

2005. The import of small cars decreased from 1.8% to 1.2% of the import measured in

quantities even though the price decreased relatively much. An abnormal substitution has taken

place. The Paasche index, therefore, increased just a little bit more than the Laspeyres index.

A price index based solely on the import of new cars in total can be calculated using the

numbers from Table 5.4.5: 82,803 x 100/ 79,883 = 103.66. This index shows a larger price rise

than the other indexes, which analytically are the best. An "in total" price index is actually not a

proper price index because it is influenced also by quantity changes. The index is based on an

average price (total import value/quantity of imports) for one year divided by average price for

another year. Therefore, the quantities, as well as the prices, are from two different years.

This price rise of cars with motor size above 1500 cm3 may not be real. Given the product

groups in the trade statistics, there has, perhaps, been a shift toward the most luxurious cars. In

other words, no account has been taken of a shift within the individual product groups. The table

illustrates the quality problem in index calculations based on these statistics. The price rise can

be based on both price increases and quality changes.

Totally analogous to these price indexes, there are three corresponding quantity or volume

indexes. In these indexes, the prices are standardized:

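$$Q^{LA}_{t:o} = \frac{\sum_i p_{i,o} \times q_{i,t}}{\sum_i p_{i,o} \times q_{i,o}} \cdot 100, \qquad Q^{PA}_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,t}}{\sum_i p_{i,t} \times q_{i,o}} \cdot 100, \qquad Q^{Fi}_{t:o} = \sqrt{Q^{LA}_{t:o} \times Q^{PA}_{t:o}}$$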

Using base year prices is a problem for periods far apart from the base year. Therefore, one

may give preference to chain indices as a measure of real changes in quantities.

Chain Laspeyres' volume index:

$$Q^{LA}_{t:0} = Q^{LA}_{1:0} \times Q^{LA}_{2:1} \times \dots \times Q^{LA}_{t:t-1}$$

The quantity indexes show the real changes in quantities, or the changes given constant prices.

Using the various index formulas, it can easily be shown that:

$$V_{t:o} = \frac{\sum_i p_{i,t} \times q_{i,t}}{\sum_i p_{i,o} \times q_{i,o}} \cdot 100 = P^{PA}_{t:o} \times Q^{LA}_{t:o} = P^{LA}_{t:o} \times Q^{PA}_{t:o} = P^{Fi}_{t:o} \times Q^{Fi}_{t:o}$$

V is a value index that relates the value in year t to the value in year o. If you know V as well as

a price index, the quantity index can be easily calculated. For example, the Fisher price index is

calculated in Table 5.4.5 to be 101.84. V can be calculated in the following way: 885,582 x 100 /

372,335 = 237.85. The Fisher quantity index is then: 237.85 x 100 /101.84 = 233.55. If you find

the quantity change by considering only the number of cars (each car counting as 1), the result is:

10,695 x 100/4,661 = 229.46, which lies below the result measured using the quantity index.
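The step from value index to quantity index can be written out as follows, using the totals of Table 5.4.5. Since all indexes are expressed with base 100, the division of the value index by the price index is multiplied by 100; the variable names are illustrative.

```python
# Value index: import value in 2005 relative to 2000 (1000 DKK, Table 5.4.5)
value_index = 100 * 885_582 / 372_335

# Fisher price index as calculated in Table 5.4.5
fisher_price_index = 101.84

# Quantity index = value index / price index (indexes with base 100)
fisher_quantity_index = 100 * value_index / fisher_price_index

# Simple count of cars, each car counting as 1
unit_count_index = 100 * 10_695 / 4_661

print(round(value_index, 2), round(fisher_quantity_index, 2), round(unit_count_index, 2))
```

The simple count lies below the Fisher quantity index because it ignores that cars in different motor-size groups carry different prices.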

In the system of national accounts, the material is reported in both constant as well as current

prices. Dividing the value index by the quantity index produces the (implicit) price index. You

can calculate this implicit price index for many of the indexes presented in the national accounts.

If you know the value index and a price index, the quantity index can be calculated by

dividing the value index by the price index. Such a calculation is called deflating the index.

There is often a need to deflate in a paper because it is the real change or movement that is of

interest. If you have, for example, hourly earnings, income, or private consumption in current

prices and a Laspeyres price index, real quantity movements can be calculated, as in Table 5.4.6.

Table 5.4.6. Index of average hourly earnings in Danish manufacturing: nominal and real changes, 1989-2005.

| | 1989 | 2005 |
| --- | --- | --- |
| Index of average hourly earnings: | | |
| 1980=100 | 181 | 323 |
| 1989=100 | 100 | 178 |
| Consumer price index: | | |
| 1900=100 | 4142 | 5790 |
| 1989=100 | 100 | 140 |
| Index of real hourly earnings (quantity index): | | |
| 1989=100 | 100 | 127 1) |

1) 178x100/140.

Source: www.statistikbanken.dk
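Deflating as in Table 5.4.6 is a single division once both indexes share the same base year. A minimal sketch with the rebased values from the table (the function name is a hypothetical choice):

```python
def deflate(nominal_index, price_index):
    """Real (quantity) index = nominal index divided by the price index, base 100."""
    return 100 * nominal_index / price_index


# Hourly earnings index and consumer price index for 2005, both with 1989 = 100
real_hourly_earnings_2005 = deflate(178, 140)

print(round(real_hourly_earnings_2005))
```

The same function applies to income or private consumption in current prices, provided the price index is relevant for the series being deflated.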

Deflating must be done with careful thought! It is necessary, when deflating, to use a price index

that is relevant for the given relationship. The deflating used in empirical papers is often

unsuitable. For example, export values in the trade statistics are deflated using the consumer

price index. The calculated result cannot be interpreted meaningfully because consumer prices

are influenced by the price movements of goods and services that are not at all included in the

export of goods and services. Consumer prices are influenced also by indirect taxes, which also

do not affect export goods.

If the weight of goods in the consumption basket of employees in manufacturing industries

differs substantially from the weights used in the consumer price index, deflating with the

consumer price index can be a problem. If the prices for the goods weighted heavily in the

consumption basket of employees in manufacturing industries have increased relatively strongly,

deflation with the consumer price index will overestimate the movement in the real hourly wage

since this division is with a price index that has risen too little with respect to the consumption

choices of employees in manufacturing industries. A corresponding problem applies to retired

individuals. If the value of goods consumed by retired individuals is deflated using the consumer

price index, the result will most likely be incorrect since retired individuals have another

consumption pattern than that of the population in general.

Often, a price index will be used to deflate another price index to illustrate the relative or real

price movement. For example, a price index for oil can be deflated with a price index for

exported manufactures. Such an index can be interpreted meaningfully in that it shows the

movement in purchasing power for a barrel of oil measured in manufactured goods.

5.5. Analyzing time series data

Many economic indicators, such as GDP, are reported for a given time period. When these

values are available over several time periods, a time series is produced: observations over time

for a given variable, where the time distance between observations is identical. For example,

GDP is often discussed as if it were only available on a yearly basis, that is, that the time series

consisted of only one observation per year. However, some time series are available on a

quarterly, monthly, weekly, and daily basis, depending on the frequency with which the data is

collected.

For some economic data, the activities behind the data are carried out over a period of time,

and the measurement of the data concerns the activity for that entire period. GDP is one example

of this type of indicator. There will often be a lower bound with respect to the length of period.

For example, Statistics Denmark publishes quarterly data for GDP in addition to the annual data.

For other variables, you can imagine that observations relate to a particular point in time, for

example, bond interest rates, currency rates, etc. where the price formation through "electronic

trades" occurs continuously. Data for economic variables, such as currency rates, will typically

appear as daily data. That is, an average of the day's prices or a price at a particular time (for

example, the currency rate at 12:00 p.m.) might form the basis for the respective observation

value.

5.5.1. The elements of a time series

In a time series, there is often dependency between the observed value in the current time period,

Xt, and the value in the previous period, Xt-1. This dependence must be analyzed when the

movement of a given economic variable is estimated over time. In general, it is likely that

fluctuations in economic time series can result from movements in one or more of the following

components:

Trend (T): Long-term movements in the respective variable. It can be either positive or

negative, i.e., the values of the variable are either increasing or decreasing in general.

Cycle (C): Movement over the course of the business cycle, i.e., normally over more than one

year, where peaks and troughs in the business cycle cause cyclical swings in a number of

economic variables. These swings are not necessarily 'even', i.e., the swings are not necessarily

identical in magnitude nor in duration. Swings that typically last several years can be difficult to

distinguish from a possible trend (you can work with a trend/cycle component; refer to the

following section on seasonal corrections).

Seasonal swings (S): Movements that repeat themselves within a given period (typically a

year), i.e., a particular pattern in variation emerges over a time period that is observed in other

periods too. A pattern observed in a time series based on monthly data will repeat itself every

year. For example, sales of certain vegetables are always greatest in the summer months, car

sales are largest in the spring months, etc. For these examples, you could possibly work with

quarterly or monthly data and still observe the seasonal variations over the year.

Moreover, you should consider the number of work days per year when economic conditions

are being analyzed. For example, if you have a time series with monthly data, you might choose

further to correct for the uneven number of (work) days in each month, given that the number of

Sundays and holidays, etc. are unevenly distributed over the months.

Irregular swings (I): Random swings (noise) that remain after the T, C, and S components

have been accounted for and that are allocated to residual variation. These stochastic fluctuations

are unpredictable and can be due to economic-policy interventions in the economy, natural

catastrophes, etc.

A time series (Y) can be modelled with the help of the T, C, S, and I components in two ways:

Multiplicative formula: Y = T x C x S x I

Additive formula: Y = T + C + S + I

In analyzing economic data, the multiplicative model is often used. For example, when the trend

is increasing, seasonal swings of 10% in a given month mean that the absolute swing will

become larger and larger over time. This can be totally reasonable considering that the increasing

trend implies increasing levels for the respective variable. On the other hand, the additive model

implies identical, absolute seasonal swings. This can be reasonable in certain cases, for example,

with the seasonal correction of unemployment numbers.
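The difference between the two models can be made concrete with a small calculation. The sketch below uses made-up trend levels and an assumed seasonal effect (10% in the multiplicative case, 10 index points in the additive case); the numbers are purely illustrative and are not taken from any series used in these guidelines.

```python
# Hypothetical trend levels for the same month in three different years.
trend_levels = [100.0, 150.0, 200.0]

# Multiplicative model (Y = T x S): the seasonal swing is a fixed share of the trend.
seasonal_factor = 1.10  # assumed: 10% above trend in this month
multiplicative_swings = [t * seasonal_factor - t for t in trend_levels]

# Additive model (Y = T + S): the seasonal swing is a fixed absolute amount.
seasonal_amount = 10.0  # assumed: 10 index points above trend in this month
additive_swings = [(t + seasonal_amount) - t for t in trend_levels]

for t, m, a in zip(trend_levels, multiplicative_swings, additive_swings):
    print(f"trend {t:6.1f}: multiplicative swing {m:5.1f}, additive swing {a:5.1f}")
```

With an increasing trend, the multiplicative swing grows in absolute terms while the additive swing stays the same, which is the distinction drawn in the text.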

The purpose of time series analysis is to estimate the dynamic or time structure in the data of

interest, i.e., to divide the time series up into the above stated, possible components. It is sensible

to start with a graphical analysis of the time series, i.e., construct a figure with the observed

values as a function of time, and to make a first assessment. If a given time series is valued in

current prices, the data must be deflated, because it is normally the real change in the data that is

of interest.
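Deflating can be sketched as follows, assuming a price index with the base year set to 100. The function name `deflate` and all figures are hypothetical and serve only to show the arithmetic.

```python
def deflate(nominal_values, price_index, base=100.0):
    """Convert values in current prices to constant prices using a price index."""
    if len(nominal_values) != len(price_index):
        raise ValueError("the two series must have the same length")
    return [base * value / index for value, index in zip(nominal_values, price_index)]


# Hypothetical figures: sales in current prices and a price index (base year = 100).
nominal_sales = [200.0, 212.0, 226.0]
price_index = [100.0, 103.0, 106.0]

real_sales = deflate(nominal_sales, price_index)
for nominal, real in zip(nominal_sales, real_sales):
    print(f"current prices {nominal:6.1f}  constant prices {real:6.1f}")
```

The real series grows more slowly than the nominal series because part of the nominal growth reflects rising prices.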

The following Section 5.5.2 presents a discussion of the calculation technique for the so-called

moving average, which can be used in connection with the determination of the trend component

in the above-mentioned models. The moving average is used also as a central element in the

seasonal correction of data, which is treated in Section 5.5.3. Finally, the analysis ends with a

discussion of the T and C components of the model in Section 5.5.4.

5.5.2. Moving averages

A method for smoothing time series consists of the calculation of a so-called moving average,

where the idea is to replace a given period's observation with an average of that observation and

the observations just before and after it. Using this method can make it easier to determine

a possible trend in the time series because short-term swings, including coincidental ones, are

smoothed out. The method for calculating can be illustrated using numbers for GDP at factor

prices (1966-1980) and a moving average that here is based on five terms and calculated as:

Yt' = (Yt-2 + Yt-1 + Yt + Yt+1 + Yt+2) / 5

The first value in the 5-term moving average can be calculated for 1968 (average of 1966-1970),

the value for 1969 becomes the average of the next five periods, etc., and the last calculation

taken is for 1978. If only data for the period 1966-1980 is available, values in the beginning and

at the end of that period will be missing. In the case here, data for the years after 1980 is

available, and therefore the values for 1979-1980 can be calculated. Figure 5.5.1 shows the result

when the period is extended from 1966 to 2002.

Table 5.5.1. Agriculture's contribution to GDP at factor prices (in millions of 1995-DKK) and the 5-term moving average, 1966-1980.

Year    Original time series Yt    5-term moving average
1966    13662
1967    13473
1968    13469                      13283
1969    13919                      13224
1970    11894                      13281
1971    13363                      13183
1972    13760                      13426
1973    12981                      13775
1974    15133                      13551
1975    13636                      13683
1976    12247                      14038
1977    14416                      13919
1978    14756                      14201
1979    14540
1980    15048

Source: Statistikbanken.dk/NAT07 (Statistics Denmark).
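The calculation behind Table 5.5.1 can be reproduced with a few lines of code. The sketch below is one possible implementation, not the authors' own; the function name `moving_average` is chosen for illustration, and the data are the original series from the table.

```python
def moving_average(values, terms):
    """Centred moving average with an odd number of equally weighted terms.

    Returns a list of the same length as `values`, with None where the
    window would extend beyond the available data.
    """
    if terms % 2 == 0:
        raise ValueError("use an odd number of terms; even lengths need centring")
    half = terms // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        result[i] = sum(window) / terms
    return result


# Original time series from Table 5.5.1 (millions of 1995-DKK), 1966-1980.
years = list(range(1966, 1981))
original = [13662, 13473, 13469, 13919, 11894, 13363, 13760, 12981,
            15133, 13636, 12247, 14416, 14756, 14540, 15048]

smoothed = moving_average(original, terms=5)
for year, value in zip(years, smoothed):
    print(year, "" if value is None else round(value))
```

The first and last two years have no value because the 5-term window needs two observations on each side of the year in focus.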

With yearly data, like that used in Figure 5.5.1, a 5-term moving average will smooth out

swings within each 5-year window, and this results in a clearer picture of the long-term trend in

the time series. For example, the development in the agricultural sector around the time Denmark

joined the EC (now the EU) in 1973 can be clearly seen; a prior falling trend in agriculture's contribution to

GDP at factor prices was reversed rather strongly.

Figure 5.5.1. Agriculture's contribution to GDP at factor prices (in billions of 1995-DKK) and the 5-year

moving average, 1966-2002.

[Figure: line chart of the original series (GDP) and its 5-term moving average (GDP95), 1966-2002; vertical axis in billion kr., from 8 to 36.]

Note: GDP95 is a 5-term moving average of real GDP (1995-DKK).

Source: Statistikbanken.dk/NAT07 (Statistics Denmark).

Individual observations can strongly influence a moving average; for example, a large fall in

agriculture's contribution to GDP 1969-1970 is included in the calculations for all the years

1968-1972. This can be clearly seen in the figure. A correction for this could be to use a

weighted moving average where different weights are used for the yearly values.13 That is, the

greatest weight is given to Yt, and declining weights are given to the remaining values.14 For

example, an extension of the calculation period to a 7-year moving average would in this case

not greatly change the already shown 5-year average.

13 For the moving average shown earlier, all the yearly values have identical weights (0.2 in this case).

When using an even number of periods in the moving average, you must use a technique that

centres the calculated average around the observation in focus. Suppose you wish to smooth out a

time series based on quarterly data for the period 1990-1992. A 4-term moving average seems

most reasonable, since one observation from each of the four quarters will be used in the

calculation of a given value in the moving average,15 e.g., an average where the calculation uses

data from the third quarter 1990 through and including the second quarter 1991. In this last

mentioned example, the calculated value will reflect a value for the middle of the calculation

period, which is January 1, 1991. To obtain a value in the middle of a quarter, however, the

calculation can be made using 5 terms and letting the first and last periods enter with a weight

equal to 0.5. That is, the value for the first quarter 1991 is calculated using observations from the third quarter

1990 until and including the third quarter 1991 (using the weights 0.5, 1, 1, 1, 0.5 and dividing the weighted sum by 4). This results

in a centred moving average.
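The centring technique can be sketched as follows. The function name `centred_quarterly_average` and the quarterly series are hypothetical; the series is constructed as a linear trend plus a fixed seasonal pattern that sums to zero over four quarters, so that the effect of the weights 0.5, 1, 1, 1, 0.5 is easy to check.

```python
def centred_quarterly_average(values):
    """Centred 4-term moving average using the weights 0.5, 1, 1, 1, 0.5."""
    weights = [0.5, 1.0, 1.0, 1.0, 0.5]
    result = [None] * len(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2 : i + 3]
        # The weights sum to 4, so the weighted sum is divided by 4.
        result[i] = sum(w * v for w, v in zip(weights, window)) / 4.0
    return result


# Hypothetical quarterly data: linear trend plus a seasonal pattern summing to zero.
seasonal_pattern = [5.0, -3.0, -4.0, 2.0]
series = [100.0 + 2.0 * i + seasonal_pattern[i % 4] for i in range(12)]

centred = centred_quarterly_average(series)
for i, (raw, smooth) in enumerate(zip(series, centred)):
    print(i, raw, smooth)
```

In this constructed case the centred average removes the seasonal pattern completely and returns the trend value for each quarter; with real data the result is only approximately free of seasonal and irregular swings.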

5.5.3. Seasonal correction

In using time series data where the distance between the observations is less than one year, for

example, quarterly, monthly, or daily data, it may be necessary to further process the data to

estimate the potential seasonal elements (S-components from the earlier used model). You can

carry out seasonal correction using a reasonably simple calculation technique, which will be

illustrated in the following sub-section. The purpose is to remove the more or less systematic

swings, for example, over the year when they often are irrelevant for an analysis of fundamental

long-term trends. On the other hand, the purpose can also be to establish the pattern of seasons.

When correcting over the months of the year, the value of a given month is calculated as if

it were a normal month. Seasonally corrected data will be a big help

in judging actual business cycle developments. An overview of the time series for which

Statistics Denmark publishes seasonally corrected figures is found via its home page

(www.dst.dk).

14 The low values for agriculture's GDP in 1970 will then enter with a smaller weight in the calculations for the surrounding periods, but concerning 1970, the observation will enter with larger weight than before.

15 The length of the calculation period implies here that all swings within a year (four quarters) are smoothed out, that is, the seasonal movements are eliminated, which is why the method is often used in connection with seasonal corrections. The length of the period can in this way be uniquely determined from the purpose (e.g., seasonal correction), whereas the length of the moving average in other contexts (e.g., the five terms in Figure 5.5.1) must be determined from more subjective considerations.

Year to year comparisons and moving averages

A very simple and often used method for assessing, for example, a given monthly value is to

compare that value with the value from the same month the previous year. By comparing values

from the same month over the period of analysis, the seasonal element ought to be removed. But

the method is very crude and sensitive to incidental movements in the months under

consideration and changing growth rates in the trend.
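The sensitivity of the year-to-year comparison to incidental movements can be illustrated with a small sketch. The function name `year_over_year_change` and the quarterly figures are hypothetical; the series has a fixed seasonal pattern and no trend, except for one unusually low third quarter in the first year.

```python
def year_over_year_change(values, periods_per_year):
    """Percentage change relative to the same period one year earlier."""
    changes = [None] * len(values)
    for i in range(periods_per_year, len(values)):
        changes[i] = 100.0 * (values[i] / values[i - periods_per_year] - 1.0)
    return changes


# Hypothetical quarterly data with a seasonal pattern and no trend.
quarterly = [100.0, 110.0, 90.0, 120.0,    # year 1: third quarter unusually low
             100.0, 110.0, 105.0, 120.0]   # year 2: normal seasonal pattern

changes = year_over_year_change(quarterly, periods_per_year=4)
for value, change in zip(quarterly, changes):
    print(value, "" if change is None else f"{change:+.1f}%")
```

The comparison removes the seasonal pattern, but the third quarter of year 2 shows a large increase even though nothing unusual happened in that quarter; the change only reflects the incidental dip a year earlier.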

This can be illustrated with data for GDP and the seasonally-corrected GDP (quarterly figures,

calculated by Statistics Denmark, DS), cf. Figure 5.5.2, which shows these two time series for

1992-1994. If you assess GDP in the third quarter 1993 against the corresponding

quarter from the year before, you would conclude that there is a decreasing trend in GDP. If you

look, however, at the development from the second to the third quarters 1993 in the seasonally-

corrected series, you would reach another conclusion, namely a stable development in the data.

Without these seasonally-corrected time series, therefore, you would wind up drawing the wrong

conclusions in certain cases.

Figure 5.5.2. GDP and seasonally-corrected GDP (billion 1995-DKK), 1992-1994.

[Figure: line chart of the series GDP95, GDP95S, and GDP95(4), quarterly from 92:1 to 94:4; vertical axis in billion kr., from 225 to 260.]

Note: GDP95S is the seasonally corrected series of the original data (GDP95). GDP95(4) is a 4-term centred moving

average.

Source: Statistikbanken.dk/NAT07 (Statistics Denmark).

Another simple method that can eliminate or reduce seasonal swings is the calculation of the

moving average, as earlier described. With quarterly data, a 4-term moving average (12 terms

for monthly data) will smooth out swings over a year, that is, remove the seasonal fluctuations.

To illustrate this, a calculation is made on the indicated GDP figures. Here, quarterly data for

GDP is given for a period longer than 1992-1994, which is why a 4-term centred moving

average can be easily calculated for all the quarters in the designated period, cf. GDP95(4) in

Figure 5.5.2. In this case, a 4-term moving average apparently leads to a smoothing of both

seasonal and irregular swings that is more powerful than the seasonal corrections made by

Statistics Denmark.

Seasonal indices and the X-11 procedure

A seasonal index for the year's 12 months states the seasonal swings over the year in index form

and is calculated so that the index's average value is 100. A value for July equal to 96 means

that, for that month, the observations are expected to lie 4% below the level implied by the trend and cycle

components.

To establish a seasonal index, you must first estimate the seasonal component in the time

series, which is not so easy and can only be approximately determined, as ought to be obvious

from the previous discussion. In the following, a multiplicative relationship is assumed among

the T, C, S, and I components, and seasonal correction, etc. will be illustrated using monthly

data of retailers' sales of food, beverages, and tobacco.16 This is a quantity index, which is why

deflating is not necessary. The calculations are made for the period 1990:01-2003:10, and to

make the resulting construction of the method easier to follow, individual data from the time

series as well as some of the results from the calculations are shown in Table 5.5.2.

16 Statistics Denmark's seasonal correction of this time series is based also on an assumption of a multiplicative relationship.

Table 5.5.2. Calculation of the seasonal index for food, beverages, and tobacco.

                 1990                    1991         …             2002                    2003              Monthly  Seasonal
Month      Yt     Yt'     YSI      Yt     Yt'     YSI …       Yt     Yt'     YSI      Yt     Yt'     YSI  ave. of YSI     index
Jan     90.20                   92.70  100.99   91.79 …    99.50  107.69   92.40  104.92  109.43   95.87        92.00     92.02
Feb     86.00                   87.60  101.38   86.41 …    93.80  107.95   86.89   97.04  109.57   88.56        88.06     88.08
Mar     98.80                  103.40  101.53  101.84 …   110.53  108.10  102.25  103.73  109.53   94.70        99.42     99.44
Apr     99.10                   97.80  101.70   96.16 …   104.92  108.32   96.86  110.74  109.63  101.01        99.07     99.09
May    106.70                  109.20  101.89  107.17 …   114.74  108.58  105.67  115.17                       104.01    104.03
Jun    102.90                  101.70  101.98   99.73 …   107.07  108.60   98.59  107.83                       100.64    100.67
Jul    101.70  100.10  101.59  108.10  102.08  105.89 …   110.74  108.78  101.80  113.98                       103.80    103.82
Aug    102.30  100.28  102.02  105.10  102.27  102.77 …   114.52  109.14  104.93  114.63                       102.15    102.17
Sep     93.80  100.53   93.30   94.70  101.96   92.88 …   102.43  108.99   93.98  101.46                        94.64     94.66
Oct     97.50  100.67   96.85  100.80  101.80   99.02 …   109.13  108.95  100.16  112.47                        98.48     98.50
Nov    100.00  100.72   99.28  101.20  101.95   99.27 …   110.96  109.22  101.60                                98.40     98.43
Dec    121.00  100.78  120.07  121.80  102.05  119.36 …   124.34  109.26  113.80                               119.06    119.08
Sum                                                                                                           1199.73   1200.00

Note: Yt indicates the quantity index for sales of food, beverages, and tobacco (1990=100). Yt' is a centred 12-month moving average and YSI = (Yt / Yt') x 100.

Source: Statistikbanken.dk/DETA2 (Statistics Denmark).

The original time series and the 12-term centred moving average, which is assumed to

smooth out seasonal swings, are shown in Figure 5.5.3.

There is a clear seasonal pattern in retail sales of food, beverages, and tobacco, where the

largest sales occur in December. This pattern disappears totally in the moving average values,

which show the course of the trend and cycle.

Figure 5.5.3. Quantity index and the 12-term moving average for sales of food, beverages, and tobacco, January 1990-December 1995.

[Figure: line chart of the series Q and Q(12), monthly from 90:1; vertical axis (index) from 80 to 140.]

Note: Q(12) is a centred 12-term moving average of the quantity index of the sales of food etc. (Q).

Source: Statistikbanken.dk/DETA2 (Statistics Denmark).

If you assume that the moving average only contains the trend and cycle components, the

following division17 of the original series by the moving average in Figure 5.5.3 can be shown as:

YSI = (T x C x S x I) / (T x C) = S x I

The total index is divided by the moving average.18 The result becomes an index (time series),

cf. YSI in Table 5.5.2, that − apart from the irregular components − only contains the seasonal

component. Because the trend and cycle swings are eliminated in the new index, for the most

part, the average for the 12 months in each of the years is approximately 100.

Here the calculations are carried out for January 1990 to October 2003, which means thirteen

observations for each month. Because YSI can contain irregular elements (I), an average is

computed on the basis of these observations for each month so that the final result becomes an

index that only represents the S component;19 cf., the seasonal index in Table 5.5.2 and Figure

5.5.4.
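The ratio-to-moving-average steps described above can be sketched in code. This is a minimal illustration under stated assumptions, not the calculation actually used for Table 5.5.2: the function names are chosen for illustration, and the monthly series is hypothetical (a constant level of 100 multiplied by assumed seasonal factors that average 1).

```python
def centred_moving_average(values, period):
    """Centred moving average for an even period (e.g., 12 for monthly data).

    The first and last observations in each window enter with weight 0.5.
    """
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        result[i] = total / period
    return result


def seasonal_index(values, period=12):
    """Seasonal index by the ratio-to-moving-average method (average = 100)."""
    moving_avg = centred_moving_average(values, period)
    ratios = [[] for _ in range(period)]
    for i, (y, ma) in enumerate(zip(values, moving_avg)):
        if ma is not None:
            ratios[i % period].append(100.0 * y / ma)  # YSI = (Yt / Yt') x 100
    if any(len(r) == 0 for r in ratios):
        raise ValueError("the series is too short to cover every season")
    means = [sum(r) / len(r) for r in ratios]
    # Level adjustment so that the index sums to 100 x period (1200 for months).
    scale = 100.0 * period / sum(means)
    return [m * scale for m in means]


# Hypothetical seasonal factors (January to December), averaging 1.
seasonal_factors = [0.92, 0.88, 0.99, 0.99, 1.04, 1.01,
                    1.04, 1.02, 0.95, 0.98, 0.98, 1.20]
series = [100.0 * seasonal_factors[i % 12] for i in range(36)]

index = seasonal_index(series)
print([round(v, 2) for v in index])
```

Because the constructed series has no trend, cycle, or irregular component, the method recovers the assumed seasonal factors. With real data the monthly averages only approximate the S component, which is why the level adjustment is needed.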

Figure 5.5.4. Seasonal index of sales of food, beverages, and tobacco (constant prices).

[Figure: seasonal index by month, Jan-Dec; vertical axis from 80 to 130.]

Source: Table 5.5.2.

17 An additive relationship between T, C, S, and I means using an addition/subtraction similar to the procedure here.

18 You can multiply by 100 throughout the calculation if you want to maintain an index level in that form. The sketched method is called the "ratio-to-moving-average method".

19 The average for the year comes close to 100 but can deviate a little (due to rounding of numbers and incomplete smoothing of all non-season determined components), in which case the index is level-adjusted as in Table 5.5.2, where 1200 / 1199.7 is multiplied by the monthly average of YSI.

Given this seasonal index, the original time series can now be seasonally corrected. By dividing

the total index by the seasonal index, the movement is cleared of seasonal swings. This is

shown in Figure 5.5.5, together with Statistics Denmark's published seasonally-corrected

quantity index for the same data.
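As a sketch of this division, the code below applies the seasonal index from Table 5.5.2 to the 2002 values of Yt from the same table. The variable names are chosen for illustration, and the resulting figures are a recalculation from the rounded table values, so they need not match the published series exactly.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Yt for 2002 and the seasonal index, both copied from Table 5.5.2.
original_2002 = [99.50, 93.80, 110.53, 104.92, 114.74, 107.07,
                 110.74, 114.52, 102.43, 109.13, 110.96, 124.34]
seasonal_index = [92.02, 88.08, 99.44, 99.09, 104.03, 100.67,
                  103.82, 102.17, 94.66, 98.50, 98.43, 119.08]

# Seasonal correction: divide by the seasonal index (and multiply by 100).
adjusted_2002 = [100.0 * y / s for y, s in zip(original_2002, seasonal_index)]

for month, y, adj in zip(months, original_2002, adjusted_2002):
    print(f"{month}: original {y:7.2f}  seasonally corrected {adj:7.2f}")
```

The corrected values vary far less over the year than the original values; in particular, the December peak disappears.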

Figure 5.5.5. Seasonally-corrected quantity index of sales of food, beverages, and tobacco, "calculated" and DS, 2000-2001.

[Figure: line chart of the series Original data, DS, and Calculated, monthly for 2000 and 2001; vertical axis from 90 to 130.]

Note: "Calculated" is the seasonally-corrected index computed as described in the text. Statistics Denmark's (DS)

seasonally-corrected index is shown as well as the original uncorrected time series.

Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and Table 5.5.2.

For clarity's sake, only the values for 2000-01 are shown. The calculated seasonally-corrected

index deviates a little from Statistics Denmark's published index, partly because the

method for calculation is relatively simple, but also because the calculations here are carried out

on data that only covers the period 1990-2003.

Seasonal correction is made at Statistics Denmark (and usually also at other statistical

agencies) with the help of a programme called X-11, developed by the U.S. Bureau of the Census

in the 1960s, or its successor X-12. The programme is capable of

seasonally correcting quarterly and monthly data. It separates a time series into a

trend and cycle component (TC), a seasonal component (S), and an irregular component (I).

Unevenness in the number of work and business days over the year can also be corrected for.

The calculation procedure builds on moving averages: a centred 12-month

moving average is used to establish the T-C components. This is used, as was earlier shown, to

obtain a first estimate of the S-I components. Going through several iterations of calculations,

using various forms of the moving average, yields an adjusted (final) estimate of the seasonal

component. In this connection, an attempt is made to isolate the I component, and extreme

values (outliers) are given less weight so that their influence is reduced. Given the output

possibilities in X-11, the original time series can be divided up into a trend-cyclical component,

a seasonal component, the irregular component, and of course, a seasonally-corrected time

series.

As illustration of the last, the time series used in Figure 5.5.3 is seasonally-corrected with the

help of X-11. Only the period 1990-2003 is used, and a correction has not been made for the

number of work days, which is why the results will deviate from the earlier shown seasonally-

corrected numbers from Statistics Denmark. For clarity's sake, only the results for 2000-01 are

presented again, cf. Figure 5.5.6.

Figure 5.5.6. Seasonally-corrected quantity index of sales of food, beverages, and tobacco, "calculated"

and X-11, 2000-01.

[Figure: line chart of the series X11 and Calculated, monthly for 2000 and 2001; vertical axis from 103.2 to 111.6.]

Note: The calculations are carried out using the time series programme SAS/ETS. The index "calculated" is as

shown in Figure 5.5.5., and X-11 is calculated on data covering the period 1990-2003 (corresponding to the data set

that the original 12-month average was computed from).

Source: Statistikbanken.dk/DETA2 (Statistics Denmark), and the X11 procedure.

There is close agreement between the result from X-11 and the earlier manually-computed index;

the differences have no practical significance. Correspondingly, the seasonal index produced

by X-11 (not exhibited) is nearly identical to that presented in Figure 5.5.4.

5.5.4. Trends and cycles

An example of a trend is seen in Figure 5.5.7, where GDP in constant factor prices is shown

from 1900 to 2002. For certain sub-periods, the course is somewhat smoothly increasing,

that is, at a constant growth rate over time. As shown earlier in Section 5.5.2, a development

that can be described by an exponential function − as that in Figure 5.5.7 resembles

approximately − will have a constant growth rate. It is seen in the figure here that, by applying a

logarithmic scale to GDP, the graph becomes partly linear − the slope is determined by the GDP

growth rate. For the indicated period, there are cyclical swings and reactions to certain events,

such as wars and oil price shocks. When yearly data is used, seasonal variation will be

eliminated (which can be an advantage in that there is one less component to isolate in a given

time series).

The trend in a time series can be of different types − the exponential function has already

been mentioned. A second type is a simple, linear relationship over time. A third type is the

logistic curve, which has an S-shape.

Figure 5.5.7. GDP at factor prices, 1900-2002 (in billions of 1995-kroner).

[Figure: two panels showing GDP by year, 1900-2000. First panel: kr. billion on a linear scale (0 to 1100). Second panel: kr. billion on a logarithmic scale (10 to 1000).]

Source: Sv. Aa. Hansen: Økonomisk vækst i Danmark (Economic Growth in Denmark) Copenhagen, 1974;

Adam's databank; Statistikbanken.dk/NAT07.

If a time series consists of, for example, monthly data for which a seasonal index has been

computed, the trend and cycle components can be established based on this seasonal index. This

can be illustrated with data used earlier for sales of food, beverages, and tobacco. In order to

make a judgement about the trend, the time period must be relatively long, and in the present

case data from 1990 has been used.

First, a seasonal correction is made on the entire series by dividing the monthly values for the

individual years by the seasonal index, as shown earlier in Figure 5.5.4. The result appears in

Figure 5.5.8.

The seasonally-corrected values exhibit significantly fewer fluctuations than the original

series. At the same time, it can be seen that the projection of a trend forward from the period

1990-2003 appears less favourable since the increasing trend here does not seem to apply to the

future periods (the upper part of Figure 5.5.8).

The seasonally-corrected curve contains only T, C, and I components. The second step in the

analysis is to evaluate the trend, which here has led to the assumption of a constant growth rate

over the whole period (exponential growth). With the help of regression analysis,20 this trend is

determined from the seasonally-corrected values, which are sketched in the lower diagram of

Figure 5.5.8.

20 Under the assumption about exponential growth, the trend is determined as the curve for which the sum of the squared distance between the observations and the curve is minimized.
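One common way to estimate a trend with a constant growth rate is to fit a straight line to the logarithm of the series by ordinary least squares. The sketch below does this; note that it minimizes squared distances in logarithms rather than in levels, so it is an assumption about how such a regression could be set up, not necessarily the exact calculation behind Figure 5.5.8. The function name and the series are hypothetical.

```python
import math


def fit_exponential_trend(values):
    """Fit y = a * (1 + g)**t by least squares on log(y), with t = 0, 1, 2, ...

    Returns (a, g), where g is the estimated growth rate per period.
    """
    n = len(values)
    t = list(range(n))
    log_y = [math.log(v) for v in values]
    t_mean = sum(t) / n
    y_mean = sum(log_y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, log_y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), math.exp(slope) - 1.0


# Hypothetical seasonally-corrected index growing by 0.1% per month.
series = [100.0 * 1.001 ** month for month in range(24)]

level, growth = fit_exponential_trend(series)
trend = [level * (1.0 + growth) ** month for month in range(len(series))]
print(f"starting level {level:.2f}, growth rate per month {100.0 * growth:.3f}%")
```

The deviations of the seasonally-corrected values from the fitted trend then contain the C and I components.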

Figure 5.5.8. Quantity index of sales of food, beverages, and tobacco, January 1990 - October 2003 (1990=100).

[Figure: two panels, monthly from 90:1 to 03:1. Upper panel: Original data (vertical axis from 84 to 132). Lower panel: Seasonally-corrected data (vertical axis from 92 to 116).]

Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and own calculations.

A similar exercise can be done with the data from Figure 5.5.3 where a 12-term centred

moving average did seem to smooth out the seasonal pattern. With data for the period 1990-

2003, the seasonally adjusted data and the trend are exhibited in Figure 5.5.9. One interpretation

of the cycles or fluctuations around the linear trend is that they illustrate the business cycle

component of the original series.

Figure 5.5.9. Trend and cyclical components of the sales of food, beverages, and tobacco (Index,

1990=100).

[Figure: line chart by year, 1990-2003; vertical axis (index) from 100 to 112.]

Source: Statistikbanken.dk/DETA2 (Statistics Denmark) and own calculations.

With this, the decomposition of the original time series is finished. But you must still remember

that behind the calculations are some (self-chosen) assumptions, which is why the final

decomposition may not represent the "true" picture. This means that the trend in the last figure

can be rightfully criticized, because it does not explain much of the variation in the time series.

A corresponding trend with a constant growth rate would be a much better fit for the GDP data

in Figure 5.5.7.

6. Making commentaries

All tables and figures included in the paper must be discussed in the paper. The purpose of this

section is to provide some guidelines as to what types of comments are appropriate.

The paper must be written in clear language that is easily read and does not use complicated

and twisted sentence construction. Banal language such as slang, catchwords, or clichés is to be

avoided, as are the "I" and "we" forms and other informal diction. The

following sentences have been taken (and translated) from previous empirical papers as

examples of what has not worked, mainly because of the lack of substance in the words:

o Everyone knows that sales have been nothing, but ideal until now.

o It has always been a popular, vogue, fashionable, favourite phenomenon to compare us

with each other here in Scandinavia.

o Now it all hangs together ...

o One thing is imports and exports − a relatively positive experience − but what about

interest rates?

In addition, you must avoid exaggeration and assertions that are not covered in the material

used in the paper. The following sentences have been taken (and translated) from previous

papers as examples of what has not worked, mainly because of the lack of documentation

behind the statements:

o We don't have to go far into the future before energy becomes a scarce commodity.

o It is a known fact that youth are more environmentally-minded than the elderly.

o Bank failure on the Faroe Islands has been an everyday affair.

o The development has been full of swindles and diverse allegations.

With the help of a few well-chosen sentences, comments must point toward the patterns that the

tables and figures reflect, without repeating the data itself in the text. An example of an

appropriate comment concerning Table 5.1.3 follows:

Table 5.1.3 shows first that, in the period 1970-2006, a shift occurred in age distribution resulting in relatively more elderly individuals and relatively fewer children and young adults, both male and female. Second, the table shows that, during this period, the population increased. And third, the table shows that there are more females than males among the elderly and fewer females than males among the children and young adults.

The comment is short and precise. The type of material used to analyze the issues under

investigation influences the type of commentaries that should be made. If the focus of the paper

has been, for example, on public sector expenditures, comments should be directed toward the

effect the stated changes in age distribution could have on these public sector expenditures. So

the comment should not just be short and correct; it must also be relevant to the problem under

investigation.

An important aspect in the comment might be to point toward what might be lacking in terms

of available material (or material adequacy) in light of the chosen formulation of the statement

of the problem. The material is adequate for the statement of the problem when it allows a

substantial analysis of the problem.

Material inadequacy can also result from a lack of material concerning explanatory factors. If

a decisive explanatory factor has not been accounted for in the analysis, you will most likely

draw incorrect conclusions, as earlier mentioned. Therefore, you should include in the comment

the lack of material for an important explanatory factor, if this is the case. In general, the points

in the previous discussion of causal analysis are all relevant for evaluating the adequacy of the

material used in the paper.

In conclusion, a good commentary

1. is written in precise and concise language − remember to use correct punctuation.

2. highlights the patterns in the material without a long-winded discussion of the individual

elements.

3. to a greater or lesser extent, addresses the adequacy of the material with respect to both

conceptual understanding and any lack of material.

4. contains assessments of the explanatory value of the included material seen in the context

of the statement of the problem and therefore contributes to continuity between the

sections.

7. Construction of the report

This section covers the formal demands, not previously mentioned, for preparing the empirical

paper.

The paper begins with a title page. After the title page comes a table of contents, which

gives an overview of the sections included in the paper, presenting the section title, number, and the page

on which the respective section begins, cf. the table of contents for these guidelines.

The first section is called the introduction and is used for a discussion of, and a justification

for, the chosen statement of the problem and of the chosen delimitations. A start in setting the

delimitations might be the defining of the central concepts. It should be pointed out that you

should not bother to define concepts that the audience is expected to be familiar with. The

introduction should also be used to point out aspects of the problem that could possibly be

relevant/interesting, but which are not to be treated.

The introduction binds the succeeding sections together in that these sections present relevant

material that is first introduced in the introduction. As mentioned, the statement of the problem

is the control mechanism for the succeeding phases of work. The introduction is, therefore, the

control mechanism for all succeeding sections of the report. It should be emphasized that the

introduction must not be a verbalization of the table of contents, and you should not start by

saying that the purpose of the paper is to give an account of that which stands in the title. This

ought to be obvious and is, therefore, unnecessary to mention. Finally, it should be mentioned

that data material does not normally appear in the introduction.

In a paper 15 pages in length, where the choice of method, etc., does not require an in-depth

discussion, the introduction will typically fill one page maximum.

In the sections following the introduction, the statement of the problem is addressed using the

collected data and information. Normally, the base material is located in the second section. The

remaining sections are used, then, to account for the development in this material. Both sections

and sub-sections can be used. Every section treats a sub-problem of the statement of the problem

and will, as a rule, comprise at least one-half of a page. The sections must follow each other in a

logical order and with a reasonable weight, determined relative to the problem at hand. The order

and weighting of the material are given considerable weight in the evaluation of the paper. A

sensible weighting involves also the choice of which material is to be visualized and in what

form it is to be visualized. Note, it is by exception that data already presented in one visual form

is again presented in another. For example, rather than treating the same data in both figure and

table forms, you might instead include additional explanatory material and thereby reach a

greater depth in the analysis.

Use short and precise section titles and avoid having tables and/or figures follow immediately

after each other. Comments should be used, instead, to "encircle" the tables or figures that are

being referred to. A section should not, under normal circumstances, begin with a table or

figure, but rather with text. Text is separated from tables and figures by an extra line so there is

space between, for example, a table's source and the surrounding text. However, to avoid large

empty spaces (typically at the bottom of the page), it might be necessary to separate comments

and tables or figures from each other in the text.

The final section in the paper is called the conclusion and is used to summarize the most

important conclusions reached in the text. Reading just the introduction and the conclusion

should be enough to give the reader the essence of the report − this is a great advantage for the

busy reader. New aspects or information must not be treated in the conclusion, and all clichés

about what the future might bring should be avoided − they are subjective predictions about

future development that have no basis in the material. The sections must be numbered, and the

section titles must be marked clearly, for example, with underlining or by using bold or italic

type.

After the conclusion comes the reference list, which appears on a separate page and presents

an unambiguous overview of the utilized sources. Unambiguous here means that each source

must be cited with enough information to identify it uniquely.

The following rules should be used: books are written with author(s), title, location of publisher,

publisher, year − for example, Andersen, T. M., et al.: The Danish Economy, 2. edition, DJØF

Publishing, Copenhagen, 2006. Note that the author's name is written last name first and that, as

in the example given, only one author's name is written, followed by 'et al.', when more than three

authors are associated with a given text. With two or three authors, the first name appears as

noted above, and the following names appear with first name first. Note further, that the title of

the work might appear in italic type (although style may dictate that it appears in bold).
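
The book rule above can be sketched as a small helper. This is only an illustration, not part of the guidelines: the function names, the "and" between the last two authors, and the cut-off for 'et al.' (here more than three authors, passed as a parameter) are assumptions, and the co-author names in the usage example are placeholders.

```python
def format_authors(authors, et_al_threshold=4):
    """Format a list of (last_name, initials) tuples, in title-page order.

    The first author is written last name first. From `et_al_threshold`
    authors upward (an assumed cut-off), only the first author is written,
    followed by 'et al.'. Otherwise the remaining authors follow with
    initials first.
    """
    if not authors:
        raise ValueError("at least one author is required")
    first_last, first_initials = authors[0]
    lead = f"{first_last}, {first_initials}"
    if len(authors) >= et_al_threshold:
        return f"{lead}, et al."
    rest = [f"{initials} {last}" for last, initials in authors[1:]]
    if not rest:
        return lead
    if len(rest) == 1:
        return f"{lead} and {rest[0]}"
    # Joining with commas and a final "and" is an assumption of this sketch.
    return f"{lead}, {', '.join(rest[:-1])} and {rest[-1]}"


def format_book(authors, title, publisher, place, year, edition=None):
    """Build a book entry in the order used by the example in the text:
    author(s): title, edition (if any), publisher, place, year."""
    parts = [title]
    if edition is not None:
        parts.append(f"{edition}. edition")
    parts += [publisher, place, str(year)]
    return f"{format_authors(authors)}: {', '.join(parts)}."


if __name__ == "__main__":
    # Co-authors B, C and D are placeholders, not the book's real co-authors.
    authors = [("Andersen", "T. M."), ("B", "X."), ("C", "Y."), ("D", "Z.")]
    print(format_book(authors, "The Danish Economy", "DJØF Publishing",
                      "Copenhagen", 2006, edition=2))
```

Keeping the cut-off as a parameter makes it easy to adapt the helper if a different convention for 'et al.' is required.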

Periodicals are written with author(s), title of work, title of periodical, year, volume (if any),

issue, and page number(s) − for example, Bentzen, J.: An empirical analysis of gasoline demand

in Denmark using cointegration techniques, Energy Economics, Vol. 16, No. 2, 1994, pp. 139-

143.

Statistics are referenced by organization issuing the statistics, title, year, number – for

example, OECD: Annual National Accounts – Volume 1 – Main aggregates, 2007.

Normally, homepage and database addresses are placed at the end of the reference list.

If there are appendices, they come after the reference list. The appendices contain the raw data

and other information that was used to establish the base, comparative, and explanatory material

used and presented in the tables and figures in the text. The appendices should not include

copies of tables from the statistical sources. The appendices might also contain an account of the

calculations used to obtain further representations of the data that are then used in the text. Presentation

of these results gives the reader the chance to replicate the presented material. A legal text

can also be included in the appendix material. In general, the appendices should be used for the

material that ties the material used in the text back to the original form of the data and for that

material which was not directly used in analyzing the problem. In this vein, the appendix

material is not directly discussed in the text, but may be referred to.

References

Adam's databank, Statistics Denmark

Andersen, T.M., et al.: The Danish Economy, DJØF Publishing, Copenhagen, 2006.

Danmarks Statistik: NYT, No. 321, 1993.

Danmarks Statistik: Statistisk tiårsoversigt (Statistical ten-year review) (STO).

Danmarks Statistik: Statistisk årbog (Statistical yearbook) (SÅ), 2006.

Danish Energy Agency: Energistatistik (Energy statistics), 2002.

Hansen, Sv.Aa.: Økonomisk vækst i Danmark (Economic Growth in Denmark), Copenhagen,

1974.

Meadows, D., et al.: The Limits to Growth, The New American Library, Inc., 1972.

OECD: Energy Balances of OECD Countries, 2003 Edition.

The World Bank: World Development Report, 2006.

The World Bank: World Development Indicators.

www.statistikbanken.dk

www.opec.org – Annual Statistical Bulletin 2006.

www.bp.com – Review of World Energy, 2001.