32
Traditional and newly emerging data quality problems in countries with functioning vital statistics: experience of the Human Mortality Database United Nations Expert Group Meeting on the methodology and lessons learned to evaluate the completeness and quality of vital statistics data from civil registration New York, November 3-4, 2016 Dmitri A. Jdanov Domantas Jasilionis Vladimir M. Shkolnikov on behalf of the HMD team

Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Traditional and newly

emerging data quality

problems in countries with

functioning vital statistics:

experience of the Human

Mortality Database

United Nations Expert Group Meeting on the

methodology and lessons learned to evaluate

the completeness and quality of vital statistics

data from civil registration

New York, November 3-4, 2016

Dmitri A. Jdanov

Domantas Jasilionis

Vladimir M. Shkolnikov

on behalf of

the HMD team

Page 2: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

The Human Mortality Database

• Joint project of the Department of Demography at the University of

California at Berkeley (USA) and the Max Planck Institute for

Demographic Research in Rostock (Germany)

• Work began in autumn 2000, launched online in May 2002 with 17

country series

• Now: a leading data resource on mortality in developed countries.

• Includes 38 countries and 8 regions, 30,000+ users

Main advantages:

• Comparability across time and space

• Continuous, long-term series without gaps or ruptures

• Data by age, year, cohort, in age-time formats 1x1, 5x1, 1x5, 5x5 etc.

• Detailed documentation on origins and quality of the data

However, one of the main principles of the HMD is to include countries

with reliable population statistics, especially requiring a full coverage of

registration of vital events.

Page 3: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Reliable population data

First questions:

Is there civil registration system and reliable vital statistics?

Is there reliable population estimates?

Is it possible to get all these data?

Preliminary quality checks in the HMD:

• Accuracy: coverage, completeness, proportion of missing data

• Availability and relevance: access to detailed enough tabular data

• Comparability across space and time

For example, using UN and WHO sources a naïve user might conclude that data for

Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and

births - coverage >=90%).

This might be not correct according to the HMD criteria.

Page 4: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

UN assessment of coverage by death registration (Dec 2014)

Source: UN Population Division

(http://unstats.un.org/unsd/demographic/CRVS/CR_coverage.htm)

Page 5: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Censuses and assessment of

the population denominator

Page 6: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Censuses and inter-censal population estimates

Assuming good quality of census

data:

After a new census the post-censal

population estimates should be

replaced by inter-censal estimates

(backward from this census). Four

components:

• Census counts

• Death counts

• Births

• Migration

Developed countries with high

quality vital registration system

which do NOT produce inter-censal

estimates: Germany, Italy, Czech

Republic, ….

Census

years

Males

Females

4000000

4200000

4400000

4600000

4800000

5000000

5200000

5400000

1947

1951

1955

1959

1963

1967

1971

1975

1979

1983

1987

1991

1995

1999

2003

Pop

ulat

ion

Czech Republic, Official Population

Estimates as of December 31

Page 7: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Inter-censal population estimates: the HMD approach

The standard HMD methodology

(Wilmoth et al., 2007) for the cases

when inter-censal population

estimates are either not available

or unreliable is based on the

assumption of uniform distribution

of migration across the entire inter-

censal period. This assumption

works well in many conventional

situations, but may be violated in

the case of special events (e.g. the

collapse of the USSR and abrupt

social-economic changes in

Eastern Europe, the EU

enlargement in 2004, the financial

crisis in 2008-2009).

x

x + 1

x + 2

x + 3

x + 4

x + 5

x + 6

Age

t t+1 t+2 t+3 t+4 t+5 Time

P(x+5,t+5)

P(x,t)

DU(x+1,t+1)

DL(x+4,t+3)

x

n

i

LU

nitixDitixDxCntnxP

5),1(),()(),(

1

0

1

4

0

12 ),1(),()()5(i

LUx itixDitixDxCxC

C1 C2

Page 8: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Bulgaria: correction of population data

(inter-censal estimates)

The standard HMD inter-censal method is not applicable to the period 1985-1992

because of an irregular pattern of out-migration. In 1985-8, international migration

was very restricted in Bulgaria. After the collapse of communism in 1989 - mass

emigration (mostly of the Turkish minority) over the next several years.

HMD Solution: official population estimates were used for 1985-8, but new population

estimates were calculated for the latter period. The year 1988 was treated as a

“pseudo-census point” as the beginning of the inter-censal interval.

1985

(census year)

2001

census year

1984

2000

1992

(census year)

1991

3500000

3700000

3900000

4100000

4300000

4500000

4700000

1961

1963

1965

1967

1969

1971

1973

1975

1977

1979

1981

1983

1985

1987

1989

1991

1993

1995

1997

1999

2001

2003

MALES

FEMALES

3500000

3700000

3900000

4100000

4300000

4500000

4700000

1980 1985 1990 1995 2000

Females

Males

Trends in the total number of males and females. Bulgaria, 1961-2003. Official population estimates (left) and HMD

data (right). Source: Jasilionis D., Jdanov D.A. Human Mortality Database: Background and Documentation for Bulgaria

Page 9: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Germany: three decades between censuses

Before the 2011 census, East Germany had a census 30 years ago and West

Germany - 24 years ago. Whereas before the 2011 census Germany's population was

estimated to be 81.7 million, the census corrected this down to 80.2 millions, a

difference of 1.5 million people (~ 1.8%). The statistical office of Germany decided not

to produce adjusted inter-censal population estimates by age.

-1.00

0.00

1.00

2.00

3.00

4.00

5.00

6.00

7.00

8.00

-5000

0

5000

10000

15000

20000

25000

30000

0 10 20 30 40 50 60 70 80 90

Rel

ativ

e di

ffer

ence

Abs

olut

e do

ffer

ence

Age

males, abs

females, abs

males, rel

females, rel

Figure:

the difference

between current

population

estimates and the

census counts of

2011

Page 10: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

External outmigration intensities by age for the

German lands, males

In the 2000s statistical offices of many German lands have made efforts to eliminate from their

populations non-existent residents (erroneous cases). Using information from local tax offices they

removed erroneous cases by creating external outmigration events in the year of “cleaning”.

Page 11: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

The HMD inter-censal estimates for Gerrmany

1) Using additional migration data and cubic spline interpolation for migration trends across

cohorts we removed the population changes due to the earlier “cleaning” by the statistical offices.

2) We distributed the accumulated error (not the net migration!) uniformly over the adjustment

period of 24 years (30 years for East German lands):

Page 12: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Changeable population definitions

across time

Page 13: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Numerator-denominator bias: case of Moldova

Source: Penina, Jdanov, Grigoriev (2015)

* Since 1998 official population counts do not include

Transnistria region

The problem: systematic bias (deaths and

births refer to the de facto population, (i.e.

occurred within the country, while

population estimates also include long-term

emigrants - Moldavian citizens living

abroad). Results in under-estimation of

mortality and fertility.

The solution: population estimates were

corrected using data on border crossings and

additional data collected at the census of

2004

Page 14: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Changes in the definition of population: Poland

Figure: Official and adjusted (Tymicki et al. , 2015)

estimates of population of Poland

14,000,000

15,000,000

16,000,000

17,000,000

18,000,000

19,000,000

20,000,0001

96

01

96

21

96

41

96

61

96

81

97

01

97

21

97

41

97

61

97

81

98

01

98

21

98

41

98

61

98

81

99

01

99

21

99

41

99

61

99

82

00

02

00

22

00

42

00

62

00

82

01

02

01

22

01

4

Pre- and post-censal population estimates according to the 2002

Post-censal population estimates calculated according to the 1988

census

Post-censal population estimates calculated according

to the 1970 census

Post-censal population estimates calculated according to the 1960

census

FEMALES

MALES Post-censal population estimates according to the

2011 census

Unfofficial inter-censal estimates

based on the 2011 census

In the 2000s, Poland faced a

massive out-migration that followed

the EU enlargement of 2004. It was

expected that the population counts

will be corrected downward after

the next population census of 2011.

But Statistics Poland has

unexpectedly decided to change

the official definition of the

population status from the

permanently resident (acting in

2010 and earlier) to the usually

resident (from 2011 onward).

Statistics Poland did not re-

estimate age-specific population

counts back to previous census.

Due to irregular migration pattern

the standard HMD inter-censal

method for reconstruction of annual

population estimates is not

applicable.

Page 15: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Change in the definition of ethnicity: New Zealand Māori

For New Zealand, HMD has separate data series for non-Māori and Māori populations.

Change in definition of Māori in the census of 1991 from the one based on ethnicity of parents to

the one based on self-identification. The new definition caused a jump in Māori population, but the

death counts were not corrected simultaneously. The respective change in definition of ethnicity for

death and birth was introduced only in September 1995.

15 of 32

Figure: Life expectancy at birth of Māori, Non-Māori, and total population of New Zealand calculated from the

official (unadjusted) data (left panel) and adjusted HMD data (right panel). Source: (Jasilionis et al., 2015)

Page 16: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

In 1974 Portugal lost its African colonies, 500 000-700 000 returned to Portugal in

the second half of the 1974 and in 1975

Impact of migration at working/reproductive ages on

mortality and fertility estimates Population exposures by age group, Portugal, females

Red line – official current estimates, blue line – HMD inter-censal

Current

Page 17: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Inter-censal population estimates and fertility indicators:

Portugal in 1975-76

0 20 40 60 80 1000

1

2

3

4

5

6

7

8

x 104

Age

Pop

ulat

ion

Population counts, year 1976

0 20 40 60 80 1000

1

2

3

4

5

6

7

8

x 104

Age

Pop

ulat

ion

Population counts, year 1976

Current population estimates HMD inter-censal estimates

1975

1976

1975

1976

Cohort childlessness at age 40 (1-CCF1_40), Portugal

0

1

2

3

4

5

6

7

8

1954 1956 1958 1960 1962 1964 1966 1968 1970

cohort

perc

en

t

annual

intercensal

17 of 32

Page 18: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Mortality at advanced ages

Page 19: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Mortality estimates at old ages

• Internationally comparable high quality demographic data on old-age

populations remain insufficient.

• The HMD is the only major demographic database which provides such data.

Population estimates for ages 80+ in the HMD are recalculated using

extinct/almost extinct cohort and survival ratio methods.

Evaluation of data quality at old ages

The standard set of methods includes tests on

• age overstatement (e.g. the ratio of the total person-years lived above age

100 to the total person-years lived above age 80);

• precision of age reporting with the UN age-sex accuracy index;

• age heaping with the Whipple's Index of age accuracy.

The comparison to other countries with reliable statistics may be also used for

evaluation of data quality.

Page 20: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Germany: old ages

West Germany1

95

6

19

60

19

64

19

68

19

72

19

76

19

80

19

84

19

88

19

92

19

96

20

00

20

04

20

08

Year

0.2

0.22

0.24

0.26

0.28

0.3

0.32

0.34

0.36

0.38

0.4

0.42

m9

0

MalesFemales

East Germany

19

56

19

60

19

64

19

68

19

72

19

76

19

80

19

84

19

88

19

92

19

96

20

00

20

04

20

08

Year

0.2

0.22

0.24

0.26

0.28

0.3

0.32

0.34

0.36

0.38

0.4

0.42 MalesFemales

Trends in death rates at age 90+, calculated from the official population estimates,

for the West and East Germany, males and females, 1956-2008.

Page 21: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Germany: old ages (cont.)

Ratio of DRV records by age based on own pensions to estimates based on official data, West

and East Germany, 2009.

West Germany

70 75 80 85 90 95 100

Age

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

Ra

tio

DR

V / H

MD

po

pu

latio

n e

stim

ate

s MalesFemales

East Germany

70 75 80 85 90 95 100

Age

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3 MalesFemales

To correct population estimates for West Germans at older ages in 2010, the HMD

team used data by the Deutscher Rentenversicherung Bund (DRV), the German

Pension Scheme.

Page 22: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Life expectancy and probability of death for the corrected

and the original data, West Germany, 1990-2008 e80

1990 1994 1998 2002 2006

Year

5.5

6

6.5

7

7.5

8

8.5

9

9.5

e8

0

q80

1990 1994 1998 2002 2006

Year

0.04

0.05

0.06

0.07

0.08

0.09

0.1

q8

0

e90

1990 1994 1998 2002 2006

Year

3

3.25

3.5

3.75

4

4.25

4.5

4.75

e9

0

q90

1990 1994 1998 2002 2006

Year

0.12

0.14

0.16

0.18

0.2

0.22

0.24

q9

0

Males correctedFemales corrected

Males originalFemales original

e80

1990 1994 1998 2002 2006

Year

5.5

6

6.5

7

7.5

8

8.5

9

9.5

e8

0

q80

1990 1994 1998 2002 2006

Year

0.04

0.05

0.06

0.07

0.08

0.09

0.1

q8

0

e90

1990 1994 1998 2002 2006

Year

3

3.25

3.5

3.75

4

4.25

4.5

4.75

e9

0

q90

1990 1994 1998 2002 2006

Year

0.12

0.14

0.16

0.18

0.2

0.22

0.24

q9

0

Males correctedFemales corrected

Males originalFemales original

Page 23: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Relative difference (per cent): HMD (2011 Census) vs. official population

estimates and HMD (2011 Census) vs. HMD (DRV correction at ages 90+)

Page 24: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Chile: old ages

In the HMD/HFD data series for

Chile starts in 1992

Page 25: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Costa-Rica: death rate ratios, males

mx ratio, CRI/JPN, males

Year

Age

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

105

110

1970 1980 1990 2000 2010

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

mx ratio, CRI/SWE, males

Year

Age

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

105

110

1970 1980 1990 2000 2010

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

mx ratio, CRI/USA, males

Year

Age

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

105

110

1970 1980 1990 2000 2010

0.4

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

Costa-Rica / Sweden Costa-Rica / Japan Costa-Rica / USA

Page 26: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Costa-Rica: life expectancy at advanced ages

Page 27: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Infant mortality

Page 28: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

28

Underestimation of infant mortality due to a restrictive

definition of live birth and death undercount

Adjustment is made

by correction of the

monthly mortality

curves. The

adjustment brings

these curves to

certain “golden-

standard” curve.

Source:

Kingkade & Sawyer, 2001.

28 of 32

Page 29: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Underestimation of infant mortality: adjustment by mortality trend

Figure: Infant mortality rate in Moldova before and after correction prior

to 1973, both sexes, Moldova, 1959–2014. Source: (Penina et al., 2015)

An abrupt increase in

the infant mortality that

occurred in all of the

Soviet republics at the

beginning of the 1970s

was interpreted by

Anderson and Silver

(1986) as a result of

improvements in the

registration rather than

a real deterioration in

survival of the

newborns.

Page 30: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

South Korea: preliminary analysis

Infant mortality before 2000 is not reliable

After 2000:

Substantial differences between pop estimates for 2000 and 2005 and census

counts for the same years. Perhaps census counts excludes foreigners? These

differences are important for the ages 0,1 and also for other ages (including adult

and old ages). It can be also seen that some smoothing was used to produce pop

estimates (fluctuations observed at some ages in census data are absent in pop

estimates - this cannot be explained by a simple exclusion of foreigners).

0

50000

100000

150000

200000

250000

300000

350000

400000

450000

500000

0 3 6 9 12151821242730333639424548515457606366697275788184

2005 census

2005 estimate

0

50000

100000

150000

200000

250000

300000

350000

400000

450000

500000

0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84

2000 census

2000 estimate

Page 31: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

Conclusion

• Data are of high quality if they are “Fit for Use” in their intended

operational, decision-making and other roles (Juran and Godfrey,

1999). This is why the understanding of problems hidden in the data is

important in any demographic estimation, forecast or study.

• We discussed several approaches which allow us to increase

significantly utility of the data even if data quality is problematic.

• Standard demographic methods which work well with data from

developing countries or historical data series are often not applicable to

problematic data from countries with functioning statistical systems.

• Country-specific approaches in combination with usage of additional

and alternative data sources are needed. They should be combined

with certain general principles that are applied in all countries to ensure

comparability of HMD data series across time and space.

Page 32: Traditional and newly emerging data quality problems in ... · Moldova, Chile, Costa-Rica are O.K. (last two censuses – complete, death and births - coverage >=90%). This might

32 of 32

Thank you!