hist.umn.edu/~rmccaa/ipums-europe
IPUMS-Europe, 2004-2008: Restricted-access, anonymized microdata for scientific and policy research
* * *
Robert McCaa, University of Minnesota Population Center
Nikolai Botev, UN-ECE Population Activities Unit (Geneva)
www.hist.umn.edu/~rmccaa/ipums-europe
Outline
» PAU 1990s project
» IPUMS-International means: Restricted access, anonymized microdata
» IPUMS-Europe: sister project (Latin America), connections with PAU
» IPUMS-International partners
» Principles: integration, dissemination
Population Activities Unit 1990 census round harmonization project: focused on Aging
» Begun 1992: PAU/UNECE, UNFPA, US-NIA
» Microdata acquired for 15 countries
» Harmonized 26 core person variables plus 13 optional; 10 dwelling/household variables plus 18 optional
» Extensive metadata: questionnaires, nomenclatures, classifications
» Progressive over-sampling with age
Population Activities Unit 1990 census round harmonization project: focused on Aging
» General release: samples for 8 countries
» Samples for the other 7 countries available under more restrictive conditions
» Dissemination: CDs or other media; no online access
» Sustainability: ICPSR (U. of Michigan)
Problems with PAU effort:
» Sample design too complex
» Need for time series
» Lacked legal authority
» Inadequate funding
» Insufficient computing infrastructure and human resources
» Antiquated distribution system
» Sustainability problematic
Population Activities Unit: samples of older persons based on the 2000 round of censuses
» Tightly integrated with IPUMS-Europe
» Based on the same coding schemes, nomenclatures, and classifications
» Use the same anonymization techniques and approaches; same data access modalities
» Ensure sustainability through integration with IPUMS-Europe: ICPSR & European Data Centers
Population Activities Unit: samples of older persons based on the 2000 round of censuses
» Sample design:
- sample of households not included in the core IPUMS-Europe sample in which at least one member is over age 60 (recommended sampling density: 5 percent);
- geography to match that of the core samples
» Advantages:
- more straightforward than the design used for the 1990s;
- in line with the practice of national statistical offices (e.g. PUMS-A and PUMS-O of the US Census Bureau)
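The supplementary older-persons design on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not PAU or IPUMS code. The `Household` record and the `older_persons_sample` function are hypothetical. So is the choice to select each eligible household independently at the recommended 5 percent density; a statistical office might select systematically instead.

```python
import random
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical minimal household record, for illustration only.
    hh_id: int
    member_ages: list[int]
    in_core_sample: bool


def older_persons_sample(households, density=0.05, age_threshold=60, seed=0):
    """Select households that are NOT in the core sample and have at least one
    member over `age_threshold`; keep each eligible one with probability `density`
    (independent selection is an assumption, not the PAU specification)."""
    rng = random.Random(seed)
    eligible = [
        h for h in households
        if not h.in_core_sample and any(age > age_threshold for age in h.member_ages)
    ]
    return [h for h in eligible if rng.random() < density]
```

Excluding core-sample households keeps the two samples disjoint, so they can be pooled with appropriate weights.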
From IPUMS-USA (1989-) & PAU-Aging (1992-) to IPUMS-International (1999-), Latin America (2003-), Europe (2004?) and beyond
IPUMS-International means Restricted access, Anonymized microdata
» Should be “IRAMS”, not IPUMS
» Who are IPUMS-International users? Those who:
» Have a demonstrated need for the data (project abstract)
» Agree to abide by the restrictions of use
» Place themselves under the jurisdiction of Institutional Review Boards
IPUMSi ANONYMIZES
Using the most demanding standards, legal & administrative as well as technical:
» Suppress geographical detail (NUTS2/3?)
» Corrupt the data! (just a little…)
» Blur/aggregate sensitive codes
» Convert dates to ages (blur key vars.)
» Swap cases between districts! (just a few…)
» Scramble order of unit records
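Two of the technical measures listed above are swapping a few cases between districts and scrambling the order of unit records. Both can be illustrated as follows. This is a simplified sketch, not the IPUMS procedure. Records are assumed to be dictionaries with a hypothetical `district` key, and swap partners are chosen at random. Production data swapping usually pairs records that match on key characteristics.

```python
import random


def swap_districts(records, swap_rate=0.01, seed=0):
    """Return a copy of `records` in which a small fraction of randomly paired
    records exchange their district codes. The input list is not modified."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]
    n_pairs = int(len(out) * swap_rate / 2)
    idx = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(idx[::2], idx[1::2]):
        out[i]["district"], out[j]["district"] = out[j]["district"], out[i]["district"]
    return out


def scramble_order(records, seed=0):
    """Return the records in random order so that file position carries no
    information (e.g. enumeration order within a district)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled
```

Swapping leaves district totals unchanged, because every code that leaves a district is replaced by another code. An intruder therefore cannot be sure that a record's district is its true one.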
Anonymization example: Italy, 1991 (first assessment)
» 1. Suppress geographical variables below commune
» 2. Convert dates of birth, marriage, immigration to ages; band small groups
» 3. Suppress sensitive codes for small groups:
- citizenship
- year of immigration to Italy
- commune of work/study
Note: population uniques are anonymized after integration
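The Italian example converts dates to ages and bands small groups. A minimal sketch of both steps follows. The function names and the `min_count` threshold are hypothetical illustrations; the slide does not say what group size counts as small.

```python
from collections import Counter
from datetime import date


def age_at(birth: date, reference: date) -> int:
    """Completed years of age on the census reference date, so the exact
    date of birth never needs to be released."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def band_small_groups(codes, min_count=50, other_code="OTHER"):
    """Replace every category with fewer than `min_count` cases by `other_code`
    (the threshold is an assumed parameter, not an IPUMS rule)."""
    counts = Counter(codes)
    return [c if counts[c] >= min_count else other_code for c in codes]
```

For example, with `min_count=50`, a citizenship code held by only three persons is recoded to `"OTHER"`. The large groups keep their own codes.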
EUROSTAT statistical anonymity standards (Thorogood, 1999), all accepted by IPUMS-International
» 1. Small sample size
» 2. Limited geographical detail
» 3. Top and bottom coding of unique categories
» 4. Signed non-disclosure agreement
» 5. Prohibit redistribution of datasets to third parties
» 6. Prohibit attempts to identify individuals or the making of any claim to that effect
» 7. Require users to provide copies of publications
EUROSTAT statistical anonymity standards (Thorogood, 1999), all accepted by IPUMSi, and more
» 8. Age (constructed from birth date, where necessary)
» 9. Never identify date of birth
» 10. Never identify place of birth
» 11. Migration: timing and place not identified in detail
» 12. Place of residence identified by major civil division (pop. > 60k, 120k, 250k or 1 million, according to national rule)
» 13. Sensitivity analysis of variables by national experts
» 14. Confidentiality assessment by national experts
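Standard 3 (top and bottom coding) and standard 12 (identifying place of residence only for divisions above a population threshold) reduce to simple recoding rules. In this sketch the 60,000 default comes from the lowest threshold on the slide. The single pooled code `"OTHER"` and the function names are assumptions. Real schemes typically merge small units with neighboring ones rather than pooling them nationally.

```python
def top_bottom_code(value, low, high):
    """Pool extreme values into the bottom/top category (e.g. ages above 90 -> 90)."""
    return max(low, min(value, high))


def publishable_region(region, region_pop, threshold=60_000, pooled="OTHER"):
    """Keep a region code only if the region's population exceeds `threshold`;
    otherwise replace it with a single pooled code."""
    return region if region_pop[region] > threshold else pooled
```

The threshold is an argument because, as the slide notes, each country sets its own rule.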
Sister project: IPUMS-Latin America
17 countries, ~500 million pop., 5 census rounds
80+ samples, 100+ million person records
» Scope: Latin American census microdata, 1960-present
» Work Plan (funded by National Institutes of Health)
» 2002: Sign license agreements with official census agencies
» 2002: Obtain funding from U.S. NIH
» 2003: Develop/translate microdata & metadata
» 2004: Country expert teams design national integrations
» 2005: MPC/expert teams design regional integration
» 2006: MPC anonymizes/integrates microdata and metadata
» 2007: MPC disseminates to bona fide researchers who sign a non-disclosure license. National census/data/research institutes may distribute national versions via CDs/web.
IPUMS-Europe Partnership: More…
» Censuses: 1960s-2000, where microdata exist
» Countries: >350 million population; 16 countries inclined to participate at present (* = signed): Austria, Bulgaria, Czech Republic*, France*, Germany, Greece, Ireland, Israel, Hungary*, Poland, Portugal, Romania, Slovenia*, Spain*, Switzerland, Turkey
» Research: more knowledge, more users
IPUMS-Europe Partnership: More uniformity…
» Legal: signed memorandum of understanding
» Administrative: restricted to approved users; strong enforcement procedures
» Sample design: every nth household
» Anonymization: includes corrupting data
» Integration: more variables, composite coding
» Dissemination: extract custom-tailored datasets, never entire samples
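The "every nth household" design is a systematic sample. The minimal sketch below assumes that households are already listed in enumeration order. It also assumes a random start point, which the slide does not specify. The function name is hypothetical.

```python
import random


def every_nth_household(household_ids, n, seed=0):
    """Systematic sample: random start in [0, n), then every n-th household,
    giving a sampling density of roughly 1/n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    start = random.Random(seed).randrange(n)
    return household_ids[start::n]
```

With `n=10` this yields a 10 percent sample. Because whole households are selected, relationships among household members stay intact in the microdata.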
Advantages… proven record of accomplishments:
» Uniform legal protocols
» Substantial institutional infrastructure
» Experienced census microdata integrators
» Cost-effective academic environment
» Sustained funding from National Science Foundation, National Institutes of Health
» Successful web-based distribution system: users!!
Advantages of IPUMS-International
» Comparability: data are rigorously integrated; documentation is extensive, both primary (from NSIs) and integrated (from MPC)
» Accountability: reports on users, usage and publications; advisory board of statisticians and scientists
» Sustainability: MPC, ICPSR
IPUMS-Europe, 2004-2008: coverage
~20 countries, representing ~400 million people
» Scope: European census microdata, 1950-present
» Work Plan (contingent upon funding)
» 2003: Sign licensing agreements with census agencies; obtain funding from US NIH
» 2004: Develop/translate microdata & metadata
» 2005: Country expert teams design national integrations
» 2006: MPC/expert teams design regional integration
» 2007: MPC integrates microdata and metadata
» 2008: MPC disseminates to bona fide researchers who sign a non-disclosure license. National census/data/research institutes may distribute national versions via CDs/web.
IPUMS-INTERNATIONAL
Imagine a new statistical product: scientifically anonymized, integrated census microdata samples made up of unidentifiable individuals...
» Easy-to-use web interface
» Highest scientific standards
» Proven, powerful integration
» A quantum leap in usage
» 1998: 1 country signed
» 1999: 3 countries
» 2000: 9
» 2001: 15
» 2002: 32; first release, 6 countries
IPUMSi RESCUES
UN Demographic Center for Latin America (CELADE, Santiago, Chile)
~3000 microdata tapes and metadata (documentation) recovered
IPUMSi PAYS
National experts in each country are contracted to assist with:
» Assembling microdata and documentation
» Developing samples
- to minimize confidentiality risks
- and to maximize robustness
» Designing national integration plan
- census-by-census
- concept-by-concept
- code-by-code
» Writing integrated documentation
IPUMSi PARTNERSHIP
Photos from Colombia integration project, February-March 2000: 4 experts from DANE (census office) + 7 academics (3 universities)
Standard: UN/Eurostat Principles & Recs...
Census documentation compiled for Colombian microdata
IPUMSi integration principles
» 1. Respect absolute anonymity and confidentiality
» 2. Preserve all original data, except adjustments to ensure privacy (top codes, blurrings, masking, re-ordering, etc.)
» 3. Harmonize codes using international standards:
- occupation: ISCO-88 (detailed, general)
- education: ISCED (detailed, general)
- family: IPUMS, etc. (detailed, general)
» 4. Enhance with constructed variables
Composite coding scheme example: marital status
Coding Scheme and Category Availability for Marital Status
Code Label: Colombia 64 73 85 93 | France 62 68 75 82 90 | Kenya 89 99 | Mexico 60 70 90 00 | United States 60 70 80 90 | Vietnam 89 99
(X = category available; . = not available; MSA = married, spouse absent)
100 SINGLE/NEVER MARRIED: X X X X | X X X X X | X X | X X X X | X X X X | X X
MARRIED/IN UNION
210 Married (not specified): X X X X | X X X X X | X X | . . . . | X X X X | X X
211 Civil: . . . . | . . . . . | . . | X X X X | . . . . | . .
212 Religious: . . . . | . . . . . | . . | X X X X | . . . . | . .
213 Civil and religious: . . . . | . . . . . | . . | X X X X | . . . . | . .
214 Polygamous: . . . . | . . . . . | X X | . . . . | . . . . | . .
220 Consensual union: X X X X | . . . . . | . . | X X X X | . . . . | . .
SEPARATED/DIVORCED/SPOUSE ABSENT
310 Separated or Divorced: . X X X | . . . . . | . . | . . . . | . . . . | . .
320 Separated: . . . . | . . . . . | X X | . X X X | X X X X | X X
330 Divorced: . . . . | X X X X X | X X | X X X X | X X X X | X X
340 Married, spouse absent (n.s.): X X X X | X X X X X | X X | . . . . | X X X X | X X
341 MSA, civil: . . . . | . . . . . | . . | X X X X | . . . . | . .
342 MSA, religious: . . . . | . . . . . | . . | X X X X | . . . . | . .
343 MSA, civil and religious: . . . . | . . . . . | . . | X X X X | . . . . | . .
344 MSA, polygamous: . . . . | . . . . . | X X | . . . . | . . . . | . .
350 Consensual union, spouse absent: X X X X | . . . . . | . . | X X X X | . . . . | . .
400 WIDOWED: X X X X | X X X X X | X X | X X X X | X X X X | X X
999 UNKNOWN/MISSING: . X X X | . . . . . | X X | X X X X | . . . . | X X
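In the marital status table, the leading digit of each composite code appears to carry the general category: 1 single, 2 married/in union, 3 separated/divorced/spouse absent, 4 widowed, 9 unknown. The trailing digits add detail that only some censuses record. Under that reading, collapsing a code to the comparable general level is integer division. The dictionaries below transcribe the table. The function names are hypothetical, and this is an illustration rather than IPUMS software.

```python
# General categories, keyed by the leading digit of the composite code.
GENERAL = {
    1: "SINGLE/NEVER MARRIED",
    2: "MARRIED/IN UNION",
    3: "SEPARATED/DIVORCED/SPOUSE ABSENT",
    4: "WIDOWED",
    9: "UNKNOWN/MISSING",
}

# Detailed composite codes transcribed from the table.
DETAILED = {
    100: "Single/never married",
    210: "Married (not specified)", 211: "Civil", 212: "Religious",
    213: "Civil and religious", 214: "Polygamous", 220: "Consensual union",
    310: "Separated or divorced", 320: "Separated", 330: "Divorced",
    340: "Married, spouse absent (n.s.)", 341: "MSA, civil", 342: "MSA, religious",
    343: "MSA, civil and religious", 344: "MSA, polygamous",
    350: "Consensual union, spouse absent",
    400: "Widowed", 999: "Unknown/missing",
}


def general_version(code: int) -> int:
    """Collapse a detailed composite code to its general category (leading digit)."""
    if code not in DETAILED:
        raise KeyError(f"not a marital-status composite code: {code}")
    return code // 100


def general_label(code: int) -> str:
    return GENERAL[general_version(code)]
```

For example, a Mexican 2000 record coded 211 (civil marriage) and a U.S. 1990 record coded 210 (married, not specified) both collapse to general category 2. The detail is therefore kept without sacrificing comparability.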
Occupation: the ISCO standard. Preliminary release: “1” digit; final: 2-3 or 4 digit, depending upon country
Coding Schemes and Category Availability for Occupation
Code Label: Colombia 64 73 85 93 | France 62 68 75 82 90 | Kenya 89 99 | Mexico 60 70 90 00 | United States 60 70 80 90 | Vietnam 89 99
(X = category available; . = not available)
OCCUPATION, ISCO
01 Legislators, senior officials and managers: X X . . | X X X X X | . X | X X X X | X X X X | . X
02 Professionals: X X . . | X X X X X | . X | X X X X | X X X X | . X
03 Technicians and associate professionals: X X . . | X X X X X | . X | X X X X | X X X X | . X
04 Clerks: X X . . | X X X X X | . X | X X X X | X X X X | . X
05 Service workers and shop and market sales: X X . . | X X X X X | . X | X X X X | X X X X | . X
06 Skilled agricultural and fishery workers: X X . . | X X X X X | . X | X X X X | X X X X | . X
07 Crafts and related trades workers: X X . . | X X X X X | . X | X X X X | X X X X | . X
08 Plant and machine operators and assemblers: X X . . | X X X X X | . X | X X X X | X X X X | . X
09 Elementary occupations: X X . . | X X X X X | . X | X X X X | X X X X | . X
10 Armed forces: X . . . | X X X X X | . X | X X X X | X X X X | . .
98 Unknown: X X . . | . . . . X | . X | X X X X | . . . . | . .
99 N/A: X . . . | X X X X X | . X | X X X X | X X X X | . X
Variable availability, preliminary release
Selected Variable Topic Availability, by Country and Census Year
Columns, in order: Colombia 64 73 85 93 | France 62 68 75 82 90 | Kenya 89 99 | Mexico 60 70 90 00 | United States 60 70 80 90 | Vietnam 89 99
(x = topic available; . = not available)
Geography and internal migration
Place of usual residence: x x x x x x x x x x x x x x x x x x x x x
Place of birth: x x x x x x x x x x x x x x x x x x x . .
Duration of residence: x x . . . . . . . . x x x . . x x x x . .
Place of previous residence: x x . . . . . . . . . x x . . . . . . . .
Place of residence at a specified date in the past: . . x x x x x x x x x . . x x x x x x x x
Household and family structure
Relationship to head of household/householder: x x x x x x x x x x x x x x x x x x x x x
Demographic and social
Sex: x x x x x x x x x x x x x x x x x x x x x
Age: x x x x x x x x x x x x x x x x x x x x x
Marital Status: x x x x x x x x x x x x x x x x x x x x x
Citizenship: . . . . x x x x x x x x . . . . x x x . .
Religion: . . . . . . . . . . x x x x x . . . . . x
Language: . . . . . . . . . . . . x x x . . x x . .
National and/or ethnic group: . . . x . . . . . x x x . . x x x x x x x
Fertility and mortality
Children ever born: . x x x . . . . . x x x x x x x x x x x x
Children living: . x x x . . . . . x x . . . x . . . . x x
Date of birth of last child born alive: . x . x . . . . . x x . . . x . . . . x x
Deaths in the past 12 months: . . . . . . . . . . . . . . . . . . . . x
Maternal or paternal orphanhood: . . . . . . . . . x x . . . . . . . . . .
Age, date or duration of first marriage: . . . . . . . . . . . . . . . x x x . . .
Education
Literacy: x x x x . . . . . x . x x x x . . . . x x
School attendance: . x x x . . . . . x x . x x x x x x x x x
Educational attainment: x x x x x x x x x x x x x x x x x x x x x
Field of education and educational qualification: . . . . x x . . . . . . . . x . . . . x x
Economics
Activity status: x x x x x x x x x x x . x x x x x x x x x
Time worked: x . x . . . . . . . . x x x x x x x x . .
Occupation: x x . x x x x x x . x x x x x x x x x x
Industry: x x . x x x x x x . . x x x x x x x x x x
Status in employment: x x x x x x x x x x . x x x x x x x x . .
Income: . . . . . . . . . . . . x x x x x x x . .
Institutional sector of employment: . . . . x x x x x . . . . . . . . . . . x
Place of work: . . . . x x x x x . . . . . x x x x x . .
International migration
Country of birth: x x x x x x x x x x x x x x x x x x x . .
Citizenship: . . . . x x x x x . . . . . . x x x x . .
Year or period of arrival: . . x . . . . . . . . . . . . . x x x . .
Disability
Disability: . . . x x . . . . . . . . . x . x x x . .
Cause of disability: . . . . x . . . . . . . . . x . . . . . .
IPUMSi DISSEMINATES
Web-based extraction system
» Facilitates comparative research
Researcher selects:
» countries
» censuses
» cases/sub-populations
» variables
» sample densities
Legally-binding license agreement:
» protects privacy and confidentiality
» assures proper use
» new sanction: loss of employment
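In the extraction system, a researcher chooses samples, cases or sub-populations, variables and a sample density. These choices can be modeled as a request applied to the unit records. Everything here is hypothetical: `ExtractRequest`, `run_extract`, the country codes and the record layout are illustrations, not the IPUMS extraction interface.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExtractRequest:
    # Hypothetical request mirroring the researcher's choices on the slide.
    samples: set[tuple[str, int]]                            # (country, census year) pairs
    variables: list[str]                                     # variables to return
    case_filter: Callable[[dict], bool] = lambda rec: True   # sub-population
    density: float = 1.0                                     # fraction of cases to keep


def run_extract(records, req, seed=0):
    """Return only the requested variables for the selected cases, so the
    user receives a custom-tailored extract rather than an entire sample."""
    rng = random.Random(seed)
    out = []
    for rec in records:
        if (rec["country"], rec["year"]) not in req.samples:
            continue
        if not req.case_filter(rec):
            continue
        if rng.random() >= req.density:  # thin the extract to the chosen density
            continue
        out.append({var: rec[var] for var in req.variables})
    return out
```

For example, a request for French 1990 persons aged 60 and over returns only their `age` and `sex` values. No identifier or unrequested variable leaves the system.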
Additional information at:
www.hist.umn.edu/~rmccaa/ipums-europe
Contact:
[email protected]
* * * * *
Thank you