Archiving
Michael J. Levin
Harvard Center for Population and Development Studies
[email protected]

Two types of “Archiving”
I. Data
II. Metadata

I. Data archiving
• Every effort must be made to keep all versions of the data set.
• Separate series of data sets need to be preserved for the pilot, the census itself, and the PES (post-enumeration survey).
• For the census, this data archiving needs to start with the output of the scanning or keying operation: the completely unedited data.

Why preserve the unedited data
• The most important reason to keep unedited data is that they are closest to the respondents and therefore represent the respondents' “thoughts and feelings” before coding, editing, and tabulation operations.
• As staff edit the data, they can refer back to this data set, as needed, to see what changes are being made: at the individual level and, through frequency distributions, at the aggregate level.
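
The aggregate-level check described above can be sketched as a comparison of frequency distributions before and after editing. This is only an illustration: the function name `frequency_shift`, the item, and its codes (1 = male, 2 = female, 9 = not stated) are assumptions, not part of any census system named in these slides.

```python
from collections import Counter

def frequency_shift(unedited, edited):
    """Return {value: (count_before, count_after)} for values whose count changed."""
    before = Counter(unedited)
    after = Counter(edited)
    return {
        value: (before[value], after[value])
        for value in sorted(set(before) | set(after), key=str)
        if before[value] != after[value]
    }

# Hypothetical responses to a "sex" item: 1 = male, 2 = female, 9 = not stated.
unedited_sex = [1, 2, 9, 2, 1, 9]
edited_sex = [1, 2, 2, 2, 1, 1]

shift = frequency_shift(unedited_sex, edited_sex)
for value, (n_before, n_after) in shift.items():
    print(f"code {value}: {n_before} -> {n_after}")
```

A comparison like this is only possible if the unedited file is still available when the edited one is produced.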

Another reason to keep the original, unedited data
• As new demographic and other direct and indirect techniques are developed, they can be tested on these data.
• Without the original data, techniques developed to alleviate systematic problems in these data, or census data in general, cannot be tested as easily or, in some cases, at all.

Keeping original responses on the records
• Original responses should be kept on the population and housing records as part of the editing process.
• In this way, both original and edited responses are always available to staff and researchers.
• For some items (e.g., fertility), intermediate values are also kept.

Flags
• Countries use flags to indicate changes in individual items; include these on the final, archived data.
  – a “no/yes” flag
  – a more complicated scheme
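
A minimal sketch of the simple “no/yes” flag, combined with keeping the original response on the record, might look like the following. The field-name suffixes `_orig` and `_flag`, the function `edit_item`, and the sample record are all hypothetical; real editing packages use their own conventions.

```python
def edit_item(record, item, new_value):
    """Return a copy of record with item edited, keeping the original and a flag."""
    updated = dict(record)
    # Store the original response only once, so repeated edits never overwrite it.
    updated.setdefault(f"{item}_orig", record[item])
    updated[item] = new_value
    # "no/yes" flag: 0 = unchanged from the original, 1 = changed by the edit.
    updated[f"{item}_flag"] = 1 if new_value != updated[f"{item}_orig"] else 0
    return updated

# Hypothetical person record: a 5-year-old reported with 3 children ever born.
person = {"age": 5, "children_ever_born": 3}
edited = edit_item(person, "children_ever_born", 0)
print(edited)
```

A more complicated scheme could replace the 0/1 flag with a code that says which edit rule made the change, at the cost of a longer code list to document.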

The final data set
• The final data set should be named in a strong, unambiguous way for current and future staff.
• A country may choose to have several “final” data sets.
• For most purposes, neither the original data nor the flags are needed for daily work in the office, that is, answering user requests.

De facto and de jure data sets
• Three groups:
  (1) respondents resident in the household,
  (2) visitors to the household, and
  (3) persons usually resident in the household but away on the reference date.
• So, the de facto file would have the persons indicating (1) or (2), and
• the de jure data set would have those indicating (1) or (3).
• And no “Universe” would need to be selected for these runs.
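
The split into the two files can be sketched as a filter on a residence-status code. The variable name `status` and the numeric codes are assumptions chosen to match the three groups listed above; an actual census would use its own item and code list.

```python
# Hypothetical residence-status codes matching the three groups on the slide.
RESIDENT_PRESENT = 1  # resident in the household
VISITOR = 2           # visitor to the household
RESIDENT_AWAY = 3     # usually resident but away on the reference date

DE_FACTO_CODES = {RESIDENT_PRESENT, VISITOR}
DE_JURE_CODES = {RESIDENT_PRESENT, RESIDENT_AWAY}

def split_population(persons):
    """Return (de_facto, de_jure) lists built from one combined person file."""
    de_facto = [p for p in persons if p["status"] in DE_FACTO_CODES]
    de_jure = [p for p in persons if p["status"] in DE_JURE_CODES]
    return de_facto, de_jure

persons = [
    {"id": "A", "status": RESIDENT_PRESENT},
    {"id": "B", "status": VISITOR},
    {"id": "C", "status": RESIDENT_AWAY},
]
de_facto, de_jure = split_population(persons)
print([p["id"] for p in de_facto])
print([p["id"] for p in de_jure])
```

Because each file already contains only the relevant persons, tabulation runs against it need no further universe selection on residence status.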

II. Metadata
• “Metadata”: basically “data about data” of any sort in any medium.
• Metadata are the text, tables, charts, maps, and other images that describe what users want or need to know about the census or survey.
• The users include individuals and groups.
• The census metadata aid in clarifying and finding the actual data.

More on metadata
• The metadata include the definitions of the items, their use, their interactions, information about the pretest and the post-enumeration survey, daily records of progress, weekly reports, monthly reports, and reports by activity.
• The data-processing metadata include the structure of the data dictionary, the keying screens for keyed data and verifying screens for scanned data, structure and content edits, the tabulations, and dissemination plans and activities.
• And the procedural history of the census, described below.

The Procedural History
• Crucial to the complete success of a census
• Without it, even the best tables could become lost
• So could the ability to make subsequent tables after the end of the initial processing.

Each step in the process
• From the very beginning of the census operations.
• “What did we do the last time?”
• Each operation needs to be recorded:
  when it starts,
  when it ends,
  what is expected to be done,
  what is actually done,
  problems encountered, and
  knowledge gained.
• Sometimes a form is created to allow for filling in the blanks as individual operations take place.
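
The fill-in-the-blanks form could be kept as a structured record with one field per item in the list above. The class name `OperationRecord`, its field names, and the sample operation and dates are hypothetical, offered only as a sketch of what such a form might capture.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class OperationRecord:
    """One entry in the procedural history: a single census operation."""
    name: str
    started: date
    ended: date
    expected: str   # what was expected to be done
    actual: str     # what was actually done
    problems: list = field(default_factory=list)
    knowledge_gained: list = field(default_factory=list)

    def duration_days(self) -> int:
        """Length of the operation in days, counting both start and end dates."""
        return (self.ended - self.started).days + 1

# Hypothetical example entry; the dates and text are invented for illustration.
record = OperationRecord(
    name="Keying of pilot questionnaires",
    started=date(2010, 4, 1),
    ended=date(2010, 4, 14),
    expected="Key all pilot questionnaires",
    actual="Keyed all but one enumeration area",
    problems=["One batch of questionnaires arrived late"],
    knowledge_gained=["Log batches on receipt"],
)
print(record.name, record.duration_days(), "days")
```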

Dedicated staff
• A group of staff (or, in very small operations, a single staff member) should be assigned to collect for each operation the:
  questionnaires,
  forms and manuals,
  dictionaries and screens,
  edits and tabulations, and
  metadata
• These various pieces of information need to be put in a database or umbrella directory (like the TRS) and indexed for easy access both during the census and subsequently.
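
One simple way to index such a collection is by operation and then by kind of document. The sketch below assumes each collected item is described by a small dictionary; the keys `operation`, `kind`, and `path`, and the file paths themselves, are invented for illustration and do not describe how the TRS is organized.

```python
from collections import defaultdict

def build_index(documents):
    """Index documents as index[operation][kind] -> list of paths."""
    index = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        index[doc["operation"]][doc["kind"]].append(doc["path"])
    return index

# Hypothetical catalogue entries gathered by the dedicated staff.
documents = [
    {"operation": "pilot", "kind": "questionnaire", "path": "pilot/questionnaire.pdf"},
    {"operation": "pilot", "kind": "manual", "path": "pilot/enumerator_manual.pdf"},
    {"operation": "census", "kind": "questionnaire", "path": "census/questionnaire.pdf"},
]
index = build_index(documents)
print(index["pilot"]["questionnaire"])
```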

Documenting table series
Include:
• The item or items
• Definitions
• How the question was asked
• How the information derived from this question is used for planning and policy formation
• Limitations of the data item or items
• Compatibility with other censuses and surveys, which is also helpful to users

Finally!
• All metadata, including publicity announcements (both paper and electronic), notes, memos, emails, and so forth, need to be saved and organized by date and topic.
• It is only by being able to see the scope and flow of work that the best planning can be done.