
Page 1: Archiving

Michael J. Levin
Harvard Center for Population and Development Studies
Michael.levin@yahoo.com

Page 2: Two types of “Archiving”

I. Data

II. Metadata

Page 3: I. Data archiving

• Every effort must be made to keep all versions of the data set.

• Separate series of data sets need to be preserved for the pilot, the census itself, and the post-enumeration survey (PES).

• For the census, this data archiving needs to start with the output of the scanning or keying operation, that is, the completely unedited data.
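One way to keep every version of each series is to copy each file into a numbered, never-overwritten slot and store a checksum beside it. The following is a minimal sketch only: the directory layout, the series labels, the `.dat` extension, and the function name `archive_version` are assumptions made for illustration, not part of any census package.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical labels for the three series named on the slide.
SERIES = ("pilot", "census", "pes")


def archive_version(source: Path, archive_root: Path, series: str, stage: str) -> Path:
    """Copy `source` into the archive as the next numbered version.

    Existing versions are never overwritten; each call adds v001, v002, ...
    under archive_root/series/stage.
    """
    if series not in SERIES:
        raise ValueError(f"unknown series: {series}")
    target_dir = archive_root / series / stage
    target_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(target_dir.glob("v*.dat"))) + 1
    target = target_dir / f"v{version:03d}.dat"
    shutil.copy2(source, target)
    # Store a SHA-256 checksum next to the copy so later corruption can be detected.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    target.with_suffix(".sha256").write_text(digest + "\n")
    return target
```

Under these assumptions, the first call for the census would be made with `stage="unedited"` on the raw scanning or keying output, and each later editing pass would be archived under its own stage name.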

Page 4: Why preserve the unedited data

• The most important reason to keep the unedited data is that they are closest to the respondents and therefore represent their “thoughts and feelings” before the coding, editing, and tabulation operations.

• As staff edit the data, they can refer back to this data set, as needed, to see what changes are being made: at the individual level and, through frequency distributions, at the aggregate level.
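The two kinds of comparison described above, record by record and through frequency distributions, can be sketched as follows. This is an illustrative sketch, assuming that records are held as dictionaries and that the unedited and edited files list the same persons in the same order; the function names and the item name `sex` are hypothetical.

```python
from collections import Counter


def frequency_changes(unedited, edited, item):
    """Aggregate level: map each value of `item` to (count before, count after)."""
    before = Counter(rec[item] for rec in unedited)
    after = Counter(rec[item] for rec in edited)
    values = sorted(set(before) | set(after), key=str)
    return {v: (before.get(v, 0), after.get(v, 0)) for v in values}


def individual_changes(unedited, edited, item):
    """Individual level: list (record position, original value, edited value).

    Assumes both files hold the same records in the same order.
    """
    return [
        (i, u[item], e[item])
        for i, (u, e) in enumerate(zip(unedited, edited))
        if u[item] != e[item]
    ]


# Example with a hypothetical item "sex", where 9 is an invalid code.
unedited = [{"sex": 1}, {"sex": 2}, {"sex": 9}, {"sex": 1}]
edited = [{"sex": 1}, {"sex": 2}, {"sex": 2}, {"sex": 1}]
print(frequency_changes(unedited, edited, "sex"))
print(individual_changes(unedited, edited, "sex"))
```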

Page 5: Another reason to keep the original, unedited data

• As new demographic and other direct and indirect techniques are developed, they can be tested on these data.

• Without the original data, techniques developed to alleviate systematic problems in these data, or in census data in general, cannot be tested as easily or, in some cases, at all.

Page 6: Keeping original responses on the records

• Original responses should be kept on the population and housing records as part of the editing process.

• In this way, both original and edited responses are always available to staff and researchers.

• For some items (e.g., fertility), intermediate values are also kept.

Page 7: Flags

• Countries use flags to indicate changes in individual items; these flags should be included in the final, archived data. A flag can be:

  - a “no/yes” flag, or

  - a more complicated scheme
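Keeping the original response and a “no/yes” flag on the same record might look like the sketch below. It is illustrative only: the `_orig` and `_flag` suffixes, the function name `apply_edit`, and the dictionary record layout are assumptions, and a real scheme could use more flag values than 0 and 1.

```python
def apply_edit(record, item, new_value):
    """Return a copy of `record` with `item` edited.

    The copy carries the edited value, the original response
    (`<item>_orig`), and a no/yes flag (`<item>_flag`: 0 = unchanged,
    1 = changed). The input record is not modified.
    """
    edited = dict(record)
    orig_key = f"{item}_orig"
    flag_key = f"{item}_flag"
    # Keep the first response ever seen, even across repeated edits.
    edited.setdefault(orig_key, record[item])
    edited[item] = new_value
    edited[flag_key] = 1 if new_value != edited[orig_key] else 0
    return edited


# Example: an impossible age replaced during editing.
person = {"age": 999}
print(apply_edit(person, "age", 35))
```

Because the original value travels with the record, a working file without the `_orig` and `_flag` items can be derived from the archived file at any time, but not the other way round.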

Page 8: The final data set

• The final data set should be named in a clear, unambiguous way for current and future staff.

• A country may choose to have several “final” data sets.

• For most purposes, neither the original data nor the flags are needed for daily work in the office, that is, answering user requests.

Page 9: De facto and de jure data sets

• Three groups of persons:
  (1) respondents resident in the household,
  (2) visitors to the household, and
  (3) persons usually resident in the household but away on the reference date.

• So, the de facto file would have the persons indicating (1) or (2), and

• the de jure data set would have those indicating (1) or (3).

• With separate files, no “Universe” would need to be selected for these runs.
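The selection rule above can be sketched in a few lines. The item name `res_status` and the constants are hypothetical; the codes 1, 2, and 3 follow the numbering of the three groups on the slide.

```python
# Codes follow the three groups listed on the slide.
RESIDENT_PRESENT, VISITOR, RESIDENT_AWAY = 1, 2, 3


def split_populations(persons, status_item="res_status"):
    """Return (de_facto, de_jure) lists built from one combined person file."""
    de_facto = [p for p in persons if p[status_item] in (RESIDENT_PRESENT, VISITOR)]
    de_jure = [p for p in persons if p[status_item] in (RESIDENT_PRESENT, RESIDENT_AWAY)]
    return de_facto, de_jure


persons = [
    {"id": "A", "res_status": RESIDENT_PRESENT},
    {"id": "B", "res_status": VISITOR},
    {"id": "C", "res_status": RESIDENT_AWAY},
]
de_facto, de_jure = split_populations(persons)
print([p["id"] for p in de_facto])  # A and B
print([p["id"] for p in de_jure])   # A and C
```

Residents who are present appear in both files, so the two files overlap and should not be added together.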

Page 10: II. Metadata

• “Metadata” is basically “data about data” of any sort in any medium.

• Metadata comprise the text, tables, charts, maps, and other images that describe what users want or need to know about the census or survey.

• The users include individuals and groups.

• The census metadata aid in clarifying and finding the actual data.

Page 11: More on metadata

• The metadata include the definitions of the items, their use, their interactions, information about the pretest and the post-enumeration survey, daily records of progress, weekly reports, monthly reports, and reports by activity.

• The data-processing metadata include the structure of the data dictionary, the keying screens for keyed data and verifying screens for scanned data, structure and content edits, the tabulations, and dissemination plans and activities.

• The metadata also include the procedural history of the census, described below.

Page 12: The Procedural History

• The procedural history is crucial to the complete success of a census.

• Without it, even the best tables could become lost,

• as could the ability to make subsequent tables after the end of the initial processing.

Page 13: Each step in the process

• Recording should start from the very beginning of the census operations.

• It answers the question “what did we do the last time?”

• Each operation needs to be recorded:
  - when it starts,
  - when it ends,
  - what is expected to be done,
  - what is actually done,
  - problems encountered, and
  - knowledge gained.

• Sometimes a form is created to allow for filling in the blanks as individual operations take place.
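The fill-in-the-blanks form could be represented as a small record type with one field per item listed above. This is a sketch under assumptions: the class name `OperationRecord`, the field names, and the `blanks` helper are all hypothetical, and an office might equally use a paper form or a spreadsheet.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Optional


@dataclass
class OperationRecord:
    """One entry in the procedural history, filled in as the operation runs."""
    operation: str
    started: date
    expected: str                       # what is expected to be done
    ended: Optional[date] = None        # filled in when the operation ends
    actual: str = ""                    # what is actually done
    problems: list = field(default_factory=list)
    lessons: list = field(default_factory=list)   # knowledge gained

    def blanks(self):
        """Return the names of the fields that are still empty."""
        return [name for name, value in asdict(self).items()
                if value in (None, "", [])]


rec = OperationRecord("keying", date(2010, 5, 3), "key all enumeration areas")
print(rec.blanks())   # fields still to be completed
rec.ended = date(2010, 7, 30)
rec.actual = "keyed all enumeration areas"
print(rec.blanks())
```

The dates and descriptions in the example are invented placeholders.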

Page 14: Dedicated staff

• A group of staff (or, in very small operations, a single staff member) should be assigned to collect, for each operation, the:
  - questionnaires,
  - forms and manuals,
  - dictionaries and screens,
  - edits and tabulations, and
  - metadata.

• These various pieces of information need to be put in a database or umbrella directory (like the TRS) and indexed for easy access both during the census and subsequently.
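An index over the collected materials can be as simple as a lookup from operation and kind of material to file locations. The sketch below assumes a hypothetical list of kinds and hypothetical file paths; it does not describe how the TRS itself is organized.

```python
from collections import defaultdict

# Hypothetical kinds, taken from the materials listed on the slide.
KINDS = ("questionnaire", "form", "manual", "dictionary",
         "screen", "edit", "tabulation", "metadata")


def build_index(entries):
    """Build {operation: {kind: [paths]}} from (operation, kind, path) tuples."""
    index = defaultdict(lambda: defaultdict(list))
    for operation, kind, path in entries:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        index[operation][kind].append(path)
    # Convert to plain dictionaries with sorted paths for stable lookups.
    return {op: {kind: sorted(paths) for kind, paths in kinds.items()}
            for op, kinds in index.items()}


index = build_index([
    ("census", "questionnaire", "census/forms/household.pdf"),
    ("census", "dictionary", "census/dict/person.dcf"),
    ("pilot", "manual", "pilot/manuals/enumerator.pdf"),
])
print(index["census"]["dictionary"])
```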

Page 15: Documenting table series

Include:

• The item or items

• Definitions

• How the question was asked

• How the information derived from this question is used for planning and policy formation

• Limitations of the data item or items

• Compatibility with other censuses and surveys, which is also helpful to users

Page 16: Finally!

• All metadata, including publicity announcements (both on paper and electronic), notes, memos, emails, and so forth, need to be saved and organized by date and topic.

• It is only by being able to see the scope and flow of work that the best planning can be done.