196
Hold on, Jerry! The great dataset movie Screenplay and direction: Reto Hadorn Production: SIDOS and EU (MetaDater project) Co-production: DDI, CSDI, ICRC, Swiss household panel First show: 2005 IASSIST festival Edinburgh, 27 May 2005 Status: Conference draft

Hold on, Jerry!

  • Upload
    lesley

  • View
    32

  • Download
    0

Embed Size (px)

DESCRIPTION

Hold on, Jerry!. The great dataset movie. Screenplay and direction: Reto Hadorn Production: SIDOS and EU (MetaDater project) Co-production: DDI, CSDI, ICRC, Swiss household panel First show: 2005 IASSIST festival Edinburgh, 27 May 2005 Status: Conference draft. Plan. Introduction - PowerPoint PPT Presentation

Citation preview

  • Hold on, Jerry!The great dataset movie

    Screenplay and direction: Reto HadornProduction: SIDOS and EU (MetaDater project)Co-production: DDI, CSDI, ICRC, Swiss household panel

    First show: 2005 IASSIST festivalEdinburgh, 27 May 2005

    Status: Conference draft

  • Plan

    IntroductionThe movieIssues and puzzles

  • Introduction

  • This show is part of a bundle of documents, which describe in abstract terms the handling of the single datasets in the context of a repeated comparative study. This bundle includes: text diagram this movie.

    The stance is that the whole story is not a complicated one, as far as the definitions of the involved objects are precise and one problem is handled at a time. Yet the communication of abstract ideas is not unproblematic. Starting with a textual description of the construct, the presentation has been expanded to a diagram supposed to include all aspects and, since that diagram is unfortunately static, with the present movie, in which the diagram is built up step by step.

    The movie has been conceived at the start for a presentation of the issue at the 2005 ISSIST Conference in Edinburgh. A meeting of the Expert group of the DDI-Alliance has take place just before the congress itself, where a grouping concept was presented, which was discussed in depth in the comparative datasets working group and several informal and casual meetings. The discussions have shown how important a clear presentation of the relationships between the elements of the so called repeated comparative study are. This was an additional motivation for making a more didactical presentation of the whole story.It remains that the ideas presented here are especially related to the MetaDater project, which needs a metadata model to put it in action, and to the SIDOS prototype for a variable level relational database, where relationships between questions and variables belonging to various country datasets (cross-national studies) or waves (panel study) were tested.

    Efforts were made to keep all three documents consistent. Some inconsistencies may nevertheless subsist, since they have all their own story, being partly developed in distinct contexts.

    To have a full view of the issue, it is recommended to refer to all three documents cited above and, in addition, to the following ones:Workflow diagram for new questions Description of dataset selection process description of question typology (2 documents).

    Lets now turn back to the presentation.

  • The backgroundMetaDater project: creating metadata structures accounting for the production of repeated cross-national datasetsThe CSDI: create better documentation of the cross-national programsThe DDI: building metadata structures supporting the identification of comparable data

  • The challengesSupport the life-cycle (long term)Definition of the research programCreation of the research instrument (multilingual)Completing the field workProcessing data and metadataDistributing data and metadataRepurposing

  • The challengesEconomy in metadata capture and managementNormal formRe-use information by referenceCopy information for edition in the case it is just varying

  • The challengesBest documentation of:Relationships between the studies involvedSeries of questions and variablesPossible variations within seriesThe construction of new variables for harmonization

  • The challengesDocument by doing!Data are handled from within the metadata systemDocumentation of the operations is largely a byproduct of doing themCorrections on data fileVariable constructionDefining harmonized variablesComputation of harmonized variables

  • The challengesTo support

    analysis of the variations among simple datasets to be integrated or cumulatedintegration of country datasets, cumulation of waves and cumulation of integrated datasetspublication of metadata at any stage

  • The challengesMake several forms of metadata publication possible:The sets of datasets (country DS or waves) as they are collected

    The integrated dataset (space), the cumulated dataset (time)

  • The challengesMake several forms of metadata publication possible:The sets of datasets (country DS or waves) as they are collected full navigable documentationThe integrated dataset (space), the cumulated dataset (time) all documentation drawn from the single datasets and harmonization operations into a synthetic presentation

  • The Movie

  • At the beginning was the studyConceptsData schemaSampleDataStudy

  • A study can be funded successively under several research projects,identified by their title and described by a summaryConceptsData schemaSampleDataStudyFundsProject

  • Usually, the funding goes to services and authorsConceptsData schemaSampleDataStudyFundsAuthorsProject

  • Lets forget about them for the momentConceptsData schemaSampleDataStudyProject

  • Maybe the Concepts should be migrated to the project; usually they arebest expressed in terms of what researchers plan to do

    Data schemaSampleDataStudyConceptsProject

  • Let this question open until we will have to distribute elementsAmong the main objects: here already project and Study, laterperhaps moreConceptsData schemaSampleDataStudyProject

  • A study can be funded successively under several research projects(association)ConceptsData schemaSampleDataStudyProject 1Project 2Project 3

  • There are also activities related to the study, which depend onthe stations the data go through over life:ConceptsData schemaSampleDataStudyProject 1Project 2Project 3Data collectionData processingData publicationData repurpousingData depositData distributionEtc.Some repeated,others not

    Actors, AuthorityTime, Content inVarious formsActivities:

  • Lets forget the activities and turn back to the projectsConceptsData schemaSampleDataStudyProject 1Project 2Project 3Data collectionData processingData publicationData repurpousingData depositData distributionEtc.Some repeated,others not

    Actors, AuthorityTime, Content inVarious formsActivities:

  • Lets forget the activities and turn back to the projectsConceptsData schemaSampleDataStudyProject 1Project 2Project 3

  • Lets forget the activities and turn back to the projectsConceptsData schemaSampleDataStudy

  • Most studies are actually produced within one single project:ConceptsData schemaSampleDataStudyProject

  • ConceptsData schemaSampleDataConceptsData schemaSampleDataIn some research projects, you will actually fund more than one studyConceptsData schemaSampleDataStudyProjectso we need a good model for the project-study relationschip!

  • ConceptsData schemaSampleDataConceptsData schemaSampleDataIn some research projects, you will actually fund more than one studyConceptsData schemaSampleDataStudyProjectso we need a good model for the project-study relationschip!Survey ofteachersSurvey of parentsSurvey ofpupils

  • ConceptsData schemaSampleDataConceptsData schemaSampleDataThe overall concepts would be expressed on the level of the projectConceptsData schemaSampleDataStudyProjectso we need a good model for the project-study relationschip!Survey ofteachersSurvey of parentsSurvey ofpupilsOverall conceptsand methodology

  • ConceptsData schemaSampleDataConceptsData schemaSampleDataThis is already a bit complicated, so lets forget about the project..ConceptsData schemaSampleDataStudyProject

  • ConceptsData schemaSampleDataConceptsData schemaSampleDataConceptsData schemaSampleDataStudy

  • ConceptsData schemaSampleDataConceptsData schemaSampleDataConceptsData schemaSampleDataStudyLets keep just one study

  • ConceptsData schemaSampleDataStudyLets keep just one study

  • ConceptsData schemaSampleDataStudybut let it be a cross-national study:

  • ConceptsData schemaSampleDataStudy (cross-national)but let it be a cross-national study:we need more space!

  • ConceptsData schemaSampleDataCross-national studybut let it be a cross-national study:

  • ? ConceptsData schemaSampleData

    Cross-national studyWhat happens to the contents?

  • SampleData

    Cross-national studyConcepts and data schema belong to the StudyConcepts, Data schema

  • SampleData

    Cross-national studyIn a cross-national study, on compares sets of data collected on distinct samples, specific to each countryConcepts, Data schema

  • Cross-national studyFine. Lets call those sets of data datasetsConcepts, Data schemaSampleDataSampleDataDataset Country 1Dataset Country 2Now, the samples are distinct, but must be similar:

  • We have starte with the study, say, the DDI-study, and we have nowthree main objects, which must be desinguished because of the more complex relationships in a complex study:ProjectStudyDatasetm:nn:1

  • Cross-national studySo we need a sample type on study level, which prescribes what the samples should be:Concepts, Data schema, Sample typeSampleDataSampleDataDataset Country 1Dataset Country 2Will country 2 really strictly conform to the data schema?

  • Cross-national studyWell more or less; sometimes less. They have their own view:Concepts, Data schema, Sample typeSampleDataSampleDataDataset Country 1Dataset Country 2Country 2 StudyConcepts, Data schemawhich just overlaps with the overall data schema, so

  • Cross-national studyGeneral case:Concepts, Data schema standard, Sample type, Coordination workSampleDataSampleDataDataset Country 1Dataset Country 2Country 2 StudyConcepts, Data schema,Implementation workReference

  • Cross-national studyConcepts, Data schema standard, Sample type, Coordination workSampleDataSampleDataDataset Country 1Dataset Country 2Country 2 StudyConcepts, Data schema,Implementation workThis is a catRepresent the data schema standard as a dataset:General case:

  • Dataset Country 1Cross-national studySampleDataConcepts, Data schema standard, Sample type, Coordination workDataset Country 2Country 2 StudyConcepts, Data schema,Implementation workRepresent the data schema standard as a dataset:Clean some space to put the standard:SampleDataDataset Country 2

  • SampleDataDataset Country 1The definition of the standard data schemahas the same structure type as a datasetConcepts, Sample type, Coordination workData schemaSampleDataDataset standardDataset Country 2Concepts, Data schema,Implementation work

  • SampleDataDataset Country 1Characteristics:Country name = StandardRefered to by other country datasetsConcepts, Sample type, Coordination workData schemaSampleDataDataset standardDataset Country 2Concepts, Data schema,Implementation work

  • SampleDataDataset Country 1The sample type, previously defined onstudy level, migrates to the standarddataset definition as part of the data schemaConcepts, Coordination workSample typeData definitionSampleDataDataset standardDataset Country 2Concepts, Data schema,Implementation workNow, we still have to go deeper into it forget of the studies!

  • SampleDataDataset Country 1The definition of the standard data schemaHas the same structure type as a datasetSample typeData definitionSampleDataDataset standardDataset Country 2

  • SampleDataDataset Country 1Sample typeData definitionSampleDataDataset standardDataset Country 2Forget even of the country datasets.

  • Sample typeData definitionDataset standardThe sample type belongs to the dataset standard as a framework:

  • Dataset standardNow, forget of the dataset standard, just concentrate on the data definition:Data definitionSample typeThe data definition appears to be the content of the dataset standard

  • Now, forget of the dataset standard, just concentrate on the data definition:Data definitionThe data definition appears to be the content of the dataset standard

  • and look into it:Data definition

  • Data definition

  • VariableQuestionTextual definitionDescription of Construction processSource-QuestionnaireRegistryData repositoryAuthority-ResearcherAdministrationData collectorIn a survey, the most basic definition is the question.Lets concentrate on it.A variable needs a definition

  • VariableQuestionThe most complex question structure is the generic case, from whichSimpler forms can be obtained in a process of simplification:Questions may have various structures, which must be replicatedin the database structureSimple questionItems questionMultiple responseBy answer instanceBy answer categoryGrid question

  • and make it a symbol:So lets take a representation of a question with some complexityto represent the generic question

    QuestionVariable 1Variable 2Variable 3

    Items questionMultiple responseBy answer instanceBy answer category

  • and even more simple:

    Question/Variable

  • Keep it small:Questions may have various structures, which must be replicatedin the database structure

  • Make it the standard definition:

  • StandardNow, we have a good representation of the standard definitionIn terms of standard questions and variablesExamples:

    ISSP questionnaire + standard setup

    ESS questionnaire + standard file

  • Remember, on higher levels we have the wrapping datasetSample typeExamples:

    ISSP questionnaire + standard setup

    ESS questionnaire + standard file

  • Examples:

    ISSP questionnaire + standard setup

    ESS questionnaire + standard file and even some kind of cross-national studyConcepts, coordination work

  • StandardNow, having the standard definition in our database, how would weMost economically enter the country definitions?

  • StandardCountry 1Create the country metadata using the standard:?

  • StandardCountry 1Create the country metadata using the standard:Derive!IdStIdC1IdStA self-reference of tables Question, rsp. Variable on themselves Makes selected information inheritable: Wordings (multilingual) Value domains (multilingual) Descriptors (indexes) and some other detailsIdStIdC1 / IdStSelf-references

  • StandardCountry 1Create the country metadata using the standard:Derive?IdStIdC1IdStQuestion structures are complex; the higher the complexity, the higher isthe probability for a small change somewhere in the structure What if a wording changes in one single language? What if an Interviewer instruction changes because of a change in the structure of the questionnaire?

    Another structure is necessary for changing Q/Vs; multiple structures are a factor of complexity in all processes to be programmed.IdStIdC1 / IdStSelf-references

  • StandardCountry 1Copy functionCreate the country metadata using the standard:Copy! You may also edit the copy (general solution)IdStIdC1IdStIdC1All information for the standard is copied, even the information,which remains constant.

    Redundance

    Independent versions

    Copes with variations!

  • StandardCountry 1

    Create reference link!Where is the reference from Country 1 to the Standard?IdStIdC1IdStIdC1Reference structure

  • StandardCountry 1IdentTargetIdentSource

    LinkType

    VariationTypeVariationCommentThe data element used for reference:IdStIdC1IdStIdC1Reference structure

  • StandardCountry 1IdentTargetIdentSource

    LinkType

    VariationTypeVariationCommentThe data element used for reference:IdStIdC1IdStIdC1For both questionsand variablesReference structure

  • StandardCountry 1Structural relationshipSpaceWording(ad libitum)The data element used for reference:IdStIdC1IdStIdC1IdentTargetIdentSource

    LinkType

    VariationTypeVariationCommentReference structure

  • StandardCountry 1SpaceTimeConstruction (var)Source (quest)Variable comparisonIdentTargetIdentSource

    LinkType

    VariationTypeVariationCommentReference structure

  • StandardCountry 1IdenticalWordingItem wordingValue structureResp. category wordingIdentTargetIdentSource

    LinkType

    VariationTypeVariationCommentReference structure

  • StandardCountry 1Copy and ReferenceSynthetic view of copy function and reference structure:

  • StandardCountry 1Synthetic view of copy function and reference structure:Copy and Reference

  • Now, lets look what happens with more than one country dataset!

  • Space-compound dataset (higher level dataset, incl. Documentation of variations)Sample type

  • Space-compound dataset (higher level dataset, incl. Documentation of variations)Sample typeOriginal country Q/VOn Q/V level, we may have 3 distinct situations:Identical Q/VVariant Q/V Type of variation Comment

  • Space-compound dataset (higher level dataset, incl. Documentation of variations)Sample typeStudy added

    Metadata can readilybe published in a hyperlinked format

  • Analysis of variations

    A country Q/V may be:identical to the standardvarying on the standardspecific to the country1Now, look at the integration process

  • Integrated1. Analysis of variations2. Automatic definition of varswhen all country vars identical,definition of harmonized var when necessary12Now, look at the integration process

  • Integrated1. Analysis of variations2. Automatic definition of varswhen all country vars identical,definition of harmonized var when necessary3. Copy of harmonized definition to single countrydatasets132Now, look at the integration process(as country-Specific Q/V)

  • Integrated1. Analysis of variations2. Automatic definition of varswhen all country vars identical,definition of harmonized var when necessary3. Copy of harmonized definition to single countrydatasets4. Computing of harmonized variables143244Now, look at the integration process(as country-Specific Q/V)

  • Integrated1. Analysis of variations2. Automatic definition of varswhen all country vars identical,definition of harmonized var when necessary3. Copy of harmonized definition to single countrydatasets4. Computing of harmonized variables5. Filling integrated datasetwith data1543244Now, look at the integration process(as country-Specific Q/V)

  • IntegratedLets come back to the studies involved

  • IntegratedProgram study (concepts)Compound dataset (sample type)Data definitions

  • IntegratedProgram study (concepts)Compound dataset (sample type)Data definitionsCountry study (local concepts)This is a catReference

  • IntegratedProgram study (concepts)Compound dataset (sample type)Data definitionsCountry study (local concepts)This is a catReferenceIntegration study?

  • IntegratedProgram study (concepts)Compound dataset (sample type)Data definitionsCountry study (local concepts)This is a catReferenceNo.Just additional elementsFor the description ofThe integration work

  • IntegratedProgram study (concepts)Compound dataset (sample type)Data definitionsCountry study (local concepts)This is a catReferenceNo.Just additional elementsFor the description ofThe integration work

    Now, to summarize, a more compact presentation, which will later allow to show more

  • IntegratedProgram study (concepts)Compound dataset (sample type)Data definitionsCountry study (local concepts)This is a catReferenceNo.Just additional elementsFor the description ofThe integration workTime?

  • IntegratedThis is a catReferenceTime!

    Just keep one dataset to start with

  • Wave 1

  • Wave 2Copy functionWave 1Use the information already in the database to create the metadatafor wave 2

  • Wave 2Reference structureWave 1Structural relationshipTimeWording(ad libitum)IdentTargetIdentSource

    LinkType

    VariationTypeVariationComment

  • Wave 2Copy and ReferenceSynthetic view of copy function and reference structure:Wave 1

  • Wave 2Wave 1Lets add waves:Copy and Reference

  • Wave 4Wave 3Wave 2Wave 1Hups

  • Wave 4Wave 3Wave 2Wave 1Hups

    There is no standardin a time design

  • Wave 1Lets start

  • Wave 1Wave 2 and repeat

  • Wave 1Wave 2Some questions and variables will be repeated exactly in the same formRepeatedVariation type = Identical No comment

  • Wave 1Wave 2others will be newNew

  • Wave 1Wave 2still others will present variations, without fully breaking up the seriesVariantVariation type = Wording Comment on the impact on meaning of the change in the wording

  • Wave 1Wave 2Wave 3Add more waves

  • Wave 1Wave 2Wave 3Simplify the presentation of the references

  • Wave 1Wave 2Wave 3Wave 4Wave 5Wave 6Wave 7Wave 8Q/V2Q/V1Q/V3Q/V6Q/V4Q/V5InterruptedNewChangingSeries of questions/variables

  • Wave 1Wave 2Wave 3Wave 4Wave 5Wave 6Wave 7Wave 8Q/V2Q/V1Q/V3(Q/V6)Q/V4Q/V5InterruptedNewChangingChange documentedSeries of questions/variablesSub-serieSub-serie

  • Wave 1Wave 2Wave 3Wave 4Q/V2Q/V1(Q/V6)Q/V5Simplifying the presentation of the longitudinal studySub-serieSub-serie

  • (Q/V6)Simplifying still moreLet this series of variables be a general representationof a series of datasets over time

  • enlarge it

  • This is actually a compound dataset, a time-compound dataset (higher level dataset)

  • which may be included in the longitudinal study

  • and be presented as a whole with metadata in a hyperlinked formatFour datasets

    One hyperlinked metadata product

  • And we can readily define a cumulated datasetCumulated dataset1. Analysis of variations2. Automatic definition of varswhen vars are identical overall waves, definition of harmo-nized var when necessary3. Copy of harmonized definition to single wavedatasets4. Computing of harmonized variables5. Filling cumulated datasetwith data

  • And we can readily define a cumulated datasetCumulated dataset including some wave-specific Q/V one two or three harmonized variables for rendering the whole serie andthe two sub-series

  • The overall study is here also the reference for the cumulated datasetCumulated datasetjust add a descriptionof the cumulation work

    and integrate into themetadata publication therelevant time-dependentInformation from the singlewaves

  • And we can readily define a cumulated dataset for integrationCumulated datasetNow, combine space and time

  • Imagine the standard definition evolves in time as a longitudinal study

  • Imagine the standard definition evolves in time as a longitudinal studySeries of successive standard definitions

  • Imagine the standard definition evolves in time as a longitudinal studyStandard

  • Now imagine the countries are distributed on the horizontal axisStandardC3C2C1 still symbolizing a country-specific variable in C1,an identical variable in C2 anda variant in C3Country definitions

  • the standard for the second cross-national wave can be definedusing the relationships for the longitudinal studyStandardC3C2C1Country definitions

  • and so on:StandardC3C2C1Country definitions

  • The country datasets appear to grow like the branches of a christmas treeStandardCountry definitionsC3C2C1C4T1T2T3T4

  • The country datasets appear to grow like the branches of a christmas treeStandardCountry definitionsC3C2C1C4T1T2T3T4

  • The country datasets appear to grow like the branches of a christmas treeStandardCountry definitionsC3C2C1C4T1T2T3T4

  • StandardCountry definitionsC3C2C1C4T1T2T3T4We get a space and time hyper-compound dataset (a higher level dataset of higher degree)

  • StandardCountry definitionsC3C2C1C4T1T2T3T4which fairly well corresponds to the overall cross-national programMetadata canbe published inhyperlinked format

  • StandardCountry definitionsC3C2C1C4T1T2T3T4IntegrateIntegrated

  • StandardCountry definitionsC3C2C1C4T1T2T3T4The study description will still work as an overall wrapperbut it must include time-dependent information to account for changes in the programIntegrated

  • C3C2C1C4T1T2T3T4IntegratedLets simplify again

  • IntegratedAnd keep only the integrated datasets:

  • IntegratedWe get a nice time-compound dataset:

  • IntegratedSome of the references over time are inherited from the series of standards:T1T2T3T4

  • IntegratedAdditional references must be built for the harmonized variablesT1T2T3T4

  • IntegratedT1T2T3T4Most of the time, the computation of an integrated dataset for a new wave will take into account the choices made for the precedent ones

  • IntegratedT1T2T3T4The variables in the integrated datasets will be referenced in both the single space-compound datasets and the time-compound of integrated datasets

  • IntegratedThe cumulation of the single integrated datasets is now straightforward:T1T2T3T4Integrated-cumulated

  • IntegratedMetadata publication:T1T2T3T4The ultimate wrapper for the publication of metadataof any kind (integrated-cumulated, time-compoundintegrated, hyper-compound or wave specific space-compound will always be the Study describing thewhole program

  • IntegratedSome countries will care for their datasets over timeT1T2T3T4C3C2C1C4

  • Integrated cumulate themT1T2T3T4C3C2C1C4Cumulated

  • Integrated and care for their own study description:T1T2T3T4C3C2C1C4

  • IntegratedFor sure, that study description will refer to the programT1T2T3T4C3C2C1C4

  • IntegratedEven where no cumulation takes place, country specific study level informationmust be capturedT1T2T3T4C3C2C1C4

  • Integrated for all countries, to be included in all relevant metadata productsT1T2T3T4C3C2C1C4

  • C3C2C1C4T1T2T3T4Countries and waves serve as coordinates to navigate the dataset spaceand select the one to work onCumulatedIntegratedStandardC2/T3Int/CumIntSt/T2Int/T4

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardValues have to be added on the two scales to reach the compound datasetsTimecompoundCompC2/Comp

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardValues have to be added on the two dimensions to reach the compound datasetsTimecompoundSpace-compoundCompCompComp/T3C2/Comp

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardSummarizing the compound datasets:TimecompoundSpace+TimecompoundCompSpace-compoundCompComp/T3C2/CompComp/Comp

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandard And the one time cross-section?!!!CompComp

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandard And the one time cross-section?!!!CompCompThe intuition placesIt hereC2/T3

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandard And the one time cross-section?!!!CompCompor hereC1/T1

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandard And the one time cross-section?!CompCompActually, the one time cross-section is out ofthe dataset space for the repeated cross-national.

    In this space, it is best defined as the Single/Single,the single space / single time dataset, which isin geometrical terms the origin of the datasetspace.

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardNow, we can define in a systematic manner the types of datasets, which aredefined in the dataset space to be used for a repeated cross-national programCompComp

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardStart with the origin:CompComp

    Single timeOne timeCross-sectionSingle space

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardIf we start a cross-national program, we need a standard definitionand several country datasetsCompCompAll those DS are of thesame logical type

    Single timeOne timeCross-sectionStandard +Single Country DSSingle spaceSpace s(sample)s = Stand, 1-n

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardBy building anetwork of references from the country datasets to the standardwe make this be a space-compound dataset, a higher level datasetCompComp

    Single timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSSingle spaceSpace s(sample)s = Stand, 1-nCompound

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardIntegrationCompComp

    Single timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardRepeating the one-time cross-section, we get a series of datasetsfor a longitudinal study:CompComp

    Time tt = 1-mSpecific wave in single space

    Single timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardConstructing the references from posterior waves to anterior waves, we getthe time-compound dataset, a higher-level datasetCompComp

    CompoundTime-compound DSTime tt = 1-mSpecific wave in single space

    Single timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardCumulateCompComp

    CumulatedCumulated DSCompoundTime-compound DSTime tt = 1-mSpecific wave in single space

    Single timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardYou can also repeat a cross-national study program, so the space-specific DSsmust also be defined in timeCompComp

    CumulatedCumulated DSCompoundTime-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardWave after wave, a time-specific compound dataset is made from the single-spacedatasets by building the network of references to the standardCompComp

    CumulatedCumulated DSCompoundTime-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandard... and integrate wave after wave the compound dataset into a time-specificintegrated dataset CompComp

    CumulatedCumulated DSCompoundTime-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardWaves following the one another, the standard will grow as a space-specifictime-compound dataset, and some country datasets as wellCompComp

    CumulatedCumulated DSCompoundTime-compound DSSpace-specific time-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardThe resulting christmas tree can be seen as a hyper-compound dataset, since It is composed on two successive logical levelsCompComp

    CumulatedCumulated DSCompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandard but you would probably not integrate the hyper-compound datasetCompComp

    CumulatedCumulated DSCompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DS?Time tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardInstead, you will cumulate the time-specific datasets:CompComp

    CumulatedCumulated DSCumulated integrated DS

    CompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardSome countries will cumulate the datasets they cared for as local longitudinalstudiesCompComp

    CumulatedCumulated DSSpace-specific cumulated DSCumulated integrated DS

    CompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • C3C2C1C4T1T2T3T4CumulatedIntegratedStandardbut there will probably be no attempt at composing them nor at integratingIntegrating them.CompComp

    CumulatedCumulated DSSpace-specific cumulated DSCumulated integrated DS

    CompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • So we end up with the following typology of datasets:

    CumulatedCumulated DSSpace-specific cumulated DSCumulated integrated DS

    CompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • So we end up with the following typology of datasets:

    CumulatedCumulated DSSpace-specific cumulated DSCumulated integrated DS

    CompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • So we end up with the following typology of datasets:This one is rather atheoretical construct

    CumulatedCumulated DSSpace-specific cumulated DSCumulated integrated DS

    CompoundTime-compound DSSpace-specific time-compound DSSpace+Time hyper-compound DSTime tt = 1-mSpecific wave in single spaceSpace and time-specific DSs in RCSTime-specific space-compound DSTime-specific integrated DSSingle timeOne timeCross-sectionStandard +Single Country DSSpace-compound DSIntegrated DSSingle spaceSpace s(sample)s = Stand, 1-nCompoundIntegrated

  • Based on this typology, we can define a system of coordinates for the identificationof the dataset of reference for work or publicationSpaceTimeSingle

    GermanyFranceItalyUKetc.

    CompoundIntegratedSingle

    2001200320052007etc.

    CompoundCumulated

  • with standard values and program specific valuesSpaceTimeSingle

    GermanyFranceItalyUKetc.

    CompoundIntegratedSingle

    2001200320052007etc.

    CompoundCumulated(Standard)

    (Variable)(Standard)

  • and some forbidden combinationsSpaceTimeSingle

    GermanyFranceItalyUKetc.

    CompoundIntegratedSingle

    2001200320052007etc.

    CompoundCumulated(Standard)

    (Variable)(Standard)

  • Lets turn back to a previoius view to screen the involved studiesC3C2C1C4T1T2T3T4IntegratedIntegrated-cumulated

  • Only two types of studies are needed:

    the coordination study andthe country study

    Provided that there is a possibility for describing different activitieswithin the same study coordination implementation integration (space) cumulation (time) (and a few others)together with the respective authorityInformation, notes and citation statements

    the same information structure can beused for both types of studiesLets turn back to a previoius view to screen the involved studies

  • Thats it, as far as cross-nationalStudies by designAre concerned

  • Nowjust pick two variables

    (the harmonization study)

  • and compare the respective metadata hierarchies:Study (concept)Dataset (sample)Variable (definition)

    Variable aStudy (concept)Dataset (sample)Variable (definition)

    Variable bIf the metadata are close enough on all levels, compare the data.If not, take hands off.COMPARE

  • Choosing the two variables in your favorite metadata handling application,you should be offered the information elements to compareStudy (concept)Dataset (sample)Variable (definition)

    Variable aStudy (concept)Dataset (sample)Variable (definition)

    Variable bThis is not a matter of metadata structure. At this stage, it is a matter ofthe applicationCOMPARE

  • But you may wish to store the comparison as a re-usable relationshipin the metadata and even to compute a harmonized variableStudy (concept)Dataset (sample)Variable (definition)

    Variable aStudy (concept)Dataset (sample)Variable (definition)

    Variable bThen you will create a study for its own, describe in it your comparisonproject and let the variables refer to the variables in the original datasets.COMPARE

  • You should also be able to choose studies, which you know to be kinin methods and contents, create a virtual compound dataset, andcheck the degree of comparability on all levelsStudy (concept)Dataset (sample)Variable (definition)

    Variable aStudy (concept)Dataset (sample)Variable (definition)

    Variable bbefore defining the sets of variables, which can be harmonized overspace or over timeCOMPARE

  • The end

  • Issues

  • Abstract conceptual modelReference case:All operations handled with one instance of the application under one single authorityReal life cases:Coordination group and local implementersChanges in organization over timeExtension: multiple authorsworking on a single systemwording on communicating systems

  • LevelsSome of the mechanisms have been tested in a SIDOS owned prototype (series of questions and variables).Studies and datasets of various types, depending on their level in the structure (higher level objects composed of lower level objects)

  • Reference or inheritance?DDI V 3.0 introduces a grouping structureAssociation?Composition?Inheritance (with local override)? (o-o thinking)Relational data modelReferencesCombination of both ways of thinking? More detailed analysis necessary

  • To be continued