Dealing with variables: Resources and topics in enhancing secondary survey data

4th ESRC Research Methods Festival, St Catherine's College, Oxford, 5-8 July 2010. Dealing with variables: Resources and topics in enhancing secondary survey data. Paul Lambert, University of Stirling, DAMES research Node


<ul><li><p>Dealing with variables: Resources and topics in enhancing secondary survey data. Paul Lambert, University of Stirling, DAMES research Node, of session 17, Resources (i): Resources for data management, 6/JUL/2010. 4th ESRC Research Methods Festival, St Catherine's College, Oxford, 5-8 July 2010.</p></li><li><p>Dealing with variables: Resources and topics in enhancing secondary survey data</p><p>Rigorous and vigorous approaches to dealing with variables</p><p>Three specialist topics: The GESDE services for data on occupations, ethnicity and educational qualifications</p></li><li><p>Survey research and variable analysis</p></li><li><p>Data management applied to variables refers to the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis [DAMES Node..]</p><p>Usually performed by social scientists themselves. Pre-analysis tasks (though often revised/updated). Inputs also from data providers. Usually a substantial component of the work process. But may not be explicitly rewarded (sometimes even penalised..)</p><p>A little different from archiving / controlling data itself</p></li><li><p>Some components in secondary survey research. Manipulating data: recoding categories / operationalising variables. Linking data: linking related data (e.g. longitudinal studies); combining / enhancing data (e.g. 
linking micro- and macro-data). Secure access to data: linking data with different levels of access permission; full or restricted access to detailed micro-data. Harmonisation standards: approaches to linking concepts and measures (indicators); recommendations on particular variable constructions. Cleaning data: missing values; implausible responses; extreme values</p></li><li><p>Example: recoding data [use a recode or file matching routine]</p></li><li><p>The centrality of keeping clear records of DM activities. Reproducible (for self). Replicable (for all). Paper trail for whole lifecycle. Cf. Dale 2006; Freese 2007</p><p>In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)</p><p>Syntax Examples:</p></li><li><p>Some provocative examples for the UK. Social mobility is increasing, not decreasing!! Popularity of controversial findings associated with Blanden et al (2004). Contradicted by wider ranging datasets and/or better measures of stratification position. DM: researchers ought to be able to more easily access wider data and better variables</p><p>Degrees, MScs and PhDs are getting easier {or at least, more people are getting such qualifications}. Correlates with measures of education are changing over time. DM: facility in identifying qualification categories &amp; standardising their relative value within age/cohort/gender distributions isn't, but should, and could, be widespread</p><p>Black-Caribbeans are not disappearing. As the 1948-70 immigrant cohort ages, the Black-Caribbean group is decreasingly prominent due to return migration and social integration of immigrant descendants. Data collectors under pressure to measure large groups only. DM: It ought to be possible to harmonise measures of ethnicity over time, and to build richer data resources with more cases (e.g. by merging survey data)</p><p>People interpreted the RAE wrongly! 
Most responses to the RAE 2008 involved comparing GPA scores between subject areas within and/or across institutions; but standardising relative to subject area distribution, or scaling by subject area, often gives very different results. DM: see Lambert and Gayle (2008) for a demo of alternative uses of RAE data</p></li><li><p>What might a rigorous and vigorous variable analysis look like? To debate, but I'd nominate: replicability; features a pro-active review of variables; review a full set of alternative measures; review alternative functional forms; attention to distribution/standardisation; attention to harmonisation</p></li><li><p>How should I make my work replicable? The concept of a workflow is a useful device for documenting a survey research project</p><p>Workflows involve organising materials as a series of interrelated but distinctive components. In survey research, software syntax files make excellent templates for documenting our work in component elements [Long, 2009; Treiman, 2009; Altman &amp; Franklin, 2010; Kulas, 2008]. Computer science researchers have developed workflow depositories [e.g. MyExperiment] and workflow capture tools [e.g. Taverna]</p></li><li><p>Ad hoc organisation of a workflow as a master file in Stata. Forthcoming workshop: Documentation and workflows for social survey research, University of Stirling, 1-2 September 2010, see</p></li><li><p>A workflow summary in Excel (following Long, 2009)</p></li><li><p>How should I review variables/functional forms/distributions/harmonisations? We tend to rely on personal expertise in particular subject domains: expertise of the depositor of the data; expertise of the analyst. Some textbooks and other capacity building events cover these topics generically [e.g. 
Treiman 2009], but by and large they get unduly neglected in methodological training</p><p>Something called e-Science can help with both variable reviews and replication</p></li><li><p>The e-Social Science endeavour</p><p>see for up-to-date links. A number of UK projects seeking to improve social science research by capitalising on emerging computer science techniques. Handling distributed data; collaborative technologies; large and complex data; secure data</p><p>The Grid embodies these technologies, but more generic terms like e-Social Science &amp; Digital Social Research are increasingly preferred. GESDE: Grid Enabled Specialist Data Environments</p></li><li><p>e-Social Science, BSA2009. Example: Understanding New Forms of Digital Records (DReSS): transcribed talk; audio; video; digital records; system logs; location</p><p>transcript; code tree; video; system log</p></li><li><p>This session part-organised by the Data Management through e-Social Science node, DAMES</p><p>ESRC Node funded 2008-2011</p><p>Aim: Useful social science provisions by exploiting tools for data management developed in computer science. Core components are: data curation tool; data fusion tool; portals for access to data and data resources</p></li><li><p>Data curation tool collects metadata and allows data resources of different formats to be organised in an accessible depository</p></li><li><p>Data fusion tool supports merging of data files through shared variables (e.g. 
for recodes, aggregations, pooling data, linking related data, probabilistic linkages)</p><p>External user file (micro-social data), columns id, oug, sex: (1, 110, 1); (2, 320, 1); (3, 320, 2); (4, 874, 1); (5, 874, 2). Occ info (index file) (aggregate), columns oug, CS-M, CS-F, EGP: (110, 60, 58, I); (320, 69, 71, II); (874, 39, 51, VIIa). User's output (micro-social data), columns id, oug, CS: (1, 110, 60); (2, 320, 69); (3, 320, 71); (4, 874, 39); (5, 874, 51).</p></li><li><p>GEMDE: Example of a portal for distributing and accessing supplementary data related to ethnicity</p></li><li><p>2) Special Topics: The GESDE services for sociological classifications. Key variables in social science research are not just for sociology, but are much debated there. Complex categorical measures and variable operationalisation recommendations/debates. Individual level measures of social positioning</p><p>GESDE = 3 related online services which are Grid Enabled Specialist Data Environments. GEODE: the 'o' is for data on Occupations. GEEDE: the 'e' is for data on Educational qualifications. GEMDE: the 'm' is for data on ethnic Minorities</p></li><li><p>Our contribution in GESDE.. Many existing resources on these topics [See app.]: academic reviews and projects [e.g. Rose &amp; Harrison 2010; Ganzeboom, 2008; Schneider, 2008; Guveli, 2006]; service providers [e.g. ESDS variable guides; CESSDA-PPP]; National Statistics Institutes' guidelines [e.g.]</p><p>It'd be good if more people were engaging with and exploiting these resources to enhance their own data..!</p></li><li><p>At the centre of this are problems of standardizing categorical data</p><p>Measurement equivalence (e.g. 
van Deth, 2003) is often not feasible for complex categorical measures. For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence (because of non-linear relations between categories and shifting underlying distributions) (even if measurement equivalence seems possible)</p><p>Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context. For categorical data, this can be achieved/approximated by scaling categories in one or more dimensions of difference</p></li><li><p>Effect proportional scaling using parents' occupational advantage</p></li><li><p>What was that then? We can represent categories through positions on a scale. In turn, we can use position in the dimension as a category score which then plugs into a further analysis (e.g. regression main and interaction effects)</p><p>..E.g. some options for data on ethnicity.. Stereotyped Ordered Logistic Regression (SOR) models, which summarize dimensions of difference according to regression predictor values [e.g. Lambert and Penn, 2001]. Geometric data analysis for distances between people, or things [cf. Prandy, 1979; Bennett et al., 2009]. Assign category scores by hand (a priori or by selected average)</p></li><li><p>2(a) Data on occupations. Occupational unit groups = standardised lists of occupational titles. E.g. 
via CASCOT,</p></li><li><p>..on occupations.. find ways of attaching summary information about occupations to occupational unit groups</p></li><li><p>Comparability problems =&gt; value of documenting methods &amp; comparing alternatives</p></li><li><p>GEODE: Our contribution. GEODE acts as a library style service for access to occupational information resources. We encourage people to supply data they've produced, and we upload data ourselves. Researchers are encouraged to use the portal to find and exploit suitable data. Services: search, browse, deposit data, link data, user ratings</p></li><li><p>GEODE (v1) Occupational data</p></li><li><p>Survey Network 4 June 2009. Using occupational data: Example as a measure of marked social disadvantage, Lambert &amp; Gayle (2009)</p></li><li><p>[Example: Occupational not geographical inequality]</p></li><li><p>2(b) Data on educational qualifications. Similar issues arise with the use of educational data. Specialist resources exist which can enhance measures of educational data. Many users aren't aware of alternative coding schemes or harmonised approaches</p><p>GEEDE acts as a service for bringing together and disseminating relevant data resources on educational measures</p></li><li><p>Example: recoding data</p></li><li><p>Family and Working Lives Survey (54 vars per educ record)</p></li><li><p>2(c) Data on ethnicity. We can conceive of similar information resources and data analysis requirements for measures of ethnicity. There are generally fewer published resources / agreed standards in this domain</p><p>GEMDE publishes resources but puts more emphasis on understanding complex ethnicity data</p></li><li><p>Working with ethnicity data in surveys is hard! - It's sparse - It's collinear (e.g. to age, location) - It's dynamic (cf. 
comparative research)</p></li><li><p>EFFNATIS sample (1999): Subjective ethnic identity [Heckmann et al., 2001]</p></li><li><p>A data management contribution. Preserve information on what was done with categorical data. Communicate information on what should/could be done</p></li><li><p>GEMDE seeks to promote replicability / transparency. Document your own recodes. Access somebody else's recodes. Identify commonly used recodes (&amp; use them..!)</p></li><li><p>..and making complex analysis of ethnicity data easier.. Organising complex categorical data. Labelling, recoding, etc. Effect proportional scaling. Standardisation. Interaction terms</p></li><li><p>The GEODE model for GEMDE? A service for MUGs and MIRs</p><p>Define/register Minority Unit Groups</p><p>Define/register Minority Information Resources</p><p>Explore data resources and obtain help in approaching analysis of complex, sparse data</p></li><li><p>What's a MIR? 'Minority Information Resource'. This is our own terminology. By a MIR, we mean any piece of information which supplies systematic data on a minority unit group (MUG) classification. We've used this term to be deliberately similar to the phrase 'Occupational Information Resources' that we used on GEODE. E.g. summary statistical data about the categories from and documentation or information. E.g. recodings which have been used in a particular study. Social scientists are not in general aware of the existence of MIRs (cf. wide use of popular Occupational Information Resources). In GEMDE we seek to publicise little-known resources and promote their uptake: we argue that better communication and dissemination of MIRs is in fact an important step towards better scientific practice of replication and standardisation of research. In our terms, every MIR necessarily links to a MUG (but not every MUG has a MIR). 
</p></li><li><p>The GEMDE portal. Liferay portal with access to MUGs and MIRs, first release Jan 2010, now available for general use (</p><p>Shibboleth access for registered users. Guest level access. Deposit MUGs/MIRs. Search/browse deposited resources</p><p>Feedback on resources (user ratings). Review live data (e.g. pooled LFS records). Expert and user quality ratings</p></li><li><p>Screenshot here!</p></li><li><p>Summary: Remind me how these topics enhance survey data..? Variable operationalisations can ordinarily be improved by more rigour and vigour: more transparent operationalisation/documentation; better use of detailed data; better ability to include measures in suitably complex models/analysis</p><p>The GESDE approach has been to seek technological solutions to the organisation and distribution of complex variable-related information</p></li><li><p>Data used. Department for Education and Employment. (1997). Family and Working Lives Survey, 1994-1995 [computer file]. Colchester, Essex: UK Data Archive [distributor], SN: 3704. Heckmann, F., Penn, R. D., &amp; Schnapper, D. (Eds.). (2001). Effectiveness of National Integration Strategies Towards Second Generation Migrant Youth in a Comparative Perspective - EFFNATIS. Bamberg: European Forum for Migration Studies, University of Bamberg. Li, Y., &amp; Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666. Office for National Statistics. Social and Vital Statistics Division and Northern Ireland Statistics and Research Agency. Central Survey Unit, Quarterly Labour Force Survey, January - March, 2008 [computer file]. 4th Edition. Colchester, Essex: UK Data Archive [distributor], March 2010. SN: 5851. University of Essex, &amp; Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, 1991-2008 [computer file], 5th Edition. 
Colchester, Essex: UK Data Archive [distributor], March 2009, SN 5151.</p></li><li><p>References. Altman, M., &amp; Franklin, C. H. (2010). Managing Social Science Research Data. London: Chapman and Hall. Bennett, T., Savage, M., Silva, E. B., Warde, A., Gayo-Cal, M., Wright, D., et al. (2009). Culture, Class, Distinction. London: Routledge. Blanden, J., Goodman, A., Gregg, P., &amp; Machin, S. (2004). Changes in generational mobil...</p></li></ul>
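
The data fusion slide describes attaching occupational information to a user's micro-data through a shared occupational unit group variable (oug). A minimal sketch of that file-matching step is given below in plain Python, using the figures shown on the slide. Several things here are assumptions rather than part of the DAMES tools: the function name `attach_scores`, the lower-case key names, the reading of CS-M and CS-F as sex-specific scale scores, and the sex coding (1 = male, 2 = female), which is inferred from the slide's output column.

```python
# Illustrative file-matching sketch, not the DAMES data fusion tool itself.
# Values are those shown on the slide; sex coding (1 = male, 2 = female)
# is an assumption inferred from the slide's output column.

# User's micro-data: one record per respondent.
micro = [
    {"id": 1, "oug": 110, "sex": 1},
    {"id": 2, "oug": 320, "sex": 1},
    {"id": 3, "oug": 320, "sex": 2},
    {"id": 4, "oug": 874, "sex": 1},
    {"id": 5, "oug": 874, "sex": 2},
]

# Occupational index file: one record per occupational unit group (oug).
occ_index = {
    110: {"cs_m": 60, "cs_f": 58, "egp": "I"},
    320: {"cs_m": 69, "cs_f": 71, "egp": "II"},
    874: {"cs_m": 39, "cs_f": 51, "egp": "VIIa"},
}


def attach_scores(records, index):
    """Return copies of the records with a sex-specific score 'cs' attached.

    Records whose oug is missing from the index get cs = None, so that
    unmatched cases stay visible instead of being silently dropped.
    """
    linked = []
    for rec in records:
        info = index.get(rec["oug"])  # None when the oug is not indexed
        out = dict(rec)               # copy, so the input is left unchanged
        if info is None:
            out["cs"] = None
        else:
            out["cs"] = info["cs_m"] if rec["sex"] == 1 else info["cs_f"]
        linked.append(out)
    return linked


linked = attach_scores(micro, occ_index)
for row in linked:
    print(row["id"], row["oug"], row["cs"])
```

Keeping unmatched records with an explicit missing value is a design choice in the spirit of the paper trail argued for earlier: the analyst can see how many cases failed to link before deciding what to do with them.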

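
The slides on education and on arithmetic standardisation suggest expressing a category score relative to the distribution of the respondent's own age/cohort/gender group. One common way to do this is a within-group z-score, sketched below. This is only one possible form of standardisation, and the function name `standardise_within`, the cohort labels and the education scores are all hypothetical illustrations, not data from the presentation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_within(records, group_key, value_key):
    """Return z-scores of value_key computed separately within each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])

    # Mean and (population) standard deviation for each group.
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}

    z_scores = []
    for rec in records:
        m, sd = stats[rec[group_key]]
        # A group with no variation gets 0.0 rather than a division error.
        z_scores.append((rec[value_key] - m) / sd if sd > 0 else 0.0)
    return z_scores


# Hypothetical data: the same education score in two birth cohorts.
people = [
    {"cohort": "1950s", "educ": 1},
    {"cohort": "1950s", "educ": 2},
    {"cohort": "1950s", "educ": 3},
    {"cohort": "1980s", "educ": 3},
    {"cohort": "1980s", "educ": 4},
    {"cohort": "1980s", "educ": 5},
]

z_scores = standardise_within(people, "cohort", "educ")
for person, z in zip(people, z_scores):
    print(person["cohort"], person["educ"], round(z, 2))
```

In this made-up example a score of 3 sits at the top of the earlier cohort but at the bottom of the later one, which is the kind of shift in relative value that the slides argue a raw category code hides.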
