"Handling social science data: Challenges and responses", Paul Lambert, 17th March 2010
Handling social science data: Challenges and responses
Paul Lambert, University of Stirling
DAMES research Node, www.dames.org.uk
DIR workshop: Handling Social Science Data, 17/MAR/2010
What is social science data?
Example: Accessing surveys via UK Data Archive
Shibboleth authentication
Download and analyse in Stata, SPSS, etc
Principal forms of data…
• ‘Large and complex social surveys’: longitudinal; cross-national; hierarchical
• Small scale social surveys
• Administrative data (e.g. ADMIN node; ADLS; commercial data)
• Supplementary (digital) data, e.g. ‘GESDE’ services at DAMES
• Qualitative material – audio / video / textual
Large and complex social surveys
• several thousand variables
• tens of thousands of cases (micro-data)
• additional complex survey data features (e.g. household clustering)
Complex data example: British Household Panel Survey dataset [SN 5151]
    . tab year

        year      Freq.   Percent      Cum.
        1991     10,264      4.57      4.57
        1992      9,845      4.38      8.95
        1993      9,600      4.27     13.23
        1994      9,481      4.22     17.45
        1995      9,249      4.12     21.56
        1996      9,438      4.20     25.77
        1997     11,193      4.98     30.75
        1998     10,906      4.86     35.60
        1999     15,623      6.96     42.56
        2000     15,603      6.95     49.51
        2001     18,867      8.40     57.91
        2002     16,597      7.39     65.29
        2003     16,238      7.23     72.52
        2004     15,791      7.03     79.55
        2005     15,627      6.96     86.51
        2006     15,392      6.85     93.36
        2007     14,910      6.64    100.00
       Total    224,624    100.00

    . xtdes, i(pid) t(year)

         pid:  10002251, 10004491, ..., 1.794e+08      n = 31877
        year:  1991, 1992, ..., 2007                   T = 17
               Delta(year) = 1 unit
               Span(year)  = 17 periods
               (pid*year uniquely identifies each observation)

    Distribution of T_i:   min    5%   25%   50%   75%   95%   max
                             1     1     2     6     9    17    17

       Freq.   Percent      Cum.   Pattern
        4294     13.47     13.47   11111111111111111
        2726      8.55     22.02   ........111111111
        2032      6.37     28.40   ..........1111111
        1224      3.84     32.24   ......11111......
         964      3.02     35.26   1................
         840      2.64     37.90   ..........1......
         632      1.98     39.88   ........1........
         631      1.98     41.86   ................1
         593      1.86     43.72   11...............
       17941     56.28    100.00   (other patterns)
       31877    100.00             XXXXXXXXXXXXXXXXX
• This example shows BHPS being analysed in Stata. BHPS re-contacts subjects annually (since 1991).
• 4294 interviewed as adults every year for 17 years.
• Analysis methods, and measurement issues over time, are challenging.
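The participation patterns reported by xtdes can be illustrated with a minimal sketch. It is written in Python rather than Stata, uses a small invented set of (person, year) records rather than BHPS data, and the function name `participation_patterns` is hypothetical. Each person gets a string with "1" for a year in which they were interviewed and "." otherwise, and identical strings are then counted.

```python
from collections import Counter, defaultdict

# Hypothetical toy panel: (person id, interview year) pairs. Not BHPS data.
records = [
    (1, 1991), (1, 1992), (1, 1993),
    (2, 1991), (2, 1993),
    (3, 1992), (3, 1993),
    (4, 1991), (4, 1992), (4, 1993),
]

def participation_patterns(records, years):
    """Count people by presence pattern ('1' = interviewed, '.' = absent)."""
    seen = defaultdict(set)
    for pid, year in records:
        seen[pid].add(year)
    return Counter(
        "".join("1" if y in waves else "." for y in years)
        for waves in seen.values()
    )

years = range(1991, 1994)
patterns = participation_patterns(records, years)

# List patterns from most to least frequent, as xtdes does.
for pattern, freq in patterns.most_common():
    print(f"{freq:5d}  {pattern}")
```

In this toy panel two of the four people are present in every year, which corresponds to the row of all 1s in the xtdes output.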
Supplementary (digital) data
• E.g. ‘Occupational information resources’ (OIRs) = data files with information on occupations, which can be usefully linked to micro-data about occupations
e.g. GEODE acts as a library of OIRs, www.geode.stir.ac.uk
Such resources are often not widely known about, but have the ability to enhance analysis
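As a rough illustration of what linking an occupational information resource to micro-data involves, the sketch below merges a lookup table onto respondent records by occupation code. It is Python with invented data. The codes, scores, variable names and the function `link_occupations` are all hypothetical and do not come from GEODE or any real classification.

```python
# Hypothetical occupational information resource: occupation code -> score.
oir = {"1110": 72.5, "2310": 65.0, "5220": 38.2}

# Hypothetical micro-data: one record per survey respondent.
respondents = [
    {"pid": 1, "occ_code": "2310"},
    {"pid": 2, "occ_code": "5220"},
    {"pid": 3, "occ_code": "9999"},  # code absent from the resource
]

def link_occupations(respondents, oir, new_var="occ_score"):
    """Return copies of the records with the linked score added.

    Unmatched codes get None so they can be counted, not silently dropped.
    """
    return [{**r, new_var: oir.get(r["occ_code"])} for r in respondents]

linked = link_occupations(respondents, oir)
unmatched = [r["pid"] for r in linked if r["occ_score"] is None]

print(linked)
print("unmatched pids:", unmatched)
```

Keeping unmatched cases visible is a deliberate choice here: in practice, the share of occupation codes that fail to link is itself worth documenting.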
Example: Qualitative data used by ‘Digital Records for e-Social Science’ (DReSS)
• transcribed talk
• audio / video
• digital records
• system logs
• location
[Figure panels: transcript; code tree; video; system log]
Three well-known challenges
• We’re data rich, but analyst poor
    • UK Data Forum (2007); Wiles et al (2009)
    • Under-use of suitably complex statistical models
• Coordination and communication on data processing
    • Recodes / standardisation / harmonisation / documentation
    • Not rewarded/incentivised for researchers
• Lack of generic/accessible representation of tasks
    • Limited disciplinary/project/researcher cross-over when dealing with data
    • Specific software orientations
These are not generally problems of scale, but of organisation
‘Managed’ responses?
• Data handling/analysis capacity-building
    ESRC programmes (NCRM, RDI, RMP); training workshops/materials; P/G funds; strategic research grant investment
• Documentation/replication policies
    Dale (2006)
• Software for data access and analysis
    NESSTAR – UK Data Archive data/metadata browser
    Long (2009) on the Stata software
    Remote access to data (e.g. SDS)
..train and/or constrain the analysts..
Train them ->
..constrain the analysis..
Non-hierarchical responses?
Technological collaborative services might support effective, unmanaged data access, coordination and exploitation (in principle)
UK e-Social Science investment in data oriented social science research support: NeISS; E-Stat; DAMES; Obesity e-Lab; CQeSS
..some examples..
E-Stat @
National e-Infrastructure for Social Simulation
• Expert led simulation demonstrations
• Combining data resources
• Workflows for the simulation analysis
    Modify and re-specify existing simulation templates
Design a tool to specify complex statistical models in generic / visual terms
Multilevel models
Multiple data permutations and analytical alternatives
Ready access to a suite of complex modelling tools
DAMES – online services for data coordination/organisation
Tools for handling variables in social science data
Recoding measures; standardisation / harmonisation; Linking; Curating
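To make "recoding" and "harmonisation" concrete, here is a minimal Python sketch, not a description of the DAMES tools themselves. It recodes two invented, source-specific education codings to one common three-category scheme and prints a plain-text record of every recode. The survey names, codes, categories and function names are all hypothetical.

```python
# Hypothetical source-specific education codings mapped to a common scheme.
HARMONISE = {
    "survey_a": {1: "low", 2: "low", 3: "medium", 4: "high"},
    "survey_b": {"none": "low", "school": "medium", "degree": "high"},
}

def harmonise(value, source, mappings=HARMONISE):
    """Recode a source-specific value to the common scheme; None if unmapped."""
    return mappings[source].get(value)

def describe(mappings=HARMONISE):
    """Return one documentation line per recode, grouped by source."""
    lines = []
    for source, mapping in sorted(mappings.items()):
        for old, new in mapping.items():
            lines.append(f"{source}: {old!r} -> {new!r}")
    return lines

print(harmonise(3, "survey_a"))
print(harmonise("degree", "survey_b"))
print("\n".join(describe()))
```

Holding the recodes as data rather than as scattered if-statements means the same table drives both the recoding and its documentation, which is one simple way to support replication.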
GESDE – Search and browse supplementary data on occupations; educational qualifications; ethnicity
• Data curation tool (for collecting metadata)
Handling data: analysis-oriented data management priorities
• {Data collection or creation}
• Data preservation or curation
• Data enhancement/modification
• Data analysis
• Multiple permutations of related analyses
• Documentation and replication
Ideas on the future of social science research data
• Enduring challenges of documentation for replication, and coordination
• More and more comparative analysis
• Harmonisation and standardisation
• Data linkage and data enhancement
• Models for complex multiprocess systems
• Fluency – increasing uptake by more users
References and Links
• ADLS: http://www.adls.ac.uk/
• ADMIN Node: http://www.ncrm.ac.uk/about/organisation/Nodes/ADMIN/
• DAMES Node: http://www.dames.org.uk/
• DReSS: http://web.mac.com/andy.crabtree/NCeSS_Digital_Records_Node/
• Secure Data Service: http://securedata.ukda.ac.uk/
• UK Data Archive: http://www.data-archive.ac.uk/
• Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
• Wiles, R., Bardsley, N., & Powell, J. L. (2009). Consultation on research needs in research methods in the UK social sciences. Southampton: University of Southampton / ESRC National Centre for Research Methods. http://eprints.ncrm.ac.uk/810/