Standardisation, Harmonisation and Measurement
Paul Lambert, 24-25 August 2009
Talk to the ‘Data Management for Social Survey Research’ training workshop, part of the Data Management through e-Social Science
research Node of the National Centre for e-Social Science www.dames.org.uk / www.ncess.ac.uk
2
Standardisation, Harmonisation and Measurement
1) The idea of measurement
2) Data management and categorical data
3) Standardizing categorical data
4) Supporting the standardization of categorical data
Adapted from: Lambert, P. S., Gayle, V., Bowes, A. M., Blum, J. M., Jones, S. B.,
Sinnott, R. O., et al. (2009). Standards setting when standardizing categorical data. Cologne, 24-26 June 2009: Paper presented to the Fifth International Conference on Social Science Methodology, organised by GESIS and the National Centre for e-Social Science. Available through http://www.dames.org.uk/publications.html
3
Ideas about measurement
Survey analysis involves scanning across cases for relations between variables
Identification of variable effects relies on structured empirical differences between cases
It doesn’t follow that how a measure was defined corresponds to that empirical identification
o Example: Age and educational qualifications
It is desirable to keep an open mind over the interpretation of an empirical pattern (explore more options and test more variations)
4
[Figure: (2.6) Associations - Employment Relations and Conditions. Two panels, "Men and Women (categorical social classifications)" and "Men and Women (metric social classifications)", for Britain and Sweden. Y-axis: R or pseudo-R (0 to 1); x-axis: occupation-based classification schemes. Series: Promotion / retention; Pay - bonus / increments; Hours and level of monitoring; Labour contract type; Subjective skill requirements.]
5
Example from occupational research
Broad concordance of schemes
• Measures mostly measure the same thing
Generalised concepts are better
• Criterion validity is asymmetric [cf. Tahlin 2007]
Lambert, P. S., & Bihagen, E. (2007). Concepts and Measures: Empirical evidence on the interpretation of ESeC and other occupation-based social classifications. Paper presented at the International Sociological Association, Research Committee 28 on Social Stratification and Mobility, Montreal (14-17 August).
6
2) Data management and categorical data
‘categorical data’ = values in a quantitative dataset where the numeric data represents membership of groups (categories) but has no direct arithmetic meaning
A ‘qualitative’ type of quantitative data [‘metric’ = data is arithmetic]
Ordinal/nominal forms, & statistics [Stevens, 1946; Agresti, 2002]

Value                    1      2        3      -9
Interest in this talk    High   Medium   Low    N.A.   (ordinal)
Country of residence     UK     Germany  Other  N.A.   (nominal)
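A minimal sketch of what the coding scheme above implies in practice. The value labels and the -9 missing code are taken from the slide; the six responses are hypothetical and purely illustrative. It shows that frequencies and an ordinal median are meaningful summaries, whereas a mean of the raw codes is not.

```python
from collections import Counter

MISSING = -9  # code the slide uses for "N.A."

# Value labels for the two example variables on the slide.
interest_labels = {1: "High", 2: "Medium", 3: "Low", MISSING: "N.A."}   # ordinal
country_labels = {1: "UK", 2: "Germany", 3: "Other", MISSING: "N.A."}   # nominal

# Hypothetical responses from six people (illustrative values only).
interest = [1, 1, 2, 3, MISSING, 2]
country = [1, 2, 2, 3, 1, MISSING]


def frequencies(codes, labels):
    """Count cases per category, reporting labels rather than raw codes."""
    counts = Counter(codes)
    return {labels[code]: counts.get(code, 0) for code in labels}


def ordinal_median(codes):
    """Lower median of an ordinal variable, ignoring the missing code.

    Only the ordering of the codes is used, never their differences.
    """
    valid = sorted(c for c in codes if c != MISSING)
    return valid[(len(valid) - 1) // 2]


# Treating the codes (including -9) as arithmetic gives a number with no meaning.
naive_mean = sum(interest) / len(interest)

print(frequencies(country, country_labels))
print(interest_labels[ordinal_median(interest)])
print(naive_mean)
```

For the nominal variable only the frequencies are interpretable; for the ordinal variable the median is as well, because it relies on ordering alone.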
7
Categorical data is important..
Principal social survey datum
o Basis of most social research reports/analyses/comparisons
It’s rich and complex
o We’re often interested in very fine levels of detail / difference
o We usually recode categories in some way for analysis
…how categorical data is managed is of great consequence to the results of analysis…
Choices about recoding, boundaries, contrasts made [e.g. RAE analysis: Lambert & Gayle 2009]
8
UK EFFNATIS survey (1999)
[Heckmann et al 2001]
music1b                   1. Autoch  2. Pakist  3. Indian  4. Bangla      Total
1. Pop                          107         32         20          4        163
2. Chart                         27          2          0          0         29
3. Dance                         94          5          9          1        109
4. Soul                           8         31         10          3         52
5. R'n'B                         19         16          5          5         45
6. Hip Hop                        5          6          2          0         13
7. Asian Music                    0         11         12          2         25
8. Trance                         0          0          1          0          1
9. House                         18          0          1          0         19
10. Techno                        1          0          0          0          1
11. Garage                        0          2          1          0          3
12. Hardcore                      3          0          0          0          3
13. Acid Jazz                     1          0          0          0          1
14. Rap                           4         11          2          1         18
15. Indie / Britpop              46          4          3          1         54
16. Rock / Heavy Meta            32          2          2          0         36
17. Motown                        1          0          0          0          1
18. Folk                          1          0          0          0          1
20. Jazz                          2          1          1          0          4
21. Classical                     4          2          1          1          8
22. Sixties                       4          0          0          0          4
23. Seventies                     3          0          0          0          3
24. Eighties                      2          0          0          0          2
26. Grunge                        1          0          0          0          1
27. Easy Listening                3          0          0          0          3
28. Reggae                        0          3          2          1          6
30. Country                       2          0          0          0          2
31. Punk                          3          0          0          0          3
35. Indian                        0         11          3          1         15
36. Bangra                        0          5          2          0          7
37. Jungle                        1          0          0          0          1
38. Asian Pop Music               0          2          0          0          2
40. All                          19          5          2          0         26
41. Drum 'n' Bass                 3          0          0          0          3
43. Alternative                   2          0          0          0          2
44. Soft Rock                     2          1          0          0          3
45. Rock 'n' Roll                 1          0          0          0          1
46. Love songs / ball             0          2          0          0          2
47. Blues                         1          0          0          0          1
48. Swing                         0          1          0          0          1
50. Break Beat                    1          1          0          0          2
51. Goth                          1          0          0          0          1
60. Dance Floor                   2          0          0          0          2
63. English                       0          2          0          0          2
64. Classical Hindi               0          1          1          0          2
65. Hindi                         0          1          2          0          3
68. Urdu                          0          2          0          0          2
72. Lo-fi                         1          0          0          0          1
75. Ambient                       1          0          0          0          1
77. Bollywood                     0          0          1          0          1
78. Dramatic                      0          0          4          0          4
79. Islamic                       0          0          1          0          1

Favourite type of music, by ethnicity

uketh2               1. Ethnic   2. No EM      Total
1. Autochthonous             1        426        427
2. Pakistani                58        105        163
3. Indian                   32         56         88
4. Bangladeshi               6         14         20
Total                       97        601        698
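The tables above illustrate why choices about category boundaries matter. As a hedged sketch, the counts below are copied from the "2. Pakist" column of the music1b table, and the two groupings are hypothetical boundary choices introduced only for illustration (they are not the recode that produced the uketh2 table). The share of respondents classified into the grouped category changes substantially with the boundary chosen.

```python
# Counts for the "2. Pakist" column of the music1b table, for a handful of
# detailed categories (values copied from the slide).
pakistani_counts = {
    "Asian Music": 11,
    "Indian": 11,
    "Bangra": 5,
    "Asian Pop Music": 2,
    "Urdu": 2,
    "Hindi": 1,
    "Classical Hindi": 1,
}
PAKISTANI_TOTAL = 163  # row total for "2. Pakistani" in the uketh2 table

# Two hypothetical boundary choices (assumptions, not the authors' recode).
narrow_grouping = {"Asian Music"}
broad_grouping = set(pakistani_counts)


def grouped_share(grouping, counts, total):
    """Return (n, percent of total) for the detailed categories in `grouping`."""
    n = sum(counts[category] for category in grouping)
    return n, round(100 * n / total, 1)


narrow = grouped_share(narrow_grouping, pakistani_counts, PAKISTANI_TOTAL)
broad = grouped_share(broad_grouping, pakistani_counts, PAKISTANI_TOTAL)
print("narrow boundary:", narrow)
print("broad boundary: ", broad)
```

The same respondents yield quite different summary percentages depending on where the boundary is drawn, which is the sense in which recoding decisions are consequential for results.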
9
EFFNATIS sample (1999): Subjective ethnic identity
Q.129 Which of the following best describes how you would describe yourself?

Response                                    Freq.  Percent     Cum.
1. British                                    137    16.65    16.65
2. English                                     79     9.60    26.25
3. Pakistani                                   25     3.04    29.28
4. Indian                                      10     1.22    30.50
5. Bangladeshi                                 10     1.22    31.71
6. Pakistani-British                           73     8.87    40.58
7. Indian-British                              34     4.13    44.71
8. Bangladeshi-British                         22     2.67    47.39
9. Asian                                        6     0.73    48.12
10. White                                       7     0.85    48.97
11. European                                    6     0.73    49.70
13. White-British                              60     7.29    56.99
15. Asian-British                              65     7.90    64.88
16. Cosmopolitan                                7     0.85    65.74
17. Moslem                                     12     1.46    67.19
18. Indian, Asian (4,9)                         2     0.24    67.44
19. British Moslem                              6     0.73    68.17
21. Indian & British (1,4)                      1     0.12    68.29
22. Indian-British, Asian-British (7,15         9     1.09    69.38
23. English & White (2,10)                     54     6.56    75.94
24. English, White-British (2,13)              41     4.98    80.92
25. White-British & Cosmopolitan (13,16         5     0.61    81.53
26. English & White-British (2,13)              8     0.97    82.50
27. White-Italian                               1     0.12    82.62
28. European and White-British (11,13)          7     0.85    83.48
29. English, European and White-British        20     2.43    85.91
30. English, White, Cosmopolitan (2,10,         1     0.12    86.03
31. English, Indian-British, Asian-Brit         1     0.12    86.15
32. Pakistani-British, Asian-British (6         6     0.73    86.88
33. Pakistani, Asian (3,9)                      3     0.36    87.24
34. Kashmiri                                    4     0.49    87.73
35. Pakistani-Moslem living in Great Br         1     0.12    87.85
36. Pakistani-British, Black-British, A         1     0.12    87.97
37. Pakistani-British, Asian, Black-Bri         1     0.12    88.09
38. Pakistani, Black (3,12)                     1     0.12    88.21
39. Pakistani-born British                      1     0.12    88.34
40. Pakistani-British, Asian-British (6         2     0.24    88.58
41. Black, Asian-British (12,15)                2     0.24    88.82
42. Pakistani-British, Asian (6,9)              6     0.73    89.55
43. English, Indian-British, Asian-Brit         1     0.12    89.67
44. English, Pakistani-British, Indian-         2     0.24    89.91
45. Neapolitan                                  1     0.12    90.04
46. Indian, Asian, Asian-British (4,9,1         1     0.12    90.16
47. Indian-British, Asian (7,9)                 5     0.61    90.77
48. English, Indian-British, Black, Asi         2     0.24    91.01
49. English, Indian-British (2,7)               2     0.24    91.25
50. English, Asian-British (2,15)               2     0.24    91.49
51. Pakistani-British, Indian-British (         1     0.12    91.62
52. English, Indian, Asian-British (2,4         1     0.12    91.74
53. British, English (1,2)                      9     1.09    92.83
54. British, White (1,10)                       2     0.24    93.07
55. Scottish                                    1     0.12    93.20
56. English, European (2,11)                    2     0.24    93.44
58. Individual                                  2     0.24    93.68
59. English, European, White-British, C         3     0.36    94.05
60. British, White, European, Cosmopoli         1     0.12    94.17
61. English, Pakistani-British (2,6)            2     0.24    94.41
62. Pakistani-British, Black, Asian-Bri         2     0.24    94.65
63. Bangladeshi, Asian, Black (5,9,12)          1     0.12    94.78
64. British, European (1,11)                    2     0.24    95.02
65. Lancastrian                                 3     0.36    95.38
66. Humanoid                                    1     0.12    95.50
67. Pakistani, Asian-British (3,15)             1     0.12    95.63
68. Pakistani, Asian-British, Moslem, F         1     0.12    95.75
69. English, White, Irish                       1     0.12    95.87
70. English, White, European (2,10,11)          3     0.36    96.23
71. Irish-English                               1     0.12    96.35
72. English, White-British, Cosmopolita         2     0.24    96.60
73. British with Baltic-Slav origins            1     0.12    96.72
77. English, Pakistani (2,3)                    3     0.36    97.08
78. Pakistani-British, Asian, Moslem            1     0.12    97.21
79. English, Indian-British, White, Asi         1     0.12    97.33
80. Human being                                 2     0.24    97.57
82. English, Pakistani-British, Asian (         1     0.12    97.69
83. Pakistani-British, Italian                  1     0.12    97.81
85. European, White-British, Irish              1     0.12    97.93
86. English, Indian (2,4)                       4     0.49    98.42
87. Indian-British, Black-British (7,14         1     0.12    98.54
88. Indian, Black, Asian-British (4,12,         1     0.12    98.66
89. Indian, Asian-British (4,15)                1     0.12    98.78
90. White-British, Ukrainian                    1     0.12    98.91
10
Ethnic group in the World Values Survey - Britain
Count, by wave

Ethnic group                                             1981-1984  1989-1993  1994-1999  1999-2004      Total
-5 Missing; Unknown                                             18          0          0          0         18
-4 Not asked                                                     0       1484          0        999       2483
-1 Don't know                                                    0          0          1          0          1
40 Asian                                                        15          0          0          0         15
70 Asian - Central (Arabic)                                      1          0          0          0          1
80 Asian - East (Chinese,Japanese)                               0          0          3          0          3
90 Asian - South (Indian,Hindu, Pakistani,Bangladeshi)           0          0         11          0         11
130 Bangladeshi                                                  0          0          1          0          1
200 Black African                                                0          0          4          0          4
210 Black-Caribbean                                              0          0         12          0         12
220 Black-Other / Black                                          9          0          2          0         11
810 Pakistani                                                    0          0          7          0          7
1400 White / Caucasian White                                  1124          0       1044          0       2168
8000 Other                                                       0          0          8          0          8
Total                                                         1167       1484       1093        999       4743
11
Family and Working Lives Survey (54 vars per educ record)
12
[Figure: two panels, "Men's jobs (frequencies)" (maximum: 335) and "Women's jobs (frequencies)" (maximum: 895).]
Source: British Household Panel Survey, last reported current jobs of adults, waves 1-17, N Males = 10223; N Females = 9934. X-axis shows ISCO-88 Sub-Major group of job; Y-axis shows ISCO-88 3rd and 4th digit codes.
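The axes described in the source note rely on the hierarchical structure of four-digit ISCO-88 codes, in which the first two digits identify the sub-major group. A small sketch of that split follows; the function name is hypothetical, and the two example codes (2411 and 0110) are used only to illustrate the digit arithmetic.

```python
def isco88_position(code):
    """Split a four-digit ISCO-88 unit group code into plot coordinates.

    Returns (sub_major, detail), where sub_major is the first two digits
    (the x-axis in the figure) and detail is the 3rd and 4th digits
    (the y-axis). Codes stored as integers lose their leading zero,
    so they are zero-padded to four digits first.
    """
    digits = f"{int(code):04d}"
    if len(digits) != 4:
        raise ValueError(f"not a four-digit ISCO-88 code: {code!r}")
    return int(digits[:2]), int(digits[2:])


print(isco88_position(2411))  # sub-major group 24, detail 11
print(isco88_position(110))   # stored without leading zero: sub-major group 1, detail 10
```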
13
3) Standardizing categorical data
‘Standardization’ refers to treating variables for the purposes of analysis, in order to aid comparison between variables
o {In the terminology of survey research analysts}
1. Arithmetic standardization to re-scale metric values [$z_i = (x_i - \bar{x}) / sd$]
2. Ex-ante harmonisation (during data production) [ensuring measures of the same concept, collected from different contexts, are recorded in coordinated taxonomies]
3. Ex-post harmonisation [adapting measures of the same concept, collected from different contexts, using a coordinated re-coding procedure]
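The first and third senses of standardization listed above can be sketched in a few lines. This is an illustration under stated assumptions: the population standard deviation is used for sd (the slide does not say which form is intended), and the two survey codings and the common three-category scheme are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Arithmetic standardization: z_i = (x_i - mean) / sd.

    Assumption: sd is the population standard deviation.
    """
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]


# Ex-post harmonisation: hypothetical codings of an education variable in two
# surveys, each mapped onto the same coordinated three-category scheme.
SURVEY_A_TO_COMMON = {1: "low", 2: "low", 3: "medium", 4: "high"}
SURVEY_B_TO_COMMON = {10: "low", 20: "medium", 30: "high", 40: "high"}


def harmonise(codes, mapping, unmatched="unclassified"):
    """Recode source categories into the common scheme, flagging unknown codes."""
    return [mapping.get(code, unmatched) for code in codes]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
print(harmonise([1, 3, 4, 9], SURVEY_A_TO_COMMON))
print(harmonise([10, 20, 40], SURVEY_B_TO_COMMON))
```

Ex-ante harmonisation has no equivalent code step, since it is achieved at data collection by fielding the coordinated taxonomy directly.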
14
The big issue: standardization for comparisons
‘Comparisons are the essence’ [Treiman, 2009: 382]
↔ to make statements about differences [in measures] over contexts
Categorical data is highly problematic..
Can’t immediately conduct arithmetic standardization
Struggle to enforce harmonised data collection
• ..which may not in any case be suitable..
Struggle to achieve ex-post harmonisation
• Non-linear relations between categories
• Shifting underlying distributions
15
Two conventional ways to make comparisons [e.g. van Deth 2003]
Measurement equivalence = ex ante harmonisation (or ex post harmonisation)
Meaning equivalence = arithmetic standardisation (or ex ante or ex post harmonisation)
Much comparative research flounders on an insufficient recognition of strategies for equivalence
(“One size doesn’t fit all, so we can’t go on”)
16
Measurement equivalence
(i) Measurement equivalence by assertion
17
(ii) Measurement equivalence example: ‘Lissification’
• Major research programme in ex-post harmonisation of Labour Force Surveys over time and between countries
www.lisproject.org
18
(iii) Measurement equivalence and social class
[Figure (tabplot): Goldthorpe class scheme harmonised over time. Classes: Unskilled; Skilled manual; Petty-bourg.; Non-manual; Salariat. Values: percent of year category.]
Source: Females from LFS/GHS, using data from Li and Heath (2008)
19
Meaning equivalence
For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence
(because of non-linear relations between categories and shifting underlying distributions)
(even if measurement equivalence seems possible)
Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context
For categorical data, this can be achieved by scaling categories in one or more dimensions of difference
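One way to read the scaling idea above is to score each category by the mean of a criterion variable (for example income), standardized within its own context (for example survey year). The sketch below is an assumption about how this could be done, not the procedure used for the figures in this talk; the records are hypothetical, and the population standard deviation is assumed.

```python
from collections import defaultdict
from statistics import mean, pstdev


def scale_categories(records):
    """Score each (context, category) by its standardized criterion mean.

    `records` is an iterable of (context, category, value) tuples. Within
    each context, the category mean is expressed in standard deviation
    units around that context's overall mean.
    """
    by_context = defaultdict(list)
    by_cell = defaultdict(list)
    for context, category, value in records:
        by_context[context].append(value)
        by_cell[(context, category)].append(value)

    scores = {}
    for (context, category), values in by_cell.items():
        context_mean = mean(by_context[context])
        context_sd = pstdev(by_context[context])
        scores[(context, category)] = (mean(values) - context_mean) / context_sd
    return scores


# Hypothetical incomes: every value doubles between the two years, so the
# raw category means change but the relative positions do not.
records = [
    (1980, "Salariat", 30), (1980, "Salariat", 34),
    (1980, "Unskilled", 10), (1980, "Unskilled", 14),
    (2000, "Salariat", 60), (2000, "Salariat", 68),
    (2000, "Unskilled", 20), (2000, "Unskilled", 28),
]
scores = scale_categories(records)
for key in sorted(scores):
    print(key, round(scores[key], 3))
```

Because the scores describe position relative to the contemporary distribution, a category keeps the same score across years when only the overall scale of the criterion shifts.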
20
[Figure: Goldthorpe class, standardized income means by year. X-axis: year (1970 to 2010); y-axis: standardized income mean (approx. -.5 to 1). Series: Salariat; Skilled non-manual; Petty bourg.; Skilled manual; Unskilled.]
Scaling using male full time incomes. Points weighted by number of females in each class. Source: LFS/GHS, using Li & Heath (2008).
21
[Figure: ‘Effect proportional scaling’ using parents’ occupational advantage. Y-axis: father's Cambridge Scale score (approx. 20 to 50). Points at 0 show individual values; points at 1-3 show category means for three categorical variables:
1. Occupational group: Managers and Administrators; Professional; Associate professional and technical; Clerical and secretarial; Craft and related; Personal and protective services; Sales; Plant and machine operatives; Other occupations
2. Highest qualification: higher degree; first degree; teaching qf; other higher qf; nursing qf; gce a levels; gce o levels or equiv; commercial qf, no o levels; cse grade 2-5, scot grade 4-5; apprenticeship; other qf; no qf
3. Ethnic group: white; black-carib; black-african; black-other; indian; pakistani; bangladeshi; chinese; other ethnic grp]
Source: British Household Panel Survey 2007, adults aged 18+ and father's Cambridge Scale score. Points at 1-3 show category mean. Points at 0 show individual values (scaled mean=28, sd=6; pop. mean=28, sd=18).
22
[Figure: two panels, "MCAMSIS - BHPS adults' most recent job" and "MCAMSIS - BHPS adults' fathers". X-axis: 0 to 100. Reference lines: 50% Med; 60% Med; Mean - (1SD - Skew).]
23
A comment on ‘offsets’ and meaning equivalence
- For comparisons between regressions, it is sometimes suitable to force the coefficients of some variables (e.g. controls) to have a certain fixed value
- Example below (predicting income) uses ‘cnsreg’ in Stata, e.g.:

    * fit the baseline model and store its coefficient vector
    regress lninc fem age femage
    matrix define mod1m = e(b)
    * keep the baseline coefficient for fem (first column of e(b))
    scalar fem_coef = mod1m[1,1]
    * fix fem at its baseline value when the occupational scale is added
    constraint def 1 fem = fem_coef
    cnsreg lninc fem age femage mcamsis, constraints(1)

Variable       mod1       mod2       mod3       mod4       mod5       mod6       mod7
fem          -.194*    -.276**    -.267**     -.205*      -.194      -.194      -.194
age        .0151***   .0125***   .0106***     .01***      .0151      .0151      .0151
femage     -.00514*     -.0041    -.00335    -.00433    -.00514    -.00514    -.00514
mcamsis               .0133***                         .0129***
isei                             .0169***                         .0164***
trei                                        .0205***                         .0197***
_cons       6.91***    6.29***     6.3***    6.21***     6.2***    6.15***    6.07***
N              2127       2127       2127       2127       2127       2127       2127
r2              .11        .17       .246       .261
ll            -2381      -2306      -2205      -2183      -2309      -2210      -2192
legend: * p<0.05; ** p<0.01; *** p<0.001
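The logic of fixing a coefficient can be sketched outside Stata. Constraining a coefficient to a known constant is equivalent to subtracting that variable's fixed contribution from the outcome (an offset) and fitting the remaining terms by ordinary least squares. The sketch below is not a reimplementation of cnsreg: it handles a single free predictor, the data are hypothetical, and the only value taken from the table is the mod1 coefficient for fem (-.194).

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope of a one-predictor least squares fit."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_with_fixed_coef(y, fixed_x, fixed_coef, free_x):
    """Fit y = a + fixed_coef * fixed_x + b * free_x with fixed_coef held constant.

    The fixed term is moved to the left-hand side as an offset, then
    a and b are estimated by OLS on the adjusted outcome.
    """
    adjusted = [yi - fixed_coef * fi for yi, fi in zip(y, fixed_x)]
    return simple_ols(free_x, adjusted)


FEM_COEF = -0.194  # mod1 estimate for fem, as reported in the table

# Hypothetical data generated without noise, so the fit can be checked by eye.
fem = [0, 1, 0, 1]
scale = [20, 40, 60, 80]  # stand-in for an occupational scale score
lninc = [6.0 + FEM_COEF * f + 0.013 * s for f, s in zip(fem, scale)]

intercept, slope = fit_with_fixed_coef(lninc, fem, FEM_COEF, scale)
print(round(intercept, 4), round(slope, 4))
```

Holding the control coefficients at common values in this way means differences between models in the scale coefficient are not confounded with shifts in the control estimates.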
24
What we do and what we ought to do (when standardizing categories)
Research applications tend to select a favoured categorisation of a concept and stick with it
o Due to coordinated instructions [e.g. Blossfeld et al. 2006]
o Due to perceived lack of available alternatives
o Due to perceived convenience
To make statistical analyses more robust we should…
o Operationalise and deploy various scalings and arithmetic measures
o Try out various categorisations and explore their distributional properties
o … and keep a replicable trail of all these activities
25
4) Supporting the standardization of categorical data
GE*DE projects are concerned with allowing social science researchers to navigate, and exploit, heterogeneous information resources
o Occupational Information Resources
o Educational Information Resources
o Ethnic minority/Migration Information Resources
We are finding that one of the most useful contributions is in helping with the standardization of categorical data
26
What makes this ‘e-Social Science’?
Standards setting
Metadata
Portal framework
Liferay portal to various DAMES resources
iRODS system for ‘GE*DE’ specialist data
Controlled data access under security limits
Use of workflows
27
E.g. of GEODE v1: Organising and distributing specialist data resources (on occupations)
28
(i) Basic access to data
Services to..
o search for and identify suitable information resources {Liferay portal and iRODS file connection}
o allow merging these resources with own data {Non-trivial consideration – complex micro-data subject to security constraints}
Constructing new standardized resources for UK and major cross-national surveys
o E.g. Effect proportional scales for ethnic groups and educational qualifications across countries and over time
o CAMSIS scales for educational homophily (cf. www.camsis.stir.ac.uk)
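Merging an information resource with a researcher's own data, as described above, amounts to a keyed lookup from category codes to the values the resource supplies. The sketch below is illustrative only: the occupation codes, scores, and variable names are hypothetical, and it does not use the DAMES, Liferay, or iRODS services.

```python
# Hypothetical information resource: occupation code -> scale score.
occupation_scores = {2411: 70.0, 4115: 50.0, 9132: 30.0}

# Hypothetical survey micro-data, one record per respondent.
survey = [
    {"pid": 1, "occ": 2411},
    {"pid": 2, "occ": 9132},
    {"pid": 3, "occ": 1234},  # code not covered by the resource
]


def merge_scores(rows, lookup, key="occ", new_var="score"):
    """Attach the looked-up value to each row and report unmatched codes.

    Rows are copied, so the original data are left untouched; codes absent
    from the resource receive None rather than being dropped silently.
    """
    merged, unmatched = [], []
    for row in rows:
        out = dict(row)
        out[new_var] = lookup.get(row[key])
        if out[new_var] is None:
            unmatched.append(row[key])
        merged.append(out)
    return merged, unmatched


merged, unmatched = merge_scores(survey, occupation_scores)
print(merged)
print("codes without a match:", unmatched)
```

Reporting the unmatched codes is a deliberate choice: coverage gaps between a classification resource and the survey coding are a common source of silent case loss.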
29
(ii) Depositing data
Services to…
o Allow researchers to deposit specialist information resources to be immediately visible to others
o Collect basic metadata via proforma, option of adding extended metadata (DDI structure)
{Motivations are altruism; citations; reduced burdens}
{Quality control through site rankings, expert inputs}
30
(iii) Workflows for recodes and standardisations
Documenting and distributing recodes / variable transformations / file matching operations
Ready access to previously used standardizations (avoid re-inventing the wheel)
Stata and SPSS focus (principal integrated data management / data analysis software for target users)
{includes files as resources; & generate syntax log file}
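As a rough illustration of documenting a recode so it can be reused, the sketch below stores a recode as data, applies it, and writes the equivalent Stata `recode` command to a log. The function names, variable names, and grouping are hypothetical and this is not the DAMES workflow implementation; it only shows the idea of keeping the transformation and its syntax together.

```python
def apply_recode(values, groups, default=None):
    """Apply a recode given as {new_value: [old_values, ...]}."""
    lookup = {old: new for new, olds in groups.items() for old in olds}
    return [lookup.get(v, default) for v in values]


def stata_recode_syntax(var, new_var, groups):
    """Render the same recode as a Stata `recode ..., generate()` command."""
    rules = " ".join(
        "(" + " ".join(str(old) for old in olds) + f" = {new})"
        for new, olds in groups.items()
    )
    return f"recode {var} {rules}, generate({new_var})"


def documented_recode(values, var, new_var, groups, log):
    """Recode the values and append the matching syntax to a replication log."""
    log.append(stata_recode_syntax(var, new_var, groups))
    return apply_recode(values, groups)


# Hypothetical five-category education variable collapsed to three groups.
educ_groups = {1: [1, 2], 2: [3, 4], 3: [5]}
syntax_log = []
educ3 = documented_recode([1, 3, 5, 9], "educ", "educ3", educ_groups, syntax_log)

print(educ3)
print(syntax_log[0])
```

Keeping the mapping as a single object means the applied transformation and the logged syntax cannot drift apart, which is the point of a replicable trail.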
31
Conclusions and considerations
DAMES services are work in progress – www.dames.org.uk
Technical issues
o Service delivery / Quality control
Scientific contributions
o Progress in standardisations and ideas of equivalence
o Suitable use of categorical data in social science data analysis!
o Documentation for replication
o Meta-analysis orientation
32
Data used
Department for Education and Employment. (1997). Family and Working Lives Survey, 1994-1995 [computer file]. Colchester, Essex: UK Data Archive [distributor], SN: 3704.
Heckmann, F., Penn, R. D., & Schnapper, D. (Eds.). (2001). Effectiveness of National Integration Strategies Towards Second Generation Migrant Youth in a Comparative Perspective - EFFNATIS. Bamberg: European Forum for Migration Studies, University of Bamberg.
Inglehart, R. (2000). World Values Surveys and European Values Surveys 1981-4, 1990-3, 1995-7 [Computer file] (Vol. 2000). Ann Arbor, MI: Institute for Social Research [Producer]; Inter-university Consortium for Political and Social Research [Distributor].
Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666.
University of Essex, & Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, 1991-2008 [computer file], 5th Edition. Colchester, Essex: UK Data Archive [distributor], March 2009, SN 5151.
33
References
Agresti, A. (2002). Categorical Data Analysis, 2nd Edition. New York: Wiley.
Blossfeld, H. P., Mills, M., & Bernardi, F. (Eds.). (2006). Globalization, Uncertainty and Men's Careers: An International Comparison. Cheltenham: Edward Elgar.
Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of the Data Management through e-Social Science research Node (www.dames.org.uk)
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
van Deth, J. W. (2003). Using Published Survey Data. In J. A. Harkness, F. J. R. van de Vijver & P. P. Mohler (Eds.), Cross-Cultural Survey Methods (pp. 329-346). New York: Wiley.