Dealing with key variables in social research: New options

Paul Lambert, University of Stirling

Presented to the ESRC DAMES node workshop on ‘Operationalising social science variables and using the GEODE, GEMDE and GEEDE services’

University of Stirling, 30th January 2012


1) The idea of ‘key variables’

2) Contemporary challenges in analysing key variables

3) Three new contributions from e-Social Science:
• GEODE (www.geode.stir.ac.uk): data on occupations
• GEEDE (www.dames.org.uk/geede): data on education
• GEMDE (www.dames.org.uk/gemde): data on ethnicity


1) The idea of key variables

Recognition that certain factors are measured and analysed time after time in social surveys, and are routinely of relevance to statistical analyses [e.g. Burgess 1986]

By implication, there’s value in methodological reflection and consistency/re-use of measures between studies

A small range of academic reviews:
• Stacey (1969): education; family status; income; occupations
• Burgess (1986): age, gender, ethnicity, health, education, occupational class, employment status, leisure, politics, voluntary associations
• Hoffmeyer-Zlotnik and Wolf (2003): occupation, education, age, religion, ethnicity, household, family (cross-nationally)


Many resources on ‘key variables’ are by-products of data preparation or harmonisation


A daunting volume of resources...

Data providers’ standards and documentation
• Standards adopted in particular surveys: UKDA (www.data-archive.ac.uk); CESSDA (www.cessda.org/)
• Cross-national harmonisation: IPUMS (www.ipums.org); ISSP (www.issp.org/); WVS (www.worldvaluessurvey.org/); LIS (www.lisproject.org); ESS (www.europeansocialsurvey.org)
• Publications from ‘Small N’ as well as ‘Large N’ studies [e.g. Charles & Grusky, 2004 on gender and occupations; Wright 1997 on social class]

Resource providers’ recommendations/standards
• ESDS: www.esds.ac.uk
• Survey Network / Question Bank, Topics: http://surveynet.ac.uk/sqb/topics/introduction.asp
• OECD: http://www.oecd.org/statsportal/
• Edacwowe: http://www.edacwowe.eu/en/
• Harry Ganzeboom’s ISMF: www.harryganzeboom.nl/ismf/ismf.htm


Constructions of key variables in survey research

Are important…
• Major part of the hands-on work of survey analysis
• Central to many critiques of research/outputs

Existing reflections and resources
• Methodological comments [e.g. Stacey 1969; Burgess 1986]
• Validity and reliability; harmonisation and standardisation efforts
• Cross-nationally comparative research into ‘equivalence’

…But…
• Attention to variables is marginalised in methodological reviews, which focus on data and/or techniques [cf. Raftery 2001]
• Reviews/resources on variables often don’t give good advice to those conducting complex statistical models of social processes


“Reviews/resources on variables often don’t give good advice to those conducting complex statistical models of social processes”?

• Univariate perspective and evaluations
• Inconvenient (categorical) functional forms

[Figure: Goldthorpe class scheme harmonised over time. Y-axis: percent of year category. Classes shown: Unskilled; Skilled manual; Petty-bourg.; Non-manual; Salariat. Source: Females from LFS/GHS, using data from Li and Heath (2008)]

• Large numbers of categories (e.g. 8 NS-SEC classes)
• Reliance on detailed source data (e.g. 351 SOCs + employment status)
• Ill-suited to arithmetic standardisation, modelling interaction effects, and temporal/cross-national comparisons


“Reviews/resources on variables often don’t give good advice to those conducting complex statistical models of social processes”?

Issues of interpretation:

[Figure: R2 increment (y-axis, 0 to 0.2) from adding each occupation-based measure. Panels: Flexible hours in Sweden; Promotion opps. in Britain. Measures shown: E9, E6, E5, E3, E2, G13, G11, G10, G7, G5, G3, G2, K4, R7, WR, WR9, O17, O8, O4, MN, I9, I99, CM, CF, CM2, CF2, CG, ISEI, SIOP, AWM, WG1, WG2, WG3, GN1. Source: BHPS and LNU 1991. Occupation-based social classifications as in Table 1. Logistic regression models predicting ERC using gender, quadratic age and occupation-based measure. Y-axis shows gain in R2 from adding occupation-based measure.]

(1) Asymmetric validity evaluations, e.g. occupation-based measures [Lambert/Bihagen 2012]

(2) Unrealistic forms of equivalence: assertion of ‘measurement equivalence’ is rarely convincing


2) Contemporary challenges in analysing key variables

…over the last 20 years…

• Vastly increasing volumes of microdata (especially comparative): huge social surveys; routine data; cross-national/temporal comparisons
• Increasing computational power/statistical options: multivariate statistical models; non-linear outcome models; mixed and multi-process models
• Increase in volume of academic studies: difficulty of reviewing previous approaches; proliferation, in practice, of variable operationalisations
• Options for internet dissemination, e.g. journals’ ‘Electronic Supplementary Materials’


Parlous shortcuts: ‘Don’t get it right, get it published’?

Many of us navigate through the abundance of options by being (arbitrarily) selective, using convenient variables, and disregarding more intricate variable construction literatures

Though understandable, this isn’t good science!
o Lack of documentation for replication [cf. Dale 2006]
o Keep making new measures / re-inventing the wheel
o Use unwieldy categorical measures (restricting models)
o Misinterpret/erroneously estimate the effects of concepts


Claim: In our field, good science involves (a) sensitivity analysis of measures and (b) building models conceptually (rather than according to functional form of measures)
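
A sensitivity analysis of measures can be sketched very simply: apply several candidate operationalisations of the same underlying variable and compare how much of an outcome each one accounts for. The sketch below uses eta-squared (the one-way R-squared for a categorical predictor) as the comparison statistic, which is a simplification of the multivariate models discussed in these slides. The data, category labels and candidate recodes are all hypothetical.

```python
from collections import defaultdict

def eta_squared(outcome, categories):
    """Share of outcome variance explained by a categorical measure."""
    n = len(outcome)
    grand_mean = sum(outcome) / n
    total_ss = sum((y - grand_mean) ** 2 for y in outcome)
    groups = defaultdict(list)
    for y, c in zip(outcome, categories):
        groups[c].append(y)
    between_ss = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )
    return between_ss / total_ss

# Hypothetical outcome and a detailed four-category source measure.
outcome = [10, 12, 20, 22, 30, 32, 40, 42]
detailed = ["a", "a", "b", "b", "c", "c", "d", "d"]

# Hypothetical candidate operationalisations: alternative recodes of the source.
candidates = {
    "detailed (4 cats)": {"a": "a", "b": "b", "c": "c", "d": "d"},
    "collapsed low/high": {"a": "low", "b": "low", "c": "high", "d": "high"},
    "collapsed a vs rest": {"a": "a", "b": "rest", "c": "rest", "d": "rest"},
}

# Apply each recode and record the variance explained.
results = {
    name: eta_squared(outcome, [recode[c] for c in detailed])
    for name, recode in candidates.items()
}

# Report the candidates from most to least explanatory.
for name, value in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} eta-squared = {value:.3f}")
```

Publishing a table like `results` alongside the chosen measure is one way of documenting that the substantive conclusion does not hinge on a single coding decision.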


Focus on three especially important categorical variables: Education; Occupation; Ethnicity

Educational qualifications
  Some standards: ISCED; GHS; years of study
  Key correlates (r2)*: year of birth (0.25); gender (0.04)
  Useful refs.: Schneider (2010); Treiman (2007)

Occupations
  Some standards: NS-SEC; RGSC
  Key correlates (r2)*: gender (0.41); age (0.05); region (0.02)
  Useful refs.: Rose and Harrison (2010)

Ethnicity
  Some standards: ONS; years since immigration
  Key correlates (r2)*: year of birth (0.08); immigrant status (0.26); region (0.09)
  Useful refs.: Bosveld et al. (2006)

Problems for analysts include:
• Many alternative/competing measurement options (e.g. class schemes)
• Changing structural contexts (distributions; correlations; sparsity)
• Changing measurement practices (e.g. decennial revisions; admin data)
• Different needs in different studies (e.g. ISCO88-SOC; graduate vs school qualifications)

* 2008 BHPS 20+yrs, qfedhi, jbsoc, ‘xeth’ using race{l} + 0.01 downwt for Wh


Measurement equivalence for comparisons – coding to the lowest common denominator?

• Are the compatible categories equivalent?
• Who does the work (data distributors and/or analysts)…?
• …& who records it (e.g. Mohler et al., 2008)?


Educational qualification measures across UK birth cohorts

[Figure: qualification categories plotted across UK birth cohorts. Axis and legend labels recovered: birth cohort (1895 to 1975); ‘Birth cohort (med. age)’ with ticks 40 to 90; ‘mAdv 1880-1910’ and ‘mAdv 1970-1990’. Categories shown: Still at school; None or low school level or below; None; Other (no more info); CSE; O-levels; A-levels; School level qualifications; Armed forces training; Armed forces training, CO; Armed forces training, NCO; Armed forces training, private; Other vocat., e.g. night school, evening cls.; Overseas post-school; Other non-vocational; Other vocational; Other diploma, certificate; Apprenticeship; Apprenticeship only; Apprenticeship + FE qualification; Other commercial; Commercial / secretarial, RSA etc; Other vocational / technical; Technical; City and Guilds, no more info; City and Guilds craft, ordinary; City and Guilds advanced; City and Guilds full tech; ONC, OND etc; Non-graduate diploma; Other college technical; Other business, prof., etc (lower level); Nurse training; Graduate diploma; Higher degree; Other university / college; Teacher Training; Other professional qualification (higher level); HNC, HND etc; Degree / attended university.]

Qualifications distributions and relative change in relative gains by qualifications. Slow Degrees dataset (social surveys 1963-2010), adults aged 25+, N=289k.


Meaning equivalence

For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence

(because of non-linear relations between categories and shifting underlying distributions)

(even if measurement equivalence seems possible)

Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context

For categorical data, this can be achieved/approximated by scaling categories in one or more dimensions of difference
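
As a minimal sketch of arithmetic standardisation, the example below converts a metric score into a z-score computed separately within each context (here, birth cohort), so that values express relative position within that context rather than an absolute level. The function name, the variables and the data are hypothetical, and the sketch assumes every context has some variation (a context with zero spread would cause a division by zero).

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardise_within(values, contexts):
    """Return z-scores of values, computed separately within each context."""
    by_context = defaultdict(list)
    for v, c in zip(values, contexts):
        by_context[c].append(v)
    # Mean and (population) standard deviation per context.
    stats = {c: (mean(vs), pstdev(vs)) for c, vs in by_context.items()}
    return [(v - stats[c][0]) / stats[c][1] for v, c in zip(values, contexts)]

# Hypothetical years of schooling for respondents in two birth cohorts.
years = [9, 10, 11, 12, 14, 16]
cohort = ["1925", "1925", "1925", "1975", "1975", "1975"]

z = standardise_within(years, cohort)
for y, c, score in zip(years, cohort, z):
    print(f"cohort {c}: {y} years -> z = {score:+.2f}")
```

In this toy data, 11 years of schooling in the 1925 cohort and 16 years in the 1975 cohort receive the same standardised score, which is the sense in which the two are treated as equivalent in meaning.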


‘Effect proportional scaling’ using parents’ occupational advantage

[Figure: father's Cambridge Scale score (y-axis, ticks 20 to 50) by category of three variables. Occupational group: Managers and Administrators; Professional; Associate professional and technical; Clerical and secretarial; Craft and related; Personal and protective services; Sales; Plant and machine operatives; Other occupations. Educational qualification: higher degree; first degree; teaching qf; other higher qf; nursing qf; gce a levels; gce o levels or equiv; commercial qf, no o levels; cse grade 2-5, scot grade 4-5; apprenticeship; other qf; no qf. Ethnic group: white; black-carib; black-african; black-other; indian; pakistani; bangladeshi; chinese; other ethnic grp. Source: British Household Panel Survey 2007, adults aged 18+ and father's Cambridge Scale score. Points at 1-3 show category mean. Points at 0 show individual values (scaled mean=28, sd=6; pop. mean=28, sd=18).]
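
A minimal sketch of effect proportional scaling, assuming it is implemented as in the figure note (each category is scored by the mean of an external criterion among its members, here a parental occupational advantage score). The category labels and scores below are hypothetical illustration data, not BHPS values.

```python
from collections import defaultdict

def effect_proportional_scores(categories, criterion):
    """Score each category by the mean of the criterion among its members."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, value in zip(categories, criterion):
        sums[c] += value
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}

# Hypothetical respondents: own qualification and father's occupational score.
quals = ["degree", "degree", "a levels", "a levels", "no qf", "no qf"]
father_score = [50.0, 40.0, 35.0, 25.0, 20.0, 10.0]

scores = effect_proportional_scores(quals, father_score)

# Replace each respondent's category with its metric score.
scaled = [scores[q] for q in quals]
print(scores)
```

The resulting `scaled` variable is metric, so it can enter a model as a single linear term instead of a set of category dummies.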


Asserting that the ideal research using models involving key variables…

1) …derives and compares many plausible candidate variable operationalisations, and documents/publishes the sensitivity analysis

2) …thoroughly tests whether linear functional forms can be used, to facilitate complex modelling

Overheads include:
• substantial work in deriving and documenting measures
• against own interests to disseminate documentation
• hard to communicate linear effects to social scientists

..can e-Science come to the rescue..?


3) Three new contributions from e-Social Science

See e.g. Digital Social Research, www.digitalsocialresearch.net
(e-Science broadly involves using emergent computer technologies with enhanced capacities for communication/collaboration & data processing)

DAMES node – objective to support ‘data management’ in social research (e.g. organising/ manipulating/enhancing data; pre-analysis tasks; documentation/replication)

Three new ‘GESDE’ services (from www.dames.org.uk)
• Provide a dynamic, user-contributed library-style service for access to specialist information resources concerning occupations, educational qualifications, ethnicity
• Encourage researchers to use the three ‘portals’ to find and exploit suitable data, and contribute their own files


Contributions: Preserving/facilitating data management

Count: Highest educational qualification by educ4

educ4 columns: -9.00; 1.00 Degree; 2.00 Diploma; 3.00 Higher school or vocational; 4.00 School level or below

Highest educational qualification   -9.00    1.00    2.00    3.00    4.00    Total
-9 Missing or wild                    323       0       0       0       0      323
-7 Proxy respondent                   982       0       0       0       0      982
1 Higher Degree                         0     425       0       0       0      425
2 First Degree                          0    1597       0       0       0     1597
3 Teaching QF                           0       0     340       0       0      340
4 Other Higher QF                       0       0    3434       0       0     3434
5 Nursing QF                            0       0     161       0       0      161
6 GCE A Levels                          0       0       0    1811       0     1811
7 GCE O Levels or Equiv                 0       0       0       0    2518     2518
8 Commercial QF, No O Levels            0       0       0     331       0      331
9 CSE Grade 2-5, Scot Grade 4-5         0       0       0       0     421      421
10 Apprenticeship                       0       0       0     257       0      257
11 Other QF                           102       0       0       0       0      102
12 No QF                                0       0       0       0    2787     2787
13 Still At School No QF              138       0       0       0       0      138
Total                                1545    2022    3935    2399    5726    15627
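
The crosstab above is in effect the documentation of a recode from the detailed qualification variable into the four-category educ4. A hedged sketch of how such a recode might be written down explicitly, so that it can be stored, shared and re-run, is given below. The mapping and the counts are read off the crosstab; the dictionary and function names are hypothetical.

```python
from collections import Counter

# Recode from detailed qualification code to educ4, as implied by the crosstab.
HIGHEST_QUAL_TO_EDUC4 = {
    -9: -9, -7: -9, 11: -9, 13: -9,   # missing, proxy, other qf, still at school
    1: 1, 2: 1,                       # higher degree, first degree -> Degree
    3: 2, 4: 2, 5: 2,                 # teaching, other higher, nursing -> Diploma
    6: 3, 8: 3, 10: 3,                # A levels, commercial, apprenticeship
    7: 4, 9: 4, 12: 4,                # O levels, CSE, no qf
}

EDUC4_LABELS = {
    1: "Degree",
    2: "Diploma",
    3: "Higher school or vocational",
    4: "School level or below",
}

# Row totals from the crosstab (detailed code -> number of cases).
DETAILED_COUNTS = {
    -9: 323, -7: 982, 1: 425, 2: 1597, 3: 340, 4: 3434, 5: 161, 6: 1811,
    7: 2518, 8: 331, 9: 421, 10: 257, 11: 102, 12: 2787, 13: 138,
}

def recode(code):
    """Return the educ4 value for a detailed qualification code."""
    return HIGHEST_QUAL_TO_EDUC4[code]

# Rebuild the educ4 column totals as a check on the documented recode.
educ4_counts = Counter()
for code, n in DETAILED_COUNTS.items():
    educ4_counts[recode(code)] += n

for value in sorted(educ4_counts):
    label = EDUC4_LABELS.get(value, "(no label in source)")
    print(value, label, educ4_counts[value])
```

Keeping the mapping as data rather than as a chain of if-statements makes it straightforward to deposit alongside other metadata and to compare against someone else's recode of the same variable.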


DAMES provides online services for data coordination/organisation

Tools for handling variables in social science data

Recoding measures; standardisation / harmonisation; Linking; Curating


GESDE – Search and browse supplementary data on occupations; educational qualifications; ethnicity


The data curation tool


The curation tool obtains metadata and supports the storage and organisation of data resources in a more generic way (‘DDI’ format metadata)

It includes an ‘IRODS’ file storage system allowing users to upload files and access their own and others’ files


(a) Using GEODE for data on occupations

(i) Usually start with information about the detailed ‘occupational unit group’

(ii) then find ways of attaching summary information about occupations to occupational unit groups

[Figure: frequency distributions by gender (male, female) for two occupation-based measures. Panel ‘CAMSIS’: vingtiles of the scale (ticks 4 to 18), maximum frequency 5799. Panel ‘NS-SEC’: routine occupations; semi-routine occupations; lower supervisory and technical; small employers and own account workers; intermediate occupations; lower managerial and professional; higher managerial and professional; maximum frequency 9764. Source: Labour Force Survey Jan-Mar 2008, current job of employed (18yrs+)]
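
The two steps above amount to a lookup: survey records carry an occupational unit group code, and summary measures are merged on from a separate table indexed by that code. A minimal sketch follows; the codes, scores, field names and the function itself are invented for illustration and do not come from GEODE.

```python
# Hypothetical lookup table: occupational unit group code -> summary measures.
# Codes, scores and class labels are invented for illustration only.
OCC_LOOKUP = {
    "1001": {"camsis": 72.0, "nssec": "higher managerial and professional"},
    "4001": {"camsis": 51.0, "nssec": "intermediate occupations"},
    "9001": {"camsis": 30.0, "nssec": "routine occupations"},
}

def attach_occupation_info(records, lookup, code_field="occ_code"):
    """Merge summary measures onto each record by its unit group code."""
    merged, unmatched = [], set()
    for record in records:
        info = lookup.get(record[code_field])
        if info is None:
            # Keep the record but flag the code so it can be investigated.
            unmatched.add(record[code_field])
            info = {"camsis": None, "nssec": None}
        merged.append({**record, **info})
    return merged, unmatched

# Hypothetical survey records.
survey = [
    {"id": 1, "occ_code": "1001"},
    {"id": 2, "occ_code": "9001"},
    {"id": 3, "occ_code": "7777"},  # no entry in the lookup
]

merged, unmatched = attach_occupation_info(survey, OCC_LOOKUP)
print(merged)
print("codes without summary information:", sorted(unmatched))
```

Reporting the unmatched codes rather than silently dropping them is a deliberate choice here, since coverage gaps between a survey's coding frame and a lookup file are a common source of undocumented case loss.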


GEODE (v1) – Occupational data


Becomes easier to derive and compare multiple variable operationalisations, and for researchers to deposit new tools on occupations, educational qualifications and ethnicity

[Figure: increase in R-squared and increase in BIC for each occupation-based measure, in two panels (Britain, with y-axis ticks 0 to 0.15; Sweden, with y-axis ticks -0.05 to 0.15). Measures shown: ES5, ES2, E9, E6, E5, E3, E2, G13, G11, G10, G7, G5, G3, G2, K4, R7, WR, WR9, O17, O8, O4, MN, I9, I99, CM, CF, CM2, CF2, CG, ISEI, SIOP, AWM, WG1, WG2, WG3, GN1.]

Source: BHPS and LNU 1991, adults aged 23-55 in work in 1991, N=4536 Britain, 2504 Sweden. Model 1: ISEI = linear age + gender; Model 2: ISEI = (Model 1) + occupation-based social classification. Graph shows improvement in R2 for OLS regression, Model 2 vs Model 1, plus scaled BIC statistic ((Model 2 BIC - Model 1 BIC) / Model 1 BIC). Unweighted data.

Model of parental occupational advantage predicted by own occupation and by gender and age, Britain in 1991 (from Lambert & Bihagen 2012)
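
The two comparison statistics in the figure note can be sketched as follows. This assumes the scaled BIC is the difference in BIC divided by the Model 1 BIC, and it uses the common Gaussian form of BIC for OLS, n·ln(RSS/n) + k·ln(n), which omits an additive constant; the exact BIC variant used in the original analysis is not stated, so the scaled value here is illustrative only. The sums of squares and parameter counts are hypothetical; n is the British sample size quoted in the source note.

```python
import math

def r_squared(rss, tss):
    """R-squared from residual and total sums of squares."""
    return 1.0 - rss / tss

def gaussian_bic(rss, n, k):
    """BIC for an OLS model, up to an additive constant (an assumption)."""
    return n * math.log(rss / n) + k * math.log(n)

def compare_models(rss1, k1, rss2, k2, tss, n):
    """Return (gain in R-squared, scaled BIC change) for Model 2 vs Model 1."""
    r2_gain = r_squared(rss2, tss) - r_squared(rss1, tss)
    bic1 = gaussian_bic(rss1, n, k1)
    bic2 = gaussian_bic(rss2, n, k2)
    scaled_bic = (bic2 - bic1) / bic1
    return r2_gain, scaled_bic

# Hypothetical fit summaries for one occupation-based measure.
n = 4536                 # British sample size from the source note
tss = 1_000_000.0        # hypothetical total sum of squares
rss1, k1 = 900_000.0, 3  # Model 1: intercept + age + gender
rss2, k2 = 700_000.0, 10 # Model 2: Model 1 + a seven-dummy classification

r2_gain, scaled_bic = compare_models(rss1, k1, rss2, k2, tss, n)
print(f"gain in R-squared: {r2_gain:.3f}")
print(f"scaled BIC change: {scaled_bic:.4f}")
```

A negative scaled BIC change indicates that the added measure improves fit by more than its extra parameters cost, which is why the two statistics can rank measures differently when the number of categories varies.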


(b) Using GEEDE for data on educational qualifications

‘Educational unit groups’ are qualification listings in the UK and beyond
• British Qualifications (Kogan Page, 2010)
• Qualifications categories of major surveys
    • BHPS, LFS, Census, etc
    • LFS time series standard measure
• UCAS degree codes
• ISCED: International Standard Classification of Education (cf. Schneider, 2008)
• IPUMS: Census measures over 100 years and 65 countries
• LIS: LFS measures over 50 years and 30 countries


Project-specific documentation is often well distributed, e.g. www.lisproject.org


Example of a new measure: A CAMSIS model for educational qualifications

A common way of scaling occupational data is to analyse social interaction patterns between the incumbents of occupations and depict the dimension of social interaction distance as an indicator of stratification

CAMSIS approach (www.camsis.stir.ac.uk)
• Neutral empirical approach, independent of occupational units, comparable across contexts (e.g. Prandy and Jones, 2001)

Same analysis could be applied to qualifications data (see Lambert 2012)


Model for all BHPS

[Figure: Homogamy scale for BHPS education levels. Two panels, Males (x-axis -2 to 2) and Females (x-axis -1 to 3), each plotting scale scores for the series All, ages 20-50 and ages 51-100, plus N/1000. Categories: 1. higher degree; 2. first degree; 3. teaching qf; 4. other higher qf; 5. nursing qf; 6. gce a levels; 7. gce o levels or equiv; 8. commercial qf, no o levels; 9. cse grade 2-5, scot grade 4-5; 10. apprenticeship; 11. other qf; 12. no qf. Source: BHPS Wave 18, correspondence analysis for cohabiting adults aged 20-100]


(c) Using GEMDE for data on ethnicity

Working with ethnicity data in surveys is hard…
- sparse
- collinear (e.g. to age, location)
- dynamic (cf. comparative research)


GEMDE supports replicability/transparency by promoting data on MUGs and MIRs…

Information about ‘Minority Unit Groups’ and ‘Minority Information Resources’

• Document your own recodes, notes, etc
• Access somebody else’s recodes/notes/metadata
• Identify commonly used recodes (& use them..!)


What's a MIR?

'Minority Information Resource'.
o This is our own terminology. By a MIR, we mean any piece of information which supplies systematic data on a minority unit group (MUG) classification. We've used this term to be deliberately similar to the phrase 'Occupational Information Resources' that we used on GEODE.
    E.g. summary statistical data about the categories from and documentation or information
    E.g. recodings which have been used in a particular study
o Social scientists are not in general aware of the existence of MIRs (cf. wide use of popular Occupational Information Resources). In GEMDE we seek to publicise little-known resources and promote their uptake: we argue that better communication and dissemination of MIRs is in fact an important step towards better scientific practice of replication and standardisation of research.

In our terms, every MIR necessarily links to a MUG (but not every MUG has a MIR).
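
The relationship between the two terms can be sketched as a small data model in which a MIR cannot be created without naming its MUG, while a MUG may exist with no MIRs attached. The class and field names are hypothetical and are not taken from the GEMDE implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MUG:
    """A minority unit group classification, e.g. a survey's ethnicity coding."""
    name: str
    categories: list
    # MIRs that describe this classification; may legitimately stay empty.
    mirs: list = field(default_factory=list, repr=False, compare=False)

@dataclass
class MIR:
    """A minority information resource; it must link to exactly one MUG."""
    description: str
    mug: MUG

    def __post_init__(self):
        # Register this resource with the classification it describes.
        self.mug.mirs.append(self)

# Hypothetical classifications and one resource.
survey_a = MUG("survey A ethnic group", ["white", "indian", "chinese"])
survey_b = MUG("survey B ethnic group", ["group 1", "group 2"])
recode = MIR("recode to three broad groups used in study X", survey_a)

print(recode.mug.name)                         # every MIR links to a MUG
print(len(survey_a.mirs), len(survey_b.mirs))  # not every MUG has a MIR
```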


As well as file/info. retrieval and depositing functions, GEMDE also permits some bespoke data analysis


Summary: Dealing with key variables in social research

Plurality of variable options makes a scientific case for derivation, comparison & documentation

+ Relevance of scaling categorical data

DAMES’ GESDE services for storing metadata and facilitating variable derivation, documentation

Send your metadata into GESDE..!

See the practical session…

Various other projects also exist supporting variable analysis
• MethodBox (www.methodbox.org) and ADLS (www.adls.ac.uk)
• e-Stat project’s e-Books (www.bristol.ac.uk/cmm/research/estat/)


References

Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.

Burgess, R. G. (Ed.). (1986). Key Variables in Social Investigation. London: Routledge.

Charles, M., & Grusky, D. B. (2004). Occupational Ghettos: The Worldwide Segregation of Women and Men. Stanford: Stanford University Press.

Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.

Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers.

Kogan Page Editorial Staff. (2010). British Qualifications 2010: A Complete Guide to Professional, Vocational and Academic Qualifications in the UK. London: Kogan Page.

Lambert, P. S., & Bihagen, E. (2012, under review). Concepts and Measures in Occupation-based Social Classifications.

Lambert, P. S. (2012). Comparative scaling of educational categories by homogamy – Analysis of UK data from the BHPS. Stirling: University of Stirling, Technical paper 2012-1 of the DAMES Node, Data Management through e-Social Science, www.dames.org.uk/publications.html.

Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester: UK Data Archive [distributor], SN: 5666.

Mohler, P. P., Pennell, B.-E., & Hubbard, F. (2008). Survey Documentation: Toward professional knowledge management in sample surveys. In E. De Leeuw, J. Hox & D. A. Dillman (Eds.), International Handbook of Survey Methodology (pp. 403-420). Hove: Psychology Press.

Prandy, K., & Jones, F. L. (2001). An international comparative analysis of marriage patterns and social stratification. International Journal of Sociology and Social Policy, 21, 165-183.

Raftery, A. E. (2001). Statistics in Sociology, 1950-2000: A selective review. Sociological Methodology, 31, 1-46.

Rose, D., & Harrison, E. (Eds.). (2010). Social Class in Europe: An Introduction to the European Socio-economic Classification. London: Routledge.

Schneider, S. L. (2010). Nominal comparability is not enough: (In-)Equivalence of construct validity of cross-national measures of educational attainment in the European Social Survey. Research in Social Stratification and Mobility.

Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05.

Stacey, M. (Ed.). (1969). Comparability in Social Research. London: Heinemann (on behalf of the British Sociological Association).

Treiman, D. J. (2007). The Legacy of Apartheid: Racial Inequalities in the New South Africa. In A. F. Heath & S. Y. Cheung (Eds.), Unequal Chances: Ethnic Minorities in Western Labour Markets. Oxford: Oxford University Press, for the British Academy.

Wright, E. O. (1997). Class Counts: Comparative Studies in Class Analysis. Cambridge: Cambridge University Press.