Accessing Statistics Canada Data and Resources Hugh McCague Valerie Preston Walter Giesbrecht Sara Tumpane


Accessing Statistics Canada Data and Resources

Hugh McCague

Valerie Preston

Walter Giesbrecht

Sara Tumpane


Outline

• Survey Terminology

• Research Data Centre (RDC)

• RDC versus Public Use Microdata Files (PUMF)

• Accessing the RDC

• Statistics Canada Surveys and Data

• Statistical Software

• Research Opportunities

• Statistical Consulting Service

• Resources


Some Survey Terminology


• Population

• Elements

• Sample: Simple Random Sample, Probability Sample

• Response Rate

• Weights: Simple Weights
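As a minimal sketch of how the terms on this slide fit together, the Python below computes a response rate, a simple weight for a simple random sample (population size divided by sample size), and a weighted mean. The function names and all the numbers are hypothetical illustrations, not values or procedures from any Statistics Canada survey.

```python
def response_rate(completed, eligible):
    """Share of eligible sampled units that completed the survey."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return completed / eligible


def simple_weight(population_size, sample_size):
    """Simple weight under simple random sampling: each respondent
    stands in for population_size / sample_size population elements."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return population_size / sample_size


def weighted_mean(values, weights):
    """Weighted estimate of a population mean."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical numbers, for illustration only.
rate = response_rate(completed=750, eligible=1000)
weight = simple_weight(population_size=5000, sample_size=100)
mean = weighted_mean([10, 20, 30], [1, 1, 2])
```

With equal weights the weighted mean reduces to the ordinary sample mean. The weights only change the estimate when respondents represent different numbers of population elements.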


Some Survey Terminology

• Demographics

• Strata

• Clusters (primary sampling units, PSUs)

• Complex Sample

• Complex Weights, Bootstrap and Jackknife Replicate Weights
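Replicate weights are easier to picture with a small computation. The sketch below recomputes a weighted mean once per set of bootstrap replicate weights and averages the squared deviations from the full-sample estimate. The plain 1/B divisor is an assumption made for illustration. Some surveys call for an adjustment factor, so the survey's own user guide should be followed. The function names and data are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values, main_weights, replicate_weights):
    """Variance of the weighted mean estimated from replicate weights.

    replicate_weights is a list of B weight vectors, one per bootstrap
    replicate. Assumes the plain divisor 1/B (an illustrative choice).
    """
    if not replicate_weights:
        raise ValueError("at least one set of replicate weights is required")
    full_estimate = weighted_mean(values, main_weights)
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]
    squared_deviations = [(r - full_estimate) ** 2 for r in replicate_estimates]
    return sum(squared_deviations) / len(replicate_estimates)


# Hypothetical toy data: four respondents and two replicates.
values = [1, 2, 3, 4]
main_weights = [1, 1, 1, 1]
replicate_weights = [[2, 0, 1, 1], [0, 2, 1, 1]]
variance = bootstrap_variance(values, main_weights, replicate_weights)
standard_error = variance ** 0.5
```

Real master files typically supply hundreds of replicate weight columns. Survey procedures in the statistical packages handle this bookkeeping once the weight variables are declared.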


Some Survey Terminology

• Cross-sectional data

• Longitudinal data: periods, waves, cycles, trajectory, life course

• Attrition: attrition rate

• Helpful reference: Ornstein, Michael. A Companion to Survey Research. London; Thousand Oaks, CA: SAGE, 2013.
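The attrition rate can be illustrated with a short calculation. This sketch measures cumulative attrition at each wave relative to the first wave, which is one common convention and is assumed here. The respondent counts are hypothetical.

```python
def cumulative_attrition(respondents_by_wave):
    """Share of the wave-1 sample lost by each wave (wave 1 itself is 0)."""
    if not respondents_by_wave or respondents_by_wave[0] <= 0:
        raise ValueError("the first wave must have a positive respondent count")
    baseline = respondents_by_wave[0]
    return [(baseline - n) / baseline for n in respondents_by_wave]


# Hypothetical panel followed over three waves.
rates = cumulative_attrition([1000, 900, 810])
```

Attrition can also be reported wave over wave, relative to the previous wave rather than the first, so it is worth checking which definition a survey's documentation uses.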


Research Data Centre (RDC)

• Access to Statistics Canada data and statistical software

• Microdata & administrative data

• For York students and faculty, access is free

• A “secure” environment

• Researchers are “deemed employees” of Statistics Canada

• Must work in RDC

• CRDCN Network


The CRDCN Network


York RDC

• 282 York Lanes

• Staffed by:

o Analyst Sara Tumpane ([email protected])

o Assistant Theresa Kim ([email protected])

• 8 workstations

• Open 3-3.5 days per week

• http://www.isr.yorku.ca/rdc/


Before you apply to the RDC…

• Consider your options

• Is what you need available in a more readily accessible source (either a PUMF or an aggregate file)?


RDC or PUMF?

Confidential Microdata in Research Data Centres

Characteristics:

o Contains most of the original information collected during the survey

o Continuous variables are accessible

o Longitudinal identifiers provided

o Contains bootstrap weights used for calculating exact variance

Access is appropriate when:

o Sensitive variables not provided in PUMF

o A PUMF does not exist

o Longitudinal data is necessary

o Analytical work is complex in nature

Public Use Microdata Files accessed online

Characteristics:

o Manipulated by aggregating, capping, or deleting variables that could be “identifiers”; survey respondents cannot be identified

o Many continuous variables transformed into categorical variables

o Longitudinal identifiers stripped

Access is appropriate when:

o Immediate data access is required

o Analysis is for a course paper or equivalent

o Data exploration
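To make the PUMF characteristics above concrete, the sketch below shows what capping a continuous variable and collapsing it into categories can look like. It is only an illustration of the general idea, not Statistics Canada's disclosure-control procedure. The cap, cut points, and labels are hypothetical.

```python
from bisect import bisect_right


def top_code(value, cap):
    """Cap a continuous value so that extreme cases cannot be singled out."""
    return min(value, cap)


def categorize(value, cut_points, labels):
    """Map a continuous value to a category label.

    cut_points must be sorted in ascending order; a value equal to a cut
    point falls into the higher category.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# Hypothetical income bands, for illustration only.
CUT_POINTS = [20000, 40000, 60000]
LABELS = ["under 20,000", "20,000 to 39,999", "40,000 to 59,999", "60,000 or more"]

capped_income = top_code(250000, cap=150000)
income_band = categorize(45000, CUT_POINTS, LABELS)
```

Once a variable has been banded this way the original values cannot be recovered, which is one reason an analysis that needs continuous variables may require master-file access.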


CCHS 2012 Example 1

PUMF

• 1381 variables

• Sources of personal income

o Employment inc.

o EI/Worker's comp

o Senior benefits

o Other

Master File

• 1815 variables

• Sources of personal income

o wages and salaries

o income from self-employment

o dividends and interest

o employment insurance

o worker's compensation

o CPP or QPP

o job related retirement pensions

o RRSP/RRIF

o OAS and GIS

o social assistance/welfare

o child tax benefits

o child support

o alimony

o other

o none


CCHS 2012 Example 2

PUMF

• Geography

o Province of residence of respondent - (G)

o Health Region - (G)

o B.C. Health Authority (BCHA) - (D)

Master File

• Geography

o Province of residence of respondent

o Postal code - (D)

o Health region of residence of respondent - (D)

o Sub-health region (Québec only) - (D)

o Nova Scotia district health authority

o British Columbia local health authority - (D)

o Regional health authority (RHA) - Alberta - (D)

o British Columbia health authority - (D)

o Local health integrated networks - Ontario - (D)

o 2006 census dissemination area

o Federal electoral district - (D)

o Census subdivision - (D)

o Census division - (D)

o Statistical area classification type - (D)

o 2006 Census metropolitan area (CMA)

o Health region peer group

o Urban and rural areas

o Urban and rural areas - 2 levels - (D)

o Subzones for Alberta

o Manitoba health authority - (D)


Accessing PUMFs & master file metadata

• Statistics Canada Nesstar data portal

o metadata only, for PUMFs and master files

o http://www62.statcan.ca/webview/

• YUL: Data & Statistics library guide

o http://researchguides.library.yorku.ca/data

• <odesi> (OCUL)

o http://www.library.yorku.ca/e/resolver/id/1165738


http://www.andertoons.com/data/cartoon/6543/things-good-stuff-ok-i-reiterate-request-for-specific-data


How to apply to an RDC and available datasets

• RDC Application Pages

• SSHRC Website

• Data available in the RDCs


Accessing the RDC

Action | Timeline | Notes
------ | -------- | -----
Apply through the SSHRC website | 1-2 hours | Provide list of academic contributions; 5-10 page project proposal
Evaluation of the proposal | 2-4 weeks | Approval based on relevance of methods and data, and demonstrated need for microdata
Security screening process | 1-3 weeks for approval |
Sign Microdata Research Contract | 1-3 weeks for approval |


Project Proposal

• The project proposal is a maximum of ten pages and includes the following elements:

o Title of the Project

o Rationale and objectives of the study

o Proposed data analysis and software requirements

o Data requirements

o Expected project start and end dates

o Expected products

o References


Data at the RDC

• Canadian Community Health Survey (CCHS): 2001-2014

o Health status, health care utilization, and health determinants

o Annual Component (starting in 2001, N ~ 130,000)

o Mental Health (2002, 2012), N ~ 37,000

o Nutrition (2004), N ~ 35,000

o Healthy Aging (2008-2009), N ~ 52,000 (sample 45+)

• Canadian Health Measures Survey (CHMS): 2011, 2012, 2013

o Survey and administrative data

• Hate Crime Data (Pilot): 2010-2012

o Characteristics of hate-motivated criminal incidents, victims, and accused persons


Data (continued)

• General Social Survey (GSS): 1985-2014

o Health (1985, 1991)

o Time Use (1986, 1992, 1998, 2005, 2010)

o Victimization (1988, 1993, 1999, 2004, 2009, 2014)

o Education, Work and Retirement (1989, 1994)

o Family (1990, 1995, 2001, 2006, 2011)

o Caregiving and Care Receiving (1996, 2002, 2007, 2012)

o Access to and Use of Information Technology (2000)

o Social Networks/Social Identity (2003, 2008, 2013)

o Giving, Volunteering and Participating (2013)

• National Longitudinal Survey of Children and Youth (NLSCY): 8 cycles

o Development and well-being: birth to early adulthood

o Follow-ups every two years to age 25


Data by Themes

• Health and Health Care

o National Population Health Survey (NPHS)

o Participation and Activity Limitation Survey (PALS)

o Canadian Tobacco, Alcohol and Drugs Survey (CTADS)

• Occupations and Organizations

o Workplace and Employee Survey (WES)

o Survey of Labour and Income Dynamics (SLID)

o Census

• Education

o Youth in Transition Survey (YITS)

o National Graduates Survey (NGS)

• Race and Ethnicity

o Aboriginal Peoples Survey (APS)

o Longitudinal Survey of Immigrants to Canada (LSIC)

o Ethnic Diversity Survey (EDS)


Pilot Data

• Canadian Cancer Registry (CCR)

• Vital Statistics

• Uniform Crime Reporting

• Homicide Survey

• Hate Crime Data

• Ministry of Community and Social Services (MCSS)

• Citizenship and Immigration Canada (CIC)


Which Statistical Software to use at the York RDC?

Features to Consider

• SPSS 23

• SAS 9.4

• Stata 13

• R 3.0.3

Statistical Software Resources: Institute for Digital Research and Education (IDRE), UCLA

http://www.ats.ucla.edu/stat/


An Example of a Psychology Research Project at the York RDC

• Ames, M. E., Rawana J. S., Gentile P., and Morgan A. S. “The protective role of optimism and self-esteem on depressive symptom pathways among Canadian Aboriginal youth.” Journal of Youth and Adolescence 44.1 (2013): 142-154.

• National Longitudinal Survey of Children and Youth (NLSCY)

• Complex Sample Design, Post-Stratification

• Longitudinal Linear Mixed Models with Mediation


A Few of Many Quantitative Methods Research Opportunities

• Extending methods to Complex Sample Designs

• Proper methods for the Structural Equation Modeling of Complex Survey Data are strongly needed (Bollen et al., 2013)

• R package lavaan.survey has started to address this issue (Oberski, 2014)

• Item Response Theory with Complex Survey Data needs much more development (Cyr and Davies, 2005)


Statistical Consulting Service (SCS)

• Statistical consulting is provided by a group of York faculty and graduate students, together with staff at the Institute for Social Research (ISR).

• Usually, no fee for York faculty and student researchers

• Online appointment scheduler


http://truthfacts.com/truthfacts/2014/04/09


Statistical Consulting Service (SCS)

• ISR/SCS Short Courses and Spring Seminar Series on data analysis, qualitative research methods, survey methods, and related software

• More details: http://www.isr.yorku.ca/centres/scs/


Contact Information and Resources

• http://www.isr.yorku.ca/qmforum
