Disclosure Analysis: What do RDC Analysts do? Research Data Centre Program, Statistics Canada James Chowhan Ontario DLI Training, Queen's University 06-04-04


Page 1: Disclosure Analysis: What do RDC Analysts do?

Disclosure Analysis: What do RDC Analysts do?

Research Data Centre Program, Statistics Canada

James Chowhan Ontario DLI Training, Queen's University

06-04-04

Page 2: Disclosure Analysis: What do RDC Analysts do?

Note:

The following slides are not intended for use as documentation of disclosure risk control and practices.

Page 3: Disclosure Analysis: What do RDC Analysts do?

Outline

What do Analysts do?
» Role in general
» Disclosure

Page 4: Disclosure Analysis: What do RDC Analysts do?

Role in General

Four Main Tasks:
» Administration of Centre
» Research Activities
» Liaisons
» Disclosure Risk Assessment

None of these tasks are mutually exclusive.

Page 5: Disclosure Analysis: What do RDC Analysts do?

Centre Administration

Client administration
» Contract management, creating a culture of confidentiality

Data Administration (managing data sets)
» STC micro-data and other data sources

Computer Network Administration
» Setting up new users, archiving, back-ups, etc.

Physical security maintenance

Page 6: Disclosure Analysis: What do RDC Analysts do?

Research Activities

Proposing, defining and carrying out research projects as an individual or as a part of a team

Contributions to STC flagship publications
» e.g. Canadian Social Trends, Canadian Economic Observer, Health Reports, Juristat, Perspectives on Labour and Income

RDC Information and Technical Bulletin
» Forum for current and prospective RDC users to exchange practical information and techniques for analyzing datasets available at the RDCs

Page 7: Disclosure Analysis: What do RDC Analysts do?

Liaisons

Liaise:

with DLI
» referrals and promotion

with researchers
» consult on proposals & projects

with STC and SMA
» consultations with methodologists on content and methods

Page 8: Disclosure Analysis: What do RDC Analysts do?

Disclosure

Disclosure Analysis
Types of Data
Overview of data confidentiality
Different types of disclosure and output
Some examples
Facing the challenge

Page 9: Disclosure Analysis: What do RDC Analysts do?

What is Disclosure Analysis?

Disclosure Analysis concerns assessing the risk related to the attribution of information to a respondent, whether the respondent is an individual or an organization.

Page 10: Disclosure Analysis: What do RDC Analysts do?

Types of microdata

Analytical (confidential) database with direct identifiers removed

» Direct access – authorized employee/deemed employee only (RDC)

» Indirect data access (Remote Access services/Remote Data Access services) - screening

Data Reduction – e.g. PUMF

Page 11: Disclosure Analysis: What do RDC Analysts do?

Public Use Microdata File (PUMF)
What are some of the differences?

Files of anonymous individual records
Created for public research purposes
Follows Statistics Canada’s Policy on Microdata Release
Expect some forms of data reduction and suppression
Expect suppression of sample design information (cluster, stratification, etc.)

Page 12: Disclosure Analysis: What do RDC Analysts do?

PUMF disclosure risk control

Suppress some indirect identifiers (e.g. small geographical code, race details, etc.)

Avoid unique combinations of indirect identifiers that can disclose a response unit (such as gender, age, occupation, chronic conditions, religion, etc.)

Perform univariate analyses and look for outliers

Sometimes maximum/minimum values are capped

And more…
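Two of the controls listed above, checking for unique combinations of indirect identifiers and capping extreme values, can be sketched roughly as follows. This is only an illustration, not Statistics Canada's procedure: the records, the field names, and the cap of 100,000 are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"sex": "F", "age_group": "30-39", "occupation": "nurse", "income": 52000},
    {"sex": "F", "age_group": "30-39", "occupation": "nurse", "income": 48000},
    {"sex": "M", "age_group": "60-69", "occupation": "judge", "income": 310000},
    {"sex": "M", "age_group": "30-39", "occupation": "clerk", "income": 39000},
    {"sex": "M", "age_group": "30-39", "occupation": "clerk", "income": 41000},
]

INDIRECT_IDENTIFIERS = ("sex", "age_group", "occupation")
INCOME_CAP = 100000  # assumed top-code value, chosen for illustration


def unique_combinations(rows, keys):
    """Return identifier combinations held by exactly one record."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


def top_code(rows, field, cap):
    """Return copies of the rows with `field` capped at `cap`."""
    return [{**row, field: min(row[field], cap)} for row in rows]


risky = unique_combinations(records, INDIRECT_IDENTIFIERS)
capped = top_code(records, "income", INCOME_CAP)

print(risky)                                  # combinations held by one record
print(max(row["income"] for row in capped))   # largest income after capping
```

In this toy file the single judge is the only unique combination, and the same record is the one affected by the cap, which is typical of how outliers and unique identifier combinations tend to coincide.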

Page 13: Disclosure Analysis: What do RDC Analysts do?

Why is keeping data confidential so important?

Retain and Respect Public Trust
Most household/population surveys do not have mandatory participation
Respondents volunteer their time and information
Respondents trust Statistics Canada to ensure their privacy and the confidentiality of their information
To ensure future data collection

Page 14: Disclosure Analysis: What do RDC Analysts do?

Confidentiality and Protection

Under the Statistics Act, Statistics Canada must protect the confidentiality of respondents’ data and identity.

Protections:
Physical protection of the data storage area
Protection of the computer systems
Enforcement of data releasers’ and users’ responsibilities to protect respondent confidentiality
Disclosure analysis on output that leaves the restricted data storage area

Page 15: Disclosure Analysis: What do RDC Analysts do?

What types of Information?

Direct Identifiers (name, address, health number, etc.) that uniquely identify a respondent. These are all stripped from released data files.

Indirect Identifiers refer to variables such as age, marital status, occupation, ethnicity, postal code, type of business, etc. When combined, they could be used to identify a respondent.

Sensitive variables refer to information or characteristics relating to a respondent’s private life or business which are usually unknown to others (income, illness, behaviour etc.).

Page 16: Disclosure Analysis: What do RDC Analysts do?

The concern is…

Combining indirect identifiers with sensitive variables poses a disclosure risk, but…

It is usually what researchers like to do: relate specific characteristics of some response groups to some specific activities/characteristics, and examine how/why they are related

Control method: restricted access, data reduction, disclosure analysis …

Page 17: Disclosure Analysis: What do RDC Analysts do?

Identity Disclosure

Identity Disclosure - When a respondent can be identified from the released data.

Combine identifier with sensitive variables

Examples:
Recognition of well-known characteristic by others (e.g. from small well-defined sample)
Self-recognition (e.g., respondent identifies themselves in released output)

Page 18: Disclosure Analysis: What do RDC Analysts do?

Attribute Disclosure

Attribute Disclosure - When confidential information is revealed and can be attributed to an individual or a group.

Such as, all persons with characteristic x have characteristic y

Examples:

People in occupation W make $50,000-60,000/year…
100% of the respondents of age W in area X reported that they experimented with …

Page 19: Disclosure Analysis: What do RDC Analysts do?

Residual Disclosure

Residual disclosure - when confidential information is disclosed by combining previously released output and information.

Extra care is needed where risk of residual disclosure is high, such as

Subsequent cycles of longitudinal data files (e.g. NLSCY, NPHS, etc.)
Sample from dependent surveys (e.g. SLID and LFS)
Research projects using the same data file
Overlapping small geographical area (e.g. Health Region and Economic Region)

Page 20: Disclosure Analysis: What do RDC Analysts do?

Related Outputs (and residual disclosure)

If PUMF as well as analytical outputs using confidential data are released for the same survey, the combined published results should not disclose sensitive information about individual respondents that was suppressed in the PUMF.

That is, from the reported results, it should not be possible to infer information that allows the identification of a PUMF respondent.

Page 21: Disclosure Analysis: What do RDC Analysts do?

Types of outputs (two main types)

Multivariate Analysis (e.g. inferential statistics/model output)
» Model parameters such as regression coefficients, etc.
» Hypothesis test results such as standard errors, p-values, t-statistics, etc.

Descriptive studies (e.g. table output)
» Frequencies, percentiles, cross-tabulations, correlation matrices, etc.

Page 22: Disclosure Analysis: What do RDC Analysts do?

To lower disclosure risk

General rules we follow for household sample surveys:
Do not report statistics or table cells with a small number of respondents (e.g. fewer than 5 respondents)
No anecdotal information may be given about specific respondents
‘Zero’ and ‘Full’ cell restriction
Min. and Max. value restriction
Saturated models, covariance/correlation matrices treated like underlying tables
And more…

Page 23: Disclosure Analysis: What do RDC Analysts do?

Some examples…

Page 24: Disclosure Analysis: What do RDC Analysts do?

Low frequency cells

(F, 0) is a low-frequency cell.

Solution?

Collapse columns ‘M’ and ‘F’ = column ‘total’

Collapse rows ‘1’ and ‘0’ = row ‘total’

Report either column ‘M’ or row ‘1’, but not along with the ‘total’

         M    F   total
1       34   14      48
0       15    2      17
total   49   16      65

         M    F   total
1       34   14      48
0       15    X      17
total   49   16      65
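The example above can be checked mechanically. The sketch below flags interior cells under a threshold and then shows why masking (F, 0) alone is not enough while the totals stay published. The threshold of 5 comes from the "fewer than 5 respondents" example; the function name and data layout are assumptions made for illustration.

```python
MIN_CELL_COUNT = 5  # threshold from the "fewer than 5 respondents" example

# Interior cells of the table above: rows are 1 and 0, columns are M and F.
table = {
    "1": {"M": 34, "F": 14},
    "0": {"M": 15, "F": 2},
}


def low_frequency_cells(cells, threshold=MIN_CELL_COUNT):
    """List (column, row) pairs whose count is below the threshold."""
    return [
        (col, row)
        for row, counts in cells.items()
        for col, n in counts.items()
        if n < threshold
    ]


flagged = low_frequency_cells(table)

# With only (F, 0) masked but the margins still published, the masked
# value follows from simple subtraction: row total minus the M cell.
row_total_0 = 17
recovered = row_total_0 - table["0"]["M"]

print(flagged)    # [('F', '0')]
print(recovered)  # 2
```

This is why the suggested solutions collapse categories or drop the totals instead of replacing a single cell with X.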

Page 25: Disclosure Analysis: What do RDC Analysts do?

Frequency distributions

Frequency curve, e.g.: user wishes to release the value of the observation at the 99th percentile

* child 1: family 1

child 2: family 1

child 3: family 2

child 4: family 2

child 5: family 3….

If < 5 respondents are above the 99th percentile, there is a problem. One solution is to describe the distribution using the 95th percentile.

* If the survey is multilevel (NLSCY), then the 5 or more respondents from level 1 (child) must come from at least 3 different units from level 2 (household).
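A rough sketch of the check described above: count the respondents above the cutoff the user wants to release, and for a multilevel survey also count the distinct level-2 units they come from. The thresholds of 5 and 3 are taken from the slide; the function name, the data, and the cutoff value are hypothetical.

```python
MIN_RESPONDENTS = 5   # from the "< 5 respondents" rule on the slide
MIN_LEVEL2_UNITS = 3  # from the multilevel footnote (e.g. households)


def tail_is_releasable(observations, cutoff,
                       min_respondents=MIN_RESPONDENTS,
                       min_units=MIN_LEVEL2_UNITS):
    """Check the respondents strictly above `cutoff`.

    `observations` is a list of (value, level2_unit_id) pairs, and
    `cutoff` is the percentile value the user wants to release.
    """
    tail_units = [unit for value, unit in observations if value > cutoff]
    return (len(tail_units) >= min_respondents
            and len(set(tail_units)) >= min_units)


# Hypothetical children around a cutoff of 100, with family ids.
spread_out = [(120, "family 1"), (130, "family 1"), (140, "family 2"),
              (150, "family 2"), (160, "family 3"), (90, "family 4")]
clustered = [(120, "family 1"), (130, "family 1"), (140, "family 1"),
             (150, "family 2"), (160, "family 2"), (90, "family 3")]

print(tail_is_releasable(spread_out, cutoff=100))  # True: 5 children, 3 families
print(tail_is_releasable(clustered, cutoff=100))   # False: 5 children, 2 families
```

When the check fails, the slide's suggestion is to move to a lower percentile (such as the 95th) so that more respondents sit above the reported value.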

Page 26: Disclosure Analysis: What do RDC Analysts do?

‘Zero’ and ‘Full’ cell

(F, 1) is a full cell
(F, 0) is a non-structural zero cell
Both could pose a confidentiality problem

(Married, age <12) is a structural zero cell
» Not a data confidentiality problem
» No one is expected to be in this category

         M    F   total
1       52   64     116
0       13    0      13
total   65   64     129

age     married   single   total
<12           0       40      40
13-20         5       35      40
>20          32        8      40
total        37       83     120
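The two tables above can be run through a small check. This sketch assumes one particular reading of "full": a cell is full when it holds every respondent in its column. Whether rows should be checked the same way is a judgement it leaves out. Structural zeros have to be declared by the analyst, since counts alone cannot distinguish them; all names here are illustrative.

```python
def column_totals(cells):
    """Sum each column over all rows."""
    totals = {}
    for counts in cells.values():
        for col, n in counts.items():
            totals[col] = totals.get(col, 0) + n
    return totals


def zero_and_full_cells(cells, structural_zeros=()):
    """Return (non-structural zero cells, full cells) as (column, row) pairs."""
    totals = column_totals(cells)
    zeros, full = [], []
    for row, counts in cells.items():
        for col, n in counts.items():
            if n == 0 and (col, row) not in structural_zeros:
                zeros.append((col, row))
            elif n > 0 and n == totals[col]:
                full.append((col, row))
    return zeros, full


sex_table = {"1": {"M": 52, "F": 64}, "0": {"M": 13, "F": 0}}
age_table = {
    "<12": {"married": 0, "single": 40},
    "13-20": {"married": 5, "single": 35},
    ">20": {"married": 32, "single": 8},
}

print(zero_and_full_cells(sex_table))
# ([('F', '0')], [('F', '1')])
print(zero_and_full_cells(age_table, structural_zeros={("married", "<12")}))
# ([], [])
```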

Page 27: Disclosure Analysis: What do RDC Analysts do?

Implied tables - residual disclosure

Implied tables are tables produced by subtracting results from one or more published tables from another published table

In this example, ‘non-married’ individuals can easily be calculated

Select if Married = 1

Yes No
1 2013 40
2 205 35
3 132 8
2350 83

Select all cases

Yes No
1 2020 41
2 209 52
3 430 16
2659 109
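The subtraction behind an implied table can be sketched as below. The counts are hypothetical and chosen only for illustration; they are not the figures from the slide. The threshold of 5 reuses the "fewer than 5 respondents" example, and the function name is an assumption.

```python
MIN_CELL_COUNT = 5  # illustrative threshold, as in "fewer than 5 respondents"

# Hypothetical published counts (rows 1-3, columns Yes/No).
all_cases = {
    "1": {"Yes": 25, "No": 45},
    "2": {"Yes": 12, "No": 50},
    "3": {"Yes": 40, "No": 18},
}
married_only = {
    "1": {"Yes": 13, "No": 40},
    "2": {"Yes": 9, "No": 35},
    "3": {"Yes": 32, "No": 8},
}


def implied_table(full, subset):
    """Subtract a published subset table from the full table, cell by cell."""
    return {
        row: {col: full[row][col] - subset[row][col] for col in full[row]}
        for row in full
    }


non_married = implied_table(all_cases, married_only)
small_cells = [
    (row, col)
    for row, counts in non_married.items()
    for col, n in counts.items()
    if n < MIN_CELL_COUNT
]

print(non_married["2"])  # {'Yes': 3, 'No': 15}
print(small_cells)       # [('2', 'Yes')]
```

Neither published table has a small cell on its own, yet the implied table does, which is why related outputs have to be vetted together.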

Page 28: Disclosure Analysis: What do RDC Analysts do?

When reporting information…

Writing a report is no different from working with table output. Avoid statements such as:

“… reported incomes ranging from $2,498 to $579,789.”

If necessary, give general indications (e.g. “no income was above $600,000”).

“… all respondents of age 16 reported experimenting with drugs.”

This is equivalent to a full cell situation.

Page 29: Disclosure Analysis: What do RDC Analysts do?

Facing Challenges

No single control of all the releases
» Remote Access, PUMFs, RDCs, survey data publications, etc.
Potential residual disclosure
Can residual disclosure be totally accounted for?

Page 30: Disclosure Analysis: What do RDC Analysts do?

The End

James Chowhan

E-mail: [email protected]

Phone: (905) 525-9140 x.27967

Web-sites:

www.statcan.ca/english/rdc/index.htm

http://socserv.socsci.mcmaster.ca/rdc/

Page 31: Disclosure Analysis: What do RDC Analysts do?

Ancillary Slides