Disclosure Control for Tables of Frequency Counts using Administrative Justice Data

Sarah Franklin, October 30th, 2013

Page 1

Sarah Franklin

October 30th, 2013

Disclosure Control for Tables of Frequency Counts using Administrative Justice Data

Page 2: Sarah Franklin October 30 th , 2013

Overview of presentation

In 2013, the Canadian Centre for Justice Statistics (CCJS) placed two administrative crime surveys in the Research Data Centres (RDCs)

Methodologists and subject matter experts developed a scoring approach for tables of frequency counts to identify ‘safe’ tables

Each variable in a dataset is assigned a sensitivity score. A table’s overall score is the sum of the variable scores. If the score is below a given threshold, the table is safe.
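In symbols, the rule above can be written as follows. The notation is introduced here for illustration and does not come from the slides: $s_v$ is the sensitivity score of variable $v$, $S(T)$ is the score of table $T$, and $\tau$ is assumed to be the highest score at which a table still passes.

```latex
S(T) = \sum_{v \in T} s_v ,
\qquad
T \text{ is safe} \iff S(T) \le \tau
```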

Page 3

Uniform Crime Reporting Incident-based Survey (UCR) and the Homicide Survey

Administrative datasets
Mandatory reporting by all police services
Criminal incidents substantiated by the police

UCR is a sample of crime data
  • not all crime comes to the attention of the police
  • over 2 million incidents of crime annually

Homicide Survey data more sensitive than UCR
  • all homicides; 543 homicides in 2012

Information on incident, victim(s), accused(s)

Page 4

UCR, Homicide variables available to researchers

most serious violation for the incident of crime (e.g., homicide, robbery, mischief)
geography (region, province, CMA)
location (e.g., residential home, convenience store)
weapon causing injury (e.g., handgun, knife)
relationship between victim and accused
age and sex of victim and/or accused
clearance status (accused charged vs cleared otherwise)

Page 5

Publicly available STC police-reported crime data

UCR and homicide data available to all Canadians:
  • CANSIM tables (very aggregate)
  • tables and graphs appearing in Juristat articles
  • custom tabulations upon request

Page 6

Homicides by CMA

                         2011                        2012
CMA               Victims  Rate per 100,000   Victims  Rate per 100,000
Edmonton               50               4.2        33               2.7
Toronto                86               1.5        80               1.4
St. John’s              4               2.1         0               0
Montréal               54               1.4        47               1.2
Ottawa                 11               1.2         7               0.7
Kingston                0               0           0               0
Saguenay                1               0.7         4               2.7
Trois-Rivières          1               0.7         2               1.3
Sherbrooke              1               0.5         1               0.5
Moncton                 0               0           0               0
Québec                  3               0.4         6               0.8
Brantford               2               1.4         0               0
Canada                598               1.7       543               1.6

Page 7

2009 RDC Pilot - Homicide Survey

Homicide Survey was available through RDCs
Results positive: 4 proposals submitted and 3 research reports completed
Researchers commented on the ease of use of the data file, the documentation and the wealth of data/information
Researchers noted that vetting of data tables took too long
RDC analysts noted that the data disclosure rules were difficult to implement and required additional work

Page 8

Disclosure Issues: What are we concerned about?

Statistics Act, paragraph 17(1)(b): No person […] shall disclose […] any information obtained under this Act in such a manner that it is possible from the disclosure to relate the particulars obtained from any individual return to any identifiable individual person, business or organization.

Main disclosure issues:
  • Identity disclosure: can identify an individual
  • Attribute disclosure: learn something new
      Group attribute disclosure: learn something about a group
  • Residual disclosure: disclosure by combining results

Page 9

RDC disclosure control rules for tabular administrative justice data

Scoring approach developed by the Institut de la Statistique du Québec and used for all STC administrative datasets in the RDCs
  • assign a sensitivity score to each variable
  • table’s score = sum of variables’ scores
  • if the table score is greater than a threshold value, the table cannot be released; either
      go back and use more aggregated variables with lower scores, or
      perform controlled rounding

Page 10

Reviewed all variables to appear on the RDC files

Identified variables to be excluded due to:

unique identifiers
  • name of victim/accused, date of birth of victim/accused, fingerprint of accused, incident file identifier

data quality issues
  • aboriginal variable, firearm information (registered, licensed)

too sensitive
  • homicide victim was pregnant, blood alcohol level of homicide victim, person accused of homicide has suspected mental or developmental disorder

Page 11

Aggregated sensitive codes of variables

Incident clearance status (UCR, Homicide Survey)
  • suicide of accused → cleared otherwise

Most serious violation aggregations

Homicide Survey
  • 1st degree murder, 2nd degree murder → murder
  • manslaughter, infanticide → other homicides

UCR
  • sexual violations against children → other sexual assaults
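The recodes listed on this page can be expressed as simple lookup tables. The sketch below is illustrative only: the dictionary and function names are hypothetical, the sample records are made up, and only the code-to-code mappings themselves come from the slide.

```python
from collections import Counter

# Mappings taken from the slide: sensitive detailed code -> aggregated code.
# All names below are hypothetical, chosen for this illustration.
CLEARANCE_RECODE = {"suicide of accused": "cleared otherwise"}
HOMICIDE_VIOLATION_RECODE = {
    "1st degree murder": "murder",
    "2nd degree murder": "murder",
    "manslaughter": "other homicides",
    "infanticide": "other homicides",
}
UCR_VIOLATION_RECODE = {
    "sexual violations against children": "other sexual assaults",
}


def aggregate(code, recode):
    """Return the aggregated code; codes without a recode are left unchanged."""
    return recode.get(code, code)


# Made-up records, used only to show the effect on a frequency count.
records = [
    "1st degree murder",
    "2nd degree murder",
    "manslaughter",
    "infanticide",
    "2nd degree murder",
]
counts = Counter(aggregate(code, HOMICIDE_VIOLATION_RECODE) for code in records)
print(dict(counts))  # {'murder': 3, 'other homicides': 2}
```

Applying the recode before tabulating means the sensitive detailed categories never appear as separate cells in a frequency table.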

Page 12

Scored all UCR variables to appear on the RDC files

0 = not sensitive
  • region=national; sex of victim/accused; vehicle type; target vehicle; motor vehicle recovered; fraud type; property stolen; location of incident; attempted vs completed violation; most serious weapon status

1-7 = sensitive but can be used in a table

8 = sensitive, cannot appear in a table
  • police service id, exact date of incident

Table threshold: ≤7 pass; ≥ 8 fail
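A minimal sketch of how the score bands and the table threshold might be applied. Function and variable names are hypothetical. The scores of 0 and 8 come from the slide; the scores for "province" and "age of victim (detailed)" are placeholders invented for this example, since the slides do not give them.

```python
PASS_THRESHOLD = 7  # from the slide: a table passes at <= 7 and fails at >= 8

VARIABLE_SCORES = {
    "sex of victim": 0,             # from the slide
    "location of incident": 0,      # from the slide
    "police service id": 8,         # from the slide
    "exact date of incident": 8,    # from the slide
    "province": 3,                  # hypothetical placeholder
    "age of victim (detailed)": 4,  # hypothetical placeholder
}


def sensitivity_band(score):
    """Map a variable score to the band named on the slide."""
    if score == 0:
        return "not sensitive"
    if 1 <= score <= 7:
        return "sensitive but can be used in a table"
    if score == 8:
        return "sensitive, cannot appear in a table"
    raise ValueError(f"score outside the 0-8 scale: {score}")


def table_score(variables):
    """A table's score is the sum of its variables' scores."""
    return sum(VARIABLE_SCORES[name] for name in variables)


def table_passes(variables):
    return table_score(variables) <= PASS_THRESHOLD


print(table_passes(["sex of victim", "province", "age of victim (detailed)"]))  # True (score 7)
print(table_passes(["sex of victim", "police service id"]))                     # False (score 8)
```

Because scores are never negative, any table that includes a variable scored 8 fails the threshold automatically, so no separate rule is needed for those variables.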

Page 13

Sensitive variables on the UCR, Homicide surveys

Variables deemed sensitive (score 1-7)
  • geography (region, province, CMA)
  • age of victim/accused (aggregated, detailed)
  • most serious violation (aggregated)
  • most serious weapon (aggregated, detailed)
  • clearance status (aggregated, detailed)
  • level of injury (aggregated, detailed)
  • relationship of victim and accused (aggregated, detailed)

Page 14

Detailed relationship between victim and accused (score=4)

Homicide victim was killed by:

Husband/wife               Separated husband/wife
Common-law husband/wife    Separated common-law husband/wife
Divorced husband/wife      Extra-marital lover
Same-sex spouse            Ex same-sex spouse
Father/mother              Step-father/mother
Son/daughter               Step-son/step-daughter
Sister/brother             Other family
Close friend               Other intimate relationship
Authority figure           Neighbour
Criminal relationship      Business relationship
Stranger                   Casual acquaintance
Unknown                    Other

Page 15

Aggregated relationship between victim and accused (score=3)

Homicide victim was killed by:
  • Family – spouse
  • Family – parent
  • Family – other
  • Other intimate relationship
  • Casual acquaintance
  • Criminal relationship
  • Stranger
  • Other
  • Unknown, n/a
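As a worked illustration of why the aggregated variable matters: the detailed relationship variable scores 4 and the aggregated one scores 3, so switching can bring a table from 8 (fail) down to 7 (pass). In the sketch below the function name and the combined score of the table's other variables are hypothetical; the two relationship scores and the threshold of 7 come from the slides.

```python
PASS_THRESHOLD = 7  # tables pass at <= 7, as stated in the presentation

# Scores given on the slides for the two versions of the relationship variable.
RELATIONSHIP_SCORES = {"detailed": 4, "aggregated": 3}

# Hypothetical combined score of the other variables in the table.
OTHER_VARIABLES_SCORE = 4


def choose_relationship_level(other_score):
    """Return the most detailed level that keeps the table at or below the
    threshold, or None if neither level passes."""
    for level in ("detailed", "aggregated"):
        if other_score + RELATIONSHIP_SCORES[level] <= PASS_THRESHOLD:
            return level
    return None


# 4 + 4 = 8 fails, 4 + 3 = 7 passes
print(choose_relationship_level(OTHER_VARIABLES_SCORE))  # aggregated
```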

Page 16

Factors considered when scoring a variable

Scores, thresholds consistent across surveys
Maximum number of dimensions for RDC tables
  • 8 dimensions for UCR; 3 for Homicide
Homicide data: single year vs 10 year data
Wanted scores to work for all CCJS tables:
  • UCR scores: passed all CANSIM and Juristat tables
  • Homicide scores: passed all CANSIM tables but not all Juristat tables

Page 17

Factors considered when scoring a variable

Principle behind scoring approach:
  • table is safe as long as sensitive characteristics cannot be attributed to a person or a group

Scrutinized tables with scores < 8 for sensitive characteristics revealed through:

Identity disclosure
  • examined cells with counts of 1 or 2

Attribute disclosure
  • examined full cells, zero cells

Page 18

Extract of UCR table with score=7

Sexual violation incidents, victim=female age 25-34, accused=male, Canada, 2011

                  Weapon causing injury
Relationship      Unknown  Physical force  Firearm  Knife  Other   n/a  Total
Friend                  1              31        0      0      1    51     84
Business                1              10        0      0      0    74     86
Criminal                0               4        0      0      0     0      4
Casual                  7              90        0      0      4   190    291
Stranger                0              52        0      0      2   218    273
Step-parent             0               3        0      0      0    10     13
Step-child              0               0        0      0      0     3      3
Other intimate          1               3        0      0      0     8     12
Neighbour               1               2        0      0      0    19     22
Total                  28             417        0      4     18   839  1,306
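The cell checks described on the previous page (counts of 1 or 2, full cells, zero cells) can be sketched against a few rows of the extract above. This is an illustration under stated assumptions, not the vetting procedure itself: the function name is hypothetical, the column order is read off the flattened header, and "full cell" is assumed to mean a non-zero cell that holds its row's entire total.

```python
# Four rows copied from the extract above, without the Total column.
WEAPON_COLUMNS = ["Unknown", "Physical force", "Firearm", "Knife", "Other", "n/a"]
ROWS = {
    "Criminal": [0, 4, 0, 0, 0, 0],
    "Step-child": [0, 0, 0, 0, 0, 3],
    "Other intimate": [1, 3, 0, 0, 0, 8],
    "Neighbour": [1, 2, 0, 0, 0, 19],
}


def flag_cells(rows, columns, predicate):
    """List (row, column, count) for every cell where predicate(count, row_total) holds."""
    return [
        (label, column, count)
        for label, counts in rows.items()
        for column, count in zip(columns, counts)
        if predicate(count, sum(counts))
    ]


# Identity disclosure check: cells with counts of 1 or 2.
small = flag_cells(ROWS, WEAPON_COLUMNS, lambda n, total: n in (1, 2))
# Attribute disclosure checks: full cells (assumed definition) and zero cells.
full = flag_cells(ROWS, WEAPON_COLUMNS, lambda n, total: n > 0 and n == total)
zero = flag_cells(ROWS, WEAPON_COLUMNS, lambda n, total: n == 0)

print(small)      # three cells: two in Neighbour, one in Other intimate
print(full)       # [('Criminal', 'Physical force', 4), ('Step-child', 'n/a', 3)]
print(len(zero))  # 16
```

In this extract the "Criminal" and "Step-child" rows are full cells: every incident in the row falls in a single weapon category, which is the kind of pattern the attribute disclosure check looks for.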

Page 19

Status of UCR, Homicide RDC pilots

UCR:
  Crime data for 2007-2011 available in RDCs
  7 research proposals submitted and accepted
  Disclosure control vetting committee for the pilot
    • ensure disclosure control rules applied correctly
    • evaluate/fine tune disclosure control approaches

Homicide:
  Homicide data for 1961-2011 available in RDCs

Page 20

Pros and cons of scoring

Pros
  • easy for RDC researchers and CCJS to apply rules
  • rules are consistently applied
  • no distortion of data

Cons
  • determining scores and thresholds is time-consuming
  • difficult to determine scores if there are lots of variables or variables have lots of categories
  • for Homicide, the pass/fail scoring approach for RDCs is very restrictive
  • not immune to residual disclosure

Page 21

Conclusion

The scoring approach for frequency counts works well:
  • for crime-reported data, where it effectively mimics subject matter experts’ judgement when vetting
  • for census administrative data with an extensive history of published tables that set the standard for releasing tables
  • when there is a manageable number of variables and of categories within variables

Once developed, the scoring approach is easy to apply

Page 22

For more information, please contact:

Sarah Franklin

[email protected]
