Incapacitation, Recidivism and Predicting Behavior
Easha Anand
Intro. to Data Mining
April 24, 2007
Background
Crime Control Act of 1984 and the U.S. Sentencing Commission (USSC)
Idea in the U.S. is deterrence, rather than punishment
Tending toward formulae: the U.S. Parole Commission (USPC) in D.C. uses 14 variables
U.S. prison population topped 2 million; parole/probation topped 7 million
Strategies for Incapacitation
Charge-based: historically the case; most USSC guidelines
Selective: USPC and D.C. Code offenders, based on the individual’s characteristics
New research focuses on the “criminal career” and on predicting patterns therein (participation, frequency, seriousness, length, patterning)
Rationale
The tendency is toward objective decision-making processes to improve accuracy.
More and more variables are codified as we become able to track offenders.
The sophistication of the statistical methods used to combine predictors seems to be relevant to outcomes.
The Dataset
6,000 men incarcerated in the 1960s, chosen at random
Collected life-history information, official institutional records, an inmate questionnaire, and psychological tests
26 years later, followed up with the Bureau of Criminal Statistics
Offenses characterized along six dimensions: nuisance, physical harm, property damage, drugs, fraud, crimes against social order
Used 4,897 records
Dataset (cont’d)
[Chart: Original Offense. Categories: Burglary, Armed Robbery, Homicide, Other Violent Offenses, Narcotics, Forgery, Other]
[Chart: Final Dataset. Categories: Purged, Usable, Died, Unusable]
Problems with the Data
Dichotomous dependent variable for behavior?
Purging = potential bias: records are purged after age 70 OR after 10 arrest-free years
No record of out-of-state crimes
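The purging rule can be sketched to show why it biases the sample: anyone who goes 10 years without an arrest drops out, so the surviving records over-represent people who kept offending. This is a minimal sketch under stated assumptions. The `Record` fields and the toy records are hypothetical, and reading "after age 70" as age >= 70 is an assumption. None of this reflects the Bureau of Criminal Statistics' actual schema.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical fields; the real record layout is not described on the slides.
    person_id: int
    age: int                  # age at the time purging is checked
    years_arrest_free: float  # years since the most recent arrest
    arrests: int              # arrests recorded during follow-up


PURGE_AGE = 70                # assumption: "after age 70" read as age >= 70
PURGE_ARREST_FREE_YEARS = 10  # "when 10 years arrest-free"


def is_purged(r: Record) -> bool:
    """True if the purging rule (as assumed above) would delete this record."""
    return r.age >= PURGE_AGE or r.years_arrest_free >= PURGE_ARREST_FREE_YEARS


def surviving(records):
    """Records a researcher would still find after purging."""
    return [r for r in records if not is_purged(r)]


def mean_arrests(records):
    return sum(r.arrests for r in records) / len(records)


# Toy data: two desisters and one older man are purged.
records = [
    Record(1, 45, 12, 2),
    Record(2, 50, 1, 9),
    Record(3, 72, 3, 5),
    Record(4, 38, 0.5, 7),
    Record(5, 41, 15, 1),
]
print(mean_arrests(records))             # 4.8 (everyone)
print(mean_arrests(surviving(records)))  # 8.0 (only what survives purging)
```

The mean roughly doubles once desisters are purged. An analysis run only on surviving records would overstate continued offending.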
Philosophical Problems
Metric for success:
False positives: 30,000 arrests could have been prevented!
False negatives: 1,413 people jailed unnecessarily…
Reduced crime could have to do with repentance, increased policing, age, etc., and not with incapacitation at all
Data Pre-processing
Only used records that had both 1962 and 1988 data
Priors: number of previous convictions, weighted by severity of crime
PriorsP: number of previous periods of incarceration, weighted by length
Inst_(M,P,V,F,etc.): number of arrests, weighted by severity of crime, in each of the six categories
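The weighted predictors above can be sketched as simple sums. The following are all hypothetical:

- the severity scores,
- the month-based incarceration lengths,
- the category keys, which are taken from the six offense dimensions listed for the dataset.

The slides give neither the study's actual weighting scheme nor how the Inst_ suffixes map to categories.

```python
# The six offense dimensions listed for the dataset (key names are ours).
CATEGORIES = (
    "nuisance", "physical_harm", "property_damage",
    "drugs", "fraud", "social_order",
)


def priors(convictions):
    """Priors: previous convictions, each weighted by its severity.

    `convictions` is a list of (category, severity) pairs; the severity
    scale is a hypothetical stand-in for the study's own weights.
    """
    return sum(severity for _category, severity in convictions)


def priors_p(incarceration_months):
    """PriorsP: previous incarceration periods, each weighted by its length.

    Lengths come in months and are summed as years (an assumed unit).
    """
    return sum(months / 12 for months in incarceration_months)


def inst_by_category(arrests):
    """Inst_*: arrests weighted by severity, totalled separately per category.

    An unknown category raises KeyError rather than being silently dropped.
    """
    totals = {c: 0 for c in CATEGORIES}
    for category, severity in arrests:
        totals[category] += severity
    return totals


print(priors([("property_damage", 3), ("physical_harm", 5)]))  # 8
print(priors_p([18, 30]))                                      # 4.0
print(inst_by_category([("drugs", 2), ("drugs", 2), ("fraud", 3)]))
```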
Number of Arrests to Desistance (R^2 = .159)
Predictor   Regression Coeff.   Standardized Reg. Coeff.        t
Priors                  1.115                       .270    11.02
Age                     -.104                      -.144    -6.39
Drugs                  -2.155                      -.154    -7.94
Serious                 -.015                      -.058    -2.92
Free                    -.899                      -.062    -3.18
PriorsP                 -.413                      -.085    -2.37
Type                    -.706                      -.05     -2.31
Alias                    .343                       .046     2.31
Number of Arrests to Desistance (Violent Crimes Only, n = 1,998)
Predictor   Regression Coeff.   Standardized Reg. Coeff.        t
Priors                  -.022                      -.174    -7.85
Age                      .134                       .184     7.45
InstP                    .253                       .076     3.35
PriorsP                 -.066                      -.077    -2.91
R^2 = .061; p < .05
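The three numeric columns in these tables are normally related in the usual OLS way. The t value is the coefficient divided by its standard error. The standardized coefficient is the raw coefficient rescaled by sd(x)/sd(y). The slides do not say which software or exact definitions produced them. The pure-Python sketch below (normal equations plus Gauss-Jordan inversion) therefore only illustrates those standard formulas and is not the study's procedure. It is suitable only for small toy data.

```python
import math
from statistics import stdev


def invert(a):
    """Gauss-Jordan inversion with partial pivoting (fine for tiny matrices)."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular X'X: predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(xs, y):
    """Fit y = b0 + sum_j b_j * x_j by ordinary least squares.

    xs is a list of predictor columns. Returns raw slopes, standardized
    slopes (b_j * sd(x_j) / sd(y)), t statistics (b_j / SE(b_j)) and R^2.
    """
    n, p = len(y), len(xs) + 1
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]  # intercept first
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(bj * rj for bj, rj in zip(b, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = sse / (n - p)  # residual variance
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    slopes = b[1:]
    return {
        "coef": slopes,
        "std_coef": [bj * stdev(col) / stdev(y) for bj, col in zip(slopes, xs)],
        "t": [bj / sj for bj, sj in zip(slopes, se[1:])],
        "r2": 1 - sse / sst,
    }


fit = ols([[1, 2, 3, 4, 5]], [2, 4, 5, 4, 5])
print(fit)  # slope and R^2 both close to 0.6
```

With a single predictor, as in the toy call, the standardized coefficient equals Pearson's r and R^2 equals r^2. This makes a quick sanity check before trusting larger fits.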
What Next? Multiple Linear Regression
Try using different things as the class (dependent variable): nuisance only, arrest rate, crime-free time
Try different predictors: we have 119 variables
BUT:
No reason to believe the predictors are linearly independent
No reason to rule out non-linear correlation
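The sketch below illustrates both concerns using a hypothetical predictor set. A pairwise Pearson screen flags near-duplicate predictors. A symmetric example shows that r can be zero even when y is completely determined by x. The 0.9 cutoff is an arbitrary assumption. A pairwise screen also cannot detect dependence among three or more predictors; variance inflation factors are the usual tool for that.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.9):
    """List (name_a, name_b, r) for predictor pairs with |r| > threshold.

    Only pairwise: dependence among three or more predictors can be missed.
    """
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson(predictors[a], predictors[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical predictors: priors_p roughly tracks priors.
predictors = {
    "priors": [1, 3, 4, 6, 8],
    "priors_p": [2, 5, 9, 11, 16],
    "age": [40, 22, 35, 28, 31],
}
print(collinear_pairs(predictors))  # flags only ("priors", "priors_p", ~0.99)

# A perfect but non-linear dependence: y = x**2 on a symmetric x.
x = [-2, -1, 0, 1, 2]
print(pearson(x, [v * v for v in x]))  # 0.0
```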
What Next?
Better technique: decision trees
“White box” model mimics human decision-making
Use some kind of feature-selection algorithm?
Maybe ensemble learning, once feature selection is in place?
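The core step of growing a decision tree is choosing a single split. This sketch uses Gini impurity, the criterion from CART; the slide does not name a splitting criterion. The feature values and labels (1 = re-arrested) are hypothetical. Applied recursively to each side, this step grows a tree whose rules read like "age <= 30", which is the white-box property the slide mentions. Comparing each feature's best split is also a crude form of feature ranking.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)


def best_split(
    features: Dict[str, List[float]], labels: List[int]
) -> Optional[Tuple[str, float, float]]:
    """Find the single split `feature <= threshold` with the lowest
    size-weighted Gini impurity. Returns (feature, threshold, impurity),
    or None if no feature has two distinct values. Ties keep the first
    candidate found."""
    n = len(labels)
    best = None
    for name, values in features.items():
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for threshold in sorted(set(values))[:-1]:
            left = [y for v, y in zip(values, labels) if v <= threshold]
            right = [y for v, y in zip(values, labels) if v > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


# Hypothetical training data; label 1 = re-arrested during follow-up.
features = {
    "age": [19, 23, 45, 52, 30, 60],
    "priors": [5, 0, 0, 1, 6, 4],
}
labels = [1, 1, 0, 0, 1, 0]
print(best_split(features, labels))  # ('age', 30, 0.0)
```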
Acknowledgements
Trevor Gardner, UC Berkeley
Don Gottfredson, Rutgers University
Bureau of Criminal Statistics