Incapacitation, Recidivism and Predicting Behavior Easha Anand Intro. To Data Mining April 24, 2007


Background

Crime Control Act of 1984 and USSC

Idea in U.S. is deterrence, rather than punishment

Tending toward formulae: USPC in D.C. uses 14 variables

U.S. prison pop. topped 2 million, parole/probation topped 7 million

Strategies for Incapacitation

Charge-based: historically the case; most USSC guidelines

Selective: USPC and D.C. Code offenders, based on individual’s characteristics

New research focuses on “criminal career” and predicting patterns therein (participation, frequency, seriousness, length, patterning)

Rationale

The tendency is toward objective decision-making processes to improve accuracy.

More and more variables codified as we can track offenders.

Sophistication of statistical methods used to combine predictors seems to be relevant to outcomes.

The Dataset

6,000 men incarcerated in the 1960s, chosen at random

Collected life history info, official institutional record, inmate questionnaire, psychological tests

26 years later, followed up with Bureau of Criminal Statistics

Offenses characterized along six dimensions: Nuisance, physical harm, property damage, drugs, fraud, crimes against social order

Used 4,897 records

Dataset (cont’d)

[Chart: Original Offense. Categories: Burglary, Armed Robbery, Homicide, Other Violent Offenses, Narcotics, Forgery, Other]

[Chart: Final Dataset. Categories: Purged, Usable, Died, Unusable]

Problems With Data

Dichotomous dependent variable for behavior?

Purging = potential bias: done after age 70 OR when 10 years arrest-free

No record of out-of-state crimes
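To make the purging concern concrete, here is a minimal sketch of how such a rule could skew a sample. The cohort values are invented for illustration, and the exact boundaries (strictly older than 70, at least 10 arrest-free years) are assumptions, since the slide does not spell them out.

```python
def is_purged(age, years_arrest_free):
    """Assumed purge rule: record removed after age 70 OR after 10 arrest-free years."""
    return age > 70 or years_arrest_free >= 10

# Invented toy cohort: (age, years arrest-free, rearrested during follow-up)
cohort = [
    (45, 12, False),
    (50, 15, False),
    (38, 2, True),
    (72, 1, True),
    (41, 4, True),
    (55, 11, False),
    (60, 6, False),
    (33, 0, True),
]

def rearrest_rate(records):
    """Share of records flagged as rearrested."""
    return sum(1 for _, _, rearrested in records if rearrested) / len(records)

kept = [r for r in cohort if not is_purged(r[0], r[1])]

rate_all = rearrest_rate(cohort)   # rate in the full cohort
rate_kept = rearrest_rate(kept)    # rate among records that survive purging
print(rate_all, rate_kept)
```

In this toy cohort the arrest-free records are the ones most likely to be purged, so the surviving records overstate the rearrest rate. Whether the real dataset is biased in this direction is not something the slides establish.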

Philosophical Problems

Metric for success

False negatives: 30,000 arrests could have been prevented!

False positives: 1,413 people jailed unnecessarily…

Reduced crime could have to do with repentance, increased policing, age, etc., and not with incapacitation at all

Data Pre-processing

Only used records that had both 1962 and 1988 data

Priors: # of previous convictions weighted by severity of crime

PriorsP: # of previous periods of incarceration weighted by length

Inst_(M,P,V,F,etc.): # of arrests weighted by severity of crime in each of six categories
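The derived variables above can be sketched as simple weighted sums. The severity scale, the category names and the sample record below are hypothetical; the slides do not give the actual weights, and treating "weighted by length" as a sum of sentence lengths is one possible reading.

```python
# Hypothetical severity weights; the real scale is not given in the slides.
SEVERITY_WEIGHT = {"nuisance": 1, "property": 2, "violent": 3}

def weighted_priors(convictions):
    """Priors: previous convictions, each weighted by the severity of the crime."""
    return sum(SEVERITY_WEIGHT[crime] for crime in convictions)

def weighted_prior_periods(lengths_in_years):
    """PriorsP: previous incarceration periods, each weighted by its length.

    Assumption: weighting by length is read as summing the lengths.
    """
    return sum(lengths_in_years)

# Invented example record
record = {
    "convictions": ["nuisance", "property", "violent", "property"],
    "incarcerations_years": [0.5, 2.0],
}

priors = weighted_priors(record["convictions"])
priors_p = weighted_prior_periods(record["incarcerations_years"])
print(priors, priors_p)
```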

# of Arrests to Desistance (R^2 = .159)

Predictor   Regression Coeff   Standardized Reg. Coeff   T
Priors       1.115               .270                    11.02
Age          -.104              -.144                    -6.39
Drugs       -2.155              -.154                    -7.94
Serious      -.015              -.058                    -2.92
Free         -.899              -.062                    -3.18
PriorsP      -.413              -.085                    -2.37
Type         -.706              -.05                     -2.31
Alias         .343               .046                     2.31
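One way to read the T column: since t = b / SE, each row implies a standard error for its coefficient, and from that a rough interval. The sketch below uses only the coefficients and t values reported above. The z = 1.96 multiplier is a large-sample normal approximation and is an assumption here, not something stated in the slides.

```python
# (predictor, unstandardized coefficient b, t statistic) as reported in the table
rows = [
    ("Priors", 1.115, 11.02),
    ("Age", -0.104, -6.39),
    ("Drugs", -2.155, -7.94),
    ("Serious", -0.015, -2.92),
    ("Free", -0.899, -3.18),
    ("PriorsP", -0.413, -2.37),
    ("Type", -0.706, -2.31),
    ("Alias", 0.343, 2.31),
]

def implied_se(b, t):
    """Standard error implied by t = b / SE."""
    return b / t

def approx_ci95(b, t, z=1.96):
    """Approximate 95% interval for b under a normal approximation."""
    se = implied_se(b, t)
    return (b - z * se, b + z * se)

intervals = {name: approx_ci95(b, t) for name, b, t in rows}
for name, (low, high) in intervals.items():
    print(f"{name:8s} {low:8.3f} {high:8.3f}")
```

Because every |t| in the table exceeds 1.96, none of these approximate intervals includes zero.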

# of Arrests to Desistance (Violent Crimes Only, n=1,998)

Predictor   Regression Coeff   Standardized Reg. Coeff   T
Priors       -.022              -.174                    -7.85
Age           .134               .184                     7.45
InstP         .253               .076                     3.35
PriorsP      -.066              -.077                    -2.91

R^2 = .061; p<.05
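A small R^2 can still be statistically significant when n is large. As a rough check, the overall F statistic can be computed from R^2, n and the number of predictors k. The sketch assumes all 1,998 records entered the regression and that the four listed predictors are the whole model; neither is confirmed by the slides.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic for a regression with k predictors and n observations."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))

# Assumed inputs: R^2 = .061, n = 1,998, k = 4 predictors
f_violent = overall_f(0.061, n=1998, k=4)
print(round(f_violent, 2))
```

Under these assumptions F comes out at roughly 32, which is consistent with the reported p<.05 even though the model explains only about 6% of the variance.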

What Next?

Multiple Linear Regression

Try different target (class) variables: nuisance only, arrest rate, crime-free time

Try different predictors: have 119 variables

BUT

No reason to believe predictors are linearly independent

No reason to believe non-linear correlation
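The linear-independence caveat can be illustrated with a minimal ordinary least squares fit via the normal equations. This is a sketch on invented toy data, not the analysis behind the tables; the variable names and the singularity tolerance are assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-10:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

# Invented toy data: (priors, age) -> number of arrests
rows = [(0, 20), (1, 25), (2, 22), (3, 30), (4, 28), (5, 35)]
y = [1 + 2 * priors - 0.1 * age for priors, age in rows]
coeffs = fit_ols(rows, y)
print([round(c, 3) for c in coeffs])

# A predictor that is an exact multiple of another makes the fit ill-defined
collinear_rows = [(priors, 2 * priors) for priors, _ in rows]
try:
    fit_ols(collinear_rows, y)
    dependence_detected = False
except ValueError:
    dependence_detected = True
print(dependence_detected)
```

With 119 candidate variables, exact or near-exact dependence among predictors becomes more likely, and near-dependence would not trip a check like this one but would still make individual coefficients unstable.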

What Next?

Better technique: Decision trees

“White Box” model mimics human decisionmaking

Use some kind of feature-selection algorithm?

Maybe ensemble learning, once feature-selection is in place?
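To show why a tree is a "white box", here is a minimal sketch of the single-split search (a decision stump) that a decision tree repeats at every node, using Gini impurity as the split criterion. The data, the feature order and the choice of Gini are all assumptions for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (weighted Gini, feature index, threshold) for the best single split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

# Invented toy data: (priors, age); label 1 = rearrested
rows = [(0, 45), (1, 50), (1, 38), (4, 23), (5, 30), (6, 41)]
labels = [0, 0, 0, 1, 1, 1]

score, feature, threshold = best_split(rows, labels)
print(f"if feature[{feature}] <= {threshold}: predict no rearrest")
```

The resulting rule reads like a human decision ("few priors, so predict no rearrest"), which is the interpretability argument made above. A full tree would apply the same search recursively to each side of the split.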

Acknowledgements

Trevor Gardner, UC Berkeley

Don Gottfredson, Rutgers University

Bureau of Criminal Statistics
