Scott Crouch, Mark43 // Reimagining Police Software

IDENTIFYING DUPLICATE PEOPLE IN LAW ENFORCEMENT RECORDS

mark43

scott crouch, co-founder & ceo

ej bensing, software engineer

A LITTLE BIT ABOUT US

founded: 2012

mission: help our first responders fight violent crime

funded by

THE PROBLEM IN LAW ENFORCEMENT SOFTWARE

HUGE FRAGMENTATION ISSUES

18,000+ U.S. police departments

most software currently on premise

data incompatibility issues

WHAT WE’RE BUILDING

cloud-based records management and analysis platform

A TYPICAL ARREST

domestic violence - aggravated assault

1 suspect, 1 victim

1 gun recovered

avg. # of fields: 344

THE DATA ISSUE

for each person that is arrested: 85-150 fields collected

NUMEROUS DUPLICATES OF PEOPLE

no universal master record of people

IS THERE A FEDERAL SOLUTION?

NCIC: national warrant/wanted database

IAFIS: master fingerprint ID system

all administered by the FBI, but no master person records

WHAT CAN WE DO ABOUT THIS?

WORKING WITH THE DATA

Washington, D.C. Metropolitan Police Dept.

we had to import:

20,000,000 reports

5,000,000 people

DIFFICULTIES

names are not unique identifiers

data is very siloed

cannot legally, automatically merge people

WHAT TO DO?

87.7% of Americans can be correctly identified by DOB, Gender, and Location [1]

K-Anonymity leads to a Quasi-Identifier [2]

[1] L. Sweeney, “Simple Demographics Often Identify People Uniquely.” Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh, 2000.

[2] L. Sweeney, “k-anonymity: a model for protecting privacy.” International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 2002; 557-570.
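
To make the quasi-identifier idea concrete, here is a minimal sketch that measures what fraction of records are the only ones carrying a given (DOB, gender, location) combination. The field names `dob`, `gender`, and `zip`, the helper names, and the sample records are all assumptions for illustration; the slides do not show the actual schema or how location is encoded.

```python
from collections import Counter

def quasi_identifier(record):
    # Hypothetical field names; the real schema is not shown in the slides.
    return (record["dob"], record["gender"], record["zip"])

def fraction_unique(records):
    """Fraction of records whose quasi-identifier appears exactly once."""
    if not records:
        return 0.0
    counts = Counter(quasi_identifier(r) for r in records)
    unique = sum(1 for r in records if counts[quasi_identifier(r)] == 1)
    return unique / len(records)

# Made-up sample data: the first two records share a quasi-identifier.
records = [
    {"dob": "1980-01-02", "gender": "F", "zip": "20001"},
    {"dob": "1980-01-02", "gender": "F", "zip": "20001"},
    {"dob": "1975-06-30", "gender": "M", "zip": "20002"},
    {"dob": "1990-11-15", "gender": "F", "zip": "20003"},
]
print(fraction_unique(records))  # 0.5
```

Records that share a quasi-identifier, like the first two above, are the ones worth a closer look as possible duplicates.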

OUR APPROACH

quasi-identifiers to create groups of “likely duplicate” records

generic enough to work on other datasets (property, vehicles, etc.)

string matching
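
The approach above can be sketched roughly as follows: group records by a quasi-identifier, then compare names only within each group. This is a sketch under stated assumptions, not Mark43's implementation. The grouping key (`dob`, `gender`), the 0.8 threshold, the record fields, and the use of Python's `difflib.SequenceMatcher` as the string matcher are all illustrative choices; the slides do not say which string-matching technique was used.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def block_key(record):
    # Assumed quasi-identifier for grouping; other record types
    # (property, vehicles) would supply their own key function.
    return (record["dob"], record["gender"])

def name_similarity(a, b):
    # Case-insensitive similarity in [0, 1] from the standard library.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicates(records, threshold=0.8):
    """Return id pairs that share a block key and have similar names."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)

    pairs = []
    for group in blocks.values():
        for a, b in combinations(group, 2):
            if name_similarity(a["name"], b["name"]) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs

# Made-up sample data.
records = [
    {"id": 1, "name": "Jonathan Smith", "dob": "1980-01-02", "gender": "M"},
    {"id": 2, "name": "Jonathon Smith", "dob": "1980-01-02", "gender": "M"},
    {"id": 3, "name": "Maria Lopez", "dob": "1980-01-02", "gender": "F"},
    {"id": 4, "name": "Jonathan Smith", "dob": "1991-07-19", "gender": "M"},
]
print(likely_duplicates(records))  # [(1, 2)]
```

The function only proposes candidate pairs and never merges anything, in keeping with the constraint that people cannot legally be merged automatically. Comparing names only within a group also avoids comparing every record against every other record.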

RESULTS

sample data set:

5,000 people

2,000 known duplicates

accuracy of our first try?

80% of duplicates correctly identified
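
As a quick check of what the headline number implies, the arithmetic can be worked through as below. Reading "80% of duplicates correctly identified" as recall over the 2,000 known duplicates is an assumption. The slides give no false-positive count, so precision cannot be derived from these figures.

```python
def recall(identified, known):
    # Share of known duplicates that the method flagged.
    return identified / known

known_duplicates = 2000                          # from the slide
identified = round(0.80 * known_duplicates)      # 80% from the slide
missed = known_duplicates - identified

print(identified)                                # 1600
print(missed)                                    # 400
print(recall(identified, known_duplicates))      # 0.8
```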

MOVING FORWARD

incorporating officer feedback: correct and incorrect matched data

dealing with rollbacks and versioning

real-time recommendations

handling property, evidence, vehicles, locations