Criminal Incident Data Association Using OLAP Technology
Donald E. Brown & Song Lin
Department of Systems & Information Engineering, University of Virginia

Summary

In this paper, we combine OLAP (Online Analytical Processing) and data mining to associate criminal incidents. The method is tested with a robbery dataset from Richmond, Virginia.

Objectives of Spatial Knowledge Mining

• Leverage DBMS (records management), OLAP, & GIS
• Find spatial-temporal patterns and relationships in data
• Support crime analysis & information sharing

Related Applications - UVa

• ReCAP (Regional Crime Analysis Program): provides support for regional analysis using an RDBMS; requires implementation on each client computer
• CARV (Crime Analysis and Reporting in Virginia): runs on Citrix MetaFrame, so the number of concurrent users is limited
• GRASP (Geospatial Repository for Analysis and Safety Planning): web interface for a central repository of criminal incident data and geospatial files

Outline

• Introduction
• Existing studies on OLAP & data mining
• Combined approach
• Application
• Conclusions

Introduction (crime association)

• 80-20 rule: 20% of the criminals commit 80% of the crimes
• How can we link criminal incidents committed by the same criminal?
• Start by looking at the same crime types

Theories of criminal behavior (criminology)

• Rational choice (Clarke and Cornish): criminals evaluate “benefit” and “risk” and make rational decisions to maximize “profit”.
• Routine activity (Felson): crime requires a ready criminal, a suitable target, and the lack of an effective guardian.

Theories of criminal behavior (template)

“Template” (Brantingham & Brantingham)
• The environment sends out cues about its characteristics
• Criminals use these cues to evaluate potential targets
• A template is built to associate certain cues with suitable targets
• The template is self-reinforcing and enduring
• A criminal does not have many templates

An operational approach to the theories (template)

Criminal incidents committed by the same person show:
• Similar patterns in time
• Similar patterns in space
• Similar patterns in MO (modus operandi)

It is possible to associate incidents from the same person by discovering these patterns.

Existing Association Methods & Systems

• AREST (Badiru et al.): suspect matching
• ViCAP (FBI): incident matching
• COPLINK (U. Arizona): links search terms with cases (concept space)

Existing Association Methods & Systems

• TSM (Brown): total similarity measures; can be used for both incident and suspect matching
• SQL: used by analysts in practice

Comments on existing methods

Computer technologies are central to criminal incident association, for example:
• MIS
• Databases
• Information retrieval
• GIS

Comments on existing methods

Two additional techniques enable incident association:
• Data warehousing / OLAP
• Data mining

We develop a method that seamlessly integrates OLAP and data mining.

Related Work on OLAP and data mining

OLAP
• Ancestor: OLTP (transactional data)
• OLAP: summary data for analysis
• Dimensions: OLAP data is multidimensional; dimensions are numeric or categorical attributes, and hierarchical structures exist in dimensions
• Aggregates: sum, count, average, max, min, …
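
A minimal Python sketch of these terms, assuming a handful of invented robbery records with two categorical dimensions (`weapon`, `premise`) and one numeric measure (`loss`); `rollup` is a hypothetical helper, not part of any OLAP product. Grouping on fewer dimensions is a roll-up; grouping on more is a drill-down.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical incident records: two categorical dimensions and one numeric measure.
RECORDS = [
    {"weapon": "gun", "premise": "street", "loss": 200.0},
    {"weapon": "gun", "premise": "store", "loss": 500.0},
    {"weapon": "knife", "premise": "street", "loss": 50.0},
    {"weapon": "gun", "premise": "street", "loss": 100.0},
]

def rollup(records, dims, measure):
    """Hypothetical helper: group records on `dims` and compute the usual aggregates.
    Dimensions not listed in `dims` are aggregated away (they behave like "*")."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[d] for d in dims)].append(r[measure])
    return {
        key: {"count": len(vals), "sum": sum(vals), "avg": mean(vals),
              "max": max(vals), "min": min(vals)}
        for key, vals in groups.items()
    }

if __name__ == "__main__":
    # Two-dimensional view (weapon x premise), then the roll-up to weapon only.
    for dims in (("weapon", "premise"), ("weapon",)):
        print(dims, rollup(RECORDS, dims, "loss"))
```

For example, `rollup(RECORDS, ("weapon",), "loss")` collapses the premise dimension and reports that the three gun robberies have an average loss of about 266.67.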

OLAP and Data Mining

Both are powerful tools to support the decision-making process, but:
• OLAP focuses on efficiency; few quantitative analysis methods are used
• Data mining is typically applied to 2-D datasets (spreadsheets), not to multidimensional OLAP data structures

Idea: combine them

Existing studies on combining OLAP and Data mining

Cubegrade Problem (Imielinski)
• A generalized version of the association rule
• Association rule: the change in the “count” aggregate when another constraint is imposed, i.e., when a “drill-down” operation is performed
• Other aggregates could also be considered
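
The count-based reading of an association rule can be sketched as follows. This is a hypothetical illustration of the idea on the slide (drill down from a source cell by one more constraint and compare the aggregate), not Imielinski's cubegrade algorithm; the records and the `cubegrade` helper are invented.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Cell = Dict[str, str]  # attribute -> required value; omitted attributes act as "*"

# Hypothetical incident records, invented for illustration.
RECORDS: List[Record] = [
    {"weapon": "gun", "premise": "street", "time": "night"},
    {"weapon": "gun", "premise": "street", "time": "day"},
    {"weapon": "gun", "premise": "store", "time": "night"},
    {"weapon": "knife", "premise": "street", "time": "night"},
]

def count(records: List[Record], cell: Cell) -> int:
    """The "count" aggregate: records that satisfy every constraint of the cell."""
    return sum(all(r.get(a) == v for a, v in cell.items()) for r in records)

def cubegrade(records: List[Record], source: Cell, attr: str, value: str,
              aggregate: Callable[[List[Record], Cell], float] = count):
    """Hypothetical helper: drill down from `source` by adding attr == value and
    report the aggregate before and after, plus their ratio."""
    target = {**source, attr: value}
    before, after = aggregate(records, source), aggregate(records, target)
    ratio = after / before if before else 0.0
    return target, before, after, ratio

if __name__ == "__main__":
    # Rule "weapon = gun => time = night": confidence = count(gun, night) / count(gun).
    print(cubegrade(RECORDS, {"weapon": "gun"}, "time", "night"))
```

For the rule weapon = gun ⇒ time = night, the count drops from 3 to 2 under the drill-down, so the ratio (the rule's confidence) is 2/3. Passing a different `aggregate` function covers the "other aggregates" case.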

Existing studies on combining OLAP and Data mining

Constrained Gradient Analysis
• Retrieve pairs of OLAP cells that are quite different in aggregates but similar in dimensions (parents, children, siblings)
• More than one aggregate could be considered simultaneously (e.g., sum and mean).
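
A simplified sketch of the retrieval step, under assumptions not stated on the slide: only parent-child pairs are checked (not siblings), a single aggregate (mean loss) is used, and cells count as "quite different" when one mean is at least `min_ratio` times the other. The data, the helper names, and the threshold are hypothetical.

```python
from statistics import mean
from typing import Iterator, List, Optional, Tuple

WILDCARD = "*"
Cell = Tuple[str, ...]

# Hypothetical records: ((weapon, premise), loss) pairs, invented for illustration.
RECORDS: List[Tuple[Cell, float]] = [
    (("gun", "street"), 200.0),
    (("gun", "store"), 900.0),
    (("knife", "street"), 50.0),
    (("gun", "street"), 100.0),
]

def in_cell(dims: Cell, cell: Cell) -> bool:
    return all(c == WILDCARD or c == d for d, c in zip(dims, cell))

def avg_loss(cell: Cell) -> Optional[float]:
    """The single aggregate used here: mean loss of the records in the cell."""
    vals = [m for dims, m in RECORDS if in_cell(dims, cell)]
    return mean(vals) if vals else None

def parents(cell: Cell) -> Iterator[Cell]:
    for i, v in enumerate(cell):
        if v != WILDCARD:
            yield cell[:i] + (WILDCARD,) + cell[i + 1:]

def gradient_pairs(cells: List[Cell], min_ratio: float) -> List[Tuple[Cell, Cell, float]]:
    """Hypothetical helper: (cell, parent, ratio) where the means differ by >= min_ratio."""
    out = []
    for cell in cells:
        a = avg_loss(cell)
        for p in parents(cell):
            b = avg_loss(p)
            if a and b:  # skip empty cells and zero aggregates (ratio undefined)
                ratio = max(a / b, b / a)
                if ratio >= min_ratio:
                    out.append((cell, p, ratio))
    return out

if __name__ == "__main__":
    cells = sorted({dims for dims, _ in RECORDS})
    for cell, parent, ratio in gradient_pairs(cells, 2.0):
        print(cell, parent, round(ratio, 2))
```

With `min_ratio=2.0`, the sketch returns three pairs, e.g. (gun, store) versus its parent (gun, *), where the mean loss is 900 versus 400.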

Existing studies on combining OLAP and Data mining

Data-driven exploration (Sarawagi)
• Find “exceptions”
• The mean (μ) and standard deviation (σ) of the aggregate are calculated for a cell
• If the aggregate of the cell lies outside (μ − 2.5σ, μ + 2.5σ), the cell is an exception
• The OLAP version of the “3σ” rule
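
The exception test itself reduces to a standardized residual, as in this sketch. Here the expected value and σ of each cell are simply given as inputs; the slide does not say how they are estimated, and `CellStats`, `exceptions`, the cell names, and the numbers are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple

class CellStats(NamedTuple):
    cell: Tuple[str, ...]  # cell coordinates, e.g. ("Feb", "north") (hypothetical)
    actual: float          # observed aggregate for the cell
    expected: float        # expected aggregate (mean) for the cell, given as input
    std: float             # standard deviation of that aggregate, given as input

def standardized_residual(s: CellStats) -> float:
    return (s.actual - s.expected) / s.std

def exceptions(stats: Iterable[CellStats], k: float = 2.5) -> List[Tuple[str, ...]]:
    """Cells whose aggregate lies outside (expected - k*std, expected + k*std)."""
    return [s.cell for s in stats if s.std > 0 and abs(standardized_residual(s)) >= k]

if __name__ == "__main__":
    stats = [
        CellStats(("Jan", "north"), 12, 10, 2),  # residual 1.0: not an exception
        CellStats(("Feb", "north"), 25, 10, 4),  # residual 3.75: exception
    ]
    print(exceptions(stats))
```

With the default k = 2.5, a cell with actual 25, expected 10, and σ = 4 has a residual of 3.75 and is flagged; a residual of 1.0 is not.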

Associating records by finding distinctive values or outliers

Basic idea: if a group of records shares common characteristics, and these common characteristics are unusual (“outliers”), we are more confident in asserting that these records come from the same causal mechanism.

Look for distinctive characteristics: the best would be DNA.

OLAP-outlier-based method to associate records

Rationale for distinctive values or outliers, e.g., the weapon used in robberies:
• “Gun”: very common, hard to associate
• “Japanese sword”: distinctive, likely to come from the same person

We build an outlier score function to measure this “distinctiveness”: higher score → more distinctive → more confident to associate. It is designed for categorical attributes (MO is important in linking criminal incidents).

Definitions

Cell, Parent, Neighbor
• Cell: a vector of values for some attributes
• Parent: replace one attribute of the cell with the wildcard element “*”
• Neighbor: a group of cells having the same parent

These definitions are derived from the OLAP field.
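
A small Python sketch of the three definitions, with cells as tuples and "*" as the wildcard. The `domains` argument (all values each attribute can take) is an assumption made for illustration; one could instead enumerate only the values observed in the data. The function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

WILDCARD = "*"
Cell = Tuple[str, ...]

def parents(cell: Cell) -> Iterator[Tuple[int, Cell]]:
    """Yield (k, parent) for each attribute k that is not already a wildcard."""
    for k, value in enumerate(cell):
        if value != WILDCARD:
            yield k, cell[:k] + (WILDCARD,) + cell[k + 1:]

def neighbor(cell: Cell, k: int, domains: Sequence[Sequence[str]]) -> List[Cell]:
    """Cells sharing the parent obtained by wildcarding attribute k (the cell included).
    `domains` is a hypothetical list of the possible values of each attribute."""
    return [cell[:k] + (v,) + cell[k + 1:] for v in domains[k]]

if __name__ == "__main__":
    domains = [["a1", "a2", "a3", "a4", "a5"], ["b1", "b2", "b3", "b4"]]
    print(list(parents(("a5", "b3"))))  # [(0, ('*', 'b3')), (1, ('a5', '*'))]
    print(neighbor(("a5", "b3"), 1, domains))
```

For the cell (a5, b3) used in the illustrations, `parents` yields (*, b3) and (a5, *), and the neighbor obtained by wildcarding the second attribute is (a5, b1) through (a5, b4).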

Illustration -- Cell

[Figure: a 4 × 4 grid, Dimension 1 (a1, a2, a3, a4) by Dimension 2 (b1, b2, b3, b4)]

• Two-dimension cell: (a4, b2)
• One-dimension cell: (*, b4)

Illustration -- Parent

[Figure: a 5 × 4 grid, a1 to a5 by b1 to b4]

Cell (a5, b3) has two parents: (a5, *) and (*, b3).

Illustration -- Neighbor

A neighbor is a collection of cells sharing the same parent.

Outlier Score Function

We start building this function in one dimension and then generalize to higher dimensions. For one dimension, we have the following two observations:
• Values with small probability (frequency) are more “unusual”
• The outlier score is high when the uncertainty level is low

Observation I

[Figure: bar chart of Count by HairColor (Blond, Brown, Black, Red, Gray); the Blond bar, P = 0.1, is labeled “Outlier”]

For the attribute “hair color”, the value “blond” covers only 10% of the records. Hence, it should get a higher outlier score.

Observation II

[Figure: two bar charts of Count by HairColor, one over five values (Blond, Brown, Black, Red, Gray) and one over two values (Blond, Brown)]

Although both charts show a value with frequency 0.2, it is more “unusual” in the left one, because the uncertainty level there is low.

Observation III

“More evidence”: more evidence is better than less → higher outlier score

OSF for One Dimension

−log(p) comes from information theory, where p is the probability of a value. Entropy measures the information in a message (in this case, in a data record).

$$\mathrm{OSF} = \frac{-\log(p)}{\mathrm{Entropy}}, \qquad \mathrm{Entropy} = -\sum_i p_i \log p_i$$
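
The one-dimension OSF can be checked numerically with the sketch below (natural logarithm; the ratio does not depend on the log base, since changing the base rescales numerator and denominator by the same factor). Returning 0 for a constant attribute, whose entropy is 0, is an assumption, as are the hair-color counts in the demo; `osf_1d` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence

def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)

def osf_1d(values: Sequence[Hashable], value: Hashable) -> float:
    """Hypothetical helper: outlier score -log(p) / Entropy for one categorical value."""
    counts = Counter(values)
    if counts[value] == 0:
        raise ValueError(f"{value!r} does not occur in the data")
    p = counts[value] / len(values)
    h = entropy(counts.values())
    # Assumption: a constant attribute (entropy 0, so p == 1) carries no evidence.
    return -math.log(p) / h if h > 0 else 0.0

if __name__ == "__main__":
    five = ["blond", "brown", "black", "red", "gray"] * 20  # p(blond) = 0.2, uniform
    two = ["blond"] * 20 + ["brown"] * 80                   # p(blond) = 0.2, skewed
    print(osf_1d(five, "blond"), osf_1d(two, "blond"))
```

With these 100 hypothetical records, a value with p = 0.2 scores exactly 1 when all five hair colors are equally common, but about 3.2 in the two-color (0.2, 0.8) distribution, in line with Observation II: the lower the entropy, the higher the score.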

OSF for Higher Dimensions

For any cell, calculate the sum of the OSF of its parent cell and the one-dimension OSF of the cell conditional on its neighbor. Do this calculation for all parent cells, and take the maximum as the outlier score for this cell:

$$f(c) = \max_{k}\left\{ f\big(\mathrm{parent}(c,k)\big) + \frac{-\log\big(\mathrm{frequency}(c)\big)}{\mathrm{Entropy}\big(k^{\mathrm{th}}\ \text{neighbor of } c\big)} \right\}, \qquad f(*,*,\ldots,*) = 0$$
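
A direct, unoptimized transcription of the recursion, under stated assumptions: frequency(c) is taken relative to the parent cell (so it is the cell's share of its k-th neighbor), the entropy is computed over the values of attribute k observed in that neighbor, and a neighbor with zero entropy contributes 0. `make_osf` is a hypothetical helper name; it rescans all records per cell, which is fine for a few hundred incidents but not for a large cube.

```python
import math
from collections import Counter
from functools import lru_cache
from typing import Callable, Sequence, Tuple

WILDCARD = "*"
Cell = Tuple[str, ...]

def make_osf(records: Sequence[Sequence[str]]) -> Callable[[Cell], float]:
    """Hypothetical helper: return f(cell), the recursive outlier score."""
    rows = [tuple(r) for r in records]

    def in_cell(row: Cell, cell: Cell) -> bool:
        return all(c == WILDCARD or c == v for c, v in zip(cell, row))

    def entropy(counts) -> float:
        total = sum(counts)
        return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

    @lru_cache(maxsize=None)
    def f(cell: Cell) -> float:
        members = [r for r in rows if in_cell(r, cell)]
        if not members:
            raise ValueError(f"empty cell: {cell}")
        candidates = []
        for k, value in enumerate(cell):
            if value == WILDCARD:
                continue
            parent = cell[:k] + (WILDCARD,) + cell[k + 1:]
            in_parent = [r for r in rows if in_cell(r, parent)]
            # k-th neighbor: the cells under `parent` that differ only in attribute k.
            h = entropy(list(Counter(r[k] for r in in_parent).values()))
            freq = len(members) / len(in_parent)  # assumption: frequency within the neighbor
            term = -math.log(freq) / h if h > 0 else 0.0  # assumption: zero entropy -> 0
            candidates.append(f(parent) + term)
        return max(candidates, default=0.0)  # f(*, *, ..., *) = 0

    return f

if __name__ == "__main__":
    # Hypothetical (weapon, time) robbery records.
    robberies = [("gun", "night"), ("gun", "day"), ("gun", "night"),
                 ("gun", "day"), ("knife", "night"), ("sword", "night")]
    f = make_osf(robberies)
    print(f(("sword", "*")), f(("gun", "*")), f(("sword", "night")))
```

Because every added term is non-negative, f never decreases when a cell is refined, which matches Observation III (more evidence, higher score). In the demo records, the rare value "sword" scores about 2.07, against about 0.47 for the common "gun".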

Association (using this OLAP-outlier method)

For a pair of incidents (A, B): if there is a cell that contains both A and B, and the outlier score of this cell is large enough (threshold test), associate them.
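
The threshold test can be sketched as below, assuming outlier scores have already been computed for the candidate cells (`cell_scores` is a hypothetical precomputed mapping) and that "large enough" means score ≥ threshold. Iterating over high-scoring cells, instead of over all incident pairs, yields the same set of linked pairs; `associate` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

WILDCARD = "*"
Cell = Tuple[str, ...]

def contains(cell: Cell, record: Sequence[str]) -> bool:
    return all(c == WILDCARD or c == v for c, v in zip(cell, record))

def associate(records: Sequence[Sequence[str]], cell_scores: Dict[Cell, float],
              threshold: float) -> Set[Tuple[int, int]]:
    """Hypothetical helper: index pairs (i, j), i < j, of records that share at least
    one cell whose precomputed outlier score passes score >= threshold."""
    pairs: Set[Tuple[int, int]] = set()
    for cell, score in cell_scores.items():
        if score < threshold:
            continue
        members = [i for i, r in enumerate(records) if contains(cell, r)]
        pairs.update(combinations(members, 2))  # every pair inside this cell is linked
    return pairs

if __name__ == "__main__":
    records = [("sword", "night"), ("sword", "day"), ("gun", "night")]
    scores = {("*", "*"): 0.0, ("sword", "*"): 2.0, ("*", "night"): 0.5}
    print(associate(records, scores, 1.0))
```

Since the all-wildcard cell contains every incident and scores 0, a threshold of 0 links every pair.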

Application (dataset)

Applied to a robbery dataset (Richmond, VA, 1998). Why robbery? For evaluation purposes:
• Robbery has more multiple offenses than murder
• Robbery has more known suspects than B & E (breaking and entering)

Attributes

Three types of attributes:
• Modus Operandi: categorical
• Census features: numeric
• Distance features: numeric

Feature Selection

• Redundant features → feature selection
• Cluster the features (similar features in the same group)
• Pick a representative feature for each group
• Method: k-medoid clustering, which is applicable to a distance matrix and returns “medoids”
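
A minimal k-medoids sketch on a precomputed distance matrix. The slides do not say which variant was used; this one uses simple alternating (Voronoi-iteration) updates with a deterministic greedy start rather than full PAM, and how feature distances are built (for instance 1 − |correlation| between two census features) is left to the caller as an assumption. `k_medoids` is a hypothetical name.

```python
from typing import List, Sequence

def k_medoids(dist: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Hypothetical helper: alternating k-medoids on a symmetric distance matrix.
    Returns the sorted indices of the medoids."""
    n = len(dist)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    # Greedy start: the most central point, then repeatedly the point farthest
    # from its nearest already-chosen medoid.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Update step: the new medoid of each cluster minimizes total in-cluster distance.
        new = [min(members, key=lambda c: sum(dist[c][j] for j in members))
               for members in clusters.values() if members]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return sorted(medoids)

if __name__ == "__main__":
    positions = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in positions] for a in positions]
    print(k_medoids(dist, 2))
```

On six points at positions 0, 1, 2, 10, 11, 12 on a line, with k = 2, it returns indices 1 and 4, the middle point of each group; applied to a feature distance matrix, the returned indices identify one representative feature per cluster.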

Feature Selection Result

[Figure: cluster plot of the features on Component 1 (x-axis) and Component 2 (y-axis). These two components explain 44.25% of the point variability.]

Medoids: 1: HUNT, 2: ENRL3, 3: TRANS.PC

Final Selected Features

• Medoids: HUNT (housing unit density), ENRL3 (public school enrollment), TRAN_PC (transportation expense per capita)
• POP3 (population aged 12-17): more meaningful (attackers and victims)
• MHINC (median income)

Discretize

Discretize these numeric features into bins, similar to a histogram, using Sturges’ rule for the number of bins.
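
Sturges' rule sets the number of bins to ⌈log₂ n⌉ + 1. The sketch below pairs it with equal-width bins, which is an assumption: the slide only says the binning is similar to a histogram. `sturges_bins` and `discretize` are hypothetical names.

```python
import math
from typing import List, Sequence

def sturges_bins(n: int) -> int:
    """Number of bins from Sturges' rule: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1

def discretize(values: Sequence[float]) -> List[int]:
    """Equal-width bin index (0 .. k-1) for each value, with k from Sturges' rule.
    Equal-width binning is an assumption, not stated on the slide."""
    k = sturges_bins(len(values))
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant feature needs only one bin
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last bin rather than in bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]

if __name__ == "__main__":
    print(sturges_bins(170), discretize([0, 1, 2, 3, 4, 5, 6, 7]))
```

For n = 170 (the number of incidents with known suspects), log₂ 170 ≈ 7.41, so Sturges' rule gives 9 bins; the slides do not say which n was used for the binning.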

Evaluation

For the incidents with known suspects (170):
• Generate all incident pairs
• If a pair of incidents has the same criminal suspect, it is a “true association”
• Compare the results given by the algorithm with the “true result”
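
Generating the ground truth is a pairwise comparison of suspect labels. The sketch assumes exactly one known suspect per incident; `true_associations` and the incident and suspect IDs are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

def true_associations(suspect_of: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Hypothetical helper: all unordered incident pairs (a, b), a < b, whose known
    suspect is the same. Assumes exactly one known suspect per incident."""
    incidents = sorted(suspect_of)
    return {(a, b) for a, b in combinations(incidents, 2)
            if suspect_of[a] == suspect_of[b]}

if __name__ == "__main__":
    # Hypothetical incident IDs mapped to suspect IDs.
    print(true_associations({"i1": "s1", "i2": "s1", "i3": "s2", "i4": "s1"}))
```

With 170 incidents there are 170 × 169 / 2 = 14,365 candidate pairs to label.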

Evaluation Criteria

Two measures:
• Detected true associations: larger is better
• Average number of relevant records: similar to search engines like Google, given one record, the system returns a list; take the average length of all lists. Shorter is better.

Evaluation Criteria (cont.)

From information retrieval:
• Recall: the ability to provide relevant items
• Precision: the ability to provide only relevant items

The 1st measure is “recall”; the 2nd is equivalent to “precision”. The 2nd also measures the user effort (in further investigation).
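
A sketch of both measures, plus standard recall and precision, under our reading of the slides: measure 1 counts predicted pairs that are true associations, and measure 2 is the average length of the per-record lists, where each associated pair puts each record on the other's list. The function name and signature are hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]

def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {tuple(sorted(p)) for p in pairs}

def evaluate(predicted: Iterable[Pair], truth: Iterable[Pair], n_records: int):
    """Hypothetical helper returning (detected, avg_list_len, recall, precision)."""
    pred, true = normalize(predicted), normalize(truth)
    detected = len(pred & true)  # measure 1: detected true associations
    # Measure 2: each predicted pair adds one entry to two records' lists,
    # so the average list length is 2 * |pred| / n_records.
    avg_list_len = 2 * len(pred) / n_records
    recall = detected / len(true) if true else 0.0
    precision = detected / len(pred) if pred else 0.0
    return detected, avg_list_len, recall, precision

if __name__ == "__main__":
    print(evaluate([(0, 1), (2, 1), (0, 3)], [(1, 0), (1, 2), (2, 3)], 4))
```

As a sanity check, predicting every pair among 170 incidents gives an average list length of 2 × 14,365 / 170 = 169, the value in the threshold-0 rows of both result tables.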

Result (OLAP-outlier based)

Threshold   Detected true associations   Avg. number of relevant records
0           33                           169.00
1           32                           121.04
2           30                           62.54
3           23                           28.38
4           18                           13.96
5           16                           7.51
6           8                            4.25
7           2                            2.29
            0                            0.00

Result of binary association method (calculating similarity score)

Threshold   Detected true associations   Avg. number of relevant records
0           33                           169.00
0.5         33                           112.98
0.6         25                           80.05
0.7         15                           45.52
0.8         7                            19.38
0.9         0                            3.97
            0                            0.00

Comparison Outlier vs. Binary

[Figure: curves for Similarity and Outlier plotted against Avg. relevant records (x-axis, 0 to 180); y-axis 0 to 35]

Comparison (cont.)

Generally, the curve of our method lies above the other one:
• Given the same accuracy level, this method returns fewer records
• For the same list “length”, this method is more accurate

The other method is better at the tail. However, that region corresponds to an average number of relevant records > 100; given that there are only 170 incidents, no analyst would investigate 100 incidents.

Generally, the new method is effective.

Comparison (Outlier vs. Simple Combination)

[Figure: curves for Similarity, Outlier, and Combine; x-axis 0 to 200, y-axis 0 to 35]

WebCAT Implementation

• A secure web environment that can read several data formats and translate them into a uniform standard (XML)
• Uses free, open-source technology: ASP, XML, MapServer, SVG, etc.
• Provides tools to meet spatial and statistical analysis needs, including association
• Provides utilities for querying and reporting

Conclusions

• Developed a new data association method for linking criminal incidents that combines concepts in OLAP (multidimensional) with ideas in data mining (outlier detection)
• Testing with a robbery dataset shows promise
• Deployment through WebCAT provides open-source (XML-based) capability for data access and analysis over the web

Questions?