HunchLab 2.0 Predictive Missions: Under the Hood

340 N 12th St, Suite 402 Philadelphia, PA 19107

215.925.2600 [email protected]

www.hunchlab.com


Amelia Longo Business Development Associate [email protected] 215.701.7715

Jeremy Heffner HunchLab Product Manager [email protected] 215.701.7712

Places, People, Patterns → Prioritization

Predictive Missions

It’s the fourth Tuesday in January and school is in session. There were 3 burglaries and 2 robberies yesterday. Six bars, three take-out stores, and a school are in the neighborhood. The forecast is 17° with cloudy skies. Where do you focus your 2 vehicles?

How would you do it?

Analyst Process

•  Identify relevant factors
   –  Training / Literature
   –  Experience

•  Use heuristics
   –  high concentration of past crime → higher risk
   –  near a bar on a Friday night → higher risk
   –  near the police station → lower risk
   –  concentration of ex-offenders → higher risk
   –  near transit stops → higher risk

?

How HunchLab Works

term: machine learning

A computer system designed to learn how to accomplish a task from historical data sets. There are different ways (algorithms) to carry out this training process.

term: algorithm

A step-by-step procedure for carrying out a given calculation. Different algorithms have different qualities. Algorithms are used to train a machine learning model.

Overall Process

1.  Generate training examples of outcomes

2.  Enrich with relevant variables

3.  Build models

4.  Evaluate accuracy

5.  Select best performing model

Generate Examples

~ 500 ft cells & 1+ hour time slices
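One way to turn raw incident records into these cell-hour examples is sketched below. It assumes incident coordinates have already been projected to a planar system measured in feet and that timestamps are local times; the grid origin, the epoch, and all function names are hypothetical. Cell-hours that never appear in the returned counter are the zero-crime examples.

```python
import math
from collections import Counter
from datetime import datetime

CELL_FT = 500  # cell edge length in feet (~500 ft on the slide)


def cell_index(x_ft, y_ft, origin=(0.0, 0.0), cell_ft=CELL_FT):
    """Return (column, row) of the grid cell containing a projected point."""
    col = math.floor((x_ft - origin[0]) / cell_ft)
    row = math.floor((y_ft - origin[1]) / cell_ft)
    return col, row


def hour_slot(ts, epoch):
    """Return the index of the 1-hour time slice containing ts."""
    return int((ts - epoch).total_seconds() // 3600)


def count_events(events, epoch, origin=(0.0, 0.0)):
    """Count crimes per (cell, hour) example; events are (x_ft, y_ft, datetime)."""
    counts = Counter()
    for x, y, ts in events:
        counts[(cell_index(x, y, origin), hour_slot(ts, epoch))] += 1
    return counts


epoch = datetime(2014, 1, 1)
events = [
    (1250.0, 730.0, datetime(2014, 1, 2, 3, 30)),
    (1400.0, 900.0, datetime(2014, 1, 2, 3, 55)),  # same cell, same hour
    (260.0, 40.0, datetime(2014, 1, 2, 4, 10)),
]
print(count_events(events, epoch))
```

The first two incidents land in cell (2, 1) during hour 27 and become a single example with two crimes.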

Data Volume

•  Space
   –  Lincoln, NE is 90 sq miles
   –  500 ft cell size creates 12,000 cells

•  Time
   –  3 years of data
   –  1 hour resolution
   –  26,000 hour blocks

•  Space × Time
   –  312,000,000 hour block cells (examples)


•  Sampling FTW!
   –  Outcomes are sparse (a small % of examples have crimes)
   –  Sampling strategy preserves crime events
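The arithmetic behind these figures, and one common sampling strategy for sparse outcomes, are sketched below. The slides do not say how HunchLab samples; keeping every example with a crime and down-sampling zero-crime examples with inverse-probability weights is an assumption, and the 5% rate is illustrative.

```python
import random

# Data volume arithmetic from the slide.
CELLS = 12_000                        # 500 ft cells covering Lincoln, NE
HOURS = 3 * 365 * 24                  # 26,280 hourly slices in 3 years
EXAMPLES = CELLS * round(HOURS, -3)   # 12,000 x 26,000 = 312,000,000


def sample_examples(examples, negative_rate, seed=0):
    """Keep every example that has crimes; keep each zero-crime example with
    probability negative_rate and up-weight it by 1 / negative_rate so that
    weighted totals still reflect the full set of examples."""
    rng = random.Random(seed)
    sampled = []
    for ex in examples:
        if ex["crimes"] > 0:
            sampled.append({**ex, "weight": 1.0})
        elif rng.random() < negative_rate:
            sampled.append({**ex, "weight": 1.0 / negative_rate})
    return sampled


# Synthetic data: 1 example in 1,000 has a crime.
examples = [{"crimes": 1 if i % 1000 == 0 else 0} for i in range(100_000)]
sampled = sample_examples(examples, negative_rate=0.05)
print(EXAMPLES, len(sampled), sum(ex["crimes"] for ex in sampled))
```

Every crime survives sampling, while the weights let a model trained on roughly 5% of the zero-crime examples still see their true prevalence.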

Representing Crime Theories

•  Crime predictions based on:
   –  Baseline crime levels
      •  Similar to traditional hotspot maps
   –  Near repeat patterns
      •  Event recency (contagion)
   –  Risk Terrain Modeling
      •  Proximity and density of geographic features
      •  Points, Lines, Polygons (bars, bus stops, etc.)
   –  Collective Efficacy
      •  Socioeconomic indicators (poverty, unemployment, etc.)


•  Crime predictions based on:
   –  Routine Activity Theory
      •  Offender: proximity and concentration of known offenders
      •  Guardianship: police presence (AVL / GPS)
      •  Targets: measures of exposure (population, parcels, vehicles)
   –  Temporal cycles
      •  Seasonality, time of month, day of week, time of day
   –  Recurring temporal events
      •  Holidays, sporting events, etc.
   –  Weather
      •  Temperature, precipitation
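As an illustration of the temporal and weather variables, the sketch below derives them for a single hourly example using the opening scenario (the fourth Tuesday in January, 17°). The holiday set, the weather lookup, and the variable names are hypothetical inputs; HunchLab's actual variable definitions are not given here.

```python
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def temporal_features(ts, holidays=frozenset(), weather=None):
    """Calendar, event, and weather variables for one hourly example.

    holidays: set of dates flagged as recurring events (hypothetical input)
    weather:  dict mapping (date, hour) -> (temperature, precipitation)
    """
    temperature, precipitation = (weather or {}).get(
        (ts.date(), ts.hour), (None, None))
    return {
        "month": ts.month,                     # seasonality
        "day_of_month": ts.day,                # time of month
        "dow": DAY_NAMES[ts.weekday()],        # day of week
        "nth_weekday": (ts.day - 1) // 7 + 1,  # e.g. 4 = "fourth Tuesday"
        "hour": ts.hour,                       # time of day
        "holiday": ts.date() in holidays,      # recurring temporal events
        "temperature": temperature,            # weather
        "precipitation": precipitation,
    }


ts = datetime(2014, 1, 28, 9)  # the fourth Tuesday of January 2014
weather = {(date(2014, 1, 28), 9): (17.0, 0.0)}
print(temporal_features(ts, weather=weather))
```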


Representing Crime Theories: Risk Terrain Modeling

Gun shootings example. Source: Rutgers, http://www.rutgerscps.org/rtm/irvrtmgoogearth.htm

crimes  prior7  prior364  dayssincelast  bardist  dow
0       0       0         365            >2000ft  Monday
0       0       1         234            >2000ft  Monday
1       1       3         3              750ft    Tuesday
0       0       2         43             500ft    Wednesday
2       0       2         74             500ft    Friday
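A sketch of how columns like these could be computed for one cell and day follows. The window semantics (crimes 1 to 7 and 1 to 364 days before), the 365-day cap on dayssincelast, and the bar-distance bin edges are assumptions, chosen so that the usage example reproduces the Tuesday row of the table.

```python
import math
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
BAR_BINS_FT = [250, 500, 750, 1000, 1500, 2000]  # hypothetical bin edges


def history_features(crime_days, day, cap=365):
    """prior7, prior364 and dayssincelast for one cell on one day.

    crime_days: dates of past crimes in the cell. dayssincelast is capped at
    `cap`, matching the 365 shown for cells with no crime in the prior year.
    """
    prior = [d for d in crime_days if d < day]
    prior7 = sum(1 for d in prior if (day - d).days <= 7)
    prior364 = sum(1 for d in prior if (day - d).days <= 364)
    since = min((day - max(prior)).days, cap) if prior else cap
    return {"prior7": prior7, "prior364": prior364, "dayssincelast": since}


def bar_distance_bin(cell_center, bars):
    """Distance from a cell center to the nearest bar, rounded up to a bin."""
    d = min((math.dist(cell_center, b) for b in bars), default=math.inf)
    for edge in BAR_BINS_FT:
        if d <= edge:
            return f"{edge}ft"
    return ">2000ft"


def example_row(crimes, crime_days, day, cell_center, bars):
    """Assemble one training example in the layout of the table."""
    row = {"crimes": crimes}
    row.update(history_features(crime_days, day))
    row["bardist"] = bar_distance_bin(cell_center, bars)
    row["dow"] = DAY_NAMES[day.weekday()]
    return row


row = example_row(
    crimes=1,
    crime_days=[date(2013, 10, 1), date(2014, 1, 19), date(2014, 1, 25)],
    day=date(2014, 1, 28),
    cell_center=(0.0, 0.0),
    bars=[(600.0, 0.0), (3000.0, 0.0)],
)
print(row)
```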

Representing Crime Theories: Aoristic Analysis

crimes  probability
0       0
1       a
2       b
3       c
4       d

crimes  weights  prior7  prior364  dayssincelast  bardist  dow
0       1        0       0         365            >2000ft  Monday
0       1        0       1         234            >2000ft  Monday
0       0.5      1       3         3              750ft    Tuesday
1       0.5      1       3         3              750ft    Tuesday
0       0        0       2         43             500ft    Wednesday
0       0.13     0       2         74             500ft    Friday
1       0.32     0       2         74             500ft    Friday
2       0.55     0       2         74             500ft    Friday
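The sketch below shows one way to produce weighted rows like these. It assumes each event's probability is spread uniformly over its reported time window (aoristic weighting) and that events fall independently, which gives a Poisson-binomial distribution over counts. The slide does not specify how the probabilities a, b, c, d are derived, so treat this as an illustration; a burglary reported as occurring between 1 pm and 3 pm reproduces the two Tuesday rows.

```python
from datetime import datetime, timedelta


def aoristic_hour_weights(start, end):
    """Spread one event whose exact time is unknown uniformly over [start, end).

    Returns {hour_start: probability that the event fell in that hour}.
    """
    if end <= start:  # exact time known: all weight in one hour
        return {start.replace(minute=0, second=0, microsecond=0): 1.0}
    total = (end - start).total_seconds()
    weights = {}
    t = start
    while t < end:
        hour = t.replace(minute=0, second=0, microsecond=0)
        nxt = min(hour + timedelta(hours=1), end)
        weights[hour] = weights.get(hour, 0.0) + (nxt - t).total_seconds() / total
        t = nxt
    return weights


def count_distribution(probs):
    """P(0, 1, 2, ... crimes) in one cell-hour, given each event's probability
    of falling in it, assuming independent events (Poisson-binomial)."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)
            nxt[k + 1] += q * p
        dist = nxt
    return dist


def expand_example(features, probs):
    """Replace one uncertain example with one weighted row per possible count."""
    return [dict(features, crimes=k, weights=w)
            for k, w in enumerate(count_distribution(probs)) if w > 0]


# A burglary reported as happening some time between 1 pm and 3 pm.
w = aoristic_hour_weights(datetime(2014, 1, 28, 13), datetime(2014, 1, 28, 15))
p = w[datetime(2014, 1, 28, 13)]
rows = expand_example({"prior7": 1, "prior364": 3, "dayssincelast": 3,
                       "bardist": "750ft", "dow": "Tuesday"}, [p])
for r in rows:
    print(r)
```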

Building Models

Models

•  Baseline models (6)
   –  Counts: 28 day, 56 day, 364 day
   –  Kernel Densities: 28 day, 56 day, 364 day

•  HunchLab models
   –  Variations of a stacked ensemble:
      »  examples → gradient boosting machine (gbm) → y/n probabilities
      »  y/n probabilities → generalized additive model (gam) → counts
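The count and kernel density baselines can be sketched as follows. The Gaussian kernel, the 1,000 ft bandwidth, and the integer day numbering are assumptions; the slide only names the 28, 56, and 364 day windows.

```python
import math

CELL_FT = 500


def trailing_counts(events, day, window_days):
    """Crimes per cell in the window [day - window_days, day).

    events: list of ((col, row), day_number) pairs.
    """
    counts = {}
    for cell, d in events:
        if day - window_days <= d < day:
            counts[cell] = counts.get(cell, 0) + 1
    return counts


def kernel_density(events, day, window_days, cells, bandwidth_ft=1000.0):
    """Gaussian kernel density at each cell center from crimes in the window."""
    recent = [cell for cell, d in events if day - window_days <= d < day]
    norm = 2 * math.pi * bandwidth_ft ** 2
    density = {}
    for cx, cy in cells:
        total = 0.0
        for ex, ey in recent:
            dist_sq = ((cx - ex) * CELL_FT) ** 2 + ((cy - ey) * CELL_FT) ** 2
            total += math.exp(-dist_sq / (2 * bandwidth_ft ** 2))
        density[(cx, cy)] = total / norm
    return density


events = [((2, 1), 90), ((2, 1), 95), ((5, 5), 10)]
cells = [(x, y) for x in range(8) for y in range(8)]
for window in (28, 56, 364):  # the three baseline windows on the slide
    counts = trailing_counts(events, day=100, window_days=window)
    kde = kernel_density(events, day=100, window_days=window, cells=cells)
    print(window, counts, max(kde, key=kde.get))
```

Together the two functions give the six baselines: three trailing counts and three smoothed densities.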

term: decision tree

A machine learning algorithm that recursively partitions a data set based on variable values, forming a tree-like structure.
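To make the definition concrete, here is a minimal regression tree (recursive partitioning on squared error) grown on the numeric columns of the example table. It is a teaching sketch, not HunchLab's implementation; the categorical bardist and dow columns are left out for brevity.

```python
def sse(ys):
    """Sum of squared errors around the mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, ys, features):
    """Find the (cost, feature, threshold) split that most reduces squared error."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= threshold]
            right = [y for r, y in zip(rows, ys) if r[f] > threshold]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, threshold)
    return best


def build_tree(rows, ys, features, depth=0, max_depth=2):
    """Recursively partition rows; leaves predict the mean outcome."""
    split = best_split(rows, ys, features) if depth < max_depth else None
    if split is None or split[0] >= sse(ys):
        return {"leaf": sum(ys) / len(ys)}
    _, f, threshold = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           features, depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            features, depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its mean."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Numeric columns of the example table (crimes is the outcome).
rows = [
    {"prior7": 0, "prior364": 0, "dayssincelast": 365},
    {"prior7": 0, "prior364": 1, "dayssincelast": 234},
    {"prior7": 1, "prior364": 3, "dayssincelast": 3},
    {"prior7": 0, "prior364": 2, "dayssincelast": 43},
    {"prior7": 0, "prior364": 2, "dayssincelast": 74},
]
crimes = [0, 0, 1, 0, 2]
tree = build_tree(rows, crimes, ["prior7", "prior364", "dayssincelast"])
print([predict(tree, r) for r in rows])  # [0.0, 0.0, 0.5, 0.5, 2.0]
```

The root splits on prior364 (at most one crime in the past year versus more), and the right branch then splits on dayssincelast.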


term: gradient boosting machine (GBM)

A machine learning algorithm that uses a series of weaker models (typically decision trees), each trained on the residuals of the prior iterations (boosting), to form one stronger model.

Build Decision Tree 1 → Predict with 1 → Calculate errors
Build Decision Tree 2 → Predict with 1 & 2 → Calculate errors
Build Decision Tree 3 → Predict with 1-3 → Calculate errors
…
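The loop above can be sketched for y/n outcomes with single-split trees ("stumps") and a log-odds score, which is gradient boosting on the Bernoulli log-loss. The stump learner, learning rate, and subsample option (each row kept with probability `subsample` per iteration) are illustrative choices; production implementations add options such as deeper trees and re-estimated leaf values (for example a Newton step).

```python
import math
import random


def fit_stump(xs, residuals):
    """Least-squares single-split tree ("stump") fitted to the residuals."""
    best = None
    for j in range(len(xs[0])):
        values = sorted({x[j] for x in xs})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[j] > threshold]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            cost = (sum((r - lmean) ** 2 for r in left)
                    + sum((r - rmean) ** 2 for r in right))
            if best is None or cost < best[0]:
                best = (cost, j, threshold, lmean, rmean)
    if best is None:  # no variable varies in this sample: constant step
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]


def stump_value(stump, x):
    j, threshold, lmean, rmean = stump
    return lmean if x[j] <= threshold else rmean


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def fit_gbm(xs, ys, n_trees=100, learning_rate=0.1, subsample=1.0, seed=0):
    """Boosting for y/n outcomes (ys in {0, 1}, both present): each stump is
    trained on the errors (y - predicted probability) left by earlier stumps."""
    rng = random.Random(seed)
    rate = sum(ys) / len(ys)
    base = math.log(rate / (1 - rate))  # start from the overall log-odds
    scores = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        errors = [y - sigmoid(s) for y, s in zip(ys, scores)]  # calculate errors
        idx = [i for i in range(len(xs)) if rng.random() < subsample] or list(range(len(xs)))
        stump = fit_stump([xs[i] for i in idx], [errors[i] for i in idx])
        stumps.append(stump)
        # Predict with all stumps so far by adding the new one to the scores.
        scores = [s + learning_rate * stump_value(stump, x) for s, x in zip(scores, xs)]
    return base, learning_rate, stumps


def predict_proba(model, x):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_value(s, x) for s in stumps))


xs = [[d] for d in range(8)]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gbm(xs, ys)
print(round(predict_proba(model, [0]), 3), round(predict_proba(model, [7]), 3))
```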

term: generalized additive model (GAM)

A regression model that fits smoothed functions to the input variables. Compare this to a generalized linear model, which fits just a single coefficient to each variable.

HunchLab Model Building

1.  Build a GBM –  examples → gradient boosting machine → y/n probabilities

312 million examples → Sampling → 4 million examples → 4 folds (1 mil each) → GBM per fold → Evaluate → 43 iterations (of 200)

312 million examples → Sampling → 4 million examples → GBM (43 iterations)



•  Segment examples into several folds
   –  For each fold, build a GBM model on the rest of the data
   –  For each iteration in the GBMs:
      »  Randomly sample a portion of the data (stochastic)
      »  Adjust weights of observations (adaptive boosting)

•  Determine how many iterations result in the most accurate model

•  Build a GBM on all of the data for that many iterations
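The fold and iteration-count logic can be sketched independently of any particular GBM library. `train_and_score` is a hypothetical callback that trains on one set of example indices and returns the held-out loss after each boosting iteration; the usage line feeds it a synthetic loss curve.

```python
import random


def make_folds(n, k, seed=0):
    """Randomly segment example indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def best_iteration(fold_losses):
    """fold_losses[f][t] is fold f's held-out loss after t + 1 iterations.
    Returns the iteration count with the lowest loss averaged over folds."""
    n_iter = len(fold_losses[0])
    mean_loss = [sum(losses[t] for losses in fold_losses) / len(fold_losses)
                 for t in range(n_iter)]
    return min(range(n_iter), key=lambda t: mean_loss[t]) + 1


def choose_iterations(n, k, train_and_score, seed=0):
    """For each fold, train on the other folds and score on the held-out one."""
    fold_losses = []
    for held_out in make_folds(n, k, seed):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        fold_losses.append(train_and_score(train, held_out))
    return best_iteration(fold_losses)


# Synthetic loss curve that bottoms out after 10 iterations in every fold.
print(choose_iterations(10, 4, lambda train, held: [(t - 9) ** 2 + len(held) for t in range(30)]))
```

The chosen count is then used to build one final GBM on all of the sampled data.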


2.  Build a GAM –  y/n probabilities → generalized additive model → counts

•  Transforms (“bends”) GBM output into counts
•  Calibrates count levels with other key variables
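A minimal backfitting sketch of an additive model follows, with two smooth terms: a column standing in for the GBM probability and one hypothetical "key variable" (a weekend flag). It uses piecewise-constant bin smoothers and an identity link for brevity; a production count model would more likely use penalized splines with a log link (as in R's mgcv package).

```python
def bin_smoother(x, r, n_bins=5):
    """Piecewise-constant smooth of r against x: the mean of r in each of
    n_bins equal-width bins of x. Returns a function of a new x value."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column

    def bin_of(v):
        return min(max(int((v - lo) / width), 0), n_bins - 1)

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for v, ri in zip(x, r):
        sums[bin_of(v)] += ri
        counts[bin_of(v)] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lambda v: means[bin_of(v)]


def fit_additive(columns, y, n_bins=5, n_rounds=20):
    """Backfitting: y ~ alpha + f_1(column_1) + ... + f_m(column_m)."""
    n, m = len(y), len(columns)
    alpha = sum(y) / n
    fitted = [[0.0] * n for _ in range(m)]
    terms = [(lambda v: 0.0, 0.0)] * m
    for _ in range(n_rounds):
        for j, col in enumerate(columns):
            # Partial residual: what is left after every other smooth term.
            partial = [y[i] - alpha - sum(fitted[k][i] for k in range(m) if k != j)
                       for i in range(n)]
            f = bin_smoother(col, partial, n_bins)
            values = [f(v) for v in col]
            center = sum(values) / n  # keep each f_j centred on zero
            terms[j] = (f, center)
            fitted[j] = [v - center for v in values]

    def predict(row):
        return alpha + sum(f(v) - c for (f, c), v in zip(terms, row))
    return predict


gbm_prob = [i / 10 for i in range(10)]   # stand-in for GBM output
weekend = [i % 2 for i in range(10)]     # hypothetical key variable
counts = [(1.0 if i < 4 else 3.0) + 0.5 * weekend[i] for i in range(10)]
predict = fit_additive([gbm_prob, weekend], counts)
print(round(predict((0.9, 1)), 3), round(predict((0.0, 0)), 3))
```

The first smooth term "bends" the probability scale into count levels, while the second shifts those levels for the extra variable.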

Example

Lincoln, NE

Lincoln Assaults


Selecting Models


1.  Build models holding out last 28 days of data

2.  Score each model

–  Combine different metrics into a selection score

3.  Select best score

4.  Rebuild the best model (including the last 28 days of data)
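The hold-out-then-rebuild procedure can be sketched as below. The candidate interface (a fit function returning a predict function) and the stand-in score are assumptions; the slides do not say how the metrics are combined into a selection score.

```python
from datetime import date, timedelta


def holdout_split(examples, last_day, holdout_days=28):
    """Hold out the final holdout_days of data for a blind evaluation."""
    cutoff = last_day - timedelta(days=holdout_days)
    train = [e for e in examples if e["day"] <= cutoff]
    test = [e for e in examples if e["day"] > cutoff]
    return train, test


def select_model(candidates, examples, last_day, score_fn, holdout_days=28):
    """candidates maps a name to fit(train) -> predict(example).
    Returns the winning name, every candidate's score, and the winner
    rebuilt on all of the data (including the held-out days)."""
    train, test = holdout_split(examples, last_day, holdout_days)
    scores = {name: score_fn(fit(train), test) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores, candidates[best](examples)


def neg_mean_abs_error(predict, test):
    """Stand-in selection score (higher is better)."""
    return -sum(abs(predict(e) - e["crimes"]) for e in test) / len(test)


def fit_mean(train):
    mean = sum(e["crimes"] for e in train) / len(train)
    return lambda e: mean


def fit_zero(train):
    return lambda e: 0.0


start = date(2014, 1, 1)
examples = [{"day": start + timedelta(days=d), "crimes": 2} for d in range(60)]
best, scores, final = select_model({"mean": fit_mean, "zero": fit_zero},
                                   examples, examples[-1]["day"], neg_mean_abs_error)
print(best, scores)
```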

Figure: a map represented as a grid of cells, with cells ranked highest to lowest (0% to 100%) and crime locations marked.

Figure: cells ranked highest to lowest (0% to 100%), illustrating Percent of Patrol Area to Capture All Crimes and Average Crime Rank.

Figure: Percent of Crimes Captured vs. Percent of Patrol Area for Assault, Burglary, MVT, Rape, and Robbery.

Figure: Percent of Patrol Area to Capture All Crimes and Average Crime Rank by crime type (Assault, Burglary, MVT, Rape, Robbery).

Figure: Theft of Motor Vehicle, Percent of Crimes Captured vs. Percent of Land Area.
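The metrics named in these figures can be computed from a ranked grid. The definitions below (equal-area cells, rank expressed as a fraction of all cells with the highest predicted risk first) are our reading of the figure titles rather than a documented specification; lower values are better for the first two metrics.

```python
def rank_cells(predicted):
    """Map each cell to its rank as a fraction of all cells (1/n = highest risk)."""
    ordered = sorted(predicted, key=predicted.get, reverse=True)
    n = len(ordered)
    return {cell: (i + 1) / n for i, cell in enumerate(ordered)}


def average_crime_rank(predicted, crime_cells):
    """Mean rank fraction of the cells where crimes occurred."""
    ranks = rank_cells(predicted)
    return sum(ranks[c] for c in crime_cells) / len(crime_cells)


def area_to_capture_all(predicted, crime_cells):
    """Share of cells, taken in rank order, needed to cover every crime."""
    ranks = rank_cells(predicted)
    return max(ranks[c] for c in crime_cells)


def percent_captured(predicted, crime_cells, area_fraction):
    """Share of crimes falling in the top area_fraction of ranked cells."""
    ranks = rank_cells(predicted)
    return sum(1 for c in crime_cells if ranks[c] <= area_fraction) / len(crime_cells)


predicted = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.05}
crimes = ["a", "c"]
print(average_crime_rank(predicted, crimes),
      area_to_capture_all(predicted, crimes),
      percent_captured(predicted, crimes, 0.5))  # 0.5 0.75 0.5
```

Sweeping `area_fraction` from 0 to 1 traces the "Percent of Crimes Captured vs. Percent of Patrol Area" curve.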


Our Solution

•  Learns from several years of your data
•  Automatically determines which theories apply
   –  more than just crime data
•  Prevents over-fitting
•  Calibrates predictions
•  Selects a model based upon a blind evaluation
   –  prioritization and count-based metrics


•  But it still cannot make your morning coffee

Additional Information

•  How did HunchLab originate?
•  How does HunchLab represent crime theories?
•  What data is needed?
•  How does the modeling work specifically?

Questions



