Fundamentals of Crime Mapping 8

Fundamentals of Crime Mapping

A Brief Review of Statistics

Understand the difference between qualitative and quantitative data.

Define and explain levels of measurement including nominal, ordinal, interval, and ratio.

Understand the difference between discrete and continuous variables.

Understand descriptive statistics, including typical measures of central tendency and dispersion.

Understand inferential statistics, including typical tests of significance and measures of association.

Understand what a regression model is and how it works.

Understand the limitations of statistics and how their improper application can yield misleading results.

Define and explain classification in crime mapping and be able to identify strengths and weaknesses of each method.

Objectives

Qualitative
◦ Yields narrative-oriented information
  Park, Blue, Yes, Tall, Short, etc.

Quantitative
◦ Produces number-oriented information

Key Factors or “Variables”

Types of Research/Data

Ratio
◦ Highest level
◦ Can be reclassified to any of the other levels
◦ -∞ to +∞

Interval
◦ Precise value of a measure is known and thus can also be ranked
◦ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Ordinal
◦ Nominal data that can be rank ordered; the order can be important
◦ Officer, Sergeant, Lt, Commander, Major, Chief

Nominal
◦ Male, Female

Types of Data

Nominal
◦ Dichotomous

African American, Caucasian, Hispanic, Native American, Asian, Other

Types of Data

Caucasian, Non-Caucasian

Must be mutually exclusive and exhaustive

Nominal Data

Order not important

Ordinal
◦ Categorical or numerical data that can be ranked, but the precise value is not known

What is your annual household income?
1. Less than $20,000
2. Between $20,000 and $40,000
3. Between $40,001 and $60,000
4. Between $60,001 and $80,000
5. More than $80,000

Likert scale example

I feel safe walking in my neighborhood alone at night
1 - Strongly agree
2 - Agree
3 - Neutral
4 - Disagree
5 - Strongly disagree
6 - Don’t know

Types of Data

Traits, concepts, and ideas in criminal justice can be difficult to operationalize, or measure.

Validity
◦ A variable accurately reflects the trait or concept it is measuring

Reliability
◦ The measure is consistently representative across people, places, and time

Types of Data

Interval
◦ What is your annual household income? __________________
◦ Ranking possible and precise value known
◦ 112 burglaries occurred in beat 32

Types of Data

Ratio
◦ Treated the same as interval data
◦ 112.23 burglaries occurred on average in beat 32
◦ Can we have .23 of a burglary?

Types of Data

$16095.32, $17262.67, $24262.78, $26095.32, $27262.67,
$32262.78, $33095.32, $35262.67, $36262.78, $36095.32,
$40262.67, $41262.78, $52095.32, $55262.67, $68262.78

Types of Data

Ratio
$16095.00, $17262.00, $24262.00, $26095.00, $27262.00,
$32262.00, $33095.00, $35262.00, $36262.00, $36095.00,
$40262.00, $41262.00, $52095.00, $55262.00, $68262.00

Interval
$0 - $25,000
$25,001 - $35,000
$35,001 - $45,000
$45,001 - $55,000
$55,001 - $65,000
Over $65,000

Ordinal
Below $35,000
Over $35,000

Nominal

Frequency Distributions
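As a rough illustration of reclassifying exact values into broader categories, the sketch below counts how many of the slide's incomes fall into each listed range and into the below/over $35,000 split. The function names are made up for this example, and treating an income of exactly $35,000 as "Over" is an assumption, since the slide does not say where that boundary value belongs.

```python
from collections import Counter

# Household incomes from the slide, in whole dollars.
incomes = [16095, 17262, 24262, 26095, 27262, 32262, 33095, 35262,
           36262, 36095, 40262, 41262, 52095, 55262, 68262]

# Upper bound and label for each income range listed on the slide.
ranges = [
    (25000, "$0 - $25,000"),
    (35000, "$25,001 - $35,000"),
    (45000, "$35,001 - $45,000"),
    (55000, "$45,001 - $55,000"),
    (65000, "$55,001 - $65,000"),
]

def income_range(value):
    """Return the label of the first range whose upper bound covers value."""
    for upper, label in ranges:
        if value <= upper:
            return label
    return "Over $65,000"

def income_split(value):
    """Collapse an income into two categories (assumption: 35,000 counts as Over)."""
    return "Below $35,000" if value < 35000 else "Over $35,000"

# Frequency distributions: how many incomes land in each category.
range_counts = Counter(income_range(v) for v in incomes)
split_counts = Counter(income_split(v) for v in incomes)

for label, count in range_counts.items():
    print(label, count)
print(dict(split_counts))
```

Going from exact values to ranges to a two-way split discards detail at each step, which is why the reclassification only works in that direction.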

Discrete
◦ Variables that cannot be subdivided
◦ The number of persons living in a household is a discrete variable. For example, there cannot be 2.3 persons living in a household. There can be 2, or there can be 3, but not 2.3.

Types of Data

Continuous
◦ Can be subdivided; theoretically, an infinite number of times
◦ Time, for example: days, hours, minutes, seconds, nanoseconds, etc.

Rates
◦ Violent crimes per 100,000 population
◦ Violent Crimes / (Population / 100,000) = Rate

Types of Data
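The rate formula can be sketched in a few lines. The function name and the crime and population figures below are hypothetical, chosen only to show the arithmetic.

```python
def rate_per_100k(crime_count, population):
    """Crimes per 100,000 residents: count / (population / 100,000)."""
    return crime_count / (population / 100_000)

# Hypothetical figures, not taken from the slides.
violent_crimes = 450
population = 250_000

rate = rate_per_100k(violent_crimes, population)
print(rate)  # 180.0 violent crimes per 100,000 population
```

Expressing counts as rates lets areas with very different populations be compared on the same scale.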

Ratios
◦ Property crimes “per” violent crime
◦ Violent crimes = 10
◦ Property crimes = 300
◦ PC/VC = 300/10 = 30
◦ For every one violent crime, there are 30 property crimes

Percent Change
◦ For comparing time periods
◦ ((New - Old) / Old) * 100
◦ 2009 property crimes = 2567
◦ 2008 property crimes = 2655
◦ Percent change = (2567 - 2655) / 2655 = -0.033, and -0.033 * 100 = -3.3%

Types of Data
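The ratio and percent-change calculations can be checked with a short sketch that uses the slide's own numbers. The function names are illustrative only.

```python
def ratio(numerator, denominator):
    """How many of the numerator there are for each one of the denominator."""
    return numerator / denominator

def percent_change(old, new):
    """((New - Old) / Old) * 100."""
    return (new - old) / old * 100

# Slide example: 300 property crimes and 10 violent crimes.
pc_per_vc = ratio(300, 10)

# Slide example: 2655 property crimes in 2008 and 2567 in 2009.
change = percent_change(old=2655, new=2567)

print(pc_per_vc)         # 30.0
print(round(change, 1))  # -3.3
```

The older period always goes in the denominator; swapping the two periods gives a different percentage, not just the opposite sign.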

Measures of Central Tendency
◦ Mean or Average
  Average of a distribution of values
◦ Mode
  Most often found value in a distribution
◦ Median
  The middle value in a distribution

Descriptive Statistics

25, 55, 56, 65, 72, 82, 82, 84, 90, 97

Exam Scores

Count = 10
Average = 70.8
Mode = 82
Median = 77

Median: (82 - 72) / 2 = 5, and 72 + 5 = 77

Bi-Modal


25, 55, 55, 65, 72, 82, 82, 84, 90, 97

Exam Scores

Count = 10
Average = 70.7
Mode = 55 & 82
Median = 77

Median: (82 - 72) / 2 = 5, and 72 + 5 = 77
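The figures for both sets of exam scores can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; `multimode` is used because it returns every most-common value, which is what the bi-modal example needs.

```python
import statistics

# The two sets of exam scores from the slides.
scores_a = [25, 55, 56, 65, 72, 82, 82, 84, 90, 97]
scores_b = [25, 55, 55, 65, 72, 82, 82, 84, 90, 97]  # bi-modal set

for name, scores in (("first set", scores_a), ("bi-modal set", scores_b)):
    print(name)
    print("  count :", len(scores))
    print("  mean  :", statistics.mean(scores))
    print("  median:", statistics.median(scores))     # average of the two middle scores
    print("  modes :", statistics.multimode(scores))  # all most-frequent values
```

With an even number of scores there is no single middle value, so the median is the midpoint of the 5th and 6th scores (72 and 82).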

Mean
◦ Should not be used when the distribution is greatly “skewed,” as with most crime data
◦ Use the median instead where it makes sense
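A small sketch shows why the mean can mislead on skewed data. The counts below are hypothetical (eight beats, one with an unusually high count) and are not from the slides.

```python
import statistics

# Hypothetical burglary counts for eight beats; one beat is an extreme outlier.
counts = [1, 1, 2, 2, 3, 3, 4, 60]

mean_count = statistics.mean(counts)
median_count = statistics.median(counts)

print("mean  :", mean_count)    # pulled upward by the single high beat
print("median:", median_count)  # stays near the typical beat
```

Here the mean is larger than seven of the eight values, while the median still describes a typical beat.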


Positive or Right Skewed

Almost normal

Negative or Left Skewed

Measures of Variance or Dispersion
◦ Range
  The distance between the lowest and highest score
◦ Interquartile range
  The distance between the 25th and 75th percentile
◦ Variance
  The average squared distance of each score in a distribution from the mean of the distribution
◦ Standard deviation
  The square root of the variance; roughly the typical distance of each score from the mean


25, 55, 55, 65, 72, 82, 82, 84, 90, 97

Exam Scores

Range = 72
Interquartile Range = 26
Variance = 456.9
Standard deviation = 21.4

1st Quartile = 57.5
3rd Quartile = 83.5
Interquartile Range: 83.5 - 57.5 = 26
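The dispersion figures for the exam scores can be reproduced with the standard `statistics` module. Two choices in this sketch are assumptions made to match the slide's numbers: the variance is the sample variance (dividing by n - 1), and the quartiles use the "inclusive" interpolation method.

```python
import statistics

scores = [25, 55, 55, 65, 72, 82, 82, 84, 90, 97]

score_range = max(scores) - min(scores)

# Quartiles with inclusive interpolation (assumed; it matches 57.5 and 83.5).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

variance = statistics.variance(scores)  # sample variance, divides by n - 1
std_dev = statistics.stdev(scores)      # square root of the sample variance

print("range   :", score_range)
print("Q1, Q3  :", q1, q3)
print("IQR     :", iqr)
print("variance:", round(variance, 1))
print("std dev :", round(std_dev, 1))
```

Dividing by n instead (`statistics.pvariance`) would give a smaller variance, so it is worth knowing which version a given tool reports.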

Mean Center Mapping



Standard Deviation Mapping

A sample is analyzed to “infer” information about the population
◦ Probability theory
  The number of times any given outcome will occur if the event is repeated many times.

Inferential Statistics

Bell-Shaped or Normal Curve


Mode & Median same as Mean

Histogram
◦ Normal
◦ Skewed

Average 26.20, Median 30, Mode 40

Average 20, Median 20, Mode 20

Average 13.6, Median 10, Mode 1

What variables are available?
What is the overall n?
What is the unit of analysis?
What do I want to know about the variable(s)?
What is the level of measurement of the variable(s)?
Are the variables discrete or continuous?
How many groups will be compared in the analysis?
Am I interested in just describing the data or finding inferences within it?

Questions to Ask Yourself…

Dependent variable
◦ The variable that analysts are trying to explain (in crime mapping, the dependent variable is often some crime measure).

Independent variable
◦ Variables that produce a change in our dependent variable

Variables for our Stats..

Causal relationship
◦ Intervening variable
◦ Antecedent variable
◦ Contingent variable
◦ Multicollinearity
  When X, Y, and Z have overlapping measures of the same concept
◦ Spurious relationships
  When X and Y have no direct relationship but are both affected by Z

Variables for our Stats..

Diagram: X, Z, Y (Multicollinearity)

Chi-square, t-tests, z-tests, ANOVA
◦ Essentially, they work by determining whether variable distributions or differences between groups or areas would be expected based on random chance

Tests of Significance

Lambda, Gamma, Kendall’s tau statistics, Spearman’s rho, Pearson’s correlation coefficient
◦ To determine the strength and direction of a relationship between two variables
◦ Values between -1 and +1
◦ Inverse/negative or positive relationships possible

Measures of Association

Variable 1 Variable 2

Positive

Variable 1 Variable 2

Inverse
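As one example of a measure of association, Pearson's correlation coefficient can be computed by hand in a few lines. The data below are hypothetical and chosen only to show one positive and one inverse relationship; the function and variable names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data, not from the slides.
variable_1 = [1, 2, 3, 4, 5]
rises_with_1 = [2, 4, 5, 4, 6]   # tends to increase as variable_1 increases
falls_with_1 = [6, 4, 5, 4, 2]   # tends to decrease as variable_1 increases

r_positive = pearson_r(variable_1, rises_with_1)
r_inverse = pearson_r(variable_1, falls_with_1)

print(round(r_positive, 3))
print(round(r_inverse, 3))
```

The sign gives the direction of the relationship and the distance from 0 gives its strength.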

Spatial Autocorrelation
◦ Moran’s I
  A value between 0 and 1 indicates positive spatial autocorrelation (or clustering).
  A value between -1 and 0 indicates negative spatial autocorrelation (dispersion); a value near 0 indicates a random distribution.
◦ Geary’s C
  Values under 1 signify positive spatial autocorrelation
  Values over 1 designate negative spatial autocorrelation

Spatial Measures
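To make the two statistics concrete, here is a minimal sketch of global Moran's I and Geary's C for a toy case. Everything about the setup is an assumption for illustration: six areas arranged in a row, binary weights where only adjacent areas are neighbors, and made-up crime counts. GIS software uses more elaborate weighting schemes and also reports significance tests, which this sketch omits.

```python
def chain_weights(n):
    """Binary spatial weights for n areas in a row: 1 if adjacent, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum(w_ij * d_i * d_j) / sum(d_i ** 2)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

def gearys_c(values, weights):
    """Geary's C: ((n - 1) / (2 * S0)) * sum(w_ij * (x_i - x_j) ** 2) / sum(d_i ** 2)."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
               for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * diff / sum((v - mean) ** 2 for v in values)

w = chain_weights(6)
clustered = [10, 10, 10, 0, 0, 0]     # high values sit next to high values
alternating = [10, 0, 10, 0, 10, 0]   # high values sit next to low values

print(round(morans_i(clustered, w), 2), round(gearys_c(clustered, w), 2))
print(round(morans_i(alternating, w), 2), round(gearys_c(alternating, w), 2))
```

The clustered pattern gives a positive Moran's I and a Geary's C below 1; the alternating pattern gives a negative Moran's I and a Geary's C above 1.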

Linear relationship
◦ Ordinary least squares (OLS)
  Y = a + b1·X1 + b2·X2 + b3·X3 …
◦ Units of analysis
  Must be the same

Regression models
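The simplest case of the OLS equation has one independent variable, Y = a + b1·X1, and can be fitted by hand. The sketch below assumes that single-predictor case; the variable names and data are hypothetical, and the values were chosen to lie exactly on a line so the result is easy to check.

```python
def ols_fit(xs, ys):
    """Least-squares intercept a and slope b1 for Y = a + b1 * X1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X.
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: forces the line through the point (mean_x, mean_y).
    a = mean_y - b1 * mean_x
    return a, b1

# Hypothetical data: one row per block group (same unit of analysis for X and Y).
vacant_lots = [1, 2, 3, 4, 5]     # independent variable X1
burglaries = [3, 5, 7, 9, 11]     # dependent variable Y

a, b1 = ols_fit(vacant_lots, burglaries)
print("a =", a, "b1 =", b1)

# Predicted Y for a block group with 6 vacant lots.
predicted = a + b1 * 6
print("predicted:", predicted)
```

Each X and Y pair must describe the same unit (here, the same block group), which is the point of the "units of analysis must be the same" bullet.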

ArcGIS Data Classification

Capabilities

Polygons
◦ Nominal (categories), ordinal, interval, and ratio (quantities) data can be used with different methods
◦ Fills and outlines

Nominal data example

Ratio Data Example

Symbology

Category data symbology comes next

It displays data by unique values of a field, or multiple fields

Nominal, ordinal, ratio or interval data

Symbology

Next comes the quantities symbology method

It uses a number field in the table to display data by classified values

Ratio and interval data

Quantities Classifications

Six different ways to classify data, with an added manual method for infinite freedom

Classification Methods

◦ Equal Interval
◦ Defined Interval
◦ Quantile
◦ Natural Breaks
◦ Geometrical Interval
◦ Standard Deviation

Types of Data

Categorical (Qualitative)
◦ Grouping based on some quality
◦ Labels or categories
◦ E.g., Sex = male or female
◦ Nominal or ordinal
  Nominal: the order is not important (e.g., Sex = male or female)
  Ordinal: the order is important (e.g., Rank = Officer, Sergeant, Lieutenant, etc.)
◦ Can be binary or non-binary
  Binary = only two values (male or female)
  Non-binary = more than two values (red, blonde, brunette, etc.)

Types of Data

Measurement (Quantitative)
◦ Grouping based on some quantity or value
◦ Always numbers
◦ Discrete or continuous
  Discrete = only certain values are possible and data could have gaps (1, 2, 3, or 4)
  Continuous = any value along some interval (e.g., any value between 1 and 4, such as 3.24211)
◦ Interval or ratio
  In interval data, the interval between values is important (e.g., a temperature of 30 compared to 110 means something)
  Ratio data is the highest level, and the “0” value is meaningful (e.g., a grid can have 0 crimes, or any value up to infinity)

Great Website to Explain Research and Data Types

http://www.socialresearchmethods.net/kb/index.php



Classification Methods

Equal Interval (ratio, interval)
◦ The range between the classifications is the same

Take the high value minus the low value and divide by the number of classes; here, for each of the 5 classes, the interval is 199.61

Number of classes desired determines interval
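The equal interval calculation can be sketched as follows. The slide gives only the class count (5) and the resulting interval (199.61), so the low and high values below are hypothetical, chosen to produce that same interval; the function name is illustrative.

```python
def equal_interval_breaks(low, high, num_classes):
    """Upper break value of each class when the range is split into equal widths."""
    width = (high - low) / num_classes
    return [low + width * i for i in range(1, num_classes + 1)]

# Hypothetical data range; only the interval of 199.61 comes from the slide.
low, high = 0.0, 998.05
num_classes = 5

width = (high - low) / num_classes
breaks = equal_interval_breaks(low, high, num_classes)

print(round(width, 2))
print([round(b, 2) for b in breaks])
```

Changing `num_classes` changes the width of every class, which is the sense in which the number of classes determines the interval.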

Classification Methods

Defined Interval (ratio, interval)
◦ Similar to the equal interval, but here, we define what the interval will be and thus establish the classes

In this case the interval was set to 150, and so the number of classes is determined by the interval
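Defined interval reverses the equal interval logic: the interval is fixed and the class count follows from it. In this sketch the interval of 150 comes from the slide, while the low and high values are hypothetical and the function name is illustrative.

```python
import math

def defined_interval_breaks(low, high, interval):
    """Upper break values when each class is exactly `interval` wide."""
    # Enough classes to cover the whole range; the last class may extend past high.
    num_classes = math.ceil((high - low) / interval)
    return [low + interval * i for i in range(1, num_classes + 1)]

# Hypothetical data range; only the interval of 150 comes from the slide.
breaks = defined_interval_breaks(low=0.0, high=998.05, interval=150)

print(len(breaks), "classes")
print(breaks)
```

With this range, a 150-unit interval needs seven classes, and the top class runs beyond the highest data value.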

Classification Methods

Quantile (ratio, interval)
◦ Each class holds the same percentage of the values, so each class contains an equal number of features.

Each of the 10 classes has the same number of features, so each class makes up 10% of the total records
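A quantile classification can be sketched by sorting the values and cutting the sorted list into equally sized groups. The counts below are hypothetical (20 areas, so each of 10 classes gets 2), and the function name is illustrative. Real GIS software also has to decide how to handle tied values, which this sketch ignores.

```python
def quantile_classes(values, num_classes):
    """Split the sorted values into num_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    classes = []
    for k in range(num_classes):
        start = k * n // num_classes
        end = (k + 1) * n // num_classes
        classes.append(ordered[start:end])
    return classes

# Hypothetical crime counts for 20 areas.
crime_counts = [3, 41, 7, 12, 0, 88, 19, 5, 230, 64,
                2, 27, 9, 150, 33, 1, 52, 14, 110, 22]

classes = quantile_classes(crime_counts, 10)
for number, members in enumerate(classes, start=1):
    print(number, members)
```

Every class holds the same number of areas, but the value ranges differ widely: the top class here spans far more than the bottom one.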

Classification Methods

Natural Breaks (ratio, interval)
◦ Breaks the data where there are natural holes between values

Use test exam score example

Classification Methods

Geometrical Interval (ratio, interval)
◦ This is a classification scheme where the class breaks are based on class intervals that follow a geometric series. This ensures that each class range has approximately the same number of values within each class and that the change between intervals is fairly consistent.

The interval is determined by a geometric equation (large and small changes depending on breaks in data)

Classification Methods

Standard Deviation (ratio, interval)
◦ Classes are determined by the mean and standard deviation of the values. Can display by 1, ½, or ¼ standard deviations as needed
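A standard deviation classification can be sketched by measuring how far each value lies from the mean, in bands that are a chosen fraction of a standard deviation wide. The example reuses the exam scores from earlier in the deck; the band numbering scheme and function name are assumptions made for illustration.

```python
import math
import statistics

scores = [25, 55, 55, 65, 72, 82, 82, 84, 90, 97]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

def sd_band(value, step=1.0):
    """Index of the band holding value, using bands `step` standard deviations wide.

    Band 0 starts at the mean and runs upward; band -1 is the band just below the mean.
    """
    return math.floor((value - mean) / (sd * step))

# Classify each score using 1 SD bands and 1/2 SD bands.
for score in scores:
    print(score, sd_band(score), sd_band(score, step=0.5))
```

Narrower bands (½ or ¼ standard deviation) separate values that would share a class at 1 standard deviation, at the cost of more classes on the map.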

Getting to know your data and the factors that influence crime can help analysts create more useful maps and analysis products and support problem solving

Handling data properly will keep you from making incorrect assumptions and coming to unrealistic conclusions

Remember the wheel of science

Conclusions

Technology
