Challenges in Collecting Police-Reported Crime Data
Colin Babyak
Household Survey Methods Division
ICES III - Montreal - June 20, 2007

Overview
Structure of the Uniform Crime Reporting Survey (UCR)
UCR vs. a “typical” business survey
Data quality
Recent developments
Future work

Structure of the UCR
2 versions of the survey
  Microdata (94%)
  Aggregate data (6%)
~1200 respondents (police services)
Extraction of “administrative” data
  4 different vendors for extraction
  Some respondents build their own system

Structure of the UCR
Receive information on:
  Incident
  Accused
  Victims
Monthly submissions
Monthly edit reports
Monthly corrections
All statistics are annual

UCR vs. a “typical” business survey
Similarities:
Population is skewed
  Most respondents are small in size
Frame is well-established, good quality
Regular, personal contact with the largest respondents
Respondent data relatively consistent over time

UCR vs. a “typical” business survey
Differences:
UCR                                        “Typical” survey
Census                                     Sample
Extract admin. data                        Questionnaire
Respondents can be recontacted re: errors  Respondents usually not recontacted
Data released at respondent level          Data released at aggregate level
Multiple records per respondent            One record per respondent

UCR vs. a “typical” business survey
Impact of differences:
We cannot “treat” respondent errors without their consent
Non-respondents need to be consulted and “sign off” on their data
Very difficult to determine a response rate

UCR vs. a “typical” business survey
Impact of differences (cont):
Collecting new information is difficult:
  Years to implement
  Vendors do not update immediately
  Respondents do not update immediately
  In-house systems are not re-programmed immediately
Recent additions include:
  Cybercrime, Hate Crime, Organized Crime, Geocoding, FPS Number

Data Quality
Monthly edit reports
6-month review of aggregate data
Outlier detection of aggregate data
Year end sign-off of data for major respondents
Analyze distributions of key variables
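
The slides do not say how outliers in the aggregate data are detected. As one possible illustration only, the sketch below flags unusual monthly counts with a modified z-score built on the median and the median absolute deviation (MAD), which is less affected by extreme values than a mean-based rule. The function name, the 3.5 threshold and the monthly counts are all hypothetical.

```python
from statistics import median


def robust_outliers(counts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Assumption: a median/MAD rule stands in for whatever method the survey
    actually uses; the slides do not describe it.
    """
    med = median(counts)
    mad = median(abs(x - med) for x in counts)
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    return [i for i, x in enumerate(counts)
            if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical monthly incident counts for one respondent.
monthly_counts = [102, 98, 110, 105, 99, 101, 97, 310, 104, 100, 96, 103]
suspect_months = robust_outliers(monthly_counts)
print(suspect_months)
```

Flagged months would then be candidates for review with the respondent rather than for automatic correction, in line with the consent constraint noted earlier in the deck.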

Recent Methodological Developments
Analysis of new variables
Spatial modeling
Correction rates
Record linkage projects
Time series imputation
Key variable distribution analysis

Recent Developments
New Variables
Establishing baseline data for:
  Cybercrime
  Organized Crime
  Hate Crime
First data release in Spring 2007:

Recent Developments
Spatial Modeling
Goal is to determine explanatory variables for crime at neighbourhood level
Observations are not independent
  Using spatial models to “filter out” spatial effects
Has shown that traditional models are inefficient for neighbourhood crime data
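
A common way to check whether neighbouring observations are independent is global Moran's I, a standard measure of spatial autocorrelation. The sketch below is not the spatial model used in the project, which the slides do not specify; it only illustrates the kind of dependence those models are meant to filter out. The adjacency matrix and crime rates are made up.

```python
def morans_i(values, weights):
    """Global Moran's I for values given a square spatial weights matrix.

    weights[i][j] > 0 when areas i and j are neighbours, 0 otherwise.
    Values near 0 suggest no spatial pattern, positive values suggest that
    similar areas cluster, negative values that neighbours differ.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    var = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var)


# Hypothetical example: four neighbourhoods in a row, each adjacent to the next.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 12, 30, 32]    # similar rates sit next to each other
alternating = [10, 30, 10, 30]  # neighbouring rates differ

print(morans_i(clustered, adjacency))
print(morans_i(alternating, adjacency))
```

A clearly positive value for neighbourhood crime rates would be one sign that an ordinary regression, which assumes independent observations, is not appropriate.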

Recent Developments
Correction Rates
Important data quality indicator
Are respondents acting on the E&I reports?
Varies greatly across respondents
Concrete information for follow-up
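
The slides do not give a formula for the correction rate. A minimal sketch, assuming it is the share of records flagged in an edit report that the respondent later corrected, could look like the following; the respondent identifiers and record ids are hypothetical.

```python
def correction_rates(flagged, corrected):
    """Share of flagged records that each respondent later corrected.

    flagged and corrected map a respondent id to a set of record ids.
    Respondents with no flagged records are left out (rate undefined).
    Assumption: this definition is illustrative, not the survey's own.
    """
    rates = {}
    for resp, flagged_ids in flagged.items():
        if not flagged_ids:
            continue
        fixed = flagged_ids & corrected.get(resp, set())
        rates[resp] = len(fixed) / len(flagged_ids)
    return rates


# Hypothetical edit-report results for three police services.
flagged = {"PS-A": {1, 2, 3, 4}, "PS-B": {10, 11}, "PS-C": set()}
corrected = {"PS-A": {1, 2, 3}, "PS-B": set()}

rates = correction_rates(flagged, corrected)
# Lowest rates first: these respondents are candidates for follow-up.
follow_up = sorted(rates, key=rates.get)
print(rates, follow_up)
```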

Recent Developments
Record Linkage
Creation of “quality codes” to reduce false positive matches
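
The slides do not define the quality codes. One plausible reading, sketched here purely as an assumption, is a grade attached to each candidate match according to how many identifying fields agree, so that weak matches can be excluded. The field names, grades and records below are hypothetical.

```python
def quality_code(rec_a, rec_b,
                 fields=("surname", "given_name", "birth_date", "sex")):
    """Grade a candidate pair: 'A' if all fields agree, 'B' if exactly one
    disagrees, 'C' otherwise. A missing value counts as a disagreement.

    Assumption: this grading scheme is illustrative only.
    """
    agree = sum(1 for f in fields
                if rec_a.get(f) is not None and rec_a.get(f) == rec_b.get(f))
    misses = len(fields) - agree
    if misses == 0:
        return "A"
    if misses == 1:
        return "B"
    return "C"


# Hypothetical records from two files being linked.
base = {"surname": "tremblay", "given_name": "marc",
        "birth_date": "1980-02-14", "sex": "M"}
near = dict(base, given_name="marco")
far = dict(base, surname="roy", birth_date="1975-06-01")

candidates = {"exact": (base, base), "near": (base, near), "far": (base, far)}
codes = {name: quality_code(a, b) for name, (a, b) in candidates.items()}
# Keep only the stronger grades to limit false positive matches.
accepted = [name for name, code in codes.items() if code in {"A", "B"}]
print(codes, accepted)
```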

Time Series
Using time series to impute for missing or poor quality data
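
The imputation method itself is not described in the slides. As a simple sketch under that caveat, a missing month could be filled with the previous year's value for the same month, scaled by the year-over-year ratio observed in the months that were reported. The function name and the counts are hypothetical.

```python
def impute_missing_months(previous_year, current_year):
    """Fill None entries in current_year using last year's seasonal pattern.

    Each missing month is set to last year's value for that month, scaled by
    the ratio of this year's to last year's totals over the reported months.
    Assumption: a simple ratio method stands in for the survey's own approach.
    """
    observed = [m for m, v in enumerate(current_year) if v is not None]
    if not observed:
        raise ValueError("no reported months to estimate a trend from")
    prev_total = sum(previous_year[m] for m in observed)
    if prev_total == 0:
        raise ValueError("previous-year total is zero; ratio is undefined")
    ratio = sum(current_year[m] for m in observed) / prev_total
    return [v if v is not None else round(previous_year[m] * ratio)
            for m, v in enumerate(current_year)]


# Hypothetical monthly incident counts; April of the current year is missing.
previous_year = [100, 90, 110, 120, 130, 140, 150, 150, 130, 120, 100, 90]
current_year = [110, 99, 121, None, 143, 154, 165, 165, 143, 132, 110, 99]

filled = impute_missing_months(previous_year, current_year)
print(filled[3])
```

Because data are released at the respondent level, any value imputed this way would presumably still need the respondent's sign-off before publication.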

Recent Developments
Variable Distribution Analysis
Analyze the distribution of certain key variables:
  Relationship, Weapon, Location, etc.
Useful data quality tool
Score function to detect biggest anomalies
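
The score function is not specified in the slides. One way such a score might work, offered only as a sketch, is a chi-square-style distance between each respondent's category counts for a key variable and the counts expected under the pooled distribution, with the largest scores reviewed first. The weapon categories and counts below are hypothetical.

```python
def anomaly_scores(counts_by_respondent):
    """Chi-square-style distance between each respondent's category counts
    and the counts expected under the pooled distribution.

    Assumption: this is an illustrative score, not the one used for the UCR.
    """
    all_counts = list(counts_by_respondent.values())
    categories = sorted({c for counts in all_counts for c in counts})
    pooled = {c: sum(counts.get(c, 0) for counts in all_counts)
              for c in categories}
    grand_total = sum(pooled.values())
    if grand_total == 0:
        raise ValueError("no records to build a pooled distribution from")
    scores = {}
    for resp, counts in counts_by_respondent.items():
        total = sum(counts.values())
        score = 0.0
        for c in categories:
            expected = total * pooled[c] / grand_total
            if expected > 0:
                score += (counts.get(c, 0) - expected) ** 2 / expected
        scores[resp] = score
    return scores


# Hypothetical counts of the Weapon variable for three respondents.
weapon_counts = {
    "PS-A": {"none": 80, "knife": 15, "firearm": 5},
    "PS-B": {"none": 78, "knife": 16, "firearm": 6},
    "PS-C": {"none": 20, "knife": 10, "firearm": 70},
}
scores = anomaly_scores(weapon_counts)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Note that the pooled distribution includes the respondent being scored, so a very large respondent pulls the reference distribution toward itself; comparing against peers of similar size would be one alternative design.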

Future Work
Microdata imputation
Formalized time series imputation
Proactive and more timely data quality measures
Periodic audits of respondents
Response / imputation rates