Forecasting critical food violations at restaurants using open data

Nicole Donnelly
PyData DC
October 8, 2016



Hello!

Thank you!


Who are you?


Who am I?


Why am I here?


The Project

Replicate Chicago’s Food Inspection Forecasting project using Python and data about DC.


Data pipeline: Ingest, Wrangle, Compute, Visualize, Report



Hypothesis

Foodborne illness outbreaks affect millions of people annually. The city of Washington, DC, like most cities, has limited resources to inspect food establishments for critical violations that lead to these outbreaks.

We can use machine learning to predict when a critical violation is likely to occur and prioritize inspections to catch these violations sooner, mitigating foodborne illness outbreaks and more effectively deploying limited resources.


Instance: an inspection

Features: the data about the instance

Prediction: will there be a critical violation

Data

Weather

DOH Inspections
Crime
ABRA
DCRA
Construction

Rating
Number of Reviews
Category

Non-emergency City Issues

Places
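
The framing above can be sketched as a record type: one inspection is one instance, the joined data become its features, and the label is whether a critical violation was found. The field names below are hypothetical, chosen only to show the shape of a row; the slides do not list the actual columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class InspectionInstance:
    """One row of the modeling table: a single inspection (hypothetical fields)."""
    establishment_id: str
    inspection_date: date
    # Features: data about the instance, joined from the other sources.
    max_temp_f: float          # weather on the inspection date
    nearby_crime_count: int    # crime reports near the establishment
    review_rating: float       # rating from a review site
    review_count: int          # number of reviews
    # Prediction target: was a critical violation found?
    critical_violation: bool


def to_xy(rows):
    """Split instances into a feature matrix X and a label vector y."""
    X = [[r.max_temp_f, r.nearby_crime_count, r.review_rating, r.review_count]
         for r in rows]
    y = [int(r.critical_violation) for r in rows]
    return X, y


# Made-up rows, for illustration only.
rows = [
    InspectionInstance("A1", date(2016, 7, 1), 91.0, 4, 3.5, 120, True),
    InspectionInstance("B2", date(2016, 7, 2), 78.0, 0, 4.5, 40, False),
]
X, y = to_xy(rows)
```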


Scraping

APIs

CSVs

Ingest
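
Of the three ingest routes, CSV is the simplest to sketch. The example below parses a small inline CSV with the standard library; the column names and values are assumptions for illustration, not the layout of any real DC export.

```python
import csv
import io
from datetime import datetime

# Hypothetical export; real column names will differ by source.
raw = io.StringIO(
    "establishment_id,inspection_date,critical_violations\n"
    "A1,2016-01-10,2\n"
    "B2,2016-02-01,0\n"
)

records = []
for row in csv.DictReader(raw):
    # Convert text fields to typed values as early as possible.
    records.append({
        "establishment_id": row["establishment_id"],
        "inspection_date": datetime.strptime(
            row["inspection_date"], "%Y-%m-%d"
        ).date(),
        "critical_violations": int(row["critical_violations"]),
    })
```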


Clean the data

Create the instances

Come to terms with features

Feature engineering

Wrangle
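
As one possible illustration of feature engineering, the sketch below derives two history features per inspection: days since the establishment's previous inspection and its count of prior critical violations. Both features and the input tuple layout are hypothetical; the slides do not say which features were actually built.

```python
from collections import defaultdict
from datetime import date


def add_history_features(inspections):
    """Return one feature dict per inspection, using only earlier inspections.

    `inspections` is a list of (establishment_id, inspection_date, had_critical)
    tuples; this layout is an assumption made for the example.
    """
    last_seen = {}                     # establishment -> previous inspection date
    prior_critical = defaultdict(int)  # establishment -> critical violations so far
    out = []
    for est, day, had_critical in sorted(inspections, key=lambda r: r[1]):
        prev = last_seen.get(est)
        out.append({
            "establishment_id": est,
            "inspection_date": day,
            "days_since_last": (day - prev).days if prev else None,
            "prior_critical": prior_critical[est],
            "label": int(had_critical),
        })
        # Update history only after emitting the row, so that no feature
        # leaks the outcome of the inspection it describes.
        last_seen[est] = day
        prior_critical[est] += int(had_critical)
    return out


# Made-up inspection history.
history = [
    ("A1", date(2016, 1, 10), True),
    ("A1", date(2016, 4, 10), False),
    ("B2", date(2016, 2, 1), False),
]
features = add_history_features(history)
```

Rows come back in date order, and the first inspection of each establishment has `days_since_last` set to `None`, which a later step would need to impute or encode.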


Which estimator?

All of them

Compute
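
Trying "all of them" usually means looping over a set of candidate estimators and scoring each the same way. The sketch below assumes scikit-learn, which the slides do not name, and uses synthetic data in place of the inspection table; the choice of estimators and of F1 as the metric is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the inspection table (not real data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Candidate estimators; this particular set is illustrative.
estimators = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "k_neighbors": KNeighborsClassifier(),
}

scores = {}
for name, estimator in estimators.items():
    # Mean F1 over 5 cross-validation folds, same protocol for every model.
    scores[name] = cross_val_score(estimator, X, y, cv=5, scoring="f1").mean()

# Print the candidates from best to worst mean score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```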


Drumroll please...

Visualize


Results on out-of-sample data


The scores were not great, but reprioritizing the inspections using the model confidence scores yields results.

Report

11% more violations, 10 days sooner
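
Reprioritizing by model confidence can be sketched as follows: sort the out-of-sample inspections by predicted score, then compare how early critical violations would be found against the order in which the inspections actually ran. The data, the fixed inspections-per-day rate, and the function name are all hypothetical; this is one way to compute a "days sooner" figure, not necessarily how the numbers above were produced.

```python
def mean_days_to_violation(order, has_violation, per_day):
    """Average day (0-based) on which critical violations are found when
    inspections run in `order` at `per_day` inspections per day."""
    days = [pos // per_day for pos, i in enumerate(order) if has_violation[i]]
    return sum(days) / len(days)


# Hypothetical out-of-sample inspections, listed in the order they actually ran.
has_violation = [False, False, True, False, True, False, False, True]
# Hypothetical model confidence that each inspection finds a critical violation.
confidence = [0.20, 0.10, 0.70, 0.30, 0.40, 0.60, 0.05, 0.90]

actual_order = list(range(len(has_violation)))
# Reprioritized schedule: highest confidence first.
model_order = sorted(actual_order, key=lambda i: confidence[i], reverse=True)

per_day = 2  # assumed inspection capacity
actual = mean_days_to_violation(actual_order, has_violation, per_day)
model = mean_days_to_violation(model_order, has_violation, per_day)
days_sooner = actual - model
```

Even with an imperfect ranking (the third-highest score here belongs to an inspection with no violation), the reordered schedule finds violations earlier on average, which is the sense in which modest scores can still be useful.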


What now?

Build better dataset

Get more data

Get more input


Poor scores do not mean failure; they are just a starting point.


Thanks!

Nicole Donnelly

[email protected]
@NicoleADonnelly
GitHub: nd1