
Rodent Complaints in Boston

Page 1: Rodent Complaints in Boston

Rodent Complaints in Boston

Number of rodent complaints in Boston per 2010 US census tract, 2011-2013

Question: Is the spatial pattern of rodent complaints in Boston related to other information in a) the Mayor’s Service Hotline data or b) the 2010 US census?

Result: While statistically significant correlations are found, no clear causal relationship is suggested by the information at hand.

Data: Boston Mayor Service Hotline: https://data.cityofboston.gov/

2010 US census: http://tinyurl.com/otsvzma (links to mass.gov website)

Page 2: Rodent Complaints in Boston

Linear Model of Rodent Complaints – OLS

Ordinary Least Squares (OLS) model of rodent complaints

Exogenous variables: 44 variables extracted from the Mayor’s Service Hotline and the 2010 census

Question: Can other information in the Mayor’s Service Hotline and 2010 census explain the spatial variability in the rodent complaints?

Note: Gray census tracts are those with fewer than 500 residents, plus one outlier tract located in Allston.

The model captures some of the spatial pattern in rodent complaints, but the difference map reveals model deficiencies. Of particular importance are the large residuals in census tracts with high observed rodent counts.
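As a rough illustration of the OLS step, here is a minimal sketch on synthetic data. The single predictor `x`, the simulated counts `y`, and all numeric settings are assumptions made for the example; they stand in for the 44 variables and the tract counts, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the tract data (hypothetical, not the Boston data):
# one predictor x and a skewed count response y.
n_tracts = 300
x = rng.normal(size=n_tracts)
y = rng.poisson(np.exp(0.2 + 1.2 * x)).astype(float)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n_tracts), x])

# OLS: minimize ||y - X b||^2.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_ols
residuals = y - y_hat  # the quantity shown in a "difference map"

print("coefficients:", beta_ols)
print("number of negative fitted counts:", int((y_hat < 0).sum()))
print("largest residual:", residuals.max())
```

On skewed count data like this, a straight-line fit can return negative fitted counts and tends to miss the largest observed values, which is the kind of deficiency a residual map exposes.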

Page 3: Rodent Complaints in Boston

Linear Model of Rodent Complaints – Poisson

Generalized Linear Model (GLM), assuming Poisson distribution of rodent complaints

Question: Can we make a better model using a generalized linear model (GLM) framework, assuming a Poisson distribution of rodent complaints?

Note: Gray census tracts are those with fewer than 500 residents, plus one outlier tract located in Allston.

This exercise is reasonable because rodent complaints in Boston follow something closer to a Poisson than a Gaussian distribution. Flipping between slides shows that the red/blue tones in the difference map are somewhat muted in the GLM. However, large residuals persist.
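A Poisson GLM with a log link can be sketched in a few lines using iteratively reweighted least squares (IRLS). This is only an illustration under stated assumptions: the function name `fit_poisson_glm`, the synthetic predictors, and the "true" coefficients are all hypothetical, and the original analysis may well have used a library routine instead.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, tol=1e-8):
    """Poisson regression with log link, fitted by IRLS.

    Assumes the first column of X is an intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start at the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta                 # linear predictor
        mu = np.exp(eta)               # fitted mean, always positive
        z = eta + (y - mu) / mu        # working response
        WX = X * mu[:, None]           # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic data (hypothetical): two predictors plus an intercept.
rng = np.random.default_rng(1)
n_tracts = 200
features = rng.normal(size=(n_tracts, 2))
X = np.column_stack([np.ones(n_tracts), features])
true_beta = np.array([1.0, 0.5, -0.3])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)

beta_hat = fit_poisson_glm(X, y)
mu_hat = np.exp(X @ beta_hat)
print("estimated coefficients:", beta_hat)
print("smallest fitted count:", mu_hat.min())
```

Because the fitted mean is the exponential of the linear predictor, it cannot go negative, which is one reason this model family suits count data better than OLS.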

Page 4: Rodent Complaints in Boston

Linear Model of Rodent Complaints – Poisson

The GLM outperforms OLS at small values of rodent complaints, where OLS often predicts negative values. The Poisson regression also performs better at large values of rodent complaints, though there is still room for improvement.

Improvement using the Poisson GLM may be difficult to visualize in the maps, so I plot true and modeled rodent complaints in ascending order.
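The ordering used for such a plot can be sketched as follows. I assume here that tracts are sorted by their observed count and that the model predictions are reordered the same way; the numbers are made up for illustration and are not values from the Boston data.

```python
import numpy as np

# Made-up values for six hypothetical tracts.
observed = np.array([12, 0, 3, 45, 7, 1])
ols_pred = np.array([10.5, -2.1, 4.0, 30.2, 8.3, -0.4])
glm_pred = np.array([11.2, 0.6, 3.1, 38.9, 6.5, 1.2])

# Sort tracts by observed count and apply the same order to both models.
order = np.argsort(observed, kind="stable")
observed_sorted = observed[order]
ols_sorted = ols_pred[order]
glm_sorted = glm_pred[order]

print("rank observed      OLS      GLM")
for rank, (obs, ols, glm) in enumerate(zip(observed_sorted, ols_sorted, glm_sorted)):
    print(f"{rank:>4} {obs:>8} {ols:>8.1f} {glm:>8.1f}")
```

Plotting the three sorted arrays against rank puts the low-count and high-count ends of the distribution at opposite sides of the figure, so model behavior at each end is easy to compare.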

Robust interpretation of a model with many exogenous variables, some of which may exhibit strong collinearity, is difficult. I therefore seek a simpler model.

Page 5: Rodent Complaints in Boston

Linear Model of Rodent Complaints – Sparsity

I perform OLS regression again, regularizing the vector of regression coefficients using its L1 norm. The strength of the regularization is controlled by a parameter, α.

I select the variables whose regression coefficients are the first five to become nonzero under L1 regularization, and build a linear model from this smaller set of variables.

OLS coefficients at different strengths of regularization

L1 regularization promotes sparse solutions, meaning that many regression coefficients are set to zero.

The plot at right shows regression coefficients turning on as I relax the regularization constraint (moving from right to left on the x-axis).

Perhaps I can make a simpler model using the first few coefficients to turn on.
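The idea of coefficients "turning on" as α decreases can be sketched with a small coordinate-descent lasso. Everything here is an assumption made for illustration: the objective scaling (1/(2n)) ||y - Xb||² + α ||b||₁, the function names, the eight synthetic variables, and the grid of α values. The original slides do not say which solver or scaling was used.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it crosses zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]   # drop variable j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= X[:, j] * beta[j]   # add the updated variable back
    return beta

# Synthetic data (hypothetical): only the first three variables matter.
rng = np.random.default_rng(2)
n_tracts, n_vars = 150, 8
X = rng.normal(size=(n_tracts, n_vars))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
true_beta = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(size=n_tracts)
y = y - y.mean()                            # center, so no intercept is needed

# From the smallest alpha that zeroes every coefficient down to 1% of it.
alpha_max = np.max(np.abs(X.T @ y)) / n_tracts
alphas = alpha_max * np.logspace(0, -2, 30)
path = np.array([lasso_coordinate_descent(X, y, a) for a in alphas])

# Index along the path at which each coefficient first becomes nonzero.
active = np.abs(path) > 1e-10
first_active = np.where(active.any(axis=0), active.argmax(axis=0), len(alphas))
activation_order = np.argsort(first_active, kind="stable")
print("variables in order of activation:", activation_order)
```

Taking the first few entries of `activation_order` mirrors the selection of the first coefficients to turn on. Which variables enter first depends on how the columns are scaled, which is why the sketch standardizes them.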

Page 6: Rodent Complaints in Boston

Linear Model of Rodent Complaints – Conclusions

A Poisson GLM using the five coefficients selected on the previous slide reveals nothing about rodent complaints in Boston. I skip showing the results because they are of no interest. Instead, I summarize my findings and move on to Part 2: unsupervised learning!

Conclusions:
• The spatial distribution of rodent counts is not obviously causally related to most information in the data set.
• Assuming the correct functional form of y can impact regression results.
• L1 regularization can provide sparse estimates of regression coefficients, but this doesn’t necessarily facilitate interpretation of regressions.
• Other data may be more useful for understanding the spatial distribution of rodents in Boston. I would prefer to have data on the age of buildings, zoning information (more rats around more food waste?), and the population density of outdoor cats!
• Most importantly, if this were a serious investigation, I would first speak with an expert in rodent control. Someone has put thought into this before, and that person could help facilitate this kind of analysis.