matilda-booth
Air pollution is the introduction of chemicals and biological materials into the atmosphere that causes damage to the natural environment.
We focused on Sulfur Dioxide as a major contributor to air pollution.
Sulfur dioxide is:
• A highly reactive gas
• A cause of acid rain
• A precursor to respiratory and cardiovascular problems
Air pollution is an ongoing problem worldwide, now more than ever.
We conduct a cross-sectional study of air pollution levels, measured by sulfur dioxide (SO2), and related factors for 41 US cities, using means over the years 1969-1971. By running several regressions, we attempt to determine the likely causes of air pollution.
City            SO2   Temperature   Man   Population   Wind   Rain    RainDays
Phoenix         10    70.3          213   582          6      7.05    36
Little Rock     13    61            91    132          8.2    48.52   100
San Francisco   12    56.7          453   716          8.7    20.66   67
Denver          17    51.9          454   515          9      12.95   86
Hartford        56    49.1          412   158          9      43.37   127
Wilmington      36    54            80    80           9      40.25   114
Washington      29    57.3          434   757          9.3    38.89   111
Jacksonville    14    68.4          136   529          8.8    54.47   116
……. …. …. …. …. …. …. ….
The data are means over the years 1969-1971.
1. City: City
2. SO2: Sulfur dioxide content of air in micrograms per cubic meter
3. Temp: Average annual temperature in degrees Fahrenheit
4. Man: Number of manufacturing enterprises employing 20 or more workers
5. Pop: Population size in thousands from the 1970 census
6. Wind: Average annual wind speed in miles per hour
7. Rain: Average annual precipitation in inches
8. RainDays: Average number of days with precipitation per year
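The variable list above maps naturally onto one record per city. The following is a minimal sketch in Python, assuming hypothetical names (`CityRecord`, `SAMPLE_ROWS`, `mean_so2`) that are not part of the original analysis. It holds only the eight rows shown in the table, so the mean it computes is for those rows and not for all 41 cities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityRecord:
    """One row of the data set; field names are illustrative."""
    city: str
    so2: float          # micrograms per cubic meter
    temperature: float  # degrees Fahrenheit
    man: int            # manufacturing enterprises with 20+ workers
    population: int     # thousands, 1970 census
    wind: float         # miles per hour
    rain: float         # inches per year
    rain_days: int      # days with precipitation per year


# Only the eight rows printed in the table; the remaining cities are elided there.
SAMPLE_ROWS = [
    CityRecord("Phoenix", 10, 70.3, 213, 582, 6.0, 7.05, 36),
    CityRecord("Little Rock", 13, 61.0, 91, 132, 8.2, 48.52, 100),
    CityRecord("San Francisco", 12, 56.7, 453, 716, 8.7, 20.66, 67),
    CityRecord("Denver", 17, 51.9, 454, 515, 9.0, 12.95, 86),
    CityRecord("Hartford", 56, 49.1, 412, 158, 9.0, 43.37, 127),
    CityRecord("Wilmington", 36, 54.0, 80, 80, 9.0, 40.25, 114),
    CityRecord("Washington", 29, 57.3, 434, 757, 9.3, 38.89, 111),
    CityRecord("Jacksonville", 14, 68.4, 136, 529, 8.8, 54.47, 116),
]


def mean_so2(rows):
    """Average SO2 over the given rows."""
    return sum(r.so2 for r in rows) / len(rows)


print(mean_so2(SAMPLE_ROWS))  # 23.375 for these eight rows only
```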
Histogram of sulfur levels:
Since the Jarque-Bera statistic is high and the data are positively skewed, we conclude that sulfur levels are not normally distributed.
We ran a number of bi-variate regressions to find out which independent variables significantly explain SO2 levels, both including and excluding dummy variables.
Next we ran a multi-variate regression to see if the variables that we found to be significant are significant in explaining SO2 levels when combined.
We then tested for multicollinearity and lastly investigated an interesting problem.
Dependent Variable: SO2
Method: Least Squares
Date: 11/22/09 Time: 14:03
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
TEMPERATURE    -1.408133     0.468595     -3.005012     0.0046
C              108.5711      26.34371     4.121328      0.0002

R-squared            0.188009    Mean dependent var      30.04878
Adjusted R-squared   0.167189    S.D. dependent var      23.47227
S.E. of regression   21.42044    Akaike info criterion   9.014119
Sum squared resid    17894.58    Schwarz criterion       9.097708
Log likelihood       -182.7894   F-statistic             9.030097
Durbin-Watson stat   1.848386    Prob(F-statistic)       0.004624
Temperature significantly explains SO2 levels, as shown by the large t-statistic (in absolute value) and the low p-value. The coefficient of temperature is negative, meaning SO2 levels decrease as temperature increases.
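As a quick check on how the reported statistics relate to each other, the sketch below recomputes the t-statistic as coefficient divided by standard error. It also uses the standard fact that in a regression with a single explanatory variable the F-statistic equals the squared slope t-statistic. The function name `t_statistic` is illustrative, and the inputs are the rounded values from the TEMPERATURE regression output, so the results agree with the output only up to rounding.

```python
def t_statistic(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


# Rounded values copied from the TEMPERATURE regression output.
coef_temperature = -1.408133
se_temperature = 0.468595

t_temperature = t_statistic(coef_temperature, se_temperature)
f_from_t = t_temperature ** 2  # holds only with one explanatory variable

print(round(t_temperature, 3))  # close to the reported -3.005012
print(round(f_from_t, 2))       # close to the reported 9.030097
```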
Dependent Variable: SO2
Method: Least Squares
Date: 11/22/09 Time: 14:22
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
MAN            0.026859      0.005099     5.267788      0.0000
C              17.61057      3.691587     4.770462      0.0000

R-squared            0.415727    Mean dependent var      30.04878
Adjusted R-squared   0.400745    S.D. dependent var      23.47227
S.E. of regression   18.17025    Akaike info criterion   8.684999
Sum squared resid    12876.16    Schwarz criterion       8.768588
Log likelihood       -176.0425   F-statistic             27.74959
Durbin-Watson stat   1.721399    Prob(F-statistic)       0.000005
The number of manufacturing enterprises significantly explains SO2 levels, as shown by the high t-statistic and low p-value. The positive coefficient of Man means that as the number of manufacturing enterprises increases, so do SO2 levels.
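To make the size of the coefficient concrete, here is a small sketch that evaluates the fitted line SO2 = C + b * Man using the estimates from the MAN regression output. The names `predicted_so2`, `INTERCEPT` and `SLOPE_MAN` are illustrative; Hartford's values (Man = 412, SO2 = 56) come from the data table.

```python
# Estimates copied from the MAN regression output.
INTERCEPT = 17.61057
SLOPE_MAN = 0.026859


def predicted_so2(man):
    """Fitted SO2 level for a city with `man` manufacturing enterprises."""
    return INTERCEPT + SLOPE_MAN * man


# Hartford in the data table: Man = 412, observed SO2 = 56.
hartford_fit = predicted_so2(412)
hartford_residual = 56 - hartford_fit

# Estimated change in SO2 for 100 additional manufacturing enterprises.
effect_per_100 = SLOPE_MAN * 100

print(round(hartford_fit, 2), round(hartford_residual, 2), round(effect_per_100, 2))
```

Under this bivariate fit, 100 additional enterprises correspond to roughly 2.7 more micrograms of SO2 per cubic meter, and Hartford sits well above the fitted line.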
Dependent Variable: SO2
Method: Least Squares
Date: 11/22/09 Time: 14:16
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
POPULATION     0.020014      0.005644     3.546111      0.0010
C              17.86832      4.713844     3.790604      0.0005

R-squared            0.243818    Mean dependent var      30.04878
Adjusted R-squared   0.224429    S.D. dependent var      23.47227
S.E. of regression   20.67121    Akaike info criterion   8.942912
Sum squared resid    16664.66    Schwarz criterion       9.026500
Log likelihood       -181.3297   F-statistic             12.57490
Durbin-Watson stat   1.791243    Prob(F-statistic)       0.001035
Population significantly explains SO2 levels, as shown by the high t-statistic and low p-value. The coefficient of population is positive, meaning that as population increases, so does SO2.
Dependent Variable: SO2
Method: Least Squares
Date: 11/22/09 Time: 14:07
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
WIND           1.555741      2.619045     0.594011      0.5559
C              15.35652      25.00859     0.614050      0.5427

R-squared            0.008966    Mean dependent var      30.04878
Adjusted R-squared   -0.016445   S.D. dependent var      23.47227
S.E. of regression   23.66448    Akaike info criterion   9.213378
Sum squared resid    21840.30    Schwarz criterion       9.296967
Log likelihood       -186.8743   F-statistic             0.352849
Durbin-Watson stat   1.818109    Prob(F-statistic)       0.555935
Wind does not significantly explain SO2 levels, as can be seen from the low t-statistic and low R-squared. It thus makes sense to take the wind variable out of our regression model.
Dependent Variable: SO2
Method: Least Squares
Date: 11/22/09 Time: 14:14
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
RAIN           0.108262      0.318822     0.339569      0.7360
C              26.06809      12.29492     2.120233      0.0404

R-squared            0.002948    Mean dependent var      30.04878
Adjusted R-squared   -0.022618   S.D. dependent var      23.47227
S.E. of regression   23.73623    Akaike info criterion   9.219433
Sum squared resid    21972.94    Schwarz criterion       9.303022
Log likelihood       -186.9984   F-statistic             0.115307
Durbin-Watson stat   1.820565    Prob(F-statistic)       0.736003
Rain does not significantly explain SO2 levels, as shown by the low t-statistic and low R-squared. We thus remove the rain variable from our regression model.
Dependent Variable: SO2
Method: Least Squares
Date: 11/22/09 Time: 14:09
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
RAINYDAYS      0.327260      0.131760     2.483761      0.0174
C              -7.226963     15.39914     -0.469310     0.6415

R-squared            0.136577    Mean dependent var      30.04878
Adjusted R-squared   0.114438    S.D. dependent var      23.47227
S.E. of regression   22.08842    Akaike info criterion   9.075534
Sum squared resid    19028.03    Schwarz criterion       9.159123
Log likelihood       -184.0485   F-statistic             6.169068
Durbin-Watson stat   1.970233    Prob(F-statistic)       0.017404
RainyDays significantly explains SO2 levels, as shown by the high t-statistic and low p-value. The coefficient of RainyDays is positive, meaning SO2 levels increase as the number of rainy days increases.
Dependent Variable: SO2
Method: Least Squares
Date: 12/03/09 Time: 00:56
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
TEMPERATURE    -0.417243     0.391666     -1.065304     0.2938
RAINYDAYS      0.127634      0.100713     1.267308      0.2132
POPULATION     -0.043929     0.015398     -2.852937     0.0071
MAN            0.068179      0.016111     4.231909      0.0002
C              33.93991      27.91632     1.215773      0.2320

R-squared            0.629094    Mean dependent var      30.04878
Adjusted R-squared   0.587882    S.D. dependent var      23.47227
S.E. of regression   15.06835    Akaike info criterion   8.376920
Sum squared resid    8173.989    Schwarz criterion       8.585892
Log likelihood       -166.7269   F-statistic             15.26491
Durbin-Watson stat   1.543633    Prob(F-statistic)       0.000000
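The adjusted R-squared and F-statistic in the multivariate output follow from R-squared, the number of observations n = 41, and the number of slope coefficients k = 4. The sketch below applies the standard textbook formulas; the function names are illustrative, and the inputs are the rounded values from the output, so agreement is up to rounding.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (excluding the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic for the null that all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Values from the multivariate regression: 41 cities, 4 regressors plus a constant.
r2_multi, n_obs, k_regressors = 0.629094, 41, 4

adj_r2 = adjusted_r_squared(r2_multi, n_obs, k_regressors)
f_stat = f_statistic(r2_multi, n_obs, k_regressors)

print(round(adj_r2, 4))  # close to the reported 0.587882
print(round(f_stat, 2))  # close to the reported 15.26491
```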
Dependent Variable: RAINYDAYS
Method: Least Squares
Date: 11/22/09 Time: 14:35
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
TEMPERATURE    -1.577840     0.530112     -2.976427     0.0050
C              201.8882      29.80212     6.774289      0.0000

R-squared            0.185108    Mean dependent var      113.9024
Adjusted R-squared   0.164214    S.D. dependent var      26.50642
S.E. of regression   24.23253    Akaike info criterion   9.260819
Sum squared resid    22901.40    Schwarz criterion       9.344408
Log likelihood       -187.8468   F-statistic             8.859119
Durbin-Watson stat   1.233606    Prob(F-statistic)       0.004989
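Multicollinearity does exist because the two variables are significantly correlated: the t-statistic on temperature is large in absolute value (-2.98, p = 0.0050), although the R-squared of 0.185 shows that the linear relationship is only moderate. RainyDays and Temperature are negatively correlated; as temperature goes up, the number of rainy days goes down.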
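One common way to gauge how severe the multicollinearity is, not used in the original analysis, is the variance inflation factor (VIF), computed from the R-squared of the auxiliary regression of one explanatory variable on the other. The sketch below uses the R-squared from the RAINYDAYS-on-TEMPERATURE output; the function name is illustrative, and the thresholds in the comment are rules of thumb rather than formal tests.

```python
def variance_inflation_factor(aux_r2):
    """VIF from the R-squared of regressing one regressor on the other regressors."""
    return 1.0 / (1.0 - aux_r2)


# R-squared from the auxiliary regression of RAINYDAYS on TEMPERATURE.
aux_r2 = 0.185108
vif = variance_inflation_factor(aux_r2)

# Rules of thumb often treat a VIF above about 5 or 10 as a sign of serious collinearity.
print(round(vif, 3))
```

By this measure the collinearity between the two variables is present but mild.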
Dependent Variable: SO2
Method: Least Squares
Date: 11/22/09 Time: 14:44
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
TEMPERATURE    -1.094340     0.512399     -2.135717     0.0392
RAINYDAYS      0.198875      0.139720     1.423383      0.1628
C              68.42054      38.36511     1.783405      0.0825

R-squared            0.229110    Mean dependent var      30.04878
Adjusted R-squared   0.188537    S.D. dependent var      23.47227
S.E. of regression   21.14411    Akaike info criterion   9.010956
Sum squared resid    16988.80    Schwarz criterion       9.136339
Log likelihood       -181.7246   F-statistic             5.646841
Durbin-Watson stat   1.934916    Prob(F-statistic)       0.007126
Since multicollinearity exists, we cannot rely on the individual t-statistics in a regression that uses these two variables as the independent variables. We can, however, still use the F-statistic to determine whether the two variables jointly have a significant impact on SO2 levels. The F-statistic (5.65, p = 0.0071) shows that they do, but we cannot tell which of the two variables is responsible for the effect.
Box plot indicating the two outliers: Providence (94) and Chicago (110)
Smallest = 8 (Wichita)
Q1 = 12.5
Median = 26 (Richmond)
Q3 = 35.5
Largest = 110 (Chicago)
IQR = 23
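The two outliers can be reproduced from the quartiles alone. The sketch below assumes the box plot uses the usual 1.5 × IQR rule (Tukey fences); the source does not state which rule was applied, and the function name is illustrative. The SO2 values for Providence and Chicago come from the box plot caption; Hartford and Phoenix come from the data table.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences; k=1.5 is an assumed convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported with the box plot.
lower, upper = tukey_fences(12.5, 35.5)

so2_levels = {"Providence": 94, "Chicago": 110, "Hartford": 56, "Phoenix": 10}
outliers = sorted(city for city, so2 in so2_levels.items()
                  if so2 < lower or so2 > upper)

print(lower, upper)  # -22.0 70.0
print(outliers)      # ['Chicago', 'Providence']
```

Under this assumption any city above 70 micrograms per cubic meter is flagged, which matches the two cities named in the caption.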
Dependent Variable: SO2
Method: Least Squares
Date: 12/02/09 Time: 14:26
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
TEMPERATURE    -1.046409     0.342685     -3.053562     0.0042
C2             61.31696      15.76031     3.890593      0.0004
C1             77.94481      15.73461     4.953716      0.0000
C              85.00347      19.36350     4.389881      0.0001

R-squared            0.600423    Mean dependent var      30.04878
Adjusted R-squared   0.568025    S.D. dependent var      23.47227
S.E. of regression   15.42711    Akaike info criterion   8.402598
Sum squared resid    8805.845    Schwarz criterion       8.569776
Log likelihood       -168.2533   F-statistic             18.53262
Durbin-Watson stat   1.893874    Prob(F-statistic)       0.000000
Temperature still significantly explains SO2 levels due to the high t-statistic and low p-values.
Dependent Variable: SO2
Method: Least Squares
Date: 12/02/09 Time: 14:30
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
MAN            0.025836      0.007284     3.546764      0.0011
C2             68.91495      15.10630     4.562001      0.0001
C1             7.380472      26.27515     0.280892      0.7804
C              16.22323      3.724041     4.356351      0.0001

R-squared            0.626658    Mean dependent var      30.04878
Adjusted R-squared   0.596387    S.D. dependent var      23.47227
S.E. of regression   14.91206    Akaike info criterion   8.334685
Sum squared resid    8227.670    Schwarz criterion       8.501863
Log likelihood       -166.8610   F-statistic             20.70163
Durbin-Watson stat   1.877703    Prob(F-statistic)       0.000000
Manufacturing Enterprises still significantly explains SO2 levels due to the high t-statistic and low p-values.
Dependent Variable: SO2
Method: Least Squares
Date: 12/02/09 Time: 14:31
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
POPULATION     0.012183      0.007103     1.715259      0.0947
C2             72.14691      17.02945     4.236597      0.0001
C1             49.28269      26.15993     1.883900      0.0675
C              19.67230      4.719600     4.168214      0.0002

R-squared            0.536577    Mean dependent var      30.04878
Adjusted R-squared   0.499002    S.D. dependent var      23.47227
S.E. of regression   16.61396    Akaike info criterion   8.550832
Sum squared resid    10212.88    Schwarz criterion       8.718010
Log likelihood       -171.2921   F-statistic             14.28020
Durbin-Watson stat   1.799458    Prob(F-statistic)       0.000002
Population no longer significantly explains SO2 levels at the 5% level, as shown by the lower t-statistic (1.72) and higher p-value (0.0947).
Scatter plot: POPULATION vs. SO2 (x-axis: SO2, 0 to 120; y-axis: POPULATION, 0 to 3500)
Dependent Variable: SO2
Method: Least Squares
Date: 12/02/09 Time: 14:28
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
WIND           -0.393012     1.937653     -0.202829     0.8404
C2             68.11667      17.62874     3.863956      0.0004
C1             84.03807      17.58138     4.779946      0.0000
C              30.04926      18.40261     1.632881      0.1110

R-squared            0.500282    Mean dependent var      30.04878
Adjusted R-squared   0.459765    S.D. dependent var      23.47227
S.E. of regression   17.25229    Akaike info criterion   8.626234
Sum squared resid    11012.73    Schwarz criterion       8.793412
Log likelihood       -172.8378   F-statistic             12.34727
Durbin-Watson stat   1.712120    Prob(F-statistic)       0.000010
Wind still does not significantly explain SO2 levels, as can be seen from the low t-statistic (-0.20) and high p-value (0.8404). The R-squared of 0.50 here comes from the dummy variables, not from wind.
Dependent Variable: SO2
Method: Least Squares
Date: 12/02/09 Time: 14:30
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
RAIN           0.070950      0.232440     0.305241      0.7619
C2             67.21003      17.51681     3.836887      0.0005
C1             83.79963      17.46754     4.797449      0.0000
C              23.75684      8.960694     2.651228      0.0117

R-squared            0.500983    Mean dependent var      30.04878
Adjusted R-squared   0.460522    S.D. dependent var      23.47227
S.E. of regression   17.24018    Akaike info criterion   8.624830
Sum squared resid    10997.28    Schwarz criterion       8.792008
Log likelihood       -172.8090   F-statistic             12.38194
Durbin-Watson stat   1.716316    Prob(F-statistic)       0.000009
Rain still does not significantly explain SO2 levels, as shown by the low t-statistic (0.31) and high p-value (0.7619). The R-squared of 0.50 here comes from the dummy variables, not from rain.
Dependent Variable: SO2
Method: Least Squares
Date: 12/02/09 Time: 14:29
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
RAINYDAYS      0.278414      0.092644     3.005193      0.0047
C2             64.41428      15.71003     4.100200      0.0002
C1             81.24952      15.69349     5.177276      0.0000
C              -5.215998     10.79510     -0.483182     0.6318

R-squared            0.597879    Mean dependent var      30.04878
Adjusted R-squared   0.565274    S.D. dependent var      23.47227
S.E. of regression   15.47614    Akaike info criterion   8.408944
Sum squared resid    8861.907    Schwarz criterion       8.576122
Log likelihood       -168.3834   F-statistic             18.33736
Durbin-Watson stat   1.897244    Prob(F-statistic)       0.000000
Rainy Days still significantly explains the SO2 levels due to the high t-statistic and low p-value.
Dependent Variable: TEMPERATURE
Method: Least Squares
Date: 12/02/09 Time: 15:38
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
RAINYDAYS      -0.114168     0.040132     -2.844797     0.0072
C2             -4.720416     6.805350     -0.693633     0.4922
C1             -4.462919     6.798183     -0.656487     0.5156
C              68.99137      4.676276     14.75349      0.0000

R-squared            0.204186    Mean dependent var      55.76341
Adjusted R-squared   0.139660    S.D. dependent var      7.227716
S.E. of regression   6.704032    Akaike info criterion   6.735763
Sum squared resid    1662.930    Schwarz criterion       6.902941
Log likelihood       -134.0831   F-statistic             3.164421
Durbin-Watson stat   1.108636    Prob(F-statistic)       0.035732
Multicollinearity still exists because the two variables are significantly correlated.
Dependent Variable: SO2
Method: Least Squares
Date: 12/02/09 Time: 15:30
Sample: 1 41
Included observations: 41

Variable       Coefficient   Std. Error   t-Statistic   Prob.
TEMPERATURE    -0.741891     0.364337     -2.036277     0.0491
RAINYDAYS      0.193714      0.098186     1.972930      0.0562
C2             60.91225      15.17959     4.012773      0.0003
C1             77.93852      15.15345     5.143285      0.0000
C              45.96809      27.18869     1.690706      0.0995

R-squared            0.639411    Mean dependent var      30.04878
Adjusted R-squared   0.599346    S.D. dependent var      23.47227
S.E. of regression   14.85731    Akaike info criterion   8.348710
Sum squared resid    7946.626    Schwarz criterion       8.557682
Log likelihood       -166.1486   F-statistic             15.95916
Durbin-Watson stat   1.971757    Prob(F-statistic)       0.000000
According to the F-statistic, temperature and rainy days are significantly related to SO2 levels. However, since multicollinearity exists, we cannot rely on the individual t-statistics and therefore do not know how significant each variable is on its own.
Our final model includes the two dummy variables. This regression model has a significant F-statistic and a small p-value.
Histogram of sulfur levels with dummy variables (x-axis: residuals, -30 to 30; y-axis ticks: 0 to 9):
Series: Residuals
Sample 1 41
Observations 41
Mean 5.20e-16
Median -1.262938
Maximum 26.55544
Minimum -29.56690
Std. Dev. 12.37240
Skewness 0.169384
Kurtosis 2.795795
Jarque-Bera 0.267292
Probability 0.874900
The residuals have a low Jarque-Bera statistic (0.267) with a high probability (0.875) and are only slightly positively skewed, so we cannot reject the hypothesis that they are normally distributed.
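The reported Jarque-Bera statistic can be reproduced from the skewness and kurtosis of the residuals using the standard formula JB = n/6 · (S² + (K − 3)²/4), which is compared against a chi-square distribution with 2 degrees of freedom. The sketch below uses illustrative function names and the rounded values from the histogram statistics; for 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(−x/2).

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness and (non-excess) kurtosis."""
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


def chi2_upper_tail_2df(x):
    """P(X > x) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)


# Values from the residual histogram statistics.
jb = jarque_bera(n=41, skewness=0.169384, kurtosis=2.795795)
p_value = chi2_upper_tail_2df(jb)

print(round(jb, 4))       # close to the reported 0.267292
print(round(p_value, 4))  # close to the reported 0.874900
```

A p-value this large gives no evidence against normality of the residuals.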
According to the figure above, there is an indication of heteroskedasticity. This is common in cross-sectional data and does not bias the coefficient estimates, so we do not expect it to change our final regression materially, although it can affect the standard errors.
From our regression model, we find that temperature, rainy days and manufacturing all have a significant effect on SO2 levels, together explaining 72% of the variation in sulfur levels.
Out of the three variables however, manufacturing enterprises is the most significant explanatory variable.
Economic Impact:
• Given that SO2 is a threat to human wellbeing and the environment, lowering the SO2 levels can reduce future costs.
• SO2 pollution is preventable as it stems from human activity.
• Lower SO2 levels could be achieved by future restrictions on the number of manufacturing enterprises or on the emission levels of SO2 they release.