Multiple Regression and Model Building (cont’d) + GIS 11.220 Lecture 21 3 May 2006 R. Ryznar



Page 2

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .991 (a)   .982       .977                46.801

a. Predictors: (Constant), SizeSquared, HomeSize
b. Dependent Variable: EnergyUse

ANOVA (b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   831069.5         2    415534.773    189.710   .0001 (a)
Residual     15332.554        7    2190.365
Total        846402.1         9

a. Predictors: (Constant), SizeSquared, HomeSize
b. Dependent Variable: EnergyUse

Coefficients (a)

Model 1       B               Std. Error     Beta     t        Sig.
(Constant)    -1216.1438870   242.80636850            -5.009   .00155
HomeSize      2.39893018      .24583560      4.049    9.758    .00003
SizeSquared   -.00045004      .00005908      -3.161   -7.618   .00012

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

a. Dependent Variable: EnergyUse

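$s^2 = \dfrac{SSE}{n-(k+1)}$, sometimes called MSE

$F = \dfrac{R^2/k}{(1-R^2)/[n-(k+1)]}$

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

Adjusted $R^2 = 1 - \dfrac{SSE/[n-(k+1)]}{SST/(n-1)}$

SSE, SSR, and SST are the Residual, Regression, and Total sums of squares in the ANOVA table; $k$ = number of X variables.

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$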
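As a check on the formulas above, the following sketch recomputes the summary statistics from the sums of squares in the ANOVA table for the quadratic model. The variable names are illustrative, and n = 10 and k = 2 are read off the degrees of freedom in the table.

```python
# Sums of squares copied from the ANOVA table (quadratic model).
ssr = 831069.5    # Regression
sse = 15332.554   # Residual
sst = 846402.1    # Total

n = 10            # observations (Total df = n - 1 = 9)
k = 2             # X variables: HomeSize and SizeSquared

df_error = n - (k + 1)                              # 7, the Residual df
mse = sse / df_error                                # s^2, Mean Square (Residual)
std_error = mse ** 0.5                              # Std. Error of the Estimate
r2 = 1 - sse / sst                                  # R Square
adj_r2 = 1 - (sse / df_error) / (sst / (n - 1))     # Adjusted R Square
f_stat = (r2 / k) / ((1 - r2) / df_error)           # F

print(f"s^2 = {mse:.3f}, s = {std_error:.3f}")
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, F = {f_stat:.2f}")
```

The results agree with the SPSS output to the precision shown there: Mean Square 2190.365, Std. Error 46.801, R Square .982, Adjusted R Square .977, and F of about 189.71.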

Page 3

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .912 (a)   .832       .811                133.438

a. Predictors: (Constant), HomeSize
b. Dependent Variable: EnergyUse

ANOVA (b)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   703957.2         1    703957.183    39.536   .000 (a)
Residual     142444.9         8    17805.615
Total        846402.1         9

a. Predictors: (Constant), HomeSize
b. Dependent Variable: EnergyUse

$y = \beta_0 + \beta_1 x + \varepsilon$

Coefficients (a)

Model 1      B         Std. Error   Beta   t       Sig.
(Constant)   578.928   166.968             3.467   .008
HomeSize     .540      .086         .912   6.288   .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

a. Dependent Variable: EnergyUse

Page 4

Correlation with Y (survival time)

Variable   r
x1         .346
x2         .593
x3         .665
x4         .726

X variables             SSE      R2
X1 (Blood Clotting)     3.4961   .120
X2 (Prognostic Ind.)    2.5763   .352
X3 (Enzyme Func.)       2.2153   .442
X4 (Liver Func.)        1.8776   .527
X1, X2                  2.2325   .438
X1, X3                  1.4072   .646
X1, X4                  1.8758   .528
X2, X3                  0.7430   .813
X2, X4                  1.3922   .650
X3, X4                  1.2453   .687
X1, X2, X3              0.1099   .972
X1, X2, X4              1.3905   .650
X1, X3, X4              1.1156   .719
X2, X3, X4              0.4652   .883
X1, X2, X3, X4          0.1098   .972

     x1   x2     x3      x4
x1   1    .090   -.150   .502
x2        1      -.024   .369
x3               1       .416
x4                       1
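The all-subsets table above can be scanned for the best model of each size. The sketch below does this with the R2 values copied from the table; the dictionary layout and variable names are illustrative choices.

```python
# R2 for every subset of predictors, copied from the all-subsets table.
r2 = {
    ("X1",): .120, ("X2",): .352, ("X3",): .442, ("X4",): .527,
    ("X1", "X2"): .438, ("X1", "X3"): .646, ("X1", "X4"): .528,
    ("X2", "X3"): .813, ("X2", "X4"): .650, ("X3", "X4"): .687,
    ("X1", "X2", "X3"): .972, ("X1", "X2", "X4"): .650,
    ("X1", "X3", "X4"): .719, ("X2", "X3", "X4"): .883,
    ("X1", "X2", "X3", "X4"): .972,
}

# Keep the subset with the highest R2 for each number of predictors.
best = {}
for subset, value in r2.items():
    size = len(subset)
    if size not in best or value > r2[best[size]]:
        best[size] = subset

for size in sorted(best):
    print(size, best[size], r2[best[size]])
```

X4 has the highest individual correlation with Y and is the best single predictor, yet it does not appear in the best two- or three-variable models, and adding it to X1, X2, X3 leaves R2 at .972.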

Page 5

Standardized coefficients used to establish a common metric for comparison

$\text{income} = \alpha + \beta_1(\text{years of education}) + \beta_2(\text{I.Q.}) + \varepsilon$

$\text{income} = \alpha + 2(\text{years of education}) + 1(\text{I.Q.}) + \varepsilon$

Can you say that years of education is more important than I.Q.?

Of course, you cannot, because they are not measured with the same metric. One way to solve this problem of comparing beta coefficients is to use standardized coefficients.

Standardized coefficients are calculated in a regression equation using the z-scores of the dependent (Y) and independent (X) variables.
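A minimal sketch of this idea, using one predictor and made-up data (the numbers and variable names below are hypothetical, not from the lecture): fitting the regression on z-scores gives the same slope as rescaling the unstandardized slope by s_x / s_y.

```python
from statistics import mean, stdev


def slope_intercept(x, y):
    """Least-squares slope and intercept for one predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def zscores(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: years of education and income (in thousands).
years_educ = [10, 12, 12, 14, 16, 16, 18, 20]
income = [21.0, 27.5, 25.0, 33.0, 41.0, 38.5, 47.0, 55.0]

b1, b0 = slope_intercept(years_educ, income)                         # unstandardized
beta1, beta0 = slope_intercept(zscores(years_educ), zscores(income)) # standardized
rescaled = b1 * stdev(years_educ) / stdev(income)                    # b * s_x / s_y

print(f"unstandardized slope: {b1:.4f}")
print(f"standardized slope:   {beta1:.4f} (rescaled: {rescaled:.4f})")
```

The intercept of the z-score regression is zero up to floating-point error, which is why SPSS reports no Beta for the constant.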

Page 6

Interpreting the standardized coefficients

An increase of one standard deviation in x1 changes y by the number of standard deviations given by the standardized coefficient for x1, holding the other predictors constant.

Coefficients (a)

Model 1       B               Std. Error     Beta     t        Sig.
(Constant)    -1216.1438870   242.80636850            -5.009   .00155
HomeSize      2.39893018      .24583560      4.049    9.758    .00003
SizeSquared   -.00045004      .00005908      -3.161   -7.618   .00012

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

a. Dependent Variable: EnergyUse

Descriptive Statistics

                     N    Mean      Std. Deviation
EnergyUse            10   1594.70   306.667
HomeSize             10   1880.00   517.623
SizeSquared          10   3775540   2153984.105
Valid N (listwise)   10

Every increase of 1 s.d. in X1 (HomeSize) increases Y by 4.049 s.d., i.e., 4.049 * 306.667 = 1241.69. Using the unstandardized coefficient instead gives 2.39893018 * 517.623 = 1241.74. The two results should be equal; the small difference is rounding error.
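The arithmetic above can be reproduced directly from the Coefficients and Descriptive Statistics tables. The sketch below also recovers each Beta from its B as B * s_x / s_y; the variable names are illustrative.

```python
# Values copied from the Coefficients and Descriptive Statistics tables.
sd_energy = 306.667           # s.d. of EnergyUse (Y)
sd_homesize = 517.623         # s.d. of HomeSize
sd_sizesq = 2153984.105       # s.d. of SizeSquared

b_homesize = 2.39893018       # unstandardized B
b_sizesq = -0.00045004
beta_homesize = 4.049         # standardized Beta as reported

# Change in Y for a one-s.d. increase in HomeSize, computed both ways.
change_from_beta = beta_homesize * sd_energy    # about 1241.69
change_from_b = b_homesize * sd_homesize        # about 1241.74

# Standardized coefficients recovered from the unstandardized ones.
beta_homesize_calc = b_homesize * sd_homesize / sd_energy
beta_sizesq_calc = b_sizesq * sd_sizesq / sd_energy

print(round(change_from_beta, 2), round(change_from_b, 2))
print(round(beta_homesize_calc, 3), round(beta_sizesq_calc, 3))
```

Both recovered values match the Beta column (4.049 and -3.161), which confirms that the gap between 1241.69 and 1241.74 comes only from Beta being rounded to three decimals.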

Page 7

Dummy variables

[Equation: Income as a linear function of the dummy variables ASIAMER, CAUCAS, HISPAN, and OTHER, plus yrs of educ]

Page 8

Multicollinearity

Data for 67 Florida Counties

• fem = Percentage of households headed by a female
• inc = Median income
• hs = Percentage of residents over 25 years old with at least a high school diploma
• urb = Percentage of residents living in an urban environment
• cr = Number of crimes per capita
• unemrt = Unemployment rate

Page 9

[Figure: scatterplot matrix of fem, inc, un, hs, urb, cr, and unemrt]

Page 10

Correlations

Pearson Correlation (N = 67 for every pair)

         fem       inc       un        hs        urb       cr        unemrt
fem      1         -.561**   -.055     -.511**   -.435**   -.143     -.055
inc      -.561**   1         -.119     .793**    .730**    .432**    -.119
un       -.055     -.119     1         -.250*    -.053     -.001     1.000**
hs       -.511**   .793**    -.250*    1         .791**    .468**    -.250*
urb      -.435**   .730**    -.053     .791**    1         .678**    -.053
cr       -.143     .432**    -.001     .468**    .678**    1         -.001
unemrt   -.055     -.119     1.000**   -.250*    -.053     -.001     1

Sig. (2-tailed)

         fem    inc    un     hs     urb    cr     unemrt
fem             .000   .661   .000   .000   .248   .661
inc      .000          .337   .000   .000   .000   .337
un       .661   .337          .041   .670   .996   .000
hs       .000   .000   .041          .000   .000   .041
urb      .000   .000   .670   .000          .000   .670
cr       .248   .000   .996   .000   .000          .996
unemrt   .661   .337   .000   .041   .670   .996

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).
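One quick way to screen a correlation matrix like this one for potential multicollinearity is to list the variable pairs whose correlation is large in absolute value. The sketch below uses the Pearson correlations from the table; the 0.7 cutoff is an assumed rule of thumb, not a value given in the lecture.

```python
names = ["fem", "inc", "un", "hs", "urb", "cr", "unemrt"]

# Pearson correlations copied from the table (rows and columns follow `names`).
r = [
    [1.000, -.561, -.055, -.511, -.435, -.143, -.055],
    [-.561, 1.000, -.119, .793, .730, .432, -.119],
    [-.055, -.119, 1.000, -.250, -.053, -.001, 1.000],
    [-.511, .793, -.250, 1.000, .791, .468, -.250],
    [-.435, .730, -.053, .791, 1.000, .678, -.053],
    [-.143, .432, -.001, .468, .678, 1.000, -.001],
    [-.055, -.119, 1.000, -.250, -.053, -.001, 1.000],
]

THRESHOLD = 0.7  # assumed cutoff for a "high" correlation

# Scan the upper triangle so each pair is reported once.
flagged = [
    (names[i], names[j], r[i][j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(r[i][j]) >= THRESHOLD
]

for a, b, value in flagged:
    print(f"{a} and {b}: r = {value:.3f}")
```

With this cutoff the flagged pairs are inc/hs, inc/urb, hs/urb, and un/unemrt. The last pair has r = 1.000, so the two columns carry the same information and only one of them can be used as a predictor.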

Page 11

Detecting Multicollinearity with the Variance Inflation Factor (VIF)

Coefficients (a)

Model 1      B           Std. Error   Beta    t       Sig.   Tolerance   VIF
(Constant)   .024        .042                 .579    .565
fem          .002        .001         .172    1.516   .135   .646        1.547
inc          1.450E-08   .000         .002    .015    .988   .313        3.191
hs           .000        .001         -.090   -.482   .632   .237        4.217
unemrt       .000        .001         .030    .304    .762   .842        1.188
urb          .001        .000         .824    5.172   .000   .328        3.049

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are the collinearity statistics.

a. Dependent Variable: cr

Tolerance: the proportion of each predictor's variance that is not related to the other predictors.

VIF = 1/Tolerance. If Tolerance = 1, then VIF = 1. As VIF becomes larger, greater overlap exists among the predictors.
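The relationship VIF = 1/Tolerance can be checked against the collinearity statistics in the table. In this sketch the dictionary names are illustrative; small differences from the reported VIF values are expected because the tolerances are rounded to three decimals.

```python
# Tolerance and VIF as reported in the Coefficients table.
tolerance = {"fem": .646, "inc": .313, "hs": .237, "unemrt": .842, "urb": .328}
reported_vif = {"fem": 1.547, "inc": 3.191, "hs": 4.217, "unemrt": 1.188, "urb": 3.049}

# VIF recomputed from the rounded tolerances.
computed_vif = {name: 1 / tol for name, tol in tolerance.items()}

for name in tolerance:
    print(f"{name:7s} tolerance = {tolerance[name]:.3f}  "
          f"1/tolerance = {computed_vif[name]:.3f}  reported VIF = {reported_vif[name]:.3f}")

# The predictor that overlaps most with the others has the largest VIF.
most_overlap = max(computed_vif, key=computed_vif.get)
print("largest VIF:", most_overlap)
```

Here hs has the lowest tolerance (.237) and therefore the largest VIF, consistent with its high correlations with inc and urb.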

Page 12

Z scores for crime per capita

Page 13

Z scores for % living in urbanized area

Page 15

Positive and significant z-score indicates spatial clustering of high values.

Negative and significant z-score indicates spatial clustering of low values.
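The two rules above can be written as a small decision function. This is only a sketch: the function name is hypothetical, and the critical value of 1.96 assumes a two-tailed test at the 0.05 level, which the slides do not specify.

```python
def classify_clustering(z, critical=1.96):
    """Interpret a spatial-clustering z-score.

    critical=1.96 assumes a two-tailed test at the 0.05 level.
    """
    if z >= critical:
        return "significant clustering of high values"
    if z <= -critical:
        return "significant clustering of low values"
    return "no significant clustering"


for z in (3.2, -2.5, 0.8):
    print(z, "->", classify_clustering(z))
```

A stricter level can be applied by passing a larger critical value, for example 2.58 for the 0.01 level.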

Page 16

Final Paper data in GIS

ma_eqv.dbf         MA ‘Kind of Community’ (KOC) data for all cities/towns in MA
ma_eqv_intro.txt   A brief explanation of the MA Department of Revenue’s Kind-of-Community classification of MA cities and towns

GIS Spatial Data Set (formatted as ArcGIS shapefiles and located in the ‘gis’ sub-directory):

ma_towns00     Town boundaries for MA cities and towns
majmhda1       Major roads for MA (see ‘class’ for road type distinctions)
maj_pop1       Major MA lakes and ponds (for better cartography)
p525_ma        Boundaries for MA PUMA regions
majmhdcl.avl   Pre-configured classification and symbols for MA major roads