Basic Econometrics Notes


Econometrics is a branch of mathematical statistics concerned with applying statistical methods to non-experimental data. It is used for:
- Estimating economic relationships
- Testing economic theories
- Evaluating policy
- Forecasting macroeconomic variables

Non-experimental data are data that are not accumulated through controlled experiments on individuals, firms or segments of the economy. They are also referred to as observational data or retrospective data. Experimental data, on the other hand, are collected in laboratory environments, as in the natural sciences.

It is hard to devise experiments in economics because:
- They are often impossible to conduct
- They are prohibitively expensive
- The kinds of experiments needed to address economic issues can be morally repugnant

An empirical analysis employs data to test a theory or estimate a relationship.

4 Steps in conducting an empirical analysis (S.E.E.F.)

1. Specification (F.F.D.A.)
This involves specifying the model, that is, formulating the question of interest as a set of relationships. An economic model consists of mathematical equations that describe various relationships. Specification entails looking at:
- First: the nature of stochastic functions
  - There is an unobservable 'population regression function' (PRF), which we try to approximate with our 'sample regression function' (SRF)
  - This gives us the concept of the 'conditional mean', $E(y|x)$ (E = expected value)
- Second: the type of mathematical functions. The most commonly used functional forms are:
  - Linear functions: $y_t = \beta_0 + \beta_1 x_t$
  - Quadratic functions: $y_t = \beta_0 + \beta_1 x_t^2$
  - Power functions: $y_t = \beta_0 (x_t)^{\beta_1}$
  - Exponential functions: $y_t = \beta_0 e^{\beta_1 t}$
- Third: the choice of variables
  - This depends on the availability and quality of the data.
- Fourth: the assumptions made about the error terms
  - The error term, $u_t$, represents those factors that affect $y_t$ but are not included in the model
  - We can never know exactly what the impact on $y_t$ is of all the other variables
  - Because of this, we need to make certain assumptions about the nature of $u_t$
  - There are 5 key assumptions generally made about the error terms:
    1. The error terms are random variables with a normal distribution
    2. They have a zero mean, that is, $E(u_t) = 0$ for all $t = 1, 2, ..., n$
    3. They have the same variance, that is, $Var(u_t) = \sigma^2$ for all $t = 1, 2, ..., n$
    4. They are independent random variables, so that $Cov(u_t, u_s) = 0$ for all $t \neq s$, with $t, s = 1, 2, ..., n$

    5. The error terms and the independent variables are independent of each other, so that $Cov(u_t, x_t) = 0$

2. Estimation
- The model is estimated at this stage. To estimate it we use the ordinary least squares (OLS) method, which is based on minimising the sum of the squared residuals:

$$\sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} [y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t]^2$$

OLS minimises this function with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$. (Anything with a hat is an estimated value.)

3. Evaluation
- During this stage we always examine the estimated residuals. If our model is correctly specified, then the estimated residuals ($e$) are the sample estimates of the population error terms ($u$). We also look at the size and sign of the parameters ($\hat{\beta}_0$ and $\hat{\beta}_1$) to see if they make sense.
- Once we are satisfied that the estimates make sense, we conduct tests to see if there is a significant relationship. Individual tests, which include t-tests and correlation analysis, test the significance of individual variables. Joint tests check overall significance and consist of $R^2$, adjusted $R^2$ ($\bar{R}^2$) and ANOVA.

4. Forecasting
When we talk about forecasts, we refer to forecasts from two different periods:
- Within-sample forecasts, which are generated within the sample period used to estimate the model
- Out-of-sample forecasts, which are generated for any period that was not used in estimating the model

There are also two types of forecasts, static and dynamic:
- Static forecasts are obtained by substituting known values of the independent variables into the estimated equation
- Dynamic forecasts are obtained by substituting values of the independent variables that are not actually known but are generated by the model

Most applications are concerned with out-of-sample static forecasts.

To estimate a model we need to convert the economic model into an estimable econometric model.

[Equations omitted in source: the economic model ("From this") and its econometric counterpart ("To this").]

Econometric models may be used for hypothesis testing. For example, the parameter $\beta_3$ represents the effect of training on the wage. How large is the effect, and is it statistically different from zero?
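The four functional forms listed under Specification can be written as small Python functions. This is only an illustrative sketch: the function names and the parameter values below are assumptions made for the example and do not come from the notes.

```python
import math

def linear(x, b0, b1):
    """Linear form: y = b0 + b1 * x."""
    return b0 + b1 * x

def quadratic(x, b0, b1):
    """Quadratic form as written in the notes: y = b0 + b1 * x**2."""
    return b0 + b1 * x ** 2

def power(x, b0, b1):
    """Power form: y = b0 * x**b1."""
    return b0 * x ** b1

def exponential(t, b0, b1):
    """Exponential form: y = b0 * exp(b1 * t)."""
    return b0 * math.exp(b1 * t)

# Hypothetical parameter values, for illustration only.
b0, b1 = 2.0, 0.5
for name, f in [("linear", linear), ("quadratic", quadratic),
                ("power", power), ("exponential", exponential)]:
    print(name, [round(f(x, b0, b1), 3) for x in (1, 2, 4)])
```

Taking logs of the power form gives ln y = ln b0 + b1 ln x, which is linear in the parameters, so a model that looks non-linear can often still be estimated by OLS after a transformation.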

Structure of Economic Data
To undertake an econometric analysis, data are required on the variables of interest.
- The type of econometric method employed will depend on the nature of the data used.
- The use of an inappropriate estimation method may lead to misleading results.

• There are four different kinds of economic data sets:
• Cross-sectional data
• Time series data
• Pooled cross sections
• Panel or longitudinal data

Cross-sectional data sets:
These data sets consist of a sample of observations on individuals, households, firms, cities, states, countries, or other entities of interest, measured at a particular point in time (as in the average age example). Cross-sectional observations are assumed to have been obtained through pure random sampling from a population, and are assumed to be independent. Pure random sampling is where each observation is drawn at random from a population, so that each individual observation is as likely as the next to be drawn, and each draw is independent of all other draws. This kind of data is usually employed in fields of applied microeconomics.

Problems with this data and pure random sampling:
- Refusal to participate (leads to sample selection bias)
- Sampling can lead to clustering (where the sampled entities are large relative to the population)
- Different variables can correspond to different time periods in cross-sectional data

[Example omitted in source.]

Time series data:
This type of data set consists of observations on a variable or several variables, measured over time. Time series observations are not generally independent. They tend to be strongly related to their recent histories, that is, they are 'serially correlated'. Because of this dependency, modifications to standard estimation techniques have been developed to take it into account. Time series data typically display trends and seasonal patterns. Common frequencies at which data are collected are daily, weekly, monthly, quarterly and annual. Time series data are typically employed in fields of applied macroeconomics and finance.

[Example omitted in source.]
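Serial correlation can be made concrete with the sample lag-1 autocorrelation, which measures how closely each observation tracks the one before it. The sketch below is illustrative only: the function name and both series are hypothetical and are not taken from the notes.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: how strongly y_t moves with y_(t-1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    numerator = sum(dev[t] * dev[t - 1] for t in range(1, n))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

# Hypothetical series: a steady upward trend and a series that alternates.
trending = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

r_trend = lag1_autocorrelation(trending)
r_alt = lag1_autocorrelation(alternating)
print(round(r_trend, 3), round(r_alt, 3))  # 0.7 -0.9
```

The trending series is positively serially correlated, which is the typical case for economic time series. Independent observations would give a value close to zero.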

Pooled cross sections (a combination of cross-sectional and time series data):
These data sets have both a cross-sectional and a time series dimension. For instance, suppose that two household surveys are taken in Australia, one in 2010 and the other in 2014. Combining the two years to increase the sample size forms a pooled cross section. The cross sections are drawn independently of each other: a new random sample of households would be taken in 2014 to answer the same survey questions.

Pooled cross sections can often be used to evaluate policy changes. For instance, suppose we wanted to evaluate the effect of a reduction in property taxes that occurred in 1994. We would collect a random sample of house prices for 1993, and a new random sample of house prices for 1995. We would then compare house prices before and after the reform, where 1993 is before the reform and 1995 is after it, to assess the effect of the reduction in property taxes.

Note the split in years from 1993 to 1995.
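The before-and-after comparison described above can be sketched by pooling the two samples and regressing price on a dummy variable for the post-reform year. The prices below are hypothetical and the variable names are assumptions made for the example. The slope uses the simple OLS formula that the notes derive in Chapter 2.

```python
# Hypothetical house prices (in $1,000s) from two independent random samples.
prices_1993 = [210.0, 190.0, 230.0, 205.0]
prices_1995 = [225.0, 240.0, 215.0, 250.0, 220.0]

# Pool the two cross sections and mark the post-reform year with a dummy.
price = prices_1993 + prices_1995
after = [0.0] * len(prices_1993) + [1.0] * len(prices_1995)

n = len(price)
p_bar, d_bar = sum(price) / n, sum(after) / n
slope = (sum((d - d_bar) * (p - p_bar) for d, p in zip(after, price))
         / sum((d - d_bar) ** 2 for d in after))
intercept = p_bar - slope * d_bar

mean_1993 = sum(prices_1993) / len(prices_1993)
mean_1995 = sum(prices_1995) / len(prices_1995)
print(round(intercept, 2), round(slope, 2), round(mean_1995 - mean_1993, 2))
```

With a single dummy regressor the intercept equals the 1993 mean and the slope equals the difference between the two yearly means. This simple comparison attributes the whole change to the reform, so it is only convincing if other factors stayed constant between the two years.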

Panel or Longitudinal Data
This type of data set consists of a time series for each cross-sectional entity. With panel data, the same cross-sectional units are followed over time. This feature distinguishes a panel data set from a pooled cross section. As such, panel data have both a cross-sectional and a time series dimension. Observing the same entities over time leads to several advantages over cross-sectional or pooled cross-sectional data:
- Panel data allow us to control for time-invariant unobserved characteristics.
- Panel data allow us to model lagged responses.

Consider the following example on crime statistics for 150 cities. Each city is observed in two years, 1986 and 1990. A number of time-invariant unobserved city characteristics may be modelled. The effect of police on crime rates may exhibit a time lag, which can also be modelled.

Example: two-year data on city crime for 150 cities. [Table omitted in source.]
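One way to see how a two-period panel controls for time-invariant unobserved characteristics is first differencing: subtracting the 1986 observation from the 1990 observation for the same city removes anything that does not change over time. The sketch below uses three hypothetical cities rather than the 150 in the example, and the model (crime = city effect + BETA * police, with no noise) is an assumption made purely so the arithmetic is easy to follow.

```python
# Hypothetical panel: (city, fixed effect a_i, police in 1986, police in 1990).
BETA = -2.0  # assumed true effect of police on the crime rate
cities = [("A", 50.0, 10.0, 12.0),
          ("B", 80.0, 20.0, 21.0),
          ("C", 30.0, 5.0, 9.0)]

def crime(a_i, police):
    """Crime rate generated without noise: city effect plus BETA * police."""
    return a_i + BETA * police

# Cross section for 1990 only: the unobserved city effect is mixed into the slope.
x90 = [p90 for _, _, _, p90 in cities]
y90 = [crime(a, p90) for _, a, _, p90 in cities]
x_bar, y_bar = sum(x90) / len(x90), sum(y90) / len(y90)
slope_cross_section = (sum((x - x_bar) * (y - y_bar) for x, y in zip(x90, y90))
                       / sum((x - x_bar) ** 2 for x in x90))

# First differences: a_i appears in both years and cancels out.
d_police = [p90 - p86 for _, _, p86, p90 in cities]
d_crime = [crime(a, p90) - crime(a, p86) for _, a, p86, p90 in cities]
slope_first_difference = (sum(dx * dy for dx, dy in zip(d_police, d_crime))
                          / sum(dx * dx for dx in d_police))

print(round(slope_cross_section, 3), slope_first_difference)
```

In this constructed example the cities with a high unobserved crime level also have more police, so the single-year cross section suggests that police raise crime, while the differenced data recover the assumed effect of -2.0.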

Causality and Ceteris Paribus:
Our goal is to infer that one variable has a causal effect on another variable. Causality is where a change in one variable leads to a change in another variable. Finding an association between two or more variables may be suggestive, but it is not conclusive unless causality is established (the association may be a coincidence, or a factor may not have been accounted for in the model). We need to hold all other factors constant in order to analyse the effect of one particular variable on another. If the other factors are not held constant, it is impossible to know the causal effect of one variable on another.

Examples where the problem of inferring causality arises include:
- Measuring the return to education
- Analysing the effect of law enforcement on crime levels
- Analysing the effect of the minimum wage on unemployment

The difficulty is that, in a simple linear regression, it is hard to isolate the variable of interest from the other variables that clearly affect the outcome.

CHAPTER 2: The Simple Regression Model

Definitions:
We are interested in explaining y in terms of x, or in how y varies with changes in x. In writing a model that explains this there are three issues:
- First, since there is never an exact relationship, how do we allow for other factors to affect y?
- Second, what is the functional form of the relationship between y and x?
- Third, how can we be sure that we are capturing a ceteris paribus relationship between y and x?

Some of these can be resolved by writing the equation $y_t = \beta_0 + \beta_1 x_t + u_t$, which is known as the simple linear regression (SLR) model, the two-variable regression model, or the bivariate regression model.
- The variables y and x have several different names that are used interchangeably:
  - The y variable is called the dependent variable, the explained variable, the response variable, the predicted variable, and the regressand
  - The x variable is called the independent variable, the explanatory variable, the control variable, the predictor variable, and the regressor
- The variable u represents all factors other than x that affect y. It is known as the error term, the disturbance term, the stochastic term, and the random term

Therefore, if the change in u is zero ($\Delta u = 0$), then the change in y equals $\beta_1$ times the change in x: $\Delta y = \beta_1 \Delta x$. $\beta_0$ represents the intercept and $\beta_1$ the slope parameter in an SLR.

Using the assumption that the error has zero mean given x, $E(u|x) = 0$, and the fact that $\beta_0$ and $\beta_1$ are constants, so that $E(\beta_0) = \beta_0$ and $E(\beta_1 x | x) = \beta_1 x$, we can obtain what is known as the 'population regression function' (PRF):

$$E(y|x) = E(\beta_0 + \beta_1 x + u \mid x)$$
$$E(y|x) = E(\beta_0 + \beta_1 x \mid x) + E(u|x)$$
$$E(y|x) = E(\beta_0) + E(\beta_1 x \mid x) + E(u|x)$$

Therefore: $E(y|x) = \beta_0 + \beta_1 x$.

The linearity of the PRF means that a one-unit increase in x changes the expected value of y by the slope amount, $\beta_1$.

Deriving OLS estimates
A line of best fit drawn through the data is called the sample regression function (SRF), that is

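$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

For any $x_i$ value, the difference between the actual value of $y_i$ and the value given by the sample regression function is called the residual, $\hat{u}_i$, where:

$$\hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$$

For a correctly specified model the residual is the sample estimate of the error term. Like the error term (u), it represents the part of the value of y that the estimated linear model is unable to explain. The OLS method states that we should choose the SRF line with the smallest residuals overall, that is, the line that minimises the amount of variation in y that cannot be explained by the model. To measure this variation we could sum the residuals. However, the sum would equal 0, because positive and negative residuals cancel out. To avoid this we sum their squared values:

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} [y_i - \hat{y}_i]^2 = \sum_{i=1}^{n} [y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i]^2$$

The OLS procedure minimises this function with respect to the two unknowns $\hat{\beta}_0$ and $\hat{\beta}_1$. Using calculus, the necessary conditions for a local extremum are

$$\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

$$\frac{\partial \sum_{i=1}^{n} \hat{u}_i^2}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) x_i = 0$$

These are expanded to give the 'normal equations':

$$\sum y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum x_i = 0$$

$$\sum x_i y_i - \hat{\beta}_0 \sum x_i - \hat{\beta}_1 \sum x_i^2 = 0$$

And solved:

$$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

Using an example where n = 7 we have

$$\sum_{i=1}^{n} y_i = 76500, \quad \sum_{i=1}^{n} x_i = 190, \quad \sum_{i=1}^{n} x_i y_i = 2116000, \quad \sum_{i=1}^{n} x_i^2 = 5242, \quad \bar{x} = 27.143, \quad \bar{y} = 10928.571$$

So that (THESE FORMULAS ARE VERY IMPORTANT!):

$$\hat{\beta}_1 = \frac{7(2116000) - (190)(76500)}{7(5242) - (190)^2} = 466.330$$

$$\hat{\beta}_0 = 10928.571 - 466.330(27.143) \approx -1728.956 \quad \text{(computed from the unrounded values)}$$

And we get a regression function of:

$$\hat{y}_i = -1728.956 + 466.330 x_i$$

We interpret this as:

§ When the temperature is equal to 0°C, the predicted level of monthly sales for Fisher-Hausen is -$1,728.96 (an extrapolation far from the sample mean temperature of about 27°C, so the negative figure should not be taken literally)

§ An increase in temperature of 1°C leads to an increase in monthly sales for Fisher-Hausen of $466.33

§ Suppose that the temperature is predicted to be 20°C in August

What would be the level of sales in August earned by Fisher-Hausen?
$\hat{S}_i = -1728.956 + 466.330(20) = \$7597.64$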
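The Fisher-Hausen arithmetic can be checked directly from the sums reported in the example. The sketch below only reproduces the solved normal equations. The variable names are assumptions made for the example.

```python
# Sums reported in the worked example (n = 7 monthly observations).
n = 7
sum_y = 76500       # sum of monthly sales ($)
sum_x = 190         # sum of temperatures (degrees C)
sum_xy = 2116000    # sum of x_i * y_i
sum_x2 = 5242       # sum of x_i squared

# Slope and intercept from the solved normal equations.
beta1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
beta0 = sum_y / n - beta1 * sum_x / n

# Static forecast of sales at 20 degrees C.
forecast_20 = beta0 + beta1 * 20

print(round(beta1, 3), round(beta0, 3), round(forecast_20, 2))
```

Working with unrounded intermediate values gives the intercept of -1728.956 quoted in the notes. Plugging in the rounded mean and slope by hand gives a slightly different figure.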

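OLS sample data properties:

1. The sum of the OLS residuals is zero, because the first-order condition for $\hat{\beta}_0$ requires

$$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

2. The sample covariance between the regressor and the OLS residuals is zero, because the first-order condition for $\hat{\beta}_1$ requires

$$\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = \sum_{i=1}^{n} x_i \hat{u}_i = 0$$

3. The point of sample means $(\bar{x}, \bar{y})$ is always on the estimated OLS regression line. In other words, if we take the sample regression function and plug in $\bar{x}$ for x, then the predicted value is $\bar{y}$.

When evaluating the quality of our estimated model we use the variance of the error terms. To obtain an unbiased estimate of the variance of the error term we use the following formula:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2}{n - k - 1}$$

where k is the number of parameters estimated, excluding the constant. Because the mean of the residuals is zero ($\bar{\hat{u}} = 0$), this simplifies to

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - k - 1}$$

The standard error is:

$$\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - k - 1}}$$

The difference between an observed y and the sample mean $\bar{y}$ is the total variation.

[Figure omitted: plot of Y (axis from 6,000 to 16,000) against X (axis from 20 to 34).]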
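The three sample properties and the variance estimate can be verified numerically. The notes report only the sums for the Fisher-Hausen data, so the sketch below uses a hypothetical sample. The function name `ols_fit` and all data values are assumptions made for the example.

```python
import math

def ols_fit(x, y):
    """Return (b0, b1) for the simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical sample, not the data behind the worked example.
x = [20.0, 22.0, 25.0, 27.0, 30.0, 32.0, 34.0]
y = [7600.0, 8900.0, 9800.0, 11200.0, 12100.0, 13500.0, 13900.0]

b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

n, k = len(x), 1                      # k regressors, excluding the constant
sum_resid = sum(residuals)            # property 1: zero up to rounding error
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))  # property 2
y_at_x_bar = b0 + b1 * (sum(x) / n)   # property 3: equals the mean of y
sigma2_hat = sum(ui ** 2 for ui in residuals) / (n - k - 1)
std_error = math.sqrt(sigma2_hat)

print(abs(sum_resid) < 1e-6, abs(sum_x_resid) < 1e-4, round(std_error, 2))
```

The first two quantities are zero only up to floating-point rounding, which is why the checks use a small tolerance rather than exact equality.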

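Explained variation is the distance between the horizontal line at the sample mean $\bar{Y}$ and the corresponding point on the OLS line. The unexplained variation is the bracketed distance between the OLS line and the observation, which is $\hat{u}$. Total variation is therefore the sum of the two:

$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

The squared terms add up in the same way because the OLS residuals are uncorrelated with the fitted values.

If we call the distance between $Y_1$ and $\bar{Y}$ the Total Variation, then this can be divided into two parts:
- The first part is the difference between the value predicted by the model and the mean value. This is called the Explained Variation because it is the component of the variation of $Y_i$ from $\bar{Y}$ which our regression model can explain. For observation 1, we have:

Explained Variation = $\hat{Y}_1 - \bar{Y} = \hat{\beta}_1 (X_1 - \bar{X})$

- The second part is the difference between the actual value of Y and the value that the model predicts. This is called the Residual or Chance Variation because it is the component of variation that the model is unable to explain:

Residual Variation = $Y_1 - \hat{Y}_1 = \hat{u}_1$

Squaring and summing over all observations gives the explained sum of squares (ESS) and the residual sum of squares (RSS).

Total variation: TSS = ESS + RSS

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

or, writing lowercase $y_i = Y_i - \bar{Y}$ and $x_i = X_i - \bar{X}$ for deviations from the mean,

$$\sum_{i=1}^{n} y_i^2 = \hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \hat{u}_i^2$$

Once the model has been estimated, we assess its overall goodness of fit, which is measured by the coefficient of determination, or R squared. If we divide both sides by the TSS, we get:
1 = (ESS/TSS) + (RSS/TSS)

R squared:

$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

The value of $R^2$ lies between 0 and 1, that is, $0 \le R^2 \le 1$.

If ESS = TSS, then $R^2 = 1$. This implies a perfect fit. $R^2$ is a measure of the goodness of fit of the estimated regression model to the sample data, and the better this fit is, the closer $R^2$ is to 1.

If ESS = 0, then $R^2 = 0$.
- This suggests that the regression model does not provide better estimates of $Y_i$ than the sample mean $\bar{Y}$.
- More importantly, a finding that $R^2 = 0$ suggests that a linear relationship does not exist between $Y_i$ and the regressors (x variables).
- This does not rule out a non-linear relationship between $Y_i$ and the regressors, but it does not prove that one exists either.
- Whether the obtained $R^2$ is high enough depends on the actual problem studied and the type of data that has been used.
- In the case of cross-sectional data, $R^2 \ge 0.5$ is usually considered to be a good enough fit. The threshold is lower because a cross section is a snapshot in time and does not contain the common trends and fluctuations over time that tend to push $R^2$ up in time series. This is why time series data require a higher $R^2$.
- In time series data, $R^2 \ge 0.9$ is usually considered to be a good enough fit.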
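The decomposition TSS = ESS + RSS and the two equivalent expressions for R squared can be checked on a small data set. The numbers below are hypothetical and were chosen only so that the arithmetic is easy to follow by hand.

```python
# Hypothetical data chosen so the arithmetic is easy to check by hand.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
ess = sum((fi - y_bar) ** 2 for fi in fitted)             # explained variation
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # residual variation

r_squared = ess / tss
print(round(tss, 4), round(ess, 4), round(rss, 4), round(r_squared, 4))  # 6.0 3.6 2.4 0.6
```

Here the fitted line is 2.2 + 0.6x, and 60% of the variation in y around its mean is explained by x.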

The number of regressors does not affect the TSS; it is fixed for a given sample. The RSS decreases, or at least does not increase, as more regressors are included in the model. $R^2$ is therefore a non-decreasing function of k: it rises whenever an additional independent variable helps explain any of the remaining sample variation in the dependent variable, and it never falls. However, k is constrained by the sample size and cannot be increased indefinitely, because df = n - (k+1) ≥ 0. The OLS estimators do not exist if df < 0, and the larger the df, the more reliable the estimated model.

A question that is commonly asked is: 'should we include a constant?' If we don't, then we are forcing the model to pass through the origin, that is, the point where both x and y are zero. In effect, we are imposing the restriction that $\beta_0 = 0$, which may be theoretically incorrect. For instance, in estimating a consumption function, imposing $\beta_0 = 0$ may be invalid, because consumption expenditure is also financed through savings and other forms of wealth, and not just directly through income (people still consume when their income is zero).

We should almost always include $\beta_0$ in our estimated model. Even if the estimate of $\beta_0$ is very close to zero in magnitude and is statistically insignificant, it is good to leave it in. Do not exclude $\beta_0$ from the estimated model unless you have a strong theoretical reason to do so, which is rare.

Chapter 3: Multiple Regression, estimation
A simple linear regression model is quite limited because it explains the dependent variable in terms of only one other variable. In contrast, a multiple linear regression model deals with several explanatory variables. This reduces the problem of omitted variable bias, because factors that would otherwise sit in the error term are included explicitly. We interpret the coefficients as in a simple linear regression, except that each one measures the effect of its variable holding the other included regressors constant. To help us choose which variables to include in our model we use a correlation matrix.

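The symbol ρ is the Greek letter rho (not sigma) and denotes the correlation coefficient. If ρ is close to +1 there is a strong positive linear relationship. If it is close to -1 there is a strong negative linear relationship. Values of +1 or -1 indicate a perfect linear relationship. A value of 0 means that x and y are uncorrelated, that is, there is no linear relationship between them; this is weaker than saying they are independent. If we square ρ, this gives us the proportion of the variance in one variable that can be directly related to changes in the other.

• |ρ| > 0.8: strong correlation, with ρ² > 0.64 (roughly 2/3)
• |ρ| < 0.5: weak correlation, with ρ² < ¼
• In between: moderate correlation

This is a correlation matrix: [figure omitted in source]

Statistical significance: a probability value (p-value) of less than 0.05.
T-test for significance: LEARN THIS FORMULA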
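The correlation coefficient and the correlation matrix described above can be computed with a few lines of Python. This is a sketch under stated assumptions: the function names and the three variables are hypothetical, and the data were constructed to show one exact linear relationship and one exact non-linear relationship.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Matrix of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {r: {c: correlation(columns[r], columns[c]) for c in names} for r in names}

# Hypothetical variables: z is an exact linear function of x, w is x squared.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
data = {"x": x, "z": [3.0 + 2.0 * v for v in x], "w": [v ** 2 for v in x]}

matrix = correlation_matrix(data)
for row in data:
    print(row, [round(matrix[row][col], 3) for col in data])
```

The correlation between x and z is 1, as expected for an exact linear relationship. The correlation between x and w is 0 even though w is completely determined by x, which illustrates that a zero correlation rules out only a linear relationship.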

LEARN WHAT LEVEL-LOG, LEVEL-LEVEL AND LOG-LOG RELATIONSHIPS ARE.

Must know these formulas:

PRF: $y_i = x_i'\beta + u_i$ (stacked over all observations: $y = X\beta + u$)

SRF: $\hat{y}_i = x_i'\hat{\beta}$

OLS: $\hat{\beta} = (X'X)^{-1} X'y$

When working with a multiple regression, the t-stat for coefficient j is

$$t_{n-k-1} = \frac{\hat{\beta}_j - \beta_j}{\sqrt{Var(\hat{\beta}_j)}} = \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)}$$

where k is the number of regressors, so the statistic has n - k - 1 degrees of freedom.

Multicollinearity is when several independent variables are linearly related. This can alter results drastically. Two types can occur: exact and near.

If two or more independent variables have an exact linear relationship between them, then we have perfect or exact multicollinearity. There is no unique solution to the normal equations obtained by OLS. When there is an exact linear relationship among the variables, the variance-covariance values become undefined. It is not possible to estimate the regression coefficients in the case of exact multicollinearity. Most computer programs will highlight this by displaying an error message, such as 'matrix singular', 'near singular matrix', or 'exact collinearity encountered'.

In most practical situations, there is a close but not exact relationship between the X variables in the model. In this instance, we do obtain parameter estimates. Estimation packages do not tell us that we have a highly collinear relationship among the independent variables. When these estimates have very large variances, it means that their values vary widely from one sample to another. So when a new set of observations becomes available at the end of the month or the quarter and we use it to re-estimate our model, we may obtain estimates that differ significantly from the ones we are now using.

A general rule of thumb for indicating whether multicollinearity is a problem is $R_j^2 \ge 0.90$, where $R_j^2$ is the $R^2$ from regressing regressor j on the other regressors. This implies a variance inflation factor of $VIF_j = 1/(1 - R_j^2) \ge 10$.

Why can't we ignore multicollinearity?

First, if two or more explanatory variables are exactly linearly related, then the model cannot be estimated.

Second, if some explanatory variables are nearly linearly related, then the OLS estimators are still BLUE, but have very large variances and covariances, making precise estimation difficult.

Third, near multicollinearity increases the standard errors of the regression coefficients and reduces the t-statistics, making coefficients less significant and possibly even insignificant.

Fourth, the covariance between the regression coefficients of a pair of highly correlated variables will be very high, in absolute terms, making it difficult to interpret individual coefficients.
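The rule of thumb linking an auxiliary R squared of 0.90 to a VIF of 10 follows from the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one regressor on the others. The notes quote only the threshold, so the formula is supplied here as an assumption, and the function name is hypothetical.

```python
def vif(r_squared_j):
    """Variance inflation factor for regressor j.

    r_squared_j is the R^2 from regressing x_j on the other regressors.
    """
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R^2 must be in [0, 1); R^2 = 1 is exact multicollinearity")
    return 1.0 / (1.0 - r_squared_j)

# The VIF grows without bound as the auxiliary R^2 approaches 1.
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(vif(r2), 2))
```

The variance of the coefficient estimate is multiplied by the VIF, so its standard error is multiplied by the square root of the VIF. At an auxiliary R squared of 0.90 the standard error is a little more than three times what it would be with uncorrelated regressors, which is why t-statistics fall under near multicollinearity.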

Multicollinearity (MC) is a problem with the data, not the model. Its typical symptoms are:
- A high $R^2$ combined with low t-stat values.
- High correlation coefficients between the independent variables. This is different from a high correlation coefficient between the dependent variable and an independent variable.
- Regression coefficients that appear to be sensitive to model specification.

Note that multicollinearity may still be present even if the pairwise correlation coefficients are quite low. This is because 3 or more variables may be linearly related even though no pairwise correlation coefficient is high. In such instances, multicollinearity is observed when regression coefficients are drastically altered, possibly even reversing sign, when variables are added or dropped.

To solve MC:
- Eliminate variables (but this can lead to omitted variable bias)
- Increase the sample size

When you have a multiple regression model, adding a new independent variable, no matter how irrelevant or ridiculous it is, either leaves the $R^2$ the same or increases its value. Thus, we are unable to use $R^2$ to compare two models with different numbers of independent variables. We must then use the adjusted R squared:

$$\bar{R}^2 = 1 - \frac{RSS/df_{RSS}}{TSS/df_{TSS}} = 1 - \frac{RSS/(n-k-1)}{TSS/(n-1)} = 1 - \frac{n-1}{n-k-1} \cdot \frac{RSS}{TSS} = 1 - \frac{n-1}{n-k-1}(1 - R^2)$$

The degrees of freedom for the sums of squares are:

TSS: n - 1
ESS: (k+1) - 1
RSS: n - (k+1)

and $df_{TSS} = df_{ESS} + df_{RSS}$, where k represents the number of regressors, and (k+1) is the number of parameters to be estimated, which includes the constant term.

Although a higher $\bar{R}^2$ is usually preferred to a lower one, one should never attempt to maximise the value of the adjusted $R^2$ at the expense of economic theory.
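The last form of the adjusted R squared formula is easy to apply once $R^2$, n and k are known. The sketch below is illustrative: the function name and the two models being compared are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k regressors (constant excluded)."""
    df_rss = n - k - 1
    if df_rss <= 0:
        raise ValueError("need n > k + 1 for positive residual degrees of freedom")
    return 1.0 - (n - 1) / df_rss * (1.0 - r_squared)

# Hypothetical comparison: an extra regressor raises R^2 only slightly.
small_model = adjusted_r_squared(0.900, n=30, k=2)
large_model = adjusted_r_squared(0.901, n=30, k=3)
print(round(small_model, 4), round(large_model, 4))
```

In this comparison $R^2$ rises from 0.900 to 0.901 when the third regressor is added, but the adjusted value falls, because the small gain in fit does not compensate for the lost degree of freedom.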