20
Hypothesis testing Is the given estimation in keeping with the stated hypothesis (null hypothesis) Null hypothesis is tested against alternate hypothesis Alternate hypothesis can be simple or composite Two approaches to hypothesis testing: Confidence interval approach Test of significance approach Both approaches assume estimate follows a distribution

class4

Embed Size (px)

DESCRIPTION

notes

Citation preview

Page 1: class4

Hypothesis testing● Is the given estimation in keeping with the stated

hypothesis (null hypothesis)● Null hypothesis is tested against alternate

hypothesis● Alternate hypothesis can be simple or composite● Two approaches to hypothesis testing:

– Confidence interval approach– Test of significance approach

● Both approaches assume estimate follows a distribution

Page 2: class4

Confidence Interval approach

• Construct a 100(1-a)% confidence interval for b. If b under Ho falls within this confidence interval do not reject Ho, but if it falls outside the interval, reject Ho

• b^=0.5091• Ho:b=0.3, Ha:b 0.3• Interval at 95% confidence level is 0.4288<b<0.5914• This interval does not include 0.3• Therefore null hypothesis is rejected• There is still a 5% probability of rejecting a true

hypothesis. This is called type I error.

Page 3: class4

Test of Significance approach• Construct a 100(1-a)% confidence interval for b^. If b^

under Ho falls within this confidence interval do not reject Ho, but if it falls outside the interval, reject Ho

• b^=0.5091 SE(b^)=0.0357• Ho:b=0.3, Ha:b 0.3, df=8 tc=2.306 • Interval at 95% confidence level is 0.3-SE(b^)*tc and

0.3+SE(b^)*tc

• Pr(0.2177<b^<0.3823)=0.95• Check if this interval includes the estimated b of 0.5091• If not , the null hypothesis is rejected. If yes, accept Ho.• There is still a 5% probability of rejecting a true

hypothesis. This is called type I error.

Page 4: class4

Test of Significance approach● Practically under this approach there is no need to estimate

the upper and lower limits of the hypothesised b and check if estimated b falls within it

● Inference can be made about the hypothesis by Estimating the t value and comparing it to critical table value

● t=|b^-b|/SE(b^)= estimated table value● Compare above with tc value from table (critical t value)● If t>tc, t is statistically significant so reject null hypothesis● If t<tc, t is statistically insignificant so accept null hypothesis● Tc is determined depending if its a single or two tailed test.● When alternate hypothesis doesnt indicate direction then its

a two tailed test. ● If Ha indicates direction it is one tailed test

Page 5: class4

5

Rejection Region for a Test of Significance in 2-tailed test

For a 5% significance level a 2-sided test with df=8, the critical t value is 2.306 on both tails (look under 0.05 of the second row in t table). Values beyond it falls in rejection region. If estimated t is greater than 2.306, Ho is rejected. If estimated t is less than 2.306, Ho is accepted

f(x)

95% non-rejection region

2.5% rejection region

2.5%rejection region

Page 6: class4

6

The Rejection Region for a 1-Sided Test (Upper Tail)For a 5% significance level a 1-sided test with df=8, the critical t value is 1.860 on upper tail (look under 0.05 of the first row in t table). Values beyond it falls in rejection region. If estimated t is

greater than 1.86, Ho is rejected. If estimated t is less than 1.86, Ho is accepted

f(x)

95% non-rejection region 5% rejection region

Page 7: class4

7

The Rejection Region for a 1-Sided Test (Lower Tail)For a 5% significance level a 1-sided test with df=8, the critical t value is 1.860 on lower tail (look under 0.05 of the first row in t table). Values beyond it falls in rejection region. If estimated t is greater than 1.86, Ho is rejected. If estimated t is less than 1.86,

Ho is acceptedf(x)

95% non-rejection region

5% rejection region

Page 8: class4

8

“2-t” Thumb rule for rejection of hypothesis

If the df is 20 or more and if level of significance is 5%, then the null hypothesis b=0 can be rejected if the estimated t exceeds 2 in absolute value

The decision to accept or reject the null hypothesised is dependent heavily on the chose significance level. Usually it is 1%, 5% or 10%

Setting this arbitrary level can be avoided by finding the exact level of probability of committing Type 1 error at the estimated t levels. “p-value” indicates the lowest significance level at which Ho can be rejected. P-value is given in Stata results. A p value lower than 0.00 means Ho can rejected at lower than 1% significance level.

Page 9: class4

9

● Inference in a two-variable model is based on:

– R2 and t test● Prediction can be made based on the estimated model if

inferences indicate it is good by plugging in values of X to predict value of Y at that level of X

– Mean Prediction: interval for prediction of E(Y/X)

Y^-tc*SE(Y^) and Y^+tc*SE(Y^)

where

-Individual Prediction: interval prediction of Y is same as above except SE estimation

2

1)(

22

2

2

02

0

nu

x

XX

nYV a r

i

i

2

2

02

00

11)(

ix

XX

nYYV a r

Page 10: class4

10

Test of normality● Histogram● Jarque Bera test

– This test examines both the skewness and kurtosis of a distribution to test for normality.

– Where S is the skewness and K is the kurtosis of the residuals.

– JB has a 2 distribution with 2 df.– Null hypothesis is that distribution is normally distributed– If JB is estimated close to zero then null hypothesis is

accepted.– Decsion made based on 2 distribution with 2 df.

2 4

)3(

6

22 KSnJB

Page 11: class4

11

Extensions of 2 variable model

● Model so far has a constant and slope● Situations when constant should be avoided● Estimation of slope of a constant-less model :

22

i

ii

X

YX

22

22

2

2

2

2

2

)(

1

)(

ii

ii

i

i

YX

YXrRaw

n

u

XV a r

Page 12: class4

12

Linear transformations do not affect the shape ofthe distribution (e.g. Celsius into Fahrenheit; milesinto kilometres; pounds into kilograms)

The linear transformation, e.g.

ii bXaX *

Produces observations X1*, X2

*, X3*,…, Xn

*

In the context of regression analysis this means thatthe slope and intercept coefficients will change butthe coefficient of determination, standard errors andt-ratios will remain the same

Data Transformation

Page 13: class4

Scaling and Units of Measurement• Value of coefficient depends on the unit of

measurement of X and Y– If both X and Y are in billions, slope indicates for 1 billion

change in X, Y changes of 0.8 billions– If X is in billion and Y in millions, same slope will be 800.

• If X and Y are scaled, coefficients can be adjusted to derive the unscaled values

1

1

2

*

11

*

22

w

w

w

Page 14: class4

Standardized variables

• In a multiple regression model, its not possible to compare the slope coefficients unless all the X and Y variable are standardised

• If standardised, the highest value slope variable has highest impact on Y for 1 standard deviation movement in X variable

• You can derive the unstandardised coeff from the standardised coeff

X

ii

Y

ii S

XXXsimilarly

S

YYY

** '

x

y

S

S*22

Page 15: class4

15

Modelling nonlinear relationshipsTo change the shape of a distribution we use non-linear transformations. Eg, a power, square root or logarithm

The linear aspect means that the same amount of increase in income will have the same effect on consumpn at both low and high income level. A nonlinear change would mean that as income increases, its impact upon consumption might increase at lower income levels.•MPC of rich is lower than the poorIn the context of regression analysis the logarithmictransformation is commonly used:• Functional forms:

● Log linear model● Semi log models (log-lin / lin-log)● Reciprocal model● Logrithmic reciprocal

Page 16: class4

16

Nonlinear Relations

Xi

Yi

..

.

..

......

....

..

... ...

..... ..

.. .

..

. ..

..

.

Page 17: class4

17

Economic Relationship as an exponential regression

iUiii eXXY 32

321

• Nonlinear in terms of parameters - can’t use OLS•Therefore, need to apply a suitable transformation

•Log-linear (log-log) model because both dependent variable and independent variables are in logarithmic form•Coefficients measure the relative change in Y for a relative change in X (partial effects). Therefore, can be interpreted as elasticities• Often applied to production functions (following Cobb and Douglas, 1928)

iiii UXXY 33221 lnlnlnln

Log linear (Double log) form

Page 18: class4

18

Semi-Log models● Log-lin model: Y varible is in log form while X is not

– slope=Relative change in Y/absolute change in X– Slope in this model is the instantaneous growth rate– Estimating Compound growth rate: Take antilog of slope,

subtract 1 from it and multiply by 100.● Lin-log model: X variable is in log form while Y is not

– slope=Absolute change in Y/Relative change in X – Slope has to be divided by 100 to estimate change in Y– For 1% change in X, the Y changes by “slope/100” units

Remember r2 of linear model cannot be compared with these models

Page 19: class4

19

Reciprocal models

● If sign of coefficient is positive it indicates a negative relationship between X and Y

● As value of X increases infinitely, Y approaches the limiting value of constant (b1) bcos the term b2*(1/X) approaches 0.

● Eg: Mortality rate reduces as GDP grows. However deaths occur even in the richest country. Therefore in a reciprocal model as GDP tends to infinity, mortality rate will approach the value of the constant.

i

i

i uX

Y )1

(21

Page 20: class4

Model Equation Slope* Elasticity **

Linear Y=a+b X b b(X/Y) Log linear Ln Y= a+ b ln X b(Y/X) b Log-lin Ln Y= a+b X b(Y) b(X) Lin-log Y= a+ b ln X b(1/X) b(1/Y) Reciprocal Y= a+b(1/X) -b(1/X2) -b(1/XY)

dX

dY*

Y

X

dX

dY**