43
Exam 2 Review & 9.3 - 9.4 Review Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c Department of Mathematics University of Houston Lecture 12 Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston ) Review Lecture 12 1 / 43

Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Exam 2 Review & 9.3 - 9.4Review

Cathy Poliak, [email protected] in Fleming 11c

Department of MathematicsUniversity of Houston

Lecture 12

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 1 / 43

Page 2: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Outline

1 Review of Topics

2 Review Questions

3 Estimating Regression Parameters

4 Inferences for β1

5 F-test

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 2 / 43

Page 3: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Review Questions

We wish to test the hypotheses H0 : µ = 13.5 versus Ha : µ > 13.5 atα = 0.02 significance level. From a random sample of 40 the samplemean is 15.9. Assume that the population standard deviation is 2.7.

1. Which test would be appropriate to use?a) z-test for proportions b) z-tests for means

c) t-test for means d) t-tests for proportions

2. Calculate the test statistic.a) 2.4 b) -5.62 c) 5.62 d) 0.89 e) 2.0537

3. Determine the p-value, approximately.a) 0 b) 0.02 c) 1.434 d) 4.68

4. What is our decision of the test?a) Reject H0 b) Fail to reject H0 c) Accept H0

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 3 / 43

Page 4: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Review Questions

For each of the following scenarios, determine if it is a) paired t-test orb) two sample t-test

5. The weight of 14 patients before and after open-heart surgery.

6. The smoking rates of 14 men measured before and after a stroke.

7. The number of cigarettes smoked per day by 14 men who havehad strokes compared with the number smoked by 14 men whohave not had strokes.

8. The photosynthetic rates of 10 randomly chosen Douglas-fir treescompared with 10 randomly chosen western red cedar trees.

9. The photosynthetic rate measured on 10 randomly chosen Sitkaspruce trees compared with the rate measured on the western redcedar growing next to each of the Sitka spruce trees.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 4 / 43

Page 5: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Dates of Exam and Covered Chapters

Dates: July 12 & 13

Chapters: 5, 6.5, 7, 8, & 10

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 5 / 43

Page 6: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Chapter 5 Continuous Random Variables

Density functions

Know how to calculate expected values and quantiles forcontinuous distributions.

Named distributionsI UniformI ExponentialI GammaI Normal

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 6 / 43

Page 7: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Chapter 6 Sampling Distributions

Expected values and variances of X + Y

Applying the Central Limit Theorem

The sampling distribution of the sample means, X .

The sampling distribution of the proportions, p.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 7 / 43

Page 8: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Confidence Intervals

Know how to calculate the confidence intervals for mean (µ),proportions (p) and standard deviation (σ).

For mean if σ, population standard deviation is given: x ± z∗(σ√n

).

For mean if σ is not given: x ± t∗df

(s√n

).

For proportions p = Xn : p ± z∗

√p(1−p)

n .

For standard deviation: lcl =√

(n−1)s2

qchisq(1−α/2,n−1) and

ucl =√

(n−1)s2

qchisq(α/2,n−1)

For paired - t: xD ± t∗df

(sD√

n

)For two-sample t for means: (x1 − x2)± t∗ν

√s2

1n2

+ s2s

n2

For two-sample for proportions: (p1 − p2)± z∗√

p1(1−p1)n1

+ p2(1−p2)n2

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 8 / 43

Page 9: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Degrees of Freedom for Two-Sample T

df = ν =

(s2

1n1

+s2

2n2

)2

1n1−1

(s2

1n1

)2+ 1

n2−1

(s2

2n2

)2

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 9 / 43

Page 10: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Confidence Intervals

Know how the confidence interval changes as the confidence levelC changes and as the sample size changes.

Know how to interpret confidence intervals.

Know how to determine a sample size given confidence level andmargin of error.

I For means n >(

zα/2×σE

)2, where E = margin of error.

I For proportions n > p∗(1− p∗)(zα/2

E )2, where p∗ is some previousknowledge of the proportion if not known we use 0.5.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 10 / 43

Page 11: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Hypothesis Tests

Know how to set up null and alternative hypotheses.Be able to determine a rejection region, given the level ofsignificance (α).Calculate a test statistic.

I For one - sampleH0 : µ = µ0 H0 : µ = µ0 H0 : p = p0if σ is given if σ is unknown

Test Statistic z = x−µ0σ/√

n t = x−µ0s√

n z = p−p0√p0(1−p0)

n

I For two-sampleH0 : µ1 = µ2 H0 : p1 = p2 H0 : µD = 0

Test Statistic t = x1−x2√s21

n1+

s22

n2

z = p1−p2√p1(1−p1)

n1+

p2(1−p2)

n2

t = xD−µdsD/√

n

Be able to calculate the p-value and know to reject (RH0) or fail toreject (FRH0).Know the difference between Type I and Type II errors.Be able to make a conclusion in context of the problem.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 11 / 43

Page 12: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Go through the practice test in the "Online Assignments"

Go through the review problems:https://www.math.uh.edu/~cathy/Math3339/Tests/Test%202/test2_review_sp19.pdf

Any questions about the material post on the discussion board.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 12 / 43

Page 13: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Example, Continuous Random Variable

Suppose a continuous random variable X has the following pdf:

f (x) =

{1ax 0 ≤ x ≤ 20 otherwise

1. Find the value of a that makes this function a valid pdf.

2. Determine the expected value of X.

3. Find P(X ≤ 1)

4. Determine the CDF.

5. Find the value of c such that P(X ≤ c) = 0.04.

6. Determine Q3, the third quartile.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 13 / 43

Page 14: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 14 / 43

Page 15: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Example, Another Continuous Distribution

Suppose a continuous random variable X has the following pdf:

f (x) =

{13e−

13 x x ≥ 0

0 otherwise

1. Determine P(X ≤ 3)

2. Determine E(X)3. Determine SD(X)

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 15 / 43

Page 16: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 16 / 43

Page 17: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Cumulative Functions

Let X be the amount of time (in hours) the wait is to get a table atrestaurant. Suppose the cdf is represented by

F (x) =

0 x < 0x2

9 0 ≤ x ≤ 31 x > 3

1. Determine P(1.5 ≤ x ≤ 4).2. Determine µ = E(X ).3. Determine σ, that is the standard deviation.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 17 / 43

Page 18: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 18 / 43

Page 19: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Example of Confidence Interval

The average height of students at UH from an SRS of 17 studentsgave a standard deviation of 2.9 feet. Construct a 95% confidenceinterval for the standard deviation of the height of students at UH.Assume normality for the data.a) (1.160, 9.413)b) (6.160, 11.413)c) (4.160, 9.413)d) (1.660, 8.413)e) (2.160, 4.413)f) None of the above

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 19 / 43

Page 20: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Which Test?

Identify the most appropriate test to use for the following situations.1. A national computer retailer believes that the average sales are

greater for salespersons with a college degree. A random sampleof 14 salespersons with a degree had an average weekly sale of$3542 last year, while 17 salespersons without a college degreeaveraged $3301 in weekly sales. The standard deviations were$468 and $642 respectively. Is there evidence to support theretailer’s belief?

2. Quart cartons of milk should contain at least 32 ounces. A sampleof 22 cartons was taken and amount of milk in ounces wasrecorded. We would like to determine if there is sufficient evidenceexist to conclude the mean amount of milk in cartons is less than32 ounces?

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 20 / 43

Page 21: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Which Test?

Identify the most appropriate test to use for the following situations.1. In a experiment on relaxation techniques, subject’s brain signals

were measured before and after the relaxation exercises. We wishto determine if the relaxation exercise slowed the brain waves.

2. A private and a public university are located in the same city. Forthe private university, 1046 alumni were surveyed and 653 saidthat they attended at least one class reunion. For the publicuniversity, 791 out of 1327 sampled alumni claimed they haveattended at least one class reunion. Is the difference in thesample proportions statistically significant?

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 21 / 43

Page 22: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Least-Squares regression

The least-squares regression line (LSRL) of Y on X is the linethat makes the sum of the squares of the vertical distances of thedata points from the line as small as possible.The linear regression model is: Y = β0 + β1x + ε

I Y is dependent variable (response).I x is the independent variable (explanatory).I β0 is the population intercept of the line.I β1 is the population slope of the line.I ε is the error term which is assumed to have mean value 0. This is

a random variable that incorporates all variation in the dependentvariable due to factors other than x .

I The variability: σ of the response y about this line. More precisely,σ is the standard deviation of the deviations of the errors, εi in theregression model.

We will gather information from a sample so we will have the leastsquares estimates model: Y = β0 + β1x .

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 22 / 43

Page 23: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Principle of Least Squares

The vertical deviation of the point (xi , yi) from the line y = b0 + b1x is

hieght of point − height of line = yi − (b0 + b1xi)

The sum of the square vertical deviations from the points(x1, y1), (x2, y2), . . . , (xn, yn) to the line is then

f (b0,b1) =n∑

i=1

[yi − (b0 + b1xi)]2

The point estimates of β0 and β1, denoted by β0 and β1 and called theleast squares estimates, are those values that minimize f (b0,b1).

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 23 / 43

Page 24: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Estimating the Regression Parameters

In the simple linear regression setting, we use the slope b1 andintercept b0 of the least-squares regression line to estimate theslope β1 and intercept β0 of the population regression line.The standard deviation, σ, in the model is estimated by theregression standard error

s =

√∑(yi − yi)2

n − 2=

√∑all residuals2

n − 2

Recall that yi is the observed value from the data set and yi is thepredicted value from the equation.In R s is the called the Residual Standard Error in the lastparagraph of the summary.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 24 / 43

Page 25: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

The Least - Squares Estimates

Recall ei = observed Y - predicted Y is the i th residual. Think of itas an estimate of the unobservable true random error εi .

The method of least squares selects estimators β0 and β1 thatminimizes the residual sum of squares:

SS(resid) = SSE =n∑

i=1

e2i =

n∑i=1

(Yi − Yi

)Where the estimate of the slope coefficient β1 is:

β1 =

∑(xi − x)(yi − y)∑

(xi − x)2 =Sxy

Sxx

The estimate for the intercept β0 is:

β0 = y − β1x

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 25 / 43

Page 26: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Example

The marketing manager of a supermarket chain would like to use shelfspace to predict the sales of coffee. A random sample of 12 stores isselected, with the following results.

Store Shelf Space (ft) Weekly Sales (# sold)1 5 1602 5 2203 5 1404 10 1905 10 2406 10 2607 15 2308 15 2709 15 280

10 20 26011 20 29012 20 310

The correlation coefficient is r = 0.827.Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 26 / 43

Page 27: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Least-Squares

5 10 15 20

150

200

250

300

Shelf Space(feet)

Num

ber

Sol

d

> plot(shelf$space,shelf$sold)> shelf.lm=lm(shelf$sold~shelf$space)> abline(shelf.lm)

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 27 / 43

Page 28: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Finding Equation for Coffee Sales

Given these values determine the least-square regression line (LSRL)equation for predicting number sold based on shelf space.

Explanatory variable: Shelf space (X ) x = 12.5 feet, sx = 5.83874feet.Response variable: Sales (Y ) y = 237.5 units sold, sy = 52.2451units sold.Correlation: r = 0.827

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 28 / 43

Page 29: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

R code

> shelf.lm=lm(sold~space)> summary(shelf.lm)Call:lm(formula = sold ~ space)

Residuals:Min 1Q Median 3Q Max-42.00 -26.75 5.50 21.75 41.00

Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) 145.000 21.783 6.657 5.66e-05 ***space 7.400 1.591 4.652 0.000906 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30.81 on 10 degrees of freedomMultiple R-squared: 0.6839,Adjusted R-squared: 0.6523F-statistic: 21.64 on 1 and 10 DF, p-value: 0.0009057

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 29 / 43

Page 30: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Determining if the Model is Good

For the sample we can use R2 and the residuals to determine ifthe equation is a good way of predicting the response variable.

Another way to determine if this equation is a good way ofpredicting the response variable is to determine if the explanatoryvariable is needed (significant) in the equation.

These tests of significance and confidence intervals in regressionanalysis are based on assumptions about the error term ε.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 30 / 43

Page 31: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Assumptions about the error term ε

1. The error term ε is a random variable with a mean or expectedvalue of zero, that is E(ε) = 0, an estimate for ε is the residuals foreach value of the X-variable.

residual = observed y− predicted y

2. The variance of ε, denoted by σ2, is the same for all values of x .The estimate for σ2 is s2 = MSE = SSE

n−2 =∑

(yi−yi )2

n−2 .

3. The values of ε are independent.

4. The error term ε is a normally distributed random variable.

5. The residual plots help us assess the fit of a regression line anddetermine if the assumptions are met.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 31 / 43

Page 32: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Definitions of Regression Output

1. The error sum of squares, denoted by SSE is

SSE =∑

(yi − yi)2

2. A quantitative measure of the total amount of variation in observedvalues is given by the total sum of squares, denoted by SST .

SST =∑

(yi − y)2

3. The regression sum of squares, denoted SSR is the amount oftotal variation that is explained by the model

SSR =∑

(yi − y)2

4. The coefficient of determination, r2 is given by

r2 =SSRSST

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 32 / 43

Page 33: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Finding these values using R

> anova(shelf.lm)Analysis of Variance Table

Response: soldDf Sum Sq Mean Sq F value Pr(>F)space 1 20535 20535 21.639 0.0009057 ***Residuals 10 9490 949---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 33 / 43

Page 34: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Conditions for regression inference

The sample is an SRS from the population.

There is a linear relationship in the population.

The standard deviation of the responses about the population lineis the same for all values of the explanatory variable.

The response varies Normally about the population regressionline.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 34 / 43

Page 35: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

t Test for Significance of β1

HypothesisH0 : β1 = β10 versus Ha : β1 6= β10

Usually β10 = 0Test statistic

t =observed− hypothesized

standard deviation of observedobserved = b1

hypothesized = 0

standard error = SEb1 =s√∑

(xi − x)2

With degrees of freedom df = n − 2.P-value: based on a t distribution with n − 2 degrees of freedom.Decision: Reject H0 if p-value ≤ α.Conclusion: If H0 is rejected we conclude that the explanatoryvariable x can be used to predict the response variable y .

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 35 / 43

Page 36: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Testing β1

1. We want to test: H0 : β1 = 0 versus Ha : β1 6= 0 for the coffeesales.

2. Test statistic: t = (7.4−0)1.591 = 4.652

3. P-value: 2 ∗ P(T > 4.652) = 2 ∗ 0.00425 = 0.00906

4. Decision: Reject the Null hypothesis

5. Conclusion: β1 is significantly not zero, thus shelf space can beused to predict the number of units sold.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 36 / 43

Page 37: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

R code

> shelf.lm=lm(sold~space)> summary(shelf.lm)Call:lm(formula = sold ~ space)

Residuals:Min 1Q Median 3Q Max-42.00 -26.75 5.50 21.75 41.00

Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) 145.000 21.783 6.657 5.66e-05 ***space 7.400 1.591 4.652 0.000906 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 30.81 on 10 degrees of freedomMultiple R-squared: 0.6839,Adjusted R-squared: 0.6523F-statistic: 21.64 on 1 and 10 DF, p-value: 0.0009057

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 37 / 43

Page 38: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Height

Because elderly people may have difficulty standing to have theirheights measured, a study looked at predicting overall height fromheight to the knee. Here are data (in centimeters, cm) for five elderlymen:

Knee Height (cm) 57.7 47.4 43.5 44.8 55.2Overall Height(cm) 192.1 153.3 146.4 162.7 169.1

1. We want to test if the relationship of the measurement of theheight is more than double that of the measurement from the floorto the knee. Test H0 : β1 = 2 versus Ha : β1 > 2.

2. Give a conclusion for the relationship between using knee lengthto predict overall height.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 38 / 43

Page 39: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 39 / 43

Page 40: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Confidence Intervals for β1

If we want to know a range of possible values for the slope we can usea confidence interval.

Remember confidence intervals are

estimate± t∗ × standard error of the estimate

Confidence interval for β1 is

b1 ± tα/2,n−2 × SEb1

Where t∗ is from table D with degrees of freedom n − 2 where n =number of observations.In R we can get this by confint(name.lm,level = 0.95).> confint(shelf.lm)2.5 % 97.5 %(Intercept) 96.464405 193.53560space 3.855461 10.94454

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 40 / 43

Page 41: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

Determine a 90% Confidence Interval for CoffeeExample

From the output in R:

Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) 145.000 21.783 6.657 5.66e-05 ***space 7.400 1.591 4.652 0.000906 ***

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 41 / 43

Page 42: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

F-distribution

The F distribution with ν1 degrees of freedom in the numeratorand ν2 degrees of freedom in the denominator is the distribution ofa random variable

F =U/ν1

V/ν2,

where U ∼ χ2(df = ν1) and V ∼ χ2(df = ν2) are independent.That F has this distribution is indicated by F ∼ F (ν1, ν2).Notice U = SSR

σ2 ∼ χ2(df = 1) and V = SSEσ2 ∼ χ2(df = n − 2) are

independent.Let MSE = SSE/(n − 2) and MSR = SSR/1. Then

F =MSRMSE

=SSR/1

SSE/(n − 2)∼ F (1,n − 2)

Then we can use the F-distribution to test the hypothesisH0 : β1 = 0 versus Ha : β1 6= 0.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 42 / 43

Page 43: Exam 2 Review & 9.3 - 9.4 - Reviewcathy/Math3339/Lecture/lect12_3339_sum19.pdf · Outline 1 Review of Topics 2 Review Questions 3 Estimating Regression Parameters 4 Inferences for

F-test for Shelf Space

Analysis of Variance Table

Response: soldDf Sum Sq Mean Sq F value Pr(>F)space 1 20535 20535 21.639 0.0009057 ***Residuals 10 9490 949---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note: F = t2 and the p-value is the same.

Cathy Poliak, Ph.D. [email protected] Office in Fleming 11c (Department of Mathematics University of Houston )Review Lecture 12 43 / 43