Source: personal.psu.edu/drh20/200/lectures/lecture05.pdf

STATISTICS 200 Lecture #5 Tuesday, September 6, 2016 Textbook: Sections 2.7 through 3.2

Objectives:

• Define z-scores and relate them to the empirical (68-95-99.7) rule
• Explore scatterplots as a tool for visualizing two quantitative variables
• Familiarize yourselves with least squares regression lines:
  – slope interpretation
  – y-intercept interpretation
  – dangerous to extrapolate

Standardized z-scores

• Tells us how many standard deviations an observation is from the mean.

• A useful measure of the relative value of any observation in a dataset

• Allows comparison of observations in different data sets.

Standardized z-scores

• Z-scores correspond directly to the Empirical Rule, which applies to bell-shaped data.
• About 68% of values have z-scores between –1 and 1.
• About 95% of values have z-scores between –2 and 2.
• About 99.7% of values have z-scores between –3 and 3.
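The three percentages can be checked against the standard normal distribution. This is a minimal sketch, assuming the data are exactly normal (for real data the rule is only approximate); the helper name `share_within` is hypothetical and not part of the lecture.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Share of a normal distribution whose z-scores fall between -k and k."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Print the share of values within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"z between -{k} and {k}: {share_within(k):.1%}")
```

The printed shares should come out close to 68%, 95%, and 99.7%, which is where the name "68-95-99.7 rule" comes from.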

Example 1

• What is the z-score and interpretation in the following situation?
• Obs = 3, mean = 4, SD = 0.5

Z-score = (observation – mean)/SD
= (3 – 4) / 0.5
= –1 / 0.5
= –2

Interpretation: The observation of 3 is 2 standard deviations below the mean.

Example 2

• What is the z-score and interpretation in the following situation?
• Obs = 200, mean = 150, SD = 20

Z-score = (observation – mean)/SD
= (200 – 150) / 20
= 50 / 20
= 2.5

Interpretation: The observation 200 is 2.5 standard deviations above the mean.
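The two examples follow the same recipe, so it can be written once as a small function. This is a sketch only; the function names `z_score` and `interpret` are hypothetical, chosen for illustration.

```python
def z_score(observation: float, mean: float, sd: float) -> float:
    """Return (observation - mean) / SD, the number of SDs from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / sd


def interpret(z: float) -> str:
    """Turn a z-score into the sentence used in the examples."""
    if z == 0:
        return "exactly at the mean"
    direction = "above" if z > 0 else "below"
    return f"{abs(z):g} standard deviations {direction} the mean"


print(z_score(3, 4, 0.5), "->", interpret(z_score(3, 4, 0.5)))          # Example 1
print(z_score(200, 150, 20), "->", interpret(z_score(200, 150, 20)))    # Example 2
```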

More complicated example: which person has a more unusual height?

Me: a 53” tall woman

• Women’s heights are normal with mean 54” and std. dev. 3”.

My husband: a 73” tall man

• Men’s heights are normal with mean 70” and std. dev. 3”

These heights come from different distributions, so we cannot compare them directly. We need a tool to make them comparable…

Z-score!

Calculate Z-scores for both:

• Me: Z-score = (obs – mean) / (std. dev) = (53 – 54) / 3 = –1/3 = –0.33
• Husband: Z-score = (obs – mean) / (std. dev) = (73 – 70) / 3 = 3/3 = 1

Compare Z-scores

Me: 0.33 std. dev. below the mean
Husband: 1 std. dev. above the mean

Conclusion:

My husband’s height is more unusual than mine, because it is more standard deviations from the mean (1 versus 0.33).
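The comparison can be sketched in code by standardizing each height against its own distribution and then comparing the absolute z-scores. The dictionary layout and variable names below are assumptions made for illustration; the numbers are the ones from the slides.

```python
# Each person's height is paired with the mean and SD of their own group.
people = {
    "Me": {"height": 53, "mean": 54, "sd": 3},
    "Husband": {"height": 73, "mean": 70, "sd": 3},
}

# Standardize each height: (obs - mean) / SD.
z_scores = {
    name: (p["height"] - p["mean"]) / p["sd"] for name, p in people.items()
}

# The more unusual height is the one farther from its mean in SD units,
# regardless of direction, so compare absolute values.
more_unusual = max(z_scores, key=lambda name: abs(z_scores[name]))

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print("More unusual height:", more_unusual)
```

Taking the absolute value matters: a height 2 standard deviations below the mean would be more unusual than one 1 standard deviation above it.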

So far…

• We have talked about quantitative variables, but only one at a time.
• Now we’re going to begin looking at the relationships between two different quantitative variables.
• Start with looking at a scatterplot.

Scatterplots:

• A scatterplot is a two-dimensional graph of two numeric variables.
• There are two axes on a scatterplot, the vertical axis (y-axis) and the horizontal axis (x-axis).
  – The y-axis is assigned to the response variable.
  – The x-axis is assigned to the explanatory variable.

Example 1: Apartment size and rent

Two Variables:
• size of one-bedroom apartment (square feet)
• monthly rent ($)

Size (Square Ft)    Rent ($)
415                 1000
438                 833
485                 1003
636                 1089
548                 1150
666                 1181
646                 1237
545                 1225
690                 1469
688                 1501
538                 1177
469                 958

What is the average pattern?

What is the direction of the pattern?

A positive, linear association

Explanatory / independent / x variable
Response / dependent / y variable
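The direction of the pattern can also be checked numerically. The sketch below is not something the slides compute: it sums the products of deviations from the two means, which comes out positive when large sizes tend to go with large rents. It assumes the twelve sizes and twelve rents listed above are paired in the order given, and that size is the explanatory variable and rent the response; the function name is hypothetical.

```python
# Assumption: the i-th size belongs with the i-th rent.
sizes = [415, 438, 485, 636, 548, 666, 646, 545, 690, 688, 538, 469]
rents = [1000, 833, 1003, 1089, 1150, 1181, 1237, 1225, 1469, 1501, 1177, 958]


def sum_of_deviation_products(xs, ys):
    """Sum of (x - mean of x) * (y - mean of y) over all paired points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


total = sum_of_deviation_products(sizes, rents)
if total > 0:
    direction = "positive"
elif total < 0:
    direction = "negative"
else:
    direction = "none"

print("Direction of association:", direction)
```

A scatterplot is still needed to judge whether the pattern is linear; this check only speaks to direction.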

Linear versus curvilinear

• Linear relationship: a relationship that, on average, will follow a line
• Curvilinear or nonlinear relationship: a relationship that, on average, will follow a curve

Association: a term used to describe the direction of the pattern shown by the two variables.

• A positive association occurs when the values of one variable tend to increase as the values of the other variable increase.
• A negative association occurs when the values of one variable tend to decrease as the values of the other variable increase.

Outliers

• When we consider two variables, an outlier is a point with an unusual combination of values.
• May be unusual and interesting data points, or may be errors.

Example – Tornado Activity

Variables:
• year
• number of tornadoes (Jan – May)

Source: National Weather Service

Unusually high observations that don’t follow the trend of the other observations

Formalize the trend: Regression lines

[Figure: Fitted Line Plot of Weight (y-axis) versus Height (x-axis). Weight = -195.9 + 5.175 Height; S = 24.3673, R-Sq = 43.0%, R-Sq(adj) = 43.0%]

• Regression line: a straight line that describes how values of the response variable (y) are related, on average, to values of the explanatory variable (x).
• We can use the regression line to…
  – Estimate the average value of y at a specified value of x
  – Predict the unknown value of y for an individual using that individual’s x value.
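Both uses come down to plugging an x value into the fitted equation. Here is a minimal sketch using the Weight versus Height line from the fitted line plot; the function name is hypothetical, and the height of 70 is an arbitrary value chosen from inside the plotted range of 50 to 85 (the slides do not state the units).

```python
def predicted_weight(height: float) -> float:
    """Evaluate the fitted line Weight = -195.9 + 5.175 * Height."""
    return -195.9 + 5.175 * height


# The same number serves as the estimated average weight at this height
# and as the predicted weight for one individual with this height.
print(round(predicted_weight(70), 2))
```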

Specify Linear Relationships with Simple Linear Regression Model

Regression:
• used to find the best straight line to fit the data points

Name of Procedure: Least Squares

Least Squares Model:
• the line with the smallest sum of the squared differences (observed y minus predicted y) among all possible lines
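To make "smallest sum of squared differences" concrete, the sketch below fits a line with the standard closed-form least squares formulas and compares its sum of squared differences with that of a few nearby lines. The formulas are standard results rather than something shown on the slides, the five data points are made up for illustration, and the function names are hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # y-intercept
    return b0, b1


def sum_squared_differences(xs, ys, b0, b1):
    """Sum of (observed y - predicted y)^2 for the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not from the lecture.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
best = sum_squared_differences(xs, ys, b0, b1)
print(f"least squares line: y-hat = {b0:.2f} + {b1:.2f} x, sum of squares = {best:.2f}")

# Any other line, such as these nearby ones, gives a larger sum.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    other = sum_squared_differences(xs, ys, b0 + d0, b1 + d1)
    print(f"b0 = {b0 + d0:.2f}, b1 = {b1 + d1:.2f}: sum of squares = {other:.2f}")
```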

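The regression equation

In statistics: $\hat{y} = b_0 + b_1 x$
• $\hat{y}$: average value of y
• $b_0$: y-intercept
• $b_1$: slope

In math: [equation missing in source]

In a picture: [Figure: the regression line $\hat{y} = b_0 + b_1 x$]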

Example: Positive linear relationship between meal bill ($) and amount of tip ($)

r = 0.830, n = 10 bills (data from a restaurant)

Example: Tip example

Question: How can we use the amount of the bill ($) to estimate the amount of tip left ($), on average?

Identify the Variables:
• Bill ($): explanatory
• Tip ($): response
• Note: the explanatory variable is also called the predictor variable

To fit a regression line in Minitab: Stat > Regression > Fitted Line Plot
• Correctly identify the explanatory variable and the response variable
• Straight line: simple linear regression

Least Squares Regression Equation

The regression equation is Tip = -0.60 + 0.190 Bill
• sample y-intercept (b0): -0.60
• sample slope (b1): 0.190

Slope Interpretation

Tip = -0.60 + 0.19 Bill

b1 = $0.19
• For each additional $1 found on the bill, you can expect the tip to increase by 19 cents, on the average
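The slope interpretation can be verified by evaluating the fitted equation at two bills that differ by $1. This is a small sketch; the names `B0`, `B1`, and `predicted_tip` are hypothetical, and the starting bills are arbitrary values inside the $15 to $45 range of the sample.

```python
B0 = -0.60  # sample y-intercept from the fitted equation
B1 = 0.19   # sample slope from the fitted equation


def predicted_tip(bill: float) -> float:
    """Evaluate Tip = -0.60 + 0.19 * Bill."""
    return B0 + B1 * bill


# Whatever the starting bill, one more dollar changes the predicted tip
# by the slope, because the intercept cancels in the difference.
for bill in (20, 30, 40):
    change = predicted_tip(bill + 1) - predicted_tip(bill)
    print(f"bill ${bill} -> ${bill + 1}: predicted tip changes by ${change:.2f}")
```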

Y-intercept Interpretation

Tip = -0.60 + 0.19 Bill

b0 = -$0.60
In theory it says: When you have no bill, you can expect a tip to be -$0.60

• So does the y-intercept have a logical interpretation in the context of this problem?
No: we have no data for bill = 0

Estimation & Limitations

Question: If the bill is $30, what is the estimated average amount left for a tip?

Tip = -0.60 + 0.19 Bill

Can: Estimate within the range of $15 to $45

Tip = -0.60 + 0.19×(30)

Tip = $5.10

Note: Bill = $30 is not an actual observation in the sample

Example 5B: Estimation & Limitations

Question: If the bill is $70, estimate the average amount left for a tip.

x = $70

Tip = -0.60 + 0.19 × Bill

Can’t: Extrapolate outside the range of $15 to $45
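One way to make the "no extrapolation" rule hard to forget is to build the sample range into the estimating function. This is a sketch under the assumption that the sampled bills span $15 to $45, as stated on the slides; the names `BILL_RANGE` and `estimate_tip` are hypothetical, and refusing with an error is a design choice, not something the lecture prescribes.

```python
BILL_RANGE = (15.0, 45.0)  # range of bills observed in the sample


def estimate_tip(bill: float) -> float:
    """Estimate the average tip, refusing to extrapolate beyond the sample."""
    low, high = BILL_RANGE
    if not low <= bill <= high:
        raise ValueError(
            f"bill ${bill:.2f} is outside the sample range "
            f"${low:.2f} to ${high:.2f}; estimating here would be extrapolation"
        )
    return -0.60 + 0.19 * bill


print(f"Bill $30: estimated average tip ${estimate_tip(30):.2f}")

try:
    estimate_tip(70)
except ValueError as err:
    print("Bill $70:", err)
```

The $30 bill is fine even though it is not an actual observation, because it lies inside the range; the $70 bill is rejected because the sample says nothing about how tips behave for bills that large.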

To remember about regression equations:

Y-intercept: logical interpretation:
• restricted to data where 0 is in the range of data in the sample

No Extrapolation:
• don’t use a regression equation to estimate a value for the response variable outside the range of x values

Estimation:
• regression equation estimates the average value for y at a given value of x.

Review: If you understood today’s lecture, you should be able to solve

• 3.1, 3.3, 3.5, 3.13, 3.15, 3.19, 3.21

Recall Objectives:

• Define z-scores and relate them to the empirical (68-95-99.7) rule
• Explore scatterplots as a tool for visualizing two quantitative variables
• Familiarize yourselves with least squares regression lines:
  – slope interpretation
  – y-intercept interpretation
  – dangerous to extrapolate
