STATISTICS 200 Lecture #5 Tuesday, September 6, 2016 Textbook: Sections 2.7 through 3.2
Objectives:
• Define z-scores and relate them to the empirical (68-95-99.7) rule
• Explore scatterplots as a tool for visualizing two quantitative variables
• Familiarize yourselves with least squares regression lines:
– slope interpretation
– y-intercept interpretation
– dangerous to extrapolate
Standardized z-scores
• Tells us how many standard deviations an observation is from the mean.
• A useful measure of the relative value of any observation in a dataset
• Allows comparison of observations in different data sets.
Standardized z-scores
• Z-scores correspond directly to the Empirical Rule.
• About 68% of values have z-scores between –1 and 1.
• About 95% of values have z-scores between –2 and 2.
• About 99.7% of values have z-scores between –3 and 3.
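As a quick check of the three percentages in the Empirical Rule, the sketch below computes the share of a normal distribution whose z-scores fall within 1, 2, and 3 standard deviations of the mean. It assumes the data are exactly normal; the function name `share_within` is illustrative and not from the lecture.

```python
import math

def share_within(k):
    """Proportion of a normal distribution with z-scores between -k and k."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"z-scores between -{k} and {k}: {share_within(k):.1%}")
```

Real data are only approximately normal, so observed percentages will usually differ a little from these values.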
Example 1
• What is the z-score and interpretation in the following situation?
• Obs = 3, mean = 4, SD = 0.5
Z-score = (observation – mean)/SD
= (3 – 4) / 0.5
= –1 / 0.5
= –2
Interpretation: The observation of 3 is 2 standard deviations below the mean.
Example 2
• What is the z-score and interpretation in the following situation?
• Obs = 200, mean = 150, SD = 20
Z-score = (observation – mean)/SD
= (200 – 150) / 20
= 50 / 20
= 2.5
Interpretation: The observation 200 is 2.5 standard deviations above the mean.
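The two worked examples can be reproduced with a few lines of Python. This is a minimal sketch of the formula z = (observation – mean)/SD; the function name `z_score` is our own choice.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

example_1 = z_score(3, 4, 0.5)       # Example 1: below the mean, so negative
example_2 = z_score(200, 150, 20)    # Example 2: above the mean, so positive
print(example_1, example_2)          # -2.0 2.5
```

The sign gives the direction (below or above the mean) and the size gives the distance in standard deviations.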
More complicated example: which person has a more unusual height?
Me: a 53” tall woman
• Women’s heights are normal with mean 54” and std. dev. 3”.
My husband: a 73” tall man
• Men’s heights are normal with mean 70” and std. dev. 3”
These heights come from different distributions, so we cannot compare them directly. We need a tool to make them comparable…
Z-score!
Calculate z-scores for both:
• Me: Z-score = (obs – mean) / (std. dev.) = (53 – 54) / 3 = –1/3 = –0.33
• Husband: Z-score = (obs – mean) / (std. dev.) = (73 – 70) / 3 = 3 / 3 = 1
Compare z-scores:
• Me: 0.33 std. dev. below the mean
• Husband: 1 std. dev. above the mean
Conclusion:
My husband’s height is more unusual than mine, because it is more standard deviations from the mean.
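The height comparison can be sketched in code as follows. It uses the means and standard deviations given on the slide and treats "more unusual" as "larger absolute z-score", which is how the conclusion above reads; the variable names are illustrative.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

z_me = z_score(53, 54, 3)        # woman: 53" against mean 54", SD 3"
z_husband = z_score(73, 70, 3)   # man: 73" against mean 70", SD 3"

# The height farther from its own mean, in standard deviations, is more unusual.
more_unusual = "husband" if abs(z_husband) > abs(z_me) else "me"
print(round(z_me, 2), round(z_husband, 2), more_unusual)
```

Comparing the raw heights (53" and 73") would not answer the question, because each height has to be judged against its own distribution.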
So far…
• We have talked about quantitative variables, but only one at a time.
• Now we’re going to begin looking at the relationships between two different quantitative variables.
• Start by looking at a scatterplot.
Scatterplots:
• A scatterplot is a two-dimensional graph of two numeric variables.
• There are two axes on a scatterplot, the vertical axis (y-axis) and the horizontal axis (x-axis).
– The y-axis is assigned to the response variable.
– The x-axis is assigned to the explanatory variable.
Example 1: Apartment size and rent
Two variables:
• size of one-bedroom apartment (square feet)
• monthly rent ($)
Size (square ft): 415 438 485 636 548 666 646 545 690 688 538 469
Rent ($): 1000 833 1003 1089 1150 1181 1237 1225 1469 1501 1177 958
What is the average pattern? What is the direction of the pattern?
• A positive, linear association
[Scatterplot: x-axis is the explanatory / independent / x variable; y-axis is the response / dependent / y variable]
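One way to check the direction of the pattern numerically is the sign of the correlation coefficient r. The sketch below assumes that the i-th size listed on the slide belongs with the i-th rent; that pairing is our reading of the flattened table, not something the slide states explicitly.

```python
import math

# Assumption: values are paired by position, in the order listed on the slide.
size_sqft = [415, 438, 485, 636, 548, 666, 646, 545, 690, 688, 538, 469]
rent_usd = [1000, 833, 1003, 1089, 1150, 1181, 1237, 1225, 1469, 1501, 1177, 958]

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = correlation(size_sqft, rent_usd)
direction = "positive" if r > 0 else "negative"
print(direction, round(r, 2))
```

A positive r agrees with the scatterplot reading: larger apartments tend to have higher rent.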
Linear versus curvilinear
• Linear relationship: a relationship that, on average, will follow a line
• Curvilinear or nonlinear relationship: a relationship that, on average, will follow a curve
Association: a term used to describe the direction of the pattern shown by the two variables.
• A positive association occurs when the values of one variable tend to increase as the values of the other variable increase.
• A negative association occurs when the values of one variable tend to decrease as the values of the other variable increase.
Outliers
• When we consider two variables, an outlier is a point with an unusual combination of values.
• May be unusual and interesting data points, or may be errors.
Example – Tornado Activity
Variables:
• year
• number of tornadoes (Jan – May)
Source: National Weather Service
Unusually high observations that don’t follow the trend of the other observations
Formalize the trend: Regression lines
[Figure: Fitted line plot of Weight (y-axis) versus Height (x-axis). Weight = –195.9 + 5.175 Height; S = 24.3673, R-Sq = 43.0%, R-Sq(adj) = 43.0%]
• Regression line: a straight line that describes how values of the response variable (y) are related, on average, to values of the explanatory variable (x).
• We can use the regression line to…
– Estimate the average value of y at a specified value of x
– Predict the unknown value of y for an individual using that individual’s x value.
Specify Linear Relationships with Simple Linear Regression Model
Regression:
• used to find the best straight line to fit the data points
Name of Procedure: Least Squares
Least Squares Model:
• the line with the smallest sum of the squared differences among all possible lines
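To make "smallest sum of the squared differences" concrete, the sketch below fits a line with the standard least squares formulas and then checks that a few nearby lines all give a larger sum. The five data points are made up for illustration and do not come from the lecture.

```python
# Hypothetical data, invented for this illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

def sum_squared_differences(b0, b1):
    """Sum of squared vertical differences between the points and the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Standard least squares formulas for the slope and y-intercept.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x

best = sum_squared_differences(b0, b1)
# Try a handful of other lines by nudging the intercept and slope.
others = [
    sum_squared_differences(b0 + d0, b1 + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.2, 0.0, 0.2)
    if (d0, d1) != (0.0, 0.0)
]
print(b0, b1, best < min(others))
```

Statistical software such as Minitab performs the same calculation when it fits a line.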
The regression equation
In statistics: ŷ = b0 + b1·x
• ŷ: average value of y
• b0: y-intercept
• b1: slope
In a picture: [Figure: the line ŷ = b0 + b1·x]
Example: Positive linear relationship between meal bill ($) and amount of tip ($)
• r = 0.830 and n = 10 bills
• data from a restaurant
Example: Tip example
Question: Use the amount of the bill ($) to estimate the amount of tip left ($), on average.
Identify the variables:
• Bill ($): explanatory
• Tip ($): response
• Note: the explanatory variable is also called the predictor variable
To fit a regression line in Minitab: Stat > Regression > Fitted Line Plot
• correctly identify the explanatory variable and the response variable
• straight line: simple linear regression
Least Squares Regression Equation
The regression equation is Tip = -0.60 + 0.190 Bill
• sample y-intercept (b0) = -0.60
• sample slope (b1) = 0.190
Slope Interpretation
b1 = $0.19
• For each additional $1 found on the bill, you can expect the tip to increase by 19 cents, on average.
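The slope interpretation can be checked by comparing the predicted tips for two bills that differ by exactly $1. This is a small sketch using the fitted equation from the slide; the function name `predicted_tip` and the bill amounts $30 and $31 are our own choices.

```python
def predicted_tip(bill):
    """Average tip predicted by the fitted line Tip = -0.60 + 0.19 * Bill."""
    return -0.60 + 0.19 * bill

# Two bills that differ by exactly one dollar.
change = predicted_tip(31) - predicted_tip(30)
print(round(change, 2))  # 0.19
```

Any two bills that are $1 apart give the same difference, which is what the slope means for a straight line.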
Y-intercept Interpretation
Tip = -0.60 + 0.19 Bill
b0 = -$0.60
In theory it says: when you have no bill, you can expect the tip to be -$0.60.
• So does the y-intercept have a logical interpretation in the context of this problem?
No: we have no data for bill = 0
Estimation & Limitations
Question: If the bill is $30, estimate the average amount left for a tip.
Can: Estimate within the range of $15 to $45
Tip = -0.60 + 0.19 Bill
Tip = -0.60 + 0.19×(30)
Tip = $5.10
Note: Bill = $30 is not an actual observation in the sample
Example 5B: Estimation & Limitations
Question: If the bill is $70, estimate the average amount left for a tip.
x = $70
Tip = -0.60 + 0.19 × Bill
Can’t: Extrapolate outside the range of $15 to $45
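The difference between estimating and extrapolating can be written as a guard in code. The sketch below uses the fitted equation and the $15 to $45 range of bills from the slides; the function name `estimate_tip` and the choice to raise an error are illustrative, not part of the lecture.

```python
BILL_MIN, BILL_MAX = 15.0, 45.0  # range of bills in the sample, per the slides

def estimate_tip(bill):
    """Estimate the average tip, refusing to extrapolate beyond the sampled bills."""
    if not BILL_MIN <= bill <= BILL_MAX:
        raise ValueError(
            f"bill ${bill:.2f} is outside ${BILL_MIN:.0f} to ${BILL_MAX:.0f}: do not extrapolate"
        )
    return -0.60 + 0.19 * bill

print(f"${estimate_tip(30):.2f}")  # inside the range, so estimating is fine
try:
    estimate_tip(70)               # outside the range
except ValueError as err:
    print(err)
```

The equation would still produce a number for a $70 bill, but nothing in the data tells us the line keeps the same shape that far out.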
To remember about regression equations:
Y-intercept:
• logical interpretation is restricted to data where 0 is in the range of data in the sample
No Extrapolation:
• don’t use a regression equation to estimate a value for the response variable outside the range of x values
Estimation:
• regression equation estimates the average value for y at a given value of x.
Review: If you understood today’s lecture, you should be able to solve
• 3.1, 3.3, 3.5, 3.13, 3.15, 3.19, 3.21
Recall Objectives:
• Define z-scores and relate them to the empirical (68-95-99.7) rule
• Explore scatterplots as a tool for visualizing two quantitative variables
• Familiarize yourselves with least squares regression lines:
– slope interpretation
– y-intercept interpretation
– dangerous to extrapolate