Lean Six Sigma Black Belt: IMPROVE
Chapter 1-6
Regression and Correlation
1-6-1 © 2014 Institute of Industrial Engineers
Linear Regression and Correlation Analysis
It is not enough to know what affects a process; we also need to know how it affects the process.
Correlation establishes whether a factor has an effect; regression establishes how.
Topics
1
• Linear Regression
• Correlation
2
• Curvilinear Regression
• Multiple Linear Regression
Linear Regression
The regression equation is determined mathematically from data collected on a process.
The regression equation predicts a value for the dependent variable, y, from the independent variable, x.
Least Squares Regression Model
Linear Regression
If there is a correlation, the equation for the linear relationship can be determined from the data:
$$y = b_0 + b_1 x$$
In the equation above, b0 is the intercept and b1 is the slope.
– The intercept is where the line crosses the y axis.
– The slope is the change in y divided by the change in x.
The values of b0 and b1 are calculated from the normal equations.
Normal Equations
• Determine slope (b1) and intercept (b0)
• Developed from data
• Solved simultaneously
$$\sum y = n b_0 + b_1 \sum x$$
$$\sum xy = b_0 \sum x + b_1 \sum x^2$$
Slope and Intercept Equations
• Determine slope (b1) and intercept (b0)
• Developed from data
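$$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$
$$b_0 = \frac{\sum y}{n} - b_1 \frac{\sum x}{n}$$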
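The slope and intercept equations can be sketched in a few lines of Python. This is a minimal illustration, not part of the original course material; the function name `least_squares_line` and the four data points are made up for the sketch.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for y = b0 + b1*x from the slope and intercept equations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denom   # slope
    b0 = sum_y / n - b1 * sum_x / n             # intercept
    return b0, b1


# Illustrative points lying exactly on y = 1 + 2x (invented for this sketch).
b0, b1 = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because the illustrative points lie exactly on a line, the function should recover that line's intercept and slope; with real process data the result is the best-fitting line in the least squares sense.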
Regression Study
• Collect Data.
• Determine independent and dependent variables.
• Graph the data in a scatter diagram to check whether the points follow a straight line rather than an obvious curve.
• Proceed to the analysis if the data are linear.
• Consider transforming the data if they are not.
• Always be aware of outliers.
Example
It is thought that abrasion loss in microns over time is a function of Rockwell hardness. Eight samples were taken. The data is shown in the table on the next page. Note that abrasion loss is the dependent variable.
Data for Example
X (Rockwell hardness)    Y (abrasion loss, microns)
60 251
62 245
63 246
67 233
70 221
74 202
79 188
81 170
Data Set 1-6-1
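As a cross-check on a spreadsheet result, the slope and intercept equations can be applied directly to Data Set 1-6-1. The sketch below is an illustration only; the variable names are assumptions, and the printed coefficients are whatever the formulas give for the eight samples.

```python
# Data Set 1-6-1: Rockwell hardness (x) and abrasion loss in microns (y).
hardness = [60, 62, 63, 67, 70, 74, 79, 81]
abrasion = [251, 245, 246, 233, 221, 202, 188, 170]

n = len(hardness)
sum_x = sum(hardness)
sum_y = sum(abrasion)
sum_xy = sum(x * y for x, y in zip(hardness, abrasion))
sum_x2 = sum(x * x for x in hardness)

# Slope and intercept from the closed-form least squares equations.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(f"abrasion loss = {b0:.2f} + ({b1:.3f}) * hardness")
```

The negative slope says that harder samples lose less material, which agrees with the downward trend visible in the table.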
Explanation
• R square is the coefficient of determination. It is the proportion of the variation in y that is explained by the regression equation.
• Adjusted R square adjusts R square for the sample size and the number of independent variables.
• Standard error is a measure of the variability of the observed values about the regression line.
$$\text{Adjusted } R^2 = 1 - \frac{(n-1)}{[n-(k+1)]}\,(1 - r^2)$$
Where n is the sample size and k is the number of independent variables.
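The adjusted R square formula is easy to check by hand or in code. The sketch below is a minimal illustration; the function name is an assumption, and the inputs are the R Square values, the 27 observations, and the 5 and 4 independent variables reported in the Excel summary outputs for the resort demand example later in this chapter.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - (k + 1)) * (1 - R^2)."""
    if n - (k + 1) <= 0:
        raise ValueError("need more observations than fitted coefficients")
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r_square)


# Values taken from the two Excel summary outputs later in the chapter.
adj_all = adjusted_r_square(0.893915081, n=27, k=5)      # all five variables
adj_reduced = adjusted_r_square(0.89111144, n=27, k=4)   # Rooms Occupied dropped

print(round(adj_all, 6), round(adj_reduced, 6))
```

Both results should agree with the Adjusted R Square lines in those outputs, which shows how removing a variable can raise adjusted R square even though R square itself falls slightly.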
Example
X Y
14 5
15 4
16 3
17 2
18 4
14 5
16 4
17 2
18 4
16 3
14 5
15 3
15 4
16 4
Data Set 1-6-5
Using Regression Equations
Make sure there is a cause-effect relationship between the dependent and independent variables.
If there is no significant linear correlation don’t use the regression equation to make predictions.
When using the regression equation for predictions stay within the scope of the available sample data.
A regression equation based on old data is not necessarily valid now.
Don’t make predictions about a population that is different from the population from which the sample data were drawn.
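One way to enforce the "stay within the scope of the sample data" guideline is to refuse predictions outside the observed x range. The sketch below is only an illustration: the function name is hypothetical, and the coefficients 480.2 and -3.75 are approximate least squares values for the abrasion example (Data Set 1-6-1), rounded for readability.

```python
def predict_within_range(x_new, xs, b0, b1):
    """Predict y = b0 + b1*x, but only inside the range of the sampled x values."""
    lowest, highest = min(xs), max(xs)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the sampled range [{lowest}, {highest}]"
        )
    return b0 + b1 * x_new


# Hardness values sampled in Data Set 1-6-1.
hardness = [60, 62, 63, 67, 70, 74, 79, 81]

# Approximate coefficients for the abrasion example (assumed, rounded).
print(predict_within_range(70, hardness, 480.2, -3.75))
```

A request for a hardness of 100 would raise an error instead of silently extrapolating beyond the eight samples.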
Outliers
• In a scatter plot an outlier is a point lying far away from the other data points.
• When an outlier is noted, it should be investigated. If there is an identifiable special cause of variation, the point may be discarded. (We know why it was different.)
Confidence
• The regression equation gives the best estimate of the predicted value.
• A confidence interval can be determined for the true value using the equation shown.
• The value of Se is given by Excel®.
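Calling $\hat{y}$ the predicted value and y the true value:
$$\hat{y} - E < y < \hat{y} + E$$
Where
$$E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{n\,(x_0 - \bar{x})^2}{n\left(\sum x^2\right) - \left(\sum x\right)^2}}$$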
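The interval can be sketched in Python for the abrasion example (Data Set 1-6-1). This is an illustration under stated assumptions: the function name is hypothetical, the standard error s_e is computed here as the square root of the residual sum of squares divided by n - 2 rather than read from Excel, and the critical value 2.447 is the two-sided 95 percent t value for 6 degrees of freedom taken from a t table.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, y_hat, upper) for an individual y at x0."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2

    # Least squares slope and intercept.
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = sum_y / n - b1 * sum_x / n

    # Standard error of the estimate: sqrt(SSE / (n - 2)).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    x_bar = sum_x / n
    margin = t_crit * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / denom)
    y_hat = b0 + b1 * x0
    return y_hat - margin, y_hat, y_hat + margin


hardness = [60, 62, 63, 67, 70, 74, 79, 81]
abrasion = [251, 245, 246, 233, 221, 202, 188, 170]

# 95 percent interval at a hardness of 70; t value for n - 2 = 6 degrees of freedom.
low, y_hat, high = prediction_interval(hardness, abrasion, 70, 2.447)
print(f"{low:.1f} < y < {high:.1f} (predicted {y_hat:.1f})")
```

The margin E grows as x0 moves away from the mean of x, so an interval at a hardness of 81 is wider than one near the center of the data.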
Interval Estimates for Different Values of x
[Figure: plot of y against x showing the prediction interval for an individual y, given xp, and the confidence interval for the mean of y, given xp; xp and the mean of x are marked on the x axis.]
Correlation Definition
• The coefficient of correlation, r, measures the strength of the relationship between two variables.
• High correlation indicates a strong relationship.
• High correlation does not indicate a cause-effect relationship.
Correlation
A correlation exists between two variables when one of them is related to the other in some way.
Correlation Values
r = +1 means a perfect direct relationship
r = -1 means a perfect inverse relationship
r = 0 means no linear relationship
Coefficient of Determination
• Indicates the proportion of the variation of y which is accounted for by x
• Calculation: r²
Positive Correlation
[Figure: scatter plot of Tip Amount (vertical axis, 0 to 16) against Meal Amount (horizontal axis, 10 to 90).]
Properties of Linear Correlation Coefficient, r
1) The value is always between –1 and 1.
2) The value of r does not change if all values of either variable are converted to a different scale.
3) The value of r is not affected by which variable is labeled x and which is labeled y.
4) It measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
Linear Correlation Coefficient
The linear correlation coefficient r measures the strength of the linear relationship between the paired x and y values in a sample. It is calculated as shown below.
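$$r = \frac{S_{xy}}{\sqrt{(S_{xx})(S_{yy})}}$$
$$S_{xy} = n \sum xy - \left(\sum x\right)\left(\sum y\right)$$
$$S_{xx} = n \sum x^2 - \left(\sum x\right)^2$$
$$S_{yy} = n \sum y^2 - \left(\sum y\right)^2$$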
The actual calculation is performed using these calculating “devices.”
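The calculating devices translate directly into code. The sketch below is a minimal illustration (the function name is an assumption) applied to the 14 pairs of Data Set 1-6-5 from this chapter.

```python
import math


def correlation(xs, ys):
    """Linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    s_xx = n * sum(x * x for x in xs) - sum_x ** 2
    s_yy = n * sum(y * y for y in ys) - sum_y ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


# Data Set 1-6-5
x = [14, 15, 16, 17, 18, 14, 16, 17, 18, 16, 14, 15, 15, 16]
y = [5, 4, 3, 2, 4, 5, 4, 2, 4, 3, 5, 3, 4, 4]

r = correlation(x, y)
print(round(r, 2))  # -0.56
```

Swapping the two lists gives the same r, which illustrates the property that the coefficient does not depend on which variable is called x.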
Excel ®
Example
Determine the coefficient of correlation between x and y in the table on the next page.
Example
X Y
14 5
15 4
16 3
17 2
18 4
14 5
16 4
17 2
18 4
16 3
14 5
15 3
15 4
16 4
Data Set 1-6-5
Calculating r
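$$r = \frac{S_{xy}}{\sqrt{(S_{xx})(S_{yy})}} = \frac{-9.86}{\sqrt{(24.36)(12.86)}} = -.56$$
(The values shown are Sxy, Sxx, and Syy each divided by n = 14; the common factor cancels, so r is unchanged.)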
Significance of Coefficient of Correlation
In order to answer whether or not the value of r that is calculated is significant, a test of hypothesis must be performed.
$$H_0: r = 0$$
$$H_1: r \neq 0$$
$$z = \frac{\sqrt{n-3}}{2}\,\ln\!\left(\frac{1+r}{1-r}\right)$$
Example
• Refer back to the data that gave us an r of -.56 based on 14 pairs of data.
• Can we say, with 95 percent confidence, that the r is significant?
Solution Methodology
State Hypothesis
Identify Test Statistic
Specify Confidence
Calculate Test Statistic
Identify Table Value
Compare Test and Table Statistics
Calculating r
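• Hypothesis:
$$H_0: r = 0$$
$$H_1: r \neq 0$$
• Test Statistic:
$$z = \frac{\sqrt{n-3}}{2}\,\ln\!\left(\frac{1+r}{1-r}\right)$$
$$z = \frac{\sqrt{14-3}}{2}\,\ln\!\left(\frac{1-.56}{1+.56}\right)$$
$$z = -2.10, \quad |z| = 2.10$$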
Calculating r
• 95% Table Z Value: ±1.96
• Comparison and Conclusion:
Since the magnitude of the calculated z (2.10) is greater than 1.96, the test rejects H0 in favor of H1. Therefore, r is significantly different from zero and the correlation is significant.
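The test statistic and the comparison can be reproduced with a short sketch. The function name is an assumption; the inputs are the r of -.56 and the 14 pairs from the example, and 1.96 is the two-sided 95 percent table value quoted above.

```python
import math


def correlation_z_statistic(r, n):
    """z = sqrt(n - 3) / 2 * ln((1 + r) / (1 - r))."""
    if n <= 3:
        raise ValueError("need more than three pairs of data")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.sqrt(n - 3) / 2 * math.log((1 + r) / (1 - r))


z = correlation_z_statistic(-0.56, 14)
significant = abs(z) > 1.96   # two-sided test at 95 percent confidence

print(round(z, 2), significant)
```

The sign of z follows the sign of r; only its magnitude is compared with the table value in a two-sided test.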
Example
An analyst observes a kitting operation and collects data on package volume and time required per unit for the operation. Determine if the correlation is significant at the .01 level. The data is on the next page.
Data for Example
Operation Time/Unit    Kit Volume
1.42 21.8
0.75 11.1
0.82 13.5
1.2 19.4
0.64 11.9
1.12 18.1
1.08 15.4
0.49 8.6
1.05 14.4
0.99 11.8
0.58 11
1.25 17.1
Data Set 1-6-11
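The same two steps (compute r, then test it) can be sketched for the kitting data. This is an illustration, not the worked solution from the course; the function names are assumptions, and 2.576 is the two-sided table z value for the .01 level.

```python
import math


def correlation(xs, ys):
    """Linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    s_xx = n * sum(x * x for x in xs) - sum_x ** 2
    s_yy = n * sum(y * y for y in ys) - sum_y ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_z_statistic(r, n):
    """z = sqrt(n - 3) / 2 * ln((1 + r) / (1 - r))."""
    return math.sqrt(n - 3) / 2 * math.log((1 + r) / (1 - r))


# Data Set 1-6-11: operation time per unit and kit volume.
time_per_unit = [1.42, 0.75, 0.82, 1.2, 0.64, 1.12, 1.08, 0.49, 1.05, 0.99, 0.58, 1.25]
kit_volume = [21.8, 11.1, 13.5, 19.4, 11.9, 18.1, 15.4, 8.6, 14.4, 11.8, 11, 17.1]

r = correlation(kit_volume, time_per_unit)
z = correlation_z_statistic(r, len(kit_volume))
significant = abs(z) > 2.576   # two-sided test at the .01 level

print(f"r = {r:.3f}, z = {z:.2f}, significant at .01: {significant}")
```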
Curvilinear Regression
• Determines the relationship between one dependent and one independent variable when the relationship is not linear
• Transform data
• Proceed as if linear
• High correlation does not necessarily imply a cause-effect relationship
Typical Curvilinear Models
Curvilinear Regression Normal Equations
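$$y = b_0 + b_1 x + b_2 x^2$$
$$\sum y = n b_0 + b_1 \sum x + b_2 \sum x^2$$
$$\sum xy = b_0 \sum x + b_1 \sum x^2 + b_2 \sum x^3$$
$$\sum x^2 y = b_0 \sum x^2 + b_1 \sum x^3 + b_2 \sum x^4$$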
Example - Curvilinear
X Y
5 26
4 17
3 8
2 5
4 15
5 23
1 1
2 3
4 17
6 58
Totals: 36 173
Data Set 1-6-3
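A second-degree model y = b0 + b1 x + b2 x² can be fitted to Data Set 1-6-3 by building the three curvilinear normal equations and solving them simultaneously. The sketch below is one possible approach, assuming plain Gaussian elimination as the solver; the function names are illustrative and not from the course material.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Return [b0, b1, b2] from the curvilinear normal equations."""
    n = len(xs)
    sx = sum(xs)
    sx2 = sum(x ** 2 for x in xs)
    sx3 = sum(x ** 3 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    a = [[n, sx, sx2], [sx, sx2, sx3], [sx2, sx3, sx4]]
    return solve_linear_system(a, [sy, sxy, sx2y])


# Data Set 1-6-3
x = [5, 4, 3, 2, 4, 5, 1, 2, 4, 6]
y = [26, 17, 8, 5, 15, 23, 1, 3, 17, 58]

b0, b1, b2 = fit_quadratic(x, y)
print(f"y = {b0:.3f} + {b1:.3f} x + {b2:.3f} x^2")
```

Whether a second-degree polynomial is the right curvilinear model for this data set is a separate question; a transformation followed by a linear fit may serve equally well.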
Practice
Y X
12 7.143752
20 7.320564
32 7.500000
26 7.418645
37 7.558924
40 7.591279
25 7.403654
46 7.650560
12 7.143752
42 7.611786
67 7.818542
51 7.695402
22 7.355601
28 7.447294
10 7.084893
38 7.569935
3 6.745731
20 7.320564
41 7.601632
50 7.686724
Data Set 1-6-7
Multiple Linear Regression
• Determines the relationship between one dependent and two or more independent variables
• Methodology is similar
• Best done using appropriate statistical software (for example, Excel)
Multiple Linear Regression Normal Equations
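$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$
$$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2 + b_3 \sum x_3$$
$$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + b_3 \sum x_1 x_3$$
$$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + b_3 \sum x_2 x_3$$
$$\sum x_3 y = b_0 \sum x_3 + b_1 \sum x_1 x_3 + b_2 \sum x_2 x_3 + b_3 \sum x_3^2$$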
Graph of a Two-Variable Model
• Three dimensions: Y, X1, X2
$$Y = b_0 + b_1 X_1 + b_2 X_2$$
Example
Downtime/ Month Machine Speed Machine Age
10 420 1.0
20 400 2.0
30 300 2.7
42 250 4.1
9 520 1.2
25 300 2.5
19 300 1.9
41 240 5.0
22 320 2.1
12 375 1.1
11 450 2.0
24 400 4.0
33 300 5.0
52 200 8.0
8 520 2.4
Data Set 1-6-4
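For the downtime example, the multiple regression normal equations can be assembled for any number of independent variables and solved simultaneously. The sketch below is a minimal illustration, assuming Gaussian elimination as the solver in place of the spreadsheet tool the course recommends; the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """rows holds one tuple of independent values per observation.

    Returns [b0, b1, ..., bk] from the normal equations.
    """
    design = [[1.0] + list(row) for row in rows]     # leading 1 for the intercept
    p = len(design[0])
    # Left side: sums such as n, sum(x1), sum(x1*x2), sum(x2**2), ...
    lhs = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    # Right side: sum(y), sum(x1*y), sum(x2*y), ...
    rhs = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(lhs, rhs)


# Data Set 1-6-4
downtime = [10, 20, 30, 42, 9, 25, 19, 41, 22, 12, 11, 24, 33, 52, 8]
speed = [420, 400, 300, 250, 520, 300, 300, 240, 320, 375, 450, 400, 300, 200, 520]
age = [1.0, 2.0, 2.7, 4.1, 1.2, 2.5, 1.9, 5.0, 2.1, 1.1, 2.0, 4.0, 5.0, 8.0, 2.4]

b0, b1, b2 = fit_multiple_regression(list(zip(speed, age)), downtime)
print(f"downtime = {b0:.3f} + ({b1:.4f}) speed + ({b2:.3f}) age")
```

The coefficients should match a spreadsheet regression on the same columns up to rounding, which makes the sketch useful as an independent check.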
Improving the Regression
Evaluate the correlation of the independent variables
Example
Correlation matrix (lower triangle; columns follow the same order as the rows)
Variable | Sum of Demand through 11:00am | Resort Population (Day of) | Resort Population (Day before) | Room Departures (Check-outs) | Room Arrivals (Check-Ins) | Rooms Occupied
Sum of Demand through 11:00am | 1
Resort Population (Day of) | 0.801123054 | 1
Resort Population (Day before) | 0.790690249 | 0.987313682 | 1
Room Departures (Check-outs) | -0.086433627 | 0.382976451 | 0.430612696 | 1
Room Arrivals (Check-Ins) | 0.030083536 | 0.520779805 | 0.538351856 | 0.744181425 | 1
Rooms Occupied | 0.797394506 | 0.978004138 | 0.961164075 | 0.282206191 | 0.516381616 | 1
A high correlation between two independent variables indicates collinearity. Collinearity is a situation where there is a near-perfect linear relationship among some or all of the independent variables in a regression model. In practical terms, this means there is some degree of redundancy or overlap among the variables. Using both variables of a highly correlated pair in the multiple regression equation may dilute the overall results of the regression equation.
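Screening for collinearity amounts to scanning the correlations among the independent variables for large magnitudes. The sketch below uses the correlations from the matrix in this example, rounded to three decimals; the 0.9 cutoff and the function name are assumptions made for illustration, not values given in the course material.

```python
# Correlations among the independent variables only (the dependent variable,
# Sum of Demand through 11:00am, is left out), rounded to three decimals.
predictor_correlations = {
    ("Resort Population (Day before)", "Resort Population (Day of)"): 0.987,
    ("Room Departures (Check-outs)", "Resort Population (Day of)"): 0.383,
    ("Room Departures (Check-outs)", "Resort Population (Day before)"): 0.431,
    ("Room Arrivals (Check-Ins)", "Resort Population (Day of)"): 0.521,
    ("Room Arrivals (Check-Ins)", "Resort Population (Day before)"): 0.538,
    ("Room Arrivals (Check-Ins)", "Room Departures (Check-outs)"): 0.744,
    ("Rooms Occupied", "Resort Population (Day of)"): 0.978,
    ("Rooms Occupied", "Resort Population (Day before)"): 0.961,
    ("Rooms Occupied", "Room Departures (Check-outs)"): 0.282,
    ("Rooms Occupied", "Room Arrivals (Check-Ins)"): 0.516,
}


def flag_collinear_pairs(correlations, cutoff=0.9):
    """Return the variable pairs whose correlation magnitude exceeds the cutoff."""
    return [pair for pair, r in correlations.items() if abs(r) > cutoff]


flagged = flag_collinear_pairs(predictor_correlations)
for first, second in flagged:
    print(f"{first} and {second} are highly correlated")
```

Rooms Occupied appears in two of the flagged pairs, which is consistent with it being a candidate for removal from the model.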
Using all the Variables
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.945470825
R Square 0.893915081
Adjusted R Square 0.868656767
Standard Error 226.3443088
Observations 27
Coefficients Standard Error
Intercept 105.950797 287.9311262
Resort Population (Day of) 0.309206885 0.419606489
Resort Population (Day before) 0.651505697 0.318650952
Room Departures (Check-outs) -1.240704971 0.504451195
Room Arrivals (Check-Ins) -1.601892492 0.850702345
Rooms Occupied -0.805081213 1.080676402
Dropping One of the Variables: Rooms Occupied
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.943986992
R Square 0.89111144
Adjusted R Square 0.871313521
Standard Error 224.0434175
Observations 27
Coefficients Standard Error
Intercept 3.076016032 250.0880236
Resort Population (Day of) 0.105796019 0.315379126
Resort Population (Day before) 0.620476699 0.312705697
Room Departures (Check-outs) -0.977664626 0.356618752
Room Arrivals (Check-Ins) -1.977393216 0.678333632
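Dropping Rooms Occupied raises the adjusted R square from 0.8687 to 0.8713 and lowers the standard error from 226.3 to 224.0, so the simpler model is the better choice.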
Multiple Regression Guidelines
Use common sense and practical considerations to include or exclude variables.
Include as few variables as possible.
Use adjusted R² as a guide when comparing models.
Practice Problems (Data Set 1-6-6)
• An industrial engineer must use a regression based standard data system of work measurement to estimate the time required to cut various sizes of boards.
• Representative time studies were performed on sample sizes. The results of the studies are shown in the table below. The dimensions are in inches. The times are in minutes.
• What is the relationship?
• How good is it?
Width Thick Time
6 1 0.064
6 2 0.074
6 3 0.081
6 4 0.093
12 1 0.088
12 2 0.112
12 3 0.093
12 4 0.111
16 1 0.112
16 2 0.130
16 3 0.151
16 4 0.181
20 1 0.120
20 2 0.160
20 3 0.169
20 4 0.216
Practice Problems
• The State Agricultural Extension Service has hired you, as a Six Sigma Black Belt, to help improve a process. The Service wants to help soybean farmers increase the yield (in bushels per acre) from their fields based on certain easily measurable and controllable factors.
• We know that fertilizer sells for $25 per 100 pounds. Water costs $.18 per gallon. Lime costs $1.75 per 100 pounds. Soybeans sell for $6.85 a bushel.
• What would be the best combination of fertilizer, water, and lime to maximize the profit for the farmer? Don’t limit the analysis to the experimental values.
• Use the data collected on the following page.
Bushels per Acre | Pounds Fertilizer | Gallons Water | Pounds Lime | Bushels per Acre | Pounds Fertilizer | Gallons Water | Pounds Lime
21.0 500 50 200 40.2 900 100 600
20.0 500 50 200 40.7 900 100 600
21.0 500 50 200 38.3 900 100 600
24.0 600 50 300 43.5 1000 100 700
22.4 600 50 300 42.9 1000 100 700
29.4 600 50 300 43.8 1000 100 700
26.1 700 75 400 50.1 1200 125 800
26.0 700 75 400 50.6 1200 125 800
27.4 700 75 400 50.5 1200 125 800
32.1 800 75 500 39.9 1500 125 900
32.2 800 75 500 46.8 1500 125 900
32.4 800 75 500 43.3 1500 125 900
Soybean Data
Data Set 1-6-9