IBS Statistics Year 1 Dr. Ning DING

Statistics techniques for business management

Page 1: 004

IBS Statistics Year 1, Dr. Ning DING

Page 2: 004

Table of contents

• Review

• Interquartile Range

• Skewness

• Learning Goals

• Chapter 12: Simple Regression and Correlation

• Exercises

Page 3: 004

Chapter 3: Describing Data

Review

Find the interquartile range:

1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038, 2047, 2054, 2097, 2205, 2287, 2311, 2406

Interquartile Range = Q3 - Q1 = 2205 - 1721 = 484
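The review answer can be checked with a short Python sketch. It assumes the quartile locator L = (n + 1) × p that the deck uses in the Excel exercise correction, with linear interpolation when L is not a whole number; the function name `quartile` is only illustrative.

```python
def quartile(sorted_values, p):
    """Return the value at location L = (n + 1) * p in ordered data.

    The location is 1-based; a fractional location is resolved by
    interpolating linearly between the two neighbouring observations.
    """
    n = len(sorted_values)
    location = (n + 1) * p
    lower = int(location)            # whole part of the location
    fraction = location - lower      # fractional part of the location
    if lower < 1:                    # location falls before the first value
        return sorted_values[0]
    if lower >= n:                   # location falls at or after the last value
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


data = sorted([1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
               2047, 2054, 2097, 2205, 2287, 2311, 2406])

q1 = quartile(data, 0.25)   # location (15 + 1) * 0.25 = 4
q3 = quartile(data, 0.75)   # location (15 + 1) * 0.75 = 12
iqr = q3 - q1
print(q1, q3, iqr)
```

Other software may use a different quartile convention and return slightly different values for the same data.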

Page 4: 004

Correction of EXCEL Exercise 5

L = (8+1) * 25% = 2.25, so Q1 = 133.5

L = (8+1) * 75% = 6.75, so Q3 = 274.5

Interquartile Range = 274.5 - 133.5 = 141

Page 5: 004

Boxplot

Data: 1, 2, 2, 4, 5, 7, 8, 9, 12

Median = 5 (lower half: 1, 2, 2, 4; upper half: 7, 8, 9, 12)

Quartiles: Q1 = 2, Q3 = 8.5

Interquartile Range: Q3 - Q1

Deciles: 1st decile (1st D), 9th decile (9th D)

Percentile

http://cnx.org/content/m11192/latest/

How to interpret?

Page 6: 004

Boxplot

[Figure: boxplot from € 20 to € 2000, with Q1 = € 250, Median = € 350, Q3 = € 850, Mean = € 450]

The distribution is skewed to __________ because the mean is __________ the median.

Answer: the right; larger than

http://cnx.org/content/m11192/latest/

Page 7: 004

Data set 1: 0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0

Mean > Median: positively skewed

Data set 2: 2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0

Mean < Median: negatively skewed

http://qudata.com/online/statcalc/
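The mean-versus-median comparison for the two data sets on this slide can be reproduced with Python's standard `statistics` module. This is a minimal sketch of the slide's rule of thumb, not a formal skewness coefficient; the helper name `skew_direction` is an assumption made for illustration.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    m = mean(values)
    med = median(values)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "no skew indicated"


set_1 = [0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0]
set_2 = [2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0]

for name, values in (("set 1", set_1), ("set 2", set_2)):
    print(name, round(mean(values), 2), median(values), skew_direction(values))
```

In the first set the single large value 4.0 pulls the mean above the median; in the second set the single small value 2.0 pulls it below.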

Page 8: 004

Zero skewness

mode = median = mean

This means that the data is symmetrically distributed.

Page 9: 004

Learning Goals

• Chapter 12:

– Learn how many business decisions depend on knowing the specific relationship between two or more variables

– Use scatter diagrams to visualize the relationship between two variables

– Use regression analysis to estimate the relationship between two variables

– Use the least-squares estimating equation to predict future values of the dependent variable

– Learn how correlation analysis describes the degree to which two variables are linearly related to each other

– Understand the coefficient of determination as a measure of the strength of the relationship between two variables

– Learn limitations of regression and correlation analyses and caveats about their use.

Page 10: 004

Chapter 12: Sim Reg & Corr

1. Introduction

Regression and Correlation Analyses:

– How to determine both the nature and the strength of a relationship between variables.

Page 11: 004

1. Introduction

Scatter Diagram: positive correlation

[Figure: Describing Relationship between Two Variables – Scatter Diagram Examples]

Page 12: 004

1. Introduction

Scatter Diagram: negative correlation

[Figure: Describing Relationship between Two Variables – Scatter Diagram Examples]

Page 13: 004

1. Introduction

Scatter Diagram: no correlation

[Figure: Describing Relationship between Two Variables – Scatter Diagram Examples]

Page 14: 004

2. Types of Relationships

Variables:

– Independent variables: known

– Dependent variables: to predict

[Figure: Describing Relationship between Two Variables – Scatter Diagram Examples, with the independent variable and the dependent variable labeled]

Page 15: 004

2. Types of Relationships

Correlation & Cause and Effect?

• The relationships found by regression are relationships of association

• They are not necessarily relationships of cause and effect

[Figure: Describing Relationship between Two Variables – Scatter Diagram Examples, with the independent variable and the dependent variable labeled]

Page 16: 004

2. Estimation Using the Regression Line

Scatter Diagrams:

• Patterns indicating that the variables are related

• If related, we can describe the relationship

[Figure panels: Strong & Positive correlation; Strong & Negative correlation; Weak & Positive correlation; Weak & Negative correlation; No correlation]

Page 17: 004


Scatter Diagrams:

2. Estimation Using the Regression Line

Page 18: 004

2. Estimation Using the Regression Line

Simple Linear Regression:

• The dependent variable Y is determined by the independent variable X

Ŷ = a + bX

where Y is the dependent variable and X is the independent variable.

Page 19: 004

2. Estimation Using the Regression Line

Simple Linear Regression:

• The dependent variable Y is determined by the independent variable X

Ŷ = a + bX

Page 20: 004

2. Estimation Using the Regression Line

Slope of the Best-Fitting Regression Line:

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$

Regression line and intercept:

$$\hat{Y} = a + bX, \qquad a = \bar{Y} - b\bar{X}$$

Page 21: 004

2. Estimation Using the Regression Line

Example: what is the relationship between the age of a truck (X) and the annual repair expense (Y)?

X̄ = 3, Ȳ = 6

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{78 - 4 \cdot 3 \cdot 6}{44 - 4 \cdot 9} = 0.75$$

$$a = \bar{Y} - b\bar{X} = 6 - 0.75 \cdot 3 = 3.75$$

Ŷ = 3.75 + 0.75X

If the city has a truck that is 4 years old, the director could use the equation to predict $675 annually in repairs (Ŷ is in hundreds of dollars):

Ŷ = 3.75 + 0.75 * 4 = 6.75
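The truck example can be recomputed from its summary statistics alone. The sketch below assumes the values read from the slide's substitution into the slope formula (n = 4, ΣXY = 78, ΣX² = 44, X̄ = 3, Ȳ = 6); the function name `fit_from_sums` is hypothetical.

```python
def fit_from_sums(n, sum_xy, sum_x2, mean_x, mean_y):
    """Return (a, b) for the least-squares line Y-hat = a + b * X.

    b = (sum XY - n * mean_x * mean_y) / (sum X^2 - n * mean_x^2)
    a = mean_y - b * mean_x
    """
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


# Summary values as read from the slide (assumed, see the note above).
a, b = fit_from_sums(n=4, sum_xy=78, sum_x2=44, mean_x=3, mean_y=6)

truck_age = 4                       # years
predicted = a + b * truck_age       # in hundreds of dollars
print(a, b, predicted)
```

Working from sums rather than raw observations mirrors the hand calculation on the slide, which is why the raw age and expense pairs are not needed here.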

Page 22: 004

Exercise

Example:

• To find the simple linear regression of Personal Income (X) and Auto Sales (Y)

Step 1: Count the number of values. N = 5

Step 2: Find XY and X² (see the table below).

X = 64, what about Y?

Page 23: 004

Exercise

Step 3: Find ΣX, ΣY, ΣXY, ΣX².

ΣX = 311, Mean = 62.2

ΣY = 18.6, Mean = 3.72

ΣXY = 1159.7

ΣX² = 19359

Step 4: Substitute in the slope formula.

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{1159.7 - 5 \cdot 62.2 \cdot 3.72}{19359 - 5 \cdot 62.2 \cdot 62.2} \approx 0.19$$

Page 24: 004

Exercise

Step 5: Slope (b) = 0.19. Now substitute in the intercept formula.

Intercept (a) = Ȳ - bX̄ = 3.72 - 0.19 * 62.2 = -8.098

Step 6: Then substitute these values in the regression equation formula, Ŷ = a + bX:

Ŷ = -8.098 + 0.19X

Suppose we want to know the approximate Y value for X = 64. Then we can substitute the value in the above equation:

Ŷ = a + bX = -8.098 + 0.19(64) = -8.098 + 12.16 = 4.06
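The exercise's arithmetic can be retraced in Python from the sums given in Step 3. This is a sketch under the assumption that the slide rounds the slope to two decimals before computing the intercept; it also shows the unrounded result for comparison. All variable names are illustrative.

```python
# Sums taken from Step 3 of the exercise.
n = 5
sum_x, sum_y = 311, 18.6
sum_xy, sum_x2 = 1159.7, 19359

mean_x = sum_x / n                  # 62.2
mean_y = sum_y / n                  # 3.72

# Step 4: slope, first unrounded, then rounded as on the slide.
b_exact = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
b_rounded = round(b_exact, 2)

# Step 5: intercept, computed with each version of the slope.
a_rounded = mean_y - b_rounded * mean_x
a_exact = mean_y - b_exact * mean_x

# Step 6: prediction for X = 64 with each version of the line.
x_new = 64
pred_rounded = a_rounded + b_rounded * x_new
pred_exact = a_exact + b_exact * x_new

print(round(b_exact, 4), b_rounded)
print(round(a_rounded, 3), round(a_exact, 3))
print(round(pred_rounded, 2), round(pred_exact, 2))
```

Because X̄ = 62.2 is far from zero, rounding the slope shifts the intercept noticeably (roughly -8.10 versus -7.96), while the prediction at X = 64, close to the mean, stays at about 4.06 either way.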

Page 25: 004

2. Estimation Using the Regression Line

Least Squares Method: minimize the sum of the squares of the errors to measure the goodness of fit of a line.

e_i = residual i (the error for the i-th observation)
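To make the least-squares idea concrete, the sketch below fits a line with the slope and intercept formulas from this chapter and compares its sum of squared residuals with that of nearby lines. The data are hypothetical, invented only for illustration, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y-hat = a + b * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals e_i = Y_i - (a + b * X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)

# Nudge the intercept and slope; every nudged line should fit worse.
nudged = [
    sum_squared_errors(xs, ys, a + da, b + db)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]
print(round(best, 4), round(min(nudged), 4))
```

Any change to a or b away from the fitted values increases the sum of squared errors, which is what "best-fitting" means under this criterion.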

Page 26: 004


Least Squares Method:

2. Estimation Using the Regression Line

Page 27: 004


Example:

2. Estimation Using the Regression Line

Page 28: 004


Example Solution:

2. Estimation Using the Regression Line

Page 29: 004

3. Correlation Analysis

Correlation Analysis: describes the degree to which one variable is linearly related to another.

Coefficient of Determination (r²): measures the extent, or strength, of the association that exists between two variables.

Coefficient of Correlation (r): the square root of the coefficient of determination, with the same sign as the slope b.

Page 30: 004

3. Correlation Analysis

Coefficient of Determination: measures the extent, or strength, of the association that exists between two variables.

• 0 ≤ r² ≤ 1.

• The larger r², the stronger the linear relationship.

• The closer r² is to 1, the more confident we are in our prediction.
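The properties listed above can be checked numerically. The sketch below computes r with the product-moment formula and then squares it, which for simple linear regression gives the same r² as the coefficient of determination; this is a swapped-in computation rather than the deck's own worked method, and the data and function name are hypothetical.

```python
from math import sqrt


def correlation(xs, ys):
    """Return (r, r squared) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    r = s_xy / sqrt(s_xx * s_yy)
    return r, r * r


# Hypothetical data: a perfect line, a noisy rising trend, a noisy falling trend.
perfect = correlation([1, 2, 3, 4], [3, 5, 7, 9])
rising = correlation([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
falling = correlation([1, 2, 3, 4, 5], [9.0, 7.5, 6.8, 4.1, 3.0])

for label, (r, r2) in (("perfect", perfect), ("rising", rising), ("falling", falling)):
    print(label, round(r, 4), round(r2, 4))
```

The falling data set yields a negative r but a positive r², which is why r² measures strength only and r is needed to see the direction.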

Page 31: 004


3. Correlation Analysis

Coefficient of Correlation:

Page 32: 004


3. Correlation Analysis

Coefficient of Determination:

Page 33: 004


Example Solution:

3. Correlation Analysis

Page 34: 004


Example Solution:

3. Correlation Analysis

Page 35: 004

Chapter 3: Describing Data

Review

Which value of r indicates a stronger correlation than 0.40?
A. -0.30
B. -0.50
C. +0.38
D. 0

If all the points on a scatter diagram lie on a straight line, what is the standard error of estimate?
A. -1
B. +1
C. 0
D. Infinity

Page 36: 004

Chapter 3: Describing Data

Review

In the least squares equation, Ŷ = 10 + 20X, the value of 20 indicates
A. the Y intercept.
B. for each unit increase in X, Y increases by 20.
C. for each unit increase in Y, X increases by 20.
D. none of these.

Page 37: 004

Chapter 3: Describing Data

Exercise

A sales manager for an advertising agency believes there is a relationship between the number of contacts and the amount of the sales. To verify this belief, the following data was collected: 

What is the Y-intercept of the linear equation?
A. -12.201
B. 2.1946
C. -2.1946
D. 12.201

Page 38: 004

Exercise

Sample Exam P.4

Ŷ = -1.8182 + 0.1329X

Page 39: 004


Exercise

Sample Exam P.4

Page 40: 004

Exercise

Sample Exam P.4

Ŷ = -1.8182 + 0.1329X

Page 41: 004

Summary

Chapter 1: What is Statistics?

• Chapter 3:

– Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean

– Explain the characteristics, uses, advantages, and disadvantages of each measure of location

– Identify the position of the mean, median, and mode for both symmetric and skewed distributions

– Compute and interpret the range, mean deviation, variance, and standard deviation

– Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion

– Understand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations