42
Chap 3-1 Chapter 3 Describing Data: Numerical Instructor: Nahid Farnaz (Nhn) North South University BUS173: Applied Statistics

bus173_chap03

Embed Size (px)

DESCRIPTION

BUS 173

Citation preview

Page 1: bus173_chap03

Chap 3-1

Chapter 3

Describing Data: Numerical

Instructor: Nahid Farnaz (Nhn)North South University

BUS173: Applied Statistics

Page 2: bus173_chap03

Chap 3-2

Measures of Central Tendency

Central Tendency

Mean Median Mode

n

xx

n

1ii

Overview

Midpoint of ranked values

Most frequently observed value

Arithmetic average

Page 3: bus173_chap03

Chap 3-3

Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency

For a population of N values:

For a sample of size n:

Sample sizen

xxx

n

xx n21

n

1ii

Observed

values

N

xxx

N

xμ N21

N

1ii

Population size

Population values

Page 4: bus173_chap03

Chap 3-4

Arithmetic Mean

The most common measure of central tendency Mean = sum of values divided by the number of values Affected by extreme values (outliers)

(continued)

0 1 2 3 4 5 6 7 8 9 10

Mean = 3

0 1 2 3 4 5 6 7 8 9 10

Mean = 4

35

15

5

54321

4

5

20

5

104321

Page 5: bus173_chap03

Chap 3-5

Median

In an ordered list, the median is the “middle” number (50% above, 50% below)

Not affected by extreme values

0 1 2 3 4 5 6 7 8 9 10

Median = 3

0 1 2 3 4 5 6 7 8 9 10

Median = 3

Page 6: bus173_chap03

Chap 3-6

Finding the Median

The location of the median:

If the number of values is odd, the median is the middle number If the number of values is even, the median is the average of

the two middle numbers

Note that is not the value of the median, only the

position of the median in the ranked data

dataorderedtheinposition2

1npositionMedian

2

1n

Page 7: bus173_chap03

Chap 3-7

Mode

A measure of central tendency Value that occurs most often Not affected by extreme values Used for either numerical or categorical data There may may be no mode There may be several modes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Mode = 9

0 1 2 3 4 5 6

No Mode

Page 8: bus173_chap03

Chap 3-8

Mean is generally used, unless extreme values (outliers) exist

Then median is often used, since the median is not sensitive to extreme values. Example: Median home prices may be

reported for a region – less sensitive to outliers

Which measure of location is the “best”?

Page 9: bus173_chap03

Chap 3-9

Shape of a Distribution

Describes how data are distributed Measures of shape

Symmetric or skewed

Mean = Median Mean < Median Median < Mean

Right-SkewedLeft-Skewed Symmetric

Page 10: bus173_chap03

Exercise 3.2

A department store manager is interested in the number of complaints received by the customer service dept. about the quality of electrical products sold. Records over a 5 week period show the following number of complaints for each week:

13, 15, 8, 16, 8

a.Compute the mean number of weekly complaints

b.Compute the median

c.Find the mode

Ans: a.12; b.13; c.8

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-10

Page 11: bus173_chap03

Ex. 3.6

The demand for bottled water increases during the hurricane season in Florida. A random sample of 7 hours showed that the following numbers of 1 gallon bottles were sold in one store:

40, 55, 62, 43, 50, 60, 65.

a. Describe the central tendency of the data

b. Comment on symmetry or skewness

Ans: Mean=53.57; no unique mode; median=55; left skewed

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-11

Page 12: bus173_chap03

Chap 3-12

Same center, different variation

Measures of Variability

Variation

Variance Standard Deviation

Coefficient of Variation

Range Interquartile Range

Measures of variation give information on the spread or variability of the data values.

Page 13: bus173_chap03

Chap 3-13

Range

Simplest measure of variation Difference between the largest and the smallest

observations:

Range = Xlargest – Xsmallest

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 14 - 1 = 13

Example:

Page 14: bus173_chap03

Chap 3-14

Ignores the way in which data are distributed

Sensitive to outliers

7 8 9 10 11 12

Range = 12 - 7 = 5

7 8 9 10 11 12

Range = 12 - 7 = 5

Disadvantages of the Range

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 5 - 1 = 4

Range = 120 - 1 = 119

Page 15: bus173_chap03

Chap 3-15

Interquartile Range

Can eliminate some outlier problems by using the interquartile range

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

Page 16: bus173_chap03

Chap 3-16

Quartile Formulas

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1) (the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

Page 17: bus173_chap03

Chap 3-17

Interquartile Range

Median(Q2)

XmaximumX

minimum Q1 Q3

Example:

25% 25% 25% 25%

12 30 45 57 70

Interquartile range = 57 – 30 = 27

Five number summary:Minimum < Q1 < Median Q2 < Q3 < Maximum

Page 18: bus173_chap03

Chap 3-18

(n = 9)

Q1 = is in the 0.25(9+1) = 2.5 position of the ranked data

so use the value half way between the 2nd and 3rd values,

so Q1 = 12.5

Quartiles

Sample Ranked Data: 11 12 13 16 16 17 18 21 22

Example: Find the first quartile

Page 19: bus173_chap03

Example 3.4: Waiting Times at Gilotti’s Grocery (five number summary)

A stem and leaf display for a sample of 25 waiting times (in seconds) is given below. Compute the 5 number summary.Calculate and interpret IQR

Minutes N = 25

Stem Unit =10.0; Leaf Unit = 1.0

Chap 3-19

1 1 2 4 6 7 8 8 9 9

2 1 2 2 2 4 6 8 9 9

3 0 1 2 3 4

4 0 2

Page 20: bus173_chap03

Chap 3-20

Average of squared deviations of values from the mean

Population variance:

Population Variance

Where = population mean

N = population size

xi = ith value of the variable x

μ

Page 21: bus173_chap03

Chap 3-21

Average (approximately) of squared deviations of values from the mean

Sample variance:

Sample Variance

1-n

)x(xs

n

1i

2i

2

Where = sample mean

n = sample size

Xi = ith value of the variable X

X

Page 22: bus173_chap03

Chap 3-22

Population Standard Deviation

Most commonly used measure of variation Shows variation about the mean Has the same units as the original data

Population standard deviation:

Page 23: bus173_chap03

Chap 3-23

Sample Standard Deviation

Most commonly used measure of variation Shows variation about the mean Has the same units as the original data

Sample standard deviation:

1-n

)x(xS

n

1i

2i

Page 24: bus173_chap03

Chap 3-24

Example:Sample Standard Deviation

Sample Data (xi) : 10 12 14 15 17 18 18 24

n = 8 Mean = x = 16

4.24267

126

18

16)(2416)(1416)(1216)(10

1n

)x(24)x(14)x(12)X(10s

2222

2222

A measure of the “average” scatter around the mean

Page 25: bus173_chap03

Example 3.5

A professor teaches two large sections of marketing and randomly selects a sample of test scores from both sections. Find the range and standard deviation from each sample:

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-25

Section 1 50 60 70 80 90

Section 2 72 68 70 74 66

Page 26: bus173_chap03

Chap 3-26

This theorem established data intervals for any data set, regardless of the shape and distribution

For any population with mean μ and standard deviation σ , and k > 1 , the percentage of observations that fall within the interval

[μ kσ]

Is at least

Chebyshev’s Theorem

)]%(1/k100[1 2

Page 27: bus173_chap03

Chap 3-27

Regardless of how the data are distributed, at least (1 - 1/k2) of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

(1 - 1/12) = 0% ……..... k=1 (μ ± 1σ)

(1 - 1/22) = 75% …........ k=2 (μ ± 2σ)

(1 - 1/32) = 89% ………. k=3 (μ ± 3σ)

Chebyshev’s Theorem

withinAt least

(continued)

Page 28: bus173_chap03

Chap 3-28

If the data distribution is bell-shaped, then the interval:

contains about 68% of the values in the population or the sample

The Empirical Rule

1σμ

μ

68%

1σμ

Page 29: bus173_chap03

Chap 3-29

contains about 95% of the values in the population or the sample

contains about 99.7% of the values in the population or the sample

The Empirical Rule

2σμ

3σμ

3σμ

99.7%95%

2σμ

Page 30: bus173_chap03

Exercise 3.18

Use Chebychev’s theorem to approximate each of the following observations if the mean is 250 and S.D. is 20. Approximately what proportion of the observations is

a)Between 190 and 310?

b)Between 210 and 290?

c)Between 230 and 270?

Ans: a) 88.9% k=3; b) 75% k=2; c) 0% k=1

Chap 3-30

Page 31: bus173_chap03

Chap 3-31

Coefficient of Variation

Measures relative variation Always in percentage (%) Shows standard deviation relative to mean Can be used to compare two or more sets of

data measured in different units

100%x

sCV

Page 32: bus173_chap03

Chap 3-32

Comparing Coefficient of Variation

Stock A: Average price last year = $50 Standard deviation = $5

Stock B: Average price last year = $100 Standard deviation = $5

Both stocks have the same standard deviation, but stock B is less variable relative to its price

10%100%$50

$5100%

x

sCVA

5%100%$100

$5100%

x

sCVB

Page 33: bus173_chap03

Chap 3-33

The Sample Covariance The covariance measures the strength of the linear relationship

between two variables

The population covariance:

The sample covariance:

Only concerned with the strength of the relationship No causal effect is implied

N

))(y(xy),(xCov

N

1iyixi

xy

1n

)y)(yx(xsy),(xCov

n

1iii

xy

Page 34: bus173_chap03

Chap 3-34

Covariance between two variables:

Cov(x,y) > 0 x and y tend to move in the same direction

Cov(x,y) < 0 x and y tend to move in opposite directions

Cov(x,y) = 0 x and y are independent

Interpreting Covariance

Page 35: bus173_chap03

Chap 3-35

Correlation Coefficient

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

Sample correlation coefficient:

YX ss

y),(xCovr

YXσσ

y),(xCovρ

Page 36: bus173_chap03

Chap 3-36

Features of Correlation Coefficient, r

Unit free

Ranges between –1 and 1

The closer to –1, the stronger the negative linear

relationship

The closer to 1, the stronger the positive linear

relationship

The closer to 0, the weaker any positive linear

relationship

Page 37: bus173_chap03

Chap 3-37

Scatter Plots of Data with Various Correlation Coefficients

Y

X

Y

X

Y

X

Y

X

Y

X

r = -1 r = -.6 r = 0

r = +.3r = +1

Y

Xr = 0

Page 38: bus173_chap03

Chap 3-38

Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2

Students who scored high on the first test tended to score high on second test

Scatter Plot of Test Scores

70

75

80

85

90

95

100

70 75 80 85 90 95 100

Test #1 ScoreT

est

#2 S

core

Page 39: bus173_chap03

Example 3.13 (Covariance and Correlation Coefficient)

Rising Hills Manufacturing Inc. wishes to study the relationship between the number of workers, X, and the number of tables produced, Y. it has obtained a random sample of 10 hours of production. The following (x,y) combinations of points were obtained:

(12,20) (30,60) (15,27) (24,50) (14,21)

(18,30) (28,61) (26,54) (19,32) (27,57)

Compute the covariance and correlation coefficient. Briefly discuss the relationship between X and Y.

Chap 3-39

Page 40: bus173_chap03

Chap 3-40

Obtaining Linear Relationships

An equation can be fit to show the best linear relationship between two variables:

Y = β0 + β1X

Where Y is the dependent variable and X is the independent variable

Page 41: bus173_chap03

Chap 3-41

Least Squares Regression

Estimates for coefficients β0 and β1 are found to minimize the sum of the squared residuals

The least-squares regression line, based on sample data, is

Where b1 is the slope of the line and b0 is the y-intercept:

xbby 10ˆ

x

y

2x

1 s

sr

s

y)Cov(x,b xbyb 10

Page 42: bus173_chap03

Example 3.15

Using the answers from Example 3.13 (Rising Hills Inc.), compute the sample regression coefficients b1 and b0.

What is the sample regression line?

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-42