bus173_chap03

Chap 3-1

Chapter 3

Describing Data: Numerical

Instructor: Nahid Farnaz (Nhn), North South University

BUS173: Applied Statistics

Chap 3-2

Measures of Central Tendency

Overview

Central tendency is described by three measures:

Mean: the arithmetic average, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Median: the midpoint of ranked values

Mode: the most frequently observed value

Chap 3-3

Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency

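For a population of N values:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

where N is the population size and the $x_i$ are the population values

For a sample of size n:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

where n is the sample size and the $x_i$ are the observed values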

Chap 3-4

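Arithmetic Mean (continued)

The most common measure of central tendency

Mean = sum of values divided by the number of values

Affected by extreme values (outliers)

Example: for the values 1, 2, 3, 4, 5,

$$\bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$

Replacing the largest value with the outlier 10 raises the mean:

$$\bar{x} = \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$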

Chap 3-5

Median

In an ordered list, the median is the “middle” number (50% above, 50% below)

Not affected by extreme values

[Figure: two dot plots on an axis from 0 to 10; Median = 3 in both panels.]

Chap 3-6

Finding the Median

The location of the median:

$$\text{Median position} = \frac{n+1}{2} \text{ position in the ordered data}$$

If the number of values is odd, the median is the middle number

If the number of values is even, the median is the average of the two middle numbers

Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data

Chap 3-7

Mode

A measure of central tendency

Value that occurs most often

Not affected by extreme values

Used for either numerical or categorical data

There may be no mode

There may be several modes

[Figure: two dot plots. On the first (axis from 0 to 14), Mode = 9. On the second (axis from 0 to 6), there is no mode.]

Chap 3-8

Which measure of location is the “best”?

Mean is generally used, unless extreme values (outliers) exist

Then the median is often used, since the median is not sensitive to extreme values

Example: Median home prices may be reported for a region because the median is less sensitive to outliers

Chap 3-9

Shape of a Distribution

Describes how data are distributed

Measures of shape: symmetric or skewed

Left-Skewed: Mean < Median

Symmetric: Mean = Median

Right-Skewed: Median < Mean

Exercise 3.2

A department store manager is interested in the number of complaints received by the customer service dept. about the quality of electrical products sold. Records over a 5 week period show the following number of complaints for each week:

13, 15, 8, 16, 8

a. Compute the mean number of weekly complaints

b. Compute the median

c. Find the mode

Ans: a. 12; b. 13; c. 8
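The answers to Exercise 3.2 can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the exercise.

```python
import statistics

# Weekly complaint counts from Exercise 3.2
complaints = [13, 15, 8, 16, 8]

mean_complaints = statistics.mean(complaints)      # sum of values / number of values
median_complaints = statistics.median(complaints)  # middle value of the ranked data
mode_complaints = statistics.mode(complaints)      # most frequently observed value

print("mean:", mean_complaints)
print("median:", median_complaints)
print("mode:", mode_complaints)
```

The ranked data are 8, 8, 13, 15, 16, so the median sits in position (5 + 1)/2 = 3.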

Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chap 3-10

Ex. 3.6

The demand for bottled water increases during the hurricane season in Florida. A random sample of 7 hours showed that the following numbers of 1 gallon bottles were sold in one store:

40, 55, 62, 43, 50, 60, 65.

a. Describe the central tendency of the data

b. Comment on symmetry or skewness

Ans: Mean=53.57; no unique mode; median=55; left skewed
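One way to reproduce the Ex. 3.6 answer is sketched below, assuming the mean-versus-median comparison from the Shape of a Distribution slide is used to judge skewness. The names `bottles` and `shape` are illustrative.

```python
import statistics

# Hourly sales of 1 gallon bottles from Ex. 3.6
bottles = [40, 55, 62, 43, 50, 60, 65]

mean_bottles = statistics.mean(bottles)
median_bottles = statistics.median(bottles)

# multimode returns every value tied for the highest frequency;
# more than one entry means there is no unique mode
modes = statistics.multimode(bottles)
has_unique_mode = len(modes) == 1

# Compare mean and median to describe the shape
if mean_bottles < median_bottles:
    shape = "left skewed"
elif mean_bottles > median_bottles:
    shape = "right skewed"
else:
    shape = "symmetric"

print(round(mean_bottles, 2), median_bottles, has_unique_mode, shape)
```

Comparing the mean with the median is only a rough indicator of skewness; a histogram of the data is a useful cross-check.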

Chap 3-11

Chap 3-12

Measures of Variability

Measures of variation give information on the spread or variability of the data values.

Variation is described by:

Range

Interquartile Range

Variance

Standard Deviation

Coefficient of Variation

[Figure: two distributions with the same center but different variation.]

Chap 3-13

Range

Simplest measure of variation

Difference between the largest and the smallest observations:

Range = X_largest – X_smallest

Example:

[Figure: dot plot on an axis from 0 to 14.]

Range = 14 - 1 = 13

Chap 3-14

Disadvantages of the Range

Ignores the way in which data are distributed

[Figure: two dot plots with different distributions on an axis from 7 to 12. In both, Range = 12 - 7 = 5.]

Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119

Chap 3-15

Interquartile Range

Can eliminate some outlier problems by using the interquartile range

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

Chap 3-16

Quartile Formulas

Find a quartile by determining the value in the appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1) (the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

Chap 3-17

Interquartile Range

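[Figure: box plot marking X minimum, Q1, Median (Q2), Q3, and X maximum, with 25% of the data in each of the four segments.]

Five number summary: Minimum < Q1 < Median (Q2) < Q3 < Maximum

Example: Minimum = 12, Q1 = 30, Median = 45, Q3 = 57, Maximum = 70

Interquartile range = 57 – 30 = 27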

Chap 3-18

Quartiles

Example: Find the first quartile

Sample Ranked Data: 11 12 13 16 16 17 18 21 22 (n = 9)

Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values:

Q1 = 12.5

Example 3.4: Waiting Times at Gilotti’s Grocery (five number summary)

A stem and leaf display for a sample of 25 waiting times (in seconds) is given below. Compute the five number summary. Calculate and interpret the IQR.

Minutes N = 25

Stem Unit = 10.0; Leaf Unit = 1.0

Stem | Leaves
1 | 1 2 4 6 7 8 8 9 9
2 | 1 2 2 2 4 6 8 9 9
3 | 0 1 2 3 4
4 | 0 2

Chap 3-19
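The five number summary for Example 3.4 can be sketched in code. The helper below is an illustrative implementation of the (n+1) position rule from the Quartile Formulas slide, with linear interpolation between neighboring ranked values; the function name `quartile` is an assumption, and library routines often use a different quartile convention, so their results can differ slightly.

```python
def quartile(ranked, fraction):
    """Value at position fraction * (n + 1) in ranked data (positions start at 1)."""
    n = len(ranked)
    position = fraction * (n + 1)
    if position <= 1:
        return ranked[0]
    if position >= n:
        return ranked[-1]
    lower = int(position)        # whole part of the position
    weight = position - lower    # fractional part, e.g. 0.5 for "halfway"
    below, above = ranked[lower - 1], ranked[lower]
    return below + weight * (above - below)

# Waiting times read off the stem and leaf display (stem unit 10, leaf unit 1)
waiting = [11, 12, 14, 16, 17, 18, 18, 19, 19,
           21, 22, 22, 22, 24, 26, 28, 29, 29,
           30, 31, 32, 33, 34,
           40, 42]

ranked = sorted(waiting)
q1 = quartile(ranked, 0.25)
q2 = quartile(ranked, 0.50)
q3 = quartile(ranked, 0.75)
summary = (ranked[0], q1, q2, q3, ranked[-1])
iqr = q3 - q1

print("five number summary:", summary)
print("IQR:", iqr)
```

With n = 25 the quartile positions are 6.5, 13, and 19.5. The IQR is the spread of the middle 50% of the waiting times. Applied to the nine-value sample 11 12 13 16 16 17 18 21 22, the same helper gives Q1 = 12.5.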

Chap 3-20

Population Variance

Average of squared deviations of values from the mean

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Where $\mu$ = population mean

N = population size

$x_i$ = ith value of the variable x

Chap 3-21

Sample Variance

Average (approximately) of squared deviations of values from the mean

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

Where $\bar{x}$ = sample mean

n = sample size

$x_i$ = ith value of the variable x

Chap 3-22

Population Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Chap 3-23

Sample Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Chap 3-24

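Example: Sample Standard Deviation

Sample Data ($x_i$): 10 12 14 15 17 18 18 24

n = 8, Mean = $\bar{x}$ = 16

$$s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}}$$

$$= \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$

A measure of the “average” scatter around the mean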
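The arithmetic in this example can be verified step by step. The sketch below follows the sample standard deviation formula directly and then compares the result with `statistics.stdev`, which also divides by n - 1. For this data the eight squared deviations are 36, 16, 4, 1, 1, 4, 4, and 64, which sum to 130. Variable names are illustrative.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the sample mean
squared_deviations = sum((x - mean) ** 2 for x in data)

# Divide by n - 1 for a sample, then take the square root
sample_variance = squared_deviations / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("mean:", mean)
print("sum of squared deviations:", squared_deviations)
print("sample standard deviation:", round(sample_sd, 4))
print("statistics.stdev:", round(statistics.stdev(data), 4))
```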

Example 3.5

A professor teaches two large sections of marketing and randomly selects a sample of test scores from both sections. Find the range and standard deviation from each sample:

Section | Scores
Section 1 | 50 60 70 80 90
Section 2 | 72 68 70 74 66

Chap 3-25
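A possible way to work Example 3.5 is sketched below, assuming the scores are treated as samples so the standard deviation uses the n - 1 divisor. The helper name `range_and_sd` is illustrative.

```python
import statistics

section_1 = [50, 60, 70, 80, 90]
section_2 = [72, 68, 70, 74, 66]

def range_and_sd(scores):
    """Return (range, sample standard deviation) for a list of scores."""
    score_range = max(scores) - min(scores)
    sample_sd = statistics.stdev(scores)  # divides by n - 1
    return score_range, sample_sd

range_1, sd_1 = range_and_sd(section_1)
range_2, sd_2 = range_and_sd(section_2)

print("Section 1: range", range_1, "sd", round(sd_1, 2))
print("Section 2: range", range_2, "sd", round(sd_2, 2))
```

Both sections have a mean of 70, but Section 1 has the larger range and standard deviation, so its scores are more spread out around the same center.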

Chap 3-26

Chebyshev’s Theorem

This theorem establishes data intervals for any data set, regardless of the shape of the distribution

For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval

$$[\mu \pm k\sigma]$$

is at least

$$100\left[1 - \frac{1}{k^2}\right]\%$$

Chap 3-27

Chebyshev’s Theorem (continued)

Regardless of how the data are distributed, at least (1 - 1/k²) of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

At least | within
(1 - 1/1²) = 0% | k = 1 (μ ± 1σ)
(1 - 1/2²) = 75% | k = 2 (μ ± 2σ)
(1 - 1/3²) = 89% | k = 3 (μ ± 3σ)

Chap 3-28

The Empirical Rule

If the data distribution is bell-shaped, then the interval:

μ ± 1σ contains about 68% of the values in the population or the sample

[Figure: bell curve centered at μ with the region from μ - 1σ to μ + 1σ marked as 68%.]

Chap 3-29

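The Empirical Rule

μ ± 2σ contains about 95% of the values in the population or the sample

μ ± 3σ contains about 99.7% of the values in the population or the sample

[Figure: two bell curves, one with the region μ ± 2σ marked as 95% and one with the region μ ± 3σ marked as 99.7%.]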

Exercise 3.18

Use Chebyshev’s theorem to approximate the proportion of observations in each of the following intervals if the mean is 250 and the standard deviation is 20. Approximately what proportion of the observations is

a) Between 190 and 310?

b) Between 210 and 290?

c) Between 230 and 270?

Ans: a) 88.9% (k = 3); b) 75% (k = 2); c) 0% (k = 1)
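The exercise reduces to finding k for each interval and applying the bound 1 - 1/k². A minimal sketch follows, assuming each interval is symmetric about the mean; the function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2

mean, sd = 250, 20
intervals = [(190, 310), (210, 290), (230, 270)]

bounds = []
for low, high in intervals:
    # Number of standard deviations from the mean to the interval edge
    k = (high - mean) / sd
    bounds.append(chebyshev_lower_bound(k))
    print(f"{low} to {high}: k = {k:g}, at least {bounds[-1]:.1%}")
```

Note that these are lower bounds: for k = 1 the theorem guarantees nothing (0%), even though many real data sets have a large share of values within one standard deviation of the mean.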

Chap 3-30

Chap 3-31

Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows standard deviation relative to mean

Can be used to compare two or more sets of data measured in different units

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$

Chap 3-32

Comparing Coefficient of Variation

Stock A: Average price last year = $50, Standard deviation = $5

$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

Stock B: Average price last year = $100, Standard deviation = $5

$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
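The stock comparison can be reproduced with a one-line helper. This is a sketch only; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")
```

Because the CV is unit free, the same helper can compare series measured in different units, provided the means are positive and not close to zero.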

Chap 3-33

The Sample Covariance

The covariance measures the strength of the linear relationship between two variables

The population covariance:

$$\mathrm{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

The sample covariance:

$$\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

Only concerned with the strength of the relationship

No causal effect is implied

Chap 3-34

Interpreting Covariance

Covariance between two variables:

Cov(x,y) > 0: x and y tend to move in the same direction

Cov(x,y) < 0: x and y tend to move in opposite directions

Cov(x,y) = 0: x and y have no linear relationship (a zero covariance alone does not establish independence)

Chap 3-35

Correlation Coefficient

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

$$\rho = \frac{\mathrm{Cov}(x, y)}{\sigma_X \sigma_Y}$$

Sample correlation coefficient:

$$r = \frac{\mathrm{Cov}(x, y)}{s_X s_Y}$$

Chap 3-36

Features of Correlation Coefficient, r

Unit free

Ranges between –1 and 1

The closer to –1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker any linear relationship

Chap 3-37

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0.]

Chap 3-38

Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2

Students who scored high on the first test tended to score high on second test

[Figure: Scatter Plot of Test Scores. Horizontal axis: Test #1 Score; vertical axis: Test #2 Score; both axes run from 70 to 100.]

Example 3.13 (Covariance and Correlation Coefficient)

Rising Hills Manufacturing Inc. wishes to study the relationship between the number of workers, X, and the number of tables produced, Y. It has obtained a random sample of 10 hours of production. The following (x,y) combinations of points were obtained:

(12,20) (30,60) (15,27) (24,50) (14,21)

(18,30) (28,61) (26,54) (19,32) (27,57)

Compute the covariance and correlation coefficient. Briefly discuss the relationship between X and Y.
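One way to carry out the computation for Example 3.13 is sketched below. It applies the sample covariance and sample correlation formulas directly (n - 1 divisor) instead of relying on a library routine; the variable names are illustrative.

```python
import math

# (workers, tables) pairs from Example 3.13
points = [(12, 20), (30, 60), (15, 27), (24, 50), (14, 21),
          (18, 30), (28, 61), (26, 54), (19, 32), (27, 57)]

xs = [x for x, _ in points]
ys = [y for _, y in points]
n = len(points)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sample covariance: sum of cross-deviations divided by n - 1
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

# Sample standard deviations
s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

# Sample correlation coefficient
r = cov_xy / (s_x * s_y)

print("covariance:", round(cov_xy, 2))
print("correlation:", round(r, 3))
```

A correlation this close to +1 indicates a strong positive linear relationship: hours with more workers tend to produce more tables. As with any correlation, this does not by itself establish a causal effect.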

Chap 3-39

Chap 3-40

Obtaining Linear Relationships

An equation can be fit to show the best linear relationship between two variables:

Y = β0 + β1X

Where Y is the dependent variable and X is the independent variable

Chap 3-41

Least Squares Regression

Estimates for coefficients β0 and β1 are found to minimize the sum of the squared residuals

The least-squares regression line, based on sample data, is

$$\hat{y} = b_0 + b_1 x$$

Where b1 is the slope of the line and b0 is the y-intercept:

$$b_1 = \frac{\mathrm{Cov}(x, y)}{s_x^2} = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Example 3.15

Using the answers from Example 3.13 (Rising Hills Inc.), compute the sample regression coefficients b1 and b0.

What is the sample regression line?
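The regression coefficients for the Rising Hills data can be sketched as follows, using b1 = Cov(x, y) / s_x² and b0 = ȳ - b1·x̄. The block recomputes the covariance from the raw data so that it runs on its own; the variable names are illustrative.

```python
# (workers, tables) pairs from Example 3.13
points = [(12, 20), (30, 60), (15, 27), (24, 50), (14, 21),
          (18, 30), (28, 61), (26, 54), (19, 32), (27, 57)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Sample covariance and sample variance of x (both with n - 1 divisor)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)

b1 = cov_xy / var_x          # slope
b0 = mean_y - b1 * mean_x    # y-intercept

print(f"y-hat = {b0:.2f} + {b1:.3f} x")
```

The slope estimates the change in tables produced per additional worker. The negative intercept has no practical meaning here, because zero workers lies well outside the observed range of 12 to 30.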

Chap 3-42