
Chapter 3 Descriptive Statistics: Numerical Methods


Page 1: Chapter 3  Descriptive Statistics:  Numerical Methods

Copyright © 2010, HJ Shanghai Normal Uni.

Chapter 3 Descriptive Statistics: Numerical Methods

• Measures of Location
• Measures of Variability
• Measures of Relative Location and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association between Two Variables
• The Weighted Mean and Working with Grouped Data

Page 2: Chapter 3  Descriptive Statistics:  Numerical Methods

3.1 Measures of Location

• Mean
• Median
• Mode
• Percentiles
• Quartiles

Page 3: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Given below is a sample of monthly rent values ($) for one-bedroom apartments. The data are a sample of 70 apartments in a particular city, presented in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Page 4: Chapter 3  Descriptive Statistics:  Numerical Methods

Mean

The mean of a data set is the average of all the data values.

If the data are from a sample, the mean is denoted by $\bar{x}$:

$$\bar{x} = \frac{\sum x_i}{n}$$

If the data are from a population, the mean is denoted by $\mu$ (mu):

$$\mu = \frac{\sum x_i}{N}$$

Page 5: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Mean

$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
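The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the name `RENT_COUNTS` and the value-to-frequency encoding are illustrative choices made here to keep the 70 rents compact, not something taken from the slides.

```python
# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))

n = len(rents)           # sample size
total = sum(rents)       # sum of all x_i
sample_mean = total / n  # x-bar = (sum of x_i) / n

print(f"n = {n}, sum = {total}, mean = {sample_mean:.2f}")
# n = 70, sum = 34356, mean = 490.80
```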

Page 6: Chapter 3  Descriptive Statistics:  Numerical Methods

Median

The median is the measure of location most often reported for annual income and property value data.

A few extremely large incomes or property values can inflate the mean, whereas the median is largely unaffected by them.

Page 7: Chapter 3  Descriptive Statistics:  Numerical Methods

Median

The median of a data set is the value in the middle when the data items are arranged in ascending order.

• For an odd number of observations, the median is the middle value.
• For an even number of observations, the median is the average of the two middle values.

Page 8: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Median

Median = 50th percentile

$i = (p/100)n = (50/100)70 = 35$

Because $i$ is an integer, average the 35th and 36th data values:

Median = (475 + 475)/2 = 475
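The odd/even rule for the median can be sketched as a small function. The function name `median_by_position` and the `RENT_COUNTS` encoding of the rent data are illustrative assumptions, not part of the slides.

```python
# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))


def median_by_position(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # 35th and 36th when n = 70


print(median_by_position(rents))      # 475.0
print(median_by_position([3, 1, 2]))  # 2
```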

Page 9: Chapter 3  Descriptive Statistics:  Numerical Methods

Mode

The mode of a data set is the value that occurs with greatest frequency.

The greatest frequency can occur at two or more different values.

• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.

Page 10: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Mode

450 occurred most frequently (7 times).

Mode = 450
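One way to find the mode, and to notice bimodal or multimodal data, is to count frequencies and keep every value that reaches the maximum. The helper name `modes` and the `RENT_COUNTS` encoding below are illustrative assumptions.

```python
from collections import Counter

# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))


def modes(values):
    """Return (list of all most frequent values, their frequency)."""
    freq = Counter(values)
    top = max(freq.values())
    return sorted(v for v, c in freq.items() if c == top), top


print(modes(rents))            # ([450], 7)
print(modes([1, 1, 2, 2, 3]))  # ([1, 2], 2), a bimodal data set
```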

Page 11: Chapter 3  Descriptive Statistics:  Numerical Methods

Percentiles

A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.

Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Page 12: Chapter 3  Descriptive Statistics:  Numerical Methods

Percentiles

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

To compute the pth percentile:

• Arrange the data in ascending order.
• Compute the index i, the position of the pth percentile: i = (p/100)n
• If i is not an integer, round up. The pth percentile is the value in the ith position.
• If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
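The steps above can be sketched in Python as follows. The function name `percentile` is hypothetical, and restricting p to whole numbers strictly between 0 and 100 is an assumption made here so that the integer test on i avoids floating-point rounding. Note that library routines (for example NumPy's default) use interpolation rules that can give slightly different answers.

```python
# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))


def percentile(values, p):
    """pth percentile by the index rule i = (p/100) n, for integer 0 < p < 100."""
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be an integer strictly between 0 and 100")
    ordered = sorted(values)                 # step 1: ascending order
    whole, remainder = divmod(p * len(ordered), 100)  # step 2: i = whole + remainder/100
    if remainder:                            # i is not an integer: round up
        return ordered[whole]                # 1-based position whole + 1
    # i is an integer: average positions i and i + 1 (1-based)
    return (ordered[whole - 1] + ordered[whole]) / 2


print(percentile(rents, 90))  # 585.0 (average of the 63rd and 64th values)
print(percentile(rents, 75))  # 525   (i = 52.5 rounds up to position 53)
```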

Page 13: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

90th Percentile

$i = (p/100)n = (90/100)70 = 63$

Because $i$ is an integer, average the 63rd and 64th data values:

90th Percentile = (580 + 590)/2 = 585

Page 14: Chapter 3  Descriptive Statistics:  Numerical Methods

Quartiles

Quartiles are specific percentiles:

• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile

Page 15: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Third Quartile

Third quartile = 75th percentile

$i = (p/100)n = (75/100)70 = 52.5$, which rounds up to 53

Third quartile = 525 (the 53rd data value)

Page 16: Chapter 3  Descriptive Statistics:  Numerical Methods

Measures of Variability

It is often desirable to consider measures of variability (dispersion), as well as measures of location.

For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Page 17: Chapter 3  Descriptive Statistics:  Numerical Methods

Measures of Variability

• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

Page 18: Chapter 3  Descriptive Statistics:  Numerical Methods

Range

The range of a data set is the difference between the largest and smallest data values.

• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Page 19: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Range

Range = largest value - smallest value

Range = 615 - 425 = 190

Page 20: Chapter 3  Descriptive Statistics:  Numerical Methods

Interquartile Range

The interquartile range of a data set is the difference between the third quartile and the first quartile.

• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

Page 21: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Interquartile Range

3rd Quartile ($Q_3$) = 525

1st Quartile ($Q_1$) = 445

Interquartile Range = $Q_3 - Q_1$ = 525 - 445 = 80
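Both spread measures for the rent data can be reproduced with a short sketch. The helper `quartile_value` is a hypothetical name; it applies the position rule i = (p/100)n used in these slides (round up when i is not an integer, average positions i and i + 1 when it is), which is not the interpolation rule most libraries use by default.

```python
# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))


def quartile_value(ordered, p):
    """Percentile of sorted data by the slides' index rule (integer 0 < p < 100)."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]                         # round i up
    return (ordered[whole - 1] + ordered[whole]) / 2  # average positions i, i + 1


data_range = rents[-1] - rents[0]  # largest value - smallest value
q1 = quartile_value(rents, 25)     # 18th value
q3 = quartile_value(rents, 75)     # 53rd value
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 190 445 525 80
```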

Page 22: Chapter 3  Descriptive Statistics:  Numerical Methods

Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Page 23: Chapter 3  Descriptive Statistics:  Numerical Methods

Variance

The variance is the average of the squared differences between each data value and the mean.

If the data set is a sample, the variance is denoted by $s^2$:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

If the data set is a population, the variance is denoted by $\sigma^2$ (sigma squared):

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Page 24: Chapter 3  Descriptive Statistics:  Numerical Methods

Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily comparable to the mean than the variance is.

If the data set is a sample, the standard deviation is denoted $s$:

$$s = \sqrt{s^2}$$

If the data set is a population, the standard deviation is denoted $\sigma$ (sigma):

$$\sigma = \sqrt{\sigma^2}$$

Page 25: Chapter 3  Descriptive Statistics:  Numerical Methods

Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

If the data set is a sample, the coefficient of variation is computed as follows:

$$\frac{s}{\bar{x}} (100)$$

If the data set is a population, the coefficient of variation is computed as follows:

$$\frac{\sigma}{\mu} (100)$$

Page 26: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$

Standard Deviation

$$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$$

Coefficient of Variation

$$\frac{s}{\bar{x}} \times 100 = \frac{54.74}{490.80} \times 100 = 11.15$$
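The three results on this slide can be recomputed directly from the sample formulas. This is a minimal sketch; the variable names and the `RENT_COUNTS` encoding of the data are illustrative assumptions.

```python
import math

# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))

n = len(rents)
mean = sum(rents) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in rents) / (n - 1)
std_dev = math.sqrt(variance)    # same units as the data (dollars)
coeff_var = std_dev / mean * 100  # standard deviation as a percent of the mean

print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}, CV = {coeff_var:.2f}%")
# s^2 = 2996.16, s = 54.74, CV = 11.15%
```

The standard library offers `statistics.variance` and `statistics.stdev` for the same sample (n - 1) definitions; the explicit version is shown here only to mirror the formulas.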

Page 27: Chapter 3  Descriptive Statistics:  Numerical Methods

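Classroom Exercise

A study of the body weight of university students found that male students have a mean weight of 60 kg with a standard deviation of 5 kg, and female students have a mean weight of 50 kg with a standard deviation of 5 kg. Answer the following questions:

(1) Is the variation in weight greater among the male students or among the female students? Why?

(2) Find the mean and standard deviation of the weights in pounds (1 kg ≈ 2.2 lb).

(3) Roughly estimate what percentage of the male students weigh between 55 kg and 65 kg.

(4) Roughly estimate what percentage of the female students weigh between 40 kg and 60 kg.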

Page 28: Chapter 3  Descriptive Statistics:  Numerical Methods

Chapter 3 Descriptive Statistics: Numerical Methods

• Measures of Relative Location and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
• The Weighted Mean and Working with Grouped Data

Page 29: Chapter 3  Descriptive Statistics:  Numerical Methods

Measures of Relative Location and Detecting Outliers

• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers

Page 30: Chapter 3  Descriptive Statistics:  Numerical Methods

z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value $x_i$ is from the mean:

$$z_i = \frac{x_i - \bar{x}}{s}$$

• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.

Page 31: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

z-Score of Smallest Value (425)

$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
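The standardized values can be generated with a short sketch like the one below. It uses the unrounded sample standard deviation, so individual z-scores may differ in the last digit from values computed with the rounded s = 54.74; the names used are illustrative.

```python
import math

# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))

n = len(rents)
mean = sum(rents) / n
std_dev = math.sqrt(sum((x - mean) ** 2 for x in rents) / (n - 1))

# z_i = (x_i - x-bar) / s: distance from the mean in standard deviations.
z_scores = [(x - mean) / std_dev for x in rents]

print(f"smallest rent: z = {z_scores[0]:.2f}")   # -1.20
print(f"largest rent:  z = {z_scores[-1]:.2f}")  # 2.27
```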

Page 32: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: z-Scores for the Class-Size Data

Sample mean: 44; sample standard deviation: 8

Page 33: Chapter 3  Descriptive Statistics:  Numerical Methods

Chebyshev’s Theorem

At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.

• At least 75% of the items must be within z = 2 standard deviations of the mean.
• At least 89% of the items must be within z = 3 standard deviations of the mean.
• At least 94% of the items must be within z = 4 standard deviations of the mean.

Page 34: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Midterm Test Scores

The midterm test scores for 100 students in a college business statistics course had a mean of 70 and a standard deviation of 5. How many students had test scores between 60 and 80? How many students had test scores between 58 and 82?

60-80:

• $z_{60} = (60 - 70)/5 = -2$; $z_{80} = (80 - 70)/5 = 2$
• At least $(1 - 1/2^2) = 0.75$, or 75%, of the students have scores between 60 and 80.

58-82?
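A small sketch can apply the same reasoning to both intervals. The function names are hypothetical, and the helper assumes the interval is centered on the mean, as both intervals in this example are. Under that assumption, the 58-82 interval corresponds to z = 2.4 and a bound of about 82.6%.

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of any data set within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def bound_for_interval(mean, std_dev, low, high):
    """Chebyshev bound for an interval assumed to be centered on the mean."""
    z = min(mean - low, high - mean) / std_dev  # width in standard deviations
    return chebyshev_lower_bound(z)


for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.0%}")

print(f"60-80: at least {bound_for_interval(70, 5, 60, 80):.1%}")
print(f"58-82: at least {bound_for_interval(70, 5, 58, 82):.1%}")
```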

Page 35: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Chebyshev’s Theorem

Let $z = 1.5$ with $\bar{x} = 490.80$ and $s = 54.74$.

At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between

$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$

and

$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$

Page 36: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Chebyshev’s Theorem (continued)

Actually, 86% of the rent values are between 409 and 573.
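The gap between the guaranteed 56% and the observed 86% can be checked by counting. This sketch recomputes the mean and standard deviation rather than using the rounded slide values; the names are illustrative.

```python
import math

# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))

n = len(rents)
mean = sum(rents) / n
std_dev = math.sqrt(sum((x - mean) ** 2 for x in rents) / (n - 1))

z = 1.5
low, high = mean - z * std_dev, mean + z * std_dev
inside = sum(low <= x <= high for x in rents)  # rents within 1.5 s of the mean

guaranteed = 1 - 1 / z ** 2  # Chebyshev lower bound
observed = inside / n

print(f"interval: {low:.0f} to {high:.0f}")
print(f"guaranteed at least {guaranteed:.0%}, observed {observed:.0%}")
# interval: 409 to 573
# guaranteed at least 56%, observed 86%
```

Chebyshev's theorem holds for any distribution, so its bound is deliberately conservative; the observed share is usually higher.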

Page 37: Chapter 3  Descriptive Statistics:  Numerical Methods

Empirical Rule

For data having a bell-shaped distribution:

• Approximately 68% of the data values will be within one standard deviation of the mean.

Page 38: Chapter 3  Descriptive Statistics:  Numerical Methods

Empirical Rule

For data having a bell-shaped distribution:

• Approximately 95% of the data values will be within two standard deviations of the mean.

Page 39: Chapter 3  Descriptive Statistics:  Numerical Methods

Empirical Rule

For data having a bell-shaped distribution:

• Almost all (99.7%) of the items will be within three standard deviations of the mean.

Page 40: Chapter 3  Descriptive Statistics:  Numerical Methods


Page 41: Chapter 3  Descriptive Statistics:  Numerical Methods

Example: Apartment Rents

Empirical Rule

Interval                             % in Interval
Within +/- 1s (436.06 to 545.54)     48/70 = 69%
Within +/- 2s (381.32 to 600.28)     68/70 = 97%
Within +/- 3s (326.58 to 655.02)     70/70 = 100%
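The counts in this table can be reproduced with a short sketch that compares the rent data against the 68%, 95%, and 99.7% benchmarks. The names are illustrative, and the unrounded standard deviation is used, so interval endpoints may differ from the slide in the second decimal.

```python
import math

# Apartment rent sample from the slides, stored as value -> frequency
# (an illustrative encoding; expanding it gives the 70 sorted rents).
RENT_COUNTS = {
    425: 1, 430: 2, 435: 5, 440: 5, 445: 5, 450: 7, 460: 3, 465: 3, 470: 2,
    472: 1, 475: 3, 480: 4, 485: 1, 490: 3, 500: 4, 510: 2, 515: 1, 525: 3,
    535: 1, 549: 1, 550: 1, 570: 2, 575: 2, 580: 1, 590: 1, 600: 4, 615: 2,
}
rents = sorted(v for v, c in RENT_COUNTS.items() for _ in range(c))

n = len(rents)
mean = sum(rents) / n
std_dev = math.sqrt(sum((x - mean) ** 2 for x in rents) / (n - 1))

benchmarks = {1: 0.68, 2: 0.95, 3: 0.997}  # empirical-rule proportions
counts = []
for k, expected in benchmarks.items():
    low, high = mean - k * std_dev, mean + k * std_dev
    inside = sum(low <= x <= high for x in rents)
    counts.append(inside)
    print(f"+/- {k}s: {inside}/{n} = {inside / n:.0%} (rule: about {expected:.1%})")
```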

Page 42: Chapter 3  Descriptive Statistics:  Numerical Methods

Application: Six Sigma

"σ" is used to measure how far a quality characteristic deviates overall from its target value. The number of sigmas is a statistical scale for expressing quality, and any work procedure or manufacturing process can be rated by its number of sigmas.

Six sigma can be interpreted as 3.4 defects per million opportunities, that is, a conformance rate of 99.99966%. Three sigma, by contrast, gives a conformance rate of only 93.32%.

The six sigma management method treats all work as a process, uses quantitative methods to analyze the factors that affect quality within the process, and identifies the most critical factors to improve, thereby achieving higher customer satisfaction.

Page 43: Chapter 3  Descriptive Statistics:  Numerical Methods

Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

It might be:

• an incorrectly recorded data value
• a data value that was incorrectly included in the data set
• a correctly recorded data value that belongs in the data set

Page 44: Chapter 3  Descriptive Statistics:  Numerical Methods


Example: Apartment Rents

Detecting Outliers

The most extreme z-scores are -1.20 and 2.27.

Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
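The standardized values follow from z = (x - mean) / s applied to each rent. A minimal Python sketch of that calculation and of the |z| > 3 screen is given below; it is an illustration rather than part of the original slides, and the variable names are arbitrary.

```python
from statistics import mean, stdev

# The 70 apartment rents from the example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)
s = stdev(rents)  # sample standard deviation

# z-score of each observation: distance from the mean in standard deviations.
z_scores = [(x - x_bar) / s for x in rents]

# Flag values whose z-score lies beyond +/- 3.
outliers = [x for x, z in zip(rents, z_scores) if abs(z) > 3]

print("most extreme z-scores:", round(min(z_scores), 2), round(max(z_scores), 2))
print("outliers:", outliers)
```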

Page 45: Chapter 3  Descriptive Statistics:  Numerical Methods


Exploratory Data Analysis (探索性数据分析)

• Five-Number Summary (五数据概括法)
• Box Plot (箱形图)

Page 46: Chapter 3  Descriptive Statistics:  Numerical Methods


Five-Number Summary

• Smallest Value (最小值)
• First Quartile (第一四分位数)
• Median (中位数)
• Third Quartile (第三四分位数)
• Largest Value (最大值)

Page 47: Chapter 3  Descriptive Statistics:  Numerical Methods


Example: Apartment Rents

Five-Number Summary

Lowest Value = 425
First Quartile = 450
Median = 475
Third Quartile = 525
Largest Value = 615

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Page 48: Chapter 3  Descriptive Statistics:  Numerical Methods


Box Plot

A box is drawn with its ends located at the first and third quartiles.

A vertical line is drawn in the box at the location of the median.

Limits are located (not drawn) using the interquartile range (IQR).

• The lower limit is located 1.5(IQR) below Q1.
• The upper limit is located 1.5(IQR) above Q3.
• Data outside these limits are considered outliers.

Page 49: Chapter 3  Descriptive Statistics:  Numerical Methods


Box Plot (Continued)

Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

The location of each outlier is shown with the symbol *.

Page 50: Chapter 3  Descriptive Statistics:  Numerical Methods


Example: Apartment Rents

Box Plot

Lower Limit: Q1 - 1.5(IQR) = 450 - 1.5(75) = 337.5

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(75) = 637.5

There are no outliers.

[Figure: box plot of the apartment rents on a horizontal axis running from 375 to 625]
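The limit calculation and the outlier check for this example can be sketched in Python as follows. The quartiles Q1 = 450 and Q3 = 525 are taken as given from the slides rather than recomputed, because different quartile conventions can return slightly different values; the variable names are illustrative.

```python
# The 70 apartment rents from the example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# Quartiles as stated on the slides (assumed, not recomputed here).
q1, q3 = 450, 525
iqr = q3 - q1

# Limits are located 1.5(IQR) beyond the quartiles.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Outliers fall outside the limits; whiskers reach the extreme values inside them.
outliers = [x for x in rents if x < lower_limit or x > upper_limit]
inside = [x for x in rents if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print("limits:", lower_limit, upper_limit)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

Because no rent lies outside the limits, the whiskers extend all the way to the smallest and largest observations.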

Page 51: Chapter 3  Descriptive Statistics:  Numerical Methods


Measures of Association Between Two Variables

• Covariance (协方差)
• Correlation Coefficient (相关系数)

Page 52: Chapter 3  Descriptive Statistics:  Numerical Methods


Covariance

The covariance (协方差) is a measure of the linear association between two variables.

If the data sets are samples, the covariance is denoted by $s_{xy}$:

$$ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$

If the data sets are populations, the covariance is denoted by $\sigma_{xy}$:

$$ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} $$

Page 53: Chapter 3  Descriptive Statistics:  Numerical Methods


Covariance

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.
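A small worked example may make the sign interpretation concrete. The Python sketch below computes the sample and population covariance for five made-up (x, y) pairs; the data and the function names `sample_covariance` and `population_covariance` are hypothetical and do not come from the slides.

```python
def sample_covariance(xs, ys):
    """Sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def population_covariance(xs, ys):
    """Sum of (x_i - mu_x)(y_i - mu_y), divided by N."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

# Hypothetical data: y tends to rise with x, so the covariance is positive.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y))                    # 6 / 4 = 1.5
print(sample_covariance(x, [-v for v in y]))      # sign flips: -1.5
```

The magnitude of the covariance depends on the units of x and y, so only its sign is easy to interpret.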

Page 54: Chapter 3  Descriptive Statistics:  Numerical Methods


Correlation Coefficient (相关系数)

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

If the data sets are samples, the coefficient is $r_{xy}$:

$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$

If the data sets are populations, the coefficient is $\rho_{xy}$:

$$ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} $$
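The sample formula divides the sample covariance by the product of the two sample standard deviations. A minimal Python sketch on made-up data follows; the data and the function name `sample_correlation` are hypothetical, not from the slides.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """r_xy = s_xy / (s_x * s_y), all computed with n - 1 denominators."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data with a fairly strong positive linear relationship.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))

# A perfectly linear relationship gives a coefficient of +1 (up to rounding error).
r_linear = sample_correlation(x, [2 * v + 1 for v in x])
print(round(r_linear, 4))
```

Unlike the covariance, the coefficient has no units, which is why it can be compared across data sets.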

Page 55: Chapter 3  Descriptive Statistics:  Numerical Methods


The Weighted Mean and Working with Grouped Data

• Weighted Mean (加权平均值)
• Mean for Grouped Data (分组数据均值)
• Variance for Grouped Data (分组数据方差)
• Standard Deviation for Grouped Data (分组数据标准差)

Page 56: Chapter 3  Descriptive Statistics:  Numerical Methods


Weighted Mean (加权平均值)

When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.

In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.

When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Page 57: Chapter 3  Descriptive Statistics:  Numerical Methods


Weighted Mean

$$ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} $$

where:

$x_i$ = value of observation i

$w_i$ = weight for observation i
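As an illustration of the formula, the Python sketch below computes a GPA as a weighted mean, with credit hours as the weights. The grades and credit hours are made up for the example, and `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 3]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # (12 + 12 + 6) / 10 = 3.0
```

With equal weights the result reduces to the ordinary mean.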

Page 58: Chapter 3  Descriptive Statistics:  Numerical Methods


Grouped Data

The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.

To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.

We compute a weighted mean of the class midpoints using the class frequencies as weights.

Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Page 59: Chapter 3  Descriptive Statistics:  Numerical Methods


Mean for Grouped Data (分组数据均值)

Sample Data

$$ \bar{x} = \frac{\sum f_i M_i}{\sum f_i} $$

Population Data

$$ \mu = \frac{\sum f_i M_i}{N} $$

where:

$f_i$ = frequency of class i

$M_i$ = midpoint of class i

Page 60: Chapter 3  Descriptive Statistics:  Numerical Methods


Example: Apartment Rents

Given below is the previous sample of monthly rents for one-bedroom apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)    Frequency
420-439          8
440-459         17
460-479         12
480-499          8
500-519          7
520-539          4
540-559          2
560-579          4
580-599          2
600-619          6

Page 61: Chapter 3  Descriptive Statistics:  Numerical Methods


Example: Apartment Rents

Mean for Grouped Data

Rent ($)    f_i     M_i     f_i M_i
420-439       8    429.5     3436.0
440-459      17    449.5     7641.5
460-479      12    469.5     5634.0
480-499       8    489.5     3916.0
500-519       7    509.5     3566.5
520-539       4    529.5     2118.0
540-559       2    549.5     1099.0
560-579       4    569.5     2278.0
580-599       2    589.5     1179.0
600-619       6    609.5     3657.0
Total        70             34525.0

$$ \bar{x} = \frac{34{,}525}{70} = 493.21 $$

This approximation differs by $2.41 from the actual sample mean of $490.80.
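The grouped-mean calculation for this example can be sketched in Python as follows. The class boundaries and frequencies come from the frequency distribution; the variable names are illustrative.

```python
# (lower bound, upper bound, frequency) for each rent class.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

# Each class is represented by its midpoint M_i, weighted by its frequency f_i.
midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [f for _, _, f in classes]

n = sum(frequencies)                                       # sample size
total_fm = sum(f * m for f, m in zip(frequencies, midpoints))
grouped_mean = total_fm / n

print(n, total_fm, round(grouped_mean, 2))
```

The result is only an approximation because every rent in a class is treated as if it equalled the class midpoint.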

Page 62: Chapter 3  Descriptive Statistics:  Numerical Methods


Variance for Grouped Data (分组数据方差)

Sample Data

$$ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} $$

Population Data

$$ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} $$

Page 63: Chapter 3  Descriptive Statistics:  Numerical Methods


Example: Apartment Rents

Variance for Grouped Data

$$ s^2 = 3{,}017.89 $$

Standard Deviation for Grouped Data (分组数据标准差)

$$ s = \sqrt{3{,}017.89} = 54.94 $$

This approximation differs by only $0.20 from the actual standard deviation of $54.74.
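The variance and standard deviation above can be reproduced from the frequency distribution with the sample formula, using n as the sum of the class frequencies. A minimal Python sketch with illustrative variable names:

```python
from math import sqrt

# (class midpoint M_i, frequency f_i) for the ten rent classes.
grouped = [
    (429.5, 8), (449.5, 17), (469.5, 12), (489.5, 8), (509.5, 7),
    (529.5, 4), (549.5, 2), (569.5, 4), (589.5, 2), (609.5, 6),
]

n = sum(f for _, f in grouped)
x_bar = sum(f * m for m, f in grouped) / n   # grouped mean

# Frequency-weighted squared deviations of the midpoints, divided by n - 1.
s2 = sum(f * (m - x_bar) ** 2 for m, f in grouped) / (n - 1)
s = sqrt(s2)

print(round(s2, 2), round(s, 2))
```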

Page 64: Chapter 3  Descriptive Statistics:  Numerical Methods


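Summary

• Measures of central location: mean, median, mode
• Describing other locations in a data set: percentiles, quartiles
• Variability or dispersion: range, interquartile range, variance, standard deviation, coefficient of variation, z-scores, Chebyshev's theorem
• Constructing the five-number summary and the box plot
• Covariance and correlation coefficient between two variables
• Weighted mean; mean, variance, and standard deviation for grouped data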

Page 65: Chapter 3  Descriptive Statistics:  Numerical Methods


Page 66: Chapter 3  Descriptive Statistics:  Numerical Methods


Page 67: Chapter 3  Descriptive Statistics:  Numerical Methods


End of Chapter 3, Part B