Lesson Plan Representing data with graphs Representing data with statistics Homework: 1-8, 1-18, 2-6, 2-18 – p. 1/45

Lesson Plan - Statistical Sciencecm160/chapter2.pdfLesson Plan Representing data with graphs Representing data with statistics Homework: 1-8, 1-18, 2-6, 2-18 – p. 1/45 Data Are there


Lesson Plan

Representing data with graphs

Representing data with statistics

Homework:

1-8, 1-18, 2-6, 2-18


Data

Are there good ways and bad ways to represent data?

Yes: depending on the nature of the data, the representation may differ.

Therefore, we first need to understand what kinds of data we can possibly observe.

In order to do that, it is necessary to understand why data can be different!

A population is composed of individuals (in a general sense: they could be persons, financial stocks, nations, trees, etc.).

A sample is a collection of individuals taken from the population.

On each individual we can measure several different characteristics (variables).


Example

Name   Gender   Salary (thousands)   Education   # Family members
AA     F         50                  B           2
BB     M         43                  B           3
CC     F         65                  M           1
DD     M        200                  B           2
EE     M         60                  M           4
FF     M         25                  S           2
GG     F         15                  S           0
HH     F         80                  D           3
II     M         22                  S           1
JJ     F         69                  B           4
KK     F         70                  M           2


Where, in the Gender column,

M stands for male

F stands for female

and, in the Education column,

S stands for senior high

B stands for bachelor

M stands for master

D stands for doctorate

Do the data all have the same nature?

Can they all be represented in the same way?

Can they all be analyzed in the same way?


Terminology

A variable can usually be characterized according to many different criteria. We will say that variables can be

Qualitative (representing characteristics which cannot be naturally associated with a number)

  Ordinal (although not naturally associated with a number, the values can somehow be ordered)

  Non-ordinal

Quantitative (characteristics on which it is possible to apply arithmetic operations)

  Discrete (assuming values only on a discrete set, like the integers)

  Continuous


Gender

According to the data, there are 5 males and 6 females in the sample. A very simple representation is the bar chart.

[Figure: bar chart of Gender (F, M); vertical axis: Frequency, from 0 to 6]
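The frequencies behind the bar chart can be tallied directly. The sketch below is only an illustration in Python, not part of the original slides; the list `gender` is copied from the Gender column of the example table, and the text "bars" are a stand-in for the plot.

```python
from collections import Counter

# Gender column of the example table, in row order AA..KK
gender = ["F", "M", "F", "M", "M", "M", "F", "F", "M", "F", "F"]

# Count how many times each category occurs
frequency = Counter(gender)

for category in sorted(frequency):
    # One star per individual in the category
    print(category, frequency[category], "*" * frequency[category])
```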


Salary: frequencies

The analysis of the salary is a little more complicated. A natural approach would be to divide the data into categories, for example smaller or bigger than 100 (thousands).

This representation is called a histogram.

[Figure: histogram of Salary with two cells, 0 to 100 and 100 to 200; vertical axis: Frequency, from 0 to 10]


Salary: distribution

The analysis of the salary can also be conducted in relative terms.

[Figure: histogram of Salary on the density scale, with cells 0 to 100 and 100 to 200; vertical axis: Density, from 0.000 to 0.008]


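The construction of the distribution requires a little bit of attention, since it is based on the relative frequencies:

$$\text{relative frequency} = \frac{\text{frequency}}{\text{sum of all frequencies}}$$

$$f_{<100} = \frac{\#\,\text{salaries} < 100}{\#\,\text{salaries}} = \frac{10}{11}$$

$$f_{\geq 100} = \frac{\#\,\text{salaries} \geq 100}{\#\,\text{salaries}} = \frac{1}{11}$$

Then the relative frequencies must be distributed evenly along the cells:

$$\text{height}_{<100} = \frac{f_{<100}}{\text{base}_{<100}} = \frac{10}{11 \times (100 - 0)} \approx 0.009$$

$$\text{height}_{\geq 100} = \frac{f_{\geq 100}}{\text{base}_{\geq 100}} = \frac{1}{11 \times (200 - 100)} \approx 0.001$$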
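As a quick check of the arithmetic, the two heights can be recomputed from the raw salaries. This is a minimal sketch in Python, not something taken from the slides; the variable names are chosen here for illustration only.

```python
# Salaries (in thousands) from the example table
salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]
n = len(salary)

# Two cells: below 100, and 100 or above
count_low = sum(1 for s in salary if s < 100)
count_high = n - count_low

# Relative frequency = frequency / sum of all frequencies
f_low = count_low / n     # 10/11
f_high = count_high / n   # 1/11

# Height = relative frequency / base of the cell
height_low = f_low / (100 - 0)
height_high = f_high / (200 - 100)

print(round(height_low, 3), round(height_high, 3))  # 0.009 0.001
```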


There is an important property of the histogram built in relative terms:

The total area of the rectangles is equal to 1

$$\text{height}_{<100} \times \text{base}_{<100} + \text{height}_{\geq 100} \times \text{base}_{\geq 100} = \frac{10}{11 \times (100 - 0)} \times (100 - 0) + \frac{1}{11 \times (200 - 100)} \times (200 - 100) = 1$$

This property is worth remembering because it will be recalled later in the course.
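The property does not depend on the choice of cells. The sketch below checks it in Python for the two-cell scheme and for the finer schemes (0,50,100,150,200) and (0,20,...,200). The helper names `density_heights` and `total_area` are hypothetical, and the convention that cells are closed on the right is an assumption of this sketch.

```python
def density_heights(data, breaks):
    """Return the height of each histogram cell on the density scale.

    Cells are (a, b], except the first, which also includes its left end.
    That convention is an assumption made for this sketch.
    """
    n = len(data)
    heights = []
    for k, (a, b) in enumerate(zip(breaks, breaks[1:])):
        if k == 0:
            count = sum(1 for x in data if a <= x <= b)
        else:
            count = sum(1 for x in data if a < x <= b)
        # Relative frequency spread evenly over the base of the cell
        heights.append(count / (n * (b - a)))
    return heights


def total_area(data, breaks):
    """Sum of height x base over all cells."""
    heights = density_heights(data, breaks)
    return sum(h * (b - a) for h, a, b in zip(heights, breaks, breaks[1:]))


salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]

for breaks in ([0, 100, 200], [0, 50, 100, 150, 200], list(range(0, 201, 20))):
    print(len(breaks) - 1, "cells, total area =", round(total_area(salary, breaks), 10))
```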


Salary: frequencies (higher detail)

The number of cells is, however, arbitrary (there is no natural distinction as in the gender case).

For example, it is possible to group the data according to the following scheme: (0, 50, 100, 150, 200)

[Figure: histogram of Salary with cells 0 to 50, 50 to 100, 100 to 150 and 150 to 200; vertical axis: Frequency, from 0 to 5]


Salary: frequencies (even higher detail)

For example, it is possible to group the data according to the following scheme: (0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200)

[Figure: histogram of Salary with cells of width 20 from 0 to 200; vertical axis: Frequency, from 0 to 4]


Education

The education variable is qualitative, like gender. However, for gender there is no natural ordering.

For the education variable there seems to be a general ordering: from SENIOR HIGH to DOCTORATE level.

[Figure: bar chart of Education (SH, B, M, D); vertical axis: Frequency, from 0 to 4]


Number of family members

Finally, the variable number-of-family-members is quantitative, like the salary.

However, the number-of-family-members is restricted to the set of integer numbers.

[Figure: bar chart of Family Members (0, 1, 2, 3, 4); vertical axis: Frequency, from 0 to 4]


P-Percentile

Definition

The p-percentile is the observation within the sample such that p% of the remaining observations are equal to or lower than the p-percentile.

The position of the p-percentile is $\frac{p}{100}(n + 1)$

the 25-percentile is called the 1st quartile

the 50-percentile is called the median

the 75-percentile is called the 3rd quartile

It is important to notice that, in order to evaluate the percentile, the data must be qualitative ordinal or quantitative.


P-Percentile in practice (Salary)

For the salary

The original data are: (50 43 65 200 60 25 15 80 22 69 70)

The sorted data are: (15 22 25 43 50 60 65 69 70 80 200)

salary[0.25 × (11+1)] = salary[3] = 25

salary[0.5 × (11+1)] = salary[6] = 60

salary[0.75 × (11+1)] = salary[9] = 70

What if only 10 observations were available? A little bit more difficult!

IMPORTANT: the evaluation of the percentile may differ according to the book!!
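Software shows the same disagreement between conventions. As an illustration only (the slides do not mention any software), Python's standard `statistics.quantiles` offers two methods: `"exclusive"` places the p-percentile at position p/100 × (n + 1), the rule used here, while `"inclusive"` uses a different position rule and gives different quartiles for the same salaries.

```python
import statistics

salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]

# Position rule p/100 * (n + 1), as on the slide
q_exclusive = statistics.quantiles(salary, n=4, method="exclusive")

# A different position rule, treating the sample min and max as the
# 0th and 100th percentiles
q_inclusive = statistics.quantiles(salary, n=4, method="inclusive")

print(q_exclusive)  # [25.0, 60.0, 70.0]
print(q_inclusive)  # [34.0, 60.0, 69.5]
```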


P-Percentile (Number of family members)

For the number of family members

The original data are: (2 3 1 2 4 2 0 3 1 4 2)

The sorted data are: (0 1 1 2 2 2 2 3 3 4 4)

nfm[0.25 × (11+1)] = nfm[3] = 1

nfm[0.5 × (11+1)] = nfm[6] = 2

nfm[0.75 × (11+1)] = nfm[9] = 3

In some cases, the p-percentile does not coincide exactly with an observation. This can be a problem for quantitative discrete data.


P-Percentile (Tricky Case)

Suppose you have the following data

y = 1, 3, 6, 8

What are q1, the median and q3?

According to the percentile formula, the locations are

0.25 × 5 = 1.25

0.5 × 5 = 2.5

0.75 × 5 = 3.75

Therefore

q1 = 1 + (3 − 1) × 0.25 = 1.5 (or between 1 and 3)

median = 3 + (6 − 3) × 0.5 = 4.5 (or between 3 and 6)

q3 = 6 + (8 − 6) × 0.75 = 7.5 (or between 6 and 8)
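The interpolation above can be written as a small function. This is a minimal sketch in Python under the position rule p/100 × (n + 1); the function name `percentile` and the clamping of positions that fall outside the sample are assumptions of the sketch, not part of the slides.

```python
def percentile(data, p):
    """p-th percentile using position p/100 * (n + 1) and linear interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = p / 100 * (n + 1)   # 1-based, possibly fractional
    lower = int(position)          # rank of the observation just below
    fraction = position - lower    # how far to move towards the next one
    # Clamp positions outside the sample (an assumption of this sketch)
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + (above - below) * fraction


y = [1, 3, 6, 8]
print(percentile(y, 25), percentile(y, 50), percentile(y, 75))  # 1.5 4.5 7.5
```

When the position is a whole number, the fraction is 0 and the function returns an observation of the sample, as in the salary example with 11 observations.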

What about qualitative ordinal and quantitative discrete variables?


The Box-plot

The box plot is a particular graphical representation which combines

min

q1

median

q3

max

The box plot cannot be built for the Gender variable.

The Education variable is ordinal. Therefore it is in theory possible to determine q1, median, q3, min, max. However, the spatial connotation would be arbitrary.
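For the two quantitative variables, the five numbers that the box plot combines can be computed as follows. This is a sketch in Python; `five_number_summary` is a hypothetical helper, and it relies on `statistics.quantiles` with `method="exclusive"`, which uses the p/100 × (n + 1) position rule.

```python
import statistics


def five_number_summary(data):
    """Return (min, q1, median, q3, max) for quantitative data."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, median, q3, max(data)


salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]
family = [2, 3, 1, 2, 4, 2, 0, 3, 1, 4, 2]

print(five_number_summary(salary))  # (15, 25.0, 60.0, 70.0, 200)
print(five_number_summary(family))  # (0, 1.0, 2.0, 3.0, 4)
```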


Box-plot for Salary

[Figure: box plot of Salary; axis from 50 to 200]


Box-plot for Number-of-family-members

[Figure: box plot of Family Members; axis from 0 to 4]


Describing a distribution with numbers

So far we have learned how to represent a distribution with graphs.

Graphs are good at “marketing” your analysis, but not necessarily the best way to achieve a deep understanding of the phenomenon.

On the other hand, a table with lots of numbers gives you lots of information but it is not “appealing”.

What is the right way?

It depends mostly on the person you are talking to.


Center of a Distribution

The most obvious starting point is the center of the distribution.

Suppose I want to buy a house in San Francisco and I have no perspective, no clue, not even a vague idea of what the price for a studio is. What should I do?

I should probably contact a friend in San Francisco and ask: on average, how much should I expect to pay if I want to buy a decent studio?

I am basically asking what the center of the distribution is!


Terminology

We will say that

A parameter is a descriptive measure of a population.

A statistic is a descriptive measure of a sample.

In many cases we will say (later in the course) that statistics are approximations of parameters.

Some approximations will be better than others.

We will see which are good approximations.

For the moment just know that we are focusing on the sample part.


Mode

The mode of a variable is the most frequent observation occurring in the data set.

Mode(gender) = FEMALE [6(F) and 5(M)]

Mode(education) = BACHELOR [4(B), 3(S), 3(M), 1(D)]

Mode(# family members) = 2 [4(2), 2(1), 2(3), 2(4), 1(0)]

If you look at the graphs you will realize that the modes usually coincide with the peaks of the graphical representation.

What about Mode(salary)?

Because the variable is continuous, we need to group the data. But this is an arbitrary operation, therefore the mode will depend on the choice of the groups.
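The modes listed above, and the difficulty with the salary, can be reproduced with a short count. The sketch below is an illustration in Python; `modes` is a hypothetical helper that returns every value reaching the highest frequency, so ties are visible.

```python
from collections import Counter

# Columns of the example table, in row order AA..KK
gender = ["F", "M", "F", "M", "M", "M", "F", "F", "M", "F", "F"]
education = ["B", "B", "M", "B", "M", "S", "S", "D", "S", "B", "M"]
family = [2, 3, 1, 2, 4, 2, 0, 3, 1, 4, 2]
salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]


def modes(values):
    """Return every value that attains the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes(gender))     # ['F']
print(modes(education))  # ['B']
print(modes(family))     # [2]

# Every salary occurs exactly once, so all 11 values tie for "most frequent"
print(len(modes(salary)))  # 11
```

The last line is the reason the ungrouped salary has no useful mode: without grouping, every observation is equally frequent.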


Salary Mode: TRICKY

[Figure: two density histograms of Salary built with different groupings; horizontal axis from 0 to 200, vertical axis: Density]

What is the mode?

Example of suggesting a fact which is not “TRUE”


Mean

Given that $y_1, y_2, \ldots, y_n$ are the n observations of a variable within a sample, the mean is defined to be

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

What about the mean of the gender and education?

They cannot be computed because they are qualitative variables.

You cannot apply arithmetic operations on words!!

$$\text{mean(\# family members)} = \frac{2+3+1+2+4+2+0+3+1+4+2}{11} = 2.18$$

2.18 people doesn’t make much sense! But it is not a big deal after all.

mean(salary) = ? Again it can be tricky!!!


Salary Mean: TRICKY

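If the data are NOT grouped, then the mean is equal to

$$\frac{50 + 43 + 65 + 200 + 60 + 25 + 15 + 80 + 22 + 69 + 70}{11} \approx 63.54$$

If the data are grouped, each cell is represented by its midpoint (10, 30, 50, 70, 130, 190):

                 0-20   20-40   40-60   60-80   80-180   180-200
n_i               1      2       3       4       0        1
n_i/n            .09    .18     .27     .36      0       .09
n_i/(n×base)     .005   .009    .014    .018     0       .005

$$\frac{1 \times 10 + 2 \times 30 + 3 \times 50 + 4 \times 70 + 0 \times 130 + 1 \times 190}{11} = 62.73$$

$$\frac{.09 \times 10 + .18 \times 30 + .27 \times 50 + .36 \times 70 + 0 \times 130 + .09 \times 190}{1} = 62.1$$

$$\frac{.005 \times 10 + .009 \times 30 + .014 \times 50 + .018 \times 70 + 0 \times 130 + .005 \times 190}{1} \times 20 = 64.6$$

The three grouped results differ only because the relative frequencies and the densities in the table are rounded.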
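The ungrouped and grouped means can be compared in a few lines. This is a sketch in Python, not the author's own computation; the cell limits and counts are copied from the table, and the rounding to two decimals mimics the rounded relative frequencies shown there.

```python
salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]
n = len(salary)

# Ungrouped mean: uses every observation as it is
mean_raw = sum(salary) / n

# Grouped data: cell limits and counts from the table
cells = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 180), (180, 200)]
counts = [1, 2, 3, 4, 0, 1]
midpoints = [(a + b) / 2 for a, b in cells]

# Grouped mean: each observation is replaced by the midpoint of its cell
mean_grouped = sum(c * m for c, m in zip(counts, midpoints)) / n

# Same computation with relative frequencies rounded to two decimals
rel_freq = [round(c / n, 2) for c in counts]
mean_rounded = sum(f * m for f, m in zip(rel_freq, midpoints))

print(round(mean_raw, 1), round(mean_grouped, 1), round(mean_rounded, 1))
# 63.5 62.7 62.1
```

Grouping loses information (63.5 becomes 62.7), and rounding the relative frequencies moves the result a little further (62.1).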


Median

The median is a statistic based on the order of the observations.

It is a particular case of percentile: the 50-percentile.

Median(education) = Bachelor, since [S S S B B (B) B M M M D]

Median(# family members) = 2, since [0 1 1 2 2 (2) 2 3 3 4 4]

median(gender) = ?

It cannot be evaluated because the variable is not ordinal.

median(Salary) = ? As usual, it takes extra care.


Salary Median: TRICKY

If the data are NOT grouped, then the median is 60, since [15 22 25 43 50 (60) 65 69 70 80 200]

If the data are grouped:

                              0-20    20-40   [40-60]   60-80   80-180   180-200
n_i                            1       2        3        4       0        1
n_i/n                         .09     .18      .27      .36      0       .09
n_i/(n×base)                  .005    .009     .014     .018     0       .005
Σ(i=1..k) n_i                  1       3       (6)      10      10       11
Σ(i=1..k) n_i/n               0.09    0.27    (0.54)    0.90    0.90     1*
Σ(i=1..k) n_i/(n×base)        0.005   0.014   (0.028)   0.046   0.046    0.050*
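The cumulative row of counts, and the cell that contains the median, can be obtained mechanically. The sketch below is an illustration in Python; it assumes the median sits at position 0.5 × (n + 1) among the sorted observations, as in the percentile rule used earlier.

```python
from itertools import accumulate

cells = ["0-20", "20-40", "40-60", "60-80", "80-180", "180-200"]
counts = [1, 2, 3, 4, 0, 1]
n = sum(counts)

# Running totals of the counts
cumulative = list(accumulate(counts))

# Position of the median among the sorted observations
median_position = 0.5 * (n + 1)

# The median cell is the first one whose cumulative count reaches that position
median_cell = next(
    cell for cell, total in zip(cells, cumulative) if total >= median_position
)

print(cumulative)   # [1, 3, 6, 10, 10, 11]
print(median_cell)  # 40-60
```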


Left Skewness

[Figure: histogram of a left-skewed variable x, with the mean, median and mode marked; vertical axis: Frequency]


Right Skewness

[Figure: histogram of a right-skewed variable x, with the mean, median and mode marked; vertical axis: Frequency]


Symmetry

[Figure: histogram of a symmetric variable x, with the mean, median and mode marked; vertical axis: Frequency]


When to use what

Mean: quantitative data whose frequency distribution is approximately symmetric

Median: quantitative data whose frequency distribution is skewed (left or right)

Mode: when the most frequent observation is the desired measure of central tendency, or when the data are qualitative


Spread of a Distribution

When describing a distribution, the indication of the “center” may not be sufficient.

In many cases we want to know how much the data are dispersed.

For example: financial analysts are usually interested in the expected returns (think of it as the mean) and the risk (think of it as the spread).

In this sense, the “spread” indicates how much uncertainty characterizes the expected return.

Consider two hypothetical stock returns at times 1, 2, 3, 4, 5:

-2,-1,0,1,2 with mean 0

-200,-100,0,100,200 with mean 0
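A quick computation makes the point: the two series share the same center but not the same dispersion. This is only a sketch in Python; the distance between the largest and the smallest return is used here as one simple way to quantify the spread.

```python
import statistics

returns_a = [-2, -1, 0, 1, 2]
returns_b = [-200, -100, 0, 100, 200]

for name, returns in (("A", returns_a), ("B", returns_b)):
    center = statistics.mean(returns)
    # Distance between the largest and the smallest return
    spread = max(returns) - min(returns)
    print(name, "mean =", center, "max - min =", spread)
```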


Range

The range is simply defined as

range = max − min

range(salary) = 200 − 15 = 185

range(# family members) = 4 − 0 = 4

range(gender) = not possible

range(education) = not possible


Inter-quartile Range

The Inter-quartile range is simply defined as

IQR = q3 − q1

where q1 and q3 are the 1st and 3rd quartiles. Therefore

IQR(salary) = 70 − 25 = 45

IQR(# family members) = 3 − 1 = 2

IQR(gender) = not possible

IQR(education) = not possible
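Both the range and the inter-quartile range are easy to verify numerically. The sketch below is an illustration in Python; `value_range` and `iqr` are hypothetical helper names, and the quartiles come from `statistics.quantiles` with `method="exclusive"`, which follows the p/100 × (n + 1) position rule.

```python
import statistics


def value_range(data):
    """Range = max - min."""
    return max(data) - min(data)


def iqr(data):
    """Inter-quartile range = q3 - q1."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1


salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]
family = [2, 3, 1, 2, 4, 2, 0, 3, 1, 4, 2]

print(value_range(salary), iqr(salary))  # 185 45.0
print(value_range(family), iqr(family))  # 4 2.0
```

The single salary of 200 stretches the range to 185 while the IQR stays at 45, which shows how much more sensitive the range is to one extreme value.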


Variance

The variance, $s^2$, is computed as the sum of the squared deviations about the mean, $\bar{x}$, divided by (n − 1).

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

variance(gender) = not possible!!

variance(education) = not possible!!

variance(# family members) = [(2 − 2.18)^2 + (3 − 2.18)^2 + (1 − 2.18)^2 + (2 − 2.18)^2 + (4 − 2.18)^2 + (2 − 2.18)^2 + (0 − 2.18)^2 + (3 − 2.18)^2 + (1 − 2.18)^2 + (4 − 2.18)^2 + (2 − 2.18)^2]/(11 − 1) = 1.56
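The formula translates almost word for word into code. This is a minimal sketch in Python; `sample_variance` is a hypothetical helper written for illustration, and the comparison with the standard library's `statistics.variance` (which also divides by n − 1) is just a cross-check.

```python
import statistics


def sample_variance(data):
    """Sum of squared deviations about the mean, divided by (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


family = [2, 3, 1, 2, 4, 2, 0, 3, 1, 4, 2]

print(round(sample_variance(family), 2))      # 1.56
print(round(statistics.variance(family), 2))  # 1.56
```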


Salary Variance

If data are not grouped

variance(salary) = [(50 − 63.54)^2 + (43 − 63.54)^2 + (65 − 63.54)^2 + (200 − 63.54)^2 + (60 − 63.54)^2 + (25 − 63.54)^2 + (15 − 63.54)^2 + (80 − 63.54)^2 + (22 − 63.54)^2 + (69 − 63.54)^2 + (70 − 63.54)^2]/(11 − 1) = 2515.07

If data are grouped, please read the book!!!!

The idea is simple and similar to the mean: you need to approximate each group by its central value!


Standard Deviation

The standard deviation is defined as the square root of the variance:

sd(gender) = not possible!!

sd(education) = not possible!!

sd(# family members) = sqrt(1.56) = 1.25

sd(salary) = sqrt(2515.07) = 50.15 (if data are not grouped)

IMPORTANT: why do we need the standard deviation if we already have the variance?


Degrees of freedom

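The degrees of freedom represent the effective number of values free to vary in the computation of a statistic??!!!

What does that mean?

What is the variance of a sample characterized by one observation (e.g. y = 3)? Why divide by (n − 1) and not n?

Dividing by n = 1 gives

$$\text{var}(y) = \frac{(3 - \bar{y})^2}{1}$$

and, since $\bar{y} = \frac{3}{1} = 3$,

$$\text{var}(y) = \frac{(3 - 3)^2}{1} = 0$$

Essentially, we cannot use y = 3 twice: once to compute the mean and again to measure the deviation from it.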
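The one-observation case can be tried out directly. The sketch below uses Python's standard `statistics` module purely as an illustration: `pvariance` divides by n, while `variance` divides by n − 1 and refuses to run with fewer than two observations.

```python
import statistics

y = [3]  # a sample with a single observation

# Dividing by n = 1: the only deviation from the mean is 3 - 3 = 0
variance_over_n = statistics.pvariance(y)
print(variance_over_n)

# Dividing by n - 1 = 0 is not possible, and statistics.variance says so
try:
    statistics.variance(y)
    sample_variance_defined = True
except statistics.StatisticsError as error:
    sample_variance_defined = False
    print("sample variance undefined:", error)
```

A result of 0 would suggest that there is no variability at all, whereas a single observation simply carries no information about the spread.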


Empirical Rule

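If a distribution is roughly bell shaped, then

Approximately 68% of the observations lie within one standard deviation of the mean

Approximately 95% of the observations lie within two standard deviations of the mean

Approximately 99.7% of the observations lie within three standard deviations of the mean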
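These percentages are the areas under the normal curve within one, two and three standard deviations of the mean. As an illustration (the slides do not derive them), they can be recovered in Python with `statistics.NormalDist`; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 68%, 95% and 99.7%
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```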


[Figure: bell-shaped density curve over −4 to 4, with the central regions containing 68%, 95% and 99.7% of the area marked; vertical axis: Density, from 0.0 to 0.4]


Outliers

The outliers are “strange” values which do not seem to be in accordance with the rest of the distribution.

Think about the salary variable. One person earns $200,000, much more than the rest of the people in the sample.

Generally we can consider this observation an outlier.

As a general rule, we can define an outlier to be any value such that

y_i < q1 − 1.5 × IQR

or

y_i > q3 + 1.5 × IQR

In the salary variable it turns out that these values are

25 − 1.5 × (70 − 25) = −42.5

70 + 1.5 × (70 − 25) = 137.5 < 200 !!!!!
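The rule can be applied mechanically to any quantitative variable. This is a sketch in Python; `outliers` is a hypothetical helper, and the quartiles are computed with `statistics.quantiles` using `method="exclusive"`, which follows the p/100 × (n + 1) position rule.

```python
import statistics


def outliers(data):
    """Return (lower fence, upper fence, values outside the fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, flagged


salary = [50, 43, 65, 200, 60, 25, 15, 80, 22, 69, 70]
print(outliers(salary))  # (-42.5, 137.5, [200])
```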


Linear Transformation

Please read the book!
