GIST141B, Fall, 1999 Revised 8/11/99

Modeling Populations

Introduction to the Normal Distribution

A Look Ahead: What you will be able to do after this lecture

Understand two reasons why experiments provide an imperfect window on reality

Understand and describe: population, population parameter, sample

Given a statement involving a statistical inference, identify the population to which it applies

Distinguish between a statistical inference and a statement that merely summarizes the results of a study.

Experiments: The Scientist’s Imperfect Window on Reality

[Figure labels: “The way things really are”; “An Experiment”]


Two Reasons That the Window is Imperfect

1 The data exhibit variation

2 The experimental data are an incomplete sample of the real system

A goal of science: To use our data to make generalizations about the way things really are.


The Effects of Variation and Sampling


Definitions

Population: The collection of all entities or quantities about which you wish to make a generalization.

Population parameter: The particular characteristic of the population that you wish to study; the quantity whose value you wish to estimate.

Sample: A subset of the population which you select and then measure; the source of the data that you will use to estimate the population parameter.
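The three definitions can be illustrated with a small simulation. This is only a sketch: the population below is made up (10,000 simulated measurements with an arbitrary mean and spread), and the names `population`, `sample`, and the seed are hypothetical choices, not part of the lecture.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up measurements (illustrative only).
rng = random.Random(0)
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Population parameter: computed from every entity in the population.
population_mean = statistics.fmean(population)

# Sample: a randomly selected subset that we actually measure.
sample = rng.sample(population, 100)

# Sample estimate of the population parameter.
sample_mean = statistics.fmean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean, which is the "imperfect window" described earlier.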


The Population: What “Really is”

The Sample: What We See


Example #1

A researcher is trying to determine the average income of JMU students one year after graduation. He randomly selects 100 students from each graduating class, then contacts them and obtains their annual income.

Population: All JMU graduates one year past graduation

Population parameter: Average annual income

Sample: 100 graduates from each class


Example #2

A chemical engineer is evaluating a new chemical process for making nylon polymer. He wishes to determine the average tensile strength of nylon yarn that is produced by the new process. He manufactures 500 samples of nylon yarn and measures the tensile strength of each sample.

Population: All nylon yarn that will be made with the new process

Population parameter: Average tensile strength

Sample: 500 samples of nylon yarn


Example #3

An environmental scientist is trying to determine the rate at which stratospheric ozone is depleting over the South Pole. Over a five-year period, she measures ozone levels by means of instrumentation on a weather balloon that is released daily from a research station near the South Pole.

Population: The stratosphere over the South Pole

Population parameter: The rate at which ozone is decreasing

Sample: Daily samples over a station near the South Pole (approximately 1825 samples)


Two Different Types of Populations

A collection of distinct entities: people, bacteria, etc.

A theoretical or existing system: the stratosphere, a new chemical process

Definition: A statistical inference is a generalization about the population that is made from a sample of data.

Example 1
Sample data: A particular pair of shoes was purchased from a shoe store at a price much lower than the same pair would have cost elsewhere.
Generalization: Any shoes purchased from this store will be cheaper than they would be elsewhere. (Population: All the shoes purchased from this store)

Example 2
Sample data: Professor Snigglehopper gave easy tests in English 102.
1st Generalization: Professor Snigglehopper will give easy tests in English 425. (Population: All of Professor Snigglehopper’s tests)
2nd Generalization: All English professors give easy tests. (Population: All tests given by all English profs)


Exercise

Galileo rolled balls down inclined ramps and measured the distance covered in a fixed time. Suppose he did 25 replicate runs on a ramp with an incline of 5°. Based on these runs, Galileo found the average distance traveled in one second was 0.41 meters. For each statement below, determine whether or not it is a statistical inference.
• “On average, a ball will roll 0.41 meters down a 5° incline in 1 second.”
• “The balls traveled an average of 0.41 meters down the incline in 1 second in my experiments.”
• “A ball will generally roll about 0.4 meters down a 5° incline in 1 second.”
• “A ball will generally roll somewhere between 0.1 and 7 meters down a 5° incline in 1 second.”
• “The balls did not all travel exactly the same distance in 1 sec.”

Wrap-up: What you should be able to do

Understand two reasons why experiments provide an imperfect window on reality

Understand and describe: population, population parameter, sample

Given a statement involving a statistical inference, identify the population to which it applies

Distinguish between a statistical inference and a statement that merely summarizes the results of a study.


The Normal Distribution Model

Basic Theory

A Look Ahead: What you will be able to do after this lecture

Explain the importance of the normal distribution for modeling populations

State the properties of the normal distribution
• Bell-shaped, symmetric
• Areas under the curve represent proportions of the population
• Bell centered over μ
• Width of bell determined by σ
• N(μ,σ) notation

The Normal Distribution Model

A mathematical model for describing the likelihood of getting particular measured values from the population whenever you take a sample
• Accounts for the variability in the population
• Uses probability to model the randomness in the population
• Provides a basis for making statistical inferences from sample data
  • Predictions about the overall makeup of the population
  • Inferences about the population mean
  • Comparisons of means of two or more populations

Example: Verbal SAT scores for the Class of 2000.

[Figure: histogram “Verbal SAT Scores: JMU Class of 2000”. X-axis: score bins from 275 to 299 through 775 to 799; y-axis: 0 to 700. Data from the first 14 students.]

Example: Verbal SAT scores for the Class of 2000.

[Figure: histogram “Verbal SAT Scores: JMU Class of 2000”. X-axis: score bins from 275 to 299 through 775 to 799; y-axis: # Students (0 to 700). Data from all 3165 students.]

Example: Verbal SAT scores for the Class of 2000.

[Figure: histogram “Verbal SAT Scores: JMU Class of 2000”. X-axis: score bins from 275 to 299 through 775 to 799; y-axis: # Students (0 to 700).]

This is the shape of the normal distribution. It models the shape of this histogram.

The Normal Distribution: Models the shape of many different populations

Dependent variable (Y) = probability density
• Larger values indicate values that are more common
• Values near zero indicate values that are uncommon

Independent variable (X) = possible value in the population (e.g., Verbal SAT score)

Parameters: μ (population mean) and σ (population standard deviation)

Model:

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

where
σ = population standard deviation
μ = population mean

μ and σ are based on measuring every entity in the population (i.e., not based on sample data)
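The normal model can be evaluated directly in code. The following is a minimal sketch; the function name `normal_density` and the choice of the standard normal N(0,1) for the demonstration values are illustrative assumptions, not something taken from the slides.

```python
import math

def normal_density(x, mu, sigma):
    """Probability density Y of the N(mu, sigma) model at the value x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Illustrative check on N(0, 1): the density peaks at the mean
# and is the same one standard deviation below and above it.
at_mean = normal_density(0.0, 0.0, 1.0)
one_below = normal_density(-1.0, 0.0, 1.0)
one_above = normal_density(1.0, 0.0, 1.0)
print(at_mean, one_below, one_above)
```

Larger returned values correspond to values of X that are more common in the population; values near zero correspond to uncommon values of X.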

The Normal Distribution: Models the shape of many different populations

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

[Figure: “The Normal Distribution”. X-axis: possible values in the population; y-axis: probability density.]

Important Facts About the Normal Distribution

Total area under the normal curve is equal to 1.0 (100% of the population).

[Figure: “The Normal Distribution”, labeled “Area = 1.0 (100% of the population)”. X-axis: possible values in the population; y-axis: probability density.]

Important Facts About the Normal Distribution

Area = proportion of the population with values between A and B.

[Figure: “The Normal Distribution” with points A and B marked on the X-axis. X-axis: possible values in the population; y-axis: probability density.]

Important Facts About the Normal Distribution

“Bell” is centered over the population mean.

Values near the mean are much more common than values far from the mean.

[Figure: “The Normal Distribution” with labels “Most likely range of values” and “Least likely range of values”. X-axis: possible values in the population; y-axis: probability density.]

Important Facts About the Normal Distribution

Values below the mean are just as likely as values above the mean.

[Figure: “The Normal Distribution”. X-axis: possible values in the population; y-axis: probability density.]

Important Facts About the Normal Distribution

68.3% of the population is within ONE standard deviation from the mean.

[Figure: “Normal Distribution” with the central 68% marked between 1 standard deviation below and 1 standard deviation above the mean. X-axis: possible values in the population; y-axis: probability density.]

Important Facts About the Normal Distribution

95.4% of the population is within TWO standard deviations from the mean; 95% is within 1.96 standard deviations.

[Figure: “Normal Distribution” with the central 95% marked between 1.96 standard deviations below and 1.96 standard deviations above the mean. X-axis: possible values in the population; y-axis: probability density.]

Important Facts About the Normal Distribution

99.7% of the population is within THREE standard deviations from the mean.

[Figure: “Normal Distribution” with the central 99.7% marked between 3 standard deviations below and 3 standard deviations above the mean. X-axis: possible values in the population; y-axis: probability density.]
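The proportions within one, two, and three standard deviations can be checked numerically. This sketch uses the standard identity that the proportion of a normal population within k standard deviations of the mean equals erf(k/√2); the function name `proportion_within` is a hypothetical helper.

```python
import math

def proportion_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
# within 1 SD: 0.6827
# within 1.96 SD: 0.9500
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

These proportions do not depend on μ or σ, which is why the same percentages apply to any normally distributed population.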


Exercise

JMU’s class of 2000 has verbal SAT scores that follow a normal distribution with a mean of 584 and a standard deviation of 67. Give a range of SAT scores within which 95% of the scores from the class of 2000 will fall.

A: 95% will have scores between 517 & 651

B: 95% will have scores between 450 & 718

C: 95% will have scores between 555 & 613

D: I haven’t got a clue

Important Facts About the Normal Distribution

The population standard deviation (σ) determines the width of the bell.

The larger the standard deviation, the wider the bell.

[Figure: “The Normal Distribution”, two curves with standard deviations σ1 and σ2, where σ1 > σ2. X-axis: possible values in the population; y-axis: probability density.]

Notation for the Normal Distribution

N(μ,σ) refers to a normal distribution model with a mean of μ and a standard deviation of σ:

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

μ determines the location of the bell

σ determines the width of the bell

[Figure: “Normal Distribution: N(μ,σ)”. X-axis: possible values in the population; y-axis: probability density.]

Example Terminology
The population of Verbal SAT scores for the JMU Class of 2000 follows a N(584,67) distribution

$$Y = \frac{1}{67\sqrt{2\pi}}\, e^{-\frac{(\mathrm{SAT}-584)^2}{2(67)^2}}$$

[Figure: histogram “Verbal SAT Scores: JMU Class of 2000”. X-axis: score bins from 275 to 299 through 775 to 799; y-axis: # Students (0 to 700). Legend: sample data from the class of 2000.]

Example Terminology
Verbal SAT scores for the JMU Class of 2000 follow a N(584,67) distribution

NOTATION: “Verbal SAT scores ~ N(584,67)” means “The population of verbal SAT scores follows a N(584,67) distribution.”

Match μ and σ to the correct distribution

[Figure: “The Normal Distribution”, several normal curves. X-axis: possible values in the population (-2 to 6); y-axis: probability density.]

N(3,0.5)

N(2,1.0)

N(4.5,0.5)

N(3,1.0)

Wrap-up: What you should be able to do

Explain the importance of the normal distribution for modeling populations

State the properties of the normal distribution
• Bell-shaped, symmetric
• Areas under the curve represent proportions of the population
• Bell centered over μ
• Width of bell determined by σ
• N(μ,σ) notation

Applying the Normal Distribution Model

Using the Normal Distribution to Make Statistical Inferences About the Population

A Look Ahead: What you will be able to do after this lecture

Understand Z-scores
• Definition/formula
• Interpretation

Use the table of the Standard Normal (z) Distribution and Z-scores to find areas under the normal curve

Determine what proportion of a normally distributed population falls in a given range

Determine a range of values within which a specified proportion of a normally distributed population will fall

Using the Normal Distribution Model

By calculating areas under the N(μ,σ) curve, we can predict how often certain measurement values will occur.

Area = probability that future observed values will fall between A and B

[Figure: “The Normal Distribution” with points A and B marked on the X-axis. X-axis: possible values in the population; y-axis: probability density.]

Example: Average Monthly Ozone Readings at Syowa ~ N(300,40) distribution.

[Figure: “N(300,40) Distribution Model for Ozone at Syowa”. X-axis: possible ozone readings (monthly average), 160 to 440; y-axis: probability density.]

Area = probability that a randomly chosen month will have an average reading above 335 Dobson units

Area = probability that a randomly chosen month will have an average reading between 230 and 300 Dobson units
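The two areas described for the Syowa model can be computed with Python's standard library. This is a sketch using `statistics.NormalDist` (Python 3.8+), not the table method taught in the lecture; the variable names are illustrative.

```python
from statistics import NormalDist

# Model from the slide: monthly average ozone at Syowa ~ N(300, 40), in Dobson units.
ozone = NormalDist(mu=300, sigma=40)

# Area to the right of 335: probability a randomly chosen month averages above 335.
p_above_335 = 1.0 - ozone.cdf(335)

# Area between 230 and 300.
p_230_to_300 = ozone.cdf(300) - ozone.cdf(230)

print(f"P(X > 335)       = {p_above_335:.4f}")
print(f"P(230 < X < 300) = {p_230_to_300:.4f}")
```

Here `cdf(x)` returns the area under the curve to the left of x, so an area between two values is a difference of two `cdf` calls.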

Using Z-scores to Calculate Probabilities from a Normal Distribution

Definition: The Z-score for a value (X) from a normal distribution is equal to that value’s distance from the mean, in standard deviations, i.e.

$$Z\text{-score} = \frac{X - \mu}{\sigma}$$

The Z-score converts the scale of the data from a N(μ,σ) distribution to a N(0,1) distribution (the standard normal distribution)

The larger the absolute value of the Z-score, the further X is from the population mean

Uses: To find areas under the N(μ,σ) curve

Exercise: Syowa Ozone ~ N(300,40)
Calculate the Z-scores for these ozone values

Ozone at Syowa   Z-score   Interpretation
260              -1.00     A monthly average of 260 is 1 standard deviation below the mean
390               2.25     A monthly average of 390 is 2.25 standard deviations above the mean
388               2.20     A monthly average of 388 is 2.20 standard deviations above the mean
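The Z-score calculation for the Syowa readings can be written as a one-line function. A minimal sketch; `z_score` is a hypothetical helper name.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma

# Syowa ozone model: N(300, 40)
for reading in (260, 390, 388):
    print(reading, f"{z_score(reading, 300, 40):.2f}")
# 260 -1.00
# 390 2.25
# 388 2.20
```

A negative result means the value lies below the mean; a positive result means it lies above.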

Areas under the Standard Normal Curve
See page 714 in the Triola text

9.87% of the population lies between the mean and 0.25 standard deviations above the mean (Area = 0.0987).

Standard Normal (z) Distribution (Table A-2)

Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772

[Figure: standard normal curve, Z-axis from -3 to 3, labeled “Area = 0.0987”.]

Areas under the Standard Normal Curve
Table entries are the area between the mean (Z = 0) and Z.

15.54% of the population is between the mean and 0.40 standard deviations above the mean (Area = 0.1554, the Table A-2 entry for Z = 0.40).

[Figure: standard normal curve, Z-axis from -3 to 3, labeled “Area = 0.1554”.]

Areas under the Standard Normal Curve
Table entries are the area between the mean (Z = 0) and Z.

0.1554 - 0.0987 = 0.0567
Hence, 5.67% of the population is between 0.25 and 0.40 standard deviations above the mean.

[Figure: standard normal curve, Z-axis from -3 to 3.]
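The Table A-2 values used here can be reproduced numerically as the area between the mean (Z = 0) and Z. A sketch, assuming that reading of the table (it matches the entries 0.0000 at Z = 0.00 and 0.0987 at Z = 0.25); `area_mean_to_z` is a hypothetical helper name.

```python
import math

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 (the mean) and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

a_025 = area_mean_to_z(0.25)
a_040 = area_mean_to_z(0.40)
print(f"{a_025:.4f} {a_040:.4f} {a_040 - a_025:.4f}")
# 0.0987 0.1554 0.0567
```

Subtracting the two areas gives the proportion of the population between 0.25 and 0.40 standard deviations above the mean, just as with the table lookup.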

Exercise: Use the table of “Areas Under the Standard Normal Curve” to calculate the proportion of the population falling in the ranges specified.

What fraction of the population will be at most 2 standard deviations above the mean?
A: 2.28% B: 97.72% C: 57.93% D: 47.72%

What percent of the population will be at least 2.27 standard deviations above the mean?
A: 1.16% B: 98.84% C: 98.82% D: Don’t know

How much of the population will have values somewhere between 1.03 standard deviations below the mean and 1.75 standard deviations above the mean?
A: 0.1515% B: 95.99% C: 0.8084% D: 80.84%

Applying Z-scores and Areas Under the Standard Normal Curve

Suppose that it is known that the VERBAL SAT scores of all H. S. Seniors in the U.S. follow a N(500,100) distribution. What fraction of students receive a score of 650 or less?

Hint: Draw a picture of the N(500,100) distribution and shade in the area under the curve that you are interested in. Then convert to Z-scores and find the area.

[Figure: N(500,100) curve, SAT axis from 200 to 800. Z-score for SAT of 650 = 1.50.]

Applying Z-scores and Areas Under the Standard Normal Curve

Suppose that it is known that the VERBAL SAT scores of all H. S. Seniors in the U.S. follow a N(500,100) distribution. What fraction of students receive a score of 650 or less?

From Table A-2 in Triola:

Z     0.00
1.5   0.4332
1.6   0.4452
1.7   0.4554
1.8   0.4641
1.9   0.4713

With Z = 1.50, the area between the mean and 650 is 0.4332. The area below the mean is 0.5. Total area = 0.9332.

[Figure: N(500,100) curve, SAT axis from 200 to 800. Z-score for SAT = 650 is 1.50.]

Applying Z-scores and Areas Under the Standard Normal Curve

Suppose that it is known that the VERBAL SAT scores of all H. S. Seniors in the U.S. follow a N(500,100) distribution. What fraction of students receive a score of 650 or less?

Total area = 0.9332. Hence, 93.3% of H.S. seniors receive a score of 650 or less.

[Figure: N(500,100) curve, SAT axis from 200 to 800. Z-score for SAT = 650 is 1.50.]
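The same steps (convert to a Z-score, find the area from the mean to Z, add the 0.5 below the mean) can be sketched in code. This uses `statistics.NormalDist` from the Python standard library in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

# Model from the slide: verbal SAT scores ~ N(500, 100).
sat = NormalDist(mu=500, sigma=100)

# Step 1: Z-score for a score of 650.
z = (650 - sat.mean) / sat.stdev

# Step 2: area between the mean and Z on the standard normal curve.
area_mean_to_z = NormalDist().cdf(z) - 0.5

# Step 3: add the half of the population below the mean.
total_area = 0.5 + area_mean_to_z

print(f"Z = {z:.2f}, area = {area_mean_to_z:.4f}, total = {total_area:.4f}")
# Z = 1.50, area = 0.4332, total = 0.9332
```

Calling `sat.cdf(650)` directly gives the same total area in one step.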

Exercise

What fraction of H.S. seniors get a score of 325 or less on the verbal SAT?
A: 95.99% B: 4.01% C: 95.91% D: Don’t know

What percentage of students score above 730 on the verbal SAT?
A: 0.99% B: 98.9% C: 48.9% D: 1.07%

Exercise

One of the Big Three automakers sells a four-door sedan with an advertised highway fuel efficiency of 33 mpg. The fact is that the population of autos of this type will average 33 mpg, with some variation around that average. Suppose that the actual efficiency achieved by this population of autos is normally distributed with a mean of 33 mpg and a standard deviation of 6.4 mpg. What proportion of autos of this design will get mileage levels exceeding 40 mpg?

A: 1.09% B: 0.8621% C: 13.79% D: 86.21%

The automaker is considering giving a 95% range of mpg ratings for this automobile. Under this thinking, the sales sticker will display a range of values (centered over the average rating of 33 mpg) within which 95% of autos of this type will fall. What range of values should the automaker use?

[Figure: normal curve with central area = 0.95. Z-score scale: -1.96, 0, 1.96; mpg scale: 20.5, 33, 45.5.]

The automaker can advertise that 95% of these cars will have fuel efficiency between 20.5 and 45.5 mpg.
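The 95% range can be reproduced by going from the Z-score scale back to the mpg scale, X = μ ± Z·σ. A sketch using `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Model from the exercise: fuel efficiency ~ N(33, 6.4), in mpg.
mpg = NormalDist(mu=33, sigma=6.4)

# Z-score that leaves 2.5% in each tail (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Convert from the Z-score scale back to the mpg scale.
low = mpg.mean - z * mpg.stdev
high = mpg.mean + z * mpg.stdev

print(f"{low:.1f} to {high:.1f} mpg")
# 20.5 to 45.5 mpg
```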

Wrap-up: What you should be able to do

Understand Z-scores
• Definition/formula
• Interpretation

Use the table of Areas Under the Standard Normal Curve
• Using Z-scores to find areas under the standard normal curve

Determine what proportion of a normally distributed population falls in a given range

Determine a range of values within which a specified proportion of a normally distributed population will fall


Statistical Inferences About the Population Mean

Using the Normal Probability Model to Make Statistical Inferences

A Look Ahead: What you will be able to do after this lecture

State the requirements for a statistical inference to be valid and apply them to evaluate the validity of a given inference.

Calculate and interpret a (1 - α)% confidence interval for the mean (by hand and by using JMP IN®)
• When the exact value of the population standard deviation is known
• When the exact value of σ is not known

Use the (1 - α)% confidence interval to make inferences about the population mean

Distinguish between valid and invalid interpretations of the (1 - α)% confidence interval.

Many Studies are Aimed at Making Inferences about the Population Mean

Example #1

Objective of the study: To estimate the average annual income of JMU graduates one year after graduation.

Population: All JMU graduates 1 year after graduation

Population parameter: Mean annual income (in dollars)

Many Studies are Aimed at Making Inferences about the Population Mean

Example #2

Objective of the study: To determine the average tensile strength of nylon yarn made from a new manufacturing process.

Population: All nylon yarn made from the new process

Population parameter: Mean tensile strength (in g/cm²)

Many Studies are Aimed at Making Inferences about the Population Mean

Example #3

Objective of the study: To determine the rate (in grams/mile) at which the 1998 Ford Taurus engine emits hydrocarbons (under normal driving conditions).

Population: All 1998 Ford Taurus automobiles

Population parameter: Mean emissions rate of HCs (in grams/mile)

Making Statistical Inferences About the Population Mean

Goal: To make a valid statistical inference about the value of μ based on the value of the sample estimate X̄

Two Requirements for a Statistical Inference to be Valid

1 The data come from an unbiased sample of the population
• Samples were randomly selected
• Every subject in the population had an equal chance of being selected

2 The inference accurately states the degree of certainty in the conclusion


Example of a Credible Inference About the Population Mean

In order to determine the distance traveled in one second by a ball rolling down a 5° incline, suppose Galileo made 20 replicate runs and recorded the distance traveled (in meters) in one second for each run. The 20 replicates yielded an average distance of 0.41 meters. In addition, suppose that we know that the recorded distances traveled in one second follow a normal distribution with a standard deviation of 0.06 meters. Based on these data we can state with 95% confidence that the true average distance that a ball will travel in one second on this incline is between 0.38 and 0.44 meters.

Population = All possible distances the ball could travel in 1 second

Population Parameter = The mean distance that a ball will travel in 1 second

Sample data = Distances from 20 replicate runs

The first three sentences of the Galileo example summarize what happened in the experiment (i.e., 20 replicates; average distance of 0.41 meters) and what we know about this population (i.e., the population ~ N($\mu$, 0.06)). They DO NOT state a statistical inference.

The last sentence of the Galileo example uses the experimental data to generalize the results to the entire population. This is a statistical inference. Notice the statement of the degree of certainty in its truth.


Definition: The Confidence Interval for the Mean

A (1-$\alpha$)% confidence interval for the mean is a range of values, running from a lower bound to an upper bound, within which we can be (1-$\alpha$)% confident that the true population mean falls. (The confidence level is written as a percentage, i.e., 100(1-$\alpha$)%.)

Example: If we want to have 95% confidence, we use $\alpha$ = 0.05.

Names for $\alpha$
• Type I error probability
• $\alpha$-level
• significance level


Formula for Calculating a (1-$\alpha$)% Confidence Interval for the Mean When the Exact Value of $\sigma$ is Known

Lower Bound (LB)

$$LB = \bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Upper Bound (UB)

$$UB = \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Where n = the number of observations in the sample and where $Z_{\alpha/2}$ is chosen so that the area under the normal curve to the right of $Z_{\alpha/2}$ is $\alpha/2$.


A Return to the Earlier Galileo Example: Calculating a 95% Confidence Interval for the Mean

n = 20 (20 replicate runs)

$\bar{X}$ = 0.41 meters

$\sigma$ = 0.06 meters

$\alpha$ = 0.05 (because (1-$\alpha$) = 0.95)

$Z_{\alpha/2} = Z_{0.025} = 1.96$


A Return to the Earlier Galileo Example: Calculating a 95% Confidence Interval for the Mean

$$LB = \bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 0.41 - 1.96\,\frac{0.06}{\sqrt{20}} = 0.38$$

$$UB = \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 0.41 + 1.96\,\frac{0.06}{\sqrt{20}} = 0.44$$

We are 95% confident that the true average distance traveled down the 5° incline in one second is between 0.38 and 0.44 meters.
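The Galileo calculation can be checked with a short Python sketch. It assumes the known-$\sigma$ formula for LB and UB and looks up $Z_{\alpha/2}$ with the standard library's `statistics.NormalDist` rather than a printed table; the function name `z_confidence_interval` is only an illustrative choice, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha):
    """Return (LB, UB) for the population mean when sigma is known."""
    # Z_{alpha/2}: the point with area alpha/2 to its right under the normal curve
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Galileo example: 20 runs, sample mean 0.41 m, known sigma 0.06 m, 95% confidence
lb, ub = z_confidence_interval(xbar=0.41, sigma=0.06, n=20, alpha=0.05)
print(f"95% CI: {lb:.2f} to {ub:.2f} meters")
```

Changing `alpha` to 0.10 gives the corresponding 90% interval with the same function.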


Valid Interpretations of the Confidence Interval for the Mean

You must state the range and the confidence level: “We are (1 - $\alpha$)% confident that the mean is somewhere between LB and UB.”

Valid: “We are 95% confident that the mean distance traveled in 1 second is between 0.38 and 0.44 meters.”

Not valid: “The mean distance traveled in 1 second is between 0.38 and 0.44 meters.”


Valid Interpretations of the Confidence Interval for the Mean

Valid: “We can claim with 95% confidence that the mean distance traveled in 1 second is different than 0.48 meters.”

Not valid: “We can claim that the mean distance traveled is different than 0.48 meters.” (The confidence level must be stated.)

We CAN CLAIM that the population mean is different from any of the values OUTSIDE the confidence interval, provided we state the confidence level of our claim [(1 - $\alpha$)% confidence].


Valid Interpretations of the Confidence Interval for the Mean

Valid: “The mean distance traveled in one second could be any value in the range from 0.38 to 0.44 meters.”

Not valid: “The mean distance traveled in 1 second will be 0.41 meters.”

You CANNOT CLAIM with any measurable confidence that the population mean is equal to any specific value INSIDE the confidence interval.


Exercise: Calculate a 90% confidence interval for the mean in the Galileo experiment

What is the value of $\alpha$?
A: 90%  B: 0.90  C: 5%  D: 0.10

What is the value of $Z_{\alpha/2}$?
A: 1.645  B: 1.28  C: 0.90  D: 0.95

What is the value of the lower bound (LB)?
A: 0.02 meters  B: 0.02 sec  C: 0.39 sec  D: 0.39 meters

What is the value of the upper bound (UB)?
A: 0.43 sec  B: 0.02 meters  C: 0.43 meters  D: 0.41 meters


Confidence Intervals for the Mean When the Population Standard Deviation is Estimated from the Data

The value of $\sigma$ is unknown
• Introduces more uncertainty in the results
• Cannot use Z-scores

Use “t-scores”
• Larger than corresponding Z-scores
• Lead to wider confidence limits

See table A-3 “t-Distribution” on page 715 of Triola.


Confidence Intervals for the Mean When the Population Standard Deviation is Estimated from the Data

$$LB = \bar{X} - t_{df,\alpha/2}\,\frac{s}{\sqrt{n}}$$

$$UB = \bar{X} + t_{df,\alpha/2}\,\frac{s}{\sqrt{n}}$$

s = the standard deviation (calculated from the data)
n = the number of observations in the data

$t_{df,\alpha/2}$ is from table A-3, page 715 of Triola

df = “degrees of freedom”
• A measure of how much data you had
• df = n-1
• If df = 30 or more, then $t_{df,\alpha/2}$ is very close to $Z_{\alpha/2}$


Reading the table of t-values, page 715 of Triola

Table A-3: t-Distribution

Degrees of   0.005        0.01         0.025        0.05         0.10         0.25
freedom      (one tail)   (one tail)   (one tail)   (one tail)   (one tail)   (one tail)
             0.01         0.02         0.05         0.10         0.20         0.50
             (two tails)  (two tails)  (two tails)  (two tails)  (two tails)  (two tails)

1            63.657       31.821       12.706       6.314        3.078        1.000
2            9.925        6.965        4.303        2.920        1.886        0.816
3            5.841        4.541        3.182        2.353        1.638        0.765
4            4.604        3.747        2.776        2.132        1.533        0.741
5            4.032        3.365        2.571        2.015        1.476        0.727
6            3.707        3.143        2.447        1.943        1.440        0.718

The value of $\alpha$ refers to the “two tails” value.

For an 80% confidence interval, $\alpha$ = 0.20. Use t-values in the 0.20 (two tails) column to calculate 80% confidence intervals.

For a 95% confidence interval, $\alpha$ = 0.05. Use t-values in the 0.05 (two tails) column to calculate 95% confidence intervals.

df = 6: Use the t-values in the row for 6 degrees of freedom of Table A-3 to calculate confidence intervals whenever you have only 7 data points.


Procedure for Calculating the Confidence Interval for the Mean

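1 Collect the data (n = sample size; representative samples)

2 Calculate the sample mean $\bar{X}$

3 Determine the value of $\alpha$

4 Is the value of $\sigma$ known?

Yes: Get the value of $Z_{\alpha/2}$ from table A-2 in Triola, then calculate

$$LB = \bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad UB = \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

No: Calculate the sample standard deviation (s), get the value of $t_{n-1,\alpha/2}$ from table A-3 in Triola, then calculate

$$LB = \bar{X} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} \qquad UB = \bar{X} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$$

5 STOP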
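The decision in this procedure (is $\sigma$ known or not?) can be sketched as a single Python function. This is only an illustration under stated assumptions: it handles 95% intervals only, its t-values are the df = 1 to 6 entries of the 0.05 (two tails) column excerpted from Table A-3, and both the function name and the sample measurements are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Excerpt of Table A-3: 0.05 (two tails) column, df = 1..6 (95% intervals only)
T_TABLE_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447}


def confidence_interval_95(data, sigma=None):
    """Use Z when sigma is known; otherwise use t with df = n - 1 and s from the data."""
    n = len(data)
    xbar = mean(data)
    if sigma is not None:
        critical = NormalDist().inv_cdf(0.975)  # Z_{0.025}
        spread = sigma
    else:
        critical = T_TABLE_95[n - 1]  # raises KeyError if df is outside the excerpt
        spread = stdev(data)          # sample standard deviation s
    margin = critical * spread / sqrt(n)
    return xbar - margin, xbar + margin


sample = [4.8, 5.1, 5.0, 4.7, 5.4]  # hypothetical measurements, not from the lecture
lb_t, ub_t = confidence_interval_95(sample)              # sigma unknown: t branch
lb_z, ub_z = confidence_interval_95(sample, sigma=0.25)  # sigma assumed known: Z branch
print(f"t-based: {lb_t:.2f} to {ub_t:.2f}")
print(f"Z-based: {lb_z:.2f} to {ub_z:.2f}")
```

For this made-up sample the t-based interval comes out wider than the Z-based one, which is the extra uncertainty from estimating the standard deviation.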


Example: Calculating a confidence interval when the population standard deviation is estimated from the data

A soft drink bottling plant makes 2-liter bottles of soft drink. Because of slight variations in the soft drink composition and variations in the bottling machine, the actual content of the filled bottles varies from bottle to bottle. Seven filled bottles were sampled and their contents measured. The data for these samples are given below (quantities reported are liters of soft drink found in the bottle). Find a 95% confidence interval for the average content (in liters) for bottles made at the plant.

1.83, 2.02, 1.76, 1.90, 1.95, 2.10, 1.88


Bottling Plant Example

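1-$\alpha$ = 0.95, so $\alpha$ = 0.05

n = 7 (df = 6)

$t_{df,\alpha/2} = t_{6,0.025} = 2.447$

$\bar{X}$ = 1.92 liters

s = 0.11 liters

$$LB = \bar{X} - t_{df,\alpha/2}\,\frac{s}{\sqrt{n}} = 1.92 - (2.447)\,\frac{0.11}{\sqrt{7}} = 1.82 \text{ (liters/bottle)}$$

$$UB = \bar{X} + t_{df,\alpha/2}\,\frac{s}{\sqrt{n}} = 1.92 + (2.447)\,\frac{0.11}{\sqrt{7}} = 2.02 \text{ (liters/bottle)}$$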
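As a check on the bottling plant arithmetic, here is a minimal Python sketch that computes $\bar{X}$, s, and the t-based bounds directly from the seven measurements. The critical value 2.447 is typed in from Table A-3 rather than computed, and the function name is an illustrative assumption.

```python
from math import sqrt
from statistics import mean, stdev

liters = [1.83, 2.02, 1.76, 1.90, 1.95, 2.10, 1.88]
T_CRIT = 2.447  # t_{6, 0.025}, read from Table A-3


def t_confidence_interval(data, t_crit):
    """Return (xbar, s, LB, UB) using the sample standard deviation."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # divides by n - 1
    margin = t_crit * s / sqrt(n)
    return xbar, s, xbar - margin, xbar + margin


xbar, s, lb, ub = t_confidence_interval(liters, T_CRIT)
print(f"mean = {xbar:.2f}, s = {s:.3f}, 95% CI: {lb:.2f} to {ub:.2f} liters")
```

Because the sketch carries the unrounded standard deviation (about 0.115) through the calculation, it gives bounds of 1.81 and 2.03 liters, slightly wider than the 1.82 and 2.02 obtained by hand with s rounded to 0.11.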


Calculating a 95% Confidence Interval in JMP IN

Step 1: Enter the data into JMP IN


Calculating a 95% Confidence Interval in JMP IN

Step 2: Select Analyze: Distribution of Y

We are 95% confident that the average volume of soft drink for all 2-liter bottles produced at the plant is between 1.81 and 2.03 liters.

These bounds differ slightly from the hand-calculated 1.82 and 2.02 because the hand calculation rounded s to 0.11 (unrounded, s is about 0.115).


Wrap-up: What you should be able to do

State the requirements for a statistical inference to be valid and apply them to evaluate the validity of a given inference.

Calculate and interpret a (1 - $\alpha$)% confidence interval for the mean (by hand and by using JMP IN®)
• When the exact value of the population standard deviation $\sigma$ is known
• When the exact value of $\sigma$ is not known

Use the (1 - $\alpha$)% confidence interval to make inferences about the population mean

Distinguish between valid and invalid interpretations of the (1 - $\alpha$)% confidence interval.


Statistical Inferences About the Comparison of Two Population Means

Confidence Interval for the Difference of Two Population Means

[Figure: Ozone (Dobson Units) at RESOLUTE and SYOWA; vertical axis from 200.0 to 600.0]


A Look Ahead: What you will be able to do after this lecture

Calculate and interpret a (1 - $\alpha$)% confidence interval for the difference of two population means
• By hand
• By using JMP IN®

Use the (1 - $\alpha$)% confidence interval for the difference of two population means to determine if the two means are different.

Distinguish between valid and invalid statements that interpret the meaning of a (1 - $\alpha$)% confidence interval for the difference of two population means.


The Goal of Many Studies is to Compare the Means from Two Populations

Example #1

A group of engineers is conducting a study to compare the energy efficiency of two different fuels. The team will burn several replicate samples of each fuel type and measure the energy generated (in calories) for each replicate.

Population #1: The energy efficiencies of all possible samples of fuel type #1

Population #2: The energy efficiencies of all possible samples of fuel type #2

Population Parameters: The mean energy efficiency of fuel type #1, compared to the mean energy efficiency of fuel type #2 ($\mu_{\text{fuel type \#1}}$ vs. $\mu_{\text{fuel type \#2}}$)


Example #2

A sociologist compared the value placed on human life by individuals who watch more than 20 hours of TV each week with those who watch less than 5 hours each week. She randomly selected 50 individuals from each category and gave each a test to evaluate each individual’s value for human life.

Population #1: Test scores from all people watching over 20 hours of TV each week.

Population #2: Test scores from all people watching under 5 hours of TV each week.

Population Parameters: The mean test score from the “over 20 hours crowd,” compared to the mean score from the “under 5 hours crowd” ($\mu_{\text{over 20 hrs}}$ vs. $\mu_{\text{under 5 hrs}}$)


Example #3

A medical doctor is comparing two different surgical techniques for repairing a torn anterior cruciate ligament (ACL) in the knee. Fifteen randomly selected patients with torn ACLs are treated with the old technique, and fifteen other randomly selected patients are treated with the new technique. The time required for each person to recover 90% of motion in the injured knee is recorded.

Population #1: All possible recovery times under the old method.

Population #2: All possible recovery times under the new method.

Population Parameters: Mean recovery time with the old technique, compared to the mean recovery time with the new technique ($\mu_{\text{old technique}}$ vs. $\mu_{\text{new technique}}$)


Making Statistical Inferences About the Comparison of Two Population Means

Assumptions and Requirements
• Each population follows a normal distribution (or something reasonably close to normal)
• The populations have the same standard deviation
• We have a representative sample from each population
  • Samples were randomly selected
  • Every entity in the population had an equal chance of being selected


Exercise: We wish to compare the average height of JMU students who were born in the months of January through June to the average height of JMU students who were born in the months July through December.

Population #1: All JMU students born in Jan-Jun

Population #2: All JMU students born in Jul-Dec

Population Parameters: Average height in these two populations

Data: Heights of 10 randomly selected students
• 5 born in Jan - Jun
• 5 born in Jul - Dec


Data Sheet for Recording the Height Data

             Born in Jan - Jun   Born in Jul - Dec

Mean
Std. Dev.


True or False?

The average height of students in our sample that were born in January - June is different than the average height of students in our sample that were born in July - December.

The average height of all JMU students born in January - June is different than the average height of all JMU students that were born in July - December.

The 1st statement is true. It is only a summary of our data. It does not generalize the results and apply them to the underlying population parameters.

We cannot tell if the 2nd statement is reasonable. It is a statistical inference because it generalizes what we saw in our data and applies it to the underlying population parameters.


Recall: Two Requirements for a Statistical Inference to be Valid

1 The data come from an unbiased sample of each population

2 The inference accurately states the degree of certainty in the conclusion


Definition: Confidence Interval for the Difference of Two Population Means

A (1-$\alpha$)% confidence interval for the difference of two population means is a range of values, running from a lower bound to an upper bound, within which we can be (1-$\alpha$)% confident that the true difference falls.

Same interpretation of $\alpha$ as before. Names for $\alpha$
• Type I error probability
• $\alpha$-level
• significance level


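Formula for Calculating a (1-$\alpha$)% Confidence Interval for the Difference of Two Population Means

Lower Bound (LB)

$$LB = (\bar{X}_1 - \bar{X}_2) - t_{n_1+n_2-2,\,\alpha/2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Upper Bound (UB)

$$UB = (\bar{X}_1 - \bar{X}_2) + t_{n_1+n_2-2,\,\alpha/2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where the pooled standard deviation is

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

• $s_1$ and $s_2$ are the standard deviations of the two groups of sample data
• $n_1$ = sample size from the first population
• $n_2$ = sample size from the second population
• $t_{n_1+n_2-2,\,\alpha/2}$ is from table A-3 in Triola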


Example

An automobile manufacturing company was trying to determine which type of tire to install on its new models. Six sets of tire brand “A” and six sets of tire brand “B” were installed on 12 new automobiles, and the number of miles of use before 60% of the tread was worn off was measured. The results are given in the table below. Calculate a 95% confidence interval for the difference in average mileage between the two brands of tires.

Tire Mileage Results

Brand A   Brand B
38610     31027
34840     29827
35793     28812
32833     30064
39477     28524
33752     29500


Calculations for the 95% Confidence Interval

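$\bar{X}_1$ = average mileage from brand A = 35884 miles
$s_1$ = 2656.6 miles
$n_1$ = 6

$\bar{X}_2$ = average mileage from brand B = 29626 miles
$s_2$ = 904.24 miles
$n_2$ = 6

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{5(2656.6)^2 + 5(904.24)^2}{6 + 6 - 2}} = 1984.3 \text{ miles}$$

95% confidence, so $\alpha$ = 0.05 and $\alpha/2$ = 0.025

$t_{n_1+n_2-2,\,\alpha/2} = t_{10,\,0.025} = 2.228$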


Calculations for the 95% Confidence Interval

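$$\text{Lower Bound} = (\bar{X}_1 - \bar{X}_2) - t_{n_1+n_2-2,\,\alpha/2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (35884 - 29626) - 2.228\,(1984.3)\sqrt{\frac{1}{6} + \frac{1}{6}} = 6258 - 2552.5 \approx 3706 \text{ miles}$$

$$\text{Upper Bound} = (\bar{X}_1 - \bar{X}_2) + t_{n_1+n_2-2,\,\alpha/2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 6258 + 2552.5 \approx 8810 \text{ miles}$$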


95% Confidence Interval for Tire Tread Example

We are 95% confident that Brand A will average between 3706 and 8810 more miles than Brand B before using up 60% of its tread.
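The tire calculation can be reproduced with a short Python sketch of the pooled-standard-deviation formula. The mileage lists are read off the Tire Mileage Results table, the critical value 2.228 is typed in from Table A-3, and the function name is an illustrative assumption; like the formula itself, the sketch assumes both populations have the same standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

brand_a = [38610, 34840, 35793, 32833, 39477, 33752]
brand_b = [31027, 29827, 28812, 30064, 28524, 29500]
T_CRIT = 2.228  # t_{10, 0.025}, read from Table A-3


def pooled_confidence_interval(x1, x2, t_crit):
    """Return (difference, LB, UB) for mean(x1) - mean(x2), assuming equal variances."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)
    # Pooled standard deviation s_p
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    diff = mean(x1) - mean(x2)
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    return diff, diff - margin, diff + margin


diff, lb, ub = pooled_confidence_interval(brand_a, brand_b, T_CRIT)
print(f"difference = {diff:.0f} miles, 95% CI: {lb:.0f} to {ub:.0f} miles")
```

Small differences from the hand-calculated bounds (a mile or so) come from rounding the intermediate values in the hand calculation.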


Using JMP IN to Calculate the 95% confidence interval for the difference of two population means

1 Create two columns in JMP IN®
• One column containing all the measurements
• A second column indicating which population each measurement comes from
• Make sure the second column is set to “N” (for “nominal”)


2 Select Analyze: Fit Y by X
• Place the measured value on the Y-axis
• Place the population ID on the X-axis

[Figure: JMP IN plot “Mileage By Brand”; vertical axis from 27500 to 40000, horizontal axis “Brand” with groups Brand A and Brand B]


3 Select Means: Anova/t-test under the Analysis button

t-Test (assuming equal variances)

             Estimate   Std Error   Lower 95%   Upper 95%
Difference   6258.47    1145.65     3705.77     8811.16

t-Test       5.463
DF           10
Prob>|t|     0.0003

“Lower 95%” and “Upper 95%” are the lower and upper bounds for the 95% confidence interval.


Valid Interpretations of the Confidence Interval for the Difference between Two Population Means

You must state the range and the confidence level: “We are (1 - $\alpha$)% confident that the difference $\mu_1 - \mu_2$ is somewhere between LB and UB.”

Valid: “We are 95% confident that the mean mileage difference (brand A - brand B) is between 3706 and 8810 miles.”

Not valid: “The mean difference in mileage between brand A and brand B is 6258 miles.”


Valid Interpretations of the Confidence Interval for the Difference between Two Population Means

Valid: “Since the confidence interval for the difference in average mileage does not include zero, we can claim with 95% confidence that the average mileage for brand A is different than the average mileage for brand B.”

We CAN CLAIM with (1-$\alpha$)% confidence that the means are different if the confidence interval does not include the value “0.”


Wrap-up: What you should be able to do

Calculate and interpret a (1 - $\alpha$)% confidence interval for the difference of two population means
• By hand
• By using JMP IN®

Use the (1 - $\alpha$)% confidence interval for the difference of two population means to determine if the two means are different.

Distinguish between valid and invalid statements that interpret the meaning of a (1 - $\alpha$)% confidence interval for the difference of two population means.


Introduction to Statistical Hypothesis Testing

The means are equal?

The means are not equal?


A Look Ahead: What you will be able to do after this lecture

Define the two types of statistical hypotheses and how each is used
• Null hypothesis
• Alternative hypothesis

Given a description of a hypothesis testing problem, determine null and alternative hypotheses both in words and in equation form

Use confidence intervals to do the following hypothesis tests
• Test whether the population mean is different from a specified value
• Test whether two populations have the same mean value

Describe the two types of errors that can occur in hypothesis testing

Describe in words what each type of error would consist of in a given real-life problem.


Definition: Hypothesis

In statistical testing, a hypothesis is a claim or statement about a property of a population

Example hypotheses
• Medical researchers claim that the average body temperature of healthy adults is actually not equal to 98.6°F (i.e., $\mu \ne 98.6$).
• The average GPA of JMU students who study 8 hours or more per week is higher than the average GPA for JMU students who do not (i.e., $\mu_{\text{at least 8 hrs/week}} > \mu_{\text{less than 8 hrs/week}}$).

Many hypotheses of interest assert something about the value of the mean of one or more populations.


Difference between a hypothesis and a statistical inference

Both assert something about a population parameter

A hypothesis is an assertion that will be tested by the data (it’s made “before looking at the data”)

A statistical inference can look just like a hypothesis, except that it is a conclusion that is made based on a look at the data

The process of making a statistical inference can be thought of as using the data to “test” which hypothesis is more reasonable


Two Types of Hypothesis

The alternative hypothesis (H1)
• States that the population parameter is different from some specified value
• Also referred to as the research hypothesis
• Usually is the claim that we really wish to evaluate (and often that we hope is true)

The null hypothesis (H0)
• States the opposite claim as the alternative hypothesis
• Usually states that the population parameter is NOT different than a specified value (or that the parameters are NOT different from each other).


Two Cases Considered in this lecture

Case 1: Testing hypotheses about the value of a single population mean
• Referred to as the single population case

Case 2: Testing hypotheses about how the means of two different populations compare
• Referred to as the two population case


The single population case: Testing if the population mean is equal to a specified value

Goal: To determine if the population mean differs from a specified value

Data: A random sample of n measurements from a single population

Examples
• Determine if the average starting salary of JMU graduates differs from the nationwide mean of $30,000/year.
• Determine if average shelf life of a new battery exceeds 12 months.


Single Population Case

Null Hypothesis
• Asserts that the population mean IS NOT DIFFERENT than the specified value
• Notation: H0: $\mu = \mu_0$, where $\mu$ is the population mean, and where $\mu_0$ is the specified value we are comparing against.

Alternative Hypothesis
• Asserts that the population mean IS DIFFERENT than the specified value
• Notation: H1: $\mu \ne \mu_0$


Single Population Case

Method for testing the two hypotheses: Construct a (1-$\alpha$)% confidence interval for the mean. If $\mu_0$ is outside the interval, we can conclude with (1-$\alpha$)% confidence that H1 is true. Otherwise, we say that we “cannot reject” H0.


ExampleStarting Salary of JMU Graduates

A study was conducted to determine if the average starting salary of JMU graduates differs from the national average of $30,000 for college graduates. Twenty-five JMU graduates were randomly selected and their starting salaries recorded. The average starting salary in the sample was $33,796. A 99% confidence interval for the average runs from $29,297 to $38,296.

Null Hypothesis
In words: H0: The average starting salary of JMU graduates is equal to $30,000.
In equation form: \( H_0: \mu_{\text{JMU graduates}} = \$30{,}000 \)

Alternative Hypothesis
In words: H1: The average starting salary of JMU graduates is not equal to $30,000.
In equation form: \( H_1: \mu_{\text{JMU graduates}} \neq \$30{,}000 \)
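
The decision rule for this example can be checked mechanically. The following is a minimal Python sketch that uses only the interval endpoints and the benchmark value stated on the slide; the helper name `mean_differs` is hypothetical and introduced here purely for illustration.

```python
def mean_differs(ci_lower, ci_upper, mu0):
    """Return True when mu0 lies outside the confidence interval (reject H0)."""
    return not (ci_lower <= mu0 <= ci_upper)

# Values from the JMU starting-salary slide (99% confidence interval).
reject_h0 = mean_differs(29_297, 38_296, 30_000)
print("Reject H0" if reject_h0 else "Cannot reject H0")
```

Because $30,000 lies inside the interval, the sketch reports that H0 cannot be rejected at the 99% confidence level.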

Page 117: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Example
An electrical firm conducted a study to determine if the average useable lifetime of a new light bulb design differs from the 800 hour average associated with the old design. Thirty-six of the new design bulbs were randomly selected and used until failure. The average lifetime of the 36 bulbs was 902 hours and the standard deviation was 30 hours. Can we conclude with 95% confidence that the new bulbs differ from the standard?

Null Hypothesis
In words: H0: The average lifetime is not different from 800 hours.
Using equations: \( H_0: \mu_{\text{new light bulbs}} = 800 \)

Alternative Hypothesis
In words: H1: The average lifetime differs from 800 hours.
Using equations: \( H_1: \mu_{\text{new light bulbs}} \neq 800 \)

Page 118: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Light Bulb Exercise (continued)

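95% confidence interval

\[
LB = \bar{X} - t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}} = 902 - t_{.025,\,35}\,\frac{30}{\sqrt{36}} \approx 902 - 1.96(5) = 902 - 9.8 = 892.2 \approx 892
\]

\[
UB = \bar{X} + t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}} \approx 902 + 1.96(5) = 902 + 9.8 = 911.8 \approx 912
\]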
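
The arithmetic for the light bulb interval can be reproduced with a short Python sketch. It assumes, as the slide's numbers suggest, that the large-sample normal quantile (about 1.96) stands in for the t value with 35 degrees of freedom; the exact t value is slightly larger (about 2.03), which widens the interval a little without changing the conclusion. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample size, sample mean, and sample standard deviation from the slide.
n, xbar, s = 36, 902.0, 30.0
alpha = 0.05  # 95% confidence

# Assumption: normal quantile used in place of t_{.025,35}.
crit = NormalDist().inv_cdf(1 - alpha / 2)

margin = crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"95% CI: {lower:.1f} to {upper:.1f} hours")
print("800 inside interval:", lower <= 800 <= upper)
```

The benchmark of 800 hours falls far below the lower bound, so the choice between the normal and t quantile does not affect the decision here.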

Page 119: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Light Bulb Exercise (continued)

Since the benchmark value of 800 hours is not included in the 95% confidence interval, we can reject the null hypothesis and conclude with 95% confidence that the average life of the new bulbs is different from 800 hours.

Page 120: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Possible Outcomes of Hypothesis Testing

Null Hypotheses?

Alpha and Beta Errors?

Page 121: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Possible Outcomes: The Risks in Hypothesis Testing

                              What is Actually True
What is concluded             Null hypothesis          Alternative hypothesis
Reject Null Hyp.
Do not Reject Null Hyp.

What is concluded: the results of the experiment, data analysis and hypothesis test.
What is actually true: this is seldom known with certainty.

Page 122: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Possible Outcomes: The Risks in Hypothesis Testing

                              What is Actually True
What is concluded             Null hypothesis                      Alternative hypothesis
Reject Null Hyp.              Type I error (the α-value gives      Correct decision
                              the probability of such an error)
Do not Reject Null Hyp.       Correct decision                     Type II error (probability
                                                                   not given here)
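
To illustrate why the α-value gives the probability of a Type I error, the following Python sketch simulates many experiments in which the null hypothesis is actually true and counts how often the confidence-interval test rejects it. This is a hedged illustration, not part of the original lecture: it assumes normally distributed data with a known standard deviation, and the function name, seed, and number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type_i_error_rate(mu0=800.0, sigma=30.0, n=36, alpha=0.05,
                      trials=2000, seed=1):
    """Estimate how often a true H0 is rejected by the confidence-interval test."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)  # assumes sigma is known
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean mu0, so H0 is true in every trial.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = mean(sample)
        if not (xbar - margin <= mu0 <= xbar + margin):
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

Under these assumptions the estimated rate should land close to α = 0.05, with some run-to-run variation from the simulation itself.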

Page 123: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Exercise
State in your own words what a Type I and a Type II error would consist of in the light bulb example.

Type I error
A Type I error would consist of concluding that the average lifetime of the new bulb is different from 800 hours when in fact it is not different.

Type II error
A Type II error would consist of concluding that the average lifetime of the new bulb is NOT different from 800 hours when in fact it really is different.

Page 124: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Two Population Case: Testing if the means of two populations differ from one another

Goal: To determine if the means from two populations differ from each other.

Data: A random sample of n1 measurements from one population and n2 measurements from the other population

Examples
Determine if the average weight loss under one diet/exercise plan differs from the average weight loss under another plan.
Determine if the average nicotine content from one brand of cigarettes is different than the average nicotine content from another brand.

Page 125: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Two Population Case

Null Hypothesis
Asserts that the two population means ARE NOT DIFFERENT.
Notation: \( H_0: \mu_1 = \mu_2 \), where \( \mu_1 \) is the mean of the first population, and where \( \mu_2 \) is the mean of the second population.

Alternative Hypothesis
Asserts that the two population means ARE DIFFERENT.
Notation: \( H_1: \mu_1 \neq \mu_2 \)

Page 126: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Two Population Case

Method of testing the two hypotheses
Construct a \( 100(1-\alpha)\% \) confidence interval for the difference of the two means. If the interval does not include the value zero, then we can conclude with \( 100(1-\alpha)\% \) confidence that H1 is true. Otherwise, we say that we “cannot reject” H0.

Page 127: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Example: Two Population Case

A study was made to compare the effects of two different weight lifting programs on overall strength improvement. Ten randomly selected individuals were assigned to use program A, and eight were assigned to use program B. After twelve weeks, the gain in strength was measured on each individual (expressed as the change in the maximum number of pounds that the individual could bench press). The group using program A showed an average increase of 55 pounds, with a standard deviation of 12 pounds. The group using program B showed an average increase of 40 pounds with a standard deviation of 14 pounds. Perform a statistical test to determine if the average strength gain differs between the two programs. Use a confidence level of 99%

Page 128: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Weight training example (continued)

Null Hypothesis
In words: H0: The average increase in bench press pounds is the same for both programs.
Using equations: \( H_0: \mu_{\text{program A}} = \mu_{\text{program B}} \)

Alternative Hypothesis
In words: H1: The average increase in bench press pounds differs between the two programs.
Using equations: \( H_1: \mu_{\text{program A}} \neq \mu_{\text{program B}} \)

\[
s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{9(12^2) + 7(14^2)}{10+8-2}} = 12.9 \text{ pounds}
\]
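
The pooled standard deviation \( s_p \) for the weight training data can be verified with a few lines of Python. This is a small sketch using the sample sizes and standard deviations given in the example; the function name `pooled_sd` is illustrative.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation of two samples, weighting each variance by its degrees of freedom."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Program A: n = 10, s = 12 pounds; Program B: n = 8, s = 14 pounds.
sp = pooled_sd(10, 12.0, 8, 14.0)
print(f"s_p = {sp:.1f} pounds")
```

The result lies between the two sample standard deviations, closer to that of the larger group, as expected for a weighted combination.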

Page 129: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Weight Training Exercise (continued)

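99% confidence interval for the difference

\[
\text{Lower Bound} = (\bar{X}_1 - \bar{X}_2) - t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (55 - 40) - 2.921(12.9)\sqrt{\frac{1}{10} + \frac{1}{8}} \approx 15 - 18 = -3 \text{ pounds}
\]

\[
\text{Upper Bound} = (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2,\,n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \approx 15 + 18 = 33 \text{ pounds}
\]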
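
The interval for the difference of the two means can be recomputed with the Python sketch below. It uses the sample statistics from the weight training example and hard-codes the critical value 2.921 (the t value for 16 degrees of freedom at the 99% level used in the calculation), since the Python standard library has no t-distribution quantile function. Variable names are illustrative.

```python
from math import sqrt

# Sample size, mean gain, and standard deviation for each program.
n1, xbar1, s1 = 10, 55.0, 12.0  # program A
n2, xbar2, s2 = 8, 40.0, 14.0   # program B

# Assumption: critical value t_{.005,16} taken from a t table, not computed.
t_crit = 2.921

# Pooled standard deviation, then the margin of error for the difference.
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)

diff = xbar1 - xbar2
lower, upper = diff - margin, diff + margin

print(f"99% CI for the difference: {lower:.1f} to {upper:.1f} pounds")
print("0 inside interval:", lower <= 0 <= upper)
```

The bounds round to -3 and 33 pounds, and the interval contains zero, so the null hypothesis cannot be rejected at the 99% confidence level.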

Page 130: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Weight Training Exercise (continued)

We are 99% certain that, over a 12-week period, the two programs will lead to average strength gains that differ by somewhere between -3 and 33 pounds.

Since the 99% confidence interval includes the value “0,” we cannot conclude with 99% confidence that the two programs result in different average gains in strength.

Page 131: GIST141B, Fall, 1999 Revised 8/11/991 Modeling Populations Introduction to the Normal Distribution

Wrap-up: What you should be able to do

Define the two types of statistical hypotheses and how each is used
Null hypothesis
Alternative hypothesis

Given a description of a hypothesis testing problem, determine null and alternative hypotheses both in words and in equation form

Use confidence intervals to do the following hypothesis tests
Test whether the population mean is different from a specified value
Test whether two populations have the same mean value

Describe the two types of errors that can occur in hypothesis testing

Describe in words what each type of error would consist of in a given real-life problem.