GIST141B, Fall, 1999 Revised 8/11/99 1
Modeling Populations
Introduction to the Normal Distribution
A Look Ahead: What you will be able to do after this lecture
Understand two reasons why experiments provide an imperfect window on reality
Understand and describe: population, population parameter, sample
Given a statement involving a statistical inference, identify the population to which it applies
Distinguish between a statistical inference and a statement that merely summarizes the results of a study.
Experiments: The Scientist’s Imperfect Window on Reality
The way things really are
An Experiment
Two Reasons That the Window is Imperfect
1 The data exhibit variation
2 The experimental data are an incomplete sample of the real system
A goal of science: To use our data to make generalizations about the way things really are.
The Effects of Variation and Sampling
Definitions
Population: The collection of all entities or quantities about which you wish to make a generalization.
Population parameter: The particular characteristic of the population that you wish to study; the quantity whose value you wish to estimate.
Sample: A subset of the population which you select and then measure; the source of the data that you will use to estimate the population parameter.
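The three definitions can be illustrated with a small simulation. This is only a sketch: the population below is made up (10,000 simulated values with an arbitrary mean of 50 and standard deviation of 10), not data from the lecture, and the sample size of 25 is likewise an assumption.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10,000 made-up measurements (not lecture data).
population = [random.gauss(50.0, 10.0) for _ in range(10_000)]

# Population parameter: here, the mean of every entity in the population.
population_mean = statistics.mean(population)

# Sample: a randomly selected subset that we actually measure.
sample = random.sample(population, 25)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (estimate):      {sample_mean:.2f}")
```

The sample mean is usually close to, but not exactly equal to, the population mean, which is the "imperfect window" described earlier.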
The Population: What “Really is”
The Sample: What We See
Example #1
A researcher is trying to determine the average income of JMU students one year after graduation. He randomly selects 100 students from each graduating class and then contacts them to obtain their annual income.
Population: All JMU graduates one year past graduation
Population parameter: Average annual income
Sample: 100 graduates from each class
Example #2
A chemical engineer is evaluating a new chemical process for making nylon polymer. He wishes to determine the average tensile strength of nylon yarn that is produced by the new process. He manufactures 500 samples of nylon yarn and measures the tensile strength of each sample.
Population: All nylon yarn that will be made with the new process
Population parameter: Average tensile strength
Sample: 500 samples of nylon yarn
Example #3
An environmental scientist is trying to determine the rate at which stratospheric ozone is depleting over the South Pole. Over a five-year period, she measures ozone levels by means of instrumentation on a weather balloon that is released daily from a research station near the South Pole.
Population: The stratosphere over the South Pole
Population parameter: The rate at which ozone is decreasing
Sample: Daily samples over a station near the South Pole (approximately 1825 samples)
Two Different Types of Populations
A collection of distinct entities: people, bacteria, etc.
A theoretical or existing system: the stratosphere, a new chemical process
Definition: A statistical inference is a generalization about the population that is made from a sample of data.
Example 1
Sample data: A particular pair of shoes was purchased from a shoe store at a price that was much cheaper than the same pair could have been purchased elsewhere.
Generalization: Any shoes purchased from this store will be cheaper than what they would cost elsewhere. (Population: All the shoes purchased from this store)
Example 2
Sample data: Professor Snigglehopper gave easy tests in English 102.
1st Generalization: Professor Snigglehopper will give easy tests in English 425. (Population: All of Professor Snigglehopper’s tests)
2nd Generalization: All English professors give easy tests. (Population: All tests given by all English profs)
Exercise
Galileo rolled balls down inclined ramps and measured the distance covered in a fixed time. Suppose he did 25 replicate runs on a ramp with an incline of 5°. Based on these runs, Galileo found the average distance traveled in one second was 0.41 meters. For each statement below, determine if the statement is a statistical inference or if it is not.
• “On average, a ball will roll 0.41 meters down a 5° incline in 1 second.”
• “The balls traveled an average of 0.41 meters down the incline in 1 second in my experiments.”
• “A ball will generally roll about 0.4 meters down a 5° incline in 1 second.”
• “A ball will generally roll somewhere between 0.1 and 7 meters down a 5° incline in 1 second.”
• “The balls did not all travel exactly the same distance in 1 sec.”
Wrap-up: What you should be able to do
Understand two reasons why experiments provide an imperfect window on reality
Understand and describe: population, population parameter, sample
Given a statement involving a statistical inference, identify the population to which it applies
Distinguish between a statistical inference and a statement that merely summarizes the results of a study.
The Normal Distribution Model
Basic Theory
A Look Ahead: What you will be able to do after this lecture
Explain the importance of the normal distribution for modeling populations
State the properties of the normal distribution:
• Bell-shaped, symmetric
• Areas under the curve represent proportions of the population
• Bell centered over μ
• Width of bell determined by σ
• N(μ, σ) notation
The Normal Distribution Model
A mathematical model for describing the likelihood of getting particular measured values from the population whenever you take a sample
• Accounts for the variability in the population
• Uses probability to model the randomness in the population
• Provides a basis for making statistical inferences from sample data:
– Predictions about the overall makeup of the population
– Inferences about the population mean
– Comparisons of means of two or more populations
Example: Verbal SAT scores for the Class of 2000.
[Figure: histogram “Verbal SAT Scores: JMU Class of 2000”; x-axis: score bins from 275 to 299 through 775 to 799; y-axis: 0 to 700. Data from the first 14 students, each marked with an X.]
Example: Verbal SAT scores for the Class of 2000.
[Figure: histogram “Verbal SAT Scores: JMU Class of 2000”; x-axis: score bins from 275 to 299 through 775 to 799; y-axis: # Students (0 to 700). Data from all 3165 students.]
Example: Verbal SAT scores for the Class of 2000.
[Figure: histogram “Verbal SAT Scores: JMU Class of 2000” with a bell-shaped curve drawn over it; x-axis: score bins from 275 to 299 through 775 to 799; y-axis: # Students (0 to 700).]
This is the shape of the normal distribution. It models the shape of this histogram.
The Normal Distribution: Models the shape of many different populations
Dependent variable (Y) = probability density
• Larger values indicate values that are more common
• Values near zero indicate values that are uncommon
Independent variable (X) = possible value in the population (e.g. Verbal SAT score)
Parameters: μ (population mean) and σ (population standard deviation)
Model:
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$
where σ = population standard deviation and μ = population mean
μ and σ are based on measuring every entity in the population (i.e. not based on sample data)
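The model on this slide can be evaluated numerically. The sketch below codes the density formula by hand and compares it with Python's standard-library `statistics.NormalDist`; the function name `normal_density` is an illustrative choice, and the parameter values 584 and 67 are borrowed from the N(584,67) SAT example used later in these slides.

```python
import math
from statistics import NormalDist


def normal_density(x: float, mu: float, sigma: float) -> float:
    """Probability density Y of a N(mu, sigma) model at the value x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 584.0, 67.0  # from the lecture's N(584, 67) SAT example
for score in (450, 584, 718):
    by_hand = normal_density(score, mu, sigma)
    by_library = NormalDist(mu, sigma).pdf(score)
    print(f"X = {score}: Y = {by_hand:.6f} (library: {by_library:.6f})")
```

The density is largest at the mean and equal at points the same distance above and below it, matching the bell shape described in the slides.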
The Normal Distribution: Models the shape of many different populations
[Figure: “The Normal Distribution”, a bell-shaped curve; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$
Important Facts About the Normal Distribution
[Figure: “The Normal Distribution”; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density; the region under the curve is labeled “Area = 1.0 (100% of the population)”.]
Total area under the normal curve is equal to 1.0 (100% of the population).
Important Facts About the Normal Distribution
[Figure: “The Normal Distribution” with the region between two values A and B marked; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
Area = proportion of the population with values between A and B.
Important Facts About the Normal Distribution
[Figure: “The Normal Distribution” with regions labeled “Most likely range of values” and “Least likely range of values”; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
“Bell” is centered over the population mean.
Values near the mean are much more common than values far from the mean.
Important Facts About the Normal Distribution
[Figure: “The Normal Distribution”; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
Values below the mean are just as likely as values above the mean.
Important Facts About the Normal Distribution
[Figure: “Normal Distribution” with the region from 1σ below the mean to 1σ above the mean labeled 68%; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
68.3% of the population is within ONE standard deviation of the mean.
Important Facts About the Normal Distribution
[Figure: “Normal Distribution” with the region from 1.96σ below the mean to 1.96σ above the mean labeled 95%; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
About 95% of the population is within TWO standard deviations of the mean (exactly 95% within 1.96 standard deviations; 95.4% within 2).
Important Facts About the Normal Distribution
[Figure: “Normal Distribution” with the region from 3σ below the mean to 3σ above the mean labeled 99.7%; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
99.7% of the population is within THREE standard deviations of the mean.
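The percentages quoted on the last three slides can be checked numerically. This is a minimal sketch using the standard library's `statistics.NormalDist`; the helper name `proportion_within` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k: float) -> float:
    """Proportion of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1.0, 1.96, 2.0, 3.0):
    print(f"within {k} standard deviations: {proportion_within(k):.1%}")
```

Because the result depends only on the number of standard deviations, the same proportions apply to any N(μ, σ) population.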
Exercise
JMU’s class of 2000 has verbal SAT scores that follow a normal distribution with a mean of 584 and a standard deviation of 67. Give a range of SAT scores within which 95% of the scores from the class of 2000 will fall.
A: 95% will have scores between 517 & 651
B: 95% will have scores between 450 & 718
C: 95% will have scores between 555 & 613
D: I haven’t got a clue
Important Facts About the Normal Distribution
[Figure: “The Normal Distribution”, two bell curves with standard deviations σ1 and σ2, where σ1 > σ2; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
The population standard deviation (σ) determines the width of the bell.
The larger the standard deviation, the wider the bell.
Notation for the Normal Distribution
[Figure: “Normal Distribution: N(μ, σ)”; x-axis: Possible Values in the Population; y-axis: Probability Density.]
N(μ, σ) refers to a normal distribution model with a mean of μ and a standard deviation of σ:
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$
σ determines the width of the bell
μ determines the location of the bell
Example Terminology: The population of Verbal SAT scores for the JMU Class of 2000 follows a N(584,67) distribution
[Figure: histogram “Verbal SAT Scores: JMU Class of 2000” (sample data from the class of 2000) with the model curve; x-axis: score bins from 275 to 299 through 775 to 799; y-axis: # Students (0 to 700).]
$$Y = \frac{1}{67\sqrt{2\pi}}\, e^{-\frac{(\mathrm{SAT}-584)^2}{2(67)^2}}$$
Example Terminology: Verbal SAT scores for the JMU Class of 2000 follow a N(584,67) distribution
NOTATION: “Verbal SAT scores ~ N(584,67)” means “The population of verbal SAT scores follows a N(584,67) distribution.”
Match μ and σ to the correct distribution
[Figure: “The Normal Distribution”, several bell curves; x-axis: Possible Values in the Population (−2 to 6); y-axis: Probability Density.]
N(3,0.5)
N(2,1.0)
N(4.5,0.5)
N(3,1.0)
Wrap-up: What you should be able to do
Explain the importance of the normal distribution for modeling populations
State the properties of the normal distribution:
• Bell-shaped, symmetric
• Areas under the curve represent proportions of the population
• Bell centered over μ
• Width of bell determined by σ
• N(μ, σ) notation
Applying the Normal Distribution Model
Using the Normal Distribution to Make Statistical Inferences About the Population
A Look Ahead: What you will be able to do after this lecture
Understand Z-scores:
• Definition/formula
• Interpretation
Use the table of the Standard Normal (z) Distribution and Z-scores to find areas under the normal curve
Determine what proportion of a normally distributed population falls in a given range
Determine a range of values within which a specified proportion of a normally distributed population will fall
Using the Normal Distribution Model
[Figure: “The Normal Distribution” with the region between A and B marked; x-axis: X = Possible Values in the Population; y-axis: Y = Probability Density.]
By calculating areas under the N(μ, σ) curve, we can predict how often certain measurement values will occur.
Area = probability that future observed values will fall between A and B
Example: Average Monthly Ozone Readings at Syowa ~ N(300,40) distribution.
[Figure: “N(300,40) Distribution Model for Ozone at Syowa”; x-axis: Possible Ozone Readings (Monthly Average), 160 to 440; y-axis: Probability Density; two regions under the curve are marked.]
Area = probability that a randomly chosen month will have an average reading above 335 Dobson units
Area = probability that a randomly chosen month will have an average reading between 230 and 300 Dobson units
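The two areas described for the Syowa model can be computed directly. The following is a sketch that assumes the N(300,40) model stated on the slide and uses the cumulative distribution function from Python's `statistics.NormalDist` in place of a printed table; the variable names are illustrative.

```python
from statistics import NormalDist

ozone = NormalDist(mu=300.0, sigma=40.0)  # N(300, 40) model from the slide

# Area to the right of 335: one minus the area to the left of 335.
p_above_335 = 1.0 - ozone.cdf(335.0)

# Area between 230 and 300: difference of two left-hand areas.
p_230_to_300 = ozone.cdf(300.0) - ozone.cdf(230.0)

print(f"P(reading > 335)       = {p_above_335:.4f}")
print(f"P(230 < reading < 300) = {p_230_to_300:.4f}")
```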
Using Z-scores to Calculate Probabilities from a Normal Distribution
Definition: The Z-score for a value (X) from a normal distribution is equal to that value’s distance from the mean, in standard deviations, i.e.
$$Z\text{-score} = \frac{X - \mu}{\sigma}$$
The Z-score converts the scale of the data from a N(μ, σ) distribution to a N(0,1) distribution (the standard normal distribution)
The larger the magnitude of the Z-score, the further X is from the population mean
Uses: To find areas under the N(μ, σ) curve
Exercise: Syowa Ozone ~ N(300,40). Calculate the Z-scores for these ozone values
Ozone at Syowa    Z-score
260               -1.00
390               2.25
388               2.20
A monthly average of 260 is 1 standard deviation below the mean
A monthly average of 390 is 2.25 standard deviations above the mean
A monthly average of 388 is 2.20 standard deviations above the mean
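The Z-score arithmetic for the three Syowa readings can be written out as a short function. This is a minimal sketch; the function name `z_score` is an illustrative choice.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from the mean mu, measured in standard deviations."""
    return (x - mu) / sigma


# Syowa ozone model from the slide: N(300, 40)
for reading in (260, 390, 388):
    print(f"ozone {reading}: Z = {z_score(reading, 300, 40):+.2f}")
```

A negative result means the reading is below the mean; a positive result means it is above.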
Areas under the Standard Normal Curve: See page 714 in the Triola Text
Table A-2: Standard Normal (z) Distribution
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06
0.0    0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239
0.1    0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636
0.2    0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026
0.3    0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406
0.4    0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772
[Figure: standard normal curve, Z from −3 to 3, with the region between 0 and 0.25 marked; Area = 0.0987.]
9.87% of the population lies between the mean and 0.25 standard deviations above the mean
Areas under the Standard Normal Curve: Table entries are the area between the mean (Z = 0) and Z.
[Figure: standard normal curve, Z from −3 to 3, with the region between 0 and 0.40 marked; Area = 0.1554, read from Table A-2.]
15.54% of the population is between the mean and 0.40 standard deviations above the mean
Areas under the Standard Normal Curve: Table entries are the area between the mean (Z = 0) and Z.
[Figure: standard normal curve, Z from −3 to 3, with the region between 0.25 and 0.40 marked.]
From Table A-2: 0.1554 - 0.0987 = 0.0567. Hence, 5.67% of the population is between 0.25 and 0.40 standard deviations above the mean.
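The Table A-2 entries used in these slides (area between the mean and Z) can be reproduced from the cumulative distribution function. This is a sketch using `statistics.NormalDist`; the helper name `area_mean_to_z` is hypothetical, and it assumes Z is not negative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # N(0, 1)


def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    # cdf(z) is the area to the left of z; half of the total area lies below 0.
    return standard_normal.cdf(z) - 0.5


area_025 = area_mean_to_z(0.25)
area_040 = area_mean_to_z(0.40)
area_between = area_040 - area_025

print(f"0 to 0.25:    {area_025:.4f}")
print(f"0 to 0.40:    {area_040:.4f}")
print(f"0.25 to 0.40: {area_between:.4f}")
```

Subtracting 0.5 is what converts a left-of-Z area into the mean-to-Z area that this table lists.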
Exercise: Use the table of “Areas Under the Standard Normal Curve” to calculate the proportion of the population falling in the ranges specified.
What fraction of the population will be at most 2 standard deviations above the mean?
A: 2.28% B: 97.72% C: 57.93% D: 47.72%
What percent of the population will be at least 2.27 standard deviations above the mean?
A: 1.16% B: 98.84% C: 98.82% D: Don’t know
How much of the population will have values somewhere between 1.03 standard deviations below the mean and 1.75 standard deviations above the mean?
A: 0.1515% B: 95.99% C: 0.8084% D: 80.84%
Applying Z-scores and Areas Under the Standard Normal Curve
Suppose that it is known that the VERBAL SAT scores of all H. S. Seniors in the U.S. follow a N(500,100) distribution. What fraction of students receive a score of 650 or less?
Hint: Draw a picture of the N(500,100) distribution and shade in the area under the curve that you are interested in. Then convert to Z-scores and find the area.
[Figure: N(500,100) curve; x-axis: SAT score, 200 to 800; the Z-score for an SAT of 650 is 1.50.]
Applying Z-scores and Areas Under the Standard Normal Curve
[Figure: N(500,100) curve; x-axis: SAT score, 200 to 800; the Z-score for SAT = 650 is 1.50.]
From Table A-2 in Triola, with Z = 1.50:
Z      Area
1.5    0.4332
1.6    0.4452
1.7    0.4554
1.8    0.4641
1.9    0.4713
Area between the mean and Z = 1.50: 0.4332
Area below the mean: 0.5
Total Area = 0.9332
Applying Z-scores and Areas Under the Standard Normal Curve
[Figure: N(500,100) curve; x-axis: SAT score, 200 to 800; the Z-score for SAT = 650 is 1.50; Total Area = 0.9332.]
Hence, 93.3% of H.S. seniors receive a score of 650 or less.
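The table-based calculation for an SAT score of 650 can be mirrored step by step in code. This is a sketch assuming the N(500,100) model from the slide; it follows the slide's route (Z-score, mean-to-Z area, plus 0.5) and then checks the result against a direct call to the cumulative distribution function.

```python
from statistics import NormalDist

sat = NormalDist(mu=500.0, sigma=100.0)  # N(500, 100) model from the slide

# Step 1: convert the score to a Z-score.
z = (650.0 - sat.mean) / sat.stdev

# Step 2: area between the mean and Z (what the table lists).
table_area = NormalDist().cdf(z) - 0.5

# Step 3: add the half of the population that lies below the mean.
total_area = 0.5 + table_area

# Cross-check: area to the left of 650 computed in one step.
direct_area = sat.cdf(650.0)

print(f"Z = {z:.2f}, table area = {table_area:.4f}, total = {total_area:.4f}")
```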
Exercise
What fraction of H.S. seniors get a score of 325 or less on the verbal SAT?
A: 95.99% B: 4.01% C: 95.91% D: Don’t know
What fraction of students score above 730 on the verbal SAT?
A: 0.99% B: 98.9% C: 48.9% D: 1.07%
Exercise
One of the Big Three automakers sells a four-door sedan with an advertised highway fuel efficiency of 33 mpg. The fact is that the population of autos of this type will average 33 mpg, with some variation around that average. Suppose that the actual efficiency achieved by this population of autos is normally distributed with a mean of 33 mpg and a standard deviation of 6.4 mpg. What proportion of autos of this design will get mileage levels exceeding 40 mpg?
A: 1.09% B: 0.8621% C: 13.79% D: 86.21%
The automaker is considering giving a 95% range of mpg ratings for this automobile. Under this thinking, the sales sticker will display a range of values (centered over the average rating of 33 mpg) within which 95% of autos of this type will fall. What range of values should the automaker use?
[Figure: normal curve with the central region marked “Area = 0.95”; Z-score scale: −1.96, 0, 1.96; mpg scale: 20.5, 33, 45.5.]
The automaker can advertise that 95% of these cars will have fuel efficiency between 20.5 and 45.5 mpg.
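The 95% range can be reproduced by working backwards from an area to a Z-score. This is a sketch assuming the N(33, 6.4) model from the exercise; `inv_cdf` from `statistics.NormalDist` plays the role of reading the table in reverse.

```python
from statistics import NormalDist

mpg = NormalDist(mu=33.0, sigma=6.4)  # model from the exercise

# A central area of 0.95 leaves 0.025 in each tail,
# so the upper cutoff has 0.975 of the area to its left.
z = NormalDist().inv_cdf(0.975)

low = mpg.mean - z * mpg.stdev
high = mpg.mean + z * mpg.stdev

print(f"Z = {z:.2f}; 95% range: {low:.1f} to {high:.1f} mpg")
```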
Wrap-up: What you should be able to do
Understand Z-scores:
• Definition/formula
• Interpretation
Use the table of Areas Under the Standard Normal Curve:
• Using Z-scores to find areas under the standard normal curve
Determine what proportion of a normally distributed population falls in a given range
Determine a range of values within which a specified proportion of a normally distributed population will fall
Statistical Inferences About the Population Mean
Using the Normal Probability Model to Make Statistical Inferences
A Look Ahead: What you will be able to do after this lecture
State the requirements for a statistical inference to be valid and apply them to evaluate the validity of a given inference.
Calculate and interpret a (1 − α)% confidence interval for the mean (by hand and by using JMP IN®):
• When the exact value of the population standard deviation is known
• When the exact value of σ is not known
Use the (1 − α)% confidence interval to make inferences about the population mean
Distinguish between valid and invalid interpretations of the (1 − α)% confidence interval.
Many Studies are Aimed at Making Inferences about the Population Mean
Example #1
Objective of the study: To estimate the average annual income of JMU graduates one year after graduation.
Population: All JMU graduates 1 year after graduation
Population parameter: Mean annual income (in dollars)
Many Studies are Aimed at Making Inferences about the Population Mean
Example #2
Objective of the study: To determine the average tensile strength of nylon yarn made from a new manufacturing process.
Population: All nylon yarn made from the new process
Population parameter: Mean tensile strength (in g/cm²)
Many Studies are Aimed at Making Inferences about the Population Mean
Example #3
Objective of the study: To determine the rate (in grams/mile) at which the 1998 Ford Taurus engine emits hydrocarbons (under normal driving conditions).
Population: All Ford Taurus automobiles
Population parameter: Mean emissions rate of HC’s (in grams/mile)
Making Statistical Inferences About the Population Mean
Goal: To make a valid statistical inference about the value of μ based on the value of the sample estimate X̄
Two Requirements for a Statistical Inference to be Valid
1 The data come from an unbiased sample of the population
• Samples were randomly selected
• Every subject in the population had an equal chance of being selected
2 The inference accurately states the degree of certainty in the conclusion
Example of a Credible Inference About the Population Mean
In order to determine the distance traveled in one second by a ball rolling down a 5° incline, suppose Galileo made 20 replicate runs and recorded the distance traveled (in meters) in one second for each run. The 20 replicates yielded an average distance of 0.41 meters. In addition, suppose that we know that the recorded distances traveled in one second follow a normal distribution with a standard deviation of 0.06 meters. Based on these data we can state with 95% confidence that the true average distance that a ball will travel in one second on this incline is between 0.38 and 0.44 meters.
Population = All possible distances the ball could travel in 1 second
Population Parameter = The mean distance that a ball will travel in 1 second
Sample data = Distances from 20 replicate runs
Example of a Credible Inference About the Population Mean
The first three sentences of the Galileo example summarize what happened in the experiment (i.e. 20 reps; avg distance of 0.41 meters) and what we know about this population (i.e. the population ~ N(μ, 0.06)). They DO NOT state a statistical inference.
Example of a Credible Inference About the Population Mean
The last sentence of the Galileo example (“Based on these data we can state with 95% confidence that the true average distance that a ball will travel in one second on this incline is between 0.38 and 0.44 meters.”) uses the experimental data to generalize the results to the entire population. This is a statistical inference.
Notice the statement of the degree of certainty in its truth.
Definition: The Confidence Interval for the Mean
A (1 − α)% confidence interval for the mean is a range of values running from a lower bound to an upper bound wherein we can be (1 − α)% confident that the true population mean falls.
Example: If we want to have 95% confidence, we use α = 0.05
Names for α:
• Type I error probability
• α-level
• significance level
Formula for Calculating a (1 − α)% Confidence Interval for the Mean When the Exact Value of σ is Known
Lower Bound (LB):
$$LB = \bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Upper Bound (UB):
$$UB = \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where n = # of samples and where $Z_{\alpha/2}$ is chosen so that the area under the normal curve to the right of $Z_{\alpha/2}$ is α/2
A Return to the Earlier Galileo Example: Calculating a 95% Confidence Interval for the Mean
n = 20 samples (20 replicate runs)
X̄ = 0.41 meters
σ = 0.06 meters
α = 0.05 (because (1 − α) = 0.95)
$$Z_{\alpha/2} = Z_{0.025} = 1.96$$
A Return to the Earlier Galileo Example: Calculating a 95% Confidence Interval for the Mean
$$LB = \bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 0.41 - 1.96\cdot\frac{0.06}{\sqrt{20}} = 0.38$$
$$UB = \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 0.41 + 1.96\cdot\frac{0.06}{\sqrt{20}} = 0.44$$
We are 95% certain that the true average distance traveled down the 5° incline in one second is between 0.38 and 0.44 meters.
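The Galileo calculation can be written as a short function for the case where σ is known. This is a sketch, not a replacement for the hand calculation or for JMP IN; the function name `z_confidence_interval` and its parameter names are illustrative, and the critical value comes from `statistics.NormalDist.inv_cdf` instead of Table A-2.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean: float, sigma: float, n: int,
                          confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1.0 - confidence
    # Z value with area alpha/2 to its right under the standard normal curve.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Galileo example from the slides: n = 20, mean 0.41 m, sigma = 0.06 m
lower, upper = z_confidence_interval(0.41, 0.06, 20, confidence=0.95)
print(f"95% confidence interval: {lower:.2f} to {upper:.2f} meters")
```

Raising the confidence level increases the Z value and therefore widens the interval; a larger n narrows it.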
Valid Interpretations of the Confidence Interval for the Mean
You must state the range and the confidence level: “We are (1 − α)% confident that the mean is somewhere between LB and UB.”
Valid: “We are 95% confident that the mean distance traveled in 1 second is between 0.38 and 0.44 meters.”
Not valid: “The mean distance traveled in 1 second is between 0.38 and 0.44 meters.”
Valid Interpretations of the Confidence Interval for the Mean
We CAN CLAIM that the population mean is different from any of the values OUTSIDE the confidence interval, provided we state the confidence level of our claim [(1 − α)% confidence].
Valid: “We can claim with 95% confidence that the mean distance traveled in 1 second is different from 0.48 meters.”
Not valid: “We can claim that the mean distance traveled is different from 0.48 meters.” (need to state the confidence level)
Valid Interpretations of the Confidence Interval for the Mean
You CANNOT CLAIM with any measurable confidence that the population mean is equal to any specific value INSIDE the confidence interval.
Valid: “The mean distance traveled in one second could be any value in the range from 0.38 to 0.44 meters.”
Not valid: “The mean distance traveled in 1 second will be 0.41 meters.”
Exercise: Calculate a 90% confidence interval for the mean in the Galileo experiment
What is the value of α?
A: 90% B: 0.90 C: 5% D: 0.10
What is the value of Z_{α/2}?
A: 1.645 B: 1.28 C: 0.90 D: 0.95
What is the value of the lower bound (LB)?
A: 0.02 meters B: 0.02 sec C: 0.39 sec D: 0.39 meters
What is the value of the upper bound (UB)?
A: 0.43 sec B: 0.02 meters C: 0.43 meters D: 0.41 meters
Confidence Intervals for the Mean When the Population Standard Deviation is Estimated from the Data
The value of σ is unknown
• Introduces more uncertainty in the results
• Cannot use Z-scores
Use “t-scores”
• Larger than corresponding Z-scores
• Lead to wider confidence limits
• See table A-3 “t-Distribution” on page 715 of Triola.
Confidence Intervals for the Mean When the Population Standard Deviation is Estimated from the Data
s = the standard deviation (calculated from the data)
n = number of samples in the data
$$LB = \bar{X} - t_{df,\alpha/2}\,\frac{s}{\sqrt{n}}$$
$$UB = \bar{X} + t_{df,\alpha/2}\,\frac{s}{\sqrt{n}}$$
$t_{df,\alpha/2}$ is from table A-3, page 715 of Triola
df = “degrees of freedom”
– A measure of how much data you had
– df = n-1
– If df = 30 or more, then $t_{df,\alpha/2}$ is very close to $Z_{\alpha/2}$
Reading the table of t-values, page 715 of Triola
Table A-3: t-Distribution
α (one tail)          0.005    0.01     0.025    0.05     0.10     0.25
α (two tails)         0.01     0.02     0.05     0.10     0.20     0.50
Degrees of freedom
1                     63.657   31.821   12.706   6.314    3.078    1.000
2                     9.925    6.965    4.303    2.920    1.886    0.816
3                     5.841    4.541    3.182    2.353    1.638    0.765
4                     4.604    3.747    2.776    2.132    1.533    0.741
5                     4.032    3.365    2.571    2.015    1.476    0.727
6                     3.707    3.143    2.447    1.943    1.440    0.718
The value of α refers to the “two tails” value
For an 80% confidence interval, α = 0.20. Use t-values in the 0.20 (two tails) column to calculate 80% confidence intervals
For a 95% confidence interval, α = 0.05. Use t-values in the 0.05 (two tails) column to calculate 95% confidence intervals
GIST141B, Fall, 1999 Revised 8/11/99 76
Reading the table of t-values, page 715 of Triola
t-Distribution (Table A-3), row for df = 6
α (two tails)   0.01     0.02     0.05     0.10     0.20     0.50
df = 6          3.707    3.143    2.447    1.943    1.440    0.718
Use these t-values to calculate confidence intervals whenever you have only 7 data points (df = n - 1 = 6).
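Reading the table amounts to picking a row by df = n - 1 and a column by the two-tailed α. A minimal sketch of that lookup follows, using only the six rows shown on the slide; the names `T_TABLE`, `TWO_TAIL_ALPHAS`, and `t_critical` are hypothetical and chosen for illustration.

```python
# Rows of Table A-3 as shown on the slide (df = 1..6 only).
# Each tuple is ordered by the two-tailed alpha values below.
TWO_TAIL_ALPHAS = (0.01, 0.02, 0.05, 0.10, 0.20, 0.50)
T_TABLE = {
    1: (63.657, 31.821, 12.706, 6.314, 3.078, 1.000),
    2: (9.925, 6.965, 4.303, 2.920, 1.886, 0.816),
    3: (5.841, 4.541, 3.182, 2.353, 1.638, 0.765),
    4: (4.604, 3.747, 2.776, 2.132, 1.533, 0.741),
    5: (4.032, 3.365, 2.571, 2.015, 1.476, 0.727),
    6: (3.707, 3.143, 2.447, 1.943, 1.440, 0.718),
}


def t_critical(n: int, confidence: float) -> float:
    """Look up t_{df, alpha/2} for a sample of n points (df = n - 1)."""
    df = n - 1
    alpha = round(1.0 - confidence, 2)  # two-tailed alpha, e.g. 0.05 for 95%
    if df not in T_TABLE or alpha not in TWO_TAIL_ALPHAS:
        raise KeyError(f"df={df}, alpha={alpha} is not in this table excerpt")
    return T_TABLE[df][TWO_TAIL_ALPHAS.index(alpha)]


print(t_critical(7, 0.95))  # 7 data points, 95% confidence
print(t_critical(7, 0.80))  # 7 data points, 80% confidence
```

For sample sizes beyond this excerpt, the full Table A-3 would be needed.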
GIST141B, Fall, 1999 Revised 8/11/99 77
Procedure for Calculating the Confidence Interval for the Mean
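1 Collect the data (n = sample size; representative samples)
2 Calculate the sample mean \( \bar{X} \)
3 Is the value of σ known?
Yes:
– Determine the value of σ
– Get the value of \( Z_{\alpha/2} \) from Table A-2 in Triola
– Calculate the bounds:
\[ LB = \bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad UB = \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
No:
– Calculate the sample standard deviation (s)
– Get the value of \( t_{n-1,\alpha/2} \) from Table A-3 in Triola
– Calculate the bounds:
\[ LB = \bar{X} - t_{n-1,\alpha/2} \frac{s}{\sqrt{n}} \qquad UB = \bar{X} + t_{n-1,\alpha/2} \frac{s}{\sqrt{n}} \]
4 STOP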
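The branching in this procedure can be sketched as a single function. This is only an illustration under stated assumptions: the function name and its parameters (`sigma`, `t_crit`) are hypothetical, the Z-score is computed from the standard library rather than read from Table A-2, and the t-score must still be supplied by the caller from Table A-3.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, confidence, sigma=None, t_crit=None):
    """Return (LB, UB) for the population mean.

    sigma  : known population standard deviation, or None if unknown.
    t_crit : t_{n-1, alpha/2} from a t-table; required when sigma is None.
    """
    n = len(data)
    x_bar = mean(data)
    alpha = 1.0 - confidence

    if sigma is not None:
        # "Yes" branch: sigma is known, so use a Z-score.
        crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
        spread = sigma
    else:
        # "No" branch: estimate sigma with s and use a t-score.
        if t_crit is None:
            raise ValueError("sigma unknown: supply t_crit for df = n - 1")
        crit = t_crit
        spread = stdev(data)  # sample standard deviation (n - 1 denominator)

    margin = crit * spread / sqrt(n)
    return x_bar - margin, x_bar + margin
```

Calling it with `sigma=...` follows the Z branch; calling it with `t_crit=...` and no `sigma` follows the t branch.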
GIST141B, Fall, 1999 Revised 8/11/99 78
Example: Calculating a confidence interval when the population standard deviation is estimated from the data
A soft drink bottling plant makes 2-liter bottles of soft drink. Because of slight variations in the soft drink composition and variations in the bottling machine, the actual content of the filled bottles varies from bottle to bottle. Seven filled bottles were sampled and their contents measured. The data for these samples are given below (quantities reported are liters of soft drink found in the bottle). Find a 95% confidence interval for the average content (in liters) for bottles made at the plant.
1.83, 2.02, 1.76, 1.90, 1.95, 2.10, 1.88
GIST141B, Fall, 1999 Revised 8/11/99 79
Bottling Plant Example
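1 - α = 0.95, α = 0.05
n = 7 (df = 6)
\( t_{df,\alpha/2} = t_{6,0.025} = 2.447 \)
\( \bar{X} = 1.92 \) liters
s = 0.11

\[ LB = \bar{X} - t_{df,\alpha/2} \frac{s}{\sqrt{n}} = 1.92 - (2.447)\frac{0.11}{\sqrt{7}} = 1.82 \text{ (liters/bottle)} \]

\[ UB = \bar{X} + t_{df,\alpha/2} \frac{s}{\sqrt{n}} = 1.92 + (2.447)\frac{0.11}{\sqrt{7}} = 2.02 \text{ (liters/bottle)} \]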
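The hand calculation can be checked against the raw data. The sketch below recomputes the interval twice: once with the mean and standard deviation rounded to two decimals, as in the hand calculation, and once with unrounded values, which is presumably what a statistics package does. The helper name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Bottle contents in liters, from the example.
liters = [1.83, 2.02, 1.76, 1.90, 1.95, 2.10, 1.88]
T_6_025 = 2.447  # t for df = 6, alpha/2 = 0.025 (Table A-3)


def t_interval(x_bar, s, n, t_crit):
    """Return (LB, UB) = x_bar -/+ t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


n = len(liters)
x_bar = mean(liters)
s = stdev(liters)  # sample standard deviation, n - 1 denominator

hand = t_interval(round(x_bar, 2), round(s, 2), n, T_6_025)
full = t_interval(x_bar, s, n, T_6_025)

print(f"rounded inputs:   {hand[0]:.2f} to {hand[1]:.2f} liters")
print(f"unrounded inputs: {full[0]:.2f} to {full[1]:.2f} liters")
```

Rounding s before computing the margin narrows the interval slightly, which is why a hand result and a software result can differ in the second decimal place.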
GIST141B, Fall, 1999 Revised 8/11/99 80
Calculating a 95% Confidence Interval in JMP IN
Step 1: Enter the data into JMP IN
GIST141B, Fall, 1999 Revised 8/11/99 81
Calculating a 95% Confidence Interval in JMP IN
Step 2: Select Analyze: Distribution of Y
We are 95% confident that the average volume of soft drink for all 2-liter bottles produced at the plant is between 1.81 and 2.03 liters.
GIST141B, Fall, 1999 Revised 8/11/99 82
Wrap-up: What you should be able to do
State the requirements for a statistical inference to be valid and apply them to evaluate the validity of a given inference.
Calculate and interpret a (1 - α)% confidence interval for the mean (by hand and by using JMP IN®)
– When the exact value of the population standard deviation is known
– When the exact value of σ is not known
Use the (1 - α)% confidence interval to make inferences about the population mean
Distinguish between valid and invalid interpretations of the (1 - α)% confidence interval.
GIST141B, Fall, 1999 Revised 8/11/99 83
Statistical Inferences About the Comparison of Two Population Means
Confidence Interval for the Difference of Two Population Means
[Figure: ozone (Dobson units), vertical axis from 200.0 to 600.0, for RESOLUTE and SYOWA]
GIST141B, Fall, 1999 Revised 8/11/99 84
A Look Ahead: What you will be able to do after this lecture
Calculate and interpret a (1 - α)% confidence interval for the difference of two population means
– By hand
– By using JMP IN®
Use the (1 - α)% confidence interval for the difference of two population means to determine if the two means are different.
Distinguish between valid and invalid statements that interpret the meaning of a (1 - α)% confidence interval for the difference of two population means.
GIST141B, Fall, 1999 Revised 8/11/99 85
The Goal of Many Studies is to Compare the Means from Two Populations
Example #1
A group of engineers is conducting a study to compare the energy efficiency of two different fuels. The team will burn several replicate samples of each fuel type and measure the energy generated (in calories) for each replicate.
Population #1: The energy efficiencies of all possible samples of fuel type #1
Population #2: The energy efficiencies of all possible samples of fuel type #2
Population Parameters: The mean energy efficiency of fuel type #1, compared to the mean energy efficiency of fuel type #2 (\( \mu_{\text{fuel type \#1}} \) vs \( \mu_{\text{fuel type \#2}} \))
GIST141B, Fall, 1999 Revised 8/11/99 86
Example #2
A sociologist compared the value placed on human life by individuals who watch more than 20 hours of TV each week with those who watch less than 5 hours each week. She randomly selected 50 individuals from each category and gave each a test to evaluate each individual’s value for human life.
Population #1: Test scores from all people watching over 20 hours of TV each week.
Population #2: Test scores from all people watching under 5 hours of TV each week.
Population Parameters: The mean test score from the “over 20 hours crowd,” compared to the mean score from the “under 5 hours crowd” (\( \mu_{\text{over 20 hrs}} \) vs \( \mu_{\text{under 5 hrs}} \))
GIST141B, Fall, 1999 Revised 8/11/99 87
Example #3
A medical doctor is comparing two different surgical techniques for repairing a torn anterior cruciate ligament (ACL) in the knee. Fifteen randomly selected patients with torn ACLs are treated with the old technique, and fifteen other randomly selected patients are treated with the new technique. The time required for each person to recover 90% of motion in the injured knee is recorded.
Population #1: All possible recovery times under the old method.
Population #2: All possible recovery times under the new method.
Population Parameters: Mean recovery time with the old technique, compared to the mean recovery time with the new technique (\( \mu_{\text{old technique}} \) vs \( \mu_{\text{new technique}} \))
GIST141B, Fall, 1999 Revised 8/11/99 88
Making Statistical Inferences About the Comparison of Two Population Means
Assumptions and Requirements
– Each population follows a normal distribution (or something reasonably close to normal)
– The populations have the same standard deviation
– We have a representative sample from each population
  – Samples were randomly selected
  – Every entity in the population had an equal chance of being selected
GIST141B, Fall, 1999 Revised 8/11/99 89
Exercise
We wish to compare the average height of JMU students who were born in the months of January through June to the average height of JMU students who were born in the months July through December.
Population #1: All JMU students born in Jan-Jun
Population #2: All JMU students born in Jul-Dec
Population Parameters: Average height in these two populations
Data: Heights of 10 randomly selected students
• 5 born in Jan - Jun
• 5 born in Jul - Dec
GIST141B, Fall, 1999 Revised 8/11/99 90
Data Sheet for Recording the Height Data
             Born in Jan - Jun    Born in Jul - Dec
Mean
Std. Dev.
GIST141B, Fall, 1999 Revised 8/11/99 91
True or False?
The average height of students in our sample that were born in January - June is different than the average height of students in our sample that were born in July - December.
The average height of all JMU students born in January - June is different than the average height of all JMU students that were born in July - December.
GIST141B, Fall, 1999 Revised 8/11/99 92
True or False?
The 1st statement is true. It is only a summary of our data.
It does not generalize the results and apply them to the underlying population parameters.
GIST141B, Fall, 1999 Revised 8/11/99 93
True or False?
We cannot tell if the 2nd statement is reasonable. It is a statistical inference because it generalizes what we saw in our data and applies it to the underlying population parameters.
GIST141B, Fall, 1999 Revised 8/11/99 94
Recall: Two Requirements for a Statistical Inference to be Valid
1 The data come from an unbiased sample of each population
2 The inference accurately states the degree of certainty in the conclusion
GIST141B, Fall, 1999 Revised 8/11/99 95
Definition: Confidence Interval for the Difference of Two Population Means
Same interpretation of α as before. Names for α:
– Type I error probability
– α-level
– significance level
A (1 - α)% confidence interval for the difference of two population means is a range of values, running from a lower bound to an upper bound, within which we can be (1 - α)% confident that the true difference falls.
GIST141B, Fall, 1999 Revised 8/11/99 96
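Formula for Calculating a (1 - α)% Confidence Interval for the Difference of Two Population Means
Lower Bound (LB)
\[ LB = (\bar{X}_1 - \bar{X}_2) - t_{n_1+n_2-2,\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
Upper Bound (UB)
\[ UB = (\bar{X}_1 - \bar{X}_2) + t_{n_1+n_2-2,\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
where the pooled standard deviation is
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
• \( s_1 \) and \( s_2 \) are the standard deviations of the two groups of sample data
• \( n_1 \) = sample size from the first population
• \( n_2 \) = sample size from the second population
• \( t_{n_1+n_2-2,\alpha/2} \) is from Table A-3 in Triola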
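The pooled formula can be sketched as two small functions that work from summary statistics. The function names are hypothetical; the t-score is assumed to be read from Table A-3 by the caller. The demonstration call uses the summary statistics of the tire mileage example in these slides.

```python
from math import sqrt


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p of two samples."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def difference_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Return (LB, UB) for mu_1 - mu_2, assuming equal population SDs.

    t_crit is t_{n1+n2-2, alpha/2}, taken from a t-table.
    """
    sp = pooled_sd(s1, n1, s2, n2)
    margin = t_crit * sp * sqrt(1.0 / n1 + 1.0 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Tire example: brand A vs brand B, 95% confidence, df = 10.
lb, ub = difference_interval(35884, 2656.6, 6, 29626, 904.24, 6, t_crit=2.228)
print(f"s_p = {pooled_sd(2656.6, 6, 904.24, 6):.1f} miles")
print(f"difference between {lb:.1f} and {ub:.1f} miles")
```

Because the inputs are rounded summary statistics, the bounds can differ from a software result computed on the raw data by a mile or so.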
GIST141B, Fall, 1999 Revised 8/11/99 97
Example
An automobile manufacturing company was trying to determine which type of tire to install on its new models. Six sets of tire
brand “A” and six sets of tire brand “B” were installed on 12 new automobiles and the number of miles of use before 60% of the tread was worn off was measured. The results are given in the
table below. Calculate a 95% confidence interval for the difference in average mileage between the two brands of tires.
Tire Mileage Results
Brand A    Brand B
38610      31027
34840      29827
35793      28812
32833      30064
39477      28524
33752      29500
GIST141B, Fall, 1999 Revised 8/11/99 98
Calculations for the 95% Confidence Interval
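\( \bar{X}_1 \) = average mileage from brand A = 35884 miles
\( s_1 \) = 2656.6 miles
\( n_1 \) = 6
\( \bar{X}_2 \) = average mileage from brand B = 29626 miles
\( s_2 \) = 904.24 miles
\( n_2 \) = 6

\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{5(2656.6)^2 + 5(904.24)^2}{6 + 6 - 2}} = 1984.3 \text{ miles} \]

95% confidence
α = 0.05
α/2 = 0.025
\( t_{n_1+n_2-2,\alpha/2} = t_{10,0.025} = 2.228 \)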
GIST141B, Fall, 1999 Revised 8/11/99 99
Calculations for the 95% Confidence Interval
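\[ \text{Lower Bound} = (\bar{X}_1 - \bar{X}_2) - t_{n_1+n_2-2,\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (35884 - 29626) - (2.228)(1984.3)\sqrt{\frac{1}{6} + \frac{1}{6}} = 6258 - 2552.5 = 3706 \text{ miles} \]

\[ \text{Upper Bound} = (\bar{X}_1 - \bar{X}_2) + t_{n_1+n_2-2,\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 6258 + 2552.5 = 8810 \text{ miles} \]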
GIST141B, Fall, 1999 Revised 8/11/99 100
95% Confidence Interval for Tire Tread Example
We are 95% confident that Brand A will average between 3706 and 8810 more miles
than Brand B before using up 60% of its tread.
GIST141B, Fall, 1999 Revised 8/11/99 101
Using JMP IN to Calculate the 95% confidence interval for the difference of two population means
1 Create two columns in JMP IN®
One column containing all the measurements
A second column indicating which population each measurement comes from
Make sure the second column is set to “N” (for “nominal”)
GIST141B, Fall, 1999 Revised 8/11/99 102
2 Select Analyze: Fit Y by X
– Place the measured value on the Y-axis
– Place the population ID on the X-axis
[Figure: “Mileage By Brand” plot; mileage (27500 to 40000) on the Y-axis, Brand (Brand A, Brand B) on the X-axis]
GIST141B, Fall, 1999 Revised 8/11/99 103
3 Select Means: Anova/t-test under the Analysis button
t-Test (assuming equal variances)
              Estimate    Std Error    Lower 95%    Upper 95%
Difference    6258.47     1145.65      3705.77      8811.16
t-Test        5.463
DF            10
Prob>|t|      0.0003
“Lower 95%” and “Upper 95%” are the lower and upper bounds for the 95% confidence interval
GIST141B, Fall, 1999 Revised 8/11/99 104
Valid Interpretations of the Confidence Interval for the Difference between Two Population Means
You must state the range and the confidence level.
“We are (1 - α)% confident that the difference \( \mu_1 - \mu_2 \) is somewhere between LB and UB.”
Valid: “We are 95% confident that the mean mileage difference (brand A - brand B) is between 3706 and 8810 miles.”
Not valid: “The mean difference in mileage between brand A and brand B is 6258 miles.”
GIST141B, Fall, 1999 Revised 8/11/99 105
Valid Interpretations of the Confidence Interval for the Difference between Two Population Means
Valid: “Since the confidence interval for the difference in average mileage does not include zero, we can claim with 95% confidence that the average mileage for brand A is different than the average mileage for brand B.”
We CAN CLAIM with (1 - α)% confidence that the means are different if the confidence interval does not include the value “0.”
GIST141B, Fall, 1999 Revised 8/11/99 106
Wrap-up: What you should be able to do
Calculate and interpret a (1 - α)% confidence interval for the difference of two population means
– By hand
– By using JMP IN®
Use the (1 - α)% confidence interval for the difference of two population means to determine if the two means are different.
Distinguish between valid and invalid statements that interpret the meaning of a (1 - α)% confidence interval for the difference of two population means.
GIST141B, Fall, 1999 Revised 8/11/99 107
Introduction to Statistical Hypothesis Testing
The means are equal?
The means are not equal?
GIST141B, Fall, 1999 Revised 8/11/99 108
A Look Ahead: What you will be able to do after this lecture
Define the two types of statistical hypotheses and how each is used
– Null hypothesis
– Alternative hypothesis
Given a description of a hypothesis testing problem, determine null and alternative hypotheses both in words and in equation form
Use confidence intervals to do the following hypothesis tests
– Test whether the population mean is different from a specified value
– Test whether two populations have the same mean value
Describe the two types of errors that can occur in hypothesis testing
Describe in words what each type of error would consist of in a given real-life problem.
GIST141B, Fall, 1999 Revised 8/11/99 109
Definition: Hypothesis
In statistical testing, a hypothesis is a claim or statement about a property of a population
Example hypotheses
– Medical researchers claim that the average body temperature of healthy adults is actually not equal to 98.6°F (i.e., μ ≠ 98.6).
– The average GPA of JMU students who study 8 hours or more per week is higher than the average GPA for JMU students who do not (i.e., \( \mu_{\text{at least 8 hrs/week}} > \mu_{\text{less than 8 hrs/week}} \)).
Many hypotheses of interest assert something about the value of the mean of one or more populations.
GIST141B, Fall, 1999 Revised 8/11/99 110
Difference between an hypothesis and a statistical inference
Both assert something about a population parameter
Hypothesis is an assertion that will be tested by the data (it’s made “before looking at the data”)
Statistical inference can look just like an hypothesis, except that it is a conclusion that is made, based on a look at the data
The process of making a statistical inference can be thought of as using the data to “test” which hypothesis is most reasonable
GIST141B, Fall, 1999 Revised 8/11/99 111
Two Types of Hypothesis
The alternative hypothesis (H1)
– States that the population parameter is different from some specified value
– Also referred to as the research hypothesis
– Usually is the claim that we wish to really evaluate (and often that we hope is true)
The null hypothesis (H0)
– States the opposite of the alternative hypothesis
– Usually states that the population parameter(s) is NOT different than a specified value (or are NOT different from each other).
GIST141B, Fall, 1999 Revised 8/11/99 112
Two Cases Considered in this lecture
Case 1: Testing hypotheses about the value of a single population mean
– Referred to as the single population case
Case 2: Testing hypotheses about how the means of two different populations compare
– Referred to as the two population case
GIST141B, Fall, 1999 Revised 8/11/99 113
The single population case: Testing if the population mean is equal to a specified value
Goal: To determine if the population mean differs from a specified value
Data: A random sample of n measurements from a single population
Examples
– Determine if the average starting salary of JMU graduates differs from the nationwide mean of $30,000/year.
– Determine if average shelf life of a new battery exceeds 12 months.
GIST141B, Fall, 1999 Revised 8/11/99 114
Single Population Case
Null Hypothesis
– Asserts that the population mean IS NOT DIFFERENT than the specified value
– Notation: H0: \( \mu = \mu_0 \), where μ is the population mean and \( \mu_0 \) is the specified value we are comparing against.
Alternative Hypothesis
– Asserts that the population mean IS DIFFERENT than the specified value
– Notation: H1: \( \mu \neq \mu_0 \)
GIST141B, Fall, 1999 Revised 8/11/99 115
Single Population Case
Method for testing the two hypotheses
Construct a (1 - α)% confidence interval for the mean. If \( \mu_0 \) is outside the interval, we can conclude with (1 - α)% confidence that H1 is true. Otherwise, we say that we “cannot reject” H0.
GIST141B, Fall, 1999 Revised 8/11/99 116
Example: Starting Salary of JMU Graduates
A study was conducted to determine if the average starting salary of JMU graduates is different than the national average of $30,000 for college graduates. 25 JMU graduates were randomly selected and their starting salaries recorded. The average starting salary in the sample was $33,796. A 99% confidence interval for the average runs from $29,297 to $38,296.
Null Hypothesis
– In words: H0: The average starting salary of JMU graduates is equal to $30,000
– In equation form: H0: \( \mu_{\text{JMU graduates}} = 30{,}000 \) dollars
Alternative Hypothesis
– In words: H1: The average starting salary of JMU graduates is not equal to $30,000
– In equation form: H1: \( \mu_{\text{JMU graduates}} \neq 30{,}000 \) dollars
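The single population test reduces to checking whether the specified value lies inside the confidence interval. A minimal sketch of that decision rule, applied to the salary figures above, is shown here; the function name `ci_decision` and its return strings are hypothetical.

```python
def ci_decision(mu0, lower, upper):
    """Apply the confidence-interval test for H0: mu = mu0.

    Returns "reject H0" when mu0 falls outside [lower, upper],
    otherwise "cannot reject H0".
    """
    if mu0 < lower or mu0 > upper:
        return "reject H0"
    return "cannot reject H0"


# Salary example: 99% interval from $29,297 to $38,296, mu0 = $30,000.
print(ci_decision(30_000, 29_297, 38_296))
```

Since $30,000 lies inside the 99% interval, the rule returns "cannot reject H0": at 99% confidence, these data do not show that JMU graduates differ from the national average.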
GIST141B, Fall, 1999 Revised 8/11/99 117
Example
An electrical firm conducted a study to determine if the average useable lifetime of a new light bulb design differs from the 800 hour average associated with the old design. Thirty-six of the new design bulbs were randomly selected and used until failure. The average lifetime of the 36 bulbs was 902 hours and the standard deviation was 30 hours. Can we conclude with 95% confidence that the new bulbs differ from the standard?
Null Hypothesis
– In words: H0: The average lifetime is not different from 800 hours
– Using equations: H0: \( \mu_{\text{new light bulbs}} = 800 \)
Alternative Hypothesis
– In words: H1: The average lifetime differs from 800 hours
– Using equations: H1: \( \mu_{\text{new light bulbs}} \neq 800 \)
GIST141B, Fall, 1999 Revised 8/11/99 118
Light Bulb Exercise (continued)
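95% confidence interval (df = 35 is more than 30, so \( t_{35,0.025} \) is taken as approximately 1.96)

\[ LB = \bar{X} - t_{df,\alpha/2} \frac{s}{\sqrt{n}} = 902 - t_{35,0.025} \frac{30}{\sqrt{36}} = 902 - (1.96)(5) = 902 - 9.8 \approx 892 \]

\[ UB = \bar{X} + t_{df,\alpha/2} \frac{s}{\sqrt{n}} = 902 + (1.96)(5) = 902 + 9.8 \approx 912 \]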
GIST141B, Fall, 1999 Revised 8/11/99 119
Light Bulb Exercise (continued)
Since the benchmark value of 800 hours is not included in the 95% confidence interval, we can reject the null hypothesis and conclude with 95% confidence that the average life of the new bulbs is different from 800 hours.
GIST141B, Fall, 1999 Revised 8/11/99 120
Possible Outcomes of Hypothesis Testing
Null Hypotheses?
Alpha and Beta Errors?
GIST141B, Fall, 1999 Revised 8/11/99 121
Possible Outcomes: The Risks in Hypothesis Testing
What is concluded (the results of the experiment, data analysis and hypothesis test): Reject Null Hyp. | Do not Reject Null Hyp.
What is actually true (this is seldom known with certainty): Null hypothesis | Alternative hypothesis
GIST141B, Fall, 1999 Revised 8/11/99 122
Possible Outcomes: The Risks in Hypothesis Testing
                           What is Actually True
What is concluded          Null hypothesis                                            Alternative hypothesis
Reject Null Hyp.           Type 1 error (α-value gives probability of such an error)  Correct decision
Do not Reject Null Hyp.    Correct decision                                           Type 2 error (probability not given here)
GIST141B, Fall, 1999 Revised 8/11/99 123
Exercise: State in your own words what a Type I and a Type II error would consist of with the light bulb example
Type I error
– A type I error would consist of concluding that the average lifetime of the new bulb is different from 800 hours when in fact it is not different.
Type II error
– A type II error would consist of concluding that the average lifetime of the new bulb is NOT different from 800 hours when in fact it really is different.
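The link between α and the Type I error can be illustrated by simulation. The sketch below is not from the lecture: it assumes a normal population whose true mean really is 800 hours, treats the standard deviation as known (σ = 30, borrowed from the light bulb example purely as a hypothetical value), and counts how often a 95% interval wrongly excludes 800. The function name and its parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Estimate how often a true H0 (mu = mu0) is rejected by the CI test."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(n)  # sigma assumed known, so use a Z-score
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 is actually true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if mu0 < x_bar - margin or mu0 > x_bar + margin:
            rejections += 1  # H0 rejected although it is true: Type I error
    return rejections / trials


rate = type_one_error_rate(mu0=800.0, sigma=30.0, n=36, alpha=0.05, trials=2000)
print(f"simulated Type I error rate: {rate:.3f} (alpha = 0.05)")
```

The simulated rate should land close to α, which is the sense in which the α-value gives the probability of a Type I error.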
GIST141B, Fall, 1999 Revised 8/11/99 124
Two Population Case: Testing if the means of two populations differ from one another
Goal: To determine if the means from two populations differ from each other.
Data: A random sample of n1 measurements from one population and n2 measurements from the other population
Examples
– Determine if the average weight loss under one diet/exercise plan differs from the average weight loss under another plan.
– Determine if the average nicotine content from one brand of cigarettes is different than the average nicotine content from another brand.
GIST141B, Fall, 1999 Revised 8/11/99 125
Two Population Case
Null Hypothesis
– Asserts that the two population means ARE NOT DIFFERENT
– Notation: H0: \( \mu_1 = \mu_2 \), where \( \mu_1 \) is the mean of the first population and \( \mu_2 \) is the mean of the second population.
Alternative Hypothesis
– Asserts that the two population means ARE DIFFERENT
– Notation: H1: \( \mu_1 \neq \mu_2 \)
GIST141B, Fall, 1999 Revised 8/11/99 126
Two Population Case
Method of testing the two hypotheses
Construct a (1 - α)% confidence interval for the difference of the two means. If the interval does not include the value zero, then we can conclude with (1 - α)% confidence that H1 is true. Otherwise, we say that we “cannot reject” H0.
GIST141B, Fall, 1999 Revised 8/11/99 127
Example: Two Population Case
A study was made to compare the effects of two different weight lifting programs on overall strength improvement. Ten randomly selected individuals were assigned to use program A, and eight were assigned to use program B. After twelve weeks, the gain in strength was measured on each individual (expressed as the change in the maximum number of pounds that the individual could bench press). The group using program A showed an average increase of 55 pounds, with a standard deviation of 12 pounds. The group using program B showed an average increase of 40 pounds with a standard deviation of 14 pounds. Perform a statistical test to determine if the average strength gain differs between the two programs. Use a confidence level of 99%
GIST141B, Fall, 1999 Revised 8/11/99 128
Weight training example (continued)
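Null Hypothesis
– In words: H0: The average increase in bench press pounds is the same for both programs
– Using equations: H0: \( \mu_{\text{program A}} = \mu_{\text{program B}} \)
Alternative Hypothesis
– In words: H1: The average increase in bench press pounds differs between the two programs
– Using equations: H1: \( \mu_{\text{program A}} \neq \mu_{\text{program B}} \)

\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{9(12)^2 + 7(14)^2}{10 + 8 - 2}} = 12.9 \text{ pounds} \]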
GIST141B, Fall, 1999 Revised 8/11/99 129
Weight Training Exercise (continued)
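99% confidence interval for the difference

\[ \text{Lower Bound} = (\bar{X}_1 - \bar{X}_2) - t_{n_1+n_2-2,\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (55 - 40) - (2.921)(12.9)\sqrt{\frac{1}{10} + \frac{1}{8}} = 15 - 18 = -3 \text{ pounds} \]

\[ \text{Upper Bound} = (\bar{X}_1 - \bar{X}_2) + t_{n_1+n_2-2,\alpha/2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 15 + 18 = 33 \text{ pounds} \]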
GIST141B, Fall, 1999 Revised 8/11/99 130
Weight Training Exercise (continued)
We are 99% certain that, over a 12-week period, the two programs will lead to average strength gains that differ by somewhere between -3 and 33 pounds.
Since the 99% confidence interval includes the value “0,” we cannot conclude with 99% confidence that the two programs result in different average gains in strength.
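The whole two population test for the weight training example can be traced in a few lines. The sketch below uses the summary statistics given in the problem statement; the t-score 2.921 is the standard table value for df = 16 and α/2 = 0.005, and the variable names are hypothetical.

```python
from math import sqrt

# Summary statistics from the weight training example (pounds).
n_a, mean_a, s_a = 10, 55.0, 12.0  # program A
n_b, mean_b, s_b = 8, 40.0, 14.0   # program B
T_16_005 = 2.921  # t for df = 10 + 8 - 2 = 16, alpha/2 = 0.005

# Pooled standard deviation (assumes equal population SDs).
s_p = sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2))

# 99% confidence interval for mu_A - mu_B.
margin = T_16_005 * s_p * sqrt(1.0 / n_a + 1.0 / n_b)
lower = (mean_a - mean_b) - margin
upper = (mean_a - mean_b) + margin

# Decision rule: reject H0 only if zero is outside the interval.
reject_h0 = not (lower <= 0.0 <= upper)

print(f"s_p = {s_p:.1f} pounds")
print(f"99% interval: {lower:.1f} to {upper:.1f} pounds")
print("reject H0" if reject_h0 else "cannot reject H0")
```

Zero lies inside the interval, so the rule gives "cannot reject H0", matching the conclusion stated above.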
GIST141B, Fall, 1999 Revised 8/11/99 131
Wrap-up: What you should be able to do
Define the two types of statistical hypotheses and how each is used
– Null hypothesis
– Alternative hypothesis
Given a description of a hypothesis testing problem, determine null and alternative hypotheses both in words and in equation form
Use confidence intervals to do the following hypothesis tests
– Test whether the population mean is different from a specified value
– Test whether two populations have the same mean value
Describe the two types of errors that can occur in hypothesis testing
Describe in words what each type of error would consist of in a given real-life problem.