IE 3320 – Engineering Statistics Lecture 11: Sampling Distributions and Data Descriptions
Eduardo Perez, Ph.D.
Ingram School of Engineering
Texas State University
601 University Dr., San Marcos, TX 78666
http://uweb.txstate.edu/~e_p86
E. Perez IE TXSTATE 1
Ch.8 Fundamental Sampling Distributions and Data Descriptions
Today’s Agenda
Chapter 8 Fundamental Sampling Distributions and Data
Descriptions
Statistical Inference
Random Sampling
Some Important Statistics
Sampling Distributions
Sampling Distributions of Means and the Central Limit Theorem
Sampling Distribution of $S^2$
t-Distribution
Statistics Probability
Measures of Location (mean)
Measures of Variability (variance)
Sample
Event
Counting Sample Points
Probability of an Event
Random Variables
Probability Distributions
Discrete
Continuous
Joint Probability Distributions
Binomial
Hypergeometric
Negative Binomial
Geometric
Poisson
Uniform
Normal
Gamma
Exponential
Lognormal
Sampling Distributions
Learning Objectives
Explain the general concepts of estimating the parameters of
a population or a probability distribution
Explain the important role of the normal distribution as a
sampling distribution
Understand the central limit theorem
Statistical Inference
_________ is concerned with making inferences about
populations and population characteristics.
Population Sample
Deductive Reasoning
(Probability)
Inductive Reasoning
(Inferential Statistics)
Statistical Inference
Statistical inference is always focused on _______________
about one or more parameters of a __________.
An important part of this process is obtaining __________ of the
_____________.
Suppose that we want to obtain a ____________(a reasonable
value) of a population parameter.
We know that before the data are collected, the observations
are considered to be random variables, say, $X_1, X_2, \ldots, X_n$.
Therefore, any function of the observations, or any _______, is
also a _______________.
Statistical Inference
For example,
the sample _____ $\bar{X}$ and
the sample ________ $S^2$
are statistics and they are also _______________.
Since a statistic is a random variable, it has a __________
distribution.
We call the probability distribution of a statistic a _________
distribution.
Random Sampling
In this lecture, we focus on _________ from distributions or
populations
We provide an introduction to the role that:
the sample mean $\bar{X}$ and
the sample variance $S^2$
will play in statistical inference in later lectures.
Population and Samples
The totality of observations (finite or infinite) constitutes
what we call a ___________
Example: If there are 600 students in the school whom we classified
according to blood type, we say that we have a population of size _____.
Definition 8.1
600
Population and Samples
Each __________ in a population is a value of a ________
_________ X having some probability distribution f(x).
Example: If one is inspecting items coming off an assembly line for
defects, then each observation in the population might be a value 0 or 1 of
the Bernoulli random variable X with probability distribution
$b(x; 1, p) = p^x q^{1-x}$, $x = 0, 1$,
where 0 indicates a nondefective item, 1 indicates a defective item, p is the probability of a defective item, and q = 1 − p.
http://www.williamsonimages.com/#/industry/manufacturing/_Y3C1423
Population and Samples
In statistical inference, we are interested in
Arriving at __________ concerning a ________ when it is impossible
or impractical to observe the entire _________________.
Example: To determine the average length of life of a certain brand of light
bulb, it would be impossible to test all such bulbs if we are to have any left
to sell.
Therefore, we depend on a _________________ from the population
to make inferences concerning that ________________.
http://earthfourenergy.blogspot.com/2012/08/sanjit-bunker-roy-brings-light-to-dark.html
Population and Samples
If our ________ from the sample to the population are to be _____,
Obtain samples that are ______________ of the population.
A _________ procedure that produces inferences that consistently
overestimate or
underestimate
some characteristic of the population is said to be _______.
To eliminate any possibility of bias in the sampling procedure,
Choose a random sample where the observations are made ___________ and
at ___________.
Definition 8.2
Population and Samples
Definition 8.3
Some Important Statistics
Our main purpose in selecting random samples is to draw
information about the __________________ parameters.
Example: Suppose that we wish to arrive at a conclusion concerning the proportion of coffee-drinkers in the United States who prefer a certain brand of coffee.
It is impossible to question every person in order to compute p = x/N, representing the population ______________.
Instead, a large random sample is selected.
The proportion $\hat{p} = x/n$ of people in this sample favoring the brand of coffee in
question is calculated.
$\hat{p}$ is now used to make an ________ concerning the ____ proportion p.
http://www.nih.gov/researchmatters/june2012/06042012coffee.htm
Many random samples are possible from the ____ population
we would expect $\hat{p}$ to vary somewhat from ________________
$\hat{p}$ is a value of a ________________ that we represent by $\hat{P}$.
Such a random variable is called a ________.
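The sample-to-sample variation described above can be illustrated with a small simulation. This is only a sketch: the population proportion `TRUE_P`, the sample size `N`, and the helper name `sample_proportion` are hypothetical values chosen for illustration and do not come from the lecture.

```python
import random

def sample_proportion(true_p, n, rng):
    """Draw n Bernoulli(true_p) responses and return p-hat = x / n."""
    x = sum(1 for _ in range(n) if rng.random() < true_p)
    return x / n

rng = random.Random(8)   # fixed seed so the run is repeatable
TRUE_P = 0.30            # hypothetical population proportion (assumption)
N = 500                  # hypothetical sample size (assumption)

# Five independent random samples from the same population
p_hats = [sample_proportion(TRUE_P, N, rng) for _ in range(5)]
for i, p_hat in enumerate(p_hats, start=1):
    print(f"sample {i}: p-hat = {p_hat:.3f}")
```

Each run of the loop plays the role of one random sample, so the printed values are observed values of the same statistic.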
Definition 8.4
Some Important Statistics
In Lecture 7, we introduced the two parameters $\mu$ and $\sigma^2$, which
measure the
center of location and
variability
of a probability distribution.
These are ________ population parameters and are not influenced
by the observations of a _______________.
Now, we define some important statistics that describe
corresponding measures of a ________________.
The most commonly used statistics for measuring the center of a set of
data are the ______, _______, and ______
Location Measures of a Sample
Let $X_1, X_2, \ldots, X_n$ represent n random variables.
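Sample Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample Median: the middle value of the ordered sample if n is odd, or the average of the two middle values if n is even
Sample Mode: the value of the sample that occurs most often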
Example: Suppose a data set consists of the following observations:
0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43
The mode is _______
Let $X_1, X_2, \ldots, X_n$ represent n random variables.
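As a quick check of the three location measures, the ten observations from the example above can be run through Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only.

```python
import statistics

# Observations from the example on this slide
data = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]

mean = statistics.mean(data)        # arithmetic average of the sample
median = statistics.median(data)    # average of the two middle values, since n is even
modes = statistics.multimode(data)  # list of the most frequent value(s)

print(f"mean = {mean:.3f}, median = {median:.2f}, mode(s) = {modes}")
```

`multimode` is used rather than `mode` so that a sample with several equally frequent values would report all of them.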
Variability Measures of a Sample
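Sample Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Sample Standard Deviation: $S = \sqrt{S^2}$
Sample Range: $R = X_{\max} - X_{\min}$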
Example: A comparison of coffee prices at 4 randomly selected grocery stores in
San Diego showed increases from the previous month of 12, 15, 17, and 20 cents
for a 1-pound bag. Find the variance of this random sample of price increases.
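The coffee-price example can be worked numerically as follows. This is a sketch using the standard library, where `statistics.variance` divides by n − 1 as the sample variance does; the variable names are illustrative.

```python
import statistics

# Price increases (cents) at the 4 randomly selected stores
increases = [12, 15, 17, 20]

x_bar = statistics.mean(increases)            # sample mean
s2 = statistics.variance(increases)           # sample variance, divisor n - 1
s = statistics.stdev(increases)               # sample standard deviation
sample_range = max(increases) - min(increases)

print(f"mean = {x_bar}, variance = {s2:.4f}, std dev = {s:.4f}, range = {sample_range}")
```

The squared deviations from the mean of 16 are 16, 1, 1, and 16, so the variance is 34/3.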
Sampling Distributions
The field of statistical inference is concerned with
_____________ and ____________.
The population being sampled here is again finite but very
small.
A ________ is a random variable that depends only on the
_____________, and it must have a probability distribution.
Example: We might state that the average
cost to build a residence in Charleston,
South Carolina, is between $330,000 and
$335,000, based on the estimates of 3
contractors selected at random from the 30
now building in this city.
http://spot.colorado.edu/~koslicki/photos_house.html
For example, the probability distribution of $\bar{X}$ is called the
sampling distribution ____________.
The sampling distribution of a statistic depends on:
___________ of the population,
________ of the samples,
_________ of choosing the samples
We will study several of the important _______ distributions
of frequently used statistics.
Applications of the sampling distributions are considered throughout
most of the remainder of this course.
Definition 8.5
Sampling Distributions
The sampling distributions of $\bar{X}$ and $S^2$ are the mechanisms from
which we will be able to make inferences on the parameters:
The sampling distribution of $\bar{X}$ with sample size n is the
distribution that results when
an experiment is conducted over and over (always with sample size n) and
the ________________ result.
This sampling distribution describes the ___________ of sample averages
around the _______________ μ.
The same principle applies in the case of the _______________.
The sampling distribution produces information about the variability of $s^2$
values around $\sigma^2$ in repeated experiments.
What is the sampling distribution of $\bar{X}$?
Sampling Distribution of Means and the Central
Limit Theorem
The first important sampling distribution to be considered is that of
the _________
If we are sampling from a population with unknown distribution,
either finite or infinite,
The sampling distribution of $\bar{X}$ is approximately _______ with mean $\mu$ and
variance $\sigma^2/n$, provided that the sample size is _______.
This amazing result is an immediate consequence of the following
theorem
Theorem 8.2
Sampling Distribution of Means and the Central Limit
Theorem
The normal approximation for $\bar{X}$ depends on the sample size $n$
The figure shows:
the distribution obtained for throws of
a six-sided true die
the distribution of $\bar{X}$ becomes closer
to _______ as n grows _______,
The ______ approximation for $\bar{X}$ will
generally be good if _______
The sample size n = 30 is a
guideline to use for the Central Limit
Theorem.
Montgomery and Runger 5th edition
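The die-throwing illustration can be reproduced exactly rather than by simulation. The sketch below builds the exact distribution of the average of n throws of a fair die by repeated convolution and checks that its mean stays at 3.5 while its variance shrinks as (35/12)/n, where 35/12 is the variance of a single fair throw. The function names are illustrative and are not part of the lecture.

```python
from fractions import Fraction

def die_mean_distribution(n):
    """Exact pmf of the average of n throws of a fair six-sided die."""
    pmf = {0: Fraction(1)}                      # pmf of the running sum before any throw
    for _ in range(n):
        nxt = {}
        for total, prob in pmf.items():
            for face in range(1, 7):            # add one more throw
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + prob / 6
        pmf = nxt
    # Convert sums to averages
    return {Fraction(total, n): prob for total, prob in pmf.items()}

def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var

for n in (1, 2, 5, 10):
    mu, var = mean_and_variance(die_mean_distribution(n))
    print(f"n = {n:2d}: mean = {mu}, variance = {var} (about {float(var):.4f})")
```

Plotting the pmf for increasing n would show the flat single-throw distribution turning bell-shaped, which is the behavior the figure on this slide depicts.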
Example: An electrical firm manufactures light bulbs that
have a length of life that is approximately normally
distributed, with mean equal to 800 hours and a standard
deviation of 40 hours. Find the probability that a random
sample of 16 bulbs will have an average life of less than
775 hours.
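One way to work the light-bulb example numerically is sketched below. Because the population is stated to be approximately normal, the sample mean of 16 bulbs is treated as normal with mean 800 and standard deviation 40/√16 = 10. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 800, 40, 16             # population mean, std dev, and sample size

# Sampling distribution of the sample mean: Normal(mu, sigma / sqrt(n))
x_bar_dist = NormalDist(mu, sigma / math.sqrt(n))

z = (775 - mu) / (sigma / math.sqrt(n))   # standardized value
p = x_bar_dist.cdf(775)                   # P(X-bar < 775)

print(f"z = {z:.2f}, P(X-bar < 775) = {p:.4f}")
```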
Example: Traveling between two campuses of a university in a city via
shuttle bus takes, on average, 28 minutes with a standard deviation of 5
minutes. In a given week, a bus transported passengers 40 times. What
is the probability that the average transport time was more than 30
minutes? Assume the mean time is measured to the nearest minute.
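A sketch of the shuttle-bus calculation follows. The population distribution is not given, so this relies on the Central Limit Theorem with n = 40, and it applies the normal approximation directly at 30 minutes with no continuity correction, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist

mu, sigma, n = 28, 5, 40                  # mean, std dev (minutes), number of trips

std_error = sigma / math.sqrt(n)          # standard deviation of the sample mean
z = (30 - mu) / std_error
p = 1 - NormalDist(mu, std_error).cdf(30) # P(X-bar > 30), CLT approximation

print(f"z = {z:.2f}, P(X-bar > 30) = {p:.4f}")
```

The probability is small, so an average transport time above 30 minutes would be unusual if the stated mean and standard deviation are correct.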
Suppose that we have _____ populations,
the first with mean $\mu_1$ and variance $\sigma_1^2$, and
the second with mean $\mu_2$ and variance $\sigma_2^2$.
Let the statistic
$\bar{X}_1$ represent the mean of a random sample of size $n_1$ selected from
the first population
$\bar{X}_2$ represent the mean of a random sample of size $n_2$ selected from
the second population
What can we say about the sampling distribution of the
difference $\bar{X}_1 - \bar{X}_2$ for repeated samples of size $n_1$ and $n_2$?
The ___________________can be easily extended to the two-
sample, two-population case.
Sampling Distribution of the Difference between Two
Means
Theorem 8.3
Example: The effective life of a component used in a jet-turbine
aircraft engine is a random variable with mean 5000 hours and
standard deviation 40 hours. The distribution of effective life is
fairly close to a normal distribution. The engine manufacturer
introduces an improvement into the manufacturing process for this
component that increases the mean life to 5050 hours and
decreases the standard deviation to 30 hours. Suppose that a
random sample of $n_1 = 16$ components is selected from the “old”
process and a random sample of $n_2 = 25$ components is selected
from the “improved” process. What is the probability that the
difference in the two sample means, $\bar{X}_2 - \bar{X}_1$, is at least 25 hours?
Assume that the old and improved processes can be regarded as
independent populations.
Example: The television picture tubes of manufacturer A have a mean lifetime of 6.5
years and a standard deviation of 0.9 year, while those of manufacturer B have a mean
lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a
random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least
1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
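Both two-sample examples above can be worked with one small helper, sketched here. It treats the difference of two independent sample means as approximately normal with mean μ_a − μ_b and variance σ_a²/n_a + σ_b²/n_b. The helper name `prob_diff_at_least` and its argument order are illustrative, not from the lecture.

```python
import math
from statistics import NormalDist

def prob_diff_at_least(d, mu_a, sigma_a, n_a, mu_b, sigma_b, n_b):
    """P(Xbar_a - Xbar_b >= d) for independent samples, normal approximation."""
    mean_diff = mu_a - mu_b
    sd_diff = math.sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)
    return 1 - NormalDist(mean_diff, sd_diff).cdf(d)

# Jet-engine component: improved process (a) minus old process (b), at least 25 hours
p_engine = prob_diff_at_least(25, 5050, 30, 25, 5000, 40, 16)

# Picture tubes: manufacturer A minus manufacturer B, at least 1 year
p_tubes = prob_diff_at_least(1.0, 6.5, 0.9, 36, 6.0, 0.8, 49)

print(f"engine components: {p_engine:.4f}")
print(f"picture tubes:     {p_tubes:.4f}")
```

In the first example the standard deviation of the difference is √(100 + 36) ≈ 11.66 hours; in the second it is about 0.189 years.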
Sampling Distribution of $S^2$
If an engineer is interested in the _____________
resistance of a certain type of resistor,
The sampling distribution of $\bar{X}$ will be exploited once the sample
information is gathered.
However, if the __________ in resistance is to be studied,
The sampling distribution of $S^2$ will be used in learning about the
population variance $\sigma^2$.
If a random sample of size n is drawn from a normal
population with mean $\mu$ and variance $\sigma^2$,
we obtain a value of the ___________ by computing the sample
variance
Sampling Distribution of $S^2$
The values of the random variable $\chi^2$ are calculated from
each ________ by the formula $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
Theorem 8.4
Sampling Distribution of $S^2$
The _________ that a random sample produces a $\chi^2$ value
greater than some specified value is equal to
the area under the curve to the _______________.
It is customary to let $\chi^2_\alpha$ represent the $\chi^2$ value above
which we find an area of $\alpha$.
Table A.5 gives values of $\chi^2_\alpha$ for various values of $\alpha$ and v.
Sampling Distribution of $S^2$
Example: A manufacturer of car batteries guarantees that the
batteries will last, on average, 3 years with a standard deviation of
1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5,
and 4.2 years, should the manufacturer still be convinced that the
batteries have a standard deviation of 1 year? Assume that the
battery lifetime follows a normal distribution.
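The battery example can be checked with the sketch below. It computes the sample variance and the chi-squared statistic (n − 1)s²/σ² with v = n − 1 = 4 degrees of freedom, then compares it with the values that bound the central 95% of that distribution. The two bounds are entered as constants read from a chi-squared table, and the upper-tail helper uses a closed form that holds only for even degrees of freedom; both are assumptions of this sketch.

```python
import math
import statistics

def chi2_upper_tail_even_df(x, v):
    """P(chi-squared with v d.f. > x); this closed form is valid only for even v."""
    if v % 2 != 0:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(v // 2))

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]     # observed battery lifetimes (years)
sigma2_claimed = 1.0 ** 2                 # claimed variance (std dev of 1 year)

n = len(lifetimes)
s2 = statistics.variance(lifetimes)       # sample variance, divisor n - 1
chi2 = (n - 1) * s2 / sigma2_claimed      # chi-squared statistic with n - 1 d.f.

# Table values for v = 4 (assumed, read from a chi-squared table)
CHI2_0975, CHI2_0025 = 0.484, 11.143
upper_tail = chi2_upper_tail_even_df(chi2, n - 1)

print(f"s^2 = {s2:.3f}, chi-squared = {chi2:.2f}, upper-tail area = {upper_tail:.4f}")
print("within central 95% range:", CHI2_0975 < chi2 < CHI2_0025)
```

A statistic inside the central 95% range gives no reason to doubt the claimed standard deviation of 1 year.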
http://www.amazingbattery.com/
t-Distribution
Earlier in this lecture, we discussed the utility of the Central
Limit Theorem.
Its applications revolve around inferences on a _________
mean or the difference between ____________ means.
However, it was assumed that the population standard
deviation is __________.
Assumption holds in situations where the engineer is familiar with the
system
In many experimental scenarios, _____________ is certainly
no more reasonable than knowledge of
the population mean μ.
Sampling Distribution of $S^2$
Often, an estimate of __ must be _________ by the same sample
information that produced the sample average $\bar{x}$.
As a result, a natural statistic to consider when dealing with inferences
on $\mu$ is $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$,
since S is the sample analog to $\sigma$.
If the sample size is large enough, say n≥ 30, the ____________
does not differ considerably from the ________________.
Theorem 8.5
Corollary 8.1
Sampling Distribution of $S^2$
The distribution of T is similar to the distribution of Z in that
they both are
___________________about a mean of zero.
The t-distribution is more variable, owing to the fact that the
T-values depend on the fluctuations of two quantities,
whereas the Z-values depend only on the changes in __ from sample
to sample.
It is customary to let $t_\alpha$ represent the t-value above which
we find an ______ equal to α.
What does the t-Distribution Look Like?
Since the t-distribution is symmetric about a mean of zero,
we have __________________
The t-value leaving an area of 1− α to the right and
therefore an area of α to the left is equal to
The negative t-value that leaves an area of α in the right tail of the
distribution.
That is, $t_{0.95} = -t_{0.05}$, $t_{0.99} = -t_{0.01}$, and so forth.
What does the t-Distribution Look Like?
Example: The t-value with v = 14 degrees of freedom that leaves an area of
0.025 to the left, and therefore an area of 0.975 to the right, is
Example: Find $P(-t_{0.025} < T < t_{0.05})$
Example: Find k such that $P(k < T < -1.761) = 0.045$ for a random sample
of size 15 selected from a normal distribution and $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
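The t-distribution examples above are normally solved with a t-table. As a cross-check, the sketch below evaluates the t density directly, integrates it numerically with Simpson's rule, and finds critical values by bisection. The function names `t_pdf`, `t_cdf`, and `t_upper` are illustrative, and the numerical method is a convenience chosen here, not the method used in the lecture.

```python
import math

def t_pdf(t, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    coef = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coef * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(t, v, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    h = a / steps                            # steps must be even for Simpson's rule
    total = t_pdf(0, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3                     # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area   # uses symmetry about zero

def t_upper(alpha, v):
    """t_alpha: the value with area alpha to its right, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, v) > alpha:        # too much area to the right: move up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

v = 14                                       # degrees of freedom (n - 1 for n = 15)
t_025 = t_upper(0.025, v)                    # so -t_025 leaves 0.025 to the left
t_05 = t_upper(0.05, v)
prob = t_cdf(t_05, v) - t_cdf(-t_025, v)     # P(-t_0.025 < T < t_0.05)
k = -t_upper(0.005, v)                       # 0.05 - 0.045 = 0.005 lies to the left of k

print(f"-t_0.025 = {-t_025:.3f}, t_0.05 = {t_05:.3f}")
print(f"P(-t_0.025 < T < t_0.05) = {prob:.3f}")
print(f"k = {k:.3f}")
```

For the last example, −1.761 is −t_0.05 for v = 14, so the area to the left of k must be 0.05 − 0.045 = 0.005, which is why `k` is computed as −t_0.005.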
Example: A chemical engineer claims that the population mean yield of a
certain batch process is 500 grams per milliliter of raw material. To check this
claim he samples 25 batches each month. If the computed t-value falls
between $-t_{0.05}$ and $t_{0.05}$, he is satisfied with this claim. What conclusion should
he draw from a sample that has a mean $\bar{x} = 518$ grams per milliliter and a
sample standard deviation s = 40 grams? Assume the distribution of yields to
be approximately normal.
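The chemical engineer's check can be sketched as follows. The t statistic is computed from the sample, then compared with the critical value for v = 24 degrees of freedom. That critical value is entered as a constant read from a t-table, which is an assumption of this sketch.

```python
import math

x_bar, mu_claimed = 518.0, 500.0   # sample mean and claimed population mean
s, n = 40.0, 25                    # sample standard deviation and sample size

t_value = (x_bar - mu_claimed) / (s / math.sqrt(n))

T_05_V24 = 1.711                   # t_0.05 for v = 24 (assumed, read from a t-table)
satisfied = -T_05_V24 < t_value < T_05_V24

print(f"t = {t_value:.2f}, acceptance interval = (-{T_05_V24}, {T_05_V24})")
print("claim supported by this sample:", satisfied)
```

A t-value above the upper limit suggests that the process is likely producing a mean yield higher than the claimed 500.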