Math 321 - Dr. Minnotte 1
Introduction: What is Statistics?
Definition: Statistics is the science of measurement and decision-making under conditions of uncertainty, randomness, and variability.
More briefly: Statistics is the field of dealing with data.
In statistics, we make observations to collect information that helps us make decisions.
If that sounds familiar, it should. We do that sort of thing every day, in every field of study, and in our everyday life.
In statistics, we simply formalize this process mathematically. This allows us to recognize smaller differences than might otherwise be found, and to make decisions under conditions of greater uncertainty.
The term statistic is also used to describe any bit of numerical information, like the 6.3% unemployment rate in April, 2014 or the 15,143 students enrolled at UND in Fall, 2013.
These numerical bits of data are thrown at us every time we read the newspaper, or watch TV news, or read a journal in our field.
Just as words should be read with understanding, so should statistics. If we uncritically accept the numbers others give us, we open ourselves to believing misinformation.
Statistics is an important tool in almost every field. In this class, we'll look at examples like:
How can doctors tell if a new vaccine really works?
How can irrigation engineers use past river flow rates to predict future flows?
How can polltakers use responses from a few thousand voters to predict the results of an election in which more than a hundred million people vote?
What are some other examples of statistics in practice?
The Challenger Disaster: A Statistical Cautionary Tale
In 1986, a lack of statistical thinking contributed to a tragedy: the explosion of the space shuttle Challenger.
The destruction of the Challenger killed seven astronauts, including Christa McAuliffe, a 37-year-old teacher selected to be the first teacher in space, and set the U.S. manned space program back several years.
The solid rocket motors used to launch the space shuttles are shipped to the Kennedy Space Center in four pieces. Large rubber O-rings are used to seal the three joints between the pieces.
The Challenger explosion occurred when one of the O-rings failed to seal quickly enough to prevent hot gases from escaping from the rocket and igniting the large external fuel tank.
Implicated in the failure was the unusually cold (for Florida) launch temperature of 29°F.
The night before the launch, forecasters predicted a temperature of 31°F for the launch time.
A three-hour teleconference took place between people at:
Morton Thiokol (manufacturer of the rocket motors)
Marshall Space Flight Center (NASA center for motor design control), and
Kennedy Space Center.
There was concern that the cold temperatures could lead to problems with the O-rings.
In 7 out of 23 previous launches, some O-ring damage had occurred.
Some participants recommended delaying the launch until the temperature rose above 53°F, the lowest previous launch temperature and the one at which the greatest number of damaged O-rings occurred.
In the end, the recommendation was made to launch on schedule, in part because of the following plot.
The plot shows temperature vs. number of damaged O-rings for the 7 affected launches.
The relationship seems limited, at most.
What error was made preparing this plot?
By only including the launches in which incidents occurred, the investigators left out some important information!
When the data from all 23 launches is plotted, a temperature dependence becomes obvious.
All 4 of the launches below 66°F had damage. Only 3 of the remaining 19 flights suffered damage.
Note where 31°F or 29°F would appear on that plot.
More sophisticated analyses are possible, but unnecessary.
Had the concerned engineers presented the complete data in such a format, they might well have convinced the decision-makers to delay the launch and prevented the tragedy.
There's more to this story, so we'll return to it later in the semester.
Chapter 1: Univariate Data
Populations and Samples
Definition: A population consists of all potential observations from a distribution of interest.
In an enumerative study, the population will be tangible, real and finite, and might be represented by a sampling frame listing the members of the population.
o Examples include populations of people, or corporations, or items in a shipment.
In an analytic study, we study an ongoing process, and the conceptual population is infinite and simply a useful theoretical construct. No sampling frame is possible.
o Examples include populations of rainfall over time, or objects coming off an ongoing assembly line, or repeated measurements of the same underlying weight.
As an investigator, you have a great deal of flexibility in defining the population of interest.
Example: We are interested in the ages of UND students. What are some possible relevant populations?
Example: A quality engineer wishes to study the volume of milk in containers coming off a production line. What are possible populations?
Example: We wish to examine the incidence of obesity in preteen children. What is an appropriate population?
Once we have defined our population, we take a sample from that population.
Measurements from each member of the sample will be the observations which make up the dataset we will analyze.
Example: Student ages.
Experiments
Suppose that a chemical engineer wants to determine how the concentration of a catalyst affects the yield of a process.
The engineer can run the process several times, changing the concentration each time, and compare the resulting yields.
This sort of experiment is called a controlled experiment because the values of the concentration variable are under the control of the experimenter.
Observational Studies
There are many situations in which scientists cannot control the variables of interest.
Many studies have been conducted to determine the effect of cigarette smoking on the risk of lung cancer. In these studies, rates of cancer among smokers are compared with rates among nonsmokers.
The experimenter cannot control who smokes and who doesn't.
This kind of study is called an observational study.
When we study a sample, we must make sure it is representative of the population.
One option is a census, or complete enumeration, of everyone in the population. What are some problems with this approach?
Usually, the best solution is to take a random sample, choosing your sample with planned probability methods.
The most basic such method is called a simple random sample (SRS).
In an SRS, we draw individuals out of the population with the equivalent of drawing names out of a (well-mixed) hat.
Each subset of the population of the appropriate size is equally likely to make up the sample.
This is theoretically convenient, but often hard to arrange in practice.
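The names-in-a-hat idea can be sketched in a few lines of Python. This is only an illustration: the sampling frame (500 made-up ID numbers), the sample size of 10, and the seed are all assumptions chosen for the example.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 500 individuals.
sampling_frame = list(range(1, 501))

# random.sample draws without replacement, so every subset of size 10
# is equally likely to be chosen, which is the defining property of an SRS.
rng = random.Random(321)  # fixed seed only so the example is repeatable
srs = rng.sample(sampling_frame, k=10)

print(sorted(srs))
```

Running it again with a different seed gives a different sample, which is a small-scale picture of sampling variation.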
When viewed in order, or over time, the observations of an SRS should not show any noticeable pattern or trend.
An SRS is not guaranteed to reflect the population perfectly.
SRSs always differ in some ways from each other; occasionally a sample is substantially different from the population.
This phenomenon is known as sampling variation.
The items in a sample are independent if knowing the values of some of the items does not help to predict the values of the others.
Items in a simple random sample may be treated as independent in most cases encountered in practice. The exception occurs when the population is finite and the sample comprises a large fraction (more than 5%) of the population.
Samples of Convenience
A nonrandom sample, or sample of convenience, may be easier to collect, but may be nonrepresentative in some important ways.
Such a sample may bias your results, making them worthless (or at least a whole lot less trustworthy).
Example: We are interested in the size of hometowns for all U.S. college students, but only sample at UND.
Example: We want to survey UND students on math anxiety, and pick a class to interview:
Math 321? Upper-division English?
Example: Not everyone will consent to test a new AIDS vaccine. We could give those who consent the vaccine, and leave those who don't alone to be the control group.
What about a historical control (compare vaccinated group with past infection rates)?
Terminology and Notation
From each individual person or object in our sample, we are generally interested only in a small number of characteristics.
Each characteristic we record will be called a variable, and assigned a letter from the end of the alphabet.
Data that we collect may be of two main types:
1) Categorical: classifying the subject into one of several distinct groups.
o X = Sex
o T = Hair Color
o W = Zip Code
2) Numerical: data recorded as a number, where operations like averages make sense.
o Y = Age
o U = Rainfall
o Z = Volume of milk
We also classify datasets based on how many variables we measure on each individual.
If we only collect a single variable (e.g. age), we say the dataset is univariate.
If we collect two variables for each individual (e.g. age and sex), we say it is bivariate.
With still more variables, we say that it is trivariate, quadrivariate, and so on, or more commonly, that it is multivariate.
We often use subscripts on the variable name (letter) to indicate specific observations in a dataset, such as X1, X2, ..., Xn.
A subscript of i (occasionally j or k) indicates a specific, but arbitrary, observation.
We usually reserve the label n for the number of observations (the sample size).
There are two primary branches of statistics:
1) Descriptive statistics simply attempts to simplify and understand a dataset.
2) Inferential statistics attempts to say (infer) something about the broader population or distribution from which the data was drawn.
Descriptive statistics are simpler, so we'll start there.
Summary Statistics (1.2)
Given data X1, X2, ..., Xn, we frequently use sample statistics to summarize the dataset.
A statistic is anything which may be calculated from a dataset. A sample statistic simply makes clear that it derives from a sample.
Use of sample statistics can improve our understanding of the data, as well as make it easier to communicate with others about it.
The Sample Mean
The most important feature of a dataset to describe is generally its location, or the location of its center.
The most commonly used statistic for center is the familiar average, or sample mean.
Definition: The sample mean of data X1, X2, ..., Xn is
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
Example: Stocks:
To understand how the mean works, suppose we were to take a very thin yardstick or similarly marked board, and place a small (equal) weight at the mark for each observation's value.
The mean may be thought of as the point where this would balance.
Outliers
An outlier is an observation which is very different from the rest of the sample. For univariate data, this means it is much larger or much smaller than the rest.
Outliers should be carefully examined. Often they are the result of measurement or recording errors.
If so, they should be fixed or deleted. Correct but unusual values, however, should be kept.
The sample mean is not robust (resistant to outliers). Changing even one observation can change the sample mean as much as we want.
Example: Mistype the final stock return as 374 (instead of 37.4). What is the sample mean now?
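The effect of the typo can be worked out without retyping all the data, since changing one observation changes the total by (new value - old value). The sketch below assumes n = 20 (annual returns for 1976-1995) and uses the rounded sample mean of 15.37 reported in the Minitab summary for this data, so the result is approximate to the same rounding.

```python
# Values taken from the Minitab summary of the stock data (rounded there).
n = 20                # annual returns, 1976-1995
mean_correct = 15.37  # reported sample mean

# Replacing one observation changes the total by (new - old).
old_value, new_value = 37.4, 374.0
total_correct = n * mean_correct
mean_mistyped = (total_correct - old_value + new_value) / n

print(round(mean_mistyped, 2))  # about 32.2
```

A single misplaced decimal point roughly doubles the mean, even though the other 19 observations are unchanged.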
Measures of Variability
After center, the second-most-used feature to describe a sample is its variability, or spread.
The simplest measure of variability is the range, the difference between the maximum and minimum values.
R = max(X) - min(X)
Unfortunately, the range wastes most of the data and is maximally non-robust, since it uses only the two extreme data points, so it is rarely used.
A better solution looks at the deviations from the mean, $X_i - \bar{X}$. This removes the effect of the mean (location), and looks only at the variability around the mean.
One option: Look at the average deviation from the mean.
Problem: Positive deviations cancel out negative ones, and the average deviation from the mean is always 0.
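A short derivation shows why the cancellation is exact. It uses only the definition of the sample mean, which gives $\sum X_i = n\bar{X}$:

```latex
\[
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0 .
\]
```

Since the sum of the deviations is 0 for every dataset, so is their average, and it carries no information about spread.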
We could take absolute values of the deviations, but for a few theoretical reasons, it's better to look at the squared deviations instead.
Definition: The sample variance, $s^2$, measures the spread of a dataset:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
Definition: The sample standard deviation, s, is the square root of the sample variance:
$$s = \sqrt{s^2}$$
Use of the definition formula is tedious, as it requires finding and squaring each of the n deviations from the mean.
It is usually simpler to calculate $s^2$ using the following computation formula.
$$s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)$$
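As a quick check that the two routes agree, here is a minimal Python sketch. It assumes the usual n - 1 divisor for the sample variance, and the six data values are made up for illustration. Python's built-in `statistics.variance` is printed alongside as an independent check.

```python
import math
import statistics

def variance_definition(xs):
    """Sample variance from the squared deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_computation(xs):
    """Sample variance from the sum of squares and the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations

s2 = variance_definition(data)
s = math.sqrt(s2)
print(s2, variance_computation(data), statistics.variance(data), s)
```

On a computer the shortcut can lose precision when the mean is large relative to the spread, so the definition form is often the safer choice in code; the shortcut is mainly a convenience for hand calculation.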
Example: What are the variance and standard deviation of the stocks data?
The sample variance and standard deviation are measures of the spread of a dataset, and estimates of the variance and standard deviation of the underlying population or distribution.
Like the sample mean, they are not robust.
Example: Stocks, replace 37.4 with 374: $s^2$ = ? s = ?
While very useful practically and theoretically, the variance and standard deviation are a little tricky intuitively.
One helpful rule of thumb:
About 2/3 of data should fall in $\bar{X} \pm s$.
About 95% of data should fall in $\bar{X} \pm 2s$.
Almost all data should fall in $\bar{X} \pm 3s$.
Example: Stock data:
If X1, ..., Xn is a sample, and Yi = a + b Xi, where a and b are constants, then
$$\bar{Y} = a + b\bar{X}, \qquad s_Y^2 = b^2 s_X^2, \qquad s_Y = |b| \, s_X$$
This is most commonly needed if we change units for our data.
Example: Let X1, ..., Xn be a sample of temperatures measured in degrees Celsius, with $\bar{X}$ = 30. Let Y1, ..., Yn be the same temperatures in degrees Fahrenheit, Yi = 9/5 Xi + 32. What is $\bar{Y}$?
Example: Let the variance of the Celsius temperatures be $s_X^2$ = 25.
What is the standard deviation?
What is the variance of the Fahrenheit temperatures? The s.d.?
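These two examples can be checked numerically. The sketch below assumes the standard rules for a linear change of units Y = a + bX: the mean becomes a + b times the old mean, the variance is multiplied by b squared, and the standard deviation by |b|. The variable names are illustrative only.

```python
a, b = 32.0, 9.0 / 5.0       # Y = a + b X converts Celsius to Fahrenheit

mean_c, var_c = 30.0, 25.0   # values given in the examples
sd_c = var_c ** 0.5

mean_f = a + b * mean_c      # the mean shifts and scales
var_f = b ** 2 * var_c       # the variance scales by b squared
sd_f = abs(b) * sd_c         # the s.d. scales by |b|

print(mean_f, sd_c, round(var_f, 6), sd_f)
```

Note that the added constant 32 moves the mean but has no effect on either measure of spread.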
Order Statistics and Robust Measures of Center and Spread
Definition: The ith order statistic, X(i), is the ith smallest value when the X's are sorted. The minimum is X(1), the second smallest X(2), and so on up to the maximum, X(n).
Example: Stock data (sorted):
X(1) = -7.2, X(4) = 1.3, X(20) = 37.4, and so on.
Because outliers will always be in the first or last few order statistics, values computed from middle order statistics will be very robust.
Definition: The sample median is the middle of the sorted data.
If n is odd, the sample median is the ((n+1)/2)th order statistic.
If n is even, it is the average of the (n/2)th and ((n+2)/2)th order statistics.
Example: Stocks: median = ?
The sample median has 50% of the data on either side of it.
The sample median is very robust; changing one or a few observations won't change it much, if at all.
Example: Stocks: Replace 37.4 with 374, and the sample median remains 17.6.
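The odd/even rule for the median translates directly into code. The following is a minimal sketch; the function name and the five data values are made up for illustration and are not the stock data.

```python
def sample_median(xs):
    """Middle order statistic, or the average of the two middle ones."""
    ordered = sorted(xs)   # ordered[i - 1] is the i-th order statistic
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]
    return (ordered[n // 2 - 1] + ordered[(n + 2) // 2 - 1]) / 2

data = [3.1, -2.0, 8.4, 5.0, 6.2]           # hypothetical observations
print(sample_median(data))                  # 5.0
print(sample_median(data[:-1] + [620.0]))   # 6.2 mistyped as 620: still 5.0
```

Mistyping one value as 620 leaves the median unchanged, in contrast to what the same kind of typo does to the sample mean.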
Quartiles
The quartiles of the data divide the sample into quarters.
The first quartile, Q1, splits the lowest quarter of the sample from the rest.
If (n+1)/4 is an integer, Q1 is the ((n+1)/4)th order statistic.
If (n+1)/4 is not an integer, Q1 is the average of the two order statistics on either side.
The third quartile, Q3, splits the highest quarter from the rest.
Find it in the same way as Q1, but using 3(n+1)/4.
Example: Sorted stocks:
Q1 = ? Q3 = ?
Definition: The sample interquartile range is a robust measure of spread, found as the difference between the sample quartiles, IQR = Q3 - Q1.
Example: Stocks: IQR = ?
Note: Changing 37.4 to 374 doesn't change Q1, Q3, or IQR.
Percentiles
Definition: The pth sample percentile has (roughly) p% of the data below it, and (100-p)% above it.
Compute p(n + 1)/100. If this is an integer, use that order statistic. If not, average the two closest order statistics.
The median and quartiles are just special names for the 50th, 25th, and 75th percentiles.
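The p(n + 1)/100 rule can be sketched as a small function, from which the quartiles and IQR follow as special cases. The function name and the seven data values are assumptions made for this illustration. Software packages (including Minitab) may interpolate between order statistics rather than average them, so their quartiles can differ slightly from this rule.

```python
import math

def sample_percentile(xs, p):
    """p-th sample percentile using position p(n + 1)/100."""
    ordered = sorted(xs)
    n = len(ordered)
    pos = p * (n + 1) / 100
    if pos < 1 or pos > n:
        raise ValueError("percentile position falls outside the sample")
    lower = math.floor(pos)
    if pos == lower:                      # integer position
        return ordered[lower - 1]
    # otherwise average the order statistics on either side
    return (ordered[lower - 1] + ordered[lower]) / 2

data = [12, 15, 11, 20, 18, 14, 17]       # hypothetical sample, n = 7

q1 = sample_percentile(data, 25)          # position 2
median = sample_percentile(data, 50)      # position 4
q3 = sample_percentile(data, 75)          # position 6
iqr = q3 - q1
print(q1, median, q3, iqr)
```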
Example: Descriptive Statistics in Minitab
Descriptive Statistics: Stock Returns 1976-1995
Variable          Mean   StDev  Variance  Minimum    Q1  Median     Q3  Maximum
Stock Returns 19  15.37  13.66    186.49    -7.20  5.48   17.60  28.90    37.40

Variable            IQR
Stock Returns 19  23.43
Basic Statistical Graphics (1.3)
Some of the most powerful tools available for understanding a dataset are graphics which we can use to look at our data.
It's very hard to get much useful out of large tables or long columns of numbers. But the human eye is very good at picking out patterns in pictures.
Bar Charts
Given categorical data, the most useful plot available is usually a simple bar chart.
A bar is drawn for each category, with the height proportional to the count (frequency) or percentage found in that category.
Other measurements for each category may also be compared.
Example: Television Picture Grades: Perfect, Good, Satisfactory, Fail
Category      Count
Perfect          64
Good             47
Satisfactory     33
Fail              6
Total           150
Spaces between the bars show that the categories are separate.
Bars should start at 0 and show full height (no truncation!). Otherwise, relative heights get distorted.
Unless there is a strong natural ordering (e.g. poor-fair-good-excellent; not alphabetical), bars should be sorted in ascending or descending order. This makes comparisons between close values much easier.
Many categories or long category names may be better served by horizontal bars.
3-D perspective looks fancy but hurts clarity; it is usually a bad idea.
A stacked bar chart includes a second categorical variable, but focuses on the totals for the main category of the bars.
[Figure: stacked bar chart, "Individuals on the Titanic". Bars for 1st Class, 2nd Class, 3rd Class, and Crew, each split into Survived and Died; vertical axis from 0 to 1000.]
A clustered bar chart focuses on the counts of the specific combinations of categories, and is useful for comparing the distribution of one variable for different values of the other.
[Figure: clustered bar chart of the Titanic data. Separate Died and Survived bars for 1st Class, 2nd Class, 3rd Class, and Crew; vertical axis from 0 to 800.]
Example Minitab Bar Charts
Pie Charts
The other common chart for categorical data.
A pie chart should only be used when the categories represent (all of the) parts of some whole, and so should always plot percentages.
Each category's slice gets an angle equal to
$$360^\circ \times \frac{\text{category percentage}}{100}$$
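As an illustration, the slice angles for the television picture grades counted earlier in these notes can be computed as each category's share of the total times 360 degrees. This is a sketch only; the variable names are arbitrary.

```python
# Counts from the television picture grades example (total 150).
counts = {"Perfect": 64, "Good": 47, "Satisfactory": 33, "Fail": 6}
total = sum(counts.values())

# Each slice's angle is its share of the whole times 360 degrees.
angles = {grade: 360 * count / total for grade, count in counts.items()}

for grade, angle in angles.items():
    print(f"{grade:12s} {angle:6.1f} degrees")
```

The angles add up to 360 degrees only because the four categories account for all 150 sets, which is why a pie chart needs categories that make up a whole.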
Comparing angles is much more difficult than comparing heights or lengths. Bar charts are almost always more effective.
3-D pie charts are the work of the devil. (Probably worse than no chart.)
Minitab:
Dotplots
Dotplots are simple plots which are very useful for looking at univariate numeric data, especially when the sample size is small or there are many ties in the data.
Each observation is plotted at its location above an appropriate number line. If there are ties, one dot is stacked for each tied observation.
Example: Temperature (°F) at launch of the first 25 space shuttle launches.
66 70 69 80 68
67 72 73 70 57
63 78 70 67 53
75 67 70 81 76
79 75 76 58 31
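A rough text version of a dotplot for these temperatures can be produced with a few lines of Python. This sketch turns the plot on its side (the number line runs down the page and tied observations stack to the right), which is a layout choice made only to keep the code short.

```python
from collections import Counter

# Launch temperatures (degrees F) from the example above.
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
         67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]

counts = Counter(temps)  # number of tied observations at each value

# One row per degree on the number line; one dot per observation.
for value in range(min(temps), max(temps) + 1):
    print(f"{value:3d} | {'.' * counts[value]}")
```

Even in this crude form, the long gap between 31 and the rest of the data stands out immediately.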
Histograms
A histogram is a bar chart for numerical data.
The shape of the histogram describes the shape of the distribution of the data.
If you have a large, randomly collected sample, the shape is also descriptive of the population the sample was taken from.
Your book also describes stem-and-leaf plots, which are similar, but rarely used.
Constructing a Histogram
1) Find the minimum and maximum of the data.
2) Break that interval into class intervals.
5-20 classes is often a good start. More for large samples, fewer for small ones.
A reasonable rule of thumb is
Select your classes so that each is of equal width.
3) Find the frequencies (counts, ni) and relative frequencies (fi = ni/n) in each class.
4) Plot the bar chart with a bar over each class whose height equals fi or ni.
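Steps 1 through 3 can be sketched in Python using the shuttle launch temperatures from the dotplot example. The choice of six classes of width 10 starting at 30 is an assumption made for this illustration, as is the convention that each class includes its left endpoint but not its right.

```python
def class_frequencies(xs, start, width, num_classes):
    """Count observations in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for x in xs:
        k = int((x - start) // width)
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[k] += 1
    return counts

# Launch temperatures (degrees F) from the dotplot example.
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
         67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]

n = len(temps)
freqs = class_frequencies(temps, start=30, width=10, num_classes=6)
rel_freqs = [count / n for count in freqs]

for k, (count, rel) in enumerate(zip(freqs, rel_freqs)):
    low = 30 + 10 * k
    print(f"[{low}, {low + 10}): n_i = {count:2d}, f_i = {rel:.2f}")
```

Plotting a bar of height n_i (or f_i) over each class gives the histogram; trying a different `start` or `width` is an easy way to see how much the picture depends on those choices.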
Example: Stock Data (Annual Rate of Return, 1976-1995):
The shape of the histogram tells us about the distribution. Some things to look for include:
Is the distribution left-skewed? Symmetric? Right-skewed?
Is the distribution bimodal?
Multimodal?
Are there any outliers?
It's a good idea to look at several choices of bin width and location, as different choices here can produce dramatically different histograms.
Features that remain in many histograms are likely to be trustworthy; those that only appear sometimes are less certain.
Example: Milk Fill Weights Data
Boxplots
Definition: A boxplot is another graphical tool for displaying a sample:
The box goes from the first to the third quartile, with a line at the median.
For boxplots, outliers are usually defined as any values below Q1 - 1.5 IQR or above Q3 + 1.5 IQR. Those points are marked individually.
The whiskers go from the quartiles to the least and greatest values among the non-outliers.
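The pieces of a boxplot can be computed directly from these rules. The sketch below uses the 25 shuttle launch temperatures from the dotplot example and the (n + 1)/4 quartile rule from these notes; the helper name `order_stat_at` is made up for the illustration. Software may compute quartiles slightly differently, so a plotted boxplot could differ a little.

```python
import math

def order_stat_at(ordered, pos):
    """Order statistic at position pos, averaging neighbors if pos is not an integer."""
    lower = math.floor(pos)
    if pos == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2

# Launch temperatures (degrees F) from the dotplot example.
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
         67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]

ordered = sorted(temps)
n = len(ordered)
q1 = order_stat_at(ordered, (n + 1) / 4)        # position 6.5
q3 = order_stat_at(ordered, 3 * (n + 1) / 4)    # position 19.5
iqr = q3 - q1

low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < low_fence or x > high_fence]
inside = [x for x in ordered if low_fence <= x <= high_fence]
whiskers = (min(inside), max(inside))

print(q1, q3, iqr)            # 66.5 75.5 9.0
print(low_fence, high_fence)  # 53.0 89.0
print(outliers, whiskers)     # [31] (53, 81)
```

Only the 31 degree value is flagged; 53 sits exactly on the lower fence, so it is not below it and becomes the end of the lower whisker.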
Boxplots are much less informative than histograms for a single distribution, so the histogram is usually preferable.
On the other hand, comparing histograms is difficult, while comparing boxplots is easy.
Use boxplots to compare 2-20 (or more) distributions.
Example: Fish length data
Example: Circuit board data by board.
Ch. 2: Bivariate Data
Statistics is most powerful when looking at relationships between variables.
In the simplest case, this involves looking at pairs of measurements made on the same subjects, (x, y).
Recall, such data is called bivariate (two variables).
Examples:
o Heights and weights of a group of people.
o ACT score and Freshman GPA for college students.
o January and April average temperatures for many years at a specified location.
o January and February inflows of the Nile river at a location.
We usually picture our variables in a cause-and-effect relationship.
The explanatory (independent, predictor) variable, x, is assumed to play some role in determining the value of the response (dependent) variable, y.
x → y
Scatterplots (2.1)
Definition: A scatterplot is the most common graph for displaying bivariate data. It consists of plotting each point at (xi, yi), on a standard x-y graph.
The pattern formed by the points describes the relationship between the variables.
Minitab Scatterplot:
Correlation
Suppose we have a sample of (x, y) pairs and compute the sample means, $\bar{x}$ and $\bar{y}$.
For each observation (xi, yi), compute the product of the two deviations from the means, $(x_i - \bar{x})(y_i - \bar{y})$.
Dividing the scatterplot at the means results in two quadrants where the product is positive, and two where it is negative.
For a scatterplot with a positive relationship, most of the products will have a positive sign, and the sum will be positive.
Likewise, if the picture shows a negative relationship, the sum of the products will be negative.
Unfortunately, the exact value of the sum depends on the units and spread (as measured by standard deviation) of the variables.
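Dividing by measures of spread for x and y solves this issue.
Then
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
is a good, unitless measure of the linear relationship between x and y called the correlation coefficient.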
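A minimal Python sketch of the correlation coefficient follows. It uses an algebraically equivalent form of the standardized-product definition (the n - 1 factors cancel), and the five (x, y) pairs are made up for illustration. The last lines check numerically that r is unchanged by a change of units and by swapping x and y.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)
r_rescaled = correlation([10 * xi + 3 for xi in x], y)  # change of units for x
r_swapped = correlation(y, x)                           # swap the labels

print(round(r, 4), round(r_rescaled, 4), round(r_swapped, 4))
```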
Example: Nile flow data: n=115
What is r?
Properties of r
1. The value of r does not depend on the units of x or y. We will not change r if we multiply all x's, all y's, or both by a positive constant or if we add any constant to all x's, all y's, or both.
2. The value of r does not depend on which variable is labeled x.
3. Correlation is always between -1 and +1.
4. The sign of r shows whether the relationship between x and y is positive or negative.
Properties of r (continued)
5. The absolute value of r measures the strength of the linear relationship between x and y. Roughly speaking:
a. If |r| < 0.5, the relationship (if any) is weak.
b. If 0.5 ≤ |r| < 0.8, the association is moderate.
c. If 0.8 ≤ |r| < 1.0, the association is strong.
d. If |r| = 1.0, the association is perfect. This occurs only when all (x, y) points fall in a perfect line.
Note that strength is often context- and discipline-dependent. An engineer might find any correlation less than .95 to be weak, while a social scientist might find a correlation of .3 to be very strong.
Properties of r (continued)
6. The correlation coefficient cannot measure the strength of a nonlinear (curved) relationship.
7. Outliers can also lead to an inappropriate value, in either direction!
High correlation indicates strong association, not necessarily causality.
If |r| is large, there are at least 3 possible explanations:
1) x determines y
2) y determines x
3) Some third value, z, (called a confounding factor) determines both x and y.
Example: Weekly surveys show that per capita chocolate consumption is strongly correlated with traffic fatalities.
Should driving under the influence of chocolate be outlawed?
Do people eat a lot of chocolate at funerals?
Is there a third explanation that makes more sense?
Example: Over time, ministers' salaries in Massachusetts are strongly correlated with the price of rum in Havana. What is the causal relationship here?
Example: Children's shoe size is correlated with size of vocabulary. What is the causal relationship?
One advantage of well-designed randomized, controlled experiments is that potential confounding factors should be (roughly) balanced between levels of the independent variable we are investigating, so should be much less likely to produce a spurious correlation.
Linear Regression (2.2-2.3)
Definition: Regression involves modeling and predicting the values of one response variable, based on the observed values of one or more other explanatory variables.
We'll focus on the case of simple linear regression, where a straight line is fit to a scatterplot of x and y.
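We want an equation for a line of the form
$$y = \beta_0 + \beta_1 x$$
The most common way to estimate $\beta_0$ and $\beta_1$ uses the least squares fit, minimizing
$$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
This leads to the least squares estimates,
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$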
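The least squares estimates can be sketched in Python as follows. The code assumes the standard closed-form solution (slope equal to the sum of products of deviations divided by the sum of squared x deviations, intercept chosen so the line passes through the point of means), and the data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # line passes through (mean_x, mean_y)
    return intercept, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(x, y)
print(round(intercept, 4), round(slope, 4))   # 0.05 1.99
```

For data that already lie exactly on a line, the function returns that line's intercept and slope.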
Deviations from a potential regression line:
Among all straight lines, the least squares line has the smallest sum of squared vertical deviations from the points in the scatterplot.
Example: Nile flow data
What is the least-squares line for this data, and what should we predict the flow for February to be if January's was 3?
What would we predict for February from a January value of 10?
Is this likely to be a valid prediction? (Recall, January's mean is about 4, and its standard deviation is about 1.)
Extrapolation outside the range of the data is dangerous.
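The two predictions can be computed from the fitted line reported in the Minitab output for the Nile data (February Inflow = -0.4698 + 0.8362 January Inflow). The sketch below also measures how far each January value is from the January mean, using the approximate mean of 4 and standard deviation of 1 quoted above; the function name is illustrative.

```python
# Coefficients from the Minitab regression output for the Nile flow data.
intercept, slope = -0.4698, 0.8362

def predict_february(january):
    """Predicted February inflow from the fitted least squares line."""
    return intercept + slope * january

# Approximate January summary values quoted in the notes.
january_mean, january_sd = 4.0, 1.0

for january in (3.0, 10.0):
    sds_from_mean = (january - january_mean) / january_sd
    print(january, round(predict_february(january), 4), sds_from_mean)
```

A January value of 3 is one standard deviation below the mean, well inside the data. A value of 10 is about six standard deviations above it, so the line is being used far beyond anything it was fitted to.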
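Residuals and Goodness-of-Fit
Definition: Given a data set (xi, yi) and an associated fitted regression model with least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, the fitted value for observation i is
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
Definition: The residual for i is
$$e_i = y_i - \hat{y}_i$$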
The smaller the residuals, the better x and the regression line are at predicting y.
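The error sum of squares (SSE) is
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
SSE is usually compared to the total sum of squares, SST:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
and the regression sum of squares, SSR:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$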
To avoid having to calculate all the residuals, we may use the computing formula:
SSE = SST - SSR
The coefficient of determination, $r^2$, measures the proportion of the total variation of y which is explained by x:
$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
The closer $r^2$ is to 1, the more successful the relationship is at explaining the variation in y.
As the notation suggests, the coefficient of determination is the square of the correlation coefficient.
Example: Nile flow data:
Find SST, SSR, SSE, and $r^2$.
What do these say about our predictions?
Note: r = 0.933.
The coefficient of determination $r^2$ is found as R-Sq in Minitab output.
The sums of squares may be found in the SS column of the Analysis of Variance table.
The regression equation is
February Inflow = - 0.4698 + 0.8362 January Inflow

S = 0.330519   R-Sq = 87.1%   R-Sq(adj) = 87.0%

Analysis of Variance
Source       DF       SS       MS       F      P
Regression    1  83.3794  83.3794  763.25  0.000
Error       113  12.3444   0.1092
Total       114  95.7238
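The numbers in the SS column can be used to check the relationships among the sums of squares. This short sketch simply plugs the three values from the Minitab table into SSE = SST - SSR and r squared = SSR / SST; nothing beyond the table is assumed.

```python
import math

# Sums of squares from the SS column of the Analysis of Variance table.
ssr = 83.3794   # Regression
sse = 12.3444   # Error
sst = 95.7238   # Total

sse_from_formula = sst - ssr   # computing formula SSE = SST - SSR
r_squared = ssr / sst          # coefficient of determination
r = math.sqrt(r_squared)       # positive root, since the fitted slope is positive

print(round(sse_from_formula, 4), round(r_squared, 3), round(r, 3))
```

The results agree with the table's error sum of squares, with R-Sq = 87.1%, and with the stated r = 0.933.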
Chapter 3: Probability
Definition: Probability is the branch of mathematics dealing with chance, randomness, and uncertainty.
Probability provides most of the mathematical foundation for inferential statistics.
Definition: A situation for which the outcome cannot be determined in advance is called an experiment.
Examples: The roll of a die. The draw of a card. The lifetime of an electronic component.
Math 321 - Dr. Minnotte 138
Definition: The sample space, S, of an experiment is the set of all possible outcomes.
Examples:
Die: S = {1, 2, 3, 4, 5, 6}
Card: S = ?
Component: S = ?
An experiment with several steps can be visually represented by a tree diagram:
Example: Toss a coin three times:
Math 321 - Dr. Minnotte 139
Math 321 - Dr. Minnotte 140
Events
Definition: Set A is a subset of set B ($A \subset B$) if every element of A is also in B.
Example: S = {1, 2, 3, 4, 5, 6}
$A = \{1, 3, 5\} \subset S$
$B = \{1, 2, 6, 7\} \not\subset S$
Every set is a subset of itself.
The empty set, $\emptyset$, consisting of no elements, is a subset of every set.
Math 321 - Dr. Minnotte 141
Definition: Any interesting subset of the sample space can be called an event.
Examples:
Die: A = odd numbers = {1, 3, 5}
Card: B = ?
Component: C = ?
The individual outcomes which make up S are sometimes called simple events.
Math 321 - Dr. Minnotte 142
Combining Events
For subsets A and B of S ($A \subset S$, $B \subset S$):
1) The union of A and B ($A \cup B$) is the set consisting of all elements found in A, B, or both.
Keyword: or
Example: S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5}, B = {1, 2, 3}. $A \cup B$ = ?
Math 321 - Dr. Minnotte 143
2) The intersection of A and B ($A \cap B$) is the set consisting of all elements found in both A and B.
Keywords: and, both
Example: S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5}, B = {1, 2, 3}. $A \cap B$ = ?
Math 321 - Dr. Minnotte 144
3) The complement of A ($A^c$) is the set consisting of all elements of S not found in A.
Keyword: not
Example: S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5}. $A^c$ = ?
Math 321 - Dr. Minnotte 145
4) Sets A and B are said to be mutually exclusive if there are no elements in both A and B. That is, if $A \cap B = \emptyset$ (the empty set).
Example: S = {1, 2, 3, 4, 5, 6}, A = {1, 3, 5}, C = {4, 6}. $A \cap C = \emptyset$, so A and C are mutually exclusive.
Example: Three coin tosses.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Let A = First toss is a head = ?
Let B = Last toss is a head = ?
What simple events make up the event A and B?
A or B? Not A?
Are A and B mutually exclusive?
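One way to check answers to the three-coin-toss questions is to let Python's built-in set operations play the roles of "and", "or", and "not". This is only an illustrative sketch; the names S, A, and B mirror the slide, and everything else is our own choice.

```python
from itertools import product

# Sample space for three coin tosses: all length-3 strings of H and T.
S = {"".join(t) for t in product("HT", repeat=3)}

A = {s for s in S if s[0] == "H"}    # first toss is a head
B = {s for s in S if s[-1] == "H"}   # last toss is a head

print("A and B:", sorted(A & B))     # intersection
print("A or B: ", sorted(A | B))     # union
print("not A:  ", sorted(S - A))     # complement within S
print("mutually exclusive?", A.isdisjoint(B))
```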
Math 321 - Dr. Minnotte 146
Math 321 - Dr. Minnotte 147
The Axioms of Probability
Definition: A probability function $P(\cdot)$ is a function from subsets of S (events) to the real numbers which satisfies the following axioms of probability:
1) $P(S) = 1$.
2) $0 \le P(A) \le 1$ for all events A.
3) If A and B are mutually exclusive, $P(A \cup B) = P(A) + P(B)$.
Math 321 - Dr. Minnotte 148
Example: A fair die.
P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6, P(5) = 1/6, P(6) = 1/6.
Probabilities of bigger events are found by axiom 3:
P({1,3}) = P(1) + P(3) = 1/6 + 1/6 = 2/6 = 1/3
P({1,3,5}) = ?
Math 321 - Dr. Minnotte 149
Example: A biased die. P(1) = 1/12, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6,
P(5) = 1/6, P(6) = 3/12 = 1/4.
Note:
(as required by axiom 2) P({1,3}) = P(1) + P(3) = 1/12 + 1/6 = 1/4 P({1,3,5}) = ?
Math 321 - Dr. Minnotte 150
When applied to real experiments, probability measures (long-term) likelihood: if the experiment is repeated many times, event A should occur in roughly a fraction P(A) of the repetitions.
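The long-run interpretation can be illustrated by simulation. The sketch below rolls a simulated fair die many times and tracks how often the event A = {1, 3, 5} occurs. The seed and the function name relative_frequency are arbitrary choices made for this illustration.

```python
import random

random.seed(321)  # arbitrary seed, only so the run is repeatable

def relative_frequency(n_rolls):
    """Fraction of n_rolls simulated fair-die rolls landing in A = {1, 3, 5}."""
    hits = sum(1 for _ in range(n_rolls) if random.randint(1, 6) in {1, 3, 5})
    return hits / n_rolls

# The relative frequency should settle near P(A) = 1/2 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With few rolls the fraction can be far from 1/2; with many rolls it should be close, though never guaranteed to be exact.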
Math 321 - Dr. Minnotte 151
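Additional Properties of Probability
The axioms of probability imply some additional properties:
1) For any event A, $P(A^c) = 1 - P(A)$.
This is sometimes called the complementary events rule, or the opposites rule.
Show: A and $A^c$ are mutually exclusive and $A \cup A^c = S$, so by axioms 1 and 3, $P(A) + P(A^c) = P(S) = 1$.
Note: Since $S^c = \emptyset$, $P(\emptyset) = 0$.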
Math 321 - Dr. Minnotte 152
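2) For any events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
This is sometimes called the general addition rule.
Show: $A \cup B$ is the union of the mutually exclusive events $A$ and $B \cap A^c$, and $B$ is the union of the mutually exclusive events $A \cap B$ and $B \cap A^c$. By axiom 3, $P(A \cup B) = P(A) + P(B \cap A^c)$ and $P(B) = P(A \cap B) + P(B \cap A^c)$; subtracting gives the rule.
Note: if A and B are mutually exclusive, $P(A \cap B) = P(\emptyset) = 0$, so this is the same as axiom 3.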
Math 321 - Dr. Minnotte 153
Example: A fair die.
P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6, P(5) = 1/6, P(6) = 1/6.
A = {1, 3, 5}, P(A) = 3/6 = 1/2.
B = {1, 2}, P(B) = 2/6 = 1/3.
$P(A^c)$ = ?
$A \cap B = \{1\}$, $P(A \cap B) = 1/6$.
$P(A \cup B)$ = ?
Math 321 - Dr. Minnotte 154
We don't need to know the entire probability function to use these.
Example: Lifetime of a component (T). Suppose we know:
$P(A) = P(T \le 60) = .47$
$P(B) = P(40 \le T \le 80) = .34$
$P(A \cap B) = P(40 \le T \le 60) = .26$
Then:
$P(T > 60)$ = ?
P(lifetime no more than 80) = ?
Math 321 - Dr. Minnotte 155
Example: Suppose the probability that an integrated circuit chip has defective etching is 0.12. The probability that the chip has a crack defect is 0.29. And the probability of both defects is 0.07.
What is the probability the chip does not have defective etching?
What is the probability it has at least one defect?
What is the probability it has neither defect?
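The three chip questions use only the complement rule and the general addition rule. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
p_etch = 0.12    # P(defective etching)
p_crack = 0.29   # P(crack defect)
p_both = 0.07    # P(both defects)

p_no_etch = 1 - p_etch                       # complement rule
p_at_least_one = p_etch + p_crack - p_both   # general addition rule
p_neither = 1 - p_at_least_one               # complement of "at least one"

print(f"P(no defective etching) = {p_no_etch:.2f}")
print(f"P(at least one defect)  = {p_at_least_one:.2f}")
print(f"P(neither defect)       = {p_neither:.2f}")
```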
Math 321 - Dr. Minnotte 156
Equally Likely Outcomes
If S consists of N equally likely outcomes, and event A consists of k of them, P(A) = k/N.
Example: A fair die (see slides 148, 153).
Example: Draw a card at random from a standard deck (52 cards, 13 spades). What is the probability of drawing a spade?
Example: A shipment of 1000 hard drives contains 6 which do not work. If we draw one at random, what is the probability of selecting a defective drive?
Math 321 - Dr. Minnotte 157
Conditional Probability (3.2)
Suppose we have partial information about the outcome of an experiment. In particular, suppose we know that the event B has occurred.
We may use this information to revise the probability of another event, A.
We call the revised probability a conditional probability, as it depends on the condition of B being true.
Math 321 - Dr. Minnotte 158
Example: Fair die.
Let A = {1, 3, 5}, P(A) = 3/6 = 1/2
B = {1, 2, 3}, P(B) = 3/6 = 1/2
$P(A \cap B) = P(\{1, 3\}) = 2/6 = 1/3$
If I roll the die and, without showing you, tell you event B has occurred (I rolled no greater than 3), now what is the probability of event A?
Math 321 - Dr. Minnotte 159
Since B has occurred, the sample space reduces to B: {1, 2, 3}.
Two of the three possibilities are odd (in A), and the chances are still equal. So P(A|B) = 2/3.
Once we know the roll is 3 or less, the probability that it's odd increases from 1/2 to 2/3.
Math 321 - Dr. Minnotte 160
Definition: The conditional probability of A given B is
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
(undefined if P(B) = 0).
This is the probability, given that event B has occurred, that event A has also occurred.
Die: $P(A|B) = \frac{1/3}{1/2} = 2/3$.
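The die calculation can be reproduced with exact fractions. The sketch below assumes equally likely outcomes; prob and cond_prob are helper names introduced only for this illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die
A = {1, 3, 5}            # odd roll
B = {1, 2, 3}            # roll no greater than 3

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) = 0, so P(A|B) is undefined")
    return prob(a & b) / prob(b)

print(cond_prob(A, B))   # conditional probability of odd, given roll <= 3
```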
Example (continued from slide 155): P(defective etching) = 0.12. P(crack defect) = 0.29. P(etching and crack defects) = 0.07.
If a chip has a crack defect, what is the (conditional) probability that it also has defective etching?
Math 321 - Dr. Minnotte 161
What is the probability that a chip has a crack defect but satisfactory etching?
If a chip has a crack defect, what is the probability that it has satisfactory etching?
Note: $P(A|B) = 1 - P(A^c|B)$, just like $P(A) = 1 - P(A^c)$.
Math 321 - Dr. Minnotte 162
If a chip has defective etching, what is the probability that it also has a crack defect?
In general, P(A|B) and P(B|A) are different: knowing one does not, by itself, tell us the other.
Math 321 - Dr. Minnotte 163
Math 321 - Dr. Minnotte 164
Independence
Definition: If $P(A \cap B) = P(A) P(B)$, we say A and B are independent.
If A and B are independent, P(A) > 0, P(B) > 0, then
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) P(B)}{P(B)} = P(A)$.
Likewise, P(B|A) = P(B). Your book uses this as the definition of independence.
Math 321 - Dr. Minnotte 165
Assuming P(A) > 0, P(B) > 0, any one of
$P(A \cap B) = P(A) P(B)$
$P(A|B) = P(A)$
$P(B|A) = P(B)$
proves independence and the other two.
Math 321 - Dr. Minnotte 166
Example: Draw one card at random from a well-shuffled deck. Define:
A = {draw a club} B = {draw an ace} C = {draw a red card}
Are A and B independent? A and C?
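The card questions can be checked by brute force against the definition $P(A \cap B) = P(A) P(B)$. The sketch below builds a 52-card deck as (rank, suit) pairs; this representation and the helper names prob and independent are our own choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))   # 52 equally likely (rank, suit) pairs

def prob(event):
    """Probability of an event for one card drawn at random from the deck."""
    return Fraction(len(event), len(deck))

def independent(e1, e2):
    """Check the definition P(e1 and e2) = P(e1) P(e2) exactly."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

A = {c for c in deck if c[1] == "clubs"}                 # draw a club
B = {c for c in deck if c[0] == "A"}                     # draw an ace
C = {c for c in deck if c[1] in ("diamonds", "hearts")}  # draw a red card

print("A, B independent?", independent(A, B))
print("A, C independent?", independent(A, C))
```

Note that A and C share no cards, so this also illustrates that mutually exclusive events with positive probability fail the independence check.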
Note that events being mutually exclusive and their being independent are not the same thing.
Show: If P(A) > 0, P(B) > 0, and A and B are mutually exclusive, they cannot be independent!
Math 321 - Dr. Minnotte 167
Math 321 - Dr. Minnotte 168
We'll often assume independence to calculate probabilities of intersections.
Example: Roll a red die and a black die.
A = {red 6}, P(A) = 1/6 (fair dice)
B = {black 6}, P(B) = 1/6
Results on one die shouldn't influence the other, so we assume independence.
$P(\text{double-sixes}) = P(A \cap B) = P(A) P(B) = (1/6)(1/6) = 1/36$.
Math 321 - Dr. Minnotte 169
This extends to more than 2 events.
The multiplication law for independent events says that if events $A_1, A_2, \ldots, A_n$ are independent (that is, knowledge of any combination of the $A_i$'s does not change the probabilities of the remainder), then
$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) P(A_2) \cdots P(A_n)$.
Note: this is the probability that all n events occur.
Math 321 - Dr. Minnotte 170
Example: Flip a fair coin 4 times.
Let $A_i$ = {Flip i is a head}. $P(A_i) = 1/2$, i = 1, 2, 3, 4
Separate flips are independent. (Why?)
$P(\text{4 heads}) = P(A_1 \cap A_2 \cap A_3 \cap A_4) = P(A_1) P(A_2) P(A_3) P(A_4) = (1/2)(1/2)(1/2)(1/2) = 1/16$.
Math 321 - Dr. Minnotte 171
Example: Draw a card from a standard deck 3 times with replacement (replace and reshuffle after each draw).
Let $A_i$ = {Draw i is a spade}. $P(A_i) = 13/52 = 1/4$, i = 1, 2, 3
Separate draws are independent. (Why?)
P(3 spades) = ?
Math 321 - Dr. Minnotte 172
What if events aren't independent?
Recall, $P(A|B) = \frac{P(A \cap B)}{P(B)}$.
Therefore, $P(A \cap B) = P(A|B) P(B)$.
The general multiplication law:
$P(A_1 \text{ and } A_2) = P(A_1) P(A_2|A_1)$.
Math 321 - Dr. Minnotte 173
Example: Suppose we have 4 cards, labeled 1, 2, 3, and 4. Suppose we draw two at random without replacement. What is the probability both cards are odd?
Math 321 - Dr. Minnotte 174
Example: Suppose we draw two cards at random without replacement from a standard deck. What is the probability both cards are spades?
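For the two-spades question, the general multiplication law gives the answer in one line, and a brute-force count over all ordered pairs of distinct cards offers an independent check. The deck encoding below (ranks 1 to 13, suit letters S, H, D, C) is just a convenient assumption for the sketch.

```python
from fractions import Fraction
from itertools import permutations

# General multiplication law: P(A1 and A2) = P(A1) P(A2 | A1).
p_first = Fraction(13, 52)                # first card is a spade
p_second_given_first = Fraction(12, 51)   # second is a spade, given the first was
p_both = p_first * p_second_given_first

# Brute-force check: count ordered pairs of distinct cards that are both spades.
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]
pairs = list(permutations(deck, 2))
spade_pairs = sum(1 for a, b in pairs if a[1] == "S" and b[1] == "S")
p_check = Fraction(spade_pairs, len(pairs))

print(p_both, p_check)
```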
Math 321 - Dr. Minnotte 175
Random Variables (3.3)
Definition: A random variable is a random number. It is obtained by assigning a number to each outcome of an experiment.
Example: Roll a die. The number rolled is a random variable.
Math 321 - Dr. Minnotte 176
Example: Flip a coin 5 times. Is the sequence of heads and tails a random variable (Example: HHTHT)?
Some random variables we could generate from 5 coin flips:
X = # H
Y = # H - # T
Z = # H before first T
We usually denote random variables by capital letters from the end of the alphabet.
Math 321 - Dr. Minnotte 177
Example: Select a rat at random from a large colony. What are some possible random variables?
Math 321 - Dr. Minnotte 178
There are two main types of random variables: discrete and continuous.
Definition: A discrete random variable can only take on a specified (countable) list of values. There is a gap between any two elements in its sample space.
In practice, these are usually counts of some sort, and thus whole numbers.
Example: Number of heads in 5 coin flips.
Math 321 - Dr. Minnotte 179
Definition: A continuous random variable may take any real number in some (set of) interval(s).
Examples: Weight, lifetime.
We will need to deal differently with discrete and continuous random variables.
Math 321 - Dr. Minnotte 180
Discrete Random Variables
Definition: The probability mass function (p.m.f.) of a discrete random variable X is a function p() from the support of X to the real numbers, where
p(x) = P(X = x) .
Notation: X: capital letter, indicates a random variable. x: lowercase letter, indicates a specific value.
Math 321 - Dr. Minnotte 181
Example: Let X be the roll of a fair die.
S = {1, 2, 3, 4, 5, 6}
p(1) = P(X = 1) = 1/6
p(2) = P(X = 2) = 1/6
and so on.
We might write
$p(x) = 1/6$, $x \in \{1, 2, 3, 4, 5, 6\}$
Math 321 - Dr. Minnotte 182
Example: An industrial plant has 3 machines. The probability that X of them are operating at a randomly chosen time may be found from
x      0     1     2     3
p(x)   0.12  0.27  0.46  0.15
Math 321 - Dr. Minnotte 183
The laws of probability tell us that:
1) ? $\le p(x) \le$ ? for all x
2) $\sum_{x \in S} p(x)$ = ?
Math 321 - Dr. Minnotte 184
A p.m.f. is plotted as spikes:
Or as a probability histogram, with areas equal to probabilities:
Math 321 - Dr. Minnotte 185
Math 321 - Dr. Minnotte 186
Math 321 - Dr. Minnotte 187
Math 321 - Dr. Minnotte 188
Continuous Random Variables
Recall, a continuous random variable may take any value in some real interval.
Continuous random variables are typically measurements (length, weight, lifetime, etc.).
Math 321 - Dr. Minnotte 189
With continuous random variables, we can't use a p.m.f. to find probabilities. Instead:
Definition: A probability density function (density, p.d.f.), f(x), is a function which determines the probability properties of a continuous random variable. If $X \sim f(x)$, then
$P(a \le X \le b) = \int_a^b f(x)\,dx$.
Math 321 - Dr. Minnotte 190
If f(x) is a p.d.f.:
$f(x) \ge$ ? for all x, and
$\int_{-\infty}^{\infty} f(x)\,dx$ = ?
Note: for a continuous random variable, $P(X = a) = 0$ for any single value a.
Why?
Math 321 - Dr. Minnotte 191
Example: a continuous random variable has p.d.f.
Is f(x) a true p.d.f.?
Math 321 - Dr. Minnotte 192
Example (continued): What is the probability that X will be between 0.5 and 1.0?
$P(2.5 \le X \le 3.0)$ = ?
$P(0.2 \le X \le 0.2)$ = ?
P(X < 1.0) = ?
Math 321 - Dr. Minnotte 193
Math 321 - Dr. Minnotte 194
Definition: The cumulative distribution function (c.d.f.), F(x), of a random variable is defined as
$F(x) = P(X \le x)$.
If X is continuous, $F(x) = \int_{-\infty}^{x} f(t)\,dt$.
Math 321 - Dr. Minnotte 195
Properties of continuous c.d.f.s:
1) $\lim_{x \to -\infty} F(x) = 0$
2) $\lim_{x \to \infty} F(x) = 1$
3) F is nondecreasing (if x < y, $F(x) \le F(y)$).
4) $P(a \le X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$.
This is often easier than integrating f(x).
Math 321 - Dr. Minnotte 196
Example (back to earlier p.d.f.):
$P(0.5 \le X \le 1.0)$ = ? (Compare to slide 192.)
Math 321 - Dr. Minnotte 197
The Population Mean
Definition: The population mean (expectation, expected value) of random variable X is
$\mu = E(X) = \sum_x x\,p(x)$
if X is discrete, and
$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
if X is continuous.
It can be thought of as the long-term average of X, or the mean of a sample that follows the distribution of X perfectly.
Math 321 - Dr. Minnotte 198
Example: Die roll
$p(x) = 1/6$, $x \in \{1, 2, \ldots, 6\}$
$\mu$ = ?
Example: Machines
x      0     1     2     3
p(x)   0.12  0.27  0.46  0.15
$\mu$ = ?
Math 321 - Dr. Minnotte 199
Example:
$\mu$ = ?
Example:
$\mu$ = ?
Math 321 - Dr. Minnotte 200
Expectations of Functions of Random Variables
Given a random variable, X, suppose we are really interested in a function, h(X).
The expected value of h(X) is
$E[h(X)] = \sum_x h(x)\,p(x)$
if X is discrete, and
$E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$
if X is continuous.
Example: X ~ p(x) = , x = 1, 2. What is $E(X^2)$?
Note: In general, $E[h(X)] \ne h[E(X)]$.
Example: For the above p.m.f., what is E(X)? $[E(X)]^2$?
Is $E(X^2) = [E(X)]^2$?
Math 321 - Dr. Minnotte 201
Math 321 - Dr. Minnotte 202
The Population Variance and Standard Deviation
Just as we have a population mean to measure the center of a distribution, the population variance and standard deviation measure a distribution's spread.
Math 321 - Dr. Minnotte 203
Definition: Let X be a random variable with mean $\mu$. Then the population variance of X, $\sigma^2$, is
$\sigma^2 = V(X) = E[(X - \mu)^2] = E(X^2) - \mu^2$.
Definition: The population standard deviation, $\sigma$, of random variable X is the square root of the variance of X.
Math 321 - Dr. Minnotte 204
Example: Die roll
$p(x) = 1/6$, $x \in \{1, 2, \ldots, 6\}$
$\mu$ = ?  $E(X^2)$ = ?  V(X) = ?  $\sigma$ = ?
Example:
$p(x) = 1/2$, $x \in \{3, 4\}$
$\mu$ = ?  $E(X^2)$ = ?  V(X) = ?  $\sigma$ = ?
Math 321 - Dr. Minnotte 205
Example: Machines
x      0     1     2     3
p(x)   0.12  0.27  0.46  0.15
$\mu$ = ?  $E(X^2)$ = ?
V(X) = ?
$\sigma$ = ?
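For the machines example, the mean, variance, and standard deviation can be computed directly from the table. The sketch below stores the p.m.f. in a dictionary (a representation chosen here for convenience) and uses the shortcut V(X) = E(X^2) - [E(X)]^2.

```python
from math import sqrt

# p.m.f. of X = number of machines operating, copied from the table.
pmf = {0: 0.12, 1: 0.27, 2: 0.46, 3: 0.15}

# A valid p.m.f. must have probabilities that sum to 1.
assert abs(sum(pmf.values()) - 1) < 1e-12

mean = sum(x * p for x, p in pmf.items())               # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())   # E(X^2)
variance = second_moment - mean**2                      # V(X)
sd = sqrt(variance)                                     # standard deviation

print(f"mean = {mean:.2f}, E(X^2) = {second_moment:.2f}")
print(f"variance = {variance:.4f}, sd = {sd:.4f}")
```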
Math 321 - Dr. Minnotte 206
Example:
$\mu$ = ?  $E(X^2)$ = ?
V(X) = ?
$\sigma$ = ?
Math 321 - Dr. Minnotte 207
Linear Functions of Random Variables (3.4)
Recall, a linear function (or linear combination) of variables $x_1, x_2, \ldots, x_n$ is a function of the form
$f(x_1, x_2, \ldots, x_n) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b$
where b and all of the $a_i$'s are fixed constants.
Math 321 - Dr. Minnotte 208
Given any random variables $X_1, X_2, \ldots, X_n$ and known constants $a_1, a_2, \ldots, a_n$, and b,
$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + b) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n) + b$.
To find the expectation of a linear combination of random variables, we need only know the constants and the expectation of each random variable individually.
Math 321 - Dr. Minnotte 209
Example: Let X be a random temperature measured in degrees Celsius, with E(X) = 10. Let Y be the same temperature in degrees Fahrenheit, Y = 9/5 X + 32. What is E(Y)?
Example: The expectation of the roll of a fair die is 3.5. What is the expectation of the sum of four such rolls?
Independent Random Variables
Recall, events are said to be independent if knowledge of one does not affect the probability of the other.
Likewise, random variables X and Y are independent if knowing the value of X does not affect probabilities of Y, no matter what value X takes (and vice-versa).
Math 321 - Dr. Minnotte 210
Math 321 - Dr. Minnotte 211
If X and Y are independent, any event involving X alone will be independent from any event involving Y alone.
$P(X \in A \text{ and } Y \in B) = P(X \in A) P(Y \in B)$ for any A and B.
Draws with replacement are independent.
Draws in a simple random sample are not independent, but may be treated as though they are if the sample size is much smaller than the population size.
Math 321 - Dr. Minnotte 212
If the random variables are independent, then
$V(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n + b) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_n^2 V(X_n)$.
Notes:
The shift b does not affect the variance.
The coefficients $a_i$ are squared.
Dependent random variables require a more complex formula.
Math 321 - Dr. Minnotte 213
Example: Let the variance of the Celsius temperature X be V(X) = 25.
What is the standard deviation of X?
What is the variance of Y = 9/5 X + 32?
What is the standard deviation of Y?
Math 321 - Dr. Minnotte 214
Example: The variance of the roll of a fair die is 35/12. What is the variance of the sum of four such rolls?
If we take a single roll and multiply it by 4, what is the variance of the result? Why is this different?
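The contrast between summing four independent rolls and multiplying one roll by 4 follows from the variance formula for linear combinations. A short sketch with exact fractions (variable names are ours):

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)   # fair die

mean = sum(x * p for x in faces)                     # E(X)
var_one = sum((x - mean) ** 2 * p for x in faces)    # V(X)

# Sum of four independent rolls: every coefficient is 1, so variances add.
var_sum_of_four = 4 * var_one

# One roll multiplied by 4: the single coefficient 4 is squared.
var_four_times_one = 4**2 * var_one

print(var_one, var_sum_of_four, var_four_times_one)
```

Multiplying a single roll by 4 produces a larger variance because high and low values cannot offset each other, as they can when four separate rolls are added.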
Math 321 - Dr. Minnotte 215
Suppose X and Y each have mean 10 and variance 4. What are the mean and variance of Z = X - Y?
Math 321 - Dr. Minnotte 216
Mean and Variance of the Sample Mean
An important special case concerns the sample mean of the $X_i$'s,
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
Note that $\bar{X}$ is a linear combination of the $X_i$'s.
Math 321 - Dr. Minnotte 217
Theorem: If $X_1, X_2, \ldots, X_n$ are independent random variables, each with $E(X_i) = \mu$ and $V(X_i) = \sigma^2$, then
$E(\bar{X}) = \mu$
and
$V(\bar{X}) = \sigma^2 / n$.
Proof:
Math 321 - Dr. Minnotte 218
Example: A (possibly biased) coin has probability p of coming up heads. We flip it and let X = 1 if heads, 0 if tails.
What are E(X) and V(X)?
Suppose we flip it n times, and look at the sample mean $\bar{X}$.
Chapter 4: Common Distributions
Often we will have useful mathematical forms which represent entire families of distributions.
These distributions include one or more constants (called parameters) which must be specified to define a specific distribution.
We will concentrate on two especially important families, the binomial and normal distributions.
Math 321 - Dr. Minnotte 219
The Binomial Distribution (4.1)
The binomial distribution is the most important common named family of discrete distributions.
Recall, a discrete distribution is described by a probability mass function $p(\cdot)$, where
p(0) = P(X = 0)
p(1) = P(X = 1)
and so on.
Math 321 - Dr. Minnotte 220
Suppose our experiment consists of trials with only two possible outcomes.
One outcome, called a "success," occurs with probability p.
The other outcome is called a "failure," and occurs with probability (1 - p).
Such a process is called a Bernoulli trial (after 17th-century probabilist James Bernoulli).
The binomial distribution looks at a fixed number of independent identical Bernoulli trials, and counts the number of successes.
Math 321 - Dr. Minnotte 221
Example: Suppose silicon computer chips are made in pairs, and that 30% of all chips produced are defective.
Also assume that the chips in a pair are independent of each other.
Out of pairs in which the first chip is good, the second is defective in 30% of pairs. This remains true for pairs in which the first chip is defective.
Math 321 - Dr. Minnotte 222
Out of all pairs, 70% will have a good first chip. Out of those, 70% will also have a good second chip. Overall, 70% of 70%, or 49% (.7*.7 = .49) will have two good chips.
Likewise, 30% of that 70%, or 21% overall (.7*.3 = .21) will have a good first chip and a defective second chip.
By the same reasoning, 30% will have a defective first chip, and 70% of those (21% overall) will have a good second chip.
Finally, 30% of 30%, or 9% will have both chips defective.
Math 321 - Dr. Minnotte 223
If we let the letter S (for success) represent a good chip, and F (for failure) represent a defective one, we can summarize as:
P(SS) = .7*.7 = .49
P(SF) = .7*.3 = .21
P(FS) = .3*.7 = .21
P(FF) = .3*.3 = .09
Math 321 - Dr. Minnotte 224
Now let X be the number of good chips produced in a pair.
Then X can take the values 0, 1, or 2.
From the above,
p(0) = P(X = 0) = P(FF) = .09
p(2) = P(X = 2) = P(SS) = .49
p(1) = P(X = 1) = P(SF or FS) = .21 + .21 = .42
Math 321 - Dr. Minnotte 225
What if the chips are produced in sets of 4?
If we want the probability of a set consisting of 2 good and 2 defective chips, we can think about the case of SSFF: the first and second chips are good, while the third and fourth are defective.
The probability of this particular outcome will be .7*.7*.3*.3 = .0441 or 4.41%.
Math 321 - Dr. Minnotte 226
But there are other ways we can have two successes and two failures (5 other ways, in this case):
P(SSFF) = .7*.7*.3*.3 = .0441
P(SFSF) = .7*.3*.7*.3 = .0441
P(SFFS) = .7*.3*.3*.7 = .0441
P(FSSF) = .3*.7*.7*.3 = .0441
P(FSFS) = .3*.7*.3*.7 = .0441
P(FFSS) = .3*.3*.7*.7 = .0441
Overall, p(2) = P(X = 2) = 6*.0441 = .2646.
Math 321 - Dr. Minnotte 227
Math 321 - Dr. Minnotte 228
In general, suppose we have an experiment consisting of n independent Bernoulli trials.
Those trials which satisfy the condition we wish to count are called successes, and occur with probability p.
The remaining trials are called failures; these occur with probability (1 - p).
Let X be the number of successes in the full experiment.
Math 321 - Dr. Minnotte 229
If these conditions are true, we say that X, the number of successes in the experiment, has a binomial distribution with parameters n and p.
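$X \sim \text{Binomial}(n, p)$ or $X \sim \text{Bin}(n, p)$.
The mass function for X is:
$p(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$, $x = 0, 1, \ldots, n$.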
Math 321 - Dr. Minnotte 230
Note: the exclamation mark is pronounced "factorial."
Given n items, n! is the number of arrangements, and is found as
$n! = n (n-1) (n-2) \cdots 2 \cdot 1$.
Since there is one (empty) way to arrange 0 objects, we define 0! = 1.
Example: The chips (30% defective) are produced in batches of 4. Let X be the number of good chips in a batch.
What distribution does X follow?
What is p(2)?
What is the probability that a random batch will contain no more than one good chip?
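The batch-of-4 questions can be checked with a small function for the binomial mass function. In this sketch, binom_pmf is a helper name introduced here, and a "success" is taken to be a good chip, so p = 0.7.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial mass function: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.7   # 4 chips per batch, each good with probability 0.7

p_two_good = binom_pmf(2, n, p)
p_at_most_one_good = binom_pmf(0, n, p) + binom_pmf(1, n, p)

print(f"p(2) = {p_two_good:.4f}")
print(f"P(X <= 1) = {p_at_most_one_good:.4f}")
```

The value of p(2) matches the 6 * .0441 = .2646 found earlier by listing the six arrangements.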
Math 321 - Dr. Minnotte 231
Example: In a genetics study, a second-generation cross of pure green peas with pure yellow peas leads to pods where p = P(yellow) = 3/4.
If pods contain 8 seeds, what is the probability that a random pod will contain 6 yellow seeds?
What is the probability that a random pod will contain at least 6 yellow seeds?
Math 321 - Dr. Minnotte 232
Table A.1 in your book can save calculations by providing probabilities of $P(X \le x)$ for $n \le 20$ and certain values of p.
Example: Draw 16 times with replacement from a standard deck, and let X = number of spades drawn.
Find P(X > 6).
Math 321 - Dr. Minnotte 233
With standard distributions, the mean and variance may generally be found as a function of the parameters.
If $X \sim \text{Binomial}(n, p)$, then $\mu = np$.
Example: If 75% of all seeds are yellow, and each pod contains 8 seeds, what is the mean number of yellow seeds per pod?
Example: If we have 4 fair coins which we flip as a batch, what is the mean number of heads?
Math 321 - Dr. Minnotte 234
Additionally, if $X \sim \text{Bin}(n, p)$, then $\sigma^2 = np(1 - p)$.
Example: X = # yellow seeds ~ Bin(8, .75). What are the variance and standard deviation of X?
Example: X = # heads in 4 flips ~ Bin(4, .5). What are the variance and standard deviation of X?
Math 321 - Dr. Minnotte 235
Recall, draws without replacement (simple random samples) are not independent.
However, we may do calculations as though they are independent (including binomial calculations) as long as the sample size is small (less than 5%) compared to the population size.
Math 321 - Dr. Minnotte 236
Example: A lot of several thousand components contains 7% defective. We sample 8 at random.
What is the probability of no defective components in our sample?
What is the probability of at least one defective?
What is the expected number of defectives in our sample?
Math 321 - Dr. Minnotte 237
Math 321 - Dr. Minnotte 238
The Normal Distribution (4.3)
The continuous normal (or Gaussian) distribution has two parameters, $\mu$ and $\sigma^2$. If $X \sim N(\mu, \sigma^2)$,
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$.
This distribution is often seen in practice, and is also very important theoretically.
Math 321 - Dr. Minnotte 239
The normal p.d.f. is a bell-shaped curve, symmetric around, and with its peak at, $\mu$. $E(X) = \mu$.
Its width is determined by $\sigma^2$; large values of $\sigma^2$ imply a wide, low curve, while small values imply a narrow, tall one. $V(X) = \sigma^2$.
Math 321 - Dr. Minnotte 240
An important special case is the standard normal distribution, with $\mu = 0$ and $\sigma^2 = 1$.
We usually identify standard normal variables with the letter Z.
If Z is standard normal, $Z \sim N(0, 1)$ and the density of Z is
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$.
Math 321 - Dr. Minnotte 241
There is no closed-form integral for the normal probability density function, so we can't find probabilities that way.
To find normal probabilities, we must use computer programs (which themselves use numeric integration), or tables such as Table A.2 (p. 521-522, and inside the front cover of your book) of the standard normal distribution.
Math 321 - Dr. Minnotte 242
Math 321 - Dr. Minnotte 243
Examples: $P(Z \le 1.00)$ = ?
P(Z > 1.00) = ?
$P(-2.00 \le Z \le 0.75)$ = ?
Math 321 - Dr. Minnotte 244
For $X \sim N(\mu, \sigma^2)$, we find proportions by converting to standard units.
If $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$.
Remember to convert both sides of any inequality the same way.
Math 321 - Dr. Minnotte 245
Examples: Let X ~ N(3, 4).
$P(X \le 6.00)$ = ?
P(X > 4.00) = ?
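In place of Table A.2, normal probabilities can also be obtained from Python's standard library. The sketch below assumes, following the N(mean, variance) notation used on these slides, that N(3, 4) means variance 4, so the standard deviation is 2. The first probability is read here as P(X <= 6.00).

```python
from statistics import NormalDist

Z = NormalDist()                # standard normal, N(0, 1)
X = NormalDist(mu=3, sigma=2)   # N(3, 4): variance 4, so sigma = 2

# Convert to standard units, exactly as with the table.
z = (6.00 - 3) / 2
p_le_6 = Z.cdf(z)               # P(X <= 6.00) = P(Z <= 1.5)

# Or let NormalDist do the conversion internally.
p_gt_4 = 1 - X.cdf(4.00)        # P(X > 4.00)

print(f"P(X <= 6) = {p_le_6:.4f}")
print(f"P(X > 4)  = {p_gt_4:.4f}")
```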
Math 321 - Dr. Minnotte 246
Normal Percentiles
Just as for samples, the pth percentile of a distribution has p% of the probability below it, and (100 - p)% above.
We find percentiles for the normal distribution using Table A.2 again, but reading from the inside out.
Since probabilities are in the middle of the table, start there.
Read to the outside to find the percentile.
Math 321 - Dr. Minnotte 247
Example: Z ~ N(0, 1). What is the 70th percentile of Z?
Example: What is the 25th percentile of Z?
Math 321 - Dr. Minnotte 248
For non-standard normal variables, first find the desired percentile for the standard normal, then use the fact that since $Z = (X - \mu)/\sigma$, therefore $X = \mu + \sigma Z$.
Example: X ~ N(10, 25). What is the 95th percentile of X?
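Reading the table "from the inside out" corresponds to evaluating the inverse c.d.f. A minimal sketch using the standard library's inv_cdf, again assuming N(10, 25) means variance 25 (standard deviation 5):

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal

z70 = Z.inv_cdf(0.70)   # 70th percentile of Z
z25 = Z.inv_cdf(0.25)   # 25th percentile of Z (negative, below the mean)

# Non-standard normal: percentile of X = mean + (standard deviation) * (percentile of Z).
mu, sigma = 10, 5       # X ~ N(10, 25)
x95 = mu + sigma * Z.inv_cdf(0.95)

print(f"z70 = {z70:.4f}, z25 = {z25:.4f}, x95 = {x95:.3f}")
```

Table lookups give slightly rounded versions of these values, since the table lists z only to two decimals.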
Besides the binomial and normal distributions, there are a number of other named families of distributions with useful properties.
For example, the Poisson distribution (Section 4.2) is useful for modeling random counts in a fixed interval of time or space.
See Sections 4.4-4.6 for discussion of the lognormal, exponential, gamma, and Weibull distributions, which are useful for modeling continuous histograms which are positively skewed and unimodal.
Math 321 - Dr. Minnotte 249
Sampling Distributions (4.8)
Suppose random variable X is drawn from some distribution f. (X ~ f)
Now suppose we generate n of these random variables, $X_1, \ldots, X_n$, independently from f.
We say that $X_1, \ldots, X_n$ make a random sample from f.
Sometimes we say that $X_1, \ldots, X_n$ are i.i.d. (independent and identically distributed) from f.
Math 321 - Dr. Minnotte 250
Since the X's make a sample, we can compute sample statistics such as the mean, $\bar{X}$.
Recall (3.4), since the X's are random, so is $\bar{X}$, and since it is a number, $\bar{X}$ is itself a random variable with a distribution.
This distribution is referred to as the sampling distribution of $\bar{X}$ and plays a large role in inferential statistics.
Math 321 - Dr. Minnotte 251
Example: Let $p_X(x) = 1/3$, x = 1, 2, 3, and let $X_1$ and $X_2$ be independent draws from $p_X(x)$.
Now let $\bar{X} = (X_1 + X_2)/2$ be the average of $X_1$ and $X_2$.
Note that $\bar{X}$ is also a discrete random variable, and therefore has a probability mass function.
What is the mass function (sampling distribution) of $\bar{X}$?
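Because there are only nine equally likely (X1, X2) pairs, the sampling distribution of the average can be found by enumeration. The sketch below is one way to do that; pmf_xbar is a name chosen for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

values = (1, 2, 3)
p = Fraction(1, 3)   # p_X(x) = 1/3 for each value

# Add up the probability of every (x1, x2) pair that yields each average.
pmf_xbar = defaultdict(Fraction)
for x1, x2 in product(values, repeat=2):
    pmf_xbar[Fraction(x1 + x2, 2)] += p * p   # independence: multiply

for value in sorted(pmf_xbar):
    print(float(value), pmf_xbar[value])
```

The middle value is the most likely average, and the extremes 1 and 3 are the least likely, so the average is already less spread out than a single draw.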
Math 321 - Dr. Minnotte 252
Example: Suppose X ~ N(50, 4). A histogram of 1000 Xs looks like this:
Math 321 - Dr. Minnotte 253
Sample 25 X's and compute $\bar{X}$.
If we repeat this process 1000 times, we get a histogram such as this:
Math 321 - Dr. Minnotte 254
Note that $\bar{X}$ has a distribution that:
Is centered on 50 ($\mu$);
Is narrower than the solid normal curve for the individual X's: the variance and standard deviation of $\bar{X}$ are smaller than those of X.
Remains bell-shaped and (roughly?) normal.
Understanding the distributions of sample statistics and their relationships to the associated population parameters is the basis of most of inferential statistics.
Math 321 - Dr. Minnotte 255
In general, if a sample statistic is used to estimate a population parameter:
The sampling distribution of the statistic is centered on (or at least near) the parameter.
The spread of the sampling distribution will decrease as the sample size gets larger.
As the sample size gets larger, the shape of the sampling distribution will usually get more and more bell-shaped (normal).
Math 321 - Dr. Minnotte 256
Let $\bar{X}$ be the sample mean of a random sample $X_1, X_2, \ldots, X_n$ from a population or process with mean $\mu$ and standard deviation $\sigma$. Then (recall, Section 3.4):
The mean of the sampling distribution of $\bar{X}$, $\mu_{\bar{X}}$, is $\mu$, the population mean, regardless of sample size n.
The standard deviation of the sampling distribution of $\bar{X}$, $\sigma_{\bar{X}}$, is $\sigma/\sqrt{n}$, the population standard deviation divided by the square root of the sample size.
Math 321 - Dr. Minnotte 257
Sampling Distributions of the Mean
The standard deviation of the sample mean, $\sigma_{\bar{X}}$, is often called the standard error of the sample mean.
This emphasizes that it describes a sampling distribution, not a population.
Math 321 - Dr. Minnotte 258
As the sample size gets larger, we have more information and can make better estimates, so the standard error decreases.
(Note, however, that the square root means we have diminishing returns; each new observation provides less new information than the previous one.)
The larger the sample, the closer $\bar{X}$ is likely to be to $\mu$.
Math 321 - Dr. Minnotte 259
Math 321 - Dr. Minnotte 260
If our original population has a normal distribution, the sampling distribution of $\bar{X}$ is also normal, regardless of sample size.
Example: An automated filling machine fills soft drink cans with a volume that has a normal distribution with $\sigma = 0.05$ ounces.
If we sample 4 cans and take the sample mean, what is the probability that $\bar{X}$ will be within 0.04 ounces of the population mean $\mu$?
Math 321 - Dr. Minnotte 261
The Central Limit Theorem
The Central Limit Theorem is the most important theorem in statistics.
It shows the importance of the normal distribution, and provides the justification of many of the most fundamental statistical methods.
Math 321 - Dr. Minnotte 262
Math 321 - Dr. Minnotte 263
If we know that a population or process has a normal distribution, we know that the sampling distribution of $\bar{X}$ will also be normal. This allows us to compute useful probabilities.
Unfortunately, we often do not know the population distribution (or perhaps we know that it is not normal).
Fortunately, this is not always required.
Math 321 - Dr. Minnotte 264
The sample mean (or sum) of a large number of independent random variables has a sampling distribution which is approximately normal, no matter what distribution the original random variables come from.
This important result is the Central Limit Theorem.
Math 321 - Dr. Minnotte 265
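Theorem (Central Limit Theorem): If $X_1, X_2, \ldots, X_n$ are independent random variables from a population or process with mean $\mu$ and standard deviation $\sigma$, then as long as n is sufficiently large,
$\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, and $\sum X_i \sim N(n\mu, n\sigma^2)$ approximately
(where the second argument of N is the variance).
We can use this to find probabilities for sums or averages, without knowing the distribution of the $X_i$'s!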
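A small simulation can make the theorem concrete. The sketch below is only an illustration: the exponential population (mean 1, standard deviation 1), the sample size of 50, the number of repetitions, and the seed are all assumptions chosen for the demo, not values from the course.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(321)        # fixed seed so the demo is repeatable
n = 50                  # observations per sample (assumed)
num_samples = 2000      # number of simulated samples (assumed)
mu, sigma = 1.0, 1.0    # mean and SD of an exponential(1) population

# Draw many samples from a clearly non-normal (right-skewed) population
# and record the mean of each one.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)   # should be near mu
spread = stdev(sample_means)  # should be near sigma / sqrt(n)

# If the sample means are roughly normal, about 95% of them should
# fall within 1.96 standard errors of mu.
standard_error = sigma / sqrt(n)
fraction_within = sum(
    abs(m - mu) <= 1.96 * standard_error for m in sample_means
) / num_samples

print(f"mean of sample means: {center:.3f} (theory: {mu})")
print(f"SD of sample means:   {spread:.3f} (theory: {standard_error:.3f})")
print(f"fraction within 1.96 SE: {fraction_within:.3f}")
```

Even though individual exponential observations are strongly skewed, the simulated sample means should cluster around the population mean with a spread close to the standard error.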
Example: The (population) mean time required for maintenance on an air-conditioning unit is 1 hour, and the standard deviation is also 1 hour. A company operates 50 such units.
Could we find the probability that the maintenance on a single unit requires more than 2 hours from the information given?
What is the probability that the average time for maintenance will be more than 75 minutes?
What is the probability that the total time for maintenance will be less than 40 hours?
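One way to work the average and total questions is sketched below, assuming n = 50 is large enough for the Central Limit Theorem and using Python's standard `statistics.NormalDist`. The variable names are illustrative. The single-unit question cannot be handled this way, since the theorem describes averages and sums rather than individual observations.

```python
from math import sqrt
from statistics import NormalDist

mu = 1.0     # population mean maintenance time (hours)
sigma = 1.0  # population standard deviation (hours)
n = 50       # number of units

# Approximate sampling distributions from the Central Limit Theorem.
# NormalDist takes the standard deviation, not the variance.
mean_dist = NormalDist(mu, sigma / sqrt(n))       # sample mean
total_dist = NormalDist(n * mu, sigma * sqrt(n))  # sample total

# 75 minutes = 1.25 hours
p_mean_over_75_min = 1 - mean_dist.cdf(1.25)
p_total_under_40_hr = total_dist.cdf(40)

print(f"P(average > 75 minutes) is about {p_mean_over_75_min:.4f}")
print(f"P(total < 40 hours) is about {p_total_under_40_hr:.4f}")
```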
How large is large?
As a general rule, $n \ge 30$ is usually large enough that the Central Limit Theorem is reasonable.
Symmetric populations can get by with much less, often as few as 10, or even fewer.
Highly skewed populations require more. 50 or more should be fairly safe in all but the worst cases.
The Normal Approximation to the Binomial Distribution
Recall, if X ~ B(n, p), then E(X) = np and V(X) = np(1-p).
If the particular values of n and p lead to a binomial distribution which is not very skewed, the N(np, np(1-p)) distribution can be a good approximation to the B(n, p) distribution.
We usually require that $np \ge 10$ and $n(1-p) \ge 10$.
Example: Roll a die 120 times and count the number of 6s rolled (X).
What distribution does X follow?
What are E(X) and V(X)?
What is $P(X \ge 25)$?
The true binomial probability is 0.136.
We're pretty close, but we can do better.
Binomial probabilities are located entirely on the integers, but normal probabilities are smeared out over the whole real line (remember the probability histogram).
We'll get a better approximation if we use a continuity correction, by taking the normal probability from (x - .5) to (x + .5) to approximate the binomial P(X = x).
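The effect of the continuity correction can be seen by comparing three numbers for the die example: the exact binomial probability, the plain normal approximation, and the corrected one. This sketch assumes the probability of interest is P(X ≥ 25) for X ~ B(120, 1/6), which is consistent with the 24.5 cutoff used for this example; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 120, 1 / 6
mean = n * p                 # E(X) = np = 20
sd = sqrt(n * p * (1 - p))   # sqrt of V(X) = np(1-p)
approx = NormalDist(mean, sd)

# Exact binomial probability P(X >= 25), summed term by term
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(25, n + 1))

# Normal approximation without and with the continuity correction
plain = 1 - approx.cdf(25)
corrected = 1 - approx.cdf(24.5)

print(f"exact binomial:       {exact:.4f}")
print(f"normal, no correction: {plain:.4f}")
print(f"normal, corrected:     {corrected:.4f}")
```

Starting the normal area at 24.5 instead of 25 picks up the half-unit of the histogram bar for X = 25 that the uncorrected version leaves out.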
So, for X ~ B(120, 1/6),
$P(X \ge 25) = P(X \ge 24.5)$ =
Example: If X~Bin(120, 1/6), use the normal approximation to estimate P(15 < X < 25).
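For a two-sided range, the same idea applies at both ends. Since X takes only integer values, 15 < X < 25 means 16 ≤ X ≤ 24, so the corrected normal area runs from 15.5 to 24.5. The sketch below is one way to check this, with illustrative variable names and an exact binomial sum for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 120, 1 / 6
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))

# Strict inequalities 15 < X < 25 cover the integers 16 through 24,
# so the continuity-corrected normal range is 15.5 to 24.5.
corrected = approx.cdf(24.5) - approx.cdf(15.5)

# Exact binomial probability over the same integers, for comparison
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(16, 25))

print(f"normal approximation (corrected): {corrected:.4f}")
print(f"exact binomial:                   {exact:.4f}")
```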
Chapter 5: Statistical Estimation
The remainder of the course will focus on inferential statistics.
Recall, in probability, we generally know the distribution in question and wish to calculate something about particular outcomes or events.
In inferential statistics, we have a sample, and wish to use that information to say something about the population or distribution the sample was drawn from.
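[Diagram: Population and Sample. Probability reasons from the population to the sample; inferential statistics reasons from the sample back to the population.]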
Recall: A parameter is an unknown quantity related to a population or distribution.
A statistic is a known quantity which can be calculated from a dataset.
Estimation uses a statistic (what we know) to tell us something about an unknown parameter (what we wish we knew).
Point Estimation (5.1)
Definition: A point estimate of a parameter $\theta$ is a statistic, $\hat{\theta}$, which represents a best guess for $\theta$.
Example: We have an unknown distribution, X ~ f(x), and we wish to know the unknown parameter $\mu$ = E(X). We take a sample $X_1, X_2, \ldots, X_n$, and estimate $\mu$ with the known statistic $\bar{X}$.
Other common point estimates:
Estimate V(X) = $\sigma^2$ with $s^2$.
If X ~ Binomial(n, p) (n known, p unknown), estimate p with $\hat{p} = X/n$.
All of our standard sample statistics (median, quartiles, etc.) are good estimates of the corresponding population or distribution parameters.
Properties of Estimates
There are a few properties that we like to see in a parameter estimate.
On average (over many samples), an estimate should give the correct value for the parameter. If the mean of the sampling distribution of our estimate is the parameter we are estimating, that is, $E(\hat{\theta}) = \theta$, we say that $\hat{\theta}$ is an unbiased estimate of $\theta$.
Example: We know that $E(\bar{X}) = \mu$, so $\bar{X}$ is an unbiased estimate of $\mu$.
Also, $E(s^2) = \sigma^2$ and $E(\hat{p}) = p$ (proof:)
so the sample variance and proportion are unbiased estimates of the population variance and proportion.
This is why we divide by $(n - 1)$ instead of n to find $s^2$.
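The choice of divisor can be illustrated by simulation. In the sketch below, the normal population with variance 4, the sample size of 5, the number of repetitions, and the seed are all assumptions made for the demo. Averaged over many samples, dividing by n - 1 should land near the true variance, while dividing by n should come out too small by a factor of about (n - 1)/n.

```python
import random

random.seed(321)       # fixed seed so the demo is repeatable
true_variance = 4.0    # assumed population variance (SD = 2)
n = 5                  # small sample size makes the bias easy to see
reps = 20000           # number of simulated samples (assumed)

total_div_n_minus_1 = 0.0
total_div_n = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    xbar = sum(xs) / n
    sum_sq = sum((x - xbar) ** 2 for x in xs)
    total_div_n_minus_1 += sum_sq / (n - 1)  # the usual s^2
    total_div_n += sum_sq / n                # the divide-by-n version

avg_unbiased = total_div_n_minus_1 / reps
avg_biased = total_div_n / reps

print(f"true variance:            {true_variance}")
print(f"average of SS/(n-1):      {avg_unbiased:.3f}")
print(f"average of SS/n:          {avg_biased:.3f}")
```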
On the other hand, the sample standard deviation, s, has $E(s) < \sigma$, so s is a biased estimate for $\sigma$.
Fortunately, the bias (defined as $E(s) - \sigma$, or more generally, $E(\hat{\theta}) - \theta$) is small, especially as n gets large.
Note that an unbiased estimate is not guaranteed to give you the exact parameter value on this (or possibly any) sample.
Example: X ~ Binomial(n = 25, p = 0.3). Even though $\hat{p} = X/25$ is unbiased for p, there is no value of X that will give $\hat{p} = 0.3$.
Remember our sampling distributions; an unbiased estimate's distribution will be centered correctly, but it will still have some spread.
The variance of the sampling distribution of our estimate measures that spread and is also important in measuring how well it performs.
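We combine these two aspects into a single measure, the mean squared error:
$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = [\mathrm{Bias}(\hat{\theta})]^2 + V(\hat{\theta})$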
A small MSE means that both bias and variance are small.
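To see why the mean squared error splits into a bias part and a variance part, here is a short derivation. It assumes the usual definitions $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$ and $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$; the notation is a reconstruction and may differ slightly from the textbook's.

```latex
\begin{align*}
\mathrm{MSE}(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= E\Big[\big((\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\big)^2\Big] \\
  &= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big]
     + 2\,\big(E(\hat{\theta}) - \theta\big)\,E\big[\hat{\theta} - E(\hat{\theta})\big]
     + \big(E(\hat{\theta}) - \theta\big)^2 \\
  &= V(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2 .
\end{align*}
```

The middle term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$. For an unbiased estimate, the MSE is simply the variance.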
Example: Suppose $X_1$ and $X_2$ are independent, with $E(X_1) = E(X_2) = \mu$ and $V(X_1) = V(X_2) = \sigma^2$.
Let
Find:
Example (continued): Let
Find:
For what values of $\mu$ and $\sigma^2$ is
Confidence Intervals (5.2)
Having a good estimate is a good first step in learning about a population parameter.
We should also be interested in how close our estimate is likely to be to the parameter.
One approach is to calculate the standard error, remembering that we will usually be within 2-3 standard errors of the parameter (if we use an unbiased estimate).
Another way to look at this issue is that we know our estimate is incorrect. (We just don't know by exactly how much.)
We can improve this situation by expanding our point estimate to an interval estimate, providing a range of plausible values for the parameter.
Done carefully, we can identify how likely it is that our interval includes the parameter.
If our sample size, n, is large, we can use the Central Limit Theorem to give us the following.
$P(\bar{X} - 1.96\, s/\sqrt{n} \le \mu \le \bar{X} + 1.96\, s/\sqrt{n}) \approx 0.95$
Therefore, the interval
$\bar{X} \pm 1.96\, s/\sqrt{n}$
is a random interval which covers the population mean $\mu$ with probability 0.95.
We call such an interval a 95% confidence interval.
This represents a set of plausible values of $\mu$ that are consistent with the data.
Example: In a random sample of 80 auto body shops, the cost to repair a particular kind of damage has mean $472.36 and standard deviation $62.35.
What is the 95% confidence interval for the mean of this population?
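The auto body shop interval can be computed directly. This is a minimal sketch, assuming the large-sample interval of the form sample mean ± 1.96 × s/√n; the variable names are illustrative.

```python
from math import sqrt

n = 80           # number of body shops sampled
xbar = 472.36    # sample mean repair cost (dollars)
s = 62.35        # sample standard deviation (dollars)
z_crit = 1.96    # critical value for 95% confidence

# Half-width of the interval: critical value times the standard error
half_width = z_crit * s / sqrt(n)
lower = xbar - half_width
upper = xbar + half_width

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```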
Is it correct to say $P(458.70 \le \mu \le 486.02) = 0.95$?
No! Nothing inside the probability statement is random. Recall:
The random parts are the sample statistics.
The interval is random, not the population parameter, $\mu$.
If we constructed many 95% confidence intervals from independent datasets, we'd get many different sample means and sample standard deviations, and each would lead to a different confidence interval.
In the long run, about 95% of these different confidence intervals would contain the true parameter $\mu$.
Remember, randomness is in the sample and the interval, not in the parameter!
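The long-run interpretation can be illustrated with a simulation. In the sketch below the "true" mean and standard deviation, the sample size, the number of repetitions, and the seed are all assumptions chosen for the demo; in practice the true mean is never known. Each simulated dataset yields its own interval, and we count how often the fixed true mean falls inside.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(321)     # fixed seed so the demo is repeatable
true_mu = 10.0       # assumed true population mean (fixed, not random)
true_sigma = 2.0     # assumed true population standard deviation
n = 80               # sample size for each dataset
reps = 1000          # number of independent datasets

covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mu, true_sigma) for _ in range(n)]
    xbar = mean(sample)
    s = stdev(sample)
    half_width = 1.96 * s / sqrt(n)
    # The interval changes from dataset to dataset; true_mu does not.
    if xbar - half_width <= true_mu <= xbar + half_width:
        covered += 1

coverage = covered / reps
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The reported fraction should be close to 0.95, though it will vary a little from run to run if the seed is changed.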
We call the value 95% the confidence level. We say we are 95% confident that the population mean lies within the computed interval.
We can select other confidence levels if desired, by replacing the critical value 1.96 with the Z-percentile that gives the appropriate center probability.
A confidence level of 95% (1.96) is most common, but levels of 90% (1.645) and 99% (2.575) are also often used.
In general, define $z_p$ to be the value above which there is probability p in the tail of the standard normal distribution.
Then $z_p$ will be the 100(1-p)th percentile of the standard normal distribution.
For a $100(1-\alpha)\%$ confidence interval, we use the critical value $z_{\alpha/2}$.
Example: What critical value would we use for an 80% confidence interval?
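Critical values for any confidence level can be looked up with an inverse normal CDF instead of a table. The helper below is a hypothetical function written for this illustration (it is not from the course); it relies on `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z_{alpha/2} for a two-sided interval at the given level.

    Hypothetical helper for illustration: alpha = 1 - confidence_level,
    and z_{alpha/2} is the 100(1 - alpha/2)th percentile of N(0, 1).
    """
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

For an 80% interval, α = 0.20, so the critical value is $z_{0.10}$, the 90th percentile of the standard normal distribution.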
What factors affect the length (precision) of the confidence interval?
s: If s is bigger, $\bar{X}$ is less accurate, and the interval must be wider.
Confidence level: To be more confident of including the true value, we must make the interval wider.
n: As n gets bigger, the standard error of $\bar{X}$ gets smaller, and the interval gets narrower.
If we require a 95% confidence interval with error width (interval half-width) no more than w, we can compute a (rough) minimum sample size if we have an estimate or upper bound for s:
$n \ge (1.96\, s / w)^2$
Of course, we can substitute the appropriate Z critical value to find sample sizes for other confidence levels.
Example: Milk fill weights. n = 50, $\bar{X}$ = 2.0727, s = 0.0711. Find a 95% confidence interval for $\mu$.
w = ?
If we require $w \le 0.01$, how big should n be?
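The milk example can be worked as follows. This sketch assumes the large-sample 95% interval with half-width w = 1.96 × s/√n and solves w ≤ 0.01 for n by rearranging to n ≥ (1.96 × s / w)²; the variable names are illustrative, and s from the existing sample is used as the rough estimate of spread.

```python
from math import ceil, sqrt

n = 50
xbar = 2.0727   # sample mean fill
s = 0.0711      # sample standard deviation
z_crit = 1.96   # 95% critical value

# Current interval and its half-width
w = z_crit * s / sqrt(n)
lower, upper = xbar - w, xbar + w
print(f"95% CI: ({lower:.4f}, {upper:.4f}), half-width w = {w:.4f}")

# Smallest n giving half-width no more than the target, using s as a
# rough estimate: n >= (z * s / w_target)^2, rounded up to an integer.
w_target = 0.01
n_required = ceil((z_crit * s / w_target) ** 2)
print(f"n needed for w <= {w_target}: {n_required}")
```

Halving the half-width roughly quadruples the required sample size, which is the diminishing-returns effect of the square root.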
Confidence Bounds
Sometimes, we only wish to know a lower (or upper) bound on $\mu$. We can generate one-sided confidence intervals, also called confidence bounds, in a similar way to the usual two-sided case.
If we have a large sample, then:
A 95% lower confidence bound for $\mu$ is $\bar{X} - 1.645\, s/\sqrt{n}$.
A 95% upper confidence bound for $\mu$ is $\bar{X} + 1.645\, s/\sqrt{n}$.
To get 90%, 99%, or $100(1-\alpha)\%$ bounds, replace 1.645 with 1.28, 2.33, or $z_\alpha$, respectively.
Example: A sample of 48 shear strength measurements gives a mean of 17.17 N/mm² and a standard deviation of 3.28 N/mm².
If we only care that the population mean shear strength is great enough, find a 90% lower bound on $\mu$.
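A sketch of the shear strength calculation follows, assuming the large-sample lower bound of the form sample mean minus $z_\alpha$ × s/√n with α = 0.10. Here the critical value is computed with `statistics.NormalDist.inv_cdf` rather than read from a table, so it comes out as roughly 1.2816 instead of the rounded 1.28; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 48
xbar = 17.17   # sample mean shear strength (N/mm^2)
s = 3.28       # sample standard deviation (N/mm^2)

# One-sided bound: all of alpha = 0.10 goes in a single tail,
# so the critical value is z_alpha, the 90th percentile of N(0, 1).
z_alpha = NormalDist().inv_cdf(0.90)
lower_bound = xbar - z_alpha * s / sqrt(n)

print(f"z_alpha = {z_alpha:.4f}")
print(f"90% lower confidence bound: {lower_bound:.2f} N/mm^2")
```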
For our normal-based confidence interval and level to be valid, we must know (or at least assume) that:
The sample is a random draw from the population.
The sample size n is large enough that the sample mean is approximately normally distributed and that s is a good estimate of $\sigma$.
Chapter 6: Hypothesis Testing
Estimation (both point and interval) is useful for providing an idea of the value of a population parameter.
Frequently, we may wish to investigate a more specific question about a parameter. For this purpose, we use the other major branch of inferential statistics, hypothesis testing.
One-Sample Z-Tests (6.1-6.2)
Example: (Milk data) Suppose our bottle-filling machine is supposed to dispense 2.04 L of milk. Recall, a sample of size 50 gave $\bar{X}$ = 2.0727, s = 0.0711. Does the machine need to be recalibrated?
To answer this, let's assume that the machine is working properly, and see how likely we are to get a sample mean at least as far from the expected value as the sample mean we actually saw (2.0727).
More formally, we choose a null hypothesis, H0.
This is a statement about a population parameter (say, $\mu$), generally that it is equal to the value of interest (denoted $\mu_0$).
Usually, the null hypothesis means everything is as it should be, or nothing interesting is happening.
Here: H0: $\mu$ = 2.04 (= $\mu_0$)
We also choose an alternative hypothesis, H1, that the null is incorrect.
H1: $\mu \ne 2.04$
The alternative is literally just that the null is incorrect, but this is often the more interesting or important result.
Next, we compute a test statistic, under the assumption that H0 is correct.
For large-sample tests on the population mean, $\mu$, we usually use the z-statistic:
$z = (\bar{X} - \mu_0) / (s/\sqrt{n})$
Here: z = ?
If H0 is true, $\bar{X}$ is approximately $N(\mu_0, \sigma^2/n)$ and z ~ N(0, 1). Is z a typical value from a N(0, 1) distribution?
Formally, we find a P-value, the probability that a sample from the null distribution would give a test statistic at least as unusual as the one we just saw.
Since H1: $\mu \ne 2.04$, we use a two-sided P-value: $P = P(|z| \ge 3.25)$ (z ~ N(0,1)).
From our table, if z ~ N(0,1), $P(|z| \ge 3.25)$ = .0012.
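The test statistic and two-sided P-value for the milk data can be reproduced in a few lines. This is a sketch assuming the large-sample statistic z = (sample mean − μ0) / (s/√n); it uses the unrounded z, so the P-value may differ slightly in the last digit from a table lookup at 3.25.

```python
from math import sqrt
from statistics import NormalDist

n = 50
xbar = 2.0727   # sample mean fill (L)
s = 0.0711      # sample standard deviation (L)
mu0 = 2.04      # value of the mean under H0

# z-statistic: how many standard errors the sample mean is from mu0
z = (xbar - mu0) / (s / sqrt(n))

# Two-sided P-value: probability of |Z| at least this large under N(0, 1)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}")
print(f"two-sided P-value = {p_value:.4f}")
```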
So we have two possibilities:
1) H0 is correct, $\mu$ = 2.04, and we got very unlucky to happen to get the (roughly) 1 in 800 chance to get 2.0727 (or the equally unusual 2.0073), or
2) H0 is wrong.
Which seems more reasonable to believe?
Since P is so small, we reject H0 and decide the filling machine does require recalibration.
All hypothesis tests follow this general pattern:
1) We observe some difference in a sample and wish to decide if it reflects a true difference in the population.
2) Identify the null and alternative hypotheses.
3) Compute a test statistic which has a known distribution when the null hypothesis is true.
4) Find a P-value: the probability of a statistic as or more unusual than the one we observed, when the null hypothesis is true.
5) If P is small, reject the null hypothesis. Otherwise, fail to reject it.
This basic pattern holds for many different tests on different parameters with different assumptions.
For questions about the population mean for a single population, we often use the one-sample z-test demonstrated above.
Details on the one-sample z-test:
1) We have a single population, and a specific value, $\mu_0$, we wish to consider for the population mean.
This may be a known population mean for some related population (see next example).
Or it may be a desired population mean (example: milk data).
A sample from the population will give a sample mean different from $\mu_0$, even if that is the actual population mean.
2) Identify H0 and H1.
H1 is a statement that something interesting is going on. It is usually what we wish to prove.
We should decide if we care about a one-sided or two-sided alternative, ideally before we ever see data.
Two-sided: H0: $\mu = \mu_0$ vs. H1: $\mu \ne \mu_0$.
One-sided: H0: $\mu \le \mu_0$ vs. H1: $\mu > \mu_0$
or: H0: $\mu \ge \mu_0$ vs. H1: $\mu < \mu_0$
We always compute z and P using $\mu_0$, so $\mu = \mu_0$ is always part of H0.
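The three forms of the alternative differ only in which tail (or tails) of the standard normal distribution supplies the P-value. The function below is a hypothetical helper written for this illustration, not something defined in the course; the `alternative` labels "two-sided", "greater", and "less" are assumed names.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, s, n, mu0, alternative="two-sided"):
    """Large-sample z-test for a population mean (illustrative sketch).

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
    or "less" (H1: mu < mu0). Returns the z-statistic and the P-value.
    """
    z = (xbar - mu0) / (s / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)             # upper tail only
    elif alternative == "less":
        p_value = std_normal.cdf(z)                 # lower tail only
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value

# Usage with the milk data (n = 50, sample mean 2.0727, s = 0.0711, mu0 = 2.04)
for alt in ("two-sided", "greater", "less"):
    z, p = one_sample_z_test(2.0727, 0.0711, 50, 2.04, alternative=alt)
    print(f"{alt:>9}: z = {z:.2f}, P = {p:.4f}")
```

In every case z is computed with $\mu_0$, which is why the boundary value always belongs to H0. Choosing the alternative after looking at the data would make the reported P-value misleading.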
Example: A newspaper article says that college freshmen average 7.5 hours per week at parties.
We suspect the number is lower at our college.
H0 = ?
H1 = ?
Math