Engineering Probability and Statistics

Ain Shams University, Faculty of Engineering, Postgraduate Degree

Contact: https://www.facebook.com/ahmedawad1991 | eng.ahmed.1991@hotmail.com

1. Statistics, Data, and Statistical Thinking

1.1 The Science of Statistics

Definition 1.1

Statistics is the science of data. This involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.

1.2 Types of Statistical Applications

Definition 1.2

Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a convenient form.

Definition 1.3

Inferential statistics utilizes sample data to make estimates, decisions, predictions or other generalizations about a larger set of data.

1.3 Fundamental Elements of Statistics

Definition 1.4

A population is a set of units (usually people, objects, transactions, or events) that we are interested in studying.

Definition 1.5

A variable is a characteristic or property of an individual population unit.

For example, we may be interested in the variables age, gender, and/or the number of years of education of the people currently unemployed in the United States.

The name “variable” is derived from the fact that any particular characteristic may vary among the units in a population.

Definition 1.6

A sample is a subset of the units of a population.

Definition 1.7

A statistical inference is an estimate, prediction, or some other generalization about a population based on information contained in a sample.

The following examples illustrate how to identify the population, the variable, the sample, and the inference.

We also need to know its reliability – that is, how good the inference is.

Thus, we introduce an element of uncertainty into our inferences.

Reliability is the fifth element of inferential statistical problems.

Definition 1.8

A measure of reliability is a statement (usually quantified) about the degree of uncertainty associated with a statistical inference.

Four Elements of Descriptive Statistical Problems

1. The population or sample of interest.

2. One or more variables that are to be investigated.

3. Tables, graphs, or numerical summary tools.

4. Identification of patterns in the data.

Five Elements of Inferential Statistical Problems

1. The population of interest.
2. One or more variables that are to be investigated.
3. The sample of population units.
4. The inference about the population based on information contained in the sample.
5. A measure of reliability for the inference.

1.4 Types of Data

Definition 1.9

Quantitative data are measurements that are recorded on a naturally occurring numerical scale.

Examples of quantitative data:

the temperature, or the current unemployment rate for each of the 50 states, or the scores of a sample of 150 law school applicants on the LSAT, or the number of convicted murderers who receive the death penalty each year over a 10-year period.

Definition 1.10

Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.

Examples of qualitative data:

The political party affiliation (Democratic, Republican, or Independent) in a sample of 50 voters.

A taste-tester's ranking (best, worst, etc.) of four brands of barbecue sauce for a panel of 10 testers.

1.5 Collecting Data

There are four ways to obtain data:

1. Data from a published source (book, journal, newspaper).
2. Data from a designed experiment.
3. Data from a survey.
4. Data from an observational study.

Definition 1.11

A representative sample exhibits characteristics typical of those possessed by the target population.

A random sample ensures that every subset of fixed size in the population has the same chance of being included in the sample.

1.6 The Role of Statistics in Critical Thinking

Definition 1.12

Statistical thinking involves applying rational thought to assess data and the inferences made from them critically.

2. Methods for Describing Sets of Data

2.1 Describing Qualitative Data

Definition 2.1

A class is one of the categories into which qualitative data can be classified.

Definition 2.2

A class frequency is the number of observations in the data set falling in a particular class.

Definition 2.3

The class relative frequency is the class frequency divided by the total number of observations in the data set, i.e.,

Class relative frequency = class frequency / n

2.2 Graphical Methods for Describing Quantitative Data

Dot plot

For example, here is a typical dotplot.

110 | **
111 | ***
112 | **
113 | *****
114 | ******
115 | ***
116 | **
117 | *

Stemplot

A set of data like the number of home runs that Barry Bonds hit can be represented by a list: 16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 36. It is very difficult for me or just about anybody else to learn much about this data set from looking at a list of numbers like this, but a stemplot can provide a lot of insight. We use the tens digit as the stem and the ones digit as the leaf to produce the display.

Excel output

Stem-and-Leaf Display  Variable: Barry Bonds  (stem: tens digit, leaf: ones digit)

1 | 6 9
2 | 4 5 5
3 | 3 3 4 4 6 7 7
4 | 0 2 5 6 6 9
5 |
6 |
7 | 3

*Note the huge gap between the 40s and the 73!

Histograms

Sometimes we have too much data to do a stem plot easily. Then a histogram is a more efficient choice. Here is the algorithm for doing such a plot.

1. Divide the data into classes of equal width.
2. Count the number of observations in each class.
3. Draw the histogram. Put the variable values (classes) on the horizontal axis and the frequencies or relative frequencies (= frequency / total) on the vertical axis. Leave no space between the bars. The relative frequencies sum to 1, or 100%.

From the Barry Bonds home run data, we divide the data into eight classes the following way:

Class    # of HR
1-10     0
11-20    2
21-30    3
31-40    8
41-50    5
51-60    0
61-70    0
71-80    1

Excel output: Figure. Histogram of the Barry Bonds home run data (class on the horizontal axis, frequency on the vertical axis).

2.3 Summation Notation

The notation Σ_{i=1}^{n} xi means "add up all these numbers": x1 + x2 + ⋯ + xn.

2.4 Numerical Measures of Central Tendency

Measuring center: Mean, Median and Mode

Definition 2.4

The mean of a set of quantitative data is the sum of the measurements divided by the number of measurements contained in the data set.

One measure of center is the mean or average. The mean is defined as follows: suppose we have a list of numbers denoted x1, x2, …, xn. That is, there are n numbers in our list. The mean or average, x̄ ("x-bar"), of our data is defined by adding up all the numbers and dividing by the number of numbers. In symbols this is

x̄ = (x1 + x2 + ⋯ + xn) / n = (1/n) Σ_{i=1}^{n} xi.

Symbols for the Sample Mean and the Population Mean

The symbols for the mean are

x̄ = sample mean
μ = population mean

Definition 2.5

The median M of a quantitative data set is the middle number when the measurements are arranged in ascending (or descending) order.

How to find the median.

1. Order the observations from smallest to largest.
2. If n is odd, the median is the value of the center observation. Its location is at position (n+1)/2 in the ordered list.
3. If n is even, the median is defined to be the average of the two center observations in the ordered list.
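A minimal Python sketch of these two measures of center, applied to the Barry Bonds home run data from Section 2.2 (the helper names are ours, purely for illustration):

def mean(data):
    return sum(data) / len(data)              # x-bar = (sum of the xi) / n

def median(data):
    s = sorted(data)                          # step 1: order the observations
    n = len(s)
    if n % 2 == 1:                            # step 2: n odd -> center value,
        return s[n // 2]                      #         position (n+1)/2
    return (s[n // 2 - 1] + s[n // 2]) / 2    # step 3: n even -> average the two centers

data = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42, 40, 37, 34, 49, 73, 46, 45, 36]
print(mean(data))    # about 36.5
print(median(data))  # 36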

Comparing the Mean and the Median

Right-skewed curve: the mean lies to the right of (is larger than) the median.
Normal (bell-shaped) curve: the mean and the median coincide.
Left-skewed curve: the mean lies to the left of (is smaller than) the median.

Definition 2.8

The mode is the measurement that occurs most frequently in the data set.

2.5 Numerical Measures of Variability

Measuring spread: Range, Sample Variance, and Sample Standard Deviation

Definition 2.9

The range of a quantitative data set is equal to the largest measurement minus the smallest measurement.

Definition 2.10

The sample variance for a sample of n measurements is equal to the sum of the squared distances from the mean divided by (n−1). In symbols, using s² to represent the sample variance,

s² = Σ_{i=1}^{n} (xi − x̄)² / (n−1)

Note: A shortcut formula for calculating s² is

s² = [ Σ_{i=1}^{n} xi² − (Σ_{i=1}^{n} xi)² / n ] / (n−1).
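A quick Python check, on made-up numbers, that the definitional and shortcut formulas give the same s²:

def var_definition(data):
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def var_shortcut(data):
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

data = [5, 7, 1, 2, 4]        # hypothetical measurements
print(var_definition(data))   # 5.7
print(var_shortcut(data))     # 5.7 -- identical, as the algebra promises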

Definition 2.11

The sample standard deviation, s, is defined as the positive square root of the sample variance, s². Thus,

s = √(s²).

Symbols for Variance and Standard Deviation

s² = sample variance
s = sample standard deviation
σ² = population variance
σ = population standard deviation

The value of s is large when the observations are widely spread about the mean, and small when the data are closely clustered about the mean. The value of s ranges from zero to infinity; s = 0 would mean all the values in the data set had the same value, and thus no spread at all.

2.6 Interpreting the standard deviation

The 68-95-99.7 Rule

In any Normal Curve:

Sixty-eight percent of all observations fall within s units on either side of the mean x̄, i.e., in the interval (x̄ − s, x̄ + s).

95% of all observations fall within 2 standard deviations of the mean, i.e., in (x̄ − 2s, x̄ + 2s).

99.7% of all observations fall within 3 standard deviations of the mean, i.e., in (x̄ − 3s, x̄ + 3s).

Chebyshev’s Rule

Chebyshev’s Rule applies to any data set, regardless of the shape of the frequency distribution of the data.

No useful information is provided on the fraction of measurements that fall within 1 standard deviation of the mean, i.e., within the interval (x̄ − s, x̄ + s) for samples and (μ − σ, μ + σ) for populations.

At least 3/4 of the measurements will fall within the interval (x̄ − 2s, x̄ + 2s) for samples and (μ − 2σ, μ + 2σ) for populations.

At least 8/9 of the measurements will fall within the interval (x̄ − 3s, x̄ + 3s) for samples and (μ − 3σ, μ + 3σ) for populations.

2.7 Numerical Measures of Relative Standing

Definition 2.12

For any set of n measurements (arranged in ascending or descending order), the p-th percentile is a number such that p% of the measurements fall below the p-th percentile and (100−p)% fall above it.

Standard Normal Distribution

A special normal curve to study is the standard normal, with μ = 0 and σ = 1. This is special because every normal problem can be converted to a problem about a standard normal. The conversion from a normally distributed variable X with mean μ and standard deviation σ is carried out by the z-score transform given by

z = (x − μ) / σ.

Definition 2.13

The sample z-score for a measurement x is

z = (x − x̄) / s

The population z-score for a measurement x is

z = (x − μ) / σ

Interpretation of z-Scores for Mound-Shaped Distribution of Data

Approximately 68% of the measurements will have a z-score between −1 and 1.
Approximately 95% of the measurements will have a z-score between −2 and 2.
Approximately 99.7% of the measurements will have a z-score between −3 and 3.
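A tiny Python helper covering both z-score versions (the numbers are assumed, purely for illustration):

def z_score(x, center, spread):
    # sample version:     center = x-bar, spread = s
    # population version: center = mu,    spread = sigma
    return (x - center) / spread

print(z_score(120, 100, 15))   # 1.33..., i.e. about 1.3 SDs above the mean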

2.8 Methods for Detecting Outliers

Sometimes it is important to identify inconsistent or unusual measurements in a data set. An observation that is unusually large or small relative to the data values we want to describe is called an outlier.

Definition 2.14

An observation (or measurement) that is unusually large or small relative to the other values in a data set is called an outlier. Outliers typically are attributable to one of the following causes:

1. The measurement associated with the outlier may be invalid.

2. The measurement comes from a different population.

3. The measurement is correct, but represents a rare (chance) event.

Measures Based on the Quartiles

We can now define some special percentiles:

The first quartile, Q1, is the 25th percentile: 25 percent of the observations in a list are smaller than Q1.

The second quartile, Q2, is the 50th percentile, or the median. About half the data are less than this value.

The third quartile, Q3, is the 75th percentile: about 75 percent of the observations are below this value.

Notice that these three quartiles cut the data set into four parts, hence the name quartiles: 1) the part between the minimum and Q1 (25%), 2) the part between Q1 and Q2 (25%), 3) the part between Q2 and Q3 (25%), and 4) the part between Q3 and the maximum (25%).

How to find the quartiles.

1.Arrange the observations in increasing order and locate the median M in the ordered list of observations.

2.The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median.

3.The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.

Boxplot

A boxplot is a graph of the five-number summary.

A central box spans the quartiles Q1 and Q3.

A line in the box marks the median M. Lines extend from the box out to the smallest and largest observations.

A measure of spread based on these quartiles is the Interquartile range IQR =Q3 - Q1, the distance between the quartiles. The IQR gives the spread in data values covered by the middle half of the data.

The quartiles in IQR give a good measure of spread because they are not sensitive to a few extreme observations in the tails. Thus, when a dataset has outliers or skewness the IQR is an appropriate summary measure.

A common rule of thumb for detecting outliers uses 1.5 times the IQR: the interval from Q1 − 1.5·IQR to Q3 + 1.5·IQR should contain most of the data. Values in the dataset that are either bigger than Q3 + 1.5·IQR or less than Q1 − 1.5·IQR are often flagged for further consideration as potential outliers.
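A short Python sketch of this fence rule; statistics.quantiles is in the standard library (quartile conventions differ slightly between methods), and the data list is hypothetical:

import statistics

data = [3, 4, 5, 5, 6, 7, 8, 9, 40]              # one suspiciously large value
q1, q2, q3 = statistics.quantiles(data, n=4)     # Q1, Q2 (median), Q3
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr       # the outlier fences
print([x for x in data if x < low or x > high])  # [40] falls above the upper fence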

3. Probability

3.1 Events, Sample Spaces, and Probability

Definition 3.1

An experiment is an act or process of observation that leads to a single outcome that cannot be predicted with certainty.

Definition 3.2

A sample point is the most basic outcome of an experiment.

Definition 3.3

The sample space S of an experiment is the collection of all its sample points.

Definition 3.4

An event is a subset of the sample space.

Here are probability rules for sample points:

1. All sample point probabilities must lie between zero and one.

2. The probabilities of all the sample points within a sample space must sum to 1.

Example Toss a coin. There are two possible sample points, and the sample space is

S = {heads, tails} or more briefly, S = {H,T}.

Example Toss a coin four times and record the results. That’s a bit vague. To be exact, record the results of each of the four tosses in order. The sample space S is the set of all 16 strings of four H’s and T’s:

S = { HHHH, HHHT, HHTH, HHTT,

HTHH, HTHT, HTTH, HTTT,

THHH, THHT, THTH, THTT,

TTHH, TTHT, TTTH, TTTT }

Suppose that our only interest is the number of heads in four tosses. The sample space contains only five outcomes:

S = { 0, 1, 2, 3, 4}.

Probability of an Event

The probability of an event A is calculated by summing the probabilities of the sample points that make up A.

Example

Take the sample space S for four tosses of a coin to be the 16 possible outcomes in the form HTHH. Then “exactly 2 heads” is an event. Call this event A. The event A expressed as a subset of outcomes is

A={HHTT, HTHT, HTTH, THHT, THTH, TTHH}

P(A)=P({HHTT, HTHT, HTTH, THHT, THTH, TTHH})

=P({HHTT})+P({HTHT})+P({HTTH})+P({THHT})+P({THTH})+P({TTHH})= 6/16=3/8
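The same count can be checked in Python by enumerating the sample space (itertools is in the standard library; this sketch simply mirrors the example above):

from itertools import product

space = ["".join(s) for s in product("HT", repeat=4)]  # all 16 equally likely strings
event_a = [s for s in space if s.count("H") == 2]      # "exactly 2 heads"
print(event_a)            # ['HHTT', 'HTHT', 'HTTH', 'THHT', 'THTH', 'TTHH']
print(len(event_a) / 16)  # 0.375 = 6/16 = 3/8, each point having probability 1/16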

3.2 Unions and Intersections

Definition 3.5

The union of two events A and B is the event that occurs if either A or B or both occur on a single performance of the experiment. We denote the union of events A and B by the symbol A∪B. A∪B consists of all the sample points that belong to A or B or both.

Figure. Venn diagram showing disjoint events A and B.

Definition 3.6

The intersection of two events A and B is the event that occurs if both A and B occur on a single performance of the experiment. We denote the intersection of events A and B by the symbol A∩B. A∩B consists of all the sample points that belong to both A and B.

3.3 Complementary Events

Definition 3.7

The complement of an event A is the event that A does not occur – that is, the event consisting of all sample points that are not in event A. We denote the complement of A by Aᶜ.

Here are some rules about probabilities:

The probability of an event happening is simply one minus the probability of the event not happening. That is, P(A) = 1 − P(Aᶜ), where Aᶜ denotes the complement of A.

If the events have no outcomes in common the probability of either of them happening is the sum of their probabilities. In notation, P(A or B) = P(A) + P(B).

For example, suppose in a certain little town the distribution of the number of children in households with children is:

Outcome      1    2    3    4    5    6 or more
Probability  .15  .55  .10  .10  .05  .05

The probability of two or fewer children is P(1 or 2)=P(1)+P(2)=.15+.55=.7.

Let's denote A = {1, 2}. Then P(A) = .7. How do you find P(Aᶜ)?

P(Aᶜ) = 1 − P(A) = 1 − .7 = .3.

3.4 The Additive Rule and Mutually Exclusive Events

Additive Rule of Probability

The probability of the union of events A and B is the sum of the probability of events A and B minus the probability of the intersection of events A and B, that is

P(A∪B) = P(A) + P(B) − P(A∩B)

Definition 3.8

Events A and B are mutually exclusive if A∩B contains no sample points, that is, if A and B have no sample points in common.

Probability of Union of Two Mutually Exclusive Events

If two events A and B are mutually exclusive, the probability of the union of A and B equals the sum of the probabilities of A and B; that is,

P(A∪B) = P(A) + P(B)

3.5 Conditional Probability

The new notation P( A|B) is a conditional probability. That is, it gives the probability of one event under the condition that we know another event. You can read the bar | as “given the information that.”

Formula for P( A|B )

To find the conditional probability that event A occurs given that event B occurs, divide the probability that both A and B occur by the probability that B occurs, that is,

P(A|B) = P(A∩B) / P(B)

We assume that P(B) ≠ 0.

Example Let’s define two events:

A = the woman chosen is young, ages 18 to 29

B = the woman chosen is married

The probability of choosing a young woman is

P( A )=22 ,512103 , 870

=0 .217 .

The probability that we choose a woman who is both young and married is

P( A and B)= 7 , 842103 , 870

=0 . 075 .

The conditional probability that a woman is married when we know she is under age 30 is

P( B|A )=P( A and B)

P( A )= 7 , 842

22 , 512=0. 348

.

3.6 The Multiplicative Rule and Independent Events

Multiplication Rule of Probability

The probability that both of two events A and B happen together can be found by

P(A∩B) = P(A) × P(B|A).

Example Slim is still at the poker table. At the moment, he wants very much to draw two diamonds in a row. As he sits at the table looking at his hand and at the upturned cards on the table, Slim sees 11 cards. Of these, 4 are diamonds. The full deck contains 13 diamonds among its 52 cards, so 9 of the 41 unseen cards are diamonds. To find Slim’s probability of drawing two diamonds, first calculate

P(first card diamond) = 9/41

P(second card diamond | first card diamond) = 8/40

The multiplication rule P(A∩B) = P(A) × P(B|A) now says that

P(both cards diamonds) = (9/41) × (8/40) = 0.044.

Slim will need luck to draw his diamonds.
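A Monte Carlo sketch of Slim's draw in Python; the 9-diamonds-among-41-unseen-cards setup comes from the example, while the simulation itself is our illustrative addition:

import random

unseen = ["D"] * 9 + ["X"] * 32       # 9 diamonds among the 41 unseen cards
trials = 100_000
hits = sum(random.sample(unseen, 2) == ["D", "D"] for _ in range(trials))
print(hits / trials)                  # hovers near (9/41)*(8/40) = 0.0439...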

Probability Trees

Many probability and decision making problems can be conceptualized as happening in stages, and probability trees are a great way to express such a process or problem.

Example There are two disjoint paths to B (professional play). By the addition rule, P(B) is the sum of their probabilities. The probability of reaching B through college (top half of the tree) is

P(B and A) = P(A) × P(B|A) = 0.05 × 0.017 = 0.00085.

The probability of reaching B without college (bottom half of the tree) is

P(B and Aᶜ) = P(Aᶜ) × P(B|Aᶜ) = 0.95 × 0.0001 = 0.000095.

By the addition rule, P(B) = 0.00085 + 0.000095 = 0.000945. About 9 high school athletes out of 10,000 will play professional sports.

Independent Events

Two events A and B that both have positive probability are independent if

P(A|B) = P(A).

When events A and B are independent, it is also true that

P(B|A) = P(B).

Events that are not independent are said to be dependent.

Probability of Intersection of Two Independent Events

If events A and B are independent, the probability of the intersection of A and B equals the product of the probabilities of A and B; that is

P(A∩B) = P(A) × P(B).

The converse is also true: if P(A∩B) = P(A) × P(B), then events A and B are independent.

3.7 Random Sampling

Definition 3.10

If n elements are selected from a population in such a way that every set of n elements in the population has an equal probability of being selected, the n elements are said to be a random sample.

A method of determining the number of samples is to use combinatorial mathematics. The combinatorial symbol for the number of different ways of selecting n elements from N elements is C(N, n), which is read "the number of combinations of N elements taken n at a time." The formula for calculating the number is

C(N, n) = N! / (n!(N−n)!)

where "!" is the factorial symbol and is shorthand for the following multiplication:

n! = n(n−1)(n−2)⋯(3)(2)(1)

Thus, for example, 5! = 5·4·3·2·1 = 120. (The quantity 0! is defined to be 1.)
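Python's standard library provides both pieces used above (math.factorial and math.comb); a quick illustrative check:

import math

print(math.factorial(5))   # 120, i.e. 5!
print(math.comb(5, 2))     # 10 combinations of 5 elements taken 2 at a time
print(math.factorial(5) // (math.factorial(2) * math.factorial(3)))   # 10 again, by the formula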

4. Discrete Random Variables

Definition 4.1

A random variable is a variable that assumes numerical values associated with the random outcomes of an experiment, where one (and only one) numerical value is assigned to each sample point.

For example, define the random variable X as the number of heads in 2 tosses of a fair (50-50) coin. The sample space is S = {HT, HH, TH, TT}; the corresponding outcomes in this sample space get associated with values of the random variable X as {1, 2, 1, 0}, because the outcomes have 1, 2, 1, and 0 heads respectively.

4.1 Two Types of Random Variables

Random variables come in two types: discrete random variables and continuous random variables.

Discrete Random variable

A discrete random variable X can assume only a countable (finite or countably infinite) number of possible values.

The following are examples of discrete random variables:

1. The number of seizures an epileptic patient has in a given week: x = 0, 1, 2, …

2. The number of voters in a sample of 500 who favor impeachment of the president: x = 0, 1, 2, …, 500

3. The number of students applying to medical schools this year: x = 0, 1, 2, …

4. The number of errors on a page of an accountant's ledger: x = 0, 1, 2, …

5. The number of customers waiting to be served in a restaurant at a particular time: x = 0, 1, 2, …

Continuous Random variable

Random variables that can assume values corresponding to any of the points contained in one or more intervals are called continuous.

Suppose that we want to choose a number at random between 0 and 1, allowing any number between 0 and 1 as the outcome. Software random number generators will do this. You can visualize such a random number by thinking of a spinner (Figure). The sample space is now an entire interval of numbers:

S = {all numbers x such that 0 ≤ x ≤ 1}.

Figure. A spinner that generates a random number between 0 and 1.

4.2 Probability Distributions for Discrete Random Variables

The probability distribution of X lists the values and their probabilities:

Value of X    Probability
x1            p1
x2            p2
x3            p3
⋮             ⋮
xk            pk

The probabilities pi must satisfy two requirements:

1. Every probability pi is a number between 0 and 1.
2. p1 + p2 + ⋯ + pk = 1.

We usually summarize all the information about a random variable with a probability table like:

X      0    1    2
P(x)   1/4  1/2  1/4

This is the probability table representing the random variable X defined above for the 2-toss coin tossing experiment. There is one outcome with zero heads, 2 with one head, and one with 2 heads. All outcomes are equally likely, and this means the probabilities are defined as the number of outcomes in the event divided by the total number of outcomes.

Definition 4.4

The probability distribution of a discrete random variable is a graph, table, or formula that specifies the probability associated with each possible value the random variable can assume.

4.3 Expected values of Discrete Random Variables

Definition 4.5

The mean, or expected value, of a discrete random variable x is

μ = E(x) = x1·p1 + x2·p2 + ⋯ + xk·pk = Σ_{i=1}^{k} xi·pi.

Suppose that X is a discrete random variable whose distribution is

Value of X    Probability
x1            p1
x2            p2
x3            p3
⋮             ⋮
xk            pk

To find the mean of X, multiply each possible value by its probability, then add all the products:

μ = E(x) = x1·p1 + x2·p2 + ⋯ + xk·pk = Σ_{i=1}^{k} xi·pi.

This means that the average or expected value, μ, of the random variable X is equal to the sum of all possible values xi of the variable, each multiplied by the probability of that value happening.

In our 2 tosses of a coin example, we can compute the average number of heads in 2 tosses by 0(1/4)+1(1/2)+2(1/4)=1. That is, the average number or expected number of heads in 2 tosses is one head.

A more helpful way to implement this formula is to create the random variable table again, but now add an additional column, X·P(X), in which each value of X is multiplied by its probability. For example,

X    P(X)   X·P(X)
0    1/4    0
1    1/2    1/2
2    1/4    1/2

The average or expected value of X is then found by adding up all the values in the third column, to obtain μ = 1.

As another example, suppose we toss a coin 3 times and let X be the number of heads in 3 tosses. The table is:

X    P(X)   X·P(X)
0    1/8    0
1    3/8    3/8
2    3/8    6/8
3    1/8    3/8

This gives μ = 12/8 = 1.5, so the expected number of heads in three tosses is one and a half heads.

Since a probability distribution can be viewed as a representation of a population, we will use the population variance to measure its variability.

Definition 4.6

The variance of a random variable x is

σ² = E[(x − μ)²] = Σ (x − μ)²·p(x)

Definition 4.7

The standard deviation of a discrete random variable x is equal to the square root of the variance, i.e.,

σ = √(σ²)
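A short Python sketch computing μ, σ², and σ for the 3-toss table above:

values = [0, 1, 2, 3]
probs = [1/8, 3/8, 3/8, 1/8]

mu = sum(x * p for x, p in zip(values, probs))               # E(x) = 1.5
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # sigma^2 = 0.75
print(mu, var, var ** 0.5)                                   # 1.5 0.75 0.866...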

4.4 The Binomial Random Variable

Characteristics of a Binomial Random Variable

1. The experiment consists of n identical trials.
2. There are only two possible outcomes on each trial. We will denote one outcome by S (for Success) and the other by F (for Failure).
3. The probability of S remains the same from trial to trial. This probability is denoted by p, and the probability of F is denoted by q. Note that q = 1 − p.
4. The trials are independent.
5. The binomial random variable x is the number of S's in n trials.

The Binomial distributions for sample counts

Think of tossing a coin n times as an example of the binomial setting. Each toss gives either heads or tails. The outcomes of successive tosses are independent. If we call heads a success, then p is the probability of obtaining a head. The number of heads we count is a random variable X. The distribution of X is determined by the number of observations n and the success probability p.

Binomial Distribution

The distribution of the count X of successes is called the binomial distribution with parameters n and p. The parameter n is the number of observations, and p is the probability of a success on any one observation. The possible values of X are the whole numbers from 0 to n. As an abbreviation, we say that X is B(n,p).

Example 5.2 (a) Toss a balanced coin 10 times and count the number X of heads. There are n = 10 tosses. Successive tosses are independent. If the coin is balanced, the probability of a head is p = 0.5 on each toss. The number of heads we observe has the binomial distribution B(10, 0.5).

In general, we can use combinatorial mathematics to count the number of sample points. For example,

Number of sample points for which x = 3
= number of different ways of selecting 3 successes in the 4 trials
= C(4, 3) = 4! / (3!(4−3)!) = (4·3·2·1) / ((3·2·1)·1) = 4

The formula that works for any value of x can be deduced as follows: suppose p = 0.1 and q = 0.9. Then

P(x = 3) = C(4, 3)(.1)³(.9)¹, and in general P(x) = C(4, x)(.1)^x (.9)^(4−x).

The component C(4, x) counts the number of sample points with x successes, and the component (.1)^x (.9)^(4−x) is the probability associated with each sample point having x successes.

The Binomial Probability Distribution

P(x) = C(n, x) p^x q^(n−x),  (x = 0, 1, 2, …, n)

where

p = probability of a success on a single trial
q = 1 − p
n = number of trials
x = number of successes in n trials
C(n, x) = n! / (x!(n−x)!)

As noted in Chapter 3, 5! = 5·4·3·2·1 = 120. Similarly, n! = n·(n−1)·(n−2)⋯3·2·1.

Binomial Mean and Standard Deviation

If a count X has the binomial distribution B(n,p), then

Mean: μ = n·p
Variance: σ² = n·p·q
Standard deviation: σ = √(n·p·q)
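A minimal Python version of this pmf and the mean/standard deviation formulas (standard library; the n and p values are illustrative):

from math import comb, sqrt

def binom_pmf(x, n, p):
    # P(x) = C(n, x) * p^x * q^(n - x), with q = 1 - p
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
print(binom_pmf(5, n, p))              # P(X=5) for B(10, 0.5) = 0.2460...
print(n * p, sqrt(n * p * (1 - p)))    # mean 5.0, standard deviation ~1.58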

Example The Helsinki study planned to give gemfibrozil to about 2000 men aged 40 to 55 and a placebo to another 2000. The probability of a heart attack during the five-year period of the study for men this age is about 0.04. What are the mean and standard deviation of the number of heart attacks that will be observed in one group if the treatment does not change this probability? (Solution) There are 2000 independent observations, each having probability p = 0.04 of a heart attack. The count X of heart attacks is B(2000, 0.04), so that

μ = n·p = 2000 × 0.04 = 80

σ = √(n·p·(1−p)) = √(2000 × 0.04 × 0.96) = 8.76

Finding binomial probabilities: Tables

We can find binomial probabilities for some values of n and p by looking up probabilities in Table II (see page 885) in the back of the book. The entries in the table are the probabilities P(X=k) of individual outcomes for a binomial random variable X.

Example A quality engineer selects an SRS of 10 switches from a large shipment for detailed inspection. Unknown to the engineer, 10% of the switches in the shipment fail to meet the specifications. What is the probability that no more than 1 of the 10 switches in the sample fails inspection?

(Solution). Let X = the count of bad switches in the sample.

The probability that the switches in the shipment fail to meet the specification is p = 0.1 and sample size is n=10. Thus, X is B(n=10, p=0.1).

We want to calculate

P(X ≤ 1) = P(X = 0) + P(X = 1).

Look at Table II (page 885) for this calculation: look opposite n = 10 and under p = 0.10. The entry opposite each k is P(X = k). We find

P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.736.

About 74% of all samples will contain no more than 1 bad switch.
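The same number can be cross-checked without the table by summing the binomial formula directly (a Python sketch for B(10, 0.1)):

from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

prob = sum(binom_pmf(k, 10, 0.1) for k in range(2))   # P(X=0) + P(X=1)
print(round(prob, 3))                                  # 0.736, matching Table II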

Figure Probability histogram for the binomial distribution with n=10 and p=0.1, for Example.

Example Corinne is a basketball player who makes 80% of her free throws over the course of a season. In a key game, Corinne shoots 15 free throws and misses 5 of them. The fans think that she failed because she was nervous. Is it unusual for Corinne to perform this poorly?

(Solution). Because the probability of making a free throw is greater than 0.5, we count misses in order to use Table II.

Let X = the number of misses in 15 attempts.

The probability of a miss is p=1-0.80=0.20. Thus, X is B(n=15, p=0.20).

We want the probability of missing 5 or more. This is

P(X ≥ 5) = P(X = 5) + ⋯ + P(X = 15).

Look at Table II (page 885) for this calculation: look opposite n = 15 and under p = 0.20. The entry opposite each k is P(X = k). We find

P(X ≥ 5) = P(X = 5) + ⋯ + P(X = 15) = 1 − P(X ≤ 4) = 1 − 0.838 = 0.162.

Corinne will miss 5 or more out of 15 free throws about 16% of the time, or roughly one of every six games. While below her average level, this performance is well within the range of the usual chance variation in her shooting.

4.5 The Poisson Random Variable

A type of probability distribution that is often useful in describing the number of events that will occur in a specific period of time or in a specific area or volume is the Poisson distribution (named after the 19th-century French mathematician and physicist Siméon Poisson).

Characteristics of a Poisson Random Variable

1. The experiment consists of counting the number of times a certain event occurs during a given unit of time or in a given area or volume.
2. The probability that an event occurs in a given unit of time, area, or volume is the same for all the units.
3. The number of events that occur in one unit of time, area, or volume is independent of the number that occur in other units.
4. The mean (or expected) number of events in each unit is denoted by the Greek letter lambda, λ.

Probability Distribution, Mean, and Variance for a Poisson Random variable

P(x) = λ^x e^(−λ) / x!,  (x = 0, 1, 2, …)

μ = λ,  σ² = λ

where

λ = mean number of events during the given unit of time, area, volume, etc.
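A Python sketch of this formula (standard library; the λ value is assumed for illustration):

from math import exp, factorial

def poisson_pmf(x, lam):
    # P(x) = lam^x * e^(-lam) / x!
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0                                            # e.g. 2 events per unit on average
print(poisson_pmf(0, lam))                           # e^(-2) = 0.1353...
print(sum(poisson_pmf(k, lam) for k in range(50)))   # the probabilities sum to ~1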

5. Continuous Random Variables

5.1 Continuous Probability Distribution

Continuous Random variable

A continuous random variable takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event.

Figure The probability distribution of a continuous random variable assigns probabilities as area under a density curve.

The probability associated with a particular value of x is equal to 0; that is, P(x=a)=0 and hence

P(a < x < b) = P(a ≤ x ≤ b).

5.2 The Uniform Distribution

Continuous random variables that appear to have equally likely outcomes over their range of possible values possess a uniform probability distribution, perhaps the simplest of all continuous probability distributions.

Figure Assigning probabilities for generating a random number between 0 and 1. The probability of any interval of numbers is the area above the interval and under the curve.

Suppose the random variable x can assume values only in an interval c ≤ x ≤ d. The height of f(x) is constant in that interval and equals 1/(d−c). Therefore, the total area under f(x) is given by

Total area of rectangle = (Base)(Height) = (d−c) × (1/(d−c)) = 1.

Probability Distribution, Mean, and Standard Deviation of a Uniform Random Variable x

f(x) = 1/(d − c)  (c ≤ x ≤ d)

μ = (c + d)/2

σ = (d − c)/√12

5.3 The Normal Distribution

Probability Distribution for a Normal Random Variable x

f(x) = (1/(σ√(2π))) e^(−(1/2)[(x−μ)/σ]²)

where

μ= Mean of the normal random variable x

σ= Standard deviation

π= 3.1416...

e= 2.71828...

Definition 5.1

The standard normal distribution is a normal distribution with μ = 0 and σ = 1. A random variable with a standard normal distribution, denoted by the symbol z, is called a standard normal random variable.

Normal distributions as probability distributions

In the language of random variables, if x has the N(μ, σ) distribution, then the standardized variable

z = (x − μ)/σ

is a standard normal random variable having the distribution N(0, 1).

Here are the steps for finding a probability corresponding to a normal random variable.

1. Draw a normal curve first; do your best.
2. Next, label the center or mean of the curve with zero, because standard normal curves have a mean of zero.
3. Put in scaling by finding the distance from the center to the inflection point. This distance above μ is one unit; put in a 1, that is, one standard deviation above μ.
4. Use Table IV in Appendix A to find the areas corresponding to the z-values.
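In Python, the table lookup of step 4 can be mimicked with the standard library's statistics.NormalDist (Python 3.8+), shown here as an illustrative alternative to Table IV:

from statistics import NormalDist

z = NormalDist(0, 1)               # the standard normal, mu = 0 and sigma = 1
print(z.cdf(1.0))                  # P(Z < 1) = 0.8413...
print(z.cdf(2.0) - z.cdf(-2.0))    # P(-2 < Z < 2) = 0.9545, the "95%" of 68-95-99.7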

5.4 Descriptive Methods for Assessing Normality

Determining Whether the Data Are From an Approximately Normal Distribution

1. Construct either a histogram or stem-and-leaf display for the data and note the shape of the graph. If the data are approximately normal, the shape of the histogram or stem-and-leaf display will be similar to the normal curve.

2. Compute the intervals (x̄ − s, x̄ + s), (x̄ − 2s, x̄ + 2s), and (x̄ − 3s, x̄ + 3s) and determine the percentage of measurements falling in each. If the data are approximately normal, the percentages will be approximately equal to 68%, 95%, and 100%, respectively.

3. Find the interquartile range, IQR, and standard deviation, s, for the sample, then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.

4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.

Definition 5.2

A normal probability plot for a data set is a scatterplot with the ranked data values on one axis and their corresponding expected z-scores from a standard normal distribution on the other axis.

Example data set (Arc output): baseball. Performance/Salary Data for Major League Baseball teams in 1995. From Samaniego, F. J. and Watnik, M. R. (1997), "The Separation Principle in Linear Regression," Journal of Statistics Education, Vol. 5, No. 3, available at http://www.stat.ncsu.edu:80/info/jse/v5n3/samaniego.html.

Name      Type     n   Info
Hitpay    Variate  28  Payroll of non-pitchers only
Payroll   Variate  28  Total payroll, millions of dollars
Pitchpay  Variate  28  Payroll of pitchers only
Wins      Variate  28  Number of games won
Team      Text     28  Team name

5.6 The Exponential Distribution

The exponential distribution is an example of a skewed distribution. It is a popular model for populations such as the length of time a light bulb lasts. For this reason, the exponential distribution is sometimes called the waiting time distribution.

Probability Distribution for an Exponential Random Variable x

The probability density function:

f(x) = (1/θ) e^(−x/θ)  (x > 0)

Mean: μ = θ

Standard deviation: σ = θ.
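A Python sketch of the exponential density plus a sampling sanity check (random.expovariate expects the rate 1/θ; the θ value is assumed for illustration):

import random
from math import exp

theta = 2.0                                    # illustrative mean waiting time

def f(x):
    # density f(x) = (1/theta) * e^(-x/theta), for x > 0
    return (1 / theta) * exp(-x / theta)

print(f(1.0))                                  # f(1) = 0.303...
sample = [random.expovariate(1 / theta) for _ in range(100_000)]
print(sum(sample) / len(sample))               # close to mu = theta = 2.0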

Chapter 6: Some Continuous Probability Distributions

Again, PDFs are population quantities that give us information about the distribution of items in the population. There are many PDFs that are used to understand probabilities associated with random variables. A few PDFs are used for multiple real-life situations. These PDFs are described next.

From this chapter, it is important to learn the following:

What are these PDFs which can be used for multiple situations

When can these PDFs be used

The means and variances for random variables with these PDFs

All PDFs in this chapter will be for continuous random variables.

6.1: Continuous Uniform Distribution

The simplest PDF for continuous random variables is when the probability of observing a particular range of values for X is the same for all equal length ranges! Since the probabilities are the same, this PDF is called the uniform PDF.

The Uniform PDF – Let X be a random variable on the interval [A, B]. The uniform PDF is (filling in the formula, consistent with Section 5.2)

f(x; A, B) = 1/(B − A) for A ≤ x ≤ B, and 0 elsewhere.

Notes:
o We examined this PDF at the beginning of Section 3.3!
o The parameters, A and B, control the location of the PDF. In general, this is what a graph of the PDF looks like.
o The area under the curve is 1. Since the PDF looks like a rectangle, we can take base × height = (B−A)[1/(B−A)] to find the area is 1.

Example: Uniform distribution with A=1 and B=4 (uniform.xls)

Figure: Uniform PDF (x on the horizontal axis, f(x) on the vertical axis).

Areas underneath the curve correspond to probabilities. For example, P(1<X<3) = 0.67.

How could I find this using calculus?

Note the blue lines on the x-axis should be extended to the end of the plot.

Theorem 6.1 – The mean and variance of a random variable X with a uniform PDF are

μ = (A + B)/2  and  σ² = (B − A)²/12.

6.2: Normal Distribution

This is the main PDF that we will be using since it occurs in many applications.

Normal PDF – Let X be a random variable with mean E(X) = μ and Var(X) = σ². The normal PDF is

f(x; μ, σ) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))

Notes: The parameters, μ and σ, control the location and scale of the distribution, respectively. These are the population mean and standard deviation! Thus, a nice simplification with the normal PDF is that the mean and standard deviation can be represented easily as parameters in the function.

In most realistic applications, μ and σ will not be known and we will need to estimate them. How to do this will be discussed in future chapters.

The book denotes f(x; μ, σ) by n(x; μ, σ).

In general, this is what a graph of the distribution looks like.

o The curve graphed are (x, f(x)) connected

points. o The PDF is centered at (symmetric

about ). Thus, P(X>) = P(X<) = 0.5. The parameter is often called a location parameter since it gives the central location of the PDF.

o The area under the curve is 1.

o The left and right sides of the curve extend out to - and + without touching the x-axis (although it will get very close). Note the plot above may be a little misleading with respect to this. The left and right sides of the PDFs are often called the “tails” of the PDF.

o controls the scale of the PDF. The larger , the more spread out the PDF (large variability). The smaller , the less spread out the PDF (small variability). Below are three normal PDFs demonstrating this.

Figure: Three normal PDFs with different scales (x (MPG) on the horizontal axis, f(x) on the vertical axis).

A VERY IMPORTANT specific case of a normal PDF is the standard normal PDF. This PDF has μ = 0 and σ = 1. Therefore,

f(z; 0, 1) = (1/√(2π)) e^(−z²/2)

Typically, "Z" is used instead of "X" to denote a standard normal random variable. This will be discussed more later.

Showing that the total area under the curve is 1 is not as easy as it was in Chapter 3. The proof involves making a transformation to polar coordinates. Pages 104-5 of Casella and Berger's (1990) textbook show the proof (this book is used for STAT 882).

Example: Interactive normal PDFs (normal_dist.xls)

This file is constructed to help you visualize the normal probability distribution. For example, below is the normal PDF for μ = 50 and σ = 3.

Experiment on your own using different values of μ and σ to see changes in the distribution. Make sure you understand the following:

What happens when μ is increased or decreased?

What happens when σ is increased or decreased?

Where is the highest point on the distribution? What is this highest point?

Also in the file are examples of how to use the NORMDIST( ) and NORMINV( ) Excel functions, which will be discussed in detail in Section 6.3.

Below is the proof showing that E(X) = μ. A similar proof can be done to show Var(X) = σ² (see p. 146 of the book).

6.3-6.4: Areas Under the Normal Curve and Applications of the Normal Distribution

Example: Grand Am (grand_am_normal.xls)

Suppose that it is reasonable to assume a Grand Am's MPG has a normal PDF with a mean MPG of μ = 24.3 and a standard deviation of σ = 0.6. Let X denote the MPG for one tank of gas. Answer the following questions.

1) Find the probability that a randomly selected Grand Am gets less than 23 MPG for one tank of gas.

We need to find P(X<23) = F(23). This is the area to the left of the red line underneath the PDF.

Figure: Grand Am normal PDF example, μ = 24.3 and σ = 0.6 (x (MPG) on the horizontal axis, f(x) on the vertical axis).

This probability can be found by

P(X<23) = ∫ from −∞ to 23 of (1/(0.6√(2π))) e^(−(x−24.3)²/(2·0.6²)) dx.

Using Maple without evaluating at the limits of integration, we get:

> assume(sigma>0);
> f := 1/(sqrt(2*Pi)*0.6) * exp(-(x-24.3)^2/(2*0.6^2));

f := 0.8333333335 sqrt(2) exp(-1.388888889 (x - 24.3)^2) / sqrt(Pi)

> int(f, x);

0.4999999998 erf(1.178511302 x - 28.63782464)

Notice that a capital P is used in the Pi function. See http://mathworld.wolfram.com/Erf.html for more information on the erf() function.

Using Maple with the limits of integration, we get:

> int(f,x=-infinity..23);.0151301397

where it uses numerical approximations for the last integral.

To make finding probabilities easier, many software packages (and calculators) have special functions which do the integration for X in some interval. In Excel, the NORMDIST(x, μ, σ, TRUE) function finds F(x) for a normal random variable with mean μ and standard deviation σ.

(Note: using FALSE as the last argument instead evaluates the density f(x) at x = 23; that is, it returns the height of the curve at x = 23.)

For this example, use

NORMDIST(23,24.3,0.6,TRUE)

This results in 0.0151.

Chris Malone’s Excel Instructions website contains help for this function at http://www.statsclass.com/excel/tables/prob_values.html#prob_n. The web page shows another way to use the function through a window based format.
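For readers without Excel, the standard library's statistics.NormalDist in Python plays the same role as NORMDIST (our illustrative substitute, not part of the original notes):

from statistics import NormalDist

X = NormalDist(mu=24.3, sigma=0.6)
print(X.cdf(23))    # 0.01513..., i.e. P(X < 23), matching the 0.0151 above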

Side note: To find the probability in Maple using its specialized functions, you can use the following code:

> with(stats);

[anova, describe, fit, importdata, random, statevalf, statplots, transform]

> statevalf[cdf, normald[24.3, 0.6]](23);

0.01513014001

2) Suppose σ is increased to σ = 1.3. What do you expect to happen to P(X<23)?

The Excel function is NORMDIST(23,24.3,1.3,TRUE). The probability goes up, since there is more variability, so an X below 23 becomes more plausible: P(X<23) = 0.1587.

Figure: Grand Am normal PDF example, μ = 24.3 and σ = 1.3 (x (MPG) on the horizontal axis, f(x) on the vertical axis).

3) Suppose σ = 0.6 again, but μ is decreased to μ = 23.1. What do you expect to happen to P(X<23)?

The Excel function is NORMDIST(23,23.1,0.6,TRUE). The probability goes up, since the mean is closer to 23, so an X below 23 becomes more plausible: P(X<23) = 0.4338.

Figure: Grand Am normal PDF example, μ = 23.1 and σ = 0.6 (x (MPG) on the horizontal axis, f(x) on the vertical axis).

Below is a nice comparative graph for the 3 examples above.

Figure: Comparative graph of the three normal PDFs above: (μ=24.3, σ=0.6), (μ=24.3, σ=1.3), and (μ=23.1, σ=0.6); x (MPG) on the horizontal axis, f(x) on the vertical axis.

4) Suppose σ = 0.6 and μ = 24.3 again. What is P(23<X<25)?

The probability needs to be broken up since the NORMDIST( ) function only finds probabilities in the form of F(x).

P(23<X<25) = P(X<25) – P(X<23) = F(25) – F(23).

This can be found with the Excel functions:

NORMDIST(25,24.3,0.6,TRUE)-NORMDIST(23,24.3,0.6,TRUE)

The probability is 0.8632.

Figure: Grand Am normal probability distribution example for μ = 24.3, σ = 0.6, with the area P(23 < X < 25) shaded (X on the horizontal axis, f(X) on the vertical axis).

5) Suppose σ = 0.6 and μ = 24.3 again. What is P(X>23)?

Use the complement: P(X>23) = 1 − P(X<23) = 1 − 0.0151 = 0.9849.

6) Suppose σ = 0.6 and μ = 24.3 again. What is P(X<23 or X>25)?

Use the complement: P(X<23 or X>25) = 1 − P(23<X<25) = 1 − 0.8632 = 0.1368.

7) What MPG is required for a car to be in the top 5% of all Grand Ams? Suppose σ = 0.6 and μ = 24.3 again.

This problem requires going in the opposite direction. We are now given a probability and need to find the corresponding "x" that works for P(X>x) = 0.05. In terms of integration, we are trying to find x in the equation

∫ from x to ∞ of f(y) dy = 0.05.

Equivalently,

∫ from −∞ to x of f(y) dy = 0.95.

Notice the limits of integration used are in terms of y. This is done to avoid the confusion of integrating from "x = x to ∞".

The x value can be found by using Excel's NORMINV(area, μ, σ) function, where area = P(X<x).

Be careful! Notice that the area is for P(X<x), not P(X>x).

The x value can be found with the Excel function:

NORMINV(0.95,24.3,0.6)

Therefore, P(X>25.29)=0.05.

See http://www.statsclass.com/excel/tables/crit_values.html#crit_n for more information about this function. Note that we will eventually use these types of values as “critical points” in hypothesis testing.
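The Python counterpart of NORMINV, again via statistics.NormalDist (illustrative):

from statistics import NormalDist

X = NormalDist(mu=24.3, sigma=0.6)
print(X.inv_cdf(0.95))   # 25.2869..., so P(X > 25.29) is about 0.05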

Here are other ways to find the value of x in Maple:

> with(stats);

[anova, describe, fit, importdata, random, statevalf, statplots, transform]

> statevalf[icdf, normald[24.3, 0.6]](0.95);

25.28691218

> f := 1/(sqrt(2*Pi)*0.6) * exp(-(y-mu)^2/(2*sigma^2));

f := 0.8333333335 sqrt(2) exp(-(1/2) (y - mu)^2 / sigma^2) / sqrt(Pi)

> solve(0.95 = eval(int(f, y = -infinity..x), [mu=24.3, sigma=0.6]), x);

25.28691217

Example: Grading (grade_bell.xls)

Suppose the set of test #2 grades in the class has a normal distribution with μ = 73% and σ = 8%. Let X be a student's grade. Answer the following.

1) What is the probability that a randomly chosen student in the class received a grade of 90% or better?

Figure: Grading normal PDF example, μ = 73 and σ = 8 (x (Grade) on the horizontal axis, f(x) on the vertical axis).

Let X be a normal random variable with μ = 73% and σ = 8%. Find P(X>90). Thus, we need to find P(X>90) = 1 − P(X<90).

The Excel function is 1-NORMDIST(90,73,8,TRUE) and the answer is 0.0168.

2) What percentage of students scored between a 70% and 90%?

Figure: Grading normal PDF example, μ = 73 and σ = 8, with the area between 70 and 90 shaded (x (Grade) on the horizontal axis, f(x) on the vertical axis).

The Excel function is NORMDIST(90,73,8,TRUE)-NORMDIST(70,73,8,TRUE) and the answer is 0.6294.

3) Suppose that your instructor curves the test #2 grades and that ONLY the top 10% of test scores receive A's. Would a student be better off with a test #2 grade of 81% (still with μ = 73% and σ = 8%) or a grade of 68% on a different test #2 that has a normal distribution with μ = 62% and σ = 3%?

Figure: Grading normal PDF example comparing (μ=73, σ=8) and (μ=62, σ=3); x (Grade) on the horizontal axis, f(x) on the vertical axis.

Find the top 10% of the scores for each situation.

For μ = 73% and σ = 8%, find x for P(X>x) = 0.10. The Excel function to find this is NORMINV(0.9,73,8) and the answer is 83.25.

For μ = 62% and σ = 3%, find x for P(X>x) = 0.10. The Excel function to find this is NORMINV(0.9,62,3) and the answer is 65.84.

A student would prefer the second test since an A would be received.

Rule of thumb for the number of standard deviations all data lies from its mean:

In Chapter 4, we discussed that approximately all data lies within 2 or 3 standard deviations of the mean. We also discussed the more formal expression of this using Chebyshev's Rule. Examine what happens if our data comes from a normal PDF. The end result is what is often called the Empirical Rule.

Example: Standard normal distribution template (stand_norm_prob.xls)

Let Z be a random variable with a standard normal PDF. Thus, μ = 0 and σ = 1. (All of these results apply for μ ≠ 0 and σ ≠ 1 as well.) Below are three screen captures that show a standard normal PDF. The distributions show the area within 1, 2, and 3 standard deviations of the mean.

Notice how large the probability is that Z falls within 2 or 3 standard deviations of the mean!

Reminder about P(X=x)=0

What is P(X=x)? It is 0 since X is a continuous random variable. To see why this is true, consider this proof by example.

Let Z be a standard normal random variable. The following table of probabilities can then be constructed.

Interval           Probability
P(0.95<Z<1.05)     0.0242
P(0.98<Z<1.02)     0.0096
P(0.99<Z<1.01)     0.0049
P(0.99<Z<1.00)     0.0024
P(1.00<Z<1.01)     0.0025
P(Z=1)             0

Notice the probability gets smaller and smaller as the interval gets smaller. Eventually, the probability will become 0.

Remember that P(a < X < b) = ∫ from a to b of f(x) dx for a PDF f(x), where X is a continuous random variable. When a = b, the integral is taken over an interval of width zero, so the probability is 0.

Standard normal PDF

Probabilities associated with the standard normal PDF have been tabled.

Example: Standard normal distribution tables (stand_norm_table.xls)

Before there were readily accessible software packages or calculators with functions for the normal PDF, people used tables based on the standard normal PDF in order to find probabilities associated with ANY normal PDF. Table A.3 on pp. 670-1 of the book is one of these tables. It provides F(z), the CDF for a standard normal random variable Z. The reason why I am using Z here is because this is the common practice when discussing standard normal random variables.

Thus, Table A.3 gives probabilities such as the one shown below.

Below is an excerpt from the table contained in stand_norm_table.xls.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.4  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0002
-3.3  0.0005  0.0005  0.0005  0.0004  0.0004  0.0004  0.0004  0.0004  0.0004  0.0003
-3.2  0.0007  0.0007  0.0006  0.0006  0.0006  0.0006  0.0006  0.0005  0.0005  0.0005
-3.1  0.0010  0.0009  0.0009  0.0009  0.0008  0.0008  0.0008  0.0008  0.0007  0.0007
-3.0  0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
-2.9  0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
-2.8  0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
-2.7  0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
-2.6  0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
-2.5  0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
-2.4  0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
-2.3  0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
-2.2  0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
-2.1  0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
-2.0  0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
-1.9  0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233

This table uses the NORMDIST(z, 0, 1, TRUE) function to find P(Z<z). For example,

P(Z<-3.41) = 0.0003,

P(Z<-3.03) = 0.0012,

P(Z<-2.57) = 0.0051.

Why are we concerned with this table of standard normal probabilities?

A simple transformation can be made from ANY normal PDF to the standard normal PDF using the following formula:

Z = (X − μ)/σ

where X is a normal random variable with mean μ and standard deviation σ, and Z is a standard normal random variable with mean 0 and standard deviation 1.

Therefore, using this one table, we can find all normal PDF probabilities WITHOUT Excel or other means.
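In Python the standardization plus the table value can be reproduced with math.erf; this sketch reuses the Grand Am numbers from the next example:

from math import erf, sqrt

def phi(z):
    # standard normal CDF: P(Z < z) = (1 + erf(z / sqrt(2))) / 2
    return (1 + erf(z / sqrt(2))) / 2

mu, sigma = 24.3, 0.6
z = (23 - mu) / sigma                   # standardize: z = (x - mu) / sigma
print(round(z, 4), round(phi(z), 4))    # -2.1667 and 0.0151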

Example: Grand Am (grand_am_normal.xls)

Suppose that it is reasonable to assume a Grand Am's MPG has a normal PDF with a mean MPG of μ = 24.3 and a standard deviation of σ = 0.6. Let X denote the MPG for one tank of gas. Answer the following questions.

1) Find the probability that a randomly selected Grand Am gets less than 23 MPG for one tank of gas.

We need to find P(X<23) = F(23). This is the area to the left of the red line underneath the PDF.

Figure: Grand Am normal PDF example, μ = 24.3 and σ = 0.6 (x (MPG) on the horizontal axis, f(x) on the vertical axis).

The function, NORMDIST(23,24.3,0.6,TRUE), can be used in Excel to find the probability to be 0.0151.

Using the tables, P(X<23) = P(Z < (23 − 24.3)/0.6) = P(Z < −2.1667) ≈ P(Z < −2.17) = 0.0150.

2) Suppose σ is increased to σ = 1.3. What do you expect to happen to P(X<23)?

The function, NORMDIST(23,24.3,1.3,TRUE), can be used to find the probability to be 0.1587.

Figure: Grand Am normal PDF example, μ = 24.3 and σ = 1.3 (x (MPG) on the horizontal axis, f(x) on the vertical axis).

Using the tables, P(X<23) = P(Z < (23 − 24.3)/1.3) = P(Z < −1) = 0.1587.

3) Suppose σ = 0.6 again, but μ is decreased to μ = 23.1. What do you expect to happen to P(X<23)?

The function, NORMDIST(23,23.1,0.6,TRUE), can be used to find the probability to be 0.4338.

Using the tables, P(X<23) = P(Z < (23 − 23.1)/0.6) = P(Z < −0.1667) ≈ P(Z < −0.17) = 0.4325.

4) Suppose σ = 0.6 and μ = 24.3 again. What is P(23<X<25)?

The function, NORMDIST(25,24.3,0.6,TRUE)-NORMDIST(23,24.3,0.6,TRUE)

can be used to find the probability to be 0.8632

Using the tables,

P(23<X<25) = P((23 − 24.3)/0.6 < Z < (25 − 24.3)/0.6)
= P(−2.1667 < Z < 1.1667)
≈ P(−2.17 < Z < 1.17)
= P(Z < 1.17) − P(Z < −2.17)
= 0.8790 − 0.0150
= 0.8640

5) Suppose σ = 0.6 and μ = 24.3 again. What is P(X>23)?

Use the complement: P(X>23) = 1 − P(X<23) ≈ 1 − 0.0150 = 0.9850 (0.9849 from Excel).

6) Suppose σ = 0.6 and μ = 24.3 again. What is P(X<23 or X>25)?

Use the complement: P(X<23 or X>25) = 1 − P(23<X<25) ≈ 1 − 0.8640 = 0.1360 (0.1368 from Excel).

7) What MPG is required for a car to be in the top 5% of all Grand Ams? Suppose σ = 0.6 and μ = 24.3 again.

The x value for P(X>x) = 0.05 was found earlier with the Excel function =NORMINV(0.95,24.3,0.6). This produced P(X>25.29) = 0.05.

Using the tables, P(X>x) = P(Z > (x − 24.3)/0.6) = 0.05.

Note that P(Z<z) = 0.95 produces z ≈ 1.64. Then (x − 24.3)/0.6 = 1.64 and x = 24.3 + 1.64 × 0.6 = 25.284. Therefore, P(X>25.284) ≈ 0.05.

Observing a sample from a population characterized by a normal PDF

Suppose a population can be characterized by a normal PDF. What characteristics would you expect for a sample taken from that population?

Example: MPG (gen_norm.xls)

MPG example from before: X is a normal random variable with μ = E(X) = 24.3 and σ = 0.6. Suppose 1,000 different x's are observed. In other words, a sample of 1,000 is taken from the population.

Questions:

1) What would you expect the average value of the 1,000 observed x's to be, approximately?

2) What range would you expect most of the x's to fall within?

Observed values of a normal random variable can also be generated in the same way as was done in Chapters 3 and 5. Excel also has a specific normal PDF option in the Random Number Generation window. The file, gen_norm.xls, gives an example of using the window below. More directions are available at Chris Malone's Excel help website at http://www.statsclass.com/excel/misc/norm_dist.html.

In this case, 1 variable with 1,000 observed values is generated. The mean μ = 24.3 and standard deviation σ = 0.6 are used to coincide with the Grand Am example. The seed number gives Excel a random place to start when generating these observed values. I can use this seed number again and generate the exact same data!

Below are part of the results. The first column of the spreadsheet contains some of the 1,000 observed MPG values (23.74819, 23.48401, 24.59496, ...). The summary measures and the frequency distribution were:

                        population   sample
  mean                  24.3         24.32106
  standard deviation    0.6          0.596285

  Bin    Frequency      Bin    Frequency
  22.6       0          24.8      136
  22.8       3          25.0       93
  23.0      11          25.2       55
  23.2      23          25.4       37
  23.4      23          25.6       16
  23.6      53          25.8        5
  23.8      83          26.0        4
  24.0     110          26.2        1
  24.2     113          26.4        1
  24.4     121          26.6        1
  24.6     111          26.8        0
                        More        0

Notes:

Notice how close μ and σ are to the sample mean and standard deviation. The sample standard deviation is calculated as

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$

where $\bar{x}$ is the sample mean and $x_i$ for i = 1, ..., n is the ith observed value. An explanation for why this formula is used will be given in Chapter 8.
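As an illustration of this formula, here is a small Maple sketch on a made-up list of five values (the numbers are hypothetical, not from the 1,000 MPG sample):

> x := [23.7, 24.5, 24.1, 25.0, 24.3]:        # hypothetical observed values
> n := nops(x):
> xbar := add(v, v=x)/n;                      # sample mean, 24.32
> s := sqrt(add((v - xbar)^2, v=x)/(n - 1));  # sample standard deviation, about 0.48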

Here is an example of how to simulate a sample from a normal PDF using Maple:

> randomize(1514);
                              1514

> data := stats[random, normald[24.3, 0.6]](100);

data := 25.36372908, 24.96025314, 24.44663243, 25.27318122, 23.94262355,
        23.69829609, 23.96992063, 23.72400640, 24.06923492, 24.38832186,
        24.54452405, 23.47219191, 24.51653894, 24.22545826, 24.58063212,
        24.40056631, 24.22519976, 24.73647509, 23.04956592, 24.94875357,
        24.02254401, 24.35341391, 24.67885308, 24.81796173, 23.60716054,
        24.15571156, 24.48549168, 23.84686372, 25.62993784, 24.95907390,
        24.13187013, 24.40491872, 25.04623787, 23.81147131, 23.04161664,
        25.57549338, 23.34059716, 24.46719408, 24.23062843, 23.80346201,
        25.20382342, 23.72508178, 23.35185260, 23.99842442, 24.55421301,
        24.06936962, 23.50756715, 24.22223306, 24.28139128, 24.47253728,
        24.50969275, 25.31179898, 24.30883191, 24.39745116, 24.34240361,
        24.44507802, 24.28610049, 24.04085590, 25.13232101, 24.66322075,
        25.09714835, 25.12040542, 24.69746294, 24.51272238, 23.75350627,
        25.60826660, 24.19990788, 25.02525917, 24.41097845, 24.17714648,
        24.63990563, 24.74360918, 23.45013063, 24.52780462, 24.47851759,
        24.27232784, 23.27915406, 25.15368420, 24.38724182, 23.47378351,
        24.23063511, 24.06653251, 24.43778592, 24.04812858, 25.20231330,
        23.34198654, 23.30621749, 24.58547842, 24.40825270, 23.90859335,
        25.63674860, 24.48445061, 24.56049376, 23.33174552, 24.26972911,
        23.65460645, 24.16122685, 24.74861908, 24.58956375, 24.41964871

> evalf(stats[describe,mean]([data]),4);

24.30

> evalf(stats[describe, standarddeviation]([data]),4);

.5871

Page 6.118 of the notes shows one possible frequency distribution for the sample. This gives information about how often observed values fell into chosen classes. In Excel, I originally entered the values in the “classes” column. Through performing a few steps, Excel automatically generates a frequency distribution. One needs to be VERY careful with interpreting what Excel gives. Below is another representation of it:

  classes              Frequency
  ≤22.6                    0
  >22.6 and ≤22.8          3
  >22.8 and ≤23           11
  >23   and ≤23.2         23
  >23.2 and ≤23.4         23
  >23.4 and ≤23.6         53
  >23.6 and ≤23.8         83
  >23.8 and ≤24          110
  >24   and ≤24.2        113
  >24.2 and ≤24.4        121
  >24.4 and ≤24.6        111
  >24.6 and ≤24.8        136
  >24.8 and ≤25           93
  >25   and ≤25.2         55
  >25.2 and ≤25.4         37
  >25.4 and ≤25.6         16
  >25.6 and ≤25.8          5
  >25.8 and ≤26            4
  >26   and ≤26.2          1
  >26.2 and ≤26.4          1
  >26.4 and ≤26.6          1
  >26.6 and ≤26.8          0
  >26.8                    0

Thus, 136 sampled values are greater than 24.6 and less than or equal to 24.8.

Why were these classes chosen? There is more than one set of classes that can be used. Here are some guidelines:

a) Find the minimum and maximum observed values. You can use the MIN() and MAX() functions in Excel to do this.

b) Choose classes which are of equal size.

c) Choose the classes between the minimum and maximum values which make sense relative to the data set. You may need to choose a few different ones until you think the frequency distribution represents the data well.

d) Note that 1, 2, or 3 classes do not work!

The frequency distribution is often plotted. This plot is called a histogram. Below is the histogram created by Excel.

[Figure: Histogram of 1,000 MPG observed values; x = MPG classes from 22.6 to 26.6 plus “More”, y = Frequency from 0 to 160.]

Does the histogram have a similar shape to the normal PDF with μ = 24.3 and σ = 0.6? If so, a normal PDF approximation to the distribution of MPG would be appropriate.

[Figure: Grand Am Normal PDF Example, μ = 24.3 and σ = 0.6; x (MPG) from 20 to 30 versus f(x).]

Below is an outline of the steps to find the frequency distribution and histogram for this example. General information about how to find a frequency distribution and histogram are available at http://www.statsclass.com/excel/graphs/histogram.html.

1) Find the minimum and maximum values.

   min = 22.71475   (=MIN(B10:B1009))
   max = 26.46672   (=MAX(B10:B1009))

2) In an empty area in the spreadsheet, create a column of classes:

   22.6, 22.8, 23, 23.2, 23.4, 23.6, 23.8, 24, 24.2, 24.4, 24.6, 24.8, 25, 25.2, 25.4, 25.6, 25.8, 26, 26.2, 26.4, 26.6, 26.8

3) Select TOOLS > DATA ANALYSIS from the main Excel menu bar.

4) Select HISTOGRAM and OK from the DATA ANALYSIS window.

5) The HISTOGRAM window will then appear. In the window, do the following:
   a) Input the cell range of the 1,000 observed values in the INPUT RANGE.
   b) Input the cell range of the classes into the BIN RANGE.
   c) Select an OUTPUT RANGE for the corresponding frequency distribution to start at. I usually specify the first cell to the right of my classes.
   d) Select the CHART OUTPUT option to have a histogram created.
   e) Select OK to have the frequency distribution and the histogram created!

Below is what my spreadsheet looks like immediately after OK is selected.

6) Edit the histogram so that it looks nicer:

[Figure: edited Histogram of 1,000 MPG observed values; x = MPG classes from 22.6 to “More”, y = Frequency from 0 to 160.]

Chris Malone has created a spreadsheet called data_summary.xls, which can be used when one wants to determine if a normal PDF approximation is appropriate. Below is the spreadsheet result when used with the 1,000 MPG observed values.

The curve drawn on the histogram is a normal PDF with mean 24.3211 and standard deviation of 0.5963. Thus, the sample mean and standard deviation are substituted in for the population mean and standard deviation. You are

not responsible for knowing how this plot was created, but you will need to be able to use the spreadsheet. There are also other summary measures displayed (box plot and dot plot) which may be discussed in future chapters.

From the results in data_summary.xls, does a normal PDF approximation for MPG seem appropriate? Explain.

Please see p. 18-19 of the book for more information about frequency distributions and histograms.

Validity of the normal PDF assumption

All of the probabilities found using the normal PDF ASSUME the normal PDF is the correct PDF for the random variable. What if this assumption is incorrect? The probabilities found using this assumption are WRONG!

Example: Grand Am (grand_am.xls)

Suppose X really has a uniform distribution with A = 22.3 and B = 26.3. Then P(X<23) is base × height = 0.7 × 0.25 = 0.175. With the normal assumption of μ = 24.3 and σ = 0.6, the probability was found to be 0.0151.

[Figure: Grand Am Normal and Uniform PDF Example; the normal PDF (mean = 24.3, s.d. = 0.6) and the uniform PDF on (22.3, 26.3) plotted together, x (MPG) versus f(x).]

How does one know when the normal PDF assumption is valid?

Rarely, if ever, will it be 100% correct.

If a sample from the population is possible, construct a histogram of the observed values and check to see if it has the shape of a normal PDF. In addition, calculate the sample mean and variance to see if they are close to the population mean and variance (if they are known). If the histogram does have a similar shape to a normal PDF and the sample and population mean and variance are about the same (if the population values are known), then the normal PDF assumption is a reasonable approximation.

Suppose a histogram was constructed and the data did not appear to come from a normal or other known PDF. What can you do?

You can still use the normal PDF with the sample mean provided the sample size is large enough. The central limit theorem is used here in order to make a normal PDF

approximation. Chapter 8 talks about this in detail.

6.5: Normal Approximation to the Binomial

Skip!

Theorem 6.2: Note that if X is a binomial random variable with mean μ = E(X) = np and variance Var(X) = σ² = np(1−p), then the limiting form of the PDF for

$$Z = \frac{X - np}{\sqrt{np(1-p)}}$$

as n→∞ is the standard normal PDF. Another way this can be worded is: X can be approximated by a normal random variable with mean np and variance np(1−p).

Thus, as the number of trials increases, Z behaves more and more like a standard normal random variable.

This information will be used in Section 9.10.

6.6: Gamma and Exponential Distributions

We have already been using the Gamma and Exponential PDFs! These PDFs are often used in survival and reliability analysis. For example, these PDFs are used for modeling lifetimes of individuals or manufactured products.

Definition 6.2: The gamma function is defined by

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx \quad\text{for } \alpha>0.$$

Notes:
 When α is a positive integer, Γ(α) = (α−1)!; for example, Γ(3) = (3−1)! = 2! = 2·1 = 2.
 Through integrating by parts, one can show Γ(α) = (α−1)Γ(α−1).
 Γ(1/2) = √π.
 In Maple, this is represented by the GAMMA() function, where GAMMA needs to be in capital letters. For example,

> GAMMA(3);

2
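The other properties can be checked in Maple as well (a small sketch; outputs shown as comments):

> GAMMA(1/2);                                      # returns Pi^(1/2)
> simplify(GAMMA(alpha)/GAMMA(alpha - 1), GAMMA);  # returns alpha - 1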

Gamma PDF: The continuous random variable X has a gamma PDF, with parameters α and β, if its PDF is given by

$$f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} \quad\text{for } x>0,$$

where α>0 and β>0.

Notes:
 In most realistic applications, α and β will not be known and we will need to estimate them. How to do this will be discussed in future chapters.
 α controls the shape of the PDF since it mostly influences the “peakedness” of the PDF.
 β controls the scale of the PDF since most of its influence is on the spread of the PDF.

In Maple, this can be programmed in as

> assume(x>0); assume(alpha>0); assume(beta>0);
> about(x, alpha, beta);
Originally x, renamed x~: is assumed to be: RealRange(Open(0),infinity)
Originally alpha, renamed alpha~: is assumed to be: RealRange(Open(0),infinity)
Originally beta, renamed beta~: is assumed to be: RealRange(Open(0),infinity)

> f(x):=1/(beta^alpha*GAMMA(alpha))* x^(alpha-1)*exp(-x/beta);

$$f(x) := \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$$

> simplify(int(f(x),x=0..infinity));
                               1

There are easier ways to use the gamma PDF in Maple that will be discussed later.

Below are a few comparative plots (gamma.xls). Notice the x- and y-axis scales are fixed for comparative purposes. Values of X could be greater than 24!

[Figure: seven gamma PDF plots, each with x from 0 to 24 and f(x) from 0 to 1:
  α=1, β=1 (μ=1, σ²=1)        α=1, β=2 (μ=2, σ²=4)
  α=1, β=3 (μ=3, σ²=9)        α=2, β=1 (μ=2, σ²=2)
  α=4, β=1 (μ=4, σ²=4)        α=4, β=2 (μ=8, σ²=16)
  α=2.5, β=2.5 (μ=6.25, σ²=15.625)]

Questions: What happens if α and/or β are increased? What happens if α and/or β are decreased?

Why would someone want to use different values of α and/or β?

Theorem 6.3: The mean and variance of the gamma PDF are E(X) = μ = αβ and Var(X) = σ² = αβ².

pf:

$$E(X) = \int_0^\infty x\,\frac{x^{\alpha-1}e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)}\,dx = \frac{\beta^{\alpha+1}\Gamma(\alpha+1)}{\beta^{\alpha}\Gamma(\alpha)} \int_0^\infty \frac{x^{\alpha}e^{-x/\beta}}{\beta^{\alpha+1}\Gamma(\alpha+1)}\,dx$$

Notice that the last integrand is a gamma PDF with α+1 and β as its parameters! Thus, the integral is 1 and

$$E(X) = \frac{\beta^{\alpha+1}\Gamma(\alpha+1)}{\beta^{\alpha}\Gamma(\alpha)} = \beta\,\frac{\alpha\Gamma(\alpha)}{\Gamma(\alpha)} = \alpha\beta.$$

A similar proof can be done for the variance.

Maple code,

> E(X):=simplify(int(x*f(x), x=0..infinity));
                        E(X) := alpha~ beta~

> Var(X):=simplify(int((x-E(X))^2*f(x), x=0..infinity));
                      Var(X) := alpha~ beta~^2

Examine what happens to the PDF as the values of μ and σ² change in the gamma PDF plots on the previous pages.

Example: Distribution of lifetimes (gamma_actuary.xls)

Let X be a random variable denoting the lifetime of a person in a particular population. An actuary uses the PDF for X below to model the lifetimes of all people in this population:

$$f(x) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} \quad\text{for } x>0.$$

For this example, β = 15 and α = 2, so that f(x) = x e^{−x/15}/225.

[Figure: Gamma PDF for actuary example, α = 2 and β = 15; x from 0 to 150 versus f(x) from 0 to 0.03.]

In Maple, the plot is

> plot(eval(f(x),[alpha=2,beta=15]), x=0..150, title="Gamma PDF, alpha=2, beta=15", labels=["x", "f(x)"]);

This particular PDF may not be realistic for what we would commonly perceive to be the distribution of lifetimes in the United States.

Questions: What are the mean and variance?

The mean and variance are μ = αβ = 2·15 = 30 and σ² = αβ² = 2·15² = 450. Thus, one would expect to live 30 years on average in this population.

What is the probability a person in the population lives longer than 80 years?

The probability can be found from

$$P(X>80) = \int_{80}^{\infty} \frac{x\, e^{-x/15}}{225}\,dx.$$

Notice that integration by parts would be needed here. If the integration was done in Maple,

> P(X>80):=int(eval(f(x), [alpha=2,beta=15]), x=80..infinity);

                     P(80 < X) := 19/3 e^(-16/3)

> evalf(P(X>80),4);
                              .03059

Also, note that P(X>80) = 1 - P(X<80) = 1 - F(80). Thus, the CDF can be used to find the probability. The GAMMADIST(x, , , TRUE) function in Excel can simply be used here. Thus,

=1-GAMMADIST(80,2,15,TRUE)

results in a value of 0.0306.

Using the stats package in Maple,

> 1-stats[statevalf,cdf,gamma[2,15]](80);

.0305770166

What is the median lifetime?

The value c needs to found such that the probability of living less than c years is 0.5. Then we could use

and solve for c. If the integration and solving was done in Maple,

> solve(int(eval(f(x),[alpha=2, beta=15]), x=0..c) = 0.5, c);

,-11.52058571 25.17520485

Of course, the positive value for c would be the answer. The GAMMAINV(prob., α, β) function can be used in Excel to find c. Thus,

=GAMMAINV(0.5,2,15)

results in c = 25.18.

Using the stats package in Maple,

> stats[statevalf,icdf, gamma[2,15] ](0.5);

25.17520485

There are a few important special cases of the gamma PDF. One of them is the exponential PDF.

Exponential PDF: The continuous random variable X has an exponential PDF, with parameter β, if its PDF is given by

$$f(x) = \frac{1}{\beta}\, e^{-x/\beta} \quad\text{for } x>0,$$

where β>0.

Notes: This is the gamma PDF with α = 1. In most realistic applications, β will not be known and it will need to be estimated. How to do this will be discussed in future chapters.

β controls the scale of the PDF since most of its influence is on the spread of the PDF. In general, this is what a plot of the PDF looks like.

The height of the curve at a point x₀ is f(x₀) = (1/β)e^{−x₀/β}. Notice that when x₀ = 0, f(0) = 1/β since e⁰ = 1.

Theorem 6.3: The mean and variance for the exponential PDF are E(X) = μ = β and Var(X) = σ² = β².

pf: See the Chapter 4 examples with tire tread wear. Substitute β in for 30. Also, see the proof used with the gamma PDF earlier.

Example: Tire life (tire_wear.xls from Chapter 3)

The number of miles an automobile tire lasts before it reaches a critical point in tread wear can be represented by a PDF. Let X = the number of miles (in thousands) an automobile is driven before it reaches the critical tread wear point for one tire. Suppose the PDF for X is

$$f(x) = \frac{1}{\beta}\, e^{-x/\beta} \quad\text{for } x>0.$$

In Chapters 3 and 4, we used β = 30. Remember that we found in Chapter 4 that E(X) = β = 30 and Var(X) = σ² = 30²!

In the spreadsheet, different values of β can be entered into the cell to see how each affects the PDF. Below is a screen capture of the spreadsheet.

Note that the line on the plot should extend past x = 225.

Questions: What happens if β is increased? Explain why β has this effect relative to it being called a “scale” parameter, E(X) = β, and Var(X) = β².

What happens if β is decreased? Explain why β has this effect relative to it being called a “scale” parameter, E(X) = β, and Var(X) = β².

Why would someone want to use different values of β?

Find the probability that a randomly selected tire will last (will not get to the critical tread wear point) longer than 30,000 miles. In Chapter 3, we found the probability through integration:

$$P(X>30) = \int_{30}^{\infty} \frac{1}{30}\, e^{-x/30}\,dx = e^{-1} \approx 0.3679.$$

Using the relationship between the gamma and exponential PDFs, we can use the GAMMADIST() function:

=1-GAMMADIST(30,1,30,TRUE)

Notes: Remember that if FALSE is used instead of TRUE in the function, then f(x) is given as a result (the height of the curve).

Excel also has a function specifically for the exponential PDF: EXPONDIST(x,1/beta,TRUE), which finds F(x). Please note that 1/beta corresponds to what Excel defines as λ. Thus, Excel uses a PDF of

$$f(x) = \lambda e^{-\lambda x} \quad\text{for } x>0.$$

To find P(X>30), note that P(X>30) = 1 – P(X<30) = 1 – F(30). In Excel,

=1-EXPONDIST(30,1/30,TRUE)

To avoid confusion with λ = 1/β, I recommend using the GAMMADIST() function instead.

Find the tire wear number of miles such that less than 0.95 of the total number of tires will reach the critical point. In Chapter 3, we found the value of c as a solution to

$$\int_0^c \frac{1}{30}\, e^{-x/30}\,dx = 0.95.$$

The value of c was 30·ln(20) ≈ 89.87. We can use the relationship between the gamma and exponential PDFs to find the same answer with

=GAMMAINV(0.95,1,30)
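The same value of c can be checked in Maple by solving F(c) = 0.95 directly (a quick sketch):

> solve(1 - exp(-c/30) = 0.95, c);   # returns 89.87196821, i.e. 30*ln(20)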

Example: Exponential distribution with β = 10/3 (exp.xls)

[Figure: Exponential PDF with β = 10/3; x from 0 to 18 versus f(x) from 0 to 0.35; the area between x = 2 and x = 4 is of interest below.]

To find the probability P(2<X<4), find the area underneath that part of the plot. Note that P(2<X<4) = P(X<4) − P(X<2) = F(4) − F(2).

The Excel functions to find the probability are

=GAMMADIST(4,1,10/3,TRUE) - GAMMADIST(2,1,10/3,TRUE)

and the answer is 0.2476.
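In Maple, the same probability follows from the gamma CDF used earlier (a quick check; here F(x) = 1 − e^(−3x/10)):

> stats[statevalf,cdf,gamma[1,10/3]](4) - stats[statevalf,cdf,gamma[1,10/3]](2);
> # returns about 0.2476, in agreement with the Excel answer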

Final notes:
o Go back to Chapter 3 and examine example_sample_tire.xls. Notice how choosing β = 30 results in a very good fit of the PDF to the sampled values displayed in the histogram!
o If you are an engineering major, I recommend examining the Weibull PDF in Section 6.10.
o The chi-square PDF in Section 6.8 is an often-used PDF which we will discuss later in the course.

7. Inference Based on a Single Sample: Estimation with Confidence Intervals

In this chapter, we’ll put all the preceding material into practice; that is, we’ll estimate population means and proportions based on a single sample selected from the population of interest.

7.1 Large-Sample Confidence Interval for a Population Mean

According to the Central Limit Theorem, the sampling distribution of the sample mean is approximately normal for large samples. Let us calculate the interval

$$\bar{x} \pm 2\sigma_{\bar{x}} = \bar{x} \pm 2\frac{\sigma}{\sqrt{n}}.$$

That is, we form an interval 4 standard deviations wide – from 2 standard deviations below the sample mean to 2 standard deviations above the mean.

Definition 7.1

An interval estimator (or confidence interval) is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter.

Definition 7.2

The confidence coefficient is the probability that an interval estimator encloses the population parameter – that is, the relative frequency with which the interval estimator encloses the population parameter when the estimator is used repeatedly a very large number of times. The confidence level is the confidence coefficient expressed as a percentage.

A confidence interval provides an estimate of an unknown parameter of a population or process along with an indication of how accurate this estimate is and how confident we are that the interval is correct. Confidence intervals have two parts. One is an interval computed from our data. This interval typically has the form

estimate ±margin of error

Figure Twenty-five samples from the population gave these 95% confidence intervals. In the long run, 95% of all samples give an interval that covers μ .

Large-Sample Confidence interval for a population mean

The precise formula for calculating a confidence interval for μ is:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $\bar{x}$ is the sample average, σ is the standard deviation of the population measurements, n is the sample size, and $z_{\alpha/2}$ is a value from the standard normal table (Table IV).

Note: When σ is unknown (as is almost always the case) and n is large (say n ≥ 30), the confidence interval is approximately equal to

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

where s is the sample standard deviation.

Assumptions: None, since the Central Limit Theorem guarantees that the sampling distribution of x is approximately normal.

For example, if we want a 95 percent confidence interval for μ, we use $z_{\alpha/2}$ = 1.96.

Why is this the correct value?

Well, the correct value of z is found by trying to capture probability .95 between two symmetric boundaries around zero in the standard normal curve. This means there is .025 in each tail, and looking up the upper boundary with .475 between 0 and z gives 1.96 as the correct value of z from Table IV. Verify that a 90 percent confidence interval will use $z_{\alpha/2}$ = 1.645, and a 99 percent confidence interval will use 2.576.

Here are the most important entries from that part of the table:

  z_{α/2}       1.645    1.96    2.576
  100(1−α)%     90%      95%     99%

So there is probability C that $\bar{x}$ lies between

$$\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
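These z values can be reproduced with the inverse CDF of the standard normal in Maple (a quick sketch; outputs as comments):

> stats[statevalf,icdf,normald](0.950);   # 1.645 for 90% confidence
> stats[statevalf,icdf,normald](0.975);   # 1.960 for 95% confidence
> stats[statevalf,icdf,normald](0.995);   # 2.576 for 99% confidence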

Figure The area between - z* and z* under the standard normal curve is C.

Interpretation of a Confidence Interval for a Population Mean

When we form a 100(1−α)% confidence interval for μ, we usually express our confidence in the interval with a statement such as: “We can be 100(1−α)% confident that μ lies between the lower and upper bounds of the confidence interval.”

7.2 Small-Sample Confidence Interval for a Population Mean

In Chapter 6, we considered the (unrealistic) situation in which we knew the population standard deviation σ. In this section, we consider the more realistic case where σ is not known and we must estimate σ from our SRS by the sample standard deviation s. In Chapter 6 we used the one-sample z statistic

$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$

which has the N(0,1) distribution.

Replacing σ by s, we now use the one-sample t statistic

$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$$

which has the t distribution with n−1 degrees of freedom. When σ is not known, we estimate it with the sample standard deviation s, and then we estimate the standard deviation of $\bar{x}$ by $s/\sqrt{n}$.

Standard Error

When the standard deviation of a statistic is estimated from the data, the result is called the standard error of the statistic. The standard error of the sample mean is

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}.$$

The t Distributions

Suppose that an SRS of size n is drawn from an N(μ, σ) population. Then the one-sample t statistic

$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$$

has the t distribution with n−1 degrees of freedom.

Degrees of freedom

There is a different t distribution for each sample size. A particular t distribution is specified by giving the degrees of freedom. The degrees of freedom for this t statistic come from the sample standard deviation s in the denominator of t.

History of Statistics

The t distributions were discovered in 1908 by William S. Gosset. Gosset was a statistician employed by the Guinness brewing company, which required that he not publish his discoveries under his own name. He therefore wrote under the pen name “Student.” The t distribution is called “Student’s t” in his honor.

Figure. Density Curve for the standard normal and t(5) distributions. Both are symmetric with center 0. The t distributions have more probability in the tails than does the standard normal distribution due to the extra variability caused by substituting the random variable s for the fixed parameter σ .

We use t(k) to stand for the t distribution with k degrees of freedom.

The One-Sample t Confidence Interval

Suppose that an SRS of size n is drawn from a population having unknown mean μ. A level C confidence interval for μ is

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the value for the t(n−1) density curve with area C between $-t_{\alpha/2}$ and $t_{\alpha/2}$. This interval is exact when the population distribution is normal and is approximately correct for large n in other cases.

So the margin of error for the population mean when we use the data to estimate σ is

$$t_{\alpha/2}\frac{s}{\sqrt{n}}.$$

Example: In fiscal year 1996, the U.S. Agency for International Development provided 238,300 metric tons of corn soy blend (CSB) for development programs and emergency relief in countries throughout the world. CSB is a highly nutritious, low-cost fortified food that is partially precooked and can be incorporated into different food preparations by the recipients. As part of a study to evaluate appropriate vitamin C levels in this commodity, measurements were taken on samples of CSB produced in a factory. The following data are the amounts of vitamin C, measured in milligrams per 100 grams of blend (dry basis), for a random sample of size 8 from a production run:

26, 31, 23, 22, 11, 22, 14, 31

We want to find a 95% confidence interval for μ, the mean vitamin C content of the CSB produced during this run. The sample mean is $\bar{x}$ = 22.50 and the standard deviation is s = 7.19 with degrees of freedom n−1 = 7. The standard error is

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{7.19}{\sqrt{8}} = 2.54.$$

From Table VI we find t* = 2.365. The 95% confidence interval is

$$\bar{x} \pm t_{0.05/2}\frac{s}{\sqrt{n}} = 22.50 \pm 2.365\times\frac{7.19}{\sqrt{8}} = 22.50 \pm 2.365\times 2.54 = 22.5 \pm 6.0 = (16.5,\ 28.5).$$

We are 95% confident that the mean vitamin C content of the CSB for this run is between 16.5 and 28.5 mg/100 g.
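The whole interval can be computed in Maple (a quick sketch; outputs as comments):

> xbar := 22.50: s := 7.19: n := 8:
> tstar := stats[statevalf,icdf,studentst[n-1]](0.975);       # about 2.365
> evalf([xbar - tstar*s/sqrt(n), xbar + tstar*s/sqrt(n)]);    # about [16.5, 28.5]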

7.3 Large-Sample Confidence Interval for a Population Proportion

Sampling Distribution of p̂

The mean of the sampling distribution of $\hat{p}$ is p; that is, $\hat{p}$ is an unbiased estimator of p.

The standard deviation of the sampling distribution of $\hat{p}$ is $\sqrt{pq/n}$, where q = 1−p.

For large samples, the sampling distribution of $\hat{p}$ is approximately normal. A sample size is considered large if the interval $\hat{p} \pm 3\sigma_{\hat{p}}$ does not include 0 or 1.

Large-Sample Confidence Interval for p

$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $\hat{p} = x/n$ and $\hat{q} = 1-\hat{p}$.

Note: When n is large, $\hat{p}$ can approximate the value of p in the formula for $\sigma_{\hat{p}}$.

Adjusted (1−α)100% Confidence Interval for a Population Proportion, p

$$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$$

where $\tilde{p} = \dfrac{x+2}{n+4}$ is the adjusted sample proportion of observations with the characteristic of interest, x is the number of successes in the sample, and n is the sample size.
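Here is a small Maple sketch of the adjusted interval; the counts x = 40 and n = 100 are made up for illustration, not from the text:

> x := 40: n := 100:                          # hypothetical counts
> pstar := evalf((x + 2)/(n + 4));            # adjusted proportion, about 0.404
> z := stats[statevalf,icdf,normald](0.975):  # 95% confidence
> [pstar - z*sqrt(pstar*(1-pstar)/(n+4)), pstar + z*sqrt(pstar*(1-pstar)/(n+4))];
> # gives approximately [0.31, 0.50]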

7.4 Determining The Sample Size

Sample Size Determination for (1−α)100% Confidence Intervals for μ

In order to estimate μ to within a bound B with (1−α)100% confidence, the required sample size is found as follows:

$$z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = B$$

The solution can be written in terms of B as follows:

$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{B^2}$$

The value of σ is usually unknown. It can be estimated by the standard deviation s from a prior sample. Alternatively, we may approximate the range R of observations in the population and (conservatively) estimate σ ≈ R/4.
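A small Maple sketch of the calculation (the bound B = 1 and the guess σ = 6 are hypothetical):

> B := 1: sigma := 6:                          # hypothetical bound and sigma estimate
> z := stats[statevalf,icdf,normald](0.975):   # 95% confidence
> n := ceil(z^2*sigma^2/B^2);                  # returns 139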

Sample Size Determination for (1−α)100% Confidence Intervals for p

In order to estimate a binomial probability p to within a bound B with (1−α)100% confidence, the required sample size is found by solving the following equation for n:

$$z_{\alpha/2}\sqrt{\frac{pq}{n}} = B$$

The solution can be written in terms of B as follows:

$$n = \frac{(z_{\alpha/2})^2\,pq}{B^2}$$

Since the value of the product pq is unknown, it can be estimated by using the sample fraction of successes, $\hat{p}$, from a prior sample.

Remember (Table 7.5) that the value of pq is at its maximum when p equals 0.5, so you can obtain conservatively large values of n by approximating p by 0.5 or values close to 0.5. In any case, you should round the value of n obtained upward to ensure that the sample size will be sufficient to achieve the specified reliability.
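For instance, to estimate p to within a hypothetical bound B = 0.03 with 95% confidence using the conservative p = 0.5 (a quick sketch):

> B := 0.03: p := 0.5:                        # hypothetical bound, conservative p
> z := stats[statevalf,icdf,normald](0.975):
> n := ceil(z^2*p*(1-p)/B^2);                 # returns 1068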

8. Inference Based on a Single Sample: Tests of Hypothesis

We’ll see how to utilize sample information to test what the value of a population parameter may be. This type of inference is called a test of hypothesis. We’ll also see how to conduct a test of hypothesis about a population mean and a population proportion.

8.1–8.3 Large-Sample Test of Hypothesis about a Population Mean

A test of significance consists of four steps:

1. Specify the null and alternative hypotheses.
2. Calculate the test statistic.
3. Calculate the P-value.
4. Give a complete conclusion.

Null Hypothesis

The statement being tested in a test of significance is called the null hypothesis. The test of significance is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of “no effect” or “no difference”.

We abbreviate “null hypothesis” as H0 and

“alternative hypothesis” as Ha . These are

statements about a parameter in the population, or beliefs about the truth. The alternative hypothesis is usually what the investigator wishes to establish or prove. The null hypothesis is just the logical opposite of the alternative.

Example Suppose we work for a consumer testing group that is to evaluate a new cigarette that the manufacturer claims has low tar (average less than 5mg per cig).

From our perspective,

the alternative hypothesis is Ha: μ > 5, because we only care whether the average is too high, i.e., not consistent with what the tobacco company claims.

The null hypothesis is then H0: μ ≤ 5, or the opposite of the alternative. These are both statements about the true average tar content of the cigarettes, a parameter.

Three possible cases for hypotheses:

  Case 1          Case 2          Case 3
  H0: μ = μ0      H0: μ ≤ μ0      H0: μ ≥ μ0
  Ha: μ ≠ μ0      Ha: μ > μ0      Ha: μ < μ0

The symbol μ0 stands for the value of μ that is assumed under the null hypothesis.

Test statistics

In the second step we summarize the experimental evidence into a summary statistic.

From the example, suppose there were n = 36 cigarettes tested and they had $\bar{x}$ = 5.5 mg and σ = 1.2. We summarize this information with a z-statistic. The test statistic for this problem is:

$$z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{5.5-5}{1.2/\sqrt{36}} = 2.5$$

What does z-value mean?

Well, it is usually easier to discuss such things in terms of probabilities. The test statistic is used to compute a P-value, which is the probability of getting a test statistic at least as extreme as the z-value observed, where the probability is computed when the null hypothesis is true. This is what the third step in the process is about.

P-values


The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence

against H0 provided by the data.

The second definition is easier to understand. The P-value is the tail area associated with the calculated test statistic value, in the distribution it is known to have when the null hypothesis is true. From both of these statements, you can see that the P-value is a probability.

From our tobacco example, the P-value is the probability of observing a value of Z more extreme than 2.5. What does more extreme mean here? It is specified by the direction of the alternative hypothesis; in our problem it is “greater than.” This means that the P-value we want is

P(Z ≥ 2.5) = 1 − P(Z < 2.5) = 1 − 0.9938 = 0.0062.
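The same P-value computation in Maple (a quick check; outputs as comments):

> z := (5.5 - 5)/(1.2/sqrt(36));          # returns 2.5
> 1 - stats[statevalf,cdf,normald](z);    # returns about 0.0062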

Now it is time for the fourth step in a test: the conclusion. We can compare the P-value we calculated with a fixed value that we regard as decisive. The decisive value of P is called the significance level. It is denoted by α , the Greek letter alpha.

α = P(Type I error)
  = P(rejecting the null hypothesis when in fact the null hypothesis is true)

Statistical Significance

If the P-value is as small as or smaller than α, we say that the data are statistically significant at level α, or we say “reject the null hypothesis (H0) at level α.”

If we choose α = 0.05, then from our tobacco example the P-value is .0062.

Since the P-value = .0062 is less than α = 0.05, we say “reject the null hypothesis (H0: μ ≤ 5) at level α = 0.05.”

Note

We usually choose α = 0.05 or α = 0.01. But if we choose α = 0.01, then we are insisting on stronger evidence against H0 than in the case of α = 0.05. In this course, I will ask for statistical significance at α = 0.05.

A test of significance is a recipe for assessing the significance of the evidence provided by data against a null hypothesis. The four steps common to all tests of significance are as follows:

1. State the null hypothesis H0 and the alternative hypothesis Ha. The test is designed to assess the strength of the evidence against H0; Ha is the statement that we will accept if the evidence enables us to reject H0.

2. Calculate the value of the test statistic on which the test will be based. This statistic usually measures how far the data are from H0.

3. Find the P-value for the observed data. This is the probability, calculated assuming that H0 is true, that the test statistic will weigh against H0 at least as strongly as it does for these data.

4. State a conclusion. One way to do this is to choose a significance level α, how much evidence against H0 you regard as decisive. If the P-value is less than or equal to α, you conclude that the data provide sufficient evidence to reject the null hypothesis in favor of the alternative.

Here is the conclusion for our example problem.

We have evidence for the alternative hypothesis: the average tar content μ is actually above 5 mg per cigarette. This contradicts the company claim that this is a low-tar cigarette. Let's call the lawyers and start the complaint process with the tobacco industry.

Z Test for a Population Mean

To test the hypothesis H0: μ = μ0 based on an SRS of size n from a population with unknown mean μ and known standard deviation σ, compute the test statistic

$$z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$$

In terms of a standard normal random variable Z, the P-value for a test of H0 against

  Ha: μ > μ0  is  P(Z ≥ z)
  Ha: μ < μ0  is  P(Z ≤ z)
  Ha: μ ≠ μ0  is  2P(Z ≥ |z|)

These P-values are exact if the population distribution is normal.

8.4 Small-Sample Test of Hypothesis about a Population Mean

The One-Sample t Test

Suppose that an SRS of size n is drawn from a population having unknown mean μ. To test the hypothesis H0: μ = μ0, compute the one-sample t statistic

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$

In terms of a random variable T having the t(n−1) distribution, the P-value for a test of H0 against

  Ha: μ > μ0  is  P(T ≥ t)
  Ha: μ < μ0  is  P(T ≤ t)
  Ha: μ ≠ μ0  is  2P(T ≥ |t|)

These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases.

Example: The specifications for the CSB described in the previous example state that the mixture should contain 2 pounds of vitamin premix for every 2000 pounds of product. These specifications are designed to produce a mean (μ) vitamin C content in the final product of 40 mg/100 g. We can test the null hypothesis that the mean vitamin C content of the production run conforms to these specifications. Specifically, we test

H0: μ = 40
Ha: μ ≠ 40

Recall that n = 8, $\bar{x}$ = 22.50, and s = 7.19. The t test statistic is

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{22.5-40}{7.19/\sqrt{8}} = -6.88$$

Because the degrees of freedom are n−1 = 7, this t statistic has the t(7) distribution.

Figure: The P-value for this example. The figure shows that the P-value is 2P(T ≥ 6.88), where T has the t(7) distribution. From the table, we see that P(T ≥ 5.408) = 0.0005. Therefore, we conclude that the P-value is less than 2×0.0005. Since the P-value is smaller than α = 0.05, we can reject H0 and conclude that the vitamin C content for this run is below the specifications.
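The t statistic and two-sided P-value can be checked in Maple (a sketch; outputs as comments):

> t := evalf((22.5 - 40)/(7.19/sqrt(8)));   # about -6.88
> 2*stats[statevalf,cdf,studentst[7]](t);   # about 0.0002, below the 2 x 0.0005 bound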

Example: For the vitamin C problem described in the previous example, we want to test whether or not vitamin C is lost or destroyed by the production process. Here we test

H0: μ = 40
Ha: μ < 40

The t test statistic does not change: t = −6.88.

Figure: The P-value for this example. As the figure illustrates, however, the P-value is now P(T ≤ −6.88). From the table, we can determine that P ≤ 0.0005. We conclude that the production process has lost or destroyed some of the vitamin C.

8.5 Large-Sample Test of Hypothesis about a Population Proportion

In this section we consider inference about a population proportion p from an SRS of size n, based on the sample proportion $\hat{p} = X/n$, where X is the number of successes in the sample.

Large-Sample Significance test for a Population Proportion

Draw an SRS of size n from a large population with unknown proportion p of successes. To test the hypothesis H0: p = p0, compute the z-statistic

$$z = \frac{\hat{p}-p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

In terms of a standard normal random variable Z, the appropriate P-value for a test of H0 against

  Ha: p > p0  is  P(Z ≥ z)
  Ha: p < p0  is  P(Z ≤ z)
  Ha: p ≠ p0  is  2P(Z ≥ |z|)

Example: The French naturalist Count Buffon once tossed a coin 4040 times and obtained 2048 heads. This is a binomial experiment with n = 4040. The sample proportion is

$$\hat{p} = \frac{2048}{4040} = 0.5069$$

If Buffon’s coin was balanced, then the probability of obtaining heads on any toss is 0.5. To assess whether the data provide evidence that the coin was not balanced, we test

H0: p = 0.5
Ha: p ≠ 0.5

The test statistic is

$$z = \frac{\hat{p}-0.5}{\sqrt{\dfrac{0.5(1-0.5)}{4040}}} = \frac{0.5069-0.5}{\sqrt{\dfrac{0.5(1-0.5)}{4040}}} = 0.88$$

The figure illustrates the calculation of the P-value. From Table IV we find

P(Z ≤ 0.88) = 0.8106.

The probability in each tail is 1 − 0.8106 = 0.1894, and the P-value is P = 2×0.1894 = 0.38. Since the P-value is larger than α = 0.05, we do not reject H0: p = 0.5 at the level α = 0.05.
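A Maple version of the same computation (a sketch; outputs as comments):

> phat := 2048/4040.;                            # about 0.5069
> z := (phat - 0.5)/sqrt(0.5*(1 - 0.5)/4040);    # about 0.88
> 2*(1 - stats[statevalf,cdf,normald](z));       # about 0.38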

Figure The P-value for Example 8.2.

Example: A coin was tossed n = 4040 times and we observed X = 1992 tails. We want to test the null hypothesis that the coin is fair – that is, that the probability of tails is 0.5. So p is the probability that the coin comes up tails and we test

H0: p = 0.5
Ha: p ≠ 0.5

The test statistic is

$$z = \frac{\hat{p}-0.5}{\sqrt{\dfrac{0.5(1-0.5)}{4040}}} = \frac{0.4931-0.5}{\sqrt{\dfrac{0.5(1-0.5)}{4040}}} = -0.88$$

Using Table IV, we find that

P = 2×0.1894 = 0.38.

Since the P-value is larger than α = 0.05, we do not reject H0: p = 0.5 at the level α = 0.05.

9. Inferences Based on Two Samples

Now that we’ve learned to make inferences about a single population, we’ll learn how to compare two populations.

For example, we may wish to compare the mean gas mileages for two models of automobiles, or the mean reaction times of men and women to a visual stimulus.

In this chapter we’ll see how to decide whether differences exist and how to estimate the differences between population means and proportions.

9.1 Comparing two population means: Independent Sampling

One of the most commonly used significance tests is the comparison of two population means μ1 and μ2 .

Two-sample Problems

The goal of inference is to compare the responses in two groups. Each group is considered to be a sample from a distinct population. The responses in each group are independent of those in the other group.

A two sample problem can arise from a randomized comparative experiment that randomly divides the subjects into two groups and exposes each group to a different treatment. The two samples may be of different sizes.

Two-Sample z Statistic

Suppose that $\bar{x}_1$ is the mean of an SRS of size n1 drawn from an N(μ1, σ1) population and that $\bar{x}_2$ is the mean of an SRS of size n2 drawn from an N(μ2, σ2) population. Then the two-sample z statistic

$$z = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}$$

has the standard normal N(0,1) sampling distribution.

Large-Sample Confidence Interval for μ1−μ2

$$(\bar{x}_1-\bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$

Assumptions: The two samples are randomly selected in an independent manner from the two populations. The sample sizes n1 and n2 are large enough.

Example for C.I. of μ1−μ2

Example for Test of Significance

In the unlikely event that both population standard deviations are known, the two-sample z statistic is the basis for inference about μ1−μ2 . Exact z procedures are seldom used

because σ 1and σ 2 are rarely known.

The two-sample t procedures

Suppose that the population standard deviations σ 1 and σ 2 are not known. We estimate them by

the sample standard deviations s1 and s2 from our two samples.

The Pooled two-sample t procedures

The pooled two-sample t procedures are used when we can safely assume that the two populations have equal variances. The modification in the procedure is the use of the pooled estimator of the common unknown variance

$$s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}.$$

This is called the pooled estimator of σ².

When both populations have variance σ², the addition rule for variances says that $\bar{x}_1-\bar{x}_2$ has variance equal to the sum of the individual variances, which is

$$\frac{\sigma^2}{n_1}+\frac{\sigma^2}{n_2} = \sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)$$

The standardized difference of means in this equal-variance case is

$$z = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sigma\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$

This is a special two-sample z statistic for the case in which the populations have the same σ. Replacing the unknown σ by the estimate $s_p$ gives a t statistic. The degrees of freedom are n1+n2−2.

The Pooled Two-Sample t Procedures

Suppose that an SRS of size n1 is drawn from a normal population with unknown mean μ1 and that an independent SRS of size n2 is drawn from another normal population with unknown mean μ2. Suppose also that the two populations have the same standard deviation. A level C confidence interval for μ1−μ2 is given by

$$(\bar{x}_1-\bar{x}_2) \pm t^{*}\, s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

Here t* is the value for the t(n1+n2−2) density curve with area C between −t* and t*.

To test the hypothesis H0: μ1 = μ2, compute the pooled two-sample t statistic

$$t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}$$

In terms of a random variable T having the t(n1+n2−2) distribution, the P-value for a test of H0 against

  Ha: μ1 > μ2  is  P(T ≥ t)
  Ha: μ1 < μ2  is  P(T ≤ t)
  Ha: μ1 ≠ μ2  is  2P(T ≥ |t|)

Example: Take Group 1 to be the calcium group and Group 2 to be the placebo group. The evidence that calcium lowers blood pressure more than a placebo is assessed by testing

H0: μ1 ≤ μ2
Ha: μ1 > μ2

Here are the summary statistics for the decrease in blood pressure:

  Group   Treatment   n    x̄        s
  1       Calcium     10   5.000    8.743
  2       Placebo     11   −0.273   5.901

The calcium group shows a drop in blood pressure, and the placebo group has a small increase. The sample standard deviations do not rule out equal population standard deviations: a difference this large will often arise by chance in samples this small, so we are willing to assume equal population standard deviations. The pooled sample variance is

$$s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2} = \frac{(10-1)(8.743)^2+(11-1)(5.901)^2}{10+11-2} = 54.536,$$

so that $s_p = \sqrt{54.536} = 7.385$.

The pooled two-sample t statistic is

$$t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}} = \frac{5.000-(-0.273)}{7.385\sqrt{\dfrac{1}{10}+\dfrac{1}{11}}} = \frac{5.273}{3.227} = 1.634$$

The P-value is P(T ≥ 1.634), where T has the t(19) distribution. From the table, we can see that P lies between 0.05 and 0.10. The experiment found no evidence that calcium reduces blood pressure (t = 1.634, df = 19, 0.05 < P < 0.10).
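The pooled analysis can be reproduced in Maple (a sketch; outputs as comments):

> n1 := 10: x1 := 5.000: s1 := 8.743:
> n2 := 11: x2 := -0.273: s2 := 5.901:
> sp := sqrt(((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2));   # about 7.385
> t := evalf((x1 - x2)/(sp*sqrt(1/n1 + 1/n2)));        # about 1.634
> 1 - stats[statevalf,cdf,studentst[n1+n2-2]](t);      # about 0.06, between 0.05 and 0.10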

Example: We estimate that the effect of calcium supplementation is the difference between the sample means of the calcium and the placebo groups, $\bar{x}_1-\bar{x}_2 = 5.273$ mm. A 90% confidence interval for μ1−μ2 uses the critical value t* = 1.729 from the t(19) distribution. The interval is

$$(\bar{x}_1-\bar{x}_2) \pm t^{*}\, s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} = (5.000-(-0.273)) \pm (1.729)(7.385)\sqrt{\frac{1}{10}+\frac{1}{11}} = 5.273 \pm 5.579$$

We are 90% confident that the difference in means is in the interval (−0.306, 10.852). The calcium treatment reduced blood pressure by about 5.3 mm more than the placebo on average, but the margin of error for this estimate is 5.6 mm.

Approximate Small-Sample Procedures when the two populations have different variances (σ1² ≠ σ2²)

Suppose that the population standard deviations σ1 and σ2 are not known. We estimate them by the sample standard deviations s1 and s2 from our two samples.

Equal Sample Sizes (n1 = n2 = n)

The confidence interval for μ1−μ2 is given by

$$(\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2+s_2^2}{n}}$$

To test the hypothesis H0: μ1 = μ2, compute the two-sample t statistic

$$t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{s_1^2+s_2^2}{n}}}$$

where t is based on df ν = n1+n2−2 = 2(n−1).

Unequal Sample Sizes (n1 ≠ n2)

The confidence interval for μ1−μ2 is given by

$$(\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$

To test the hypothesis H0: μ1 = μ2, compute the two-sample t statistic

$$t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}$$

where t is based on degrees of freedom

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}.$$

Note: The value of ν will generally not be an integer. Round ν down to the nearest integer to use the t table.
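This degrees-of-freedom formula is easy to evaluate in Maple; here it is sketched with the summary numbers from the reading-activities example coming up (s1 = 11.01, n1 = 21, s2 = 17.15, n2 = 23):

> s1 := 11.01: n1 := 21: s2 := 17.15: n2 := 23:
> v := (s1^2/n1 + s2^2/n2)^2 / ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1));
> floor(v);   # about 37; the conservative rule below uses min(n1,n2) - 1 = 20 instead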

The Two-Sample t Significance Test

Suppose that an SRS of size n1 is drawn from a normal population with unknown mean μ1 and that an independent SRS of size n2 is drawn from another normal population with unknown mean μ2. To test the hypothesis H0: μ1 = μ2, compute the two-sample t statistic

$$t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}$$

and use P-values or critical values for the t(k) distribution, where the degrees of freedom k are the smaller of n1−1 and n2−1.

Example An educator believes that new directed reading activities in the classroom will help elementary school pupils improve some aspects of their reading ability. She arranges for a third-grade class of 21 students to take part in these activities for an eight-week period. A control classroom of 23 third-graders follows the same curriculum without the activities. At the end of the eight weeks, all students are given a Degree of Reading Power (DRP) test, which measures the aspects of reading ability that the treatment is designed to improve. The summary statistics using Excel are

                        Treatment Group   Control Group
  Mean                  51.47619048       41.52173913
  Standard Error        2.402002188       3.575758061
  Median                53                42
  Mode                  43                42
  Standard Deviation    11.00735685       17.14873323
  Sample Variance       121.1619048       294.0790514
  Kurtosis              0.803583546       0.614269919
  Skewness              −0.626692173      0.309280608
  Range                 47                75
  Minimum               24                10
  Maximum               71                85
  Sum                   1081              955
  Count                 21                23

Because we hope to show that the treatment (Group 1) is better than the control (Group 2), the hypotheses are

H0: μ1 = μ2  vs.  Ha: μ1 > μ2

The two-sample t statistic is

$$t = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}} = \frac{51.48-41.52}{\sqrt{\dfrac{11.01^2}{21}+\dfrac{17.15^2}{23}}} = 2.31$$

The P-value for the one-sided test is P(T ≥ 2.31). The degrees of freedom k equal the smaller of n1−1 = 21−1 = 20 and n2−1 = 23−1 = 22. Comparing 2.31 with the entries in the table for 20 degrees of freedom, we see that P lies between 0.01 and 0.02. The data strongly suggest that directed reading activity improves the DRP score (t = 2.31, df = 20, 0.01 < P < 0.02).
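A Maple check of this test (a sketch; outputs as comments):

> t := (51.48 - 41.52)/sqrt(11.01^2/21 + 17.15^2/23);   # about 2.31
> 1 - stats[statevalf,cdf,studentst[20]](t);            # about 0.016, between 0.01 and 0.02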

Example: We will find a 95% confidence interval for the mean improvement in the entire population of third-graders. The interval is

$$(\bar{x}_1-\bar{x}_2) \pm t^{*}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} = (51.48-41.52) \pm t^{*}\sqrt{\frac{11.01^2}{21}+\frac{17.15^2}{23}} = 9.96 \pm 4.31\,t^{*}$$

From the previous example, we use the t(20) distribution. Table D gives t* = t(0.025, 20) = 2.086. With this approximation we have

$$9.96 \pm 4.31\times 2.086 = 9.96 \pm 8.99 = (1.0,\ 18.9)$$

We can see that zero is outside of the interval (1.0, 18.9). We can say that μ1−μ2 is not equal to zero.

9.2 Comparing two population means: Paired Difference Experiments

Matched Pairs t procedures

One application of the one-sample t procedure is to the analysis of data from matched pairs studies. We compute the differences between the two values of a matched pair (often before and after measurements on the same unit) to produce a single sample value. The sample mean and standard deviation of these differences are computed.

Paired Difference Confidence Interval for μD = μ1−μ2

Large Sample:

$$\bar{x}_D \pm z_{\alpha/2}\frac{\sigma_D}{\sqrt{n_D}} \approx \bar{x}_D \pm z_{\alpha/2}\frac{s_D}{\sqrt{n_D}}$$

Assumption: The sample differences are randomly selected from the population of differences.

Small Sample:

$$\bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}}$$

where $t_{\alpha/2}$ is based on (nD−1) degrees of freedom.

Assumptions:
1. The relative frequency distribution of the population of differences is normal.
2. The sample differences are randomly selected from the population of differences.

Paired Difference Test of Hypothesis for μD = μ1−μ2

One-Tailed Test:
  H0: μD = D0        H0: μD = D0
  Ha: μD < D0   or   Ha: μD > D0

Two-Tailed Test:
  H0: μD = D0
  Ha: μD ≠ D0

Large Sample test statistic:

$$z = \frac{\bar{x}_D-D_0}{\sigma_D/\sqrt{n_D}} \approx \frac{\bar{x}_D-D_0}{s_D/\sqrt{n_D}}$$

Assumption: The sample differences are randomly selected from the population of differences.

Small Sample test statistic:

$$t = \frac{\bar{x}_D-D_0}{s_D/\sqrt{n_D}}$$

where $t_{\alpha/2}$ is based on (nD−1) degrees of freedom.

Assumptions:
1. The relative frequency distribution of the population of differences is normal.
2. The sample differences are randomly selected from the population of differences.

Example: To analyze these data, we first subtract the pretest score from the posttest score to obtain the improvement for each student. These 20 differences form a single sample. They appear in the “Gain” columns in Table 7.1. The first teacher, for example, improved from 32 to 34, so the gain is 34−32 = 2. To assess whether the institute significantly improved the teachers’ comprehension of spoken French, we test

H0: μD = 0
Ha: μD > 0

Here μD is the mean improvement that would be achieved if the entire population of French teachers attended a summer institute. The null hypothesis says that no improvement occurs, and Ha says that posttest scores are higher on average. The 20 differences have $\bar{x}_D = 2.5$ and $s_D = 2.893$.

The one-sample t statistic is

$$t = \frac{\bar{x}_D-D_0}{s_D/\sqrt{n_D}} = \frac{2.5-0}{2.893/\sqrt{20}} = 3.86$$

The P-value is found from the t(19) distribution (n−1 = 20−1 = 19). The table shows that 3.86 lies between the upper 0.001 and 0.0005 critical values of the t(19) distribution, so the P-value lies between 0.0005 and 0.001. “The improvement in score was significant (t = 3.86, df = 19, p = 0.00053).”
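The same paired analysis in Maple (a sketch; outputs as comments):

> t := evalf((2.5 - 0)/(2.893/sqrt(20)));     # about 3.86
> 1 - stats[statevalf,cdf,studentst[19]](t);  # about 0.0005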

Example: A 90% confidence interval for the mean improvement in the entire population requires the critical value $t_{\alpha/2}$ = 1.729 from the table. The confidence interval is

$$\bar{x}_D \pm t_{\alpha/2}\frac{s_D}{\sqrt{n_D}} = 2.5 \pm 1.729\,\frac{2.893}{\sqrt{20}} = 2.5 \pm 1.12 = (1.38,\ 3.62)$$

The estimated average improvement is 2.5 points, with margin of error 1.12 for 90% confidence. Though statistically significant, the effect of the institute was rather small.

9.3 Comparing two population proportions: Independent Sampling

Suppose a presidential candidate wants to compare the preference of registered voters in the northeastern United States (NE) to those in the southeastern United States (SE). Such a comparison would help determine where to concentrate campaign efforts.

Properties of the Sampling Distribution of ($\hat{p}_1-\hat{p}_2$)

1. The mean of the sampling distribution of ($\hat{p}_1-\hat{p}_2$) is (p1−p2); that is,

E($\hat{p}_1-\hat{p}_2$) = (p1−p2),

which means that ($\hat{p}_1-\hat{p}_2$) is an unbiased estimator of (p1−p2).

2. The standard deviation of the sampling distribution of ($\hat{p}_1-\hat{p}_2$) is

$$\sigma_{(\hat{p}_1-\hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}$$

3. If the sample sizes n1 and n2 are large, the sampling distribution of ($\hat{p}_1-\hat{p}_2$) is approximately normal.

Large-Sample Confidence Interval for (p1−p2)

$$(\hat{p}_1-\hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}} \approx (\hat{p}_1-\hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Assumption: The two samples are independent random samples. Both samples should be large enough that the normal distribution provides an adequate approximation to the sampling distributions of $\hat{p}_1$ and $\hat{p}_2$.

Large-Sample Test of Hypothesis about (p1−p2)

One-Tailed Test:
  H0: (p1−p2) = 0        H0: (p1−p2) = 0
  Ha: (p1−p2) < 0   or   Ha: (p1−p2) > 0

Two-Tailed Test:
  H0: (p1−p2) = 0
  Ha: (p1−p2) ≠ 0

Test statistic:

$$z = \frac{\hat{p}_1-\hat{p}_2}{\sigma_{(\hat{p}_1-\hat{p}_2)}}$$

Note:

$$\sigma_{(\hat{p}_1-\hat{p}_2)} = \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}} \approx \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$

where $\hat{p} = \dfrac{x_1+x_2}{n_1+n_2}$ is the pooled sample proportion under H0.

Assumption: Same as for the large-sample confidence interval for (p1−p2).

9.4 Determining the Sample Size

Determination of Sample Size for Estimating (μ1−μ2)

To estimate (μ1−μ2) to within a given bound B with probability (1−α), use the following formula to solve for equal sample sizes that will achieve the desired reliability:

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(\sigma_1^2+\sigma_2^2)}{B^2}$$

You will need to substitute estimates for the values of σ1² and σ2² before solving for the sample size. These estimates might be sample variances s1² and s2² from prior sampling, or from an educated guess based on the range – that is, s ≈ R/4.

Determination of Sample Size for Estimating (p1−p2)

To estimate (p1−p2) to within a given bound B with probability (1−α), use the following formula to solve for equal sample sizes that will achieve the desired reliability:

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2(p_1 q_1+p_2 q_2)}{B^2}$$

You will need to substitute estimates for the values of p1 and p2 before solving for the sample size. These estimates might be based on prior samples, obtained from educated guesses or, most conservatively, specified as p1 = p2 = 0.5.

10. Analysis of Variance

In this chapter we extend the methodology of chapters 7-9 in two important ways. First, we discuss the critical elements in the design of a sampling experiment. Then we see how to analyze the experiment in order to compare more than two populations. We’ll look at several of the more popular experimental designs.

Sampling selects a part of the population of interest to represent the whole.

Done properly, sampling can yield reliable information about a population.

Two basic types of studies are observational studies and designed experiments. An observational study observes individuals and measures variables of interest and does not attempt to influence the responses. A sample survey is a type of observational study. A designed experiment deliberately imposes some treatment or conditions on individuals to observe their responses. A drug study where some patients get drug A and others get drug B is an example of an experiment. The study investigator imposes which subjects get which drug. When we wish to assess cause-and-effect relationships, an experiment is the only true way to evaluate the effects of the experimental conditions. Observational studies can shed light, but they tend not to be as convincing as a well-designed experiment.

10.1 Elements of a Designed Experiment

Definition 10.1

The response variable is the variable of interest to be measured in the experiment. We also refer to the response as the dependent variable.

The response might be the SAT scores of a high school senior, the total sales of a firm last year, or the total income of a particular household this year.

Experimental Units, Subjects, Treatment

The individuals on which the experiment is done are the experimental units. When the units are human beings, they are called subjects. A specific experimental condition applied to the units is called a treatment.

Example Does regularly taking aspirin help protect people against heart attack?

The Physicians’ Health Study looked at the effects of two drugs: aspirin and beta-carotene.

The subjects were 21,996 male physicians. There were two factors, each having two levels: aspirin (yes or no) and beta-carotene (yes or no). Combinations of the levels of these factors form the four treatments shown in the figure. One-fourth of the subjects were assigned to each of these treatments, so there are four treatments in the experiment.

Figure The treatments in the Physicians’ Health Study.

The results show that 239 of the placebo group but only 139 of the aspirin group had suffered a heart attack.

Definition 10.2

Factors are those variables whose effect on the response is of interest to the experimenter.

Definition 10.3

Factor levels are the values of the factor utilized in the experiment.

Definition 10.4

The treatments of an experiment are the factor-level combinations utilized.

10.2 The Completely Randomized Design

This design assigns experimental units to treatments with random assignment.

Figure Outline of a completely randomized design comparing three treatments.

General ANOVA Summary Table for a Completely Randomized Design

  Source       df     SS           MS                 F
  Treatments   p−1    SST          MST = SST/(p−1)    MST/MSE
  Error        n−p    SSE          MSE = SSE/(n−p)
  Total        n−1    SS(Total)

The variation between the treatment means (the variation due to treatment) is measured by the Sum of Squares for Treatments (SST):

$$SST = \sum_{i=1}^{3} n_i(\bar{x}_i-\bar{x})^2$$

where $\bar{x}$ is the overall mean and $\bar{x}_i$ is the ith treatment mean.

The sampling variability within the treatments (the variation due to random sampling) is measured by the Sum of Squares for Error (SSE):

$$SSE = (n_1-1)s_1^2+(n_2-1)s_2^2+(n_3-1)s_3^2$$

where s1², s2², and s3² are the sample variances associated with the three treatments.

Test to Compare p Treatment Means for a Completely Randomized Design

H0: μ1 = μ2 = ⋯ = μp
Ha: At least two treatment means differ

Test statistic:

$$F = \frac{MST}{MSE}$$

Assumptions:
1. Samples are selected randomly and independently from the respective populations.
2. All p population probability distributions are normal.
3. The p population variances are equal.

Rejection region: F > F_α, where F_α is based on ν1 = (p−1) numerator degrees of freedom (associated with MST) and ν2 = (n−p) denominator degrees of freedom (associated with MSE). Also, reject H0 if the p-value is less than α.

One-Way ANOVA Partitions Total Variation

The total variation splits into two pieces: among-groups variation (Sum of Squares Among or Between, i.e., the Sum of Squares for Treatments, SST) and within-groups variation (Sum of Squares Within, i.e., the Sum of Squares for Error, SSE).

Figure Responses X for Groups 1-3 scattered about the overall mean X̄ (total variation).

Total Variation

SS(Total) = (X11 − X̄)² + (X21 − X̄)² + … + (Xij − X̄)²

Figure Group means X̄1, X̄2, X̄3 about the overall mean X̄ (treatment variation).

Treatment Variation

SST = n1(X̄1 − X̄)² + n2(X̄2 − X̄)² + … + np(X̄p − X̄)²

Figure Observations within each group scattered about their own group mean (random variation).

Random (Error) Variation

SSE = (X11 − X̄1)² + (X21 − X̄1)² + … + (Xpj − X̄p)²

One-Way ANOVA F-Test Test Statistic

1. Test Statistic

F = MST/MSE, where MST is the Mean Square for Treatment and MSE is the Mean Square for Error

2. Degrees of Freedom

ν1 = p − 1 and ν2 = n − p, where p = number of populations, groups, or levels and n = total sample size

Figure F distribution with critical value Fα(p − 1, n − p): do not reject H0 for values of F below the critical value; reject H0 in the upper α tail.

One-Way ANOVA F-Test Example

As production manager, you want to see if 3 filling machines have different mean filling times. You assign 15 similarly trained & experienced workers, 5 per machine, to the machines. At the .05 level, is there a difference in mean filling times?

Mach1   Mach2   Mach3
25.40   23.40   20.00
26.31   21.80   22.20
24.10   23.50   19.75
23.74   22.75   20.60
25.10   21.60   20.40

H0: μ1 = μ2 = μ3
Ha: Not all equal
α = .05; ν1 = 2, ν2 = 12
Critical value: F.05 = 3.89

Test statistic:

F = MST/MSE = 23.5820/.9211 = 25.6

Decision: Reject H0 at α = .05.

There is evidence the population means are different.

Summary Table

Source of Variation    Degrees of Freedom   Sum of Squares   Mean Square (Variance)   F
Treatment (Machines)   3 − 1 = 2            47.1640          23.5820                  25.60
Error                  15 − 3 = 12          11.0532          .9211
Total                  15 − 1 = 14          58.2172
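As a quick check of this hand computation, here is a minimal R sketch (R commands of the kind used in the R-Web examples later in this document; the object names time, machine, and mach.fit are ours):

time <- c(25.40, 26.31, 24.10, 23.74, 25.10,   # Machine 1
          23.40, 21.80, 23.50, 22.75, 21.60,   # Machine 2
          20.00, 22.20, 19.75, 20.60, 20.40)   # Machine 3
machine <- factor(rep(c("Mach1", "Mach2", "Mach3"), each = 5))
mach.fit <- aov(time ~ machine)
summary(mach.fit)   # reproduces SST = 47.164, SSE = 11.053, F = 25.60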

10.3 Multiple Comparisons of Means

Consider a completely randomized design with three treatments, A, B, and C. Suppose we determine that the treatment means are statistically different via the ANOVA F-test of Section 10.2. To complete the analysis, we want to rank the three treatment means.

In the three-treatment experiment, for example, we would construct confidence intervals for the following differences: μA − μB, μA − μC, and μB − μC. In general, if there are p treatment means, there are

c = p(p − 1)/2

pairs of means that can be compared.

Tukey Procedure

1. Tells which population means are significantly different (example: μ1 = μ2 ≠ μ3)
2. A post hoc procedure, done after rejection of equal means in ANOVA
3. Output available from many statistical computer programs
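Continuing the filling-machine sketch above, the Tukey procedure is one line in R (a sketch, assuming the mach.fit object from that example):

TukeyHSD(mach.fit, conf.level = 0.95)
# simultaneous confidence intervals for all c = 3(3−1)/2 = 3 pairwise
# differences of machine means, adjusted for multiple comparisons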

Experimental Designs: Completely Randomized (one-way ANOVA), Randomized Block, and Factorial (two-way ANOVA).

10.4 The Randomized Block Design

Block Design. A block is a group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a block design, the random assignment of units to treatments is carried out separately within each block.

Figure Outline of a block design. The blocks consist of male and female subjects. The treatments are three therapies for cancer.

1. Experimental Units (Subjects) Are Assigned Randomly to Blocks

Blocks are Assumed Homogeneous

2. One Factor or Independent Variable of Interest

2 or More Treatment Levels or Classifications

3. One Blocking Factor

General ANOVA Summary Table for a Randomized Block Design

Source      df        SS          MS                F
Treatments  p-1       SST         MST = SST/(p-1)   MST/MSE
Block       b-1       SSB         MSB = SSB/(b-1)
Error       n-p-b+1   SSE         MSE
Total       n-1       SS(Total)

The variation between the treatment means is measured by the Sum of Squares for Treatments (SST).

SST = ∑_{i=1}^{p} b(x̄Ti − x̄)²

where x̄Ti represents the sample mean for the i-th treatment, b (the number of blocks) is the number of measurements for each treatment, and p is the number of treatments.

The blocks also account for some of the variation among the different responses. The sampling variability between the blocks is measured by the Sum of Squares for Blocks (SSB):

SSB = ∑_{i=1}^{b} p(x̄Bi − x̄)²

where x̄Bi represents the sample mean for the i-th block and p (the number of treatments) is the number of measurements in each block.

The total variation is

SS(Total) = ∑_{i=1}^{n} (xi − x̄)²

Then the variation attributable to sampling error is found by subtraction:

SSE = SS(Total) − SST − SSB

H0: μ1 = μ2 = ⋯ = μp (all population means are equal; no treatment effect)
Ha: Not all μj are equal (at least one population mean differs; treatment effect)

Figure Population distributions under H0 (μ1 = μ2 = μ3) and under Ha (μ1 = μ2 ≠ μ3).

Test to Compare p Treatment Means for a Randomized Block Design

H0 : μ1=μ2=⋯=μ p

Ha: At least two treatment means differ

Test statistic

F = MST/MSE

Rejection region: F > Fα, where Fα is based on ν1 = (p−1) numerator degrees of freedom (associated with MST) and ν2 = (n−b−p+1) denominator degrees of freedom (associated with MSE). Also,

Reject H0 if p-value is less than α .

Assumptions:

1. The probability distributions of observations corresponding to all the block-treatment combinations are normal.

2. The variances of all probability distributions are equal.

Randomized Block F-Test Example

Golfer (Block)   Brand A   Brand B   Brand C   Brand D
B1               202.4     203.2     223.7     203.6
B2               242.0     248.7     259.8     240.7
B3               220.4     227.3     240.0     207.4
B4               230.0     243.1     247.7     226.9
B5               191.6     211.4     218.7     200.1
B6               247.7     253.0     268.1     244.0
B7               214.8     214.8     233.9     195.8
B8               245.4     243.6     257.8     227.9
B9               224.0     231.5     238.2     215.7
B10              252.2     255.2     265.4     245.2

Excel Output

Anova: Two-Factor Without Replication

SUMMARY    Count   Sum      Average   Variance
B1         4       832.9    208.225   106.6825
B2         4       991.2    247.8     76.28667
B3         4       895.1    223.775   185.0692
B4         4       947.7    236.925   100.8958
B5         4       821.8    205.45    143.8033
B6         4       1012.8   253.2     112.3133
B7         4       859.3    214.825   241.9358
B8         4       974.7    243.675   150.4492
B9         4       909.4    227.35    93.96333
B10        4       1018     254.5     70.36

Brand A    10      2270.5   227.05    410.6428
Brand B    10      2331.8   233.18    341.5507
Brand C    10      2453.3   245.33    297.6312
Brand D    10      2207.3   220.73    352.4534

ANOVA
Source of Variation   SS         df   MS         F          P-value     F crit
Rows                  12073.88   9    1341.542   66.26468   4.504E-16   2.250133
Columns               3298.657   3    1099.552   54.31172   1.448E-11   2.960348
Error                 546.6208   27   20.24521
Total                 15919.16   39
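The same analysis can be reproduced in R; a minimal sketch (the object names dist, brand, and golfer are ours):

dist <- c(202.4, 203.2, 223.7, 203.6,  242.0, 248.7, 259.8, 240.7,
          220.4, 227.3, 240.0, 207.4,  230.0, 243.1, 247.7, 226.9,
          191.6, 211.4, 218.7, 200.1,  247.7, 253.0, 268.1, 244.0,
          214.8, 214.8, 233.9, 195.8,  245.4, 243.6, 257.8, 227.9,
          224.0, 231.5, 238.2, 215.7,  252.2, 255.2, 265.4, 245.2)
brand  <- factor(rep(c("A", "B", "C", "D"), times = 10))   # treatment
golfer <- factor(rep(paste0("B", 1:10), each = 4))         # block
summary(aov(dist ~ brand + golfer))
# brand:  df 3, SS 3298.7,  F = 54.3  (the Columns row above)
# golfer: df 9, SS 12073.9, F = 66.3  (the Rows row above)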

10.5 Factorial Experiments

All the experiments discussed in Sections 10.2 and 10.4 were single-factor experiments. The treatments were levels of a single factor, with the sampling of experimental units performed using either a completely randomized or randomized block design.

Experimental Designs: Completely Randomized (one-way ANOVA), Randomized Block, and Factorial (two-way ANOVA).

Factorial Design

1. Experimental Units (Subjects) Are Assigned Randomly to Treatments

Subjects are Assumed Homogeneous

2. Two or More Factors or Independent Variables

Each Has 2 or More Treatments (Levels)

3. Analyzed by Two-Way ANOVA

Definition 10.9

A complete factorial experiment is one in which every factor-level combination is utilized. That is, the number of treatments in the experiment equals the total number of factor-level combinations.

Factorial Design Example

                        Factor 2 (Training Method)
Factor 1 (Motivation)   Level 1   Level 2   Level 3
Level 1 (High)          19 hr.    20 hr.    22 hr.
                        11 hr.    17 hr.    31 hr.
Level 2 (Low)           27 hr.    25 hr.    31 hr.
                        29 hr.    30 hr.    49 hr.

Advantages of Factorial Designs

1. Saves Time & Effort

e.g., Could Use Separate Completely Randomized Designs for Each Variable

2. Controls Confounding Effects by Putting Other Variables into Model

3. Can Explore Interaction Between Variables

The 2 × 3 factor-level combinations in the example above form 6 treatments.

Two-Way ANOVA

1. Tests the Equality of 2 or More Population Means When Several Independent Variables Are Used

2. Same Results as Separate One-Way ANOVA on Each Variable

No Interaction Can Be Tested

3. Used to Analyze Factorial Designs

Two-Way ANOVA Data Table

X i j k

Level i

Factor A

Level j

Factor B

Observation k

Factor

Factor B

A 1

2

...

b

1 X

111

X

121

...

X

1b1

X 112

X

122

...

X

1b2

2

X

211

X

221

...

X

2b1

X 212

X

222

...

X

2b2

:

:

:

:

:

a X

a11

X

a21

...

X

ab1

X a12

X

a22

...

X

ab2

Two-Way ANOVA Total Variation Partitioning

SS(Total) is partitioned into variation due to treatment A (SSA), variation due to treatment B (SSB), variation due to interaction (SSAB), and variation due to random sampling (SSE).

Two-Way ANOVA Summary Table

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square   F
A (Row)               a − 1                SS(A)            MS(A)         MS(A)/MSE
B (Column)            b − 1                SS(B)            MS(B)         MS(B)/MSE
AB (Interaction)      (a−1)(b−1)           SS(AB)           MS(AB)        MS(AB)/MSE
Error                 n − ab               SSE              MSE
Total                 n − 1                SS(Total)

(The Total row is the same as in the other designs.)

Tests Conducted in Analyses of Factorial Experiments: Completely Randomized Design, r Replicates per Treatment

Test for Treatment Means

H0: No difference among the ab treatment means

Ha: At least two treatment means differ

Test statistic: F = MST/MSE

Rejection region: F > Fα, where Fα is based on ν1 = (ab−1) numerator degrees of freedom and ν2 = (n−ab) denominator degrees of freedom. (Note: n = abr.)

Test for Factor Interaction

H0: Factors A and B do not interact to affect the response mean

Ha: Factors A and B do interact to affect the response mean

Test statistic: F = MS(AB)/MSE

Rejection region: F > Fα, where Fα is based on ν1 = (a−1)(b−1) numerator degrees of freedom and ν2 = (n−ab) denominator degrees of freedom.

Test for Main Effect Factor A

H0: No difference among the a mean levels of factor A

Ha: At least two factor A mean levels differ

Test statistic: F = MS(A)/MSE

Rejection region: F > Fα, where Fα is based on ν1 = (a−1) numerator degrees of freedom and ν2 = (n−ab) denominator degrees of freedom.

Test for Main Effect Factor B

H0: No difference among the b mean levels of factor B

Ha: At least two factor B mean levels differ

Test statistic: F = MS(B)/MSE

Rejection region: F > Fα, where Fα is based on ν1 = (b−1) numerator degrees of freedom and ν2 = (n−ab) denominator degrees of freedom.

Effects of Motivation (High or Low) & Training Method (A, B, C) on Mean Learning Time

Figure Average response plotted against training method (A, B, C) for high and low motivation: non-parallel profiles indicate interaction; parallel profiles indicate no interaction.

Assumptions for All Tests

1. The response distribution for each factor-level combination (treatment) is normal.
2. The response variance is constant for all treatments.
3. Random and independent samples of experimental units are associated with each treatment.
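A minimal R sketch of these factorial tests, using the 2 × 3 motivation × training example of Section 10.5 (the data layout and object names are ours):

hours <- c(19, 11, 20, 17, 22, 31,   # high motivation, methods 1-3
           27, 29, 25, 30, 31, 49)   # low motivation, methods 1-3
motivation <- factor(rep(c("High", "Low"), each = 6))
method     <- factor(rep(rep(1:3, each = 2), times = 2))
summary(aov(hours ~ motivation * method))
# rows: motivation (a−1 = 1 df), method (b−1 = 2 df),
# motivation:method interaction ((a−1)(b−1) = 2 df), residuals (n−ab = 6 df)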


11. Simple Linear Regression

Learning Objectives

1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Ordinary Least Squares
4. Compute Regression Coefficients
5. Predict the Response Variable
6. Interpret Computer Output

Models

A model is a representation of some phenomenon. A mathematical model is a mathematical expression of some phenomenon, often describing relationships between variables.

Types:

o Deterministic Models
o Probabilistic Models

Deterministic Models

1. Hypothesize Exact Relationships

2. Suitable When Prediction Error is Negligible

3. Example: Force Is Exactly Mass Times Acceleration

F = m·a

Probabilistic Models

1. Hypothesize 2 Components

Deterministic component
Random error

Y = Deterministic component + Random error

where Y is the variable of interest.

2. Example: Sales volume is 10 times advertising spending, plus random error:

Y = 10X + Random error

The random error may be due to factors other than advertising.

Types of Probabilistic Models

A First-Order (Straight-Line) Probabilistic Model

y=β0+β1 x+ε

where

y is the dependent (or response) variable,

x is the independent (or predictor) variable,

Probabilistic models include regression models, correlation models, and other models.

Figure A straight line Y = mX + b: b is the Y-intercept, and the slope m is the change in Y per unit change in X.

E( y )=β0+β1 x is the deterministic portion of the model,

β0 is y-intercept of the line, and

β1 is slope of the line.

Regression Models

1. Answer ‘What Is the Relationship Between the Variables?’

2. Equation Used

1 Numerical Dependent (Response) Variable

What Is to Be Predicted

1 or More Numerical or Categorical Independent (Explanatory) Variables

3. Used Mainly for Prediction & Estimation

Regression Modeling Steps

1. Hypothesize Deterministic Component

2. Estimate Unknown Model Parameters

3. Specify Probability Distribution of Random Error Term

Estimate Standard Deviation of Error

4. Evaluate Model

5. Use Model for Prediction & Estimation

Regression models divide into Simple (1 explanatory variable) and Multiple (2+ explanatory variables), each either Linear or Non-Linear.

Figure The unknown population relationship Yi = β0 + β1Xi + εi is estimated from a random sample by the fitted line Ŷi = β̂0 + β̂1Xi + ε̂i.

Sample Linear Regression Model

Figure Observed values scatter about the fitted line Ŷi = β̂0 + β̂1Xi; ε̂i is the residual (random error) for an observed value, and the line can also be used for unsampled observations.

Origins of Regression:

"Regression analysis was first developed by Sir Francis Galton in the latter part of the 19th century. Galton had studied the relation between heights of fathers and sons and noted that the heights of sons of both tall and short fathers appeared to 'revert' or 'regress' to the mean of the group. He considered this tendency to be a regression to 'mediocrity.' Galton developed a mathematical description of this tendency, the precursor to today's regression models." (From page 6 of Neter, Kutner, Nachtsheim, and Wasserman, 1996.)

Regression Line

The regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Table. Mean height of children in Kalama, Egypt, age from 18 to 29 months.

Scattergram

1. Plot of all (Xi, Yi) pairs
2. Suggests how well the model will fit

Figure. Mean height of children in Kalama, Egypt, plotted against age from 18 to 29 months, from Table 2.7.

Figure. The regression line fitted to the Kalama data and used to predict height at age 32 months.

In Figure, we have drawn the regression line with the equation

Height = 64.93 + (0.635 × age)

This means that b = 0.635 is the slope of the line and a = 64.93 is the intercept.

If we substitute 32 for the age in the equation,

Height = 64.93 + (0.635 × 32) = 85.25 centimeters.

How would you draw a line through the points? How do you determine which line ‘fits best’?

Least Squares

1. 'Best fit' means the differences between actual Y values and predicted Y values are a minimum. But positive differences offset negative ones, so the differences are squared:

The least squares regression line is the

straight line y=a+bx which minimizes the sum of the squares of the vertical distances between the line and the observed values y.

∑(error)² = ∑(observed y − predicted y)²

2. LS Minimizes the Sum of the Squared Differences (SSE)

Least Square regression

If we predict 85.25 centimeters for the mean height at age 32 months and the actual mean turns out to be 84 centimeters, our error is

Error = observed height – predicted height

= 84 -85.25 = -1.25 centimeters

Figure The least-squares idea: make the errors in predicting y as small as possible by minimizing the sum of their squares.

Least Squares Graphically

Figure Four observations and their vertical deviations ε̂1, ε̂2, ε̂3, ε̂4 from the fitted line Ŷi = β̂0 + β̂1Xi. Least squares minimizes

∑_{i=1}^{n} ε̂i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²

Coefficient Equations

Sample slope:

β̂1 = [∑_{i=1}^{n} XiYi − (∑Xi)(∑Yi)/n] / [∑_{i=1}^{n} Xi² − (∑Xi)²/n]

Sample Y-intercept:

β̂0 = Ȳ − β̂1X̄

Prediction equation:

Ŷi = β̂0 + β̂1Xi

Interpretation of Coefficients

1. Slope (β̂1)

Estimated Y changes by β̂1 for each 1-unit increase in X. If β̂1 = 2, then sales (Y) is expected to increase by 2 for each 1-unit increase in advertising (X).

2. Y-Intercept (β̂0)

Average value of Y when X = 0. If β̂0 = 4, then average sales (Y) is expected to be 4 when advertising (X) is 0.

Parameter Estimation Example

You’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4

What is the relationship between sales & advertising?

Parameter Estimation Solution

Coefficient Interpretation

1. Slope (β̂1)

Sales volume (Y) is expected to increase by .7 units for each $1 increase in advertising (X).

2. Y-Intercept (β̂0)

Average value of sales volume (Y) is −.10 units when advertising (X) is 0.

Difficult to Explain to Marketing Manager

Expect Some Sales Without Advertising

Parameter Estimation Computer Output

β̂1 = [∑XiYi − (∑Xi)(∑Yi)/n] / [∑Xi² − (∑Xi)²/n]
    = [37 − (15)(10)/5] / [55 − (15)²/5]
    = 7/10 = 0.70

β̂0 = Ȳ − β̂1X̄ = 2 − (0.70)(3) = −0.10

Parameter Estimates

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    -0.1000              0.6350           -0.157              0.8849
ADVERT     1    0.7000               0.1914           3.656               0.0354

Figure Barry Bonds least-squares regression line: RBI (0 to 180) plotted against HR (0 to 80), with the fitted line overlaid.

Typically, the equation of the least squares regression line is obtained by computer software with a regression function.

Excel output from Barry Bonds Statistics

            Coefficients
Intercept   39.7618446
Slope       1.568414403

RESIDUAL OUTPUT

Observation   Predicted RBI   Residuals
1             64.85647505     -16.8564750
2             78.97220467     -19.9722046
3             77.40379027     -19.4037902
4             69.56171826     -11.5617182
5             91.5195199      22.4804801
6             78.97220467     37.02779533
7             93.0879343      9.912065698
8             111.9089071     11.09109286
9             97.79317751     -16.7931775
10            91.5195199      12.4804801
11            105.6352495     23.36475047
12            102.4984207     -1.49842072
13            97.79317751     24.20682249
14            93.0879343      -10.0879343
15            116.6141503     -10.6141503
16            154.256096      -17.2560960
17            91.5195199      -16.5195199

From Dr. Chris Bilder’s website.

Select Tools > Data Analysis from the main Excel menu bar to bring up the Data Analysis window. Select Regression and OK to produce the Regression window. Below is the finished window.

The Residual option produces the residuals in the output. The Line Fit Plots option produces a

plot similar to a scatter plot with an estimated regression line plotted upon it. Notice the above output does not look exactly like a scatter plot with estimated regression line plotted upon it. Below is one way to fix the plot. Note that other steps are often necessary to make the plot more “professional” looking (changing the scale on the axes, adding tick marks, changing graph titles, etc…)

1) Change background from grey to white
   a) Right click on the grey background (a menu should appear)
   b) Select Format Plot Area to bring up the corresponding window:
      i) Select None as the area
      ii) Select OK

2) Remove legend
   a) Right click in the legend
   b) Select Clear

3) Create the regression line
   a) Right click on one of the estimated Y values (should be in pink); a menu should appear
   b) Select Format Data Series to bring up the corresponding window:
      i) Under Marker, select None
      ii) Under Line, select Automatic
      iii) Select OK

Linear Regression Assumptions

1. Mean of Probability Distribution of Error Is 0

2. Probability Distribution of Error Has Constant Variance

3. Probability Distribution of Error is Normal

4. Errors Are Independent

Error Probability Distribution

Measures of Variation in Regression

1. Total Sum of Squares (SSyy)

Measures variation of the observed Yi around the mean Ȳ

2. Explained Variation (SSR)

Variation Due to Relationship Between X & Y

3. Unexplained Variation (SSE)

Variation due to other factors.

Figure Error probability distributions: normal curves of equal spread centered on the regression line at X1 and X2.

Figure Variation measures about the fitted line Ŷi = β̂0 + β̂1Xi: total sum of squares (Yi − Ȳ)², explained sum of squares (Ŷi − Ȳ)², and unexplained sum of squares (Yi − Ŷi)².

Estimation of σ² for a Straight-Line Model

s² = SSE/(n − 2)

where

SSE = ∑_{i=1}^{n} (Yi − Ŷi)²

s = √s² = √(SSE/(n − 2))

We will refer to s as the estimated standard error of the regression model.

Interpretation of s, the Estimated Standard Deviation of ε

We expect most (95%) of the observed y values to lie within 2s of their respective least squares predicted values ŷ.

Test of Slope Coefficient

1. Shows if there is a linear relationship between X and Y
2. Involves the population slope β1
3. Hypotheses: H0: β1 = 0 (no linear relationship); Ha: β1 ≠ 0 (linear relationship)
4. Theoretical basis is the sampling distribution of the slope

Figure Sample regression lines from repeated samples (sample slopes 2.5, 1.6, 1.8, 2.1, …) scatter about the population line; these sample slopes form the sampling distribution of β̂1, centered at β1 with standard error S_β̂1.

Slope Coefficient Test Statistic

Test of an Individual parameter Coefficient in the Simple Linear Regression Model

One-Tailed Test

H0 : β1=0

Ha : β1<0 [or Ha : β1>0 ]

Two-Tailed Test

H0 : β1=0

Ha : β1≠0

Test statistic

t = β̂1 / s_β̂1

where

s_β̂1 = s / √(∑_{i=1}^{n} (Xi − X̄)²)

Rejection region:

t < −tα or t > tα (one-tailed); |t| > t_{α/2} (two-tailed)

where tα and t_{α/2} are based on (n−2) df.

or

Reject H0 : β1=0 if p-value is less than α , (for example, α =0.05 )

A 100(1−α)% Confidence Interval for the β1 Parameter

β̂1 ± t_{α/2}·s_β̂1 = (β̂1 − t_{α/2}·s_β̂1, β̂1 + t_{α/2}·s_β̂1)

where t_{α/2} is based on (n−2) degrees of freedom.

Coefficient of Correlation

Scatterplots provide a visual tool for looking at the relationship between two variables. Unfortunately, our eyes are not good tools for judging the strength of the relationship. Changes in the scale or the amount of white space in the graph can easily affect our judgment of the strength of the relationship. Correlation is a numerical measure we will use to show the strength of linear association.

Figure 2.9 Two scatterplots of the same data

Correlation The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually denoted by r .

Suppose that we have data on variables x and y for n individuals. The means and standard deviations of the two variables are x̄ and Sx for the x-values, and ȳ and Sy for the y-values.

The correlation coefficient r between x and y is

r = (1/(n−1)) ∑ ((xi − x̄)/Sx)((yi − ȳ)/Sy)

The correlation coefficient r has possible values between negative one and positive one. That is, −1≤r≤1 .

When r is positive, there is a positive linear association between the variables; when it is negative, there is a negative linear association. A scatterplot for a dataset with r = 1 would have points in a perfectly straight upward-sloping pattern: all points would fall on a straight line. A scatterplot for a dataset with r = −1 would have points on a perfectly straight downward-sloping line. A value of r near 0 would give a scatterplot with a blob shape and no apparent upward or downward trend.

Figure How the correlation r measures the direction and strength of linear association.

Figure Scatterplots illustrating r² values: r² = 1 (all points on a line, rising or falling), r² = .8 (strong linear pattern), and r² = 0 (no linear pattern).
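As a small check, r for the Hasbro advertising data can be computed in R (a sketch, reusing the ads and sales vectors from the earlier lm sketch):

cor(ads, sales)     # about 0.904
cor(ads, sales)^2   # about 0.8167, the coefficient of determination discussed next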

Coefficient of Determination

Proportion of Variation “Explained” by Relationship between X and Y

r2 = Explained Variation / Total variation

=∑i=1

n

(Y i−Y )2−∑i=1

n

(Y i−Y )2

∑i=1

n

(Y i−Y )2

Practical Interpretation of the Coefficient of Determination

100 (r2) % of the variation in y can be explained by using x to predict y in the straight-line model.

Coefficient of Determination Examples

You're a marketing analyst for Hasbro Toys. You find β̂0 = −0.1 and β̂1 = 0.7.

Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4

Interpret a coefficient of determination of 0.8167: about 81.67% of the variation in sales can be explained by using advertising to predict sales in the straight-line model.

Prediction With Regression Models

1. Types of Predictions

Point Estimates Interval Estimates

2. What Is Predicted

Population Mean Response E(Y) for Given X

Point on Population Regression Line

Individual Response (Yi) for Given X

What Is Predicted

Confidence Interval Estimate of Mean Y

Figure At a given XP, the population regression line E(Y) = β0 + β1X gives the mean Y, E(Y); the individual response Yi and the prediction Ŷ also sit at XP.

Ŷ − t_{n−2, α/2}·S_Ŷ ≤ E(Y) ≤ Ŷ + t_{n−2, α/2}·S_Ŷ

where

S_Ŷ = S·√(1/n + (XP − X̄)² / ∑_{i=1}^{n} (Xi − X̄)²)

Factors Affecting Interval Width

1. Level of confidence (1 − α): width increases as confidence increases
2. Data dispersion (s): width increases as variation increases
3. Sample size: width decreases as sample size increases
4. Distance of XP from the mean X̄: width increases as distance increases

Why distance from the mean?

Figure Two sample regression lines pivot about (X̄, Ȳ); predictions at X2, farther from X̄, show greater dispersion than at X1.

Confidence Interval Estimate Example

You're a marketing analyst for Hasbro Toys. You find β̂0 = −.1, β̂1 = .7, and s = .60553.

Ad $   Sales (Units)
1      1
2      1
3      2
4      2
5      4

Estimate the mean sales when advertising is $4, at the .05 level.

Confidence Interval Estimate Solution

Ŷ = −0.1 + 0.7(4) = 2.7   (X = 4 is the value to be predicted at)

S_Ŷ = 0.60553·√(1/5 + (4 − 3)²/10) = 0.3316

2.7 − 3.1824(0.3316) ≤ E(Y) ≤ 2.7 + 3.1824(0.3316)

1.6445 ≤ E(Y) ≤ 3.7553

Prediction Interval of Individual Response

Ŷ − t_{n−2, α/2}·S_(Y−Ŷ) ≤ YP ≤ Ŷ + t_{n−2, α/2}·S_(Y−Ŷ)

where

S_(Y−Ŷ) = S·√(1 + 1/n + (XP − X̄)² / ∑_{i=1}^{n} (Xi − X̄)²)

Note the extra '1' under the square root. Why the extra term? An individual response Yi = β0 + β1Xi + εi varies about the expected (mean) Y on the population line E(Y) = β0 + β1X, so the Y we're trying to predict at XP needs a wider interval than the mean does.

Figure At XP, the prediction Ŷ lies on the fitted line; the individual Y being predicted scatters about the expected (mean) Y.

Interval Estimate Computer Output

       Dep Var   Pred    Std Err   Low95%   Upp95%   Low95%    Upp95%
Obs    SALES     Value   Predict   Mean     Mean     Predict   Predict
1      1.000     0.600   0.469     -0.892   2.092    -1.837    3.037
2      1.000     1.300   0.332     0.244    2.355    -0.897    3.497
3      2.000     2.000   0.271     1.138    2.861    -0.111    4.111
4      2.000     2.700   0.332     1.644    3.755    0.502     4.897
5      4.000     3.400   0.469     1.907    4.892    0.962     5.837

The row with predicted value 2.700 (X = 4) shows S_Ŷ, the 95% confidence interval for the mean, and the wider 95% prediction interval.
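A minimal R sketch of both intervals (reusing the fit object from the earlier lm sketch):

new <- data.frame(ads = 4)
predict(fit, new, interval = "confidence")   # 2.70 with limits near (1.64, 3.76)
predict(fit, new, interval = "prediction")   # wider limits, near (0.50, 4.90)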

Hyperbolic Interval Bands

Figure Confidence and prediction bands about the fitted line Ŷi = β̂0 + β̂1Xi widen as XP moves away from X̄.

12. Multiple Regression Models

Regression models divide into Simple (1 explanatory variable) and Multiple (2+ explanatory variables), each either Linear or Non-Linear.

Learning Objectives

1. Explain the Linear Multiple Regression Model
2. Test Overall Significance
3. Describe Various Types of Models
4. Evaluate Portions of a Regression Model
5. Interpret Linear Multiple Regression Computer Output
6. Describe Stepwise Regression
7. Explain Residual Analysis
8. Describe Regression Pitfalls

Most practical applications of regression analysis utilize models that are more complex than the simple straight-line model. For example, a realistic probabilistic model for reaction time would include more than just the amount of a particular drug in the bloodstream. Factors such as age, a measure of visual perception, and sex of the subjects are a few of the many variables that might be related to reaction time.

Regression Modeling Steps

1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term; Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation

The model relates 1 dependent variable to 2 or more independent variables as a linear function:

Yi = β0 + β1X1i + β2X2i + ⋯ + βkXki + εi

where Y is the dependent (response) variable, X1, …, Xk are the independent (explanatory) variables, β0 is the population Y-intercept, β1, …, βk are the population slopes, and εi is the random error.

Probabilistic models that include more than one independent variable are called multiple regression models. The general form of these models is given in the box 'The General Multiple Regression Model' below.

The dependent variable Y is now written as a

function of k independent variables,x1 , x2 ,. .. , xk .

The random error term is added to make the model probabilistic rather than deterministic. The

value of the coefficient β i determines the

contribution of the independent variable x i , and β0 is the y-intercept. The coefficients β0 , β1 ,. . . , βk are usually unknown because they represent population parameters.

Actually, x1 , x2 ,. .. , xk can be functions of variables as long as the functions do not contain unknown parameters. For example, the reaction time, Y, of a subject to a visual stimulus could be a function of the independent variables

x1= Age of the subject

x2=( Age)2=x12

x3=1 if male subject, 0 if female subject

The x2 term is called a higher-order term, since

it is the value of a quantitative variable (x1 ) squared (i.e., raised to the second power). The x3 term is an indicator variable representing a qualitative variable (gender).

The General Multiple Regression Model

Y=β0+β1 x1+β2 x2+⋯+βk xk+ε

where

Y is the dependent (or response) variable,

x1, x2, …, xk are the independent (or predictor) variables,

E(Y) = β0 + β1x1 + β2x2 + ⋯ + βkxk is the deterministic portion of the model, and

βi determines the contribution of the independent variable xi.

Figure Bivariate model: the response plane E(Y) = β0 + β1X1i + β2X2i in (X1, X2, Y) space, with an observed Yi = β0 + β1X1i + β2X2i + εi at the point (X1i, X2i).

Population Multiple Regression Model

Analyzing a Multiple Regression Model

1. Hypothesize the deterministic component of the model. This component relates the mean, E(Y), to the independent variables x1, x2, …, xk. This involves the choice of the independent variables to be included in the model.

2. Use the sample data to estimate the unknown model parameters β0, β1, …, βk in the model.

3. Specify the probability distribution of the random error term, ε, and estimate the standard deviation of this distribution, σ.

4. Check that the assumptions on ε are satisfied, and make model modifications if necessary.

5. Statistically evaluate the usefulness of the model.

6. When satisfied that the model is useful, use it for prediction, estimation, and other purposes.

Multiple linear regression

Two or more independent variables are used to estimate 1 dependent variable.

Notes:
1) εi ~ independent N(0, σ²)
2) β0, β1, …, βp−1 are parameters with corresponding estimates
3) Xi1, …, Xi,p−1 are known constants
4) The second subscript on Xij denotes the j-th independent variable
5) i = 1, …, n

Parameter Estimation Example

1. Slope (β̂k): estimated Y changes by β̂k for each 1-unit increase in Xk, holding all other variables constant. Example: if β̂1 = 2, then sales (Y) is expected to increase by 2 for each 1-unit increase in advertising (X1), given the number of sales reps (X2).

2. Y-Intercept (β̂0): average value of Y when all Xk = 0.

You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00).

You've collected the following data:

Resp   Size   Circ
1      1      2
4      8      8
1      3      1
3      5      7
2      6      4
4      10     6

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    0.0640               0.2599           0.246               0.8214
ADSIZE     1    0.2049               0.0588           3.656               0.0399
CIRC       1    0.2805               0.0686           4.089               0.0264
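A minimal R sketch reproducing this fit (the names resp, size, circ, and mfit are ours):

resp <- c(1, 4, 1, 3, 2, 4)
size <- c(1, 8, 3, 5, 6, 10)
circ <- c(2, 8, 1, 7, 4, 6)
mfit <- lm(resp ~ size + circ)
summary(mfit)   # intercept 0.0640, size 0.2049, circ 0.2805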

Interpretation of Coefficients Solution

1. Slope (β̂1): the number of responses to an ad is expected to increase by .2049 (20.49) for each 1 sq. in. increase in ad size, holding circulation constant.

2. Slope (β̂2): the number of responses to an ad is expected to increase by .2805 (28.05) for each 1-unit (1,000) increase in circulation, holding ad size constant.

Assumptions for Random Error ε

1. For any given set of values of x1, x2, …, xk, the random error ε has a normal probability distribution with mean equal to 0 and variance equal to σ².
2. The random errors are independent.

Estimator of σ² for a Multiple Regression Model with k Independent Variables

s² = SSE / [n − (k+1)] = ∑(yi − ŷi)² / [n − (k+1)]

where k + 1 = number of estimated β parameters

Test of an Individual parameter Coefficient in the Multiple Regression Model

One-Tailed Test

H0 : β i=0

Ha : βi<0 [or Ha : βi>0 ]

Two-Tailed Test

H0 : β i=0

Ha : βi≠0

Test statistic

t = β̂i / s_β̂i

Reject H0: βi = 0 if the p-value is less than α (for example, α = 0.05).

A 100(1−α)% Confidence Interval for a β Parameter

β̂i ± t_{α/2}·s_β̂i = (β̂i − t_{α/2}·s_β̂i, β̂i + t_{α/2}·s_β̂i)

where t_{α/2} is based on n − (k+1) degrees of freedom, n = number of observations, and k + 1 = number of β parameters in the model.

1. Shows if there is a linear relationship between all X variables together and Y
2. Uses the F test statistic
3. Hypotheses: H0: β1 = β2 = ... = βk = 0 (no linear relationship); Ha: at least one coefficient is not 0 (at least one X variable affects Y)

Testing Overall Significance

Testing Global Usefulness of the Model: The Analysis of Variance F-Test

H0 : β1=β2=⋯=βk=0

(All model terms are unimportant for predicting y)

Ha: At least one β i≠0

(At least one model term is useful for predicting y)

Test statistic

F = MSR/MSE

Reject H0 if the p-value is less than α.

Testing Overall Significance Computer Output

Analysis of Variance
Source    DF             Sum of Squares   Mean Square   F Value   Prob>F
Model     2  (= k)       9.2497           4.6249        55.440    0.0043
Error     3  (= n−k−1)   0.2503           0.0834
C Total   5  (= n−1)     9.5000

Here F = MS(Model)/MS(Error), and Prob>F is the p-value.

Types of Regression Models

Explanatory variables determine the model type:

1 Quantitative Variable: 1st-Order Model, 2nd-Order Model, 3rd-Order Model
2 or More Quantitative Variables: 1st-Order Model, 2nd-Order Model, Interaction Model
1 Qualitative Variable: Dummy-Variable Model

First-Order Model With 1 Independent Variable

1. Relationship Between 1 Dependent & 1 Independent Variable is Linear

2. Used When Expected Rate of Change in Y Per Unit Change in X is Stable

3. Used With Curvilinear Relationships If Relevant Range Is Linear

First-Order Model Relationships

E(Y )=β0+ β1 X1i

Figure First-order relationships: straight lines rising when β1 > 0 and falling when β1 < 0.

Second-Order Model With 1 Independent Variable

1. Relationship between 1 dependent and 1 independent variable is a quadratic function
2. Useful first model if a non-linear relationship is suspected
3. Model:

E(Y) = β0 + β1X1i + β2X1i²

(linear effect β1X1i; curvilinear effect β2X1i²)

Second-Order Model Relationships

Figure Second-order relationships: curves opening upward when β2 > 0 and downward when β2 < 0.

Third-Order Model With 1 Independent Variable

1. Relationship between 1 dependent and 1 independent variable has a 'wave'
2. Used if there is 1 reversal in curvature
3. Model:

E(Y) = β0 + β1X1i + β2X1i² + β3X1i³

Third-Order Model Relationships

Figure Third-order relationships: one reversal of curvature, bending one way when β3 > 0 and the other when β3 < 0.

Figure First-order model with two independent variables, E(Y) = 1 + 2X1 + 3X2: the effect (slope) of X1 on E(Y) does not depend on the X2 value. For X2 = 0, 1, 2, 3 the lines are E(Y) = 1 + 2X1, 4 + 2X1, 7 + 2X1, and 10 + 2X1, all parallel.

First-Order Model With 2 Independent Variables

1. Relationship Between 1 Dependent & 2 Independent Variables Is a Linear Function

2. Assumes No Interaction Between X1 & X2

Effect of X1 on E(Y) Is the Same Regardless of X2 Values

3. Model

No Interaction

E(Y )=β0+ β1 X1i+β2 X 2i

First-Order Model Relationships

Interaction Model With 2 Independent Variables

1. Hypothesizes Interaction Between Pairs of X Variables

Response to One X Variable Varies at Different Levels of Another X Variable

2. Contains Two-Way Cross Product Terms

3. Can Be Combined With Other Models

Example: Dummy-Variable Model

E(Y )=β0+ β1 X1i+β2 X 2i+β3 X1 i X2 i

Figure Interaction model, E(Y) = 1 + 2X1 + 3X2 + 4X1X2: the effect (slope) of X1 on E(Y) does depend on the X2 value. For X2 = 0, E(Y) = 1 + 2X1; for X2 = 1, E(Y) = 4 + 6X1, so the lines are not parallel.

Effect of Interaction

1. Given: E(Y) = β0 + β1X1i + β2X2i + β3X1iX2i
2. Without the interaction term, the effect of X1 on Y is measured by β1
3. With the interaction term, the effect of X1 on Y is measured by β1 + β3X2; the effect increases as X2i increases

Interaction Model Relationships

Figure Response surfaces in X1 and X2 for different coefficient conditions (β4 + β5 > 0, β4 + β5 < 0, and β3² > 4β4β5).

Second-Order Model With 2 Independent Variables

1. Relationship Between 1 Dependent & 2 or More Independent Variables Is a Quadratic Function

2. Useful 1St Model If Non-Linear Relationship Suspected

3. Model

Second-Order Model Relationships

E(Y) = β0 + β1X1i + β2X2i + β3X1iX2i + β4X1i² + β5X2i²

Types of Regression Models

Dummy-Variable Model

1. Involves Categorical X Variable With 2 Levels

e.g., Male-Female; College-No College

2. Variable Levels Coded 0 & 1

3. Number of Dummy Variables Is 1 Less Than Number of Levels of Variable

4. May Be Combined With Quantitative Variable (1st Order or 2nd Order Model)


Interpreting the Dummy-Variable Model Equation

Given: Ŷi = β̂0 + β̂1X1i + β̂2X2i, where Y = starting salary of college grads, X1 = GPA, and

X2 = 0 if male, 1 if female

Males (X2 = 0):   Ŷi = β̂0 + β̂1X1i + β̂2(0) = β̂0 + β̂1X1i
Females (X2 = 1): Ŷi = β̂0 + β̂1X1i + β̂2(1) = (β̂0 + β̂2) + β̂1X1i

Dummy-Variable Model Relationships

Figure Two parallel lines (same slope β̂1): intercept β̂0 for males and β̂0 + β̂2 for females.

Dummy-Variable Model Example

Computer output gives Ŷi = 3 + 5X1i + 7X2i, with

X2 = 0 if male, 1 if female

Males (X2 = 0):   Ŷi = 3 + 5X1i + 7(0) = 3 + 5X1i
Females (X2 = 1): Ŷi = 3 + 5X1i + 7(1) = (3 + 7) + 5X1i = 10 + 5X1i

Same slopes (5); the female line is higher by 7.
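In R, a dummy variable is created automatically from a factor; a minimal sketch with made-up salary data of our own (for mechanics only, not from the example above):

gpa    <- c(2.8, 3.2, 3.6, 2.9, 3.4, 3.8)                         # made-up
gender <- factor(c("male", "male", "male", "female", "female", "female"))
salary <- c(28, 31, 33, 32, 35, 38)                               # made-up
summary(lm(salary ~ gpa + gender))
# R codes gender as a 0/1 dummy: its coefficient shifts the intercept
# while the gpa slope is shared by both groups (parallel lines)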

Residual Analysis

1. Graphical Analysis of Residuals

Plot the estimated errors vs. the Xi values. The estimated errors, the differences between the actual Yi and the predicted Ŷi, are called residuals. Also plot a histogram or stem-and-leaf display of the residuals.

2. Purposes

Examine the functional form (linear vs. non-linear model) and evaluate violations of assumptions.

Linear Regression Assumptions

1. Mean of Probability Distribution of Error Is 0

2. Probability Distribution of Error Has Constant Variance

3. Probability Distribution of Error is Normal

4. Errors Are Independent

Multicollinearity

1. High Correlation Between X Variables

2. Coefficients Measure Combined Effect

3. Leads to Unstable Coefficients Depending on X Variables in Model

4. Always Exists -- Matter of Degree

5. Example: Using Both Age & Height as Explanatory Variables in Same Model

Correlation Analysis
Pearson Corr Coeff / Prob>|R| under H0: Rho=0 / N=6

           RESPONSE   ADSIZE    CIRC
RESPONSE   1.00000    0.90932   0.93117
           0.0        0.0120    0.0069
ADSIZE     0.90932    1.00000   0.74118
           0.0120     0.0       0.0918
CIRC       0.93117    0.74118   1.00000
           0.0069     0.0918    0.0

The diagonal is all 1's; rY1 and rY2 are the correlations of the response with each predictor, and r12 is the correlation between the two predictors.

Detecting Multicollinearity

1. Examine the correlation matrix: correlations between pairs of X variables greater than their correlations with the Y variable signal trouble

2. Examine the Variance Inflation Factor (VIF): if VIFj > 5, multicollinearity exists

3. Few remedies:
   Obtain new sample data
   Eliminate one correlated X variable

Variance Inflation Factors Computer Output

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    0.0640               0.2599           0.246               0.8214
ADSIZE     1    0.2049               0.0588           3.656               0.0399
CIRC       1    0.2805               0.0686           4.089               0.0264

Variable   DF   Variance Inflation
INTERCEP   1    0.0000
ADSIZE     1    2.2190
CIRC       1    2.2190

Here VIF = 2.219 ≤ 5, so multicollinearity is not severe.

Extrapolation

Figure Interpolation is prediction within the relevant range of X; extrapolation is prediction outside the relevant range, on either side.
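For two predictors, the VIF can be checked by hand in R, since VIF = 1/(1 − r12²); a sketch reusing the size and circ vectors from the multiple regression example:

r12 <- cor(size, circ)   # 0.74118
1 / (1 - r12^2)          # about 2.219, matching the output above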

13. Categorical Data Analysis

Learning Objectives

1. Explain the χ² Test for Proportions

2. Explain the χ² Test of Independence

3. Solve Hypothesis Testing Problems: Two or More Population Proportions; Independence

Data Types

Data divide into Quantitative and Qualitative; quantitative data are Discrete or Continuous.

Qualitative Data

1. Qualitative random variables yield responses that classify. Example: gender (male, female)
2. Measurement reflects the number in each category
3. Nominal or ordinal scale
4. Examples: Do you own savings bonds? Do you live on-campus or off-campus?

Hypothesis Tests for Qualitative Data

For proportions: Z test (1 population), Z test or χ² test (2 populations), χ² test (2 or more populations). For independence: χ² test.

Chi-Square (χ²) Test for k Proportions

1. Tests equality (=) of proportions only. Example: p1 = .2, p2 = .3, p3 = .5
2. One variable with several levels
3. Assumptions: multinomial experiment; large sample size (all expected counts ≥ 5)
4. Uses a one-way contingency table

Multinomial Experiment

1. n Identical Trial2. k Outcomes to Each Trial3. Constant Outcome Probability, pk

4. Independent Trials5. Random Variable is Count, nk

6. Example: Ask 100 People (n) Which of 3 Candidates (k) They Will Vote For

One-Way Contingency Table

Shows the number of observations in k independent groups (outcomes or variable levels). Example with k = 3 outcomes:

Candidate:   Tom   Bill   Mary   Total
Responses:   35    20     45     100

χ² Test for k Proportions

1. Hypotheses: H0: p1 = p1,0, p2 = p2,0, ..., pk = pk,0; Ha: not all pi are equal

2. Test Statistic:

χ² = ∑_{all cells} [ni − E(ni)]² / E(ni)

where ni is the observed count, E(ni) the expected count under the hypothesized probability pi,0, and k the number of outcomes.

3. Degrees of Freedom: k − 1

χ² Test Basic Idea

1. Compares the observed count to the expected count if the null hypothesis is true.
2. The closer the observed count is to the expected count, the more likely H0 is true; this is measured by the squared difference relative to the expected count. If ni = E(ni) for every cell, χ² = 0 and we do not reject H0; large values of χ² lead to rejection.

Finding the Critical Value Example

What is the critical χ² value if k = 3 and α = .05? With df = k − 1 = 2, the χ² table gives 5.991, so reject H0 if χ² > 5.991.

χ² Table (Portion)
       Upper Tail Area
DF     .995    …   .95     …   .05
1      ...     …   0.004   …   3.841
2      0.010   …   0.103   …   5.991

χ² Test for k Proportions Example

As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, and 72 rated Method 3 as fair. At the .05 level, is there a difference in perceptions?

H0: p1 = p2 = p3 = 1/3
Ha: At least 1 is different
α = .05; n1 = 63, n2 = 45, n3 = 72
Critical value: χ².05 = 5.991 (df = 2)

Test statistic: χ² = 6.3

Decision: Reject H0 at α = .05.

Conclusion: There is evidence of a difference in proportions.

χ² Test for k Proportions Solution

E(ni) = n·pi,0

E(n1) = E(n2) = E(n3) = 180(1/3) = 60

χ² = ∑_{all cells} [ni − E(ni)]²/E(ni)
   = [n1 − 60]²/60 + [n2 − 60]²/60 + [n3 − 60]²/60
   = [63 − 60]²/60 + [45 − 60]²/60 + [72 − 60]²/60 = 6.3
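In R (the same command style used in the R-Web section below), this test is one line; a minimal sketch:

chisq.test(c(63, 45, 72), p = rep(1/3, 3))
# X-squared = 6.3, df = 2, p-value about 0.043 < .05, so reject H0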

χ² Test of Independence

1. Shows if a relationship exists between 2 qualitative variables. One sample is drawn; the test does not show causality.
2. Assumptions: multinomial experiment; all expected counts ≥ 5
3. Uses a two-way contingency table

χ² Test of Independence Contingency Table

Shows the number of observations from one sample jointly classified by two qualitative variables (levels of variable 1 by levels of variable 2):

               House Location
House Style    Urban   Rural   Total
Split-Level    63      49      112
Ranch          15      33      48
Total          78      82      160

χ² Test of Independence

1. Hypotheses

H0: the variables are independent
Ha: the variables are related (dependent)

2. Test Statistic

χ² = ∑_{all cells} [nij − E(nij)]² / E(nij)

Degrees of Freedom: (r − 1)(c − 1)

Computing expected cell counts

The null hypothesis is that there is no relationship between row variable and column variable in the population. The alternative hypothesis is that these two variables are related.

Here is the formula for the expected cell counts under the hypothesis of “no relationship”.

Expected Cell Counts

Expected count = (row total × column total) / n

The null hypothesis is tested by the chi-square statistic, which compares the observed counts with the expected counts:

X² = ∑ (observed − expected)² / expected

Under the null hypothesis, X² has approximately the χ² distribution with (r−1)(c−1) degrees of freedom. The P-value for the test is

P(χ² ≥ X²)

where χ² is a random variable having the χ²(df) distribution with df = (r−1)(c−1).

Figure. Chi-Square Test for Two-Way Tables

Example In a study of heart disease in male federal employees, researchers classified 356 volunteer subjects according to their socioeconomic status (SES) and their smoking habits. There were three categories of SES: high, middle, and low. Individuals were asked whether they were current smokers, former smokers, or had never smoked, producing three categories for smoking habits as well. Here is the two-way table that summarizes the data:

This is a 3×3 table, to which we have added the marginal totals obtained by summing across rows and columns. For example, the first-row total is 51+22+43=116. The grand total, the number of subjects in the study, can be computed by summing the row totals, 116+141+99=356, or the column totals, 211+52+93=356.

Observed counts for smoking and SES

                   SES
Smoking   High   Middle   Low   Total
Current   51     22       43    116
Former    92     21       28    141
Never     68     9        22    99
Total     211    52       93    356

Example What is the expected count in the upper-left cell in the table of Example, corresponding to high-SES current smokers, under the null hypothesis that smoking and SES are independent?

The row total, the count of current smokers, is 116. The column total, the count of high-SES subjects, is 211. The total sample size is n=356. The expected number of high-SES current smokers is therefore

(116)(211)/356 = 68.75

We summarize these calculations in a table of expected counts:

Expected counts for smoking and SES

                    SES
Smoking   High    Middle   Low     All
Current   68.75   16.94    30.30   115.99
Former    83.57   20.60    36.83   141.00
Never     58.68   14.46    25.86   99.00
Total     211.0   52.0     92.99   355.99

Computing the chi-square statistic

The expected counts are all large, so we proceed with the chi-square test. We compare the table of observed counts with the table of expected counts using the X² statistic. We must calculate the term for each cell, then sum over all nine cells. For the high-SES current smokers, the observed count is 51 and the expected count is 68.75. The contribution to the X² statistic for this cell is

(51 − 68.75)²/68.75 = 4.583

Similarly, the calculation for the middle-SES current smokers is

(22 − 16.94)²/16.94 = 1.511

The X² statistic is the sum of nine such terms:

X² = ∑ (observed − expected)²/expected
   = (51−68.75)²/68.75 + (22−16.94)²/16.94 + (43−30.30)²/30.30
   + (92−83.57)²/83.57 + (21−20.60)²/20.60 + (28−36.83)²/36.83
   + (68−58.68)²/58.68 + (9−14.46)²/14.46 + (22−25.86)²/25.86
   = 4.583 + 1.511 + 5.323 + 0.850 + 0.008 + 2.117 + 1.480 + 2.062 + 0.576 = 18.51

Because there are r=3 smoking categories and c=3 SES groups, the degrees of freedom for this statistic are

(r-1)(c-1)=(3-1)(3-1)=4

Under the null hypothesis that smoking and SES are independent, the test statistic X² has the χ²(4) distribution. To obtain the P-value, refer to the row in the table corresponding to 4 df.

The calculated value X² = 18.51 lies between the upper critical points corresponding to probabilities 0.001 and 0.0005. The P-value is therefore between 0.001 and 0.0005. Because the expected cell counts are all large, the P-value from Table F will be quite accurate. There is strong evidence (X² = 18.51, df = 4, P < 0.001) of an association between smoking and SES in the population of federal employees.
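The whole calculation is reproduced by chisq.test, the command used in the R-Web section below; a minimal sketch:

ses <- matrix(c(51, 22, 43,
                92, 21, 28,
                68,  9, 22), nrow = 3, byrow = TRUE)
chisq.test(ses)            # X-squared = 18.51, df = 4, p-value about 0.001
chisq.test(ses)$expected   # reproduces the expected-counts table above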

Expected Count Calculation

Expected count = (Row total × Column total) / Sample size

               House Location
               Urban          Rural
House Style    Obs.   Exp.    Obs.   Exp.    Total
Split-Level    63     54.6    49     57.4    112
Ranch          15     23.4    33     24.6    48
Total          78     78      82     82      160

For example: 112·78/160 = 54.6, 112·82/160 = 57.4, 48·78/160 = 23.4, and 48·82/160 = 24.6.

χ² Test of Independence Example

You're a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?

χ² Test of Independence Solution

H0: No relationship
Ha: Relationship
α = .05; df = (2 − 1)(2 − 1) = 1
Critical value: χ².05 = 3.841

Test statistic: χ² = 54.29

Decision: Reject H0 at α = .05.

Conclusion: There is evidence of a relationship.

χ² Test of Independence Thinking Challenge

OK. There is a statistically significant relationship between purchasing Diet Coke & Diet Pepsi. So what do you think the relationship is? Aren’t they competitors?

You Re-Analyze the Data

Low Income
            Diet Pepsi
Diet Coke   No   Yes   Total
No          80   2     82
Yes         8    120   128
Total       88   122   210

High Income
            Diet Pepsi
Diet Coke   No   Yes   Total
No          4    30    34
Yes         40   2     42
Total       44   32    76

True Relationships

Figure The apparent relation between Diet Coke and Diet Pepsi purchases reflects an underlying causal relation through a control or intervening variable (income, the true cause).

Conclusion

1. Explained the χ² Test for Proportions

2. Explained the χ² Test of Independence

3. Solved Hypothesis Testing Problems: Two or More Population Proportions; Independence

Using R-Web Software

Consider University of Illinois business school data:

Major Female Male

Accounting 68 56

Administration 91 40

Economics 5 6

Finance 61 59

We wish to determine if the proportion female differs between the four majors.

This is a test of the null hypothesis Ho: p_ac=p_ad=p_e=p_f

We use the Pearson χ² statistic, as in previous problems.

If the test gives a small p-value, how do we determine if the groups differ?

χ² Contributions

Answer: We look at a table of contributions to the χ² statistic.

Cells with large values are contributing greatly to the overall discrepancy between the observed and expected counts.

Large values tell us which cells to examine more closely.

Residuals

As we have seen previously in regression problems, we can measure the deviation of what was observed from what is expected under H0 by using a residual:

Residual_i = (Oi − Ei) / √Ei

Residual Usage

Think of these residuals as being on a standard normal scale.

This means a residual of -3.26 means the observed count was far less (neg) than what would be expected under the Ho.

A residual of 2.58 means the cell’s observed value was far above what would be expected under Ho.

A residual like .24 or -.39 means the cell is not far from what would be expected under Ho.

The sign + or – of the residual tells if the observed cell count was above or below what is expected under Ho.

Abnormally large (in absolute value) residuals will also have large contributions to χ².

Input the Table

The R-Web command for inputting the Illinois student table data is:

x <- matrix(c(68, 56, 91, 40, 5 , 6, 61, 59), nc = 2, byrow=T)

This means input the cell counts by rows, where the table has 2 columns, (nc=2).

Obtaining Test Statistic & P-Val

chisq.test(x)

This command produces the Pearson χ² test statistic, p-value, and degrees of freedom.

Contributions to χ²

To find the cells that contribute most to the rejection of the Ho, type :

chisq.test(x)$residuals^2

Residuals

Type: chisq.test(x)$residuals

Observed & Expected Tables

Type: chisq.test(x)$observed

chisq.test(x)$expected These will help you understand the table

behavior.

Example

Submit these commands:

x <- matrix(c(68, 56, 91, 40, 5 , 6, 61, 59), nc = 2, byrow=T)

chisq.test(x)

chisq.test(x)$residuals^2

chisq.test(x)$residuals

chisq.test(x)$observed

chisq.test(x)$expected

Pearson's Chi-squared test
data: x
X-squared = 10.8267, df = 3, p-value = 0.0127

Rweb:> chisq.test(x)$residuals^2
          [,1]      [,2]
[1,] 0.2534128 0.3541483
[2,] 2.8067873 3.9225288
[3,] 0.3109070 0.4344974
[4,] 1.1447050 1.5997431

Rweb:> chisq.test(x)$residuals
           [,1]       [,2]
[1,] -0.5034012  0.5951036
[2,]  1.6753469 -1.9805375
[3,] -0.5575903  0.6591641
[4,] -1.0699089  1.2648095

Rweb:> chisq.test(x)$observed
     [,1] [,2]
[1,]   68   56
[2,]   91   40
[3,]    5    6
[4,]   61   59

Rweb:> chisq.test(x)$expected
          [,1]      [,2]
[1,] 72.279793 51.720207
[2,] 76.360104 54.639896
[3,]  6.411917  4.588083
[4,] 69.948187 50.051813

Example Conclusion

First, note that the p-value for the test is small; this is evidence that the proportions female differ between the four majors.

How do they differ? From the contributions to χ² and the residuals we see the second row (Administration) has the biggest discrepancy between observed and expected counts.

From either the residuals or the observed vs. expected tables we see that females are much more likely to major in administration than would be expected, and males less likely than expected under H0.

The administration proportion is much higher than the others for females, and this is the primary major that produces the evidence that the majors differ.

14. Nonparametric Statistics

Learning Objectives

1. Distinguish Parametric & Nonparametric Test Procedures

2. Explain a Variety of Nonparametric Test Procedures

3. Solve Hypothesis Testing Problems Using Nonparametric Tests

4. Compute Spearman’s Rank Correlation

Hypothesis Testing Procedures

HypothesisTesting

Procedures

NonparametricParametric

Z Test

Kruskal-WallisH-Test

WilcoxonRank Sum

Test

t Test One-WayANOVA

Parametric Test Procedures

1. Involve population parameters. Example: population mean
2. Require an interval or ratio scale (whole numbers or fractions). Example: height in inches (72, 60.5, 54.7)
3. Have stringent assumptions. Example: normal distribution
4. Examples: Z test, t test, χ² test

Nonparametric Test Procedures

1. Do not involve population parameters. Example: probability distributions, independence
2. Data measured on any scale: ratio or interval; ordinal (example: good-better-best); or nominal (example: male-female)
3. Example: Wilcoxon rank sum test

Advantages of Nonparametric Tests

1. Used with all scales
2. Easier to compute (developed originally before wide computer use)
3. Make fewer assumptions
4. Need not involve population parameters
5. Results may be as exact as parametric procedures
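Since the Wilcoxon rank sum test is the running example, here is a minimal R sketch with made-up scores of our own:

a <- c(12, 15, 9, 20, 17)    # sample from population 1 (made-up)
b <- c(22, 28, 19, 25, 31)   # sample from population 2 (made-up)
wilcox.test(a, b)   # rank-based test for a location shift; no normality assumed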

Recommended