Chapter 7 Please pick up an assignment sheet and notes packet


Page 1: Chapter 7

Chapter 7

Please pick up an assignment sheet and

notes packet

Page 2: Chapter 7

Random Variable

• A numerical variable whose value depends on the outcome of a chance experiment

• Associates a numerical value with each outcome of a chance experiment

• Two types of random variables: discrete and continuous

A grocery store manager might be interested in the number of broken eggs in each carton (a dozen eggs), or an environmental scientist might be interested in the amount of ozone in an air sample.

Since these values change and are subject to some uncertainty, these are examples of random variables.

Page 3: Chapter 7

Two Types of Random Variables:

• Discrete: its set of possible values is a collection of isolated points along a number line. This is typically a “count” of something.

• Continuous: its set of possible values includes an entire interval on a number line. This is typically a “measure” of something.

In this chapter, we will look at different distributions of discrete and continuous random variables.

Page 4: Chapter 7

Identify the following variables as discrete or continuous:

1. The number of broken eggs in each carton: Discrete

2. The amount of ozone in samples of air: Continuous

3. The weight of a pineapple: Continuous

4. The amount of time a customer spends in a store: Continuous

5. The number of gas pumps in use: Discrete

Page 5: Chapter 7

Probability Distributions for Discrete Random Variables

A probability distribution is a model that describes the long-run behavior of a variable.

Page 6: Chapter 7

[Figure: probability histogram; horizontal axis: Number of Pets, vertical axis: Probability]

In Wolf City (a fictional place), regulations prohibit more than five dogs or cats per household. Let x = the number of dogs and cats in a randomly selected household in Wolf City.

x      0     1     2     3     4     5
P(x)  .26   .31   .21   .13   .06   .03

Is this variable discrete or continuous? What are the possible values for x?

The Department of Animal Control has collected data over the course of several years. They have estimated the long-run probabilities for the values of x.

What do you notice about the sum of these probabilities?

This is called a discrete probability distribution. It can also be displayed in a histogram with the probability on the vertical axis.

Page 7: Chapter 7

Discrete Probability Distribution

1) Gives the probabilities associated with each possible x value

2) Each probability is the long-run relative frequency of occurrence of the corresponding x-value when the chance experiment is performed a very large number of times

3) Usually displayed in a table, but can be displayed with a histogram or formula

Page 8: Chapter 7

Properties of Discrete Probability Distributions

1) For every possible x value, 0 ≤ P(x) ≤ 1.

2) Summing over all values of x, Σ P(x) = 1.
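A quick way to check the two properties is to let the computer do it. The sketch below is Python and assumes the distribution is stored as a dictionary mapping each x value to P(x); the function name `is_valid_distribution` and the rounding tolerance are illustrative choices, not part of the notes.

```python
import math

# Wolf City: x = number of dogs and cats in a randomly selected household
wolf_city = {0: 0.26, 1: 0.31, 2: 0.21, 3: 0.13, 4: 0.06, 5: 0.03}

def is_valid_distribution(dist, tol=1e-9):
    """Check both properties of a discrete probability distribution."""
    # Property 1: every probability lies between 0 and 1 (inclusive)
    in_range = all(0 <= p <= 1 for p in dist.values())
    # Property 2: the probabilities sum to 1 (allowing for floating-point rounding)
    sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

print(is_valid_distribution(wolf_city))         # True
print(is_valid_distribution({0: 0.5, 1: 0.6}))  # False: the probabilities sum to 1.1
```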

Page 9: Chapter 7

Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City

x      0     1     2     3     4     5
P(x)  .26   .31   .21   .13   .06   .03

What is the probability that a randomly selected household in Wolf City has at most 2 pets?

What does this mean?

P(x ≤ 2) =

Just add the probabilities for 0, 1, and 2

.26 + .31 + .21 = .78

Page 10: Chapter 7
Page 11: Chapter 7
Page 12: Chapter 7

• Finish the dog and cat probability problems on the second page of your notes

Page 13: Chapter 7

Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City

x      0     1     2     3     4     5
P(x)  .26   .31   .21   .13   .06   .03

What is the probability that a randomly selected household in Wolf City has less than 2 pets?

What does this mean?

P(x < 2) =

Notice that this probability does NOT include 2!

.26 + .31 = .57

Page 14: Chapter 7

Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City

x      0     1     2     3     4     5
P(x)  .26   .31   .21   .13   .06   .03

What is the probability that a randomly selected household in Wolf City has more than 1 but no more than 4 pets?

What does this mean?

P(1 < x ≤ 4) =

.21 + .13 + .06 = .40

When calculating probabilities for discrete random variables, you MUST pay close attention to whether certain values are included (≤ or ≥) or not included (< or >) in the calculation.
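The same bookkeeping can be written as a small Python sketch: pick out the x values that satisfy the condition, then add their probabilities. The helper name `prob` is hypothetical and used only here; exact fractions are used so the sums print cleanly instead of showing floating-point noise.

```python
from fractions import Fraction

# Exact probabilities avoid floating-point noise such as 0.7799999999999999
wolf_city = {0: Fraction("0.26"), 1: Fraction("0.31"), 2: Fraction("0.21"),
             3: Fraction("0.13"), 4: Fraction("0.06"), 5: Fraction("0.03")}

def prob(dist, condition):
    """Add P(x) over every x value that satisfies the condition."""
    return sum(p for x, p in dist.items() if condition(x))

at_most_2 = prob(wolf_city, lambda x: x <= 2)     # includes 2
less_than_2 = prob(wolf_city, lambda x: x < 2)    # excludes 2
between = prob(wolf_city, lambda x: 1 < x <= 4)   # excludes 1, includes 4

print(float(at_most_2), float(less_than_2), float(between))  # 0.78 0.57 0.4
```

Changing a single `<` to `<=` in a condition changes which x values are added, which is exactly the point of the warning above.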

Page 15: Chapter 7

Suppose that each of four randomly selected customers purchasing a hot tub at a certain store chooses either an electric (E) or a gas (G) model. Assume that these customers make their choices independently of one another and that 40% of all customers select an electric model. This implies that for any particular one of the four customers, P(E) = 0.40 and P(G) = 0.60. One possible experimental outcome is EGGE, where the first and fourth customers select electric models and the other two choose gas models. Because the customers make their choices independently, the multiplication rule for independent events implies that

P(EGGE) = P(1st chooses E AND 2nd chooses G AND 3rd chooses G AND 4th chooses E) = P(E)P(G)P(G)P(E) = (0.4)(0.6)(0.6)(0.4) = 0.0576

Page 16: Chapter 7


Outcomes and Probabilities for Hot Tub Models

Outcome   Probability   # of electric models sold
GGGG      0.1296        0
EGGG      0.0864        1
GEGG      0.0864        1
GGEG      0.0864        1
GGGE      0.0864        1
EEGG      0.0576        2
EGEG      0.0576        2
EGGE      0.0576        2
GEEG      0.0576        2
GEGE      0.0576        2
GGEE      0.0576        2
GEEE      0.0384        3
EGEE      0.0384        3
EEGE      0.0384        3
EEEG      0.0384        3
EEEE      0.0256        4

Page 17: Chapter 7


P(x = 0) = 0.1296
P(x = 1) = 0.3456
P(x = 2) = 0.3456
P(x = 3) = 0.1536
P(x = 4) = 0.0256

P(2 ≤ x ≤ 4) = 0.3456 + 0.1536 + 0.0256 = 0.5248

P(x ≤ 3) = 0.1296 + 0.3456 + 0.3456 + 0.1536 = 0.9744

[Figure: "Probability of Selling X Electric Hot Tubs per Four Customers"; horizontal axis: Number of Electric Hot Tubs Purchased per Four Customers (0 to 4), vertical axis: Relative Probability]
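The table of 16 outcomes can be generated rather than written out by hand. This Python sketch (an illustration assuming, as above, P(E) = 0.4 and independent customers) multiplies the probabilities along each outcome and groups the results by the number of electric models sold.

```python
from itertools import product
from collections import defaultdict

P = {"E": 0.4, "G": 0.6}  # each customer independently picks electric or gas

# Enumerate all 2**4 = 16 outcomes such as ("E", "G", "G", "E")
distribution = defaultdict(float)
for outcome in product("EG", repeat=4):
    prob = 1.0
    for choice in outcome:
        prob *= P[choice]       # multiplication rule for independent events
    x = outcome.count("E")      # number of electric models sold
    distribution[x] += prob

for x in sorted(distribution):
    print(x, round(distribution[x], 4))
# 0 0.1296
# 1 0.3456
# 2 0.3456
# 3 0.1536
# 4 0.0256
print(round(sum(distribution[x] for x in (2, 3, 4)), 4))  # 0.5248
```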

Page 18: Chapter 7

Probability Distributions for Continuous Random Variables

Page 19: Chapter 7

Consider the random variable x = the weight (in pounds) of a full-term newborn child. Suppose that weight is reported to the nearest pound. The following probability histogram displays the distribution of weights.

What type of variable is this? The area of the rectangle centered over 7 pounds represents the probability P(6.5 < x < 7.5).

What is the sum of the areas of all the rectangles?

Now suppose that weight is reported to the nearest 0.1 pound. This would be the probability histogram. Notice that the rectangles are narrower and the histogram begins to have a smoother appearance.

If weight is measured with greater and greater accuracy, the histogram approaches a smooth curve.

The shaded area represents the probability P(6 < x < 8). This is an example of a density curve.

Page 20: Chapter 7

Probability Distributions for Continuous Variables

• Is specified by a curve called a density curve.

• The function that describes this curve is denoted by f(x) and is called the density function.

• The probability of observing a value in a particular interval is the area under the curve and above the given interval.

Page 21: Chapter 7

Properties of continuous probability distributions

1. f(x) ≥ 0 (the curve cannot dip below the horizontal axis)

2. The total area under the density curve equals one.

Page 22: Chapter 7

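Let x denote the amount of gravel sold (in tons) during a randomly selected week at a particular sales facility. Suppose that the density curve has a height f(x) above the value x, where

$$f(x) = \begin{cases} 2(1 - x) & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

The density curve is shown in the figure:

[Figure: density curve falling in a straight line from height 2 at x = 0 to height 0 at x = 1; horizontal axis: Tons, vertical axis: Density]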

Page 23: Chapter 7

Gravel problem continued . . . What is the probability that at most ½ ton of gravel is sold during a randomly selected week?

[Figure: gravel density curve with the area from 0 to 0.5 shaded; horizontal axis: Tons, vertical axis: Density]

P(x ≤ ½) is the shaded area under the curve and above the interval from 0 to 0.5.

This area can be found by using the formula for the area of a trapezoid, $A = \frac{1}{2}(b_1 + b_2)h$, OR, more easily, by finding the area of the unshaded triangle, $A = \frac{1}{2}bh$, and subtracting that area from 1:

P(x ≤ ½) = 1 - ½(0.5)(1) = .75

Page 24: Chapter 7

Gravel problem continued . . . What is the probability that exactly ½ ton of gravel is sold during a randomly selected week?

[Figure: gravel density curve with a vertical line segment at x = 0.5; horizontal axis: Tons, vertical axis: Density]

P(x = ½) would be the area under the curve and above the single point 0.5. How do we find the area of a line segment? Since a line segment has NO area, the probability that exactly ½ ton is sold equals 0:

P(x = ½) = 0

Page 25: Chapter 7

Gravel problem continued . . . What is the probability that less than ½ ton of gravel is sold during a randomly selected week?

[Figure: gravel density curve with the area from 0 to 0.5 shaded; horizontal axis: Tons, vertical axis: Density]

P(x < ½) = 1 - ½(0.5)(1) = .75

Does the probability change whether ½ is included or not? No: P(x < ½) = P(x ≤ ½).

This is different from discrete probability distributions, where including or excluding a value does change the probability!

Page 26: Chapter 7

Suppose x is a continuous random variable defined as the amount of time (in minutes) taken by a clerk to process a certain type of application form. Suppose x has a probability distribution with density function:

$$f(x) = \begin{cases} 0.5 & 4 \le x \le 6 \\ 0 & \text{otherwise} \end{cases}$$

The following is the graph of f(x), the density curve:

[Figure: horizontal density curve at height 0.5 from 4 to 6 minutes; horizontal axis: Time (in minutes), vertical axis: Density]

Page 27: Chapter 7

Application Problem Continued . . . What is the probability that it takes more than 5.5 minutes to process the application form?

[Figure: the density curve with the area from 5.5 to 6 minutes shaded; horizontal axis: Time (in minutes), vertical axis: Density]

Find the probability by calculating the area of the shaded region (base × height):

P(x > 5.5) = (6 - 5.5)(.5) = .5(.5) = .25

When the density is constant over an interval (resulting in a horizontal density

curve), the probability distribution is called a uniform distribution.
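For a uniform density, every probability is a rectangle, so the cumulative area to the left of a value is just base × height. A minimal Python sketch, with the endpoints 4 and 6 taken from the application-form example (the function name `uniform_cdf` is an illustrative choice):

```python
def uniform_cdf(x, lo=4.0, hi=6.0):
    """P(X <= x) for a uniform density of height 1/(hi - lo) on [lo, hi]."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)  # base (x - lo) times height 1/(hi - lo)

# P(x > 5.5) = area of the rectangle from 5.5 to 6 with height 0.5
p_more_than_5_5 = 1 - uniform_cdf(5.5)
print(p_more_than_5_5)  # 0.25
```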

Page 28: Chapter 7

Other Density Curves

Some density curves resemble the one below. Integral calculus is used to find the area under these curves. Don’t worry: we will use tables (with the values already calculated). We can also use calculators or statistical software to find the area.
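To give a sense of what a calculator or software does (a simplified assumption: real packages use more refined methods), the sketch below approximates the area under a density curve by adding up many thin rectangles, using the midpoint rule. It is applied to the gravel density, where the exact answers are already known.

```python
def gravel_density(x):
    """f(x) = 2(1 - x) on [0, 1], and 0 otherwise."""
    return 2 * (1 - x) if 0 <= x <= 1 else 0.0

def area_under(f, a, b, n=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(round(area_under(gravel_density, 0, 1), 6))    # 1.0 (total area)
print(round(area_under(gravel_density, 0, 0.5), 6))  # 0.75
```

For a straight-line density like this one the midpoint rule happens to be exact; for a bell-shaped curve it is only an approximation that improves as `n` grows.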

Page 29: Chapter 7

The probability that a continuous random variable x lies between a lower limit a and an upper limit b is

P(a < x < b) = (cumulative area to the left of b) – (cumulative area to the left of a)

P(a < x < b) = P(x < b) – P(x < a)

This will be useful later in this chapter!
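Using this rule requires the cumulative area to the left of a value. For the gravel density, integrating f(x) = 2(1 - x) from 0 up to a point c gives P(x ≤ c) = 2c - c² for 0 ≤ c ≤ 1 (a short calculus step not shown in the notes). The Python sketch below uses it; the interval from 0.25 to 0.5 is a hypothetical example.

```python
def gravel_cdf(c):
    """Cumulative area to the left of c for f(x) = 2(1 - x) on [0, 1]."""
    if c <= 0:
        return 0.0
    if c >= 1:
        return 1.0
    return 2 * c - c ** 2  # integral of 2(1 - t) from 0 to c

def prob_between(cdf, a, b):
    """P(a < x < b) = P(x < b) - P(x < a) for a continuous random variable."""
    return cdf(b) - cdf(a)

print(gravel_cdf(0.5))                      # 0.75
print(prob_between(gravel_cdf, 0.25, 0.5))  # 0.3125
```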

Page 30: Chapter 7
Page 31: Chapter 7
Page 32: Chapter 7

Means and Standard Deviations of Probability Distributions

• The mean value of a random variable x, denoted by μx, describes where the probability distribution of x is centered.

• The standard deviation of a random variable x, denoted by σx, describes variability in the probability distribution.

Page 33: Chapter 7

Mean and Variance for Discrete Probability Distributions

• The mean is sometimes referred to as the expected value (denoted E(x)):

$$\mu_x = \sum x\,p(x)$$

• Variance is calculated using

$$\sigma_x^2 = \sum (x - \mu_x)^2\,p(x)$$

• Standard deviation is the square root of the variance.

Page 34: Chapter 7

Dogs and Cats Revisited . . . Let x = the number of dogs and cats in a randomly selected household in Wolf City

x      0     1     2     3     4     5
P(x)  .26   .31   .21   .13   .06   .03

What is the mean number of pets per household in Wolf City?

First multiply each x-value by its corresponding probability.

xP(x)  0   .31   .42   .39   .24   .15

Next find the sum of these values.

μx = Σ xP(x) = 0 + .31 + .42 + .39 + .24 + .15 = 1.51 pets

Page 35: Chapter 7

Dogs and Cats Revisited . . . Let x = the number of dogs or cats per household in Wolf City

x      0     1     2     3     4     5
P(x)  .26   .31   .21   .13   .06   .03

What is the standard deviation of the number of pets per household in Wolf City?

First find the deviation of each x-value from the mean. Then square these deviations.

Next multiply by the corresponding probability. Then add these values.

σx² = (0 - 1.51)²(.26) + (1 - 1.51)²(.31) + (2 - 1.51)²(.21) + (3 - 1.51)²(.13) + (4 - 1.51)²(.06) + (5 - 1.51)²(.03) = 1.7499

This is the variance; take the square root of this value.

σx = √1.7499 ≈ 1.323 pets
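The mean and variance calculations for Wolf City can be checked with a short Python sketch that follows the same steps: multiply and add for the mean, then square the deviations, weight them by P(x), and add for the variance. The function name `mean_and_sd` is illustrative.

```python
import math

wolf_city = {0: 0.26, 1: 0.31, 2: 0.21, 3: 0.13, 4: 0.06, 5: 0.03}

def mean_and_sd(dist):
    """Mean, variance, and standard deviation of a discrete distribution."""
    mean = sum(x * p for x, p in dist.items())                    # sum of x * P(x)
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # sum of (x - mean)^2 * P(x)
    return mean, variance, math.sqrt(variance)

mean, variance, sd = mean_and_sd(wolf_city)
print(round(mean, 2), round(variance, 4), round(sd, 3))  # 1.51 1.7499 1.323
```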

Page 36: Chapter 7

Mean and Variance for Continuous Random Variables

For continuous probability distributions, μx and σx can be defined and computed using methods from calculus.

• The mean value μx locates the center of the continuous distribution.

• The standard deviation, σx, measures the extent to which the continuous distribution spreads out around μx.

Page 37: Chapter 7

A company receives concrete of a certain type from two different suppliers. Let

x = compression strength of a randomly selected batch from Supplier 1

y = compression strength of a randomly selected batch from Supplier 2

Suppose that μx = 4650 pounds/inch² and σx = 200 pounds/inch²

μy = 4500 pounds/inch² and σy = 275 pounds/inch²

The first supplier is preferred to the second both in terms of mean value and variability.

[Figure: density curves for x and y on an axis from 4300 to 4900 pounds/inch², with μy and μx marked]

Page 38: Chapter 7

Suppose Wolf City Grocery had a total of 14 employees. The following are the monthly salaries of all the employees.

3500  1200  1900  1400  2100  1800  1200
1300  1500  1700  2300  1200  1400  1300

The mean and standard deviation of the monthly salaries are

μx = $1700 and σx = $603.56

Suppose business is really good, so the manager gives everyone a $100 raise per month. The new mean and standard deviation would be

μx = $1800 and σx = $603.56

What happened to the means? What happened to the standard deviations?

What would happen to the mean and standard deviation if we had to deduct $100 from everyone’s salary because business is bad?

Let’s graph boxplots of these monthly salaries to see what happens to the distributions . . . We see that the distribution just shifts to the right by $100, but the spread is the same.

Page 39: Chapter 7

Wolf City Grocery Continued . . .

μx = $1700 and σx = $603.56

Suppose the manager gives everyone a 20% raise. The new mean and standard deviation would be

μx = $2040 and σx = $724.27

Notice that both the mean and the standard deviation were multiplied by 1.2.

Let’s graph boxplots of these monthly salaries to see what happens to the distributions . . . Notice that multiplying by a constant stretches the distribution, thus changing the standard deviation.
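The effect of adding a constant or multiplying by a constant can be checked directly on the 14 salaries. This Python sketch uses `statistics.pstdev`, the population standard deviation, on the assumption (consistent with the $603.56 above) that the 14 employees are the whole population.

```python
import statistics

# Monthly salaries of all 14 Wolf City Grocery employees (the whole population)
salaries = [3500, 1200, 1900, 1400, 2100, 1800, 1200,
            1300, 1500, 1700, 2300, 1200, 1400, 1300]

def describe(data):
    # pstdev: population standard deviation, since these are ALL the employees
    return statistics.mean(data), statistics.pstdev(data)

print(describe(salaries))                     # mean 1700, sd about 603.56
print(describe([s + 100 for s in salaries]))  # mean 1800, sd unchanged
print(describe([s * 1.2 for s in salaries]))  # mean about 2040, sd about 724.27
```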

Page 40: Chapter 7

Mean and Standard Deviation of Linear Functions

If x is a random variable with mean μx and standard deviation σx, a and b are numerical constants, and the random variable y is defined by

$$y = a + bx$$

then

$$\mu_y = \mu_{a+bx} = a + b\mu_x$$

and

$$\sigma_y^2 = \sigma_{a+bx}^2 = b^2\sigma_x^2 \quad\text{or}\quad \sigma_y = \sigma_{a+bx} = |b|\,\sigma_x$$

Page 41: Chapter 7

Consider the chance experiment in which a customer of a propane gas company is randomly selected. Let x be the number of gallons required to fill a propane tank. Suppose that the mean and standard deviation are 318 gallons and 42 gallons, respectively. The company is considering the pricing model of a service charge of $50 plus $1.80 per gallon. Let y be the random variable of the amount billed. What is the equation for y? What are the mean and standard deviation for the amount billed?

y = 50 + 1.8x

μy = 50 + 1.8(318) = $622.40

σy = 1.8(42) = $75.60
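The two rules translate directly into code. A minimal Python sketch; `linear_transform` is a hypothetical helper name, and `abs(b)` matters only when b is negative.

```python
def linear_transform(mean_x, sd_x, a, b):
    """Mean and standard deviation of y = a + b*x, given the mean and SD of x."""
    mean_y = a + b * mean_x  # the mean shifts by a and scales by b
    sd_y = abs(b) * sd_x     # adding a constant does not change the spread
    return mean_y, sd_y

# Propane bill: $50 service charge plus $1.80 per gallon
mean_y, sd_y = linear_transform(318, 42, a=50, b=1.8)
print(f"mean = ${mean_y:.2f}, sd = ${sd_y:.2f}")  # mean = $622.40, sd = $75.60
```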

Page 42: Chapter 7

Suppose we are going to play a game called Stat Land! Players spin the two spinners below and move the sum of the two numbers.

[Figure: Spinner A with sections 1, 2, 3, 4 and Spinner B with sections 1, 2, 3, 4, 5, 6]

Here are the mean and standard deviation for each spinner.

μA = 2.5      μB = 3.5
σA = 1.118    σB = 1.708

List all the possible sums (A + B).

        B=1   B=2   B=3   B=4   B=5   B=6
A=1      2     3     4     5     6     7
A=2      3     4     5     6     7     8
A=3      4     5     6     7     8     9
A=4      5     6     7     8     9    10

Find the mean and standard deviation for these sums.

μA+B = 6      σA+B = 2.041

Notice that the mean of the sums is the sum of the means!

How are the standard deviations related?

Not sure? Let’s think about it and return in just a few minutes!

Page 43: Chapter 7

Stat Land Continued . . . Suppose one variation of the game had players move the difference of the spinners.

[Figure: Spinner A with sections 1, 2, 3, 4 and Spinner B with sections 1, 2, 3, 4, 5, 6]

μA = 2.5      μB = 3.5
σA = 1.118    σB = 1.708

List all the possible differences (B - A).

        A=1   A=2   A=3   A=4
B=1      0    -1    -2    -3
B=2      1     0    -1    -2
B=3      2     1     0    -1
B=4      3     2     1     0
B=5      4     3     2     1
B=6      5     4     3     2

Find the mean and standard deviation for these differences.

μB-A = 1      σB-A = 2.041

Notice that the mean of the differences is the difference of the means!

WOW, this is the same value as the standard deviation of the sums!

How do we find the standard deviation for the sums or differences?

Page 44: Chapter 7

Mean and Standard Deviations for Linear Combinations

If x1, x2, …, xn are random variables with means μ1, μ2, …, μn and variances σ1², σ2², …, σn², respectively, and

y = a1x1 + a2x2 + … + anxn

then

$$\mu_y = a_1\mu_1 + a_2\mu_2 + \cdots + a_n\mu_n$$

This result is true regardless of whether the x’s are independent.

$$\sigma_y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2$$

This result is true ONLY if the x’s are independent.
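Because every (A, B) pair is equally likely (an assumption consistent with the spinner means and standard deviations listed for Stat Land), all 24 pairs can be listed and the rules checked directly. This Python sketch confirms that the variances add for the sum AND for the difference, since the coefficients +1 and -1 both square to 1.

```python
import math
from itertools import product
from statistics import mean, pstdev, pvariance

spinner_a = [1, 2, 3, 4]           # sections assumed equally likely
spinner_b = [1, 2, 3, 4, 5, 6]

# All 4 * 6 = 24 equally likely (A, B) pairs
sums = [a + b for a, b in product(spinner_a, spinner_b)]
diffs = [b - a for a, b in product(spinner_a, spinner_b)]

var_a, var_b = pvariance(spinner_a), pvariance(spinner_b)
print(mean(sums), round(pstdev(sums), 3))    # 6 2.041
print(mean(diffs), round(pstdev(diffs), 3))  # 1 2.041
# Independent spinners: variances add for both the sum and the difference
print(round(math.sqrt(var_a + var_b), 3))    # 2.041
```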

Page 45: Chapter 7
Page 46: Chapter 7

A commuter airline flies small planes between San Luis Obispo and San Francisco. For small planes, the baggage weight is a concern. Suppose it is known that the variable x = weight (in pounds) of baggage checked by a randomly selected passenger has a mean and standard deviation of 42 and 16, respectively. Consider a flight on which 10 passengers, all traveling alone, are flying. The total weight of checked baggage, y, is

y = x1 + x2 + … + x10

Page 47: Chapter 7

Airline Problem Continued . . . μx = 42 and σx = 16

The total weight of checked baggage, y, is y = x1 + x2 + … + x10

What is the mean total weight of the checked baggage?

μy = μ1 + μ2 + … + μ10 = 42 + 42 + … + 42 = 420 pounds

Page 48: Chapter 7

Airline Problem Continued . . . μx = 42 and σx = 16

The total weight of checked baggage, y, is y = x1 + x2 + … + x10

What is the standard deviation of the total weight of the checked baggage?

Since the 10 passengers are all traveling alone, it is reasonable to think that the 10 baggage weights are unrelated and therefore independent. So

σy² = σ1² + σ2² + … + σ10² = 16² + 16² + … + 16² = 2560 square pounds

To find the standard deviation, take the square root of this value:

σy = √2560 ≈ 50.596 pounds
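For independent values, the two rules reduce to adding the means and adding the variances. A minimal Python sketch with the baggage numbers (the function name `total_of_independent` is illustrative):

```python
import math

def total_of_independent(means, sds):
    """Mean and SD of y = x1 + x2 + ... + xn for INDEPENDENT x's."""
    mean_y = sum(means)                 # holds with or without independence
    var_y = sum(sd ** 2 for sd in sds)  # requires independence
    return mean_y, math.sqrt(var_y)

n = 10  # passengers, each traveling alone
mean_y, sd_y = total_of_independent([42] * n, [16] * n)
print(mean_y, round(sd_y, 3))  # 420 50.596
```

Note that the standard deviations themselves do not add: 10 × 16 = 160 pounds would badly overstate the spread of the total.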

Page 49: Chapter 7

The Attila Barbell Company makes bars for weight lifting. The weights of the bars are independent and are normally distributed with a mean of 720 ounces (45 pounds) and a standard deviation of 4 ounces. The bars are shipped 10 in a box to the retailers. The weights of the empty boxes are normally distributed with a mean of 320 ounces and a standard deviation of 8 ounces. The weights of the boxes filled with 10 bars are expected to be normally distributed with a mean of 7520 ounces and a standard deviation of:


$$\sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2}$$


$$\sigma_{\text{bars and box}} = \sqrt{10(4)^2 + 8^2} = \sqrt{224} \approx 14.97 \text{ ounces}$$
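A simulation is one way to check this standard deviation without the formula. The Python sketch below builds many boxes from independent normal bar weights and box weights; the seed and the 20,000 boxes are arbitrary choices, so the printed values will only be close to 7520 and 14.97, not equal to them.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def filled_box_weight():
    """One simulated box: 10 independent bars plus the empty box (ounces)."""
    bars = sum(random.gauss(720, 4) for _ in range(10))
    box = random.gauss(320, 8)
    return bars + box

weights = [filled_box_weight() for _ in range(20_000)]
print(round(statistics.mean(weights)))      # close to 7520
print(round(statistics.stdev(weights), 1))  # close to sqrt(10 * 4**2 + 8**2), about 14.97
```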

Page 52: Chapter 7

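Number of Courses | Probability | (Number of Courses)*(Probability) | Mean | Deviation | Deviation^2 | (Deviation^2)*(Probability)
1 | 0.02 | | | | |
2 | 0.03 | | | | |
3 | 0.09 | | | | |
4 | 0.25 | | | | |
5 | 0.40 | | | | |
6 | 0.16 | | | | |
7 | 0.05 | | | | |
Sum | | | | | |

Mean: ____   Variance: ____   Standard Deviation: ____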

Page 53: Chapter 7

Number of Courses | Probability | (Number of Courses)*(Probability) | Mean | Deviation | Deviation^2 | (Deviation^2)*(Probability)
1 | 0.02 | 0.02 | 4.66 | -3.66 | 13.3956 | 0.267912
2 | 0.03 | 0.06 | 4.66 | -2.66 | 7.0756 | 0.212268
3 | 0.09 | 0.27 | 4.66 | -1.66 | 2.7556 | 0.248004
4 | 0.25 | 1.00 | 4.66 | -0.66 | 0.4356 | 0.1089
5 | 0.40 | 2.00 | 4.66 | 0.34 | 0.1156 | 0.04624
6 | 0.16 | 0.96 | 4.66 | 1.34 | 1.7956 | 0.287296
7 | 0.05 | 0.35 | 4.66 | 2.34 | 5.4756 | 0.27378
Sum | 1 | 4.66 | | | | 1.4444

Mean = 4.66   Variance = 1.4444   Standard Deviation = 1.20

Page 54: Chapter 7


P(# of courses > μx) = P(# of courses > 4.66) = p(5) + p(6) + p(7) = 0.61

P(μx - 2σx < # of courses < μx + 2σx) = P(4.66 - 2.40 < # of courses < 4.66 + 2.40)
= P(2.26 < # of courses < 7.06) = p(3) + p(4) + p(5) + p(6) + p(7) = 0.95

P(# of courses < μx - 2σx OR # of courses > μx + 2σx) = P(# of courses < 2.26 OR # of courses > 7.06) = p(1) + p(2) = 0.05
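The whole table, and the three probabilities computed from it, can be reproduced in a few lines of Python. This sketch uses the unrounded standard deviation, so the interval endpoints come out slightly different from 2.26 and 7.06, but the same numbers of courses fall inside. The helper name `prob` is illustrative.

```python
import math

courses = {1: 0.02, 2: 0.03, 3: 0.09, 4: 0.25, 5: 0.40, 6: 0.16, 7: 0.05}

mean = sum(x * p for x, p in courses.items())
variance = sum((x - mean) ** 2 * p for x, p in courses.items())
sd = math.sqrt(variance)

def prob(condition):
    """Add p(x) over the x values that satisfy the condition."""
    return sum(p for x, p in courses.items() if condition(x))

above_mean = prob(lambda x: x > mean)
within_2sd = prob(lambda x: mean - 2 * sd < x < mean + 2 * sd)
outside_2sd = prob(lambda x: x < mean - 2 * sd or x > mean + 2 * sd)

print(round(mean, 2), round(variance, 4), round(sd, 2))                   # 4.66 1.4444 1.2
print(round(above_mean, 2), round(within_2sd, 2), round(outside_2sd, 2))  # 0.61 0.95 0.05
```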

Page 55: Chapter 7
Page 56: Chapter 7

Special Distributions

Two Discrete Distributions: Binomial and Geometric

One Continuous Distribution: Normal Distributions

Page 57: Chapter 7

Suppose we decide to record the gender of the next 25 newborns at a particular hospital.

What is the chance that at least 15 are female?

What is the chance that between 10 and 15 are female?

Out of the 25 newborns, how many can we expect to be female?

These questions can be answered using a binomial distribution.

Page 58: Chapter 7
Page 59: Chapter 7

Properties of a Binomial Experiment

1. There are a fixed number of trials.
2. Each trial results in one of two mutually exclusive outcomes (success/failure).
3. Outcomes of different trials are independent.
4. The probability that a trial results in success is the same for all trials.

The binomial random variable x is defined as x = the number of successes observed when a binomial experiment is performed.

We use n to denote the fixed number of trials.

Page 60: Chapter 7

Are these binomial distributions?

1) Toss a coin 10 times and count the number of heads.
Yes

2) Deal 10 cards from a shuffled deck and count the number of red cards.
No, the probability of a red card does not remain constant from card to card.

3) The number of tickets sold to children under 12 at a movie theater in a one-hour period.
No, there is no fixed number of trials.

Page 61: Chapter 7

Binomial Probability Formula:

Let n = number of independent trials in a binomial experiment
p = constant probability that any trial results in a success

$$P(x) = \frac{n!}{x!\,(n - x)!}\,p^x (1 - p)^{n - x}$$

where

$${}_nC_x = \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

Appendix Table 9 can be used to find binomial probabilities. Technology, such as calculators and statistical software, will also perform this calculation.

Page 62: Chapter 7

Instead of recording the gender of the next 25 newborns at a particular hospital, let’s record the gender of the next 5 newborns at this hospital.

Is this a binomial experiment? Yes, if the births were not multiple births (twins, etc.).

Define the random variable of interest: x = the number of females born out of the next 5 births.

What are the possible values of x?

x   0   1   2   3   4   5

Will a binomial random variable always include the value of 0?

What is the probability of “success”?

What will the largest value of the binomial random value be?

Page 63: Chapter 7

Newborns Continued . . .

What is the probability that exactly 2 girls will be born out of the next 5 births?

$$P(x = 2) = {}_5C_2\,(.5)^2(.5)^3 = .3125$$

What is the probability that less than 2 girls will be born out of the next 5 births?

$$P(x < 2) = p(0) + p(1) = {}_5C_0\,(.5)^0(.5)^5 + {}_5C_1\,(.5)^1(.5)^4 = .1875$$
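The binomial formula is easy to code with `math.comb` (Python 3.8 or later), which computes nCx. A minimal sketch reproducing the two newborn answers, using p = .5 as in the example (the function name `binomial_pmf` is illustrative):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.5  # next 5 births, probability of a girl taken as .5
exactly_2 = binomial_pmf(2, n, p)
less_than_2 = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)
print(exactly_2, less_than_2)  # 0.3125 0.1875
```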

Page 64: Chapter 7

Newborns Continued . . . Let’s construct the discrete probability distribution table for this binomial random variable:

x       0        1        2       3       4        5
p(x)  .03125   .15625   .3125   .3125   .15625   .03125

What is the mean number of girls born in the next five births?

Since this is a discrete distribution, we could use μx = Σ x p(x):

μx = 0(.03125) + 1(.15625) + 2(.3125) + 3(.3125) + 4(.15625) + 5(.03125) = 2.5

Notice that this is the same as multiplying n × p.

Page 65: Chapter 7

Formulas for mean and standard deviation of a binomial distribution

$$\mu_x = np \qquad \sigma_x = \sqrt{np(1 - p)}$$

Page 66: Chapter 7

Newborns Continued . . .

How many girls would you expect in the next five births at a particular hospital?

μx = np = 5(.5) = 2.5

What is the standard deviation of the number of girls born in the next five births?

σx = √(np(1 - p)) = √(5(.5)(.5)) ≈ 1.118
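The shortcut formulas can be checked against the long way: build the binomial table, then apply the general discrete mean and standard deviation formulas. A Python sketch for n = 5 and p = .5:

```python
import math
from math import comb

n, p = 5, 0.5
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

# Long way: treat it like any discrete distribution
mean = sum(x * px for x, px in pmf.items())
sd = math.sqrt(sum((x - mean) ** 2 * px for x, px in pmf.items()))

# Shortcut formulas for a binomial random variable
print(mean, n * p)                                          # 2.5 2.5
print(round(sd, 3), round(math.sqrt(n * p * (1 - p)), 3))  # 1.118 1.118
```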

Page 67: Chapter 7

Remember, in binomial distributions, trials should be independent. However, when we sample, we typically sample without replacement, which means that the trials are not independent. In this case, the number of successes observed would not have a binomial distribution but rather a hypergeometric distribution.

But when the sample size, n, is small and the population size, N, is large, probabilities calculated using binomial distributions and hypergeometric distributions are VERY close!

The calculations for probabilities in a hypergeometric distribution are even more tedious than the binomial formula!

When sampling without replacement, if n is at most 5% of N, then the binomial distribution gives a good approximation to the probability distribution of x.
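To see how close the two distributions are, the sketch below compares them for a hypothetical population of N = 1000 items, 400 of them successes, with a sample of n = 20 (2% of N). These numbers are invented for illustration only; the hypergeometric probabilities use the standard counting formula with `math.comb`.

```python
from math import comb

def hypergeometric_pmf(x, N, K, n):
    """P(x successes) drawing n without replacement from N items, K of them successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical population: N = 1000 items, 400 of them "successes" (p = 0.4);
# a sample of n = 20 is 2% of N, under the 5% guideline
N, K, n = 1000, 400, 20
gap = max(abs(hypergeometric_pmf(x, N, K, n) - binomial_pmf(x, n, K / N))
          for x in range(n + 1))
print(round(gap, 4))  # small: the two distributions nearly agree
```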

Page 68: Chapter 7

• Suppose a particular breed of dog gives birth to a male dog 59% of the time and gives birth to a female dog 41% of the time.

• Let M = event that a male pup is born
• Let F = event that a female pup is born
• Let x = the number of male pups born in a litter of four pups
• Fill in the following table:

Outcome | Probability | Number of Male Pups (x) | Outcome | Probability | Number of Male Pups (x)
FFFF | | | FMMF | |
MFFF | | | FMFM | |
FMFF | | | FFMM | |
FFMF | | | MMMF | |
FFFM | | | MMFM | |
MMFF | | | MFMM | |
MFMF | | | FMMM | |
MFFM | | | MMMM | |

Page 69: Chapter 7

Outcome | Probability | Number of Male Pups (x) | Outcome | Probability | Number of Male Pups (x)
FFFF | 0.0283 | 0 | FMMF | 0.0585 | 2
MFFF | 0.0407 | 1 | FMFM | 0.0585 | 2
FMFF | 0.0407 | 1 | FFMM | 0.0585 | 2
FFMF | 0.0407 | 1 | MMMF | 0.0842 | 3
FFFM | 0.0407 | 1 | MMFM | 0.0842 | 3
MMFF | 0.0585 | 2 | MFMM | 0.0842 | 3
MFMF | 0.0585 | 2 | FMMM | 0.0842 | 3
MFFM | 0.0585 | 2 | MMMM | 0.1212 | 4

[Figure: "Probability of Getting a Number of Male Pups in a Litter of Four Pups"; horizontal axis: Number of Male Pups in a Litter of Four (0 to 4), vertical axis: Probability]
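The pups table is a binomial setting with n = 4 and p = .59. This Python sketch adds the outcome probabilities for each number of males and compares the totals with the binomial formula; the last two columns it prints should match.

```python
from itertools import product
from math import comb

p_male = 0.59  # from the notes: 59% male, 41% female

# Add up the 16 outcome probabilities for each number of males
by_count = {k: 0.0 for k in range(5)}
for litter in product("MF", repeat=4):
    prob = 1.0
    for pup in litter:
        prob *= p_male if pup == "M" else 1 - p_male
    by_count[litter.count("M")] += prob

for k in range(5):
    formula = comb(4, k) * p_male ** k * (1 - p_male) ** (4 - k)
    print(k, round(by_count[k], 4), round(formula, 4))
```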

Page 70: Chapter 7

Newborns Revisited . . .

Suppose we were not interested in the number of females born out of the next five births, but in which birth would be the first to produce a female.

How is this question different from a binomial distribution?

Page 71: Chapter 7

Properties of Geometric Distributions:

• There are two mutually exclusive outcomes that result in a success or failure.
• Each trial is independent of the others.
• The probability of success is the same for all trials.

A geometric random variable x is defined as x = the number of trials UNTIL the FIRST success is observed (including the success).

So what are the possible values of x?

x   1   2   3   4   . . .

How far will this go? To infinity.

Page 72: Chapter 7
Page 73: Chapter 7

Probability Formula for the Geometric Distribution

Let p = constant probability that any trial results in a success

$$p(x) = (1 - p)^{x - 1}\,p$$

where x = 1, 2, 3, …

Page 74: Chapter 7

Suppose that 40% of students who drive to campus at your school or university carry jumper cables. Your car has a dead battery and you don’t have jumper cables, so you decide to stop students as they are headed to the parking lot and ask them whether they have a pair of jumper cables. Let x = the number of students stopped until one with a pair of jumper cables is found (including that student).

Is this a geometric distribution? Yes

Page 75: Chapter 7

Jumper Cables Continued . . . Let x = the number of students stopped until one with a pair of jumper cables is found, with p = .4.

What is the probability that the third student stopped will be the first student to have jumper cables?

P(x = 3) = (.6)²(.4) = .144

What is the probability that at most three students are stopped to find one with jumper cables?

P(x ≤ 3) = P(1) + P(2) + P(3) = (.6)⁰(.4) + (.6)¹(.4) + (.6)²(.4) = .784
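The geometric formula in Python, with the jumper-cable value p = .4. The closed form for "at most x trials", 1 - (1 - p)^x, is a standard identity (it is 1 minus the chance that the first x trials all fail) and is not stated in the notes; it is included here as a cross-check.

```python
def geometric_pmf(x, p):
    """P(first success on trial x) = (1 - p)^(x - 1) * p, for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def geometric_cdf(x, p):
    """P(first success on or before trial x): 1 minus P(the first x trials all fail)."""
    return 1 - (1 - p) ** x

p = 0.4  # proportion of students carrying jumper cables
print(round(geometric_pmf(3, p), 3))                          # 0.144
print(round(sum(geometric_pmf(k, p) for k in (1, 2, 3)), 3))  # 0.784
print(round(geometric_cdf(3, p), 3))                          # 0.784
```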

Page 76: Chapter 7

Welcome back! Please pick up:
• Notes Packet
• Assignment Sheet
• t-score table

Page 77: Chapter 7

Normal Distributions

• Continuous probability distribution
• Symmetrical bell-shaped (unimodal) density curve defined by μ and σ
• Area under the curve equals 1
• Probability of observing a value in a particular interval is calculated by finding the area under the curve
• As σ increases, the curve flattens & spreads out
• As σ decreases, the curve gets taller and thinner

How is this done mathematically? To overcome the need for calculus, we rely on technology or on a table of areas for the standard normal distribution.

Page 78: Chapter 7

[Figure: two normal curves labeled A and B]

Do these two normal curves have the same mean? If so, what is it? YES, 6

Which normal curve has a standard deviation of 3? B

Which normal curve has a standard deviation of 1? A

Page 79: Chapter 7

Notice that the normal curve is curving downwards from the center (mean) to points that are one standard deviation on either side of the mean. At those points, the normal curve begins to turn upward.

Page 80: Chapter 7

Standard Normal Distribution

• Is a normal distribution with μ = 0 and σ = 1

• It is customary to use the letter z to represent a variable whose distribution is described by the standard normal curve (or z curve).

Page 81: Chapter 7

Using the Table of Standard Normal (z) Curve Areas

• For any number z*, from -3.89 to 3.89 and rounded to two decimal places, Appendix Table 2 gives the area under the z curve and to the left of z*.

P(z < z*) = P(z ≤ z*)

Where the letter z is used to represent a random variable whose distribution is the standard normal distribution.

To use the table:

• Find the correct row and column (see the following example)

• The number at the intersection of that row and column is the probability

Page 82: Chapter 7

Suppose we are interested in the probability that z is less than -1.62.

P(z < -1.62) = .0526

z*      .00     .01     .02     .03
-1.7   .0446   .0436   .0427   .0418
-1.6   .0548   .0537   .0526   .0516
-1.5   .0668   .0655   .0643   .0630
…

In the table of areas:
• Find the row labeled -1.6
• Find the column labeled .02
• Find the intersection of the row and column: .0526

Page 83: Chapter 7

Suppose we are interested in the probability that z is less than 2.31.

P(z < 2.31) = .9896

z*     .00     .01     .02     .03
2.2   .9861   .9864   .9868   .9871
2.3   .9893   .9896   .9898   .9901
2.4   .9918   .9920   .9922   .9925
…

Page 84: Chapter 7

Suppose we are interested in the probability that z is greater than 2.31.

P(z > 2.31) = 1 - .9896 = .0104

z*     .00    .01    .02    .03
2.2    .9861  .9864  .9868  .9871
2.3    .9893  .9896  .9898  .9901
2.4    .9918  .9920  .9922  .9925
…      …      …      …      …

The Table of Areas gives the area to the LEFT of z*.

To find the area to the right, subtract the value in the table from 1.
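Here is the complement rule in code, as a sketch using `statistics.NormalDist`. Its `cdf` method returns the area to the left, just like the table, so the area to the right is 1 minus that value.

```python
from statistics import NormalDist

Z = NormalDist()

left = Z.cdf(2.31)   # area to the LEFT of 2.31, as listed in the table
right = 1 - left     # area to the RIGHT, by the complement rule

print("P(z < 2.31) =", round(left, 4))
print("P(z > 2.31) =", round(right, 4))
```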

Page 85: Chapter 7

Suppose we are interested in finding the z* for the smallest 2%.

P(z < z*) = .02

z*     .04    .05    .06
-2.1   .0162  .0158  .0154
-2.0   .0207  .0202  .0197
-1.9   .0262  .0256  .0250
…      …      …      …

To find z*:

Look for the area .0200 in the body of the table. Follow the row and column back out to read the z-value.

Since .0200 doesn’t appear in the body of the table, use the value closest to it, .0202.

z* = -2.05
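Software can also run the table backwards. In this sketch, the hypothetical helper `table_lookup_inverse` mimics the slide's method: it scans every two-decimal z* from -3.89 to 3.89 for the area closest to the target. `NormalDist().inv_cdf` from the Python standard library returns the exact z* directly.

```python
from statistics import NormalDist

Z = NormalDist()


def table_lookup_inverse(area):
    """Mimic the table method: the two-decimal z* whose area is closest to `area`."""
    grid = [k / 100 for k in range(-389, 390)]   # -3.89, -3.88, ..., 3.89
    return min(grid, key=lambda z: abs(Z.cdf(z) - area))


print("from the table method:", table_lookup_inverse(0.02))
print("exact inverse:", round(Z.inv_cdf(0.02), 4))
```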

Page 86: Chapter 7

Suppose we are interested in finding the z* for the largest 5%.

P(z > z*) = .05

z*     .04    .05    .06
1.5    .9382  .9394  .9406
1.6    .9495  .9505  .9515
1.7    .9591  .9599  .9608
…      …      …      …

Remember the Table of Areas gives the area to the LEFT of z*, so first find

area to the left of z* = 1 – (area to the right of z*) = 1 – .05 = .95

Then look up this value in the body of the table. Since .9500 is exactly halfway between .9495 and .9505, we can average the z* values for these two entries:

z* = (1.64 + 1.65)/2 = 1.645
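The same calculation works with `statistics.NormalDist().inv_cdf`. This sketch first converts the right-tail area to a left-tail area, then inverts. The exact z* is about 1.6449, which rounds to the averaged table value.

```python
from statistics import NormalDist

area_right = 0.05
area_left = 1 - area_right                 # the table works with area to the LEFT
z_star = NormalDist().inv_cdf(area_left)

print(round(z_star, 3))
```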

Page 87: Chapter 7

Finding Probabilities for Other Normal Curves

• To find the probabilities for other normal curves, standardize the relevant values and then use the table of z areas.

• If x is a random variable whose behavior is described by a normal distribution with mean μ and standard deviation σ, then

P(x < b) = P(z < b*)
P(x > a) = P(z > a*)
P(a < x < b) = P(a* < z < b*)

where z is a variable whose distribution is standard normal and

$$a^* = \frac{a - \mu}{\sigma} \qquad b^* = \frac{b - \mu}{\sigma}$$
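This is a minimal sketch of the standardize-then-look-up procedure, with `statistics.NormalDist` in place of the table. The helper names `standardize` and `prob_between` are hypothetical. The mean 100 and standard deviation 15 in the usage lines are arbitrary illustration values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal (z) distribution


def standardize(value, mu, sigma):
    """Convert a value of x to the corresponding z value."""
    return (value - mu) / sigma


def prob_between(a, b, mu, sigma):
    """P(a < x < b) = P(a* < z < b*): area left of b* minus area left of a*."""
    return Z.cdf(standardize(b, mu, sigma)) - Z.cdf(standardize(a, mu, sigma))


# Standardizing first gives the same answer as working on the x scale directly.
via_z = prob_between(85, 130, mu=100, sigma=15)
direct = NormalDist(100, 15).cdf(130) - NormalDist(100, 15).cdf(85)
print(round(via_z, 4), round(direct, 4))
```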

Page 88: Chapter 7

Data on the length of time to complete registration for classes using an on-line registration system suggest that the distribution of the variable x = time to register for students at a particular university can well be approximated by a normal distribution with mean μ = 12 minutes and standard deviation σ = 2 minutes.

Page 89: Chapter 7

Registration Problem Continued . . .

x = time to register, μ = 12 minutes and σ = 2 minutes

What is the probability that it will take a randomly selected student less than 9 minutes to complete registration?

Standardize this value:

$$b^* = \frac{9 - 12}{2} = -1.5$$

Look this value up in the table:

P(x < 9) = P(z < -1.5) = .0668
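This sketch runs the same two steps in code, with `statistics.NormalDist` in place of the table. It standardizes 9 minutes, then takes the area to the left under the z curve.

```python
from statistics import NormalDist

mu, sigma = 12, 2               # registration time: mean 12 min, sd 2 min

b_star = (9 - mu) / sigma       # standardized value of 9 minutes
p = NormalDist().cdf(b_star)    # area to the left of b* under the z curve

print(b_star, round(p, 4))
```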

Page 90: Chapter 7

Registration Problem Continued . . .

x = time to register, μ = 12 minutes and σ = 2 minutes

What is the probability that it will take a randomly selected student more than 13 minutes to complete registration?

Standardize this value:

$$a^* = \frac{13 - 12}{2} = .5$$

Look this value up in the table and subtract from 1:

P(x > 13) = P(z > .5) = 1 - .6915 = .3085
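This right-tail version is a sketch under the same assumption (`statistics.NormalDist` replaces the table). The area to the right of a* is 1 minus the area to its left.

```python
from statistics import NormalDist

mu, sigma = 12, 2

a_star = (13 - mu) / sigma          # standardized value of 13 minutes
p = 1 - NormalDist().cdf(a_star)    # area to the RIGHT of a*

print(a_star, round(p, 4))
```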

Page 91: Chapter 7

Registration Problem Continued . . .

x = time to register, μ = 12 minutes and σ = 2 minutes

What is the probability that it will take a randomly selected student between 7 and 15 minutes to complete registration?

Standardize these values:

$$a^* = \frac{7 - 12}{2} = -2.5 \qquad b^* = \frac{15 - 12}{2} = 1.5$$

Look these values up in the table and subtract: (value for b*) – (value for a*)

P(7 < x < 15) = P(-2.5 < z < 1.5) = .9332 - .0062 = .9270
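This sketch computes the "between" case with `statistics.NormalDist`. It takes the area to the left of b* and subtracts the area to the left of a*.

```python
from statistics import NormalDist

mu, sigma = 12, 2
Z = NormalDist()

a_star = (7 - mu) / sigma            # standardized lower endpoint
b_star = (15 - mu) / sigma           # standardized upper endpoint
p = Z.cdf(b_star) - Z.cdf(a_star)    # (area left of b*) - (area left of a*)

print(a_star, b_star, round(p, 4))
```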

Page 92: Chapter 7

Registration Problem Continued . . .

x = time to register, μ = 12 minutes and σ = 2 minutes

Because some students do not log off properly, the university would like to log off students automatically after some time has elapsed. It is decided to select this time so that only 1% of students will be automatically logged off while still trying to register. What time should the automatic log off be set at?

P(x > a) = .01

Look up the area to the left of a*, 1 - .01 = .99, in the table: a* = 2.33.

Use the formula for standardizing to find a:

$$2.33 = \frac{a - 12}{2} \quad\Longrightarrow\quad a = 12 + 2.33(2) = 16.66$$

The automatic log off should be set at about 16.66 minutes.
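This sketch works the inverse problem in code, assuming `statistics.NormalDist`. It finds a* with area .99 to its left, then un-standardizes with a = μ + a*σ.

```python
from statistics import NormalDist

mu, sigma = 12, 2

a_star = NormalDist().inv_cdf(0.99)   # z value with area .99 to its left
a = mu + a_star * sigma               # un-standardize: solve a* = (a - mu) / sigma

print(round(a_star, 2), round(a, 2))
```

Software keeps a* unrounded (about 2.326), so it gives about 16.65 minutes. The slide's 16.66 comes from rounding a* to 2.33 before solving.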

Page 93: Chapter 7
Page 94: Chapter 7

Ways to Assess Normality

Some of the most frequently used statistical methods are valid only when the sample x1, x2, …, xn has come from a population distribution that is at least approximately normal. One way to see whether an assumption of population normality is plausible is to construct a normal probability plot of the data. A normal probability plot is a scatterplot of (normal score, observed value) pairs.

What should happen if our data set is normally distributed?

Page 95: Chapter 7

Consider a random sample with n = 5. To find the appropriate normal scores for a sample of size 5, divide the standard normal curve into 5 equal-area regions.

Why are these regions not the same width?

Each region has an area equal to 0.2.
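This sketch answers the width question numerically, assuming `statistics.NormalDist().inv_cdf`. The cut points are the z values with areas .2, .4, .6, and .8 to their left. The middle region is the narrowest because the curve is tallest there, so less width is needed to collect an area of .2.

```python
from statistics import NormalDist

n = 5
Z = NormalDist()

# Cut points that divide the z curve into n regions, each with area 1/n.
cuts = [Z.inv_cdf(k / n) for k in range(1, n)]
print([round(c, 3) for c in cuts])

# Widths of the three bounded (middle) regions; the two outer regions are unbounded.
widths = [right - left for left, right in zip(cuts, cuts[1:])]
print([round(w, 3) for w in widths])
```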

Page 96: Chapter 7

Consider a random sample with n = 5. Next, find the median z-score for each region.

[Figure: the five equal-area regions of the z curve, with median z-scores -1.28, -.524, 0, .524, 1.28]

Why is the median not in the “middle” of each region?

These are the normal scores that we would plot our data against. We use technology (calculators or statistical software) to compute these normal scores.
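Here is one way to compute these scores. The median of region i has half of that region's area, 1/(2n), below it within the region, so it is the z value with area (2i - 1)/(2n) to its left. The helper `region_median_scores` is hypothetical and reproduces the five values above. Statistical software may use a slightly different formula for normal scores, so its values need not match this sketch exactly.

```python
from statistics import NormalDist


def region_median_scores(n):
    """Median z value of each of n equal-area regions of the z curve.

    Region i (i = 1, ..., n) starts with area (i - 1)/n to its left, and its
    median has half the region's area, 1/(2n), beyond that: (2i - 1)/(2n) in total.
    """
    Z = NormalDist()
    return [Z.inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]


print([round(score, 3) for score in region_median_scores(5)])
```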

Page 97: Chapter 7

Ways to Assess Normality Continued . . .

A strong linear pattern in a normal probability plot suggests that population normality is plausible. On the other hand, systematic departure from a straight-line pattern (such as curvature, which would indicate skewness in the data, or outliers) indicates that it is not reasonable to assume that the population distribution is normal.

Page 98: Chapter 7

Let’s construct a normal probability plot. Since the values of the normal scores depend on the sample size n, the normal scores when n = 10 are below:

-1.539 -1.001 -0.656 -0.376 -0.123 0.123 0.376 0.656 1.001 1.539

The following data represent egg weights (in grams) for a sample of 10 eggs.

53.04 53.50 52.53 53.00 53.07 52.86 52.66 53.23 53.26 53.16

Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set & so on.

[Figure: normal probability plot of egg weight (52.5 to 53.5 g) against normal score (-1.5 to 1.5)]

Since the normal probability plot is approximately linear, it is plausible that the distribution of egg weights is approximately normal.

Page 99: Chapter 7

Using the Correlation Coefficient to Assess Normality

• The correlation coefficient, r, can be calculated for the n (normal score, observed value) pairs.

• If r is too much smaller than 1, then normality of the underlying distribution is questionable.

Consider these points from the weight of eggs data:

(-1.539, 52.53) (-1.001, 52.66) (-.656, 52.86) (-.376, 53.00) (-.123, 53.04) (.123, 53.07) (.376, 53.16) (.656, 53.23) (1.001, 53.26) (1.539, 53.50)

Calculate the correlation coefficient for these points.

r = .986

How much smaller is “too much smaller than 1”?

Values to Which r Can Be Compared to Check for Normality

n            5     10    15    20    25    30    40    50    60    75
Critical r   .832  .880  .911  .929  .941  .949  .960  .966  .971  .976

Since r = .986 is greater than the critical r of .880 for n = 10, it is plausible that the sample of egg weights came from a distribution that was approximately normal.
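The pairing and correlation steps can be sketched in Python. The helper `pearson_r` is a hypothetical, plain implementation of the sample correlation coefficient. The normal scores, the egg weights, and the critical value .880 for n = 10 come from the slides above.

```python
from math import fsum, sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = fsum(xs) / n, fsum(ys) / n
    sxy = fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = fsum((x - mx) ** 2 for x in xs)
    syy = fsum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


normal_scores = [-1.539, -1.001, -0.656, -0.376, -0.123,
                 0.123, 0.376, 0.656, 1.001, 1.539]
egg_weights = [53.04, 53.50, 52.53, 53.00, 53.07,
               52.86, 52.66, 53.23, 53.26, 53.16]

# Pair the smallest normal score with the smallest observation, and so on.
pairs = list(zip(normal_scores, sorted(egg_weights)))
r = pearson_r(normal_scores, sorted(egg_weights))

critical_r = 0.880  # table value for n = 10
print(round(r, 3), "plausibly normal" if r > critical_r else "normality questionable")
```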

Page 100: Chapter 7

Transforming Data to Achieve Normality

• When the data are not normal, it is common to use a transformation of the data.

• For data that show strong positive skewness (long upper tail), a logarithmic transformation is usually applied.

• Square root, cube root, and other transformations can also be applied to the data to determine which transformation best normalizes the data.
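This small sketch shows why the log transformation helps with a long upper tail. The data are hypothetical: each value is double the previous one, so they are strongly right-skewed. `skewness` is a hypothetical helper that computes the moment coefficient of skewness, which is 0 for perfectly symmetric data. After taking logs, the values are equally spaced and the skewness drops to essentially 0.

```python
from math import fsum, log


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = fsum(values) / n
    m2 = fsum((v - mean) ** 2 for v in values) / n
    m3 = fsum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, strongly right-skewed values (each one double the previous).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [log(v) for v in raw]   # natural log; base 10 would work the same way

print(round(skewness(raw), 3), round(skewness(logged), 3))
```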

Page 101: Chapter 7

Consider the data set in Table 7.4 (page 463) about plasma and urinary AGT levels.

A histogram of the urinary AGT levels is strongly positively skewed. A logarithmic transformation is applied to the data. The histogram of the log urinary AGT levels is more symmetrical.

Page 102: Chapter 7
Page 103: Chapter 7

Using the Normal Distribution to Approximate a Discrete Distribution

Suppose the probability distribution of a discrete random variable x is displayed in the histogram below. The probability of a particular value is the area of the rectangle centered at that value.

[Figure: probability histogram of x with an approximating normal curve; one bar is centered at 6]

Often, a probability histogram can be well approximated by a normal curve. If so, it is customary to say that x has an approximately normal distribution.

Suppose this bar is centered at x = 6. The bar actually begins at 5.5 and ends at 6.5. These endpoints will be used in calculations.

This is called a continuity correction.
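This sketch shows the continuity correction on a hypothetical binomial example: x = number of heads in 100 tosses of a fair coin. The exact answer can then be computed with `math.comb` for comparison. The helper `normal_approx_point` is hypothetical. It takes the area under the normal curve from k - 0.5 to k + 0.5 as the probability of the single value k.

```python
from math import comb
from statistics import NormalDist


def normal_approx_point(k, mu, sigma):
    """P(x = k) approximated by the normal area over the bar from k - 0.5 to k + 0.5."""
    curve = NormalDist(mu, sigma)
    return curve.cdf(k + 0.5) - curve.cdf(k - 0.5)


# Hypothetical example: x = number of heads in 100 fair coin tosses.
n, p = 100, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

exact = comb(n, 50) * p**50 * (1 - p) ** 50   # exact binomial P(x = 50)
approx = normal_approx_point(50, mu, sigma)   # normal curve from 49.5 to 50.5

print(round(exact, 4), round(approx, 4))
```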

Page 104: Chapter 7

Normal Approximation to a Binomial Distribution

Let x be a binomial random variable based on n trials and success probability p, so that

$$\mu = np \qquad \sigma = \sqrt{np(1 - p)}$$

If n and p are such that np > 10 and n(1 – p) > 10, then x has an approximately normal distribution.

Page 105: Chapter 7

Premature babies are born before 37 weeks, and those born before 34 weeks are most at risk. A study reported that 2% of births in the United States occur before 34 weeks. Suppose that 1000 births are randomly selected and that the number of these births that occurred prior to 34 weeks, x, is to be determined.

Can the distribution of x be approximated by a normal distribution?

np = 1000(.02) = 20 > 10     n(1 – p) = 1000(.98) = 980 > 10

Since both are greater than 10, the distribution of x can be approximated by a normal distribution.

Find the mean and standard deviation for the approximated normal distribution.

$$\mu = np = 1000(.02) = 20 \qquad \sigma = \sqrt{np(1 - p)} = \sqrt{1000(.02)(.98)} = 4.427$$
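The condition check and the two formulas can be packaged together. In this sketch, `normal_approx_params` is a hypothetical helper that applies the slide's rule of thumb (np > 10 and n(1 - p) > 10) before returning μ and σ.

```python
from math import sqrt


def normal_approx_params(n, p):
    """Check np > 10 and n(1 - p) > 10, then return the approximating mean and sd."""
    if not (n * p > 10 and n * (1 - p) > 10):
        raise ValueError("np and n(1 - p) must both exceed 10")
    return n * p, sqrt(n * p * (1 - p))


mu, sigma = normal_approx_params(1000, 0.02)
print(mu, round(sigma, 3))
```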

Page 106: Chapter 7

Premature Babies Continued . . .

μ = 20 and σ = 4.427

What is the probability that the number of babies in the sample of 1000 born prior to 34 weeks will be between 10 and 25 (inclusive)?

To find the shaded area, standardize the endpoints. With the continuity correction, the bars for 10 through 25 run from 9.5 to 25.5:

$$a^* = \frac{9.5 - 20}{4.427} = -2.37 \qquad b^* = \frac{25.5 - 20}{4.427} = 1.24$$

Look up these values in the table and subtract the probabilities:

P(10 ≤ x ≤ 25) ≈ P(-2.37 < z < 1.24) = .8925 - .0089 = .8836
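Here is the same calculation in code. In this sketch, `statistics.NormalDist` supplies the normal areas using the continuity-corrected endpoints 9.5 and 25.5, and the exact binomial probability is summed with `math.comb` for comparison. Software does not round z to two decimals, so its normal answer (about .884) differs slightly from the table's .8836.

```python
from math import comb
from statistics import NormalDist

n, p = 1000, 0.02
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
curve = NormalDist(mu, sigma)

# Continuity correction: the bars for 10 through 25 cover 9.5 to 25.5.
approx = curve.cdf(25.5) - curve.cdf(9.5)

# Exact binomial P(10 <= x <= 25), for comparison.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10, 26))

print(round(approx, 4), round(exact, 4))
```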

Page 107: Chapter 7

[Figure: images for Questions 9 and 10]