
Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution


Page 1: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Chapter 6

Continuous Distributions

The Gaussian (Normal) Distribution

Page 2: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

When Discrete Distributions Aren’t Enough

• Discrete distributions are used in situations involving counts. (Others are possible, but this is the vast majority.)
• What happens when you want to measure things?
– Height
– Weight
– Miles per Gallon
• These aren’t counts. (Why not?)
• Measurements involve rounding and precision.
• When any level of precision is theoretically possible, we call this a “continuous” variable.
• The values come from the set of Real Numbers, i.e., the number line.

Page 3: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Real Numbers

-∞ … -3 -2 -1 0 1 2 3 … ∞

• The Real Numbers include all possible values between the pictured integers.
• That includes rational numbers like ½, 1/3, 237/573, etc.
• It also includes irrational numbers like π and √2.
• Real numbers have an infinite string of decimal places.
• There are “uncountably many” real numbers between any two specified real numbers.

Page 4: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Intervals

• An interval is a “piece” of the number line, or a subset of the Real numbers.
• There are no “gaps.” For any two numbers in it, all real numbers between them are included.
• Therefore an interval is described by its endpoints, with a few special considerations.
• The endpoints may or may not be included. Round brackets are used to exclude the endpoints, square brackets to include them. Write the endpoints in order. Ex.: [0,1], (9,100), [3,6), (0,7].
• If an interval goes on to infinity, the ∞ or -∞ symbol is used with a round bracket, since infinity is not a number. Ex.: [0,∞), (-∞,-10).

Page 5: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Definition of Continuous R.V.

• A continuous random variable takes on values in some Real Interval.

-∞ … -3 -2 -1 0 1 2 3 … ∞

• Suppose a r.v. X takes values in [0,1]. How many different values are there?
• Suppose you assign some tiny probability to each Real Number in [0,1]. What is the total probability?
• Suppose you divide [0,1] up into 10 subintervals.
• Can you assign probabilities to these so the total is 1?

Page 6: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Definition of Continuous R.V.

• This illustrates the problem with assigning probabilities to individual numbers, and the contrasting ease of assigning probability to intervals.
• Summary:
– Any continuous distribution has infinitely many values.
– No single point has a positive probability.
– Said another way: Every individual value of a continuous random variable has probability zero, and as such looks like an impossible event.
– Intervals can be assigned positive probability.

Page 7: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

The Paradox

• Obviously, a r.v. X must take on some value, and if it does, that value is not impossible (it has P>0).
• We never actually “mean” a single value. Measurements are given with a certain precision.
• Example: temperature is continuous, but when it is measured to the nearest degree, “70” really means the interval [69.5,70.5).
• Intervals can have positive probability, and we can make them as small as we like.
• The fact that a continuous r.v. gives probability zero to any single value agrees nicely with the fact that it is impossible to measure anything to the exact real number value.
• Instead, we divide up our scale using equal-width subintervals based on the precision of the measuring device. These subintervals have positive probability.

Page 8: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Continuous Probabilities

• Probabilities for a continuous random variable, X, are given by a probability function, P.
• P(X=k)=0 for any k.
• We might find positive probabilities for expressions like
– P(X>k). Note: the interval is (k,∞).
– P(X<k). Note: the interval is (-∞,k).
– P(a<X<b). Note: the interval is (a,b).
• A formula that gives probabilities for X would need to give probabilities for intervals, rather than single values.

Page 9: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Has anything prepared us for this?

• Tables of probability for discrete r.v.’s? Not if only individual values were given.
• Ungrouped histograms? No, same.
• Grouped histograms? Let’s see….
• Each bar represents a frequency for an interval, even though this is a discrete example.

Page 10: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

What about relative histograms?

• Look at the histogram for the number of threes showing in a two-dice toss.
• Notice it shows the probabilities for 3 discrete values.
• Replace the discrete values with intervals, [0,1), [1,2), and [2,3).
• Then this histogram looks like it belongs to a continuous distribution with values in [0,3).

[Figure: relative frequency histogram. Horizontal axis: Three's Face Up (0, 1, 2). Vertical axis: Relative Frequency (0.0 to 0.7).]
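The bar heights in this histogram can be reproduced by brute-force enumeration. The sketch below assumes two fair six-sided dice; the function name `threes_distribution` is made up for illustration. It lists the probability of seeing 0, 1, or 2 threes and treats each value k as the interval [k, k+1) of width 1, so that bar area equals bar height.

```python
from fractions import Fraction
from itertools import product


def threes_distribution(num_dice=2, sides=6):
    """Exact distribution of the number of threes showing on num_dice fair dice."""
    counts = {k: 0 for k in range(num_dice + 1)}
    # Enumerate every equally likely outcome and count the threes in it.
    for roll in product(range(1, sides + 1), repeat=num_dice):
        counts[roll.count(3)] += 1
    total = sides ** num_dice
    return {k: Fraction(c, total) for k, c in counts.items()}


dist = threes_distribution()
for k, p in dist.items():
    # Each bar sits on [k, k+1): width 1, so area = 1 * height.
    print(f"[{k}, {k + 1}): height = {float(p):.4f}, area = {float(p):.4f}")
print("total area =", sum(dist.values()))
```

The tallest bar is 25/36, about 0.69, which fits a vertical axis that runs up to 0.7.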

Page 11: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Making the Leap

• Change the horizontal axis to show that the bars belong to each interval.
• Each bar is 1 unit wide and its height represents the probability for that interval.
• Each bar is a rectangle, whose area is 1 x height.
• Since the heights add up to 1, the total area of the shaded region is 1.
• Make the transition to the continuous case: Instead of representing probability by height, use area.

Page 12: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

What did we leap over?

• This has been more of an analogy than an explanation. Many details that require calculus are glossed over.
• The problem: can’t represent probabilities by height at a point, because points all have probability zero.
• Solution: switch to areas, where the bottom boundary (on the x axis) represents an interval for which we want to determine probability. The area of the graph above that interval represents its probability.
• In calculus, these areas are called “definite integrals.” You don’t really need to know that, but you may come across the following symbol, which means “the integral from a to b”: $\int_a^b$

Page 13: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Uniform Distribution

• A uniform distribution is defined for an interval outside of which there is no positive probability. (This is to prevent the area from being infinite.)
• Inside that interval, it has the same probability for any sub-interval of a given size (they are “always the same”).
• A uniform distribution on the interval [0,3] is shown here. Note that the height is 1/3, because 3 x 1/3 = 1.
• However, we should not say that 1/3 is the probability of anything in particular.

Page 14: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Uniform Examples

• Let X be a uniform r.v. on the interval [1,5].
• Find P(X>3), P(X<5), P(2<X<3), and P(0<X<3).
• Solution: The width of the distribution is 4, so the height of the graph is ¼ between 1 and 5. The area for any interval will be ¼ x the width of the interval.
– P(X>3)=(5-3)/4=1/2
– P(X<5)=(5-1)/4=1
– P(2<X<3)=(3-2)/4=1/4
– Careful! P(0<X<3)=P(1<X<3)=(3-1)/4=1/2
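The "width times height" rule, including the "Careful!" case, can be written as a short function. This is a minimal sketch, not part of the original slides; the name `uniform_prob` and its argument order are illustrative. The requested interval is first clipped to [a, b], because there is no probability outside it.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X uniform on [a, b]."""
    left = max(lo, a)    # clip the requested interval to [a, b]
    right = min(hi, b)
    width = max(0.0, right - left)
    return width / (b - a)   # area = width x height, with height 1/(b - a)


INF = float("inf")
print(uniform_prob(3, INF, 1, 5))    # P(X>3)   -> 0.5
print(uniform_prob(-INF, 5, 1, 5))   # P(X<5)   -> 1.0
print(uniform_prob(2, 3, 1, 5))      # P(2<X<3) -> 0.25
print(uniform_prob(0, 3, 1, 5))      # P(0<X<3) -> 0.5, same as P(1<X<3)
```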

Page 16: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

More Uniform Examples

• Let X be a uniform r.v. on the interval [0,8].
– Find P(X>3), P(X<5), and P(2<X<3 or 7<X<8).
– Find the median and the 90th percentile.
• Solution: The width of the distribution is 8, so the height of the graph is 1/8.
– P(X>3)=(8-3)/8=5/8.
– P(X<5)=(5-0)/8=5/8.
– P(2<X<3 or 7<X<8)=P(2<X<3)+P(7<X<8)=1/8+1/8=1/4.
– The median must have half the probability above it and half below. Therefore the median is 4.
– P90 is a number such that 90% of the probability is below it, so we have (P90-0)/8=.9, so P90=7.2.
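The percentile reasoning generalizes: solving (x - a)/(b - a) = p for x gives x = a + p(b - a). The sketch below applies that to the [0,8] example; `uniform_percentile` is an illustrative name, not something from the slides.

```python
import math


def uniform_percentile(p, a, b):
    """Value x with P(X < x) = p for X uniform on [a, b]."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # Solve (x - a) / (b - a) = p for x.
    return a + p * (b - a)


median = uniform_percentile(0.5, 0, 8)
p90 = uniform_percentile(0.9, 0, 8)
print(median, p90)

# Disjoint intervals: the probabilities simply add.
union = (3 - 2) / 8 + (8 - 7) / 8
print(union)   # 0.25
```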

Page 17: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Probability Density Function

• We have been dealing with the uniform distribution in terms of graphs. Before moving on, we need to put these ideas into the form of mathematical notation.
• We were focusing on the areas of portions of a graph like the one below. But how do we define the region we want the area for?
– The bottom boundary is the x axis.
– The sides are vertical lines going through the x values we want.
– The top of the region is a special “curve” (straight lines are curves too).
• This curve is defined by a function, called the probability density function, or pdf. For our graph, it is:

$$f(x) = \begin{cases} 1/4 & \text{if } 1 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
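To connect the pdf to "area under the curve," the sketch below codes the uniform pdf on [1,5] and approximates areas with a midpoint Riemann sum. The helper `area` and its step count are illustrative assumptions, and a numerical stand-in for the definite integral rather than an exact calculation.

```python
def f(x):
    """pdf of the uniform distribution on [1, 5]."""
    return 0.25 if 1 <= x <= 5 else 0.0


def area(pdf, lo, hi, n=100_000):
    """Approximate the area under pdf between lo and hi (midpoint Riemann sum)."""
    dx = (hi - lo) / n
    return sum(pdf(lo + (i + 0.5) * dx) for i in range(n)) * dx


total = area(f, 0, 6)      # whole curve: should be close to 1
middle = area(f, 2, 3)     # P(2<X<3): should be close to 1/4
print(round(total, 3), round(middle, 3))
```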

Page 18: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Normal Probability Distributions

• The normal probability distribution (Gaussian Distribution) is the most important distribution in all of statistics.

• Many continuous random variables have normal or approximately normal distributions.

• A normal distribution is defined by its pdf.

Page 19: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

The Normal pdf

• The parameters are μ and σ.
• The mean of the distribution is μ.
• The standard deviation is σ.
• The median and mode are also μ.
• There is a normal distribution for every combination of values of μ and σ.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
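As a rough check on the formula, the sketch below codes the normal pdf directly and compares it with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The values μ = 110 and σ = 10 are chosen only for illustration, and `normal_pdf` is a made-up name.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 110, 10   # illustrative parameter values
peak = normal_pdf(mu, mu, sigma)                  # the curve is highest at the mean
library_peak = NormalDist(mu, sigma).pdf(mu)

# Midpoint Riemann sum over mu +/- 8 sigma: the total area should be close to 1.
n = 20_000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
dx = (hi - lo) / n
total_area = sum(normal_pdf(lo + (i + 0.5) * dx, mu, sigma) for i in range(n)) * dx
print(peak, library_peak, total_area)
```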

Page 20: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Basic Shape

• Here we see the basic shape of a normal distribution.
• The blue band is an example of an “area under the curve” that we might want to calculate.
• This particular distribution has μ=110 and σ=10.
• The “x” axis represents values of the r.v. X.

Page 21: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Effect of Changing μ

Changing μ just causes a horizontal shift, centering the graph in a different place.

Page 22: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Effect of Changing σ

• Changing σ causes the graph to stretch out or squeeze together around the mean.

Page 23: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

What does this mean?

• The normal pdf is a complicated formula. It is not easy to calculate probabilities from it, even if you know calculus. So, we use tables (or computers).
• We can’t have a table for every possible normal distribution.
• We have one table for the “standard” normal distribution, which has μ=0 and σ=1. This r.v. is called Z.
• It is easy to convert probability statements from other normal distributions to Z.

Page 24: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Table 3, Appendix B entries:

The table contains the area under the standard normal curve between 0 and a specific value of z.

[Figure: standard normal curve; horizontal axis marked at 0 and z.]

Page 25: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Example: Find the area under the standard normal curve between z = 0 and z = 1.45.

A portion of Table 3 (row 1.4, column 0.05):

z      0.00   0.01   0.02   0.03   0.04   0.05     0.06
1.4                                       0.4265

[Figure: standard normal curve; horizontal axis marked at 0 and 1.45.]

P(0 < Z < 1.45) = 0.4265
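A table entry like this can be reproduced on a computer. The sketch below uses the standard identity Φ(z) = ½(1 + erf(z/√2)) for the standard normal cumulative probability, then subtracts 0.5 to get the "area between 0 and z" that this kind of table reports. The function names `phi` and `table_area` are illustrative.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def table_area(z):
    """Area between 0 and z, the quantity tabulated for z >= 0."""
    return phi(z) - 0.5


print(round(table_area(1.45), 4))   # 0.4265
```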

Page 26: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Example: Find the area under the normal curve to the right of Z = 1.45; P(Z > 1.45).

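P(Z > 1.45) = 0.5000 - 0.4265 = 0.0735

[Figure: standard normal curve marked at 0 and 1.45; the area between 0 and 1.45 is 0.4265, and the area asked for is the tail to the right of 1.45.]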

Page 27: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Example: Find the area to the left of Z = 1.45; P(Z < 1.45).

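[Figure: standard normal curve marked at 0 and 1.45; area 0.5000 to the left of 0 and area 0.4265 between 0 and 1.45.]

P(Z < 1.45) = 0.5000 + 0.4265 = 0.9265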

Page 28: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

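Example: Find the area between Z = -1.26 and the mean (Z = 0).

[Figure: standard normal curve marked at -1.26, 0, and 1.26; the area asked for (between -1.26 and 0) equals, by symmetry, the area from the table (between 0 and 1.26), 0.3962.]

P(-1.26 < Z < 0) = 0.3962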

Page 29: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

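Example: Find the area to the left of Z = -0.98; P(Z < -0.98).

[Figure: standard normal curve marked at -0.98, 0, and 0.98; the area from the table (between 0 and 0.98) is 0.3365; by symmetry, the area to the right of 0.98 is the same as the area asked for.]

P(Z < -0.98) = 0.5000 - 0.3365 = 0.1635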
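The four table-lookup examples can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This sketch reads the last two examples as Z = -1.26 and Z = -0.98, which is what the stated answers 0.3962 and 0.1635 imply. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1

right_of_145 = 1 - Z.cdf(1.45)           # P(Z > 1.45)
left_of_145 = Z.cdf(1.45)                # P(Z < 1.45)
between = Z.cdf(0) - Z.cdf(-1.26)        # P(-1.26 < Z < 0)
left_of_neg098 = Z.cdf(-0.98)            # P(Z < -0.98)

for value in (right_of_145, left_of_145, between, left_of_neg098):
    print(round(value, 4))
```

Rounded to four places, these agree with the table-based answers 0.0735, 0.9265, 0.3962, and 0.1635.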

Page 30: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Applications of Normal Distributions

• Apply the techniques learned for the Z distribution to all normal distributions.
• Start with a probability question in terms of x-values.
• Convert, or transform, the question into an equivalent probability statement involving z-values.

Page 31: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

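Standardization

Suppose X is a normal r.v. with mean μ and standard deviation σ.

The r.v. $Z = \frac{X - \mu}{\sigma}$ has a standard normal distribution.

[Figure: normal curve with matching x and z axes; x = μ corresponds to z = 0.]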

Page 32: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Example: A bottling machine is adjusted to fill bottles with a mean of 32.0 oz of soda and a standard deviation of 0.02 oz. Assume the amount of fill is normally distributed and a bottle is selected at random.

1. Find the probability the bottle contains between 32 oz and 32.025 oz.

2. Find the probability the bottle contains more than 31.97 oz.

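$$\text{When } x = 32.025:\quad z = \frac{32.025 - 32}{0.02} = 1.25$$

$$\text{When } x = 32:\quad z = \frac{32 - 32}{0.02} = 0$$

$$P(32 < X < 32.025) = P\left(\frac{32 - 32}{0.02} < \frac{X - 32}{0.02} < \frac{32.025 - 32}{0.02}\right) = P(0 < Z < 1.25) = 0.3944$$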
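The standardization step can be checked numerically. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) with the stated mean 32.0 and standard deviation 0.02. It also evaluates question 2, which is not worked out in the text above, so treat that number as this sketch's own calculation rather than a figure from the slides.

```python
from statistics import NormalDist

mu, sigma = 32.0, 0.02
X = NormalDist(mu, sigma)   # fill amount in oz
Z = NormalDist()            # standard normal

# Question 1: convert the x-values to z-values, then use Z.
z_low = (32.0 - mu) / sigma
z_high = (32.025 - mu) / sigma
p1_via_z = Z.cdf(z_high) - Z.cdf(z_low)
p1_direct = X.cdf(32.025) - X.cdf(32.0)   # same probability without standardizing
print(round(z_high, 2), round(p1_via_z, 4), round(p1_direct, 4))

# Question 2 (computed here, not taken from the slides): P(X > 31.97).
z2 = (31.97 - mu) / sigma
p2 = 1 - Z.cdf(z2)
print(round(z2, 2), round(p2, 4))
```

Both routes to question 1 give the same value, which is the point of standardization: the probability statement about X and the one about Z are equivalent.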

Page 33: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Other Normal Applications

Find a cutoff point: a value of X such that there is a certain probability in a specified interval defined by x.

Example:

The waiting time X at a certain bank is approximately normally distributed with a mean of 3.7 minutes and a standard deviation of 1.4 minutes. The bank would like to claim that 95% of all customers are waited on by a teller within c minutes. Find the value of c that makes this statement true.

Page 34: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

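Solution:

$$P(X < c) = 0.95$$

$$P\left(\frac{X - 3.7}{1.4} < \frac{c - 3.7}{1.4}\right) = 0.95$$

$$P\left(Z < \frac{c - 3.7}{1.4}\right) = 0.95$$

$$\frac{c - 3.7}{1.4} = 1.645$$

$$c = (1.645)(1.4) + 3.7 = 6.003$$

$$c \approx 6 \text{ minutes}$$

[Figure: normal curve with the x axis marked at 3.7 and c, and the z axis marked at 0 and 1.645; areas 0.5000 (left of the mean), 0.4500 (between the mean and c), and 0.0500 (right of c).]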
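Finding a cutoff is the table lookup run in reverse: start from a probability and recover the value. A minimal sketch with `statistics.NormalDist.inv_cdf` (Python 3.8 or later) follows; the variable names are illustrative. The computer's z-value is slightly more precise than the table's 1.645, so the result differs in the last digit or so.

```python
from statistics import NormalDist

mu, sigma = 3.7, 1.4          # waiting time in minutes
z = NormalDist().inv_cdf(0.95)            # z-score with 95% of the area to its left
c_from_z = mu + z * sigma                 # undo the standardization: x = mu + z*sigma
c_direct = NormalDist(mu, sigma).inv_cdf(0.95)

c_table = 1.645 * 1.4 + 3.7               # the hand calculation with the table value
print(round(z, 3), round(c_from_z, 3), round(c_table, 3))
```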

Page 35: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Notation:

If X is a normal random variable with mean μ and standard deviation σ, this is often denoted: X ~ N(μ, σ²), where the second entry is the variance.

Example: Suppose X is a normal random variable with μ = 35 and σ = 6. A convenient notation to identify this random variable is: X ~ N(35, 36), since σ² = 36.

z(α) and z_α are commonly used notations for the z-score (point on the z axis) such that there is α of the area (probability) to the right of z(α) or z_α.

Page 36: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Illustrations:

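z(0.10) represents the value of Z such that the area to the right under the standard normal curve is 0.10.

[Figure: standard normal curve; area 0.10 to the right of z(0.10), which lies to the right of 0.]

z(0.80) represents the value of Z such that the area to the right under the standard normal curve is 0.80.

[Figure: standard normal curve; area 0.80 to the right of z(0.80), which lies to the left of 0.]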

Page 37: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

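Example: Find the numerical value of z(0.10).

Use Table 3: the area between 0 and z(0.10) is 0.5000 - 0.10 = 0.4000, so look for a table area as close as possible to 0.4000.

z(0.10) = 1.28

[Figure: standard normal curve; 0.10 is the area to the right of z(0.10) (area information from notation); the table shows the area 0.4000 between 0 and z(0.10).]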
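Because z(α) has area α to its right, it has area 1 - α to its left, so it can be computed with an inverse cumulative function. The sketch below uses `statistics.NormalDist.inv_cdf` (Python 3.8 or later); the helper name `z_alpha` is made up for illustration.

```python
from statistics import NormalDist


def z_alpha(alpha):
    """z-score with area alpha to its RIGHT under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


print(round(z_alpha(0.10), 2))   # 1.28
print(round(z_alpha(0.80), 2))   # -0.84, negative because most of the area lies to the right
```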

Page 38: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Note:

The values of Z that will be used regularly come from one of the following situations:

1. The z-score such that there is a specified area in one tail of the normal distribution.

2. The z-scores that bound a specified middle proportion of the normal distribution.

Page 39: Chapter 6 Continuous Distributions The Gaussian (Normal) Distribution

Example: Find the z-scores that bound the middle 0.99 of the normal distribution.

Use Table 3:

[Figure: standard normal curve marked at -z(0.005) (or z(0.995)), 0, and z(0.005); areas 0.495 on each side of 0 and 0.005 in each tail.]

z(0.005) = 2.575 and z(0.995) = -z(0.005) = -2.575
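The same idea gives the bounds for any middle proportion: split the leftover area evenly between the two tails. This is a sketch using `statistics.NormalDist.inv_cdf` (Python 3.8 or later); `middle_bounds` is an illustrative name. The computed value is about 2.576, while the table-based 2.575 comes from reading between two table entries.

```python
from statistics import NormalDist


def middle_bounds(proportion):
    """z-scores (low, high) that bound the given middle proportion of the area."""
    tail = (1 - proportion) / 2          # area left over in each tail
    high = NormalDist().inv_cdf(1 - tail)
    return -high, high                   # symmetric about 0


low, high = middle_bounds(0.99)
print(round(low, 3), round(high, 3))
```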