Chapter 4: The Normal Distribution, Part II

Eric D. Nordmoe

Math 261

Department of Mathematics and Computer Science

Kalamazoo College

Spring 2009

Outline

- Some Turtle Data
- Assessing Normality

Leatherback Turtle Data

Caliper measurements from leatherbacks nesting at Playa Grande, Costa Rica, during the 2005–06 season:

- Straight carapace length (SCL)
- Straight carapace width (SCW)

Straight Carapace Lengths

- Mean = 144.1 cm
- SD = 6.3 cm
- n = 510

Straight Carapace Widths

- Mean = 103.4 cm
- SD = 4.8 cm
- n = 498

Normality Checks

- Formal tests of normality
- Visual inspection of the histogram
- The 68-95-99.7 rule
- Normal probability plots (Q-Q plots)

The 68-95-99.7 Rule

For every normal curve,

- 68% of the area is in the interval µ ± σ.
- 95% of the area is in the interval µ ± 2σ.
- 99.7% of the area is in the interval µ ± 3σ.

Assess normality by checking agreement of empirical data with these expected percents.
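
The three percents come from the standard normal CDF: the area within k standard deviations of the mean is Φ(k) − Φ(−k), which equals erf(k/√2). A minimal Python sketch of that check, using only the standard library (the function name `area_within` is illustrative, not from the text):

```python
import math

def area_within(k: float) -> float:
    """Area under any normal curve within k standard deviations of the mean."""
    # P(|Z| < k) = Phi(k) - Phi(-k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.2f}%")
```

The rule's 68, 95, and 99.7 are rounded versions of these areas.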

Example: SCL Measurements

1. Use the mean and SD to find the intervals:

- One SD interval: 144.1 ± 6.3 ⇒ (137.8, 150.4)
- Two SD interval: 144.1 ± 2 · 6.3 ⇒ (131.5, 156.7)
- Three SD interval: 144.1 ± 3 · 6.3 ⇒ (125.2, 163.0)

2. Compute the observed percent in each interval and compare to expected:

- Percent in One SD interval: 71.6% (365 of 510)
- Percent in Two SD interval: 93.9% (479 of 510)
- Percent in Three SD interval: 100% (510 of 510)

Reasonably close agreement with the expected 68-95-99.7% values.

Example: SCW Measurements

Caution: outliers present.

1. Use the mean and SD to find the intervals:

- One SD interval: 103.4 ± 4.8 ⇒ (98.6, 108.2)
- Two SD interval: 103.4 ± 2 · 4.8 ⇒ (93.8, 113.0)
- Three SD interval: 103.4 ± 3 · 4.8 ⇒ (89.0, 117.8)

2. Compute the observed percent in each interval and compare to expected:

- Percent in One SD interval: 73.3% (365 of 498)
- Percent in Two SD interval: 97.8% (487 of 498)
- Percent in Three SD interval: 99.6% (496 of 498)

The observed percents within 1 and 2 SDs (73.3% and 97.8%) exceed the expected 68% and 95%.
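
The arithmetic in the two examples above can be reproduced with a short Python sketch. The raw caliper measurements are not part of these slides, so the code starts from the reported means, SDs, and interval counts; the function and variable names are illustrative assumptions.

```python
EXPECTED = {1: 68.0, 2: 95.0, 3: 99.7}  # percents from the 68-95-99.7 rule

def sd_intervals(mean, sd):
    """Return {k: (lower, upper)} for the intervals mean ± k*sd, k = 1, 2, 3."""
    return {k: (round(mean - k * sd, 1), round(mean + k * sd, 1)) for k in (1, 2, 3)}

def observed_percents(counts, n):
    """Convert counts of observations inside each interval to percents."""
    return {k: round(100 * c / n, 1) for k, c in counts.items()}

# Summary values and counts copied from the SCL and SCW examples.
datasets = {
    "SCL": {"mean": 144.1, "sd": 6.3, "n": 510, "counts": {1: 365, 2: 479, 3: 510}},
    "SCW": {"mean": 103.4, "sd": 4.8, "n": 498, "counts": {1: 365, 2: 487, 3: 496}},
}

for name, d in datasets.items():
    intervals = sd_intervals(d["mean"], d["sd"])
    percents = observed_percents(d["counts"], d["n"])
    for k in (1, 2, 3):
        diff = percents[k] - EXPECTED[k]
        print(f"{name} {k} SD {intervals[k]}: {percents[k]}% observed, "
              f"{EXPECTED[k]}% expected, difference {diff:+.1f}")
```

With the raw measurements in hand, the counts would instead come from tallying how many values fall inside each interval.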

Normal Probability Plots

The Normal probability plot is a tool for assessing normality that is:

- More sensitive than a histogram.
- More informative than 68-95-99.7 computations.

The Normal probability plot identifies deviations from normality that are large enough to invalidate statistical tests.

- Requires use of judgement.

Normal Probability Plot Examples: 100 Observations of a N(100, 10) Random Variable

[Figure: two panels, Histogram and Normal Probability Plot]

Normal Probability Plot Examples: 100 Observations of a Skewed Right Random Variable

[Figure: two panels, Histogram and Normal Probability Plot]

What Does a Normal Distribution Look Like?

Three Examples of Samples of Size 30 from the Standard Normal Distribution

What is a Normal Probability Plot?

- A normal probability plot is a plot of the sorted sample data (Y) versus the (approximate) expected z-score for the corresponding rank in a random normal sample of the same size.
- For a sample of size 30 from a N(0, 1) distribution, Tukey’s method gives:
  - The expected maximum value is about 2.01.
  - The expected 2nd largest value is about 1.60.
  - The expected 3rd largest value is about 1.35.
  - ...
  - The expected minimum value is about -2.01.
- Interpretation of the plot:
  - Points lie on a straight line ⇒ Normality
  - Points deviate from a straight line ⇒ Non-normality
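
The expected z-scores can be approximated by evaluating the inverse normal CDF at a plotting position for each rank. The sketch below assumes that "Tukey's method" refers to the plotting position (i − 1/3)/(n + 1/3) for the i-th smallest of n observations. That formula is not stated in these slides, but it does reproduce the values listed above for n = 30. The function name is illustrative.

```python
from statistics import NormalDist

def tukey_expected_z(n: int) -> list[float]:
    """Approximate expected z-scores for ranks 1..n (smallest to largest).

    Assumes Tukey's plotting position p_i = (i - 1/3) / (n + 1/3).
    """
    std_normal = NormalDist()  # mean 0, SD 1
    return [std_normal.inv_cdf((i - 1 / 3) / (n + 1 / 3)) for i in range(1, n + 1)]

z = tukey_expected_z(30)
print("three largest:", [round(v, 2) for v in z[-3:][::-1]])
print("minimum:", round(z[0], 2))
```

Pairing these scores with the sorted data, for example `list(zip(tukey_expected_z(len(y)), sorted(y)))` for a sample `y`, gives the points of the normal probability plot.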

Non-Normal Examples Using a Standardized Scale

[Figure: two panels, Skewed Right and Skewed Left]

Non-Normal Examples Using a Standardized Scale

[Figure: two panels, Short Tails and Long Tails]

Assessing Normality of the Leatherback Data

[Figure: two panels, SCL and SCW]

- Both fit the normal distribution fairly closely.
- SCL has slight skewness.
- SCW has outliers.

Last Thoughts on the Normal Probability Plot

- SPSS can produce Normal probability plots, but these do not exactly match those shown in the text.
- When Y is not normal, a transformation of Y (e.g., log Y, √Y) often has an approximately normal distribution.
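
As an illustration of the transformation idea, the sketch below draws a simulated right-skewed sample and compares the sample skewness before and after taking logs. The data are lognormal, chosen only because their logarithm is exactly normal; they are not data from the text. The `skewness` helper and the seed are assumptions made for this example.

```python
import math
import random
from statistics import fmean

def skewness(values):
    """Sample skewness: mean cubed deviation over the cubed SD (population form)."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

random.seed(261)  # arbitrary seed so the run is repeatable
y = [random.lognormvariate(0.0, 0.5) for _ in range(1000)]
log_y = [math.log(v) for v in y]

print(f"skewness of Y:     {skewness(y):.2f}")
print(f"skewness of log Y: {skewness(log_y):.2f}")
```

A skewness near zero after the transformation is consistent with approximate normality, though a normal probability plot of the transformed values remains the more informative check.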

Approximating Discrete Distributions with the Normal: Hypothetical Bacteriology Data

- The number of bacteria per unit area can be approximated by a normal distribution with mean µ = 10 and standard deviation σ = √10.
- Below is a histogram of a sample of 1000 observations from the population.

[Figure: histogram of Number of Bacteria (x-axis, 0 to 22) against Density (y-axis, 0.00 to 0.12)]

Approximating a Discrete Distribution by a Normal

- The normal distribution can also be used to compute approximate probabilities of discrete distributions.
- For greater accuracy, we must take into account the discreteness of the data.
- Compute the normal probability over intervals which match the endpoints of the histogram bars.
- These endpoints are the midpoints between the possible discrete values.

Approximating Bacteria Counts

Find the probability using the normal approximation:

Pr(Y ≥ 10) ≈ Pr(W ≥ 9.5) = 0.5628,

where W ~ N(10, √10).
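
The value 0.5628 can be checked with `statistics.NormalDist` from the Python standard library. The sketch also computes the uncorrected Pr(W ≥ 10) for contrast; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

w = NormalDist(mu=10, sigma=sqrt(10))  # W ~ N(10, sqrt(10))

# Pr(Y >= 10): the histogram bar for Y = 10 starts at 9.5,
# so the normal area is taken from 9.5 upward.
with_correction = 1 - w.cdf(9.5)

# Starting at 10 instead drops the left half of that bar.
without_correction = 1 - w.cdf(10)

print(f"Pr(W >= 9.5) = {with_correction:.4f}")
print(f"Pr(W >= 10)  = {without_correction:.4f}")
```

Because 10 is the mean of W, the uncorrected area is exactly one half, noticeably smaller than the corrected value.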

[Figure: histogram of Number of Bacteria (x-axis, 0 to 22) against Density (y-axis, 0.00 to 0.12)]

Approximating Bacteria Counts

Find the probability using the normal approximation:

Pr(Y = 5) ≈ Pr(4.5 ≤ W ≤ 5.5) = 0.0364,

where W ~ N(10, √10).
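
For a single count, the approximation is the normal area over that count's histogram bar. A minimal sketch with the standard library follows; `approx_pmf` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

w = NormalDist(mu=10, sigma=sqrt(10))  # W ~ N(10, sqrt(10))

def approx_pmf(k: int) -> float:
    """Normal approximation to Pr(Y = k): area over the bar from k - 0.5 to k + 0.5."""
    return w.cdf(k + 0.5) - w.cdf(k - 0.5)

print(f"Pr(Y = 5) is approximately {approx_pmf(5):.4f}")
```

Adding up `approx_pmf(k)` over consecutive counts telescopes to a single normal area, so summing from k = 10 upward recovers Pr(W ≥ 9.5).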

[Figure: histogram of Number of Bacteria (x-axis, 0 to 22) against Density (y-axis, 0.00 to 0.12)]
