
CHAPTER 6mathsbooks.net/Fitzpatrick Data and Reasoning/6 Normal...and then determine from the tables the areas from -oo to z1, and from -oo to z2. Then subtract to find the required

  • Upload
    ledan

  • View
    216

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CHAPTER 6mathsbooks.net/Fitzpatrick Data and Reasoning/6 Normal...and then determine from the tables the areas from -oo to z1, and from -oo to z2. Then subtract to find the required

CHAPTER 6

Normal distribution

Are the diameters of the rods exactly equal?


168 NORMAL DISTRIBUTION

In your earlier studies of statistics, frequency distributions were considered and it was stated that, although they are sometimes of importance in themselves, they are mainly important in providing information about the population from which the sample is drawn.

[Figure 6-1: Histogram, with the area between a and b shaded. Figure 6-2: Frequency curve, with the area between a and b shaded]

Consider a relative frequency distribution of a continuous variable, represented by the histogram (Figure 6-1). It is not difficult to imagine that, if the sample size is increased indefinitely, it will approach the population size and, if the class intervals are made as small as possible, the histogram will approach a well-defined smooth curve (Figure 6-2). Such a curve is called a frequency curve or probability density curve, and it provides information about a population in much the same way as the histogram does about the sample. The shaded area in the histogram represents the relative frequency with which the variable lies between the values a and b in the sample. It corresponds to the shaded area under the frequency curve, which represents the relative frequency, or probability, with which the variable lies between a and b in the population. If the probability density curve is drawn so that the total area under it is unity, and if the equation of the curve, y = f(x), is known, then the probability that X lies in the interval a to b can be calculated by integration:

$$\Pr(a < X < b) = \int_a^b f(x)\,dx.$$
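As a minimal sketch of the integral above, the probability can be approximated numerically. The density f(x) = 2x on 0 ≤ x ≤ 1 is a hypothetical example, chosen only because its areas are easy to check by hand: the total area is 1, and Pr(0.2 < X < 0.5) = 0.5² − 0.2² = 0.21. Composite Simpson's rule stands in for exact integration.

```python
def simpson(f, a, b, n=100):
    """Approximate the integral of f from a to b using composite Simpson's rule (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # weights 4, 2, 4, 2, ...
    return total * h / 3

def density(x):
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere (illustration only)."""
    return 2 * x if 0 <= x <= 1 else 0.0

total_area = simpson(density, 0, 1)    # area under the whole curve
prob = simpson(density, 0.2, 0.5)      # Pr(0.2 < X < 0.5)
print(round(total_area, 6), round(prob, 6))   # 1.0 0.21
```

Simpson's rule is exact for straight-line densities like this one, so both results match the hand calculation up to floating-point rounding.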

The mean and standard deviation of a population are denoted by the Greek letters µ and σ respectively and are called parameters, to distinguish them from the mean and standard deviation of a sample, which are denoted by the Roman letters x̄ and s respectively and are called statistics.

6.1 The normal distribution

[Figure 6-3: Normal curve, with the x-axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ and µ + 3σ]


One of the most important examples of a probability distribution of a continuous variable is the normal distribution. If a random variable, X, is normally distributed, its frequency curve has the typical symmetrical bell shape shown in Figure 6-3. This curve has been found to give an adequate fit to a great variety of frequency distributions. The heights of children of a certain age, the diameters of metal cylinders, the number of burning hours of electric light globes manufactured by a particular firm and the Intelligence Quotients of children in a certain area are a few examples of distributions which are approximately normal.

The normal distribution is defined by the equation of its frequency curve:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where:

µ = the mean value of X in the population

and:

σ = the standard deviation of X in the population.
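The frequency-curve equation translates directly into a function. This is a sketch; the name `normal_pdf` is illustrative, and µ = 20, σ = 5 are arbitrary values used only to show the peak at x = µ and the symmetry about it.

```python
import math

def normal_pdf(x, mu, sigma):
    """Ordinate y of the normal frequency curve with mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    return coefficient * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Maximum ordinate at x = mu, and equal ordinates either side (symmetry about x = mu)
print(round(normal_pdf(20, 20, 5), 4))                  # 0.0798
print(normal_pdf(15, 20, 5) == normal_pdf(25, 20, 5))   # True
```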

How the equation of this curve is derived is beyond the scope of this book, but the following properties should be noted.

(i) The curve extends, theoretically at least, to infinity in either direction, and so X can assume all values.

(ii) The curve is symmetrical about the ordinate x = µ, and so the mean, mode and median of the normal distribution coincide.

(iii) Practically all of the population (about 99.7 per cent) lies in the interval µ ± 3σ; about 95 per cent lies in the interval µ ± 2σ; about two-thirds of the population lies in the interval µ ± σ.

(iv) The total area under the curve is unity.

(v) The probability that X lies in the interval a to b is equal to the area under the curve bounded by the ordinates x = a and x = b.
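Property (iii) can be checked numerically. For any normal variable, whatever its µ and σ, the standard identity Pr(µ − kσ < X < µ + kσ) = erf(k/√2) holds; the sketch below uses Python's `math.erf` to evaluate it for k = 1, 2 and 3.

```python
import math

def prob_within(k):
    """Pr(mu - k*sigma < X < mu + k*sigma) for any normal variable X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The value for k = 1, about 0.68, is the "about two-thirds" of property (iii).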

6.2 Standard normal curve

Normal distribution curves differ in location and degree of spread according to the values of the parameters µ and σ respectively.

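[Figure 6-4: Two normal curves with different means and the same spread. Figure 6-5: Two normal curves with the same mean and different spreads]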

In Figure 6-4, the curves differ in location but have the same degree of spread, i.e. σ is the same for both, but µ is different.

In Figure 6-5, the curves have the same location but different degrees of spread, i.e. µ is the same for both, but σ is different.


It would appear, then, that the area for any interval would have to be determined separately for each particular curve. However, this is not the case, since every normal curve can be transformed into a standard normal curve by putting µ = 0 and σ = 1. The equation of the standard curve is:

$$Y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}$$

To transform the normal equation to the standard normal equation, put:

$$z = \frac{x-\mu}{\sigma} \quad\text{and}\quad Y = \sigma y.$$

When z = 0: $Y = \frac{1}{\sqrt{2\pi}}e^{0} = \frac{1}{\sqrt{2\pi}} = 0.40$

When z = ±1: $Y = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}} = 0.24$

When z = ±2: $Y = \frac{1}{\sqrt{2\pi}}e^{-2} = 0.05$

When z = ±3: $Y = \frac{1}{\sqrt{2\pi}}e^{-4.5} = 0.004$ (using a calculator)

For example, to evaluate $\frac{1}{\sqrt{2\pi}}e^{-4.5}$ on a calculator, find $e^{-4.5}$ and divide the result by $\sqrt{2\pi}$.

[Figure 6-6: Standard normal curve, Y plotted against z from −3 to 3; the ordinate is 0.4 at z = 0 and 0.0044 at z = ±3]
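The ordinates above, and the transformation Y = σy, can be checked with a few lines of Python. This is a sketch: the function name is illustrative, and µ = 20, σ = 5, x = 27 are arbitrary values used only for the check.

```python
import math

def standard_ordinate(z):
    """Y = e^(-z^2 / 2) / sqrt(2*pi), the ordinate of the standard normal curve."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

for z in (0, 1, 2, 3):
    print(z, round(standard_ordinate(z), 4))
# 0 0.3989
# 1 0.242
# 2 0.054
# 3 0.0044

# The transformation Y = sigma * y, checked for arbitrary mu = 20, sigma = 5, x = 27
mu, sigma, x = 20, 5, 27
z = (x - mu) / sigma
y = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
print(math.isclose(sigma * y, standard_ordinate(z)))   # True
```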

The total area under this curve is unity, and the area from −∞ to any positive value of z is given in the normal probability tables provided (see page 178). To find the area under any normal curve between the ordinates $x_1$ and $x_2$, we find the corresponding values $z_1$ and $z_2$ from the transformation formulae:

$$z_1 = \frac{x_1-\mu}{\sigma} \quad\text{and}\quad z_2 = \frac{x_2-\mu}{\sigma}$$

and then determine from the tables the areas from −∞ to $z_1$ and from −∞ to $z_2$. Subtracting these gives the required area. This area measures the probability that any individual selected at random from a normally distributed population will have a value lying between $x_1$ and $x_2$.
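The table look-up procedure can be mimicked in code, with the standard normal area Φ(z) computed from `math.erf` instead of being read from the tables. This is a sketch; `phi` and `prob_between` are illustrative names. The printed areas agree with the tabulated entries for z = 0, 1 and 2.

```python
import math

def phi(z):
    """Area under the standard normal curve from minus infinity to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_between(x1, x2, mu, sigma):
    """Pr(x1 < X < x2) for a normal variable X with mean mu and standard deviation sigma."""
    z1 = (x1 - mu) / sigma    # standardise each ordinate
    z2 = (x2 - mu) / sigma
    return phi(z2) - phi(z1)  # subtract the two areas from minus infinity

print(round(phi(0), 4), round(phi(1), 4), round(phi(2), 4))   # 0.5 0.8413 0.9772
print(round(prob_between(-1, 1, 0, 1), 4))                    # 0.6827
```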


Example 1 The heights of VCE students in Victoria may be considered to be normally distributed with a mean of 170 cm and a standard deviation of 5 cm.

a What is the probability that a student, selected at random, has a height between 174 cm and 178 cm?

b Out of a group of 150 VCE students, how many would be expected to have a height less than 164 cm?

c What proportion of students would be expected to have heights deviating from the mean by more than two standard deviations?

[Figure 6-7: Normal curve of heights, x from 155 to 185 cm (z from −3 to 3); area A lies between z = 0.8 and z = 1.6, and area B lies to the left of z = −1.2]

a The shaded area, A, in Figure 6-7 measures the required probability.

When x = 174: $z = \frac{x-\mu}{\sigma} = \frac{174-170}{5} = 0.8$

Pr(X < 174) = Pr(z < 0.8) = 0.7881 (from tables)

When x = 178: $z = \frac{x-\mu}{\sigma} = \frac{178-170}{5} = 1.6$

Pr(X < 178) = Pr(z < 1.6) = 0.9452 (from tables)

Pr(174 < X < 178) = 0.9452 − 0.7881 = 0.1571

b The shaded area, B, in Figure 6-7 measures the probability of a student having a height less than 164 cm.

When x = 164: $z = \frac{x-\mu}{\sigma} = \frac{164-170}{5} = -1.2$

Pr(X < 164) = Pr(z < −1.2) = Pr(z > 1.2) (from symmetry of the curve) = 1 − Pr(z < 1.2) = 1 − 0.8849 = 0.1151

Expected number = 150 × 0.1151 ≈ 17


[Figure 6-8: Normal curve of heights, x from 155 to 185 cm (z from −3 to 3), with areas C and D shaded beyond z = −2 and z = 2]

c The required proportion is the sum of the areas C and D in Figure 6-8. By symmetry, these areas are equal. For two standard deviations above or below the mean, z = ±2.

Pr(X > µ + 2σ) = Pr(z > 2) = 1 − Pr(z < 2) = 1 − 0.9772 (from tables) = 0.0228

Pr(X > µ + 2σ or X < µ − 2σ) = 2 × 0.0228 (from symmetry) = 0.0456

This means that about 5% of the students have heights deviating from the mean by more than two standard deviations, or that about 95% of students have heights within two standard deviations of the mean. This is one of the characteristics of the normal distribution. Verify that about two-thirds of the population lies within one standard deviation of the mean and that practically all of the population (about 99.7%) lies within three standard deviations of the mean.
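All three parts of Example 1 can be checked with Python's standard-library `statistics.NormalDist`, which computes the areas directly rather than from four-figure tables. In part c the unrounded answer is 0.0455 rather than 0.0456, because the worked solution doubles the already-rounded 0.0228.

```python
from statistics import NormalDist

mu, sigma = 170, 5                       # Example 1: heights in cm
heights = NormalDist(mu=mu, sigma=sigma)

# a  Pr(174 < X < 178): difference of two areas from minus infinity
p_a = heights.cdf(178) - heights.cdf(174)

# b  Expected number of 150 students shorter than 164 cm
expected_b = 150 * heights.cdf(164)

# c  Pr(X < mu - 2*sigma or X > mu + 2*sigma): the two tails C and D
p_c = heights.cdf(mu - 2 * sigma) + (1 - heights.cdf(mu + 2 * sigma))

print(round(p_a, 4), round(expected_b), round(p_c, 4))   # 0.1571 17 0.0455
```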

Example 2 A lathe turns out brass cylinders with a mean diameter of 2.00 cm and a standard deviation of 0.04 cm. Assuming that the distribution of diameters is normal, find the limits to the acceptable diameters if, on checking, it is found that five per cent in the long run are rejected because they are oversize and five per cent are rejected because they are undersize.

Each of the shaded areas in Figure 6-9 is five per cent of the total area. This is an inverse type of problem, in which the proportions are given and the x-values of the inside ends of the shaded portions are to be found. Using the inverse normal distribution tables on page 178, we find the z value such that the area from −∞ to this z value is 0.95. Its value is 1.6449. By symmetry, the other z value is −1.6449.

[Figure 6-9: Normal curve of diameters centred at 2.00 cm (z = 0), with a tail of area 0.05 shaded beyond each of z = −1.6449 and z = 1.6449]


Using $z = \frac{x-\mu}{\sigma}$:

$$\pm 1.6449 = \frac{x - 2.00}{0.04}$$

x = 2.00 ± 1.6449 × 0.04 = 2.00 ± 0.07 = 1.93 or 2.07 (cm)

These are the acceptable limits.
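The inverse problem in Example 2 corresponds to `NormalDist.inv_cdf` in Python's `statistics` module, which returns the x-value with a given area to its left; in this sketch it plays the role of the inverse normal tables.

```python
from statistics import NormalDist

diameters = NormalDist(mu=2.00, sigma=0.04)   # Example 2: diameters in cm

lower = diameters.inv_cdf(0.05)     # 5% of cylinders are smaller than this
upper = diameters.inv_cdf(0.95)     # 5% of cylinders are larger than this
z_95 = NormalDist().inv_cdf(0.95)   # standard normal z with area 0.95 to its left

print(round(z_95, 4))                     # 1.6449
print(round(lower, 2), round(upper, 2))   # 1.93 2.07
```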

Example 3 The mean life of a certain type of television tube is 10 000 hours with a standard deviation of 1000 hours. Assuming the distribution of lifetimes, X, is normal, find the probability that X is less than any specified value x. Using integral multiples of the standard deviation, plot the cumulative probability curve.

[Figure 6-10: Normal curve of lifetimes, x from 7000 to 13 000 hours (z from −3 to 3), with the area to the left of x shaded to show Pr(X < x)]

The shaded area of Figure 6-10 shows the probability that X is less than x, where the values of the variable lie almost certainly between 7000 and 13 000 hours.

When x = 7000: z = −3 and Pr(X < 7000) = 0.0013 (from tables)

When x = 8000: z = −2 and Pr(X < 8000) = 0.0228

Similarly for x = 9000, 10 000, ..., 13 000, as shown in the following table:

x ('000 hours)   7        8        9        10       11       12       13
Pr(X < x)        0.0013   0.0228   0.1587   0.5      0.8413   0.9772   0.9987

This table gives a cumulative probability distribution of a normal variable, X. It is similar to a cumulative relative frequency distribution. From the table we can see, for example, that 84.13 per cent of tubes have a lifetime less than 11 000 hours.

Figure 6-11 shows the cumulative probability curve from which the different quantiles and relative frequencies can be calculated approximately. For example, about 28 per cent of tubes have a lifetime less than about 9400 hours. The 0.6 quantile is approximately 10 250.


[Figure 6-11: Cumulative probability curve, Pr(X < x) from 0 to 1.0 plotted against x from 7000 to 13 000 hours, with the readings 0.28 at about 9400 hours and 0.6 at about 10 250 hours marked]
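The cumulative table in Example 3 can be regenerated with `NormalDist.cdf`, and the two readings taken from Figure 6-11 can be computed directly in this sketch. The exact values, about 0.27 below 9400 hours and a 0.6 quantile of about 10 253 hours, are close to the approximate graphical readings.

```python
from statistics import NormalDist

life = NormalDist(mu=10_000, sigma=1_000)   # Example 3: tube lifetimes in hours

# Cumulative probabilities at integral multiples of the standard deviation
for x in range(7_000, 13_001, 1_000):
    print(x, round(life.cdf(x), 4))

# Exact versions of the two readings taken from the cumulative curve
print(round(life.cdf(9_400), 4))   # proportion of tubes lasting less than 9400 hours
print(round(life.inv_cdf(0.6)))    # 0.6 quantile, in hours
```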

Example 4 A machine makes metal rods with a mean length of 50 cm and a standard deviation of 1 cm. Assume that the distribution of lengths is normal.

a What proportion of rods whose length is greater than 49 cm will have a length in excess of 50 cm?

b If five rods are selected at random, what is the probability that not more than one of these rods will have a length greater than 49 cm?

[Figure 6-12: Normal curve of rod lengths (µ = 50 cm, σ = 1 cm), with x = 49, 50, 51, 52 and 53 cm marked (z = −1 to 3)]

a Pr(X > 49) = Pr(z > −1) = Pr(z < 1) = 0.8413

Pr(X > 50) = 0.5

This question involves conditional probability. Using the formula

$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$$

we get:

$$\Pr(X > 50 \mid X > 49) = \frac{\Pr(X > 50)}{\Pr(X > 49)} \quad \text{(Why?)}$$

$$= \frac{0.5}{0.8413} = 0.5943$$

Or, simply using the geometry of the situation as shown in Figure 6-12:

$$\Pr(X > 50 \mid X > 49) = \frac{\text{area to the right of } x = 50}{\text{area to the right of } x = 49}$$
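A short check of part a, again using `statistics.NormalDist` (a sketch, not part of the original solution):

```python
from statistics import NormalDist

rods = NormalDist(mu=50, sigma=1)   # Example 4: rod lengths in cm

p_gt_49 = 1 - rods.cdf(49)          # Pr(X > 49)
p_gt_50 = 1 - rods.cdf(50)          # Pr(X > 50)

# Every rod longer than 50 cm is also longer than 49 cm,
# so Pr(X > 50 and X > 49) is simply Pr(X > 50).
conditional = p_gt_50 / p_gt_49
print(round(p_gt_49, 4), round(conditional, 4))   # 0.8413 0.5943
```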


b Since each rod, selected at random, has the same probability 0.8413 of having a length greater than 49 cm, we are dealing with a binomial variable, Y, with p = 0.8413, q = 0.1587 and n = 5.
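The binomial arithmetic for part b can be reproduced with `math.comb`. This sketch uses the rounded p = 0.8413 from part a; evaluating the terms directly gives Pr(Y = 0) ≈ 0.0001 and Pr(Y = 1) ≈ 0.0027.

```python
from math import comb

p, n = 0.8413, 5   # Pr(a rod is longer than 49 cm), number of rods selected
q = 1 - p

def binomial_pmf(k):
    """Pr(Y = k) for a binomial variable Y with parameters n and p."""
    return comb(n, k) * p**k * q**(n - k)

p0 = binomial_pmf(0)
p1 = binomial_pmf(1)
print(round(p0, 4), round(p1, 4), round(p0 + p1, 4))   # 0.0001 0.0027 0.0028
```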

Pr(Y ≤ 1) = Pr(Y = 0) + Pr(Y = 1)

$= (0.1587)^5 + \binom{5}{1}(0.1587)^4(0.8413)$

= 0.0001 + 0.0027 = 0.0028

Exercises 6a

1 Plot the curve of the normal distribution:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

when µ = 20 and σ = 5 and, by counting squares, verify that the area under the curve is approximately one square unit.

2 A normal variable has mean µ and standard deviation σ. What is the probability that any value of the variable, randomly selected, lies between µ + 0.4σ and µ + 2.6σ?

3 A manufacturer of electric light globes finds that these articles have an average life of 1200 burning hours with a standard deviation of 200 hours. Assuming that the distribution of lifetimes is normal: a what is the probability of a globe selected at random having a life between 1240 and 1320 hours? b out of a batch of 200 globes, how many would be expected to fail in the first 880 burning hours? c what proportion of globes manufactured would be expected to have a life less than 1100 hours or more than 1460 hours?

4 Tests on breaking strengths of two different kinds of fibre, one being silk and the other a silk-rayon mixture, yielded the following data: Silk: mean 10 kg wt; standard deviation 2.5 kg wt. Silk-rayon: mean 15 kg wt; standard deviation 5 kg wt. Calculate: a the probability that a piece of silk, selected at random, will be at least as strong as the mean for the silk-rayon mixture; b the probability that a piece of silk-rayon, selected at random, will be no stronger than the mean of the silk.

5 A machine makes electrical resistors which have a mean resistance of 50 ohms with a standard deviation of 2 ohms. a Assuming the distribution to be normal, find the proportion of resistors made which have a resistance of less than 47.5 ohms. b Calculate the limits a and b, equally spaced on either side of the mean, so that the manufacturer can correctly claim that, in the long run, no more than one resistor in 500 lies outside these limits.

6 The local authorities in a certain city install 1000 electric lamps in the streets of the city. a If the lamps have an average life of 2000 burning hours with a standard deviation of 400 hours, and the life of the lamps is normally distributed, what number of lamps might be expected to fail in the first 1500 burning hours? b What period of burning would you expect to elapse before 80 per cent of the lamps fail?


7 Steel rods are manufactured to be 5 cm in diameter, but they are acceptable if they are between 4.95 and 5.05 cm. The manufacturer finds that, in the long run, about four per cent are rejected as oversize and four per cent as undersize. If the diameters are normally distributed, find the distribution's standard deviation.

8 Speedometers of cars are not accurate. Suppose that, when the speedometer of a randomly chosen car registers 60 km/h, the actual speed of the car is a variable having a normal distribution with mean µ = 62 and standard deviation σ = 2. What proportion of cars are exceeding 60 km/h when their speedometers register 60 km/h?

9 The 'threshold', or smallest amount, of a certain poison which is sure to kill a rat is known to vary from rat to rat, following a normal distribution with mean 25.0 mg and standard deviation 2.5 mg. a Find the proportion of rats that would be killed by a dose of 27.0 mg. b Plot a graph (using integral multiples of the standard deviation) showing how this proportion changes as the dose is increased or decreased. c Find the smallest dose that would kill 90 per cent of rats. d What changes would you expect in the graph if the poison were diluted by adding two parts of inert bait to one part of the original poison?

10 Butter, marketed in 250 g packages, has a weight which is normally distributed with its advertised weight of 250 g as the mean. It may be regarded as appreciably underweight if the actual weight is less than 225 g. Find the maximum allowable value of the standard deviation if, in the long run, not more than one package in 100 is rejected as being underweight.

11 The mean annual income for a sample of 200 persons selected at random from a certain industry was $20 800 with a standard deviation of $4160. Of these, eight earned less than $272 per week and 24 earned more than $480 per week. Does this sample tend to confirm or refute the claim that incomes of the population from which this sample was selected were normally distributed? Give reasons for your answer.

12 A firm producing brass washers to a specified thickness of 0.5 cm has found that the thickness varies normally about a mean of 0.5 cm with a standard deviation of 0.005 cm. All washers with a thickness between 0.49 cm and 0.51 cm are regarded as satisfactory. In a batch of 2000 washers, how many would you expect to be rejected?

13 The average height of male students at a certain university is 170 cm with variance 25. What proportion of these students whose height is greater than 160 cm will have a height in excess of 170 cm assuming the heights are normally distributed?

14 If X is a normally distributed random variable with mean 10, and the probability that X is greater than 12 is 0.1056, find: a the standard deviation of X b Pr(X > 8) c Pr(X > 8 | X < 12) d the value of x for which Pr(X > x) = 0.85

15 X is a normally distributed random variable and the variance of X is 4. Given that Pr(X > 16) = 0.95, find µ, the mean value of X, and determine: a Pr(X < 16 | X < µ) b Pr(X < µ | X < 20).

16 The weight distribution for packages of flour is normal with a mean weight of 500 grams and standard deviation of 5 grams. An automatic device rejects packages with a weight less than 490 grams. Find the probability that an accepted package will weigh less than 500 grams.


17 A certain population of plants has a distribution of heights, measured in centimetres, which is normal with mean 30 cm and standard deviation 2 cm. a Calculate the probability that a randomly selected plant will be less than 27 cm in height. b If five plants are selected at random, what is the probability that, at most, one is less than 27 cm in height?

18 The average life of a certain type of light globe is 1200 h with a standard deviation of 240 h. a Assuming that the lengths of life of this type of globe are normally distributed, complete the following proportionate frequency distribution.

Length of life (h)     < 480   < 720   < 960   < 1200   < 1440   < 1680   < 1920
Proportion of globes

b Use the table above to draw a cumulative proportion curve and, from the graph, find:

(i) the proportion of globes which have a life less than 700 h

(ii) the 0.9 quantile, stating what it represents.

19 The wingspan of birds of a particular species has a normal distribution with mean 50 cm and standard deviation 5 cm. a Find the probability that a randomly selected bird has a wingspan greater than 60 cm. b If the wingspan is measured to the nearest centimetre, find the probability that a randomly selected bird has a wingspan measured as 50 cm.

20 The length of a certain species of fish has a normal distribution with mean 30 cm and standard deviation 2.5 cm. a Find the probability that a randomly selected fish has a length greater than 36 cm. b If the lengths of the fish are measured correct to the nearest centimetre, show that the probability of a randomly selected fish having a length which is measured as 30 cm is about 0.16. c If five fish are randomly selected, find the probability that exactly two will have their lengths measured as 30 cm.

21 Suppose that the strengths of mass-produced items are normally distributed with mean µ and standard deviation 0.5. The value of µ can be controlled by a machine setting. If the strength of an item is less than 5, it is classified as defective. Revenue from sales of non-defective items is $20 per item, while revenue from defective items is $2 per item. The cost of production of items with mean µ is $2 per item. Find the expected profit per item if µ = 6.

22 A machine makes electrical resistors which have a mean resistance of 50 ohms with a standard deviation of 2 ohms. a Assuming the distribution of resistances to be normal, find the proportion of resistors made which have resistance less than 47.5 ohms. b If 10 resistors are selected randomly, what is the probability that no more than one will have a resistance less than 47.5 ohms?

23 A chain is made of five links which are selected at random from a population of links. The strengths of the links are assumed to be normally distributed with a mean of 500 units and a standard deviation of 10 units. Find the probability that: a a randomly selected link has a strength of at least 490 units b a chain has a strength of at least 490 units c at least two links in a chain have strength of at least 490 units.


Area under standard normal curve giving area as a function of z, z ≥ 0

z    0      1      2      3      4      5      6      7      8      9      | Mean differences: 1 2 3 4 5 6 7 8 9
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 | 4 8 12 16 20 24 28 32 36
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 | 4 8 12 16 20 24 28 32 36
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 | 4 8 12 15 19 23 27 31 35
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 | 4 8 11 15 19 22 26 30 34
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 | 4 7 11 14 18 22 25 29 32
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 | 3 7 10 14 17 21 24 27 31
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 | 3 6 10 13 16 19 23 26 29
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 | 3 6 9 12 15 18 21 24 27
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 | 3 6 8 11 14 17 19 22 25
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 | 3 5 8 10 13 15 18 20 23
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 | 2 5 7 9 12 14 16 18 21
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 | 2 4 6 8 10 12 14 16 19
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 | 2 4 5 7 9 11 13 15 16
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 | 2 3 5 6 8 10 11 13 14
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 | 1 3 4 6 7 8 10 11 13
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 | 1 2 4 5 6 7 8 10 11
1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 | 1 2 3 4 5 6 7 8 9
1.7  0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 | 1 2 3 3 4 5 6 7 8
1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 | 1 1 2 3 4 4 5 6 6
1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 | 1 1 2 2 3 4 4 5 5
2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 | 0 1 1 2 2 3 3 4 4
2.1  0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 | 0 1 1 2 2 2 3 3 4
2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 | 0 1 1 1 2 2 2 3 3
2.3  0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 | 0 1 1 1 1 2 2 2 2
2.4  0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 | 0 0 1 1 1 1 1 2 2
2.5  0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 | 0 0 0 1 1 1 1 1 1
2.6  0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 | 0 0 0 0 1 1 1 1 1
2.7  0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 | 0 0 0 0 0 1 1 1 1
2.8  0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 | 0 0 0 0 0 0 0 0 1
2.9  0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 | 0 0 0 0 0 0 0 0 1
3.0  0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990 | 0 0 0 0 0 0 0 0 0
3.1  0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993 | 0 0 0 0 0 0 0 0 0
3.2  0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995 | 0 0 0 0 0 0 0 0 0
3.3  0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997 | 0 0 0 0 0 0 0 0 0
3.4  0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998 | 0 0 0 0 0 0 0 0 0
3.6  0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 | 0 0 0 0 0 0 0 0 0
3.8  0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 | 0 0 0 0 0 0 0 0 0
3.9  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  | 0 0 0 0 0 0 0 0 0

Inverse normal tables (value of z such that the area from −∞ to z is p)

p      z        p      z        p      z        p      z        p      z        p      z
0.50   0.0000   0.60   0.2533   0.70   0.5244   0.80   0.8416   0.90   1.2816   0.990  2.3263
0.51   0.0251   0.61   0.2793   0.71   0.5534   0.81   0.8779   0.91   1.3408   0.991  2.3656
0.52   0.0502   0.62   0.3055   0.72   0.5828   0.82   0.9154   0.92   1.4051   0.992  2.4089
0.53   0.0753   0.63   0.3319   0.73   0.6128   0.83   0.9542   0.93   1.4758   0.993  2.4573
0.54   0.1004   0.64   0.3585   0.74   0.6433   0.84   0.9945   0.94   1.5548   0.994  2.5121
0.55   0.1257   0.65   0.3853   0.75   0.6745   0.85   1.0364   0.95   1.6449   0.995  2.5758
0.56   0.1510   0.66   0.4125   0.76   0.7063   0.86   1.0803   0.96   1.7507   0.996  2.6521
0.57   0.1764   0.67   0.4399   0.77   0.7388   0.87   1.1264   0.97   1.8808   0.997  2.7478
0.58   0.2019   0.68   0.4677   0.78   0.7722   0.88   1.1750   0.975  1.9600   0.998  2.8782
0.59   0.2275   0.69   0.4959   0.79   0.8064   0.89   1.2265   0.98   2.0537   0.999  3.0902


6.3 Normal approximation to binomial distribution

When n is fairly large (30 or more) and p is neither too small (not less than about 0.1) nor too large (not greater than about 0.9), the normal distribution with mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$ can be used as an approximation to the binomial distribution.

The histograms below and on the next page have been drawn for p = 0.8 and for n = 5, 10, 15 and 20. Observe that, as n increases, the histograms become less skewed, which suggests that the curve drawn through the midpoints of the tops of the rectangles takes on the characteristic symmetrical shape of the normal distribution curve.
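The decreasing skewness can be quantified. A standard result gives the coefficient of skewness of a binomial distribution as (q − p)/√(npq); the sketch below evaluates it for p = 0.8 and the four values of n used in the figures. The values are negative (a longer tail to the left) and shrink in size as n grows.

```python
import math

def binomial_skewness(n, p):
    """Coefficient of skewness of a binomial distribution: (q - p) / sqrt(npq)."""
    q = 1 - p
    return (q - p) / math.sqrt(n * p * q)

for n in (5, 10, 15, 20):
    print(n, round(binomial_skewness(n, 0.8), 3))
# 5 -0.671
# 10 -0.474
# 15 -0.387
# 20 -0.335
```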

[Figure 6-13: Binomial probability histogram, Pr against number of successes, for p = 0.8, n = 5, µ = 4]

[Figure 6-14: Binomial probability histogram, Pr against number of successes, for p = 0.8, n = 10, µ = 8]


[Figure 6-15: Binomial probability histogram, Pr against number of successes, for p = 0.8, n = 15, µ = 12]

[Figure 6-16: Binomial probability histogram, Pr against number of successes, for p = 0.8, n = 20, µ = 16]

The binomial variable is discrete and can assume only the integral values 0, 1, 2, ..., n. How, then, can we represent its probability distribution by means of a histogram, which can be drawn for a continuous variable only? This can be justified by the fact that the areas of the rectangles are proportional to the probabilities and, since the width of the base of each rectangle is one unit, the height of each rectangle represents the probability of the midpoint of the base. The sum of the areas of all the rectangles is 1 unit of area, corresponding to the fact that the sum of the probabilities is 1.
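The statement that the rectangle areas sum to 1 can be checked for the case n = 20, p = 0.8 of Figure 6-16; in this sketch the same probabilities also recover the mean µ = np = 16.

```python
from math import comb

n, p = 20, 0.8
# Heights of the histogram rectangles: Pr(X = k) for k = 0, 1, ..., n
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Each base is one unit wide, so area = height and the areas add to 1
total_area = sum(probs)
mean = sum(k * pk for k, pk in enumerate(probs))
print(round(total_area, 10), round(mean, 10))   # 1.0 16.0
```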

Example 5 A Gallup Poll establishes that 80 per cent of people interviewed are in favour of a certain proposal. If 20 people are interviewed, find, using the normal approximation to the binomial distribution, the probability that:

a there will be exactly 14 in favour of the proposal

b there will be more than 14, but fewer than 18, in favour.

$\mu = np = 20 \times 0.8 = 16$, $\quad \sigma = \sqrt{npq} = \sqrt{16 \times 0.2} = \sqrt{3.2} = 1.789$

A normal distribution with mean 16 and standard deviation $\sqrt{3.2}$ can be used as an approximation in this case.

a With reference to Figure 6-16, it will be seen that, to find the probability that X = 14, it is necessary to find the probability that 13.5 < X* < 14.5. It should be remembered that the normal variable is continuous, whereas the binomial variable is discrete.

Note: If X is a binomial variable whose distribution is approximated by a normal variable X*, then:

Pr(X = a) ≈ Pr(a − 0.5 < X* < a + 0.5)

and:

Pr(a < X < b) ≈ Pr(a + 0.5 < X* < b − 0.5)

for integer values a and b such that a < b.

When X* = 13.5: $z = \frac{13.5-16}{\sqrt{3.2}} \approx -1.40$

When X* = 14.5: $z = \frac{14.5-16}{\sqrt{3.2}} \approx -0.84$

Pr(X* < 13.5) = Pr(z < −1.40) = 0.0808

Pr(X* < 14.5) = Pr(z < −0.84) = 0.2005

Pr(13.5 < X* < 14.5) = 0.2005 − 0.0808 = 0.1197

Check whether this is a good approximation by evaluating $\binom{20}{14}(0.2)^6(0.8)^{14}$.
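The check suggested above takes only a few lines. The exact binomial value is about 0.1091, so the approximation 0.1197 overestimates it by about 0.01. This sketch uses unrounded z-values, so its approximate value may differ from the table-based 0.1197 in the last figure.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.8
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))   # mean 16, sd sqrt(3.2)

exact_14 = comb(n, 14) * p**14 * (1 - p)**6        # exact binomial Pr(X = 14)
approx_14 = approx.cdf(14.5) - approx.cdf(13.5)    # Pr(13.5 < X* < 14.5), continuity correction

print(round(exact_14, 4), round(approx_14, 4))
```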

b It will be necessary to find Pr(14.5 < X* < 17.5).

When X* = 14.5: $z = \frac{14.5-16}{\sqrt{3.2}} \approx -0.84$

When X* = 17.5: $z = \frac{17.5-16}{\sqrt{3.2}} \approx 0.84$

Pr(X* < 14.5) = Pr(z < −0.84) = 0.2005

Pr(X* < 17.5) = Pr(z < 0.84) = 0.7995

Pr(14.5 < X* < 17.5) = 0.7995 − 0.2005 = 0.5990

Compare this result with the arithmetical drudgery involved in calculating:

$$\binom{20}{15}(0.2)^5(0.8)^{15} + \binom{20}{16}(0.2)^4(0.8)^{16} + \binom{20}{17}(0.2)^3(0.8)^{17}$$
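For a computer, the "arithmetical drudgery" is trivial. In this sketch the exact sum comes out at about 0.5981. With unrounded z-values the normal approximation gives about 0.598, slightly closer to the exact value than the table-based 0.5990, which uses z rounded to ±0.84.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.8
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))   # mean 16, sd sqrt(3.2)

# Exact: Pr(14 < X < 18) = Pr(X = 15) + Pr(X = 16) + Pr(X = 17)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (15, 16, 17))

# Normal approximation with continuity correction: Pr(14.5 < X* < 17.5)
approx_b = approx.cdf(17.5) - approx.cdf(14.5)

print(round(exact, 4), round(approx_b, 4))
```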


Exercises 6b (In each of the following questions, use the normal approximation to the binomial distribution where applicable.)

1 A dental inspector finds that about 20 per cent of children of a certain area have tooth decay. If a group of 400 children is randomly selected: a how many would be expected to have tooth decay? b what is the probability of exactly this number? c what is the probability that the number lies within one standard deviation of the expected number?

2 A fair coin is tossed 100 times.
a How many heads do we expect to turn up?
b What is the probability of this number?
c What is the probability that the number of heads is greater than 45 but less than 55?

3 A fair coin is tossed 500 times. Find the probability that the number of heads uppermost will not differ from 250 by more than 10.

4 A fair die is thrown 180 times. What is the probability that:
a a six will turn up exactly 40 times?
b an odd number will turn up at least 100 times?

5 A target shooter finds that a bull's-eye is scored on 20 per cent of occasions. What is the probability that at least 24 bull's-eyes will be scored out of 100 attempts?

6 a Assuming that the length of life of a certain type of television tube is normally distributed with a mean of 1000 hours and a standard deviation of 250 hours, what proportion of tubes would be expected to have a life not exceeding 780 hours?

b If 100 such tubes are randomly selected, how many would be expected to have a life not exceeding 780 hours and what is the probability that the number exceeds 21?

7 A manufacturer of metal pistons finds that, on average, 10 per cent of the pistons are rejected because they are either oversize or undersize. What is the probability that a batch of 900 pistons will contain:
a no more than 100 rejects?
b at least 80 rejects?

8 Hospital records show that of patients suffering from a certain complaint, 75 per cent recover. What is the probability that, of 48 randomly selected patients, at least 40 recover?

9 In packets of flower seeds, 40 per cent are known to produce pink flowers. If 250 seeds are planted and they all flower:
a how many pink flowers would we expect?
b what is the probability of this number?
c within what limits would the number of pink flowers very probably lie?

6.4 Probability limits for a single value of the normal variable

It has been stated that one of the characteristic properties of a normal distribution is that:

(i) about two-thirds of the population lies in the interval \(\mu \pm \sigma\) (Figure 6-17);

(ii) about 95 per cent of the population lies in the interval \(\mu \pm 2\sigma\) (Figure 6-18);

(iii) practically all of the population lies in the interval \(\mu \pm 3\sigma\) (Figure 6-19).


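Figure 6-17: normal curve with 68.27% of the area between \(\mu - \sigma\) and \(\mu + \sigma\)

Figure 6-18: normal curve with 95.45% of the area between \(\mu - 2\sigma\) and \(\mu + 2\sigma\)

Figure 6-19: normal curve with 99.73% of the area between \(\mu - 3\sigma\) and \(\mu + 3\sigma\)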
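The percentages quoted for these figures are areas under the normal curve within \(k\) standard deviations of the mean. As a minimal sketch, assuming Python 3.8 or later and the standard library's `statistics.NormalDist` in place of tables (the helper `coverage` is a hypothetical name), they can be computed directly:

```python
from statistics import NormalDist


def coverage(k):
    """Proportion of a normal population lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {100 * coverage(k):.2f}%")
# mu +/- 1 sigma: 68.27%
# mu +/- 2 sigma: 95.45%
# mu +/- 3 sigma: 99.73%
```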

From this we infer that a single value, \(x\), of the variable will almost certainly lie within 3 standard deviations of the mean, i.e. almost certainly \(\mu - 3\sigma \le x \le \mu + 3\sigma\).

Very probably (probability of about 0.95) it will lie within 2 standard deviations of the mean, i.e. very probably \(\mu - 2\sigma \le x \le \mu + 2\sigma\).

These limits for the value of a variable are called the 3 sigma and 2 sigma probability limits respectively. If a value of the variable lies beyond either of these limits, it is said to differ significantly from the mean at that particular level of significance.

Example 6 A manufacturer of electric light globes finds that the globes have an average life of 2000 burning hours with a standard deviation of 200 hours. Assuming that the distribution of lifetimes is normal, within what interval will the lifetimes almost certainly lie?

Their lifetimes will almost certainly lie in the interval \(\mu \pm 3\sigma\), i.e. in the interval 2000 ± 600 hours, i.e. between 1400 and 2600 hours.
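A minimal Python sketch of the sigma-limit check in Example 6 follows; the function names are illustrative, not from the text. It also applies the check to a single globe with a lifetime of 1200 hours.

```python
def sigma_limits(mu, sigma, k):
    """Return the k sigma probability limits (mu - k*sigma, mu + k*sigma) for a single value."""
    return mu - k * sigma, mu + k * sigma


def differs_significantly(x, mu, sigma, k):
    """True if the single value x lies beyond the k sigma limits."""
    lower, upper = sigma_limits(mu, sigma, k)
    return x < lower or x > upper


print(sigma_limits(2000, 200, 3))                  # (1400, 2600)
print(differs_significantly(1200, 2000, 200, 3))   # True: 1200 h is below 1400 h
```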

If a globe, randomly selected, had a lifetime of only 1200 hours, what conclusion could we draw? We could conclude that this is most unlikely to have occurred by chance, so perhaps there is some factor which needs to be taken into consideration, such as a fault in the manufacturing process. Further sampling would be necessary to discover whether the lifetime of globes was consistently lower than expected.

In Section 6.3 it was stated that, when \(n\) is fairly large (30 or more) and \(p\) is neither too small (not less than about 0.1) nor too large (not greater than about 0.9), the normal distribution with \(\mu = np\) and \(\sigma = \sqrt{npq}\) can be used as an approximation to the binomial distribution; in that case we can apply the probability limits stated above.


Example 7 In the long run, 64 per cent of patients treated for a particular disease with drug X are cured. If 100 patients, not specially selected, are treated with this drug and 75 are cured, determine whether this number is significantly higher than the expected number of cures.

Expected number of cures:
\[ \mu = np = 100 \times 0.64 = 64 \]
\[ \sigma = \sqrt{npq} = \sqrt{100 \times 0.64 \times 0.36} = 4.8 \]
\[ \mu + 2\sigma = 64 + 2 \times 4.8 = 73.6 \]
\[ \mu + 3\sigma = 64 + 3 \times 4.8 = 78.4 \]

So 75 cures are significantly more than the expected number at the 2 sigma limit but not at the 3 sigma limit.
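The same check can be sketched in Python for Example 7, using the normal approximation with \(\mu = np\) and \(\sigma = \sqrt{npq}\). This is an illustration under that assumption; the helper name `binomial_sigma_limits` is hypothetical.

```python
from math import sqrt


def binomial_sigma_limits(n, p, k):
    """k sigma limits np +/- k*sqrt(npq) for a binomial count (normal approximation)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return mu - k * sigma, mu + k * sigma


observed_cures = 75
for k in (2, 3):
    lower, upper = binomial_sigma_limits(100, 0.64, k)
    verdict = "significantly higher" if observed_cures > upper else "not significant"
    print(f"{k} sigma: {lower:.1f} to {upper:.1f} -> {verdict}")
# 2 sigma: 54.4 to 73.6 -> significantly higher
# 3 sigma: 49.6 to 78.4 -> not significant
```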

The mean, \(\mu\), is a characteristic parameter of a Poisson distribution, and the variance and standard deviation are \(\mu\) and \(\sqrt{\mu}\) respectively; their derivation is beyond the scope of this book. When \(\mu\) is large, the normal distribution gives a satisfactory approximation to the Poisson distribution, with practically all values of the Poisson variable lying in the range \(\mu \pm 3\sqrt{\mu}\), since \(\sigma = \sqrt{\mu}\), and very probably (probability of about 0.95) lying in the range \(\mu \pm 2\sqrt{\mu}\).

Example 8 The number of demands for a certain item of equipment varies randomly from week to week, following a Poisson distribution with mean 20. What is the smallest number of items a firm must have in stock each week to be almost certain of not having to refuse a demand for this item?

\[ \mu = 20, \qquad \sigma = \sqrt{\mu} = \sqrt{20} \approx 4.472 \]

It is almost certain that the demand will not be more than \(\mu + 3\sqrt{\mu}\), i.e. not more than \(20 + 3 \times 4.472 = 33.416\). The firm should have at least 34 items in stock.

Very probably the demand will lie in the interval \(\mu \pm 2\sqrt{\mu}\), i.e. in the interval \(20 \pm 2 \times 4.472\). The demand will very probably be between 11 and 29.
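A minimal Python sketch of Example 8 follows. The 3 sigma calculation mirrors the text; the exact Poisson probability computed at the end is an extra check that is not part of the original example, and the helper name `poisson_cdf` is illustrative.

```python
from math import ceil, exp, factorial, sqrt


def poisson_cdf(k, mu):
    """Exact Pr(X <= k) for a Poisson variable X with mean mu."""
    return sum(exp(-mu) * mu**i / factorial(i) for i in range(k + 1))


mu = 20
sigma = sqrt(mu)               # for a Poisson distribution the variance equals the mean
upper = mu + 3 * sigma         # 3 sigma limit, about 33.416
stock = ceil(upper)            # smallest whole number of items covering the limit
print(round(upper, 3), stock)  # 33.416 34

# Extra check (not in the text): exact chance that weekly demand does not exceed the stock
print(round(poisson_cdf(stock, mu), 4))
```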

Exercises 6c

1 Assuming that the length of life of a certain type of television tube is normally distributed with a mean of 1000 hours and standard deviation of 250 hours, after how many hours is a tube almost certain to fail?

2 A target shooter scores a bull's-eye, on the average, on 75 per cent of occasions. If 48 shots are fired at a target and 25 bull's-eyes are scored, use 3 sigma limits to determine whether this is significantly less than the expected number.


3 A dental inspector finds that about 20 per cent of children of a certain age have tooth decay. In a certain area the inspector finds that 25 out of 200 children examined have tooth decay. Use 3 sigma limits to determine whether this number is significantly less than the expected number.

4 A Gallup Poll establishes that 60 per cent of people are in favour of a certain proposal. If a sample of 120 were interviewed, use 2 sigma limits to estimate the number in favour of the proposal.

5 Electricity power failures occur according to a Poisson law with an average of three failures every twenty weeks. If, over a period of 40 weeks, there were actually 9 failures, use 3 sigma limits to determine whether this is significantly more than the expected number.

6 Variables X and Y are known to be connected by the formula: Y = 10 + bX.

X can be measured accurately but the measurements of Y are subject to a random error which is normally distributed with mean of 0 and standard deviation of 0.20. An observation Y = 15 is obtained when X = 2. Determine limits within which b almost certainly lies.

7 The cost, $C per article, of manufacturing an article is related to the weight, w g, by the equation:

C = 2w + 25

The weight of the articles is normally distributed with mean 5 g and standard deviation 0.1 g. Give limits within which the cost of the article will almost certainly lie.

8 On the average, one student in every ten wears glasses.
a From a group of 90 students, how many would be expected to wear glasses? Give limits between which this number very probably lies.
b How large would a group of such students need to be for us to be almost certain that the number of students in the group wearing glasses is at least 63?

9 A fair coin is tossed 100 times.
a What are the mean and standard deviation of the number of heads appearing uppermost?
b Give limits between which the number of heads:
(i) very probably will lie
(ii) almost certainly will lie.

10 A manufacturer of metal pistons markets the product in batches of 10 and finds that 15 per cent of batches contain at least one defective piston. In 1000 batches, estimate the mean and standard deviation of the number of defective pistons and give limits between which this number will almost certainly lie, assuming the Poisson law.

11 The number of demands for a certain item of equipment varies randomly from week to week, following a Poisson distribution with mean 4. If \(X\) denotes the number of demands per week, find \(\Pr(X \le \mu - \sigma)\).

12 A retailer keeps a record of sales and finds that, on 82 out of 1000 days, there was no demand for a particular item of clothing. On one particular day there was a demand for 13 such items. Assuming the Poisson law, determine whether this is significantly more than the expected number.

13 Electricity power failures occur according to a Poisson law, with an average of three failures every twenty weeks. If, over a period of 40 weeks, there were actually nine failures, use three sigma limits to determine whether this is significantly more than the expected number.


14 Cans of peas are tested for infection by certain organisms by storing them for a period of time before they leave the factory. Cans which contain one or more organisms burst open on account of fermentation.
a If the number of organisms in a can is a Poisson variate, and on the average 76 in every 10 000 cans burst open, find the mean number of organisms per can.
b In batches of 5000 cans, find approximately the mean and standard deviation of the number of cans which burst open, and give limits between which this number will almost certainly lie.

15 In samples of milk taken from a bulk transportation vehicle, 40 per cent proved to have no bacterial spores.
a Assuming the Poisson law, estimate the mean number of bacterial spores per sample and determine the proportion of samples which would contain two bacterial spores.
b Out of 1000 samples, how many would be expected to have only one spore each? Give limits between which this number very probably lies.

6.5 Probability limits for the sample mean of n values of the variable

If a random sample of \(n\) observations is drawn from a normally distributed population with mean \(\mu\) and standard deviation \(\sigma\), it can be shown that the mean, \(\bar{x}\), of the sample:

(i) more likely than not (probability about two-thirds) lies in the interval \(\mu \pm \dfrac{\sigma}{\sqrt{n}}\) (Figure 6-20);

(ii) very probably (probability of about 0.95) lies in the interval \(\mu \pm 2\dfrac{\sigma}{\sqrt{n}}\) (Figure 6-21), i.e. very probably \(\mu - 2\dfrac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 2\dfrac{\sigma}{\sqrt{n}}\);

(iii) almost certainly lies in the interval \(\mu \pm 3\dfrac{\sigma}{\sqrt{n}}\) (Figure 6-22), i.e. almost certainly \(\mu - 3\dfrac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 3\dfrac{\sigma}{\sqrt{n}}\).

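Figure 6-20: sampling distribution of \(\bar{x}\), with 68.27% of the area between \(\mu - \dfrac{\sigma}{\sqrt{n}}\) and \(\mu + \dfrac{\sigma}{\sqrt{n}}\)

Figure 6-21: sampling distribution of \(\bar{x}\), with 95.45% of the area between \(\mu - \dfrac{2\sigma}{\sqrt{n}}\) and \(\mu + \dfrac{2\sigma}{\sqrt{n}}\)

Figure 6-22: sampling distribution of \(\bar{x}\), with 99.73% of the area between \(\mu - \dfrac{3\sigma}{\sqrt{n}}\) and \(\mu + \dfrac{3\sigma}{\sqrt{n}}\)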


The quantity \(\dfrac{\sigma}{\sqrt{n}}\) is known as the standard error of the mean. So, in Example 6, the mean life of 100 randomly selected globes would almost certainly lie in the interval \(2000 \pm 3 \times \dfrac{200}{\sqrt{100}}\), i.e. in the interval 2000 ± 60 h, and very probably would lie in the interval \(2000 \pm 2 \times \dfrac{200}{\sqrt{100}}\), i.e. in the interval 2000 ± 40 h.

The standard error, \(\dfrac{\sigma}{\sqrt{n}}\), gets smaller and smaller as \(n\) gets larger and larger, and
\[ \frac{\sigma}{\sqrt{n}} \to 0 \quad \text{as } n \to \infty. \]

If \(n = 400\), the mean life of 400 randomly selected globes would almost certainly lie in the interval \(2000 \pm 3 \times \dfrac{200}{\sqrt{400}}\), i.e. in the interval 2000 ± 30 h.
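The shrinking standard error can be tabulated with a short Python sketch (the helper `standard_error` is an illustrative name), reusing the globe figures from Example 6 and including \(n = 1\) for comparison with the single-value limits:

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)


mu, sigma = 2000, 200  # globe lifetimes from Example 6
for n in (1, 100, 400):
    se = standard_error(sigma, n)
    print(f"n = {n}: 3 sigma limits {mu - 3 * se:.0f} to {mu + 3 * se:.0f} h")
# n = 1: 3 sigma limits 1400 to 2600 h
# n = 100: 3 sigma limits 1940 to 2060 h
# n = 400: 3 sigma limits 1970 to 2030 h
```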

As \(n\) increases, \(\bar{x}\) should give us a more reliable estimate of \(\mu\). This is what we would expect. It seems reasonable to suggest, then, that to get a good estimate of the population mean we should take as large a sample as possible. However, this is not practical in many situations. Certainly, in the case of the globes it would be wasteful and expensive. Why?

We use general phrases such as 'more likely than not', 'very probably' and 'almost certainly' when referring to probability limits and to levels of significance such as the 2 sigma level and the 3 sigma level. More precisely, the probability that \(\bar{x}\) lies in the interval \(\mu \pm 1.96\dfrac{\sigma}{\sqrt{n}}\) is 0.95.
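The multiplier 1.96, and the corresponding multiplier for other confidence levels, can be obtained from the inverse of the standard normal distribution function. A minimal sketch, assuming Python 3.8 or later and the standard library's `NormalDist.inv_cdf` in place of tables; `z_multiplier` is a hypothetical helper:

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Return z such that Pr(-z < Z < z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for c in (0.95, 0.99):
    print(c, round(z_multiplier(c), 3))
# 0.95 1.96
# 0.99 2.576
```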

Example 9 The mean weekly wage in a certain industry is $500 with a standard deviation of $30. A random sample of 25 employees in this industry has a mean wage of $475. Is this significantly less, at the 3 sigma level, than the mean wage of the population?

At the 3 sigma level,
\[ \mu \pm \frac{3\sigma}{\sqrt{n}} = 500 \pm 3 \times \frac{30}{\sqrt{25}} = 500 \pm 18 \]

Since 475 is not in this interval, it is significantly less at the 3 sigma level.
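Example 9 can be reproduced with a few lines of Python. This is a sketch only; `mean_sigma_limits` is an illustrative name, and the \(z\) value printed alongside is an extra quantity not given in the text:

```python
from math import sqrt


def mean_sigma_limits(mu, sigma, n, k):
    """k sigma limits for the mean of a random sample of size n: mu +/- k*sigma/sqrt(n)."""
    se = sigma / sqrt(n)
    return mu - k * se, mu + k * se


lower, upper = mean_sigma_limits(500, 30, 25, 3)
sample_mean = 475
z = (sample_mean - 500) / (30 / sqrt(25))  # standardised distance of the sample mean
print(lower, upper)         # 482.0 518.0
print(round(z, 2))          # -4.17
print(sample_mean < lower)  # True: significantly less at the 3 sigma level
```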

Exercises 6d

1 The mean weight of boys of a certain age is 50 kilograms with standard deviation of 5 kilograms. Within what limits would the mean weight of a random sample of 64 boys of this age very probably be?

2 At a certain school, the mean IQ (Intelligence Quotient) of the students is 100, with a standard deviation of 15. The mean IQ of a sample of 25 students was 112. Is this significantly higher than would be expected?

3 A sample, 55, 63,--69;-'B, is drawn from a population whose standard deviation is 4 and whose mean is thought to be 60. Do you think the population mean has been wrongly given?


4 A machine makes electrical resistors which have a mean resistance of 50 ohms with a standard deviation of 2 ohms.

a Within what limits would we expect the mean of 25 randomly selected resistors to lie with a probability of 0.95?

b Could we assume that 4 resistors whose resistances are 45, 46, 47 and 48 ohms are very probably drawn from this population?


5 Butter is marketed to retailers in cartons containing 16 packages drawn from a population normally distributed with a mean weight of 0.5 kg and a standard deviation of 0.02 kg. The butter in a particular carton weighed 8.15 kg. Is this significantly more than the expected weight at the 2 sigma level of significance?

6 The mean height of a sample of 25 students is 150 cm. Can we infer that this sample is drawn from a population of students of mean height 160 cm and standard deviation 10 cm?


7 The length of a certain species of fish has a normal distribution with mean 30 cm and standard deviation 2.5 cm. An angler caught nine such fish whose average length was 27 cm. Is this significantly less than the expected value at the 3 sigma level?

6.6 Confidence limits

a Population mean

Confidence limits are limits for the value of a parameter estimated from a particular value of a statistic. Discussion in this section will be confined to estimating a population mean from a single value of the variable or from the mean of a set of \(n\) observations. If \(x\) is a particular value of a variable, it was stated in Section 6.4 that almost certainly:

\[ \mu - 3\sigma \le x \le \mu + 3\sigma \qquad (1) \]

Transposing (1) gives \(\mu \le x + 3\sigma\) and \(x - 3\sigma \le \mu\), i.e.
\[ x - 3\sigma \le \mu \le x + 3\sigma \qquad (2) \]

The population mean, \(\mu\), therefore almost certainly lies in the interval \(x \pm 3\sigma\). These limits for the value of \(\mu\) are called the 3 sigma confidence limits. Very probably, \(\mu\) lies in the interval \(x \pm 2\sigma\), these limits being called the 2 sigma confidence limits. If we have a random sample of \(n\) observations with mean \(\bar{x}\), we would expect \(\bar{x}\) to give a better estimate of \(\mu\) than a single value of the variable. It can be shown that \(\mu\) almost certainly lies in the interval \(\bar{x} \pm \dfrac{3\sigma}{\sqrt{n}}\). The standard error, \(\dfrac{\sigma}{\sqrt{n}}\), will decrease as \(n\) increases.

We have assumed that the standard deviation, \(\sigma\), of the population is known. If it is not known, the sample standard deviation, \(s\), may be used as an estimate of it if the sample is large. The approximate 95% confidence interval for \(\mu\) would be given by:
\[ \bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}} \]

Example 10 The IQ (Intelligence Quotient) of a sample of 100 VCE students had a mean of 108 with a standard deviation of 15. Find a 95% confidence interval for the mean IQ of the population of VCE students.

\(n = 100\), \(\bar{x} = 108\), \(s = 15\) = estimate of \(\sigma\)
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{15}{\sqrt{100}} = 1.5 \]


The approximate 95% confidence interval would be given by:
\[ \bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}} \]
i.e. \(108 - 2 \times 1.5 \le \mu \le 108 + 2 \times 1.5\)
\[ 105 \le \mu \le 111 \]

We can be about 95% sure that the mean IQ of the population lies in the interval 105 to 111. The confidence limits are 105 and 111.
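A minimal Python sketch of the calculation in Example 10, assuming (as the text does) that the sample is large enough for \(s\) to stand in for \(\sigma\); `approx_95_interval` is a hypothetical helper name:

```python
from math import sqrt


def approx_95_interval(sample_mean, s, n):
    """Approximate 95% confidence interval x_bar +/- 2*s/sqrt(n), with s estimating sigma."""
    se = s / sqrt(n)
    return sample_mean - 2 * se, sample_mean + 2 * se


print(approx_95_interval(108, 15, 100))  # (105.0, 111.0)
```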

Example 11 A random sample of 25 employees has a mean weekly wage of $320. Could this sample have been taken from a population of employees whose weekly wage is normally distributed with mean of $290 and standard deviation of $40?

\(n = 25\), \(\bar{x} = 320\), \(\sigma = 40\)
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{5} = 8 \]

We can be almost certain that the mean wage of the population would be in the interval \(\bar{x} \pm \dfrac{3\sigma}{\sqrt{n}}\), i.e. in the closed interval \(320 \pm 3 \times 8\), i.e. \([296, 344]\). Since 290 does not lie in this interval, we can reject the hypothesis that the sample is taken from a population whose mean weekly wage is $290.
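Example 11 amounts to checking whether a hypothesised mean lies inside the confidence limits. A sketch of that check in Python, with illustrative function names:

```python
from math import sqrt


def confidence_limits(sample_mean, sigma, n, k):
    """k sigma confidence limits for the population mean: x_bar +/- k*sigma/sqrt(n)."""
    se = sigma / sqrt(n)
    return sample_mean - k * se, sample_mean + k * se


def is_plausible(mu0, limits):
    """True if a hypothesised population mean mu0 lies within the limits."""
    lower, upper = limits
    return lower <= mu0 <= upper


limits = confidence_limits(320, 40, 25, 3)
print(limits)                     # (296.0, 344.0)
print(is_plausible(290, limits))  # False: reject a population mean of $290
```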

b Proportions

Example 12 A random sample of 400 manufactured articles contains 80 defectives. Give 2 sigma confidence limits for the number of defectives in samples of 400 articles and give the proportion of defectives in the whole output of all samples of 400 articles.

Using this sample to estimate the probability of a defective:
\[ n = 400, \qquad \hat{p} = \frac{80}{400} = 0.2 \]
where \(\hat{p}\) denotes the sample proportion.
\[ \hat{q} = 0.8 \]
\[ \text{Standard deviation} = \sqrt{n\hat{p}\hat{q}} = \sqrt{400 \times 0.2 \times 0.8} = 8 \]

The 2 sigma confidence limits for the number of defectives in samples of 400 articles are \(80 \pm 2 \times 8\). At this confidence level, the number of defectives lies between 64 and 96. Therefore the proportion of defectives lies between \(\dfrac{64}{400}\) and \(\dfrac{96}{400}\), i.e. between 16 per cent and 24 per cent.


Alternatively, using \(\hat{p}\) as the sample proportion:
\[ \hat{p} = \frac{X}{n} = \frac{80}{400} = 0.2, \qquad \hat{q} = 1 - \hat{p} = 0.8 \]
The 95% confidence interval for \(p\) is given by:
\[ \hat{p} - 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
i.e.
\[ 0.2 - 2\sqrt{\frac{0.2 \times 0.8}{400}} \le p \le 0.2 + 2\sqrt{\frac{0.2 \times 0.8}{400}} \]
i.e. \(0.16 \le p \le 0.24\), as before.
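Both routes through Example 12 can be sketched in Python: limits for the count, and limits for the proportion, which are the count limits divided by \(n\). The helper names are illustrative, and the default `k=2` stands for the 2 sigma level used in the text.

```python
from math import sqrt


def count_limits(successes, n, k=2):
    """k sigma limits for the number of occurrences: x +/- k*sqrt(n*p_hat*q_hat)."""
    p_hat = successes / n
    sd = sqrt(n * p_hat * (1 - p_hat))
    return successes - k * sd, successes + k * sd


def proportion_limits(successes, n, k=2):
    """k sigma limits for the proportion: p_hat +/- k*sqrt(p_hat*q_hat/n)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - k * se, p_hat + k * se


low_n, high_n = count_limits(80, 400)
low_p, high_p = proportion_limits(80, 400)
print(round(low_n), round(high_n))        # 64 96
print(round(low_p, 2), round(high_p, 2))  # 0.16 0.24
```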

Formulae for the mean and standard deviation of a binomial distribution

| Random variable | Mean | Standard deviation |
|---|---|---|
| Number of occurrences (\(n\hat{p}\)) | \(\mu_{n\hat{p}} = np\) | \(\sigma_{n\hat{p}} = \sqrt{npq}\) |
| Proportion of occurrences (\(\hat{p}\)) | \(\mu_{\hat{p}} = p\) | \(\sigma_{\hat{p}} = \dfrac{\sqrt{npq}}{n} = \sqrt{\dfrac{pq}{n}}\) |

Example 13 A coin is tossed 500 times, and a head appears uppermost 320 times. Give 95% confidence limits for the proportion of heads and state whether you consider the coin biassed.

\[ n = 500, \qquad \hat{p} = \frac{320}{500} = 0.64 \]
The 95% confidence limits for \(p\) are given by:
\[ \hat{p} \pm 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.64 \pm 2\sqrt{\frac{0.64 \times 0.36}{500}} \approx 0.64 \pm 0.04 \]

So \(p\) lies between 0.60 and 0.68. For a fair coin, \(p = 0.5\), which is not contained in this interval, so we conclude that the coin is biassed.
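A short Python sketch of Example 13 (variable names are illustrative). It keeps the margin unrounded at about 0.043, whereas the text rounds it to 0.04, so the printed limits differ slightly from 0.60 and 0.68 without changing the conclusion.

```python
from math import sqrt

heads, n = 320, 500
p_hat = heads / n                        # sample proportion, 0.64
se = sqrt(p_hat * (1 - p_hat) / n)       # standard error of the proportion, about 0.0215
lower, upper = p_hat - 2 * se, p_hat + 2 * se
print(round(lower, 3), round(upper, 3))  # 0.597 0.683
print(lower <= 0.5 <= upper)             # False: p = 0.5 lies outside the limits
```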


c Differences between population means

Example 14 A commercial traveller travels regularly between two towns, A and B, and has a choice of either of two routes, 1 and 2. To determine whether there is any significant difference in times taken, the traveller records the mean and standard deviation of times taken on a number of occasions on each route as shown:

| Route | Mean | Standard deviation | Sample size |
|---|---|---|---|
| 1 | \(\bar{x}_1 = 52\) min | \(s_1 = 8\) min | 80 |
| 2 | \(\bar{x}_2 = 50\) min | \(s_2 = 6\) min | 100 |

Different sample sizes have been assumed to illustrate that they need not be the same. Let the means and standard deviations of the populations of times taken for routes 1 and 2 be \(\mu_1\) and \(\mu_2\), and \(\sigma_1\) and \(\sigma_2\), respectively. These are the population parameters corresponding to the sample statistics \(\bar{x}_1\) and \(\bar{x}_2\), and \(s_1\) and \(s_2\), respectively. If \(\bar{x}_1\) and \(\bar{x}_2\) are independent random variables, it can be shown that:

a the mean of the difference \(\bar{x}_1 - \bar{x}_2\) is:
\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2 \qquad (1) \]
i.e. the mean of the difference = the difference of the means

b the variance of the difference \(\bar{x}_1 - \bar{x}_2\) is:
\[ \sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} \qquad (2) \]
i.e. the variance of the difference = the sum of the variances.

Since \(\sigma^2_{\bar{x}_1} = \dfrac{\sigma_1^2}{n_1}\) and \(\sigma^2_{\bar{x}_2} = \dfrac{\sigma_2^2}{n_2}\), equation (2) becomes:
\[ \sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (3) \]
and so the standard deviation of the difference \(\bar{x}_1 - \bar{x}_2\) is given by:
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (4) \]

If, then, \(\bar{x}_1\) and \(\bar{x}_2\) are the means of two large independent samples from populations with means \(\mu_1\) and \(\mu_2\) and standard deviations \(\sigma_1\) and \(\sigma_2\) respectively, and if we make the assumption (hypothesis) that \(\mu_1\) and \(\mu_2\) are equal, then the sampling distribution of \(\bar{x}_1 - \bar{x}_2\) may be considered as a normal distribution with mean \(\mu_{\bar{x}_1 - \bar{x}_2} = 0\) and standard deviation:
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

Equation (4) is referred to as the standard error of the difference between two means. For large samples, we can use the sample standard deviations, \(s_1\) and \(s_2\), in place of \(\sigma_1\) and \(\sigma_2\), so equation (4) can be approximated by:
\[ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]


Therefore, using the data of the example, \(\bar{x}_1 = 52\), \(s_1 = 8\) and \(n_1 = 80\) for route 1, and \(\bar{x}_2 = 50\), \(s_2 = 6\) and \(n_2 = 100\) for route 2, from equation (4):
\[ \sigma_{\bar{x}_1 - \bar{x}_2} \approx \sqrt{\frac{8^2}{80} + \frac{6^2}{100}} \approx 1.08 \]
and:
\[ \bar{x}_1 - \bar{x}_2 = 52 - 50 = 2 \]
At the 95% confidence limits, the expected value \(\mu_{\bar{x}_1 - \bar{x}_2} = 0\) should lie in the interval \(2 \pm 2 \times 1.08\), i.e. in the interval \([-0.16, 4.16]\). Since it does, we conclude that there is no significant difference in the times taken at the 2 sigma confidence level.
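A minimal Python sketch of Example 14, using equation (4) with the sample standard deviations in place of \(\sigma_1\) and \(\sigma_2\); `se_difference_of_means` is an illustrative name. Because it does not round the standard error to 1.08 first, its limits come out at about \([-0.15, 4.15]\) rather than \([-0.16, 4.16]\); the conclusion is unchanged.

```python
from math import sqrt


def se_difference_of_means(s1, n1, s2, n2):
    """Standard error of x1_bar - x2_bar, sqrt(s1^2/n1 + s2^2/n2), for large independent samples."""
    return sqrt(s1**2 / n1 + s2**2 / n2)


se = se_difference_of_means(8, 80, 6, 100)
diff = 52 - 50
lower, upper = diff - 2 * se, diff + 2 * se
print(round(se, 3))                      # 1.077
print(round(lower, 2), round(upper, 2))  # -0.15 4.15
print(lower <= 0 <= upper)               # True: no significant difference
```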

d Differences between population proportions

Example 15 An opinion poll found that 75 out of 100 males and 180 out of 200 females interviewed were in favour of a certain proposal. Is there any significant difference in the overall proportion of males and females favouring the proposal?

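Using the subscripts 1 and 2 for males and females respectively, \(\hat{p}_1\) and \(\hat{p}_2\) refer to the sample proportions of males and females in favour of the proposal.
\[ \hat{p}_1 = \frac{75}{100} = \frac{3}{4}, \qquad \hat{p}_2 = \frac{180}{200} = \frac{9}{10} \]
\[ \hat{q}_1 = \frac{25}{100} = \frac{1}{4}, \qquad \hat{q}_2 = \frac{20}{200} = \frac{1}{10} \]
\[ n_1 = 100, \qquad n_2 = 200 \]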

If \(p_1\) and \(p_2\) denote the population proportions, then, if there is no significant difference in the overall proportion in favour, \(p_1 - p_2 = 0\). Since \(\hat{p}_1\) and \(\hat{p}_2\) are the sample proportions drawn from populations with parameters \(p_1\) and \(p_2\) respectively, the sampling distribution of \(\hat{p}_1 - \hat{p}_2\) has mean:
\[ \mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 \qquad (1) \]
and standard deviation:
\[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2}} \qquad (2) \]
where \(\sigma^2_{\hat{p}_1}\) and \(\sigma^2_{\hat{p}_2}\) are the variances of the sampling distributions of \(\hat{p}_1\) and \(\hat{p}_2\). Compare (1) and (2) with (1) and (2) in the section on differences between population means (page 191). Assuming a binomial distribution with mean \(np\) and variance \(npq\):
\[ \sigma^2_{\hat{p}_1} = \frac{n_1 p_1 q_1}{n_1^2} = \frac{p_1 q_1}{n_1} \quad \text{and} \quad \sigma^2_{\hat{p}_2} = \frac{n_2 p_2 q_2}{n_2^2} = \frac{p_2 q_2}{n_2} \]
Equation (2) therefore becomes:
\[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \qquad (3) \]


Compare equation (3) with equation (4) in the section on differences between population means (page 191). The standard deviation \(\sigma_{\hat{p}_1 - \hat{p}_2}\) in equation (3) is referred to as the standard error of the difference between two proportions.

Using \(\hat{p}_1\) and \(\hat{p}_2\) as estimates of \(p_1\) and \(p_2\), and the data from the example, equation (3) becomes:
\[ \sigma_{\hat{p}_1 - \hat{p}_2} \approx \sqrt{\frac{0.75 \times 0.25}{100} + \frac{0.9 \times 0.1}{200}} \approx 0.048 \]
and:
\[ \hat{p}_1 - \hat{p}_2 = 0.75 - 0.9 = -0.15 \]
At the 95% confidence limits, the expected value \(p_1 - p_2 = 0\) should lie in the interval \(-0.15 \pm 2 \times 0.048\), i.e. in the interval \([-0.246, -0.054]\). Since it does not, we conclude that there is a significant difference in the proportions at the 2 sigma confidence limits.

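Note: If we assume no difference in the proportions in the two populations, then we can use what is called a 'pooled estimator' \(\hat{p}_0\), where:
\[ \hat{p}_0 = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{75 + 180}{100 + 200} = 0.85 \]
which replaces \(\hat{p}_1\) and \(\hat{p}_2\). So:
\[ \sigma_{\hat{p}_1 - \hat{p}_2} \approx \sqrt{\frac{0.85 \times 0.15}{100} + \frac{0.85 \times 0.15}{200}} \approx 0.044 \]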
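Both standard errors in Example 15, the one using separate sample proportions and the one using the pooled estimator, can be computed with a short Python sketch; the helper name `se_difference_of_proportions` is illustrative.

```python
from math import sqrt


def se_difference_of_proportions(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat, sqrt(p1*q1/n1 + p2*q2/n2)."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


x1, n1 = 75, 100   # males in favour
x2, n2 = 180, 200  # females in favour
p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat                                      # -0.15

se = se_difference_of_proportions(p1_hat, n1, p2_hat, n2)   # separate estimates
p0 = (x1 + x2) / (n1 + n2)                                  # pooled estimate, 0.85
se_pooled = se_difference_of_proportions(p0, n1, p0, n2)

print(round(se, 3), round(se_pooled, 3))                   # 0.048 0.044
print(round(diff - 2 * se, 3), round(diff + 2 * se, 3))    # -0.246 -0.054
```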

Exercises 6e

1 The weighing error of a certain type of balance has a standard deviation of 0.002. Give 3 sigma confidence limits for the true weight of a specimen weighed as 12.366 g.

2 The coefficient of variation of the error of a certain measuring instrument is 3.0 per cent. An observation taken by the instrument is 12.52. Obtain 3 sigma confidence limits for the true value. (The coefficient of variation is the ratio of the standard deviation to the mean.)

3 Within what limits would the mean height of male university students very probably be if the mean height of a random sample of 100 students is 170 cm with a standard deviation of 5.5 cm?

4 Variables V and T are known to be connected by the formula: V = 7.62 + bT.

T can be measured accurately but the measurements of V are subject to a random error which is normally distributed with mean 0 and standard deviation 0.12. An observation V = 11.23 is obtained when T = 5. Determine limits within which b almost certainly lies.

5 Observations of weight made with a certain balance are equal, on the average, to the true weight and have a coefficient of variation from it of 0.5 per cent.
a If a large number of observations were made of a specimen having a true weight of 6.000 g, what standard deviation should they have?
b If a single observation were made, within what limits is it almost certain to lie?
c The mean of 25 observations of the weight of another specimen is 3.000 g. Determine the limits between which its true weight will almost certainly lie.


6 Within what limits would the mean length of a certain species of fish very probably lie if the mean length of a random sample of 25 such fish was 30 cm and standard deviation 4 cm, assuming lengths are normally distributed?

7 A random sample of 200 patients was treated with a certain drug and 150 were cured. Give 2 sigma confidence limits for the number of cures, and also the proportion of cures.

8 A random sample of 150 voters in an electorate indicated that 60 per cent of them were in favour of a certain candidate at the forthcoming elections. Give 2 sigma confidence limits for the proportion of all voters in the electorate in favour of this candidate.

9 A dental inspector finds that, out of a group of 204> children, 45 have tooth decay. Give

a 95% confidence interval for the proportion of children with tooth decay.

10 A Gallup Poll establishes that out of a group of 150 people interviewed, 72 were in favour of a certain proposal. Give two sigma confidence limits for the number, in groups of 150 people, in favour of the proposal and, therefore, the overall proportion in favour.

11 Out of a group of 100 patients suffering from a particular disease, 75 were cured by drug X. Give a 95% confidence interval for the proportion of patients cured by this drug.

12 A target shooter fires 200 rounds at the target and scores 120 bull's-eyes. Give two sigma confidence limits for the number of bull's-eyes in rounds of 200.

13 In a random sample of 100 rods produced in a manufacturing process, 25 were rejected as faulty. Give a 95% confidence interval for the overall proportion of rods rejected.

14 A gardener planted 80 seeds of which 64 germinated. Is this number consistent with the gardener's claim that 90% of seeds planted usually germinate? Use two sigma confidence limits.

15 Hospital records show that 75% of patients suffering from a certain complaint recover. A new drug cured 165 out of 200 patients on whom it was tried. Use 2 sigma confidence limits to determine whether this new drug is more effective.

16 The heights of a random sample of 100 VCE students have a mean of 175 cm and a standard deviation of 5 cm. Give a 95% confidence interval for the mean height of VCE students.

17 In a particular electorate, 260 voters out of a random sample of 400 expressed their intention of voting for a particular party. Give 2 sigma confidence limits for the number of voters who would vote for this party in other samples of 400 in the electorate. Hence give 95% confidence limits for the proportion of voters in the electorate in favour of this party.

18 An agency conducted a survey of a random sample of 400 viewers of a TV show and found that 175 of them were children. Is this consistent with the agency's claim that 50% of viewers of this show are children? Use a 95% confidence interval for proportions.

19 A machine makes electrical resistors which have a mean resistance of \(\mu\) ohms with standard deviation 2 ohms. A random sample of 25 resistors has a mean resistance of 50 ohms. Assuming that the distribution is normal, give an approximate 99% confidence interval for \(\mu\). (Use normal distribution tables.)


20 If X is a normally distributed random variable with standard deviation of 5 and the mean of 64 observations is 2.5, give a 2 sigma confidence interval for the mean value of X.

21 The wingspan of two different species of birds, A and B, is normally distributed with standard deviation 5 cm. The mean wingspan of a random sample of 25 birds of species A is 35.6 cm, while the mean wingspan of a random sample of 50 birds of species B is 37 cm.
a Calculate the standard deviation of the difference in the sample means.
b Give 95% confidence limits for the mean wingspan of each species.
c Is the difference in the means significant at this confidence limit?

22 The weekly wages in two industries, A and B, are normally distributed. A random sample of the wages of 50 employees from each industry was taken. For A the mean was $450 and the variance $90, and for B the mean was $500 and the variance $160.
a Calculate the variance of the difference of the sample mean wages in the two industries.
b Is there any significant difference in their mean wages at the 2 sigma confidence limits?

23 Twenty-five students sat for a standard aptitude test and their mean score was 80 with standard deviation 3. After special coaching, a different group of 25 students sat for the same test. Their mean score and standard deviation were 83 and 4 respectively.
a Give 95% confidence limits for the mean score of all students in each of the two situations.
b What is the standard deviation of the difference in the sample means?
c Is there any significant difference in the mean score at the 2 sigma confidence limits?

24 Tests are conducted to see if there is any significant difference in the average weight of two brands, A and B, of breakfast cereal. For brand A, the mean and standard deviation of the weights of 100 packets were 600 g and 3 g and, for brand B, they were 590 g and 4 g for 200 packets. Is the difference significant at the 2 sigma level?

25 Workers were given a choice between a nine-day fortnight and a 10 per cent increase in wages. In Company A, 110 out of a random sample of 200 and, in Company B, 100 out of a random sample of 200 indicated that they preferred the increase in wages. Does this indicate any significant difference between the two companies in the population proportions favouring a wage increase?

26 A dental inspector found that, in area A, 20 children out of a random sample of 200 and, in area B, 18 children out of a random sample of 150 had tooth decay. Does this indicate any difference in proportions at the 2 sigma level?
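Exercises 25 to 29 each compare two sample proportions. One common approach estimates the standard error of p₁ − p₂ from the pooled proportion p = (x₁ + x₂)/(n₁ + n₂). That choice is an assumption here, because the exercises do not specify a method. Some treatments instead use √(p₁q₁/n₁ + p₂q₂/n₂), which gives a slightly different z. The sketch below applies the pooled version to the Exercise 26 data. The function name is hypothetical.

```python
import math


def z_two_proportions(x1, n1, x2, n2):
    """z statistic for p1 - p2, using the pooled proportion
    p = (x1 + x2) / (n1 + n2) to estimate the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Exercise 26 data: 20 of 200 in area A, 18 of 150 in area B
z = z_two_proportions(20, 200, 18, 150)
print(round(z, 2), abs(z) > 2)  # compare |z| with the 2 sigma level
```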

27 Under standard treatment, hospital records show that 120 out of a random sample of 150 patients recover. A new drug was tried and was successful in curing 135 out of a random sample of 150 patients. Does this indicate that the new drug is significantly more effective at the 95% confidence level?

28 A Gallup Poll found that, in a random sample of 150 voters in an electorate, 75 indicated a preference for a particular political party and that, in a random sample of 200 voters in a neighbouring electorate, 110 indicated a preference for the same party. Can we conclude, at the 95% confidence level, that the two electorates differ in the proportion of voters who prefer this party?

29 In a random sample of 400 households in Melbourne, 180 showed a preference for a particular brand of washing powder, and in a random sample of 300 households in Sydney, 120 preferred the same brand. At the 95% confidence level, is there reason to doubt the hypothesis that equal proportions of households in Melbourne and Sydney prefer this brand of washing powder?


30 To determine whether there is any significant difference in the mean weekly wage in two particular industries, a random sample of size 100 was selected from each industry. Their mean weekly wage, x̄, and standard deviation, s, were as follows:

x̄₁ = $500, s₁ = $50
x̄₂ = $450, s₂ = $60

Do these sample statistics indicate a significant difference in the mean weekly wage?

31 In one country, 120 men out of a random sample of 400 were more than 170 cm tall. In a neighbouring country, the number was 240 out of 600. Does this indicate that there is a greater proportion of men over this height in the second country than in the first?
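Exercise 31 asks whether the second proportion is *greater* than the first. This is a one-sided question, so only the upper tail matters. At the 5% level the critical value is about 1.645 rather than 1.96. The sketch below uses a pooled standard error, which is an assumption. It uses `statistics.NormalDist.cdf` (Python 3.8 or later) for the tail probability. The function name is hypothetical.

```python
import math
from statistics import NormalDist  # Python 3.8+


def one_sided_proportion_test(x1, n1, x2, n2):
    """Return (z, p_value) for the alternative p2 > p1, using a pooled
    standard error; p_value is P(Z >= z) for a standard normal Z."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    return z, 1 - NormalDist().cdf(z)


# Exercise 31 data: 120 of 400 in the first country, 240 of 600 in the second
z, p_value = one_sided_proportion_test(120, 400, 240, 600)
print(round(z, 2), p_value < 0.05)
```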

32 A random sample of 50 male employees and 40 female employees was taken from a particular industry, and their mean numbers of hours absent from work in a particular year were 60 and 57 respectively. Could these samples have been drawn from a population with the same mean and σ = 8 hours?