Advanced Level Mathematics Statistics 2 Steve Dobbs and Jane Miller Series editor Hugh Neill

Statistics 2 - Assets - Cambridge University Pressassets.cambridge.org/052178/6045/sample/0521786045ws.pdf · understand what a continuous random variable is ... This random variable


PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom

CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa

http://www.cambridge.org

© Cambridge University Press 2000

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2000
Fourth printing 2002

Printed in the United Kingdom at the University Press, Cambridge

Typefaces Times, Helvetica
Systems Microsoft® Word, MathType™

A catalogue record for this book is available from the British Library

ISBN 0 521 78604 5 paperback

Cover image: Images Colour Library


Contents

Introduction

1 Continuous random variables

2 The normal distribution

3 The Poisson distribution

4 Sampling

5 Hypothesis testing: continuous variables

6 Hypothesis testing: discrete variables

7 Errors in hypothesis testing

Revision exercise

Mock examinations

Cumulative binomial probabilities

Cumulative Poisson probabilities

The normal distribution function

Answers

Index


1 Continuous random variables

This chapter looks at the way in which continuous random variables are modelled mathematically. When you have completed it you should

● understand what a continuous random variable is
● know the properties of a probability density function and be able to use them
● be able to use a probability density function to solve problems involving probabilities
● be able to find the median of a distribution in simple cases
● be able to calculate the mean and variance of a distribution.

1.1 Comparing discrete and continuous random variables

In S1 Section 6.1, you met the idea of a random variable and its probability distribution. In order to refresh your memory, consider the random variable, H, which is the number of heads obtained when two coins are spun. This random variable can take the values 0, 1 and 2 with probabilities of $\tfrac14$, $\tfrac12$ and $\tfrac14$ respectively.

Check that you know how these probabilities were obtained.

The notation used to express these results is $P(H = 0) = \tfrac14$, $P(H = 1) = \tfrac12$ and $P(H = 2) = \tfrac14$. Alternatively, you can display these results in a table, as in Table 1.1.

Number of heads, h | 0 | 1 | 2
P(H = h) | 1/4 | 1/2 | 1/4

Table 1.1. Probability distribution of H, the number of heads when two coins are spun.

The random variables which you studied in S1 were all discrete random variables; that is, there were clear steps between the possible values which the variable could take. There are many variables, however, which are not discrete.

Consider the following example. The ‘cars’ on a ski-lift are attached at equal intervals along a cable which travels at a fixed speed. The speed of the cars is so low that people can step in and out of the cars at the station without the cars having to stop. The time interval between one car and the next arriving at the station is 5 minutes. You do not know the timetable for the cars and so you turn up at the station at a random time and wait for a car. Your waiting time, X (measured in minutes), is an example of a random variable because its value depends on chance. However, it is also a continuous variable because the waiting time can take any value in the interval 0 to 5 minutes, that is $0 \le X < 5$.

In S1, in order to describe a discrete random variable completely, you obtained a probability for each possible value of the random variable. If you try the same approach with a continuous random variable you run into difficulties: because the waiting time, X, can take an infinite number of values in the interval 0 to 5 minutes, you cannot write out a table like Table 1.1. However, you do know that if you arrive at the station at a random time, all values of X are equally likely. Would it be possible to describe the distribution by stating that all values of X between 0 and 5 are equally likely and by assigning the same non-zero probability to each one? This gives rise to another problem. The sum of the probabilities will not be one, as it should be, but will be infinite since X takes an infinite number of values. Obviously a different approach is needed in order to describe the probability distribution of a continuous random variable. This will be considered in the next section.

1.2 Defining the probability distribution of a continuous random variable

Although it is not possible to give the probability that X (the waiting time in Section 1.1) takes a particular value, it does make sense to talk about X taking a value within a particular range. For example, you would expect that $P(0 \le X \le 2.5) = \tfrac12$ since the interval from 0 to 2.5 accounts for half the values which X can take, and all values are equally likely. Extending this idea, you would expect $P(0 \le X < 1) = \tfrac15$ since this interval covers $\tfrac15$ of the total interval. Similarly $P(1 \le X < 2) = P(2 \le X < 3) = P(3 \le X < 4) = P(4 \le X < 5) = 0.2$. These probabilities could be represented on a diagram similar to a histogram, as shown in Fig. 1.2. In this diagram the area of each block gives the probability that X lies in the corresponding interval. Since the width of each block is 1, its height must be 0.2 in order to make the area equal to 0.2. Notice that the total area of the blocks must be one since the probabilities must sum to one.

The choice of the intervals 0 to 1, 1 to 2 and so on is arbitrary. In order to make the model more general you need to be able to find the probability that X lies within any given interval. Suppose that the divisions between the blocks in Fig. 1.2 are removed so as to give Fig. 1.3. In this diagram, probabilities still correspond to areas. For example, $P(0.5 < X \le 1)$ is given by the area under the curve between 0.5 and 1. This is shown as the shaded area in Fig. 1.4: its value is 0.1, as you would expect.

You should now be able to see that the probability distribution of the waiting times can be modelled by the function f(x) which describes Fig. 1.3. The required function is

$$f(x) = \begin{cases} 0.2, & 0 \le x < 5, \\ 0 & \text{otherwise}. \end{cases}$$

[Fig. 1.2. Diagram to represent the probabilities of waiting times for a ski-lift: five blocks, each of width 1 and height 0.2, over waiting time, x (min), from 0 to 5.]

[Fig. 1.3. Diagram to represent the distribution of waiting times for a ski-lift: a horizontal line at height 0.2 from x = 0 to x = 5.]

[Fig. 1.4. Diagram to represent the probability of waiting between 0.5 and 1.0 minutes for the lift: the region under the line at height 0.2 between x = 0.5 and x = 1 is shaded.]

Note that it is usual to define f(x) for all real values of x.


Once f(x) has been defined, you can find probabilities by calculating areas below the curve y = f(x). For example, $P(1.3 \le X \le 3.5) = 0.2 \times (3.5 - 1.3) = 0.44$.
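The same rectangle-area calculation can be checked in a few lines of Python. This is a minimal sketch, assuming the uniform ski-lift density (height 0.2 between 0 and 5, zero elsewhere); the function name `uniform_prob` is hypothetical, chosen for illustration.

```python
def uniform_prob(a, b, lower=0.0, upper=5.0):
    """P(a <= X <= b) for a uniform density on [lower, upper].

    The density has height 1 / (upper - lower), so the probability is the
    area of a rectangle. The interval [a, b] is first clipped to
    [lower, upper], because the density is zero outside that range.
    """
    left, right = max(a, lower), min(b, upper)
    if right <= left:
        return 0.0  # empty interval or a single point: zero area
    return (right - left) / (upper - lower)


print(round(uniform_prob(1.3, 3.5), 10))  # 0.44
print(round(uniform_prob(0.5, 1.0), 10))  # 0.1, the shaded area in Fig. 1.4
print(uniform_prob(1.3, 1.3))             # 0.0, a single instant has probability zero
```

Clipping to the support means the same helper also gives the right answer for intervals that stick out beyond 0 or 5, for example `uniform_prob(-10, 10)` returns 1.0.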

Notice that, if you want the probability that X = 1.3 or the probability that X = 3.5, the answer is zero. These are just single instants of time, and although it is theoretically possible that a car may arrive at either of those instants, the probability is actually zero. This means that $P(1.3 < X < 3.5) = P(1.3 \le X < 3.5) = P(1.3 < X \le 3.5) = P(1.3 \le X \le 3.5)$. This situation is characteristic of continuous distributions.

The function f(x) is called a probability density function. It cannot take negative values because probabilities are never negative. It must also have the property that the total area under the curve y = f(x) is equal to one. This is because this area represents the probability that X takes any real value and this probability must be one.

The suitability of this function as a model for actual waiting times could be tested by collecting some data and comparing a histogram of these experimental results with the shape of y = f(x). Some results are given in Table 1.5.

Waiting time, x (min) | Frequency | Relative frequency | Class width | Relative frequency density
0 ≤ x < 1 | 107 | 0.214 | 1 | 0.214
1 ≤ x < 2 | 98 | 0.196 | 1 | 0.196
2 ≤ x < 3 | 105 | 0.210 | 1 | 0.210
3 ≤ x < 5 | 190 | 0.380 | 2 | 0.190

Table 1.5. Waiting times for the ski-lift for a sample of 500 people.

The third column gives the relative frequencies: these are found by dividing each frequency by the total frequency, in this case 500. The relative frequency gives the experimental probability that the waiting time lies in a given interval. The fifth column gives the relative frequency density: this is found by dividing the relative frequency by the class width. The data are illustrated by the histogram in Fig. 1.6.
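The third and fifth columns of Table 1.5 follow mechanically from the frequencies, as in this minimal sketch (the helper name `relative_frequency_table` is hypothetical).

```python
def relative_frequency_table(classes):
    """Return (relative frequency, relative frequency density) for each class.

    `classes` is a list of (lower, upper, frequency) tuples.
    """
    total = sum(freq for _, _, freq in classes)
    rows = []
    for lower, upper, freq in classes:
        rel_freq = freq / total               # experimental probability for the class
        density = rel_freq / (upper - lower)  # divide by the class width
        rows.append((rel_freq, density))
    return rows


# Waiting times from Table 1.5 (minutes).
table_1_5 = [(0, 1, 107), (1, 2, 98), (2, 3, 105), (3, 5, 190)]

for (lower, upper, _), (rf, d) in zip(table_1_5, relative_frequency_table(table_1_5)):
    print(f"{lower} <= x < {upper}: relative frequency {rf:.3f}, density {d:.3f}")
```

Because the relative frequencies sum to one, the bars of a relative frequency density histogram always have total area one, which is what makes the comparison with a probability density function possible.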

[Fig. 1.6. Histogram of relative frequency for the data in Table 1.5: relative frequency density (close to 0.2 in every class) against waiting time, x (min), from 0 to 5.]

Normally a histogram is plotted with frequency density rather than relative frequency density on the vertical axis. The reason for using relative frequency density in this case (and others in this section) is that area then represents relative frequency and hence experimental probability. As a result a direct comparison can be made between this diagram and Fig. 1.3, in which area represents theoretical probability. You can see that the diagrams are very similar. The experimental probabilities are not exactly equal to the theoretical ones. For example, the experimental probability of waiting between 1 and 2 minutes is 0.196 whereas theoretically it is 0.2. This is not surprising: you saw in S1 Section 4.1 that a probability model aims to describe what happens ‘in the long run’.


The model for waiting times for a ski-lift was found by theoretical arguments and then confirmed experimentally. However, often it is not possible to predict the form of the probability density function. Instead, data are collected and the shape of the resulting histogram of the relative frequency density may suggest the form of the probability density function. Table 1.7 gives some more data; these relate to the time interval between one patient and the next going into the consulting room at a doctor’s surgery. The doctor’s receptionist always allows at least a 5-minute interval between the start of one consultation and the start of the next.

Time interval, x (min) | Frequency | Relative frequency | Class width | Relative frequency density
5 ≤ x < 6 | 16 | 0.16 | 1 | 0.160
6 ≤ x < 7 | 14 | 0.14 | 1 | 0.140
7 ≤ x < 8 | 8 | 0.08 | 1 | 0.080
8 ≤ x < 9 | 9 | 0.09 | 1 | 0.090
9 ≤ x < 10 | 9 | 0.09 | 1 | 0.090
10 ≤ x < 11 | 8 | 0.08 | 1 | 0.080
11 ≤ x < 13 | 12 | 0.12 | 2 | 0.060
13 ≤ x < 15 | 6 | 0.06 | 2 | 0.030
15 ≤ x < 20 | 10 | 0.10 | 5 | 0.020
20 ≤ x < 25 | 6 | 0.06 | 5 | 0.012

Table 1.7. Time intervals between patients entering the consulting room, for 100 patients.

Fig. 1.8 shows a histogram of the relative frequency densities. The shape of this histogram suggests that a very simple model to describe this situation might be the straight line segment shown in Fig. 1.9. This line has been drawn to cut the horizontal axis at 25 since all the time intervals were less than this value.

[Fig. 1.8. Histogram of relative frequency for the data in Table 1.7: relative frequency density (0 to 0.16) against time interval, x (min), from 5 to 25.]

[Fig. 1.9. Simple model for the time intervals between patients entering the consulting room: a straight line f(x) falling from height c at x = 5 to zero at x = 25.]

You can find the equation of this line by remembering that the area under the graph of f(x) must be one, since the total probability must be one. Let f(5) equal c. Then, since the region under the graph of f(x) is a triangle of area 1,

$$\tfrac12 \times 20 \times c = 1, \quad \text{giving } c = \tfrac{1}{10}.$$

Thus the gradient of the line is given by

$$\frac{0 - \frac{1}{10}}{25 - 5} = -\frac{1}{200}$$

and the equation of the line is

$$y - \tfrac{1}{10} = -\tfrac{1}{200}(x - 5), \quad \text{or} \quad y = -\tfrac{1}{200}x + \tfrac18.$$

The probability density function is therefore

$$f(x) = \begin{cases} -\tfrac{1}{200}x + \tfrac18, & 5 \le x \le 25, \\ 0 & \text{otherwise}. \end{cases}$$

This model can then be used to find probabilities. For example, the probability of a time interval of more than 17 minutes is given by the area of the shaded region in Fig. 1.10. This can be found by using simple geometry.

When x = 17, $f(x) = -\tfrac{1}{200} \times 17 + \tfrac18 = \tfrac{1}{25}$.

So $P(X > 17) = \tfrac12 \times 8 \times \tfrac{1}{25} = \tfrac{4}{25}$.

[Fig. 1.10. Diagram to illustrate the theoretical probability of a time interval of more than 17 minutes: the triangle under f(x) between x = 17 and x = 25 is shaded.]

Alternatively you can use integration to find the area as follows.

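$$\begin{aligned}
P(X > 17) &= \int_{17}^{25} \left(-\tfrac{1}{200}x + \tfrac18\right) dx = \left[-\tfrac{1}{400}x^2 + \tfrac18 x\right]_{17}^{25} \\
&= \left(-\tfrac{1}{400} \times 25^2 + \tfrac18 \times 25\right) - \left(-\tfrac{1}{400} \times 17^2 + \tfrac18 \times 17\right) \\
&= \left(-\tfrac{625}{400} + \tfrac{25}{8}\right) - \left(-\tfrac{289}{400} + \tfrac{17}{8}\right) \\
&= \left(-\tfrac{625}{400} + \tfrac{289}{400}\right) + \left(\tfrac{25}{8} - \tfrac{17}{8}\right) = -\tfrac{336}{400} + 1 = \tfrac{4}{25},
\end{aligned}$$

as before.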
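The whole straight-line model can be rebuilt with exact fractions, which confirms c, the gradient, the intercept and $P(X > 17)$ by both methods in one place. This is a sketch assuming only the facts stated above (a straight line from height c at x = 5 down to 0 at x = 25); the names `LOW`, `HIGH`, `f` and `antiderivative` are illustrative.

```python
from fractions import Fraction

# Straight-line model of Fig. 1.9: height c at x = 5, falling to 0 at x = 25.
LOW, HIGH = 5, 25

c = Fraction(2, HIGH - LOW)        # triangle area (1/2) * 20 * c = 1 gives c = 1/10
gradient = (0 - c) / (HIGH - LOW)  # -1/200
intercept = c - gradient * LOW     # 1/8, so f(x) = -x/200 + 1/8 on [5, 25]


def f(x):
    """Density of the straight-line model on 5 <= x <= 25."""
    return gradient * x + intercept


def antiderivative(x):
    """-x^2/400 + x/8, an antiderivative of f."""
    return gradient * x * x / 2 + intercept * x


p_integral = antiderivative(HIGH) - antiderivative(17)  # integration method
p_geometry = Fraction(1, 2) * (HIGH - 17) * f(17)        # triangle of base 8, height f(17)

print(c, gradient, intercept)  # 1/10 -1/200 1/8
print(f(17))                   # 1/25
print(p_integral, p_geometry)  # 4/25 4/25
```

Using `Fraction` rather than floating point keeps the answers in the same exact form as the hand calculation.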

You should be able to deduce the value of $P(X \le 17)$ without further detailed calculation.

Fig. 1.11 reproduces Fig. 1.8. The experimental value of $P(X > 17)$ is given by the shaded area, which is equal to

$$(3 \times 0.02) + (5 \times 0.012) = 0.12 = \tfrac{3}{25}.$$

This agrees quite well with the theoretical value.

[Fig. 1.11. Diagram to illustrate the experimental probability of a time interval of more than 17 minutes: the histogram of Fig. 1.8 with the area to the right of x = 17 shaded.]
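The experimental area can also be computed directly from the relative frequency densities in Table 1.7. A minimal sketch; `histogram_area_above` is a hypothetical helper name, and only the part of each bar lying to the right of the threshold is counted.

```python
def histogram_area_above(bars, threshold):
    """Area of a relative-frequency-density histogram to the right of `threshold`.

    `bars` is a list of (lower, upper, density) tuples. A bar that straddles
    the threshold contributes only the part lying to the right of it.
    """
    area = 0.0
    for lower, upper, density in bars:
        left = max(lower, threshold)
        if upper > left:
            area += (upper - left) * density
    return area


# Class limits (minutes) and relative frequency densities from Table 1.7.
table_1_7_bars = [
    (5, 6, 0.160), (6, 7, 0.140), (7, 8, 0.080), (8, 9, 0.090), (9, 10, 0.090),
    (10, 11, 0.080), (11, 13, 0.060), (13, 15, 0.030), (15, 20, 0.020), (20, 25, 0.012),
]

print(round(histogram_area_above(table_1_7_bars, 17), 4))  # 0.12, against 4/25 = 0.16 from the model
```

Only the bar for 15 ≤ x < 20 straddles x = 17, so it contributes 3 × 0.02, and the last bar contributes 5 × 0.012, exactly as in the hand calculation.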


Perhaps you can see a weakness in the model for time intervals in the doctor’s surgery. The model is based on a small data set in which no value is greater than 25 minutes. As a result the model predicts that the time interval can never exceed 25 minutes. However, in the future it might be longer. Fig. 1.12 is a histogram of the relative frequency density for results collected from a larger sample of patients. You can see that quite a few values are greater than 25.

This histogram also shows another weakness in the original model: it looks as though a curve would fit the data better than a straight line. For example, a function of the type $f(x) = \dfrac{k}{x^n}$, where n and k are positive constants, might be more suitable.

[Fig. 1.12. Histogram of relative frequency for a larger sample of time intervals: relative frequency density (0 to 0.18) against time interval, x (min), from 0 to 45.]

It turns out that the function $f(x) = \dfrac{5}{x^2}$ for $x \ge 5$ fits the histogram quite well.

Fig. 1.13 shows the histogram with the graph of this function superimposed on it. You can see that the curve and the histogram have similar shapes. This model has the advantage that it sets no upper limit to the time interval.

[Fig. 1.13. Histogram in Fig. 1.12 with f(x) superimposed.]

Since f(x) is never negative, it has one of the required properties of a probability density function. In order to accept this function as a probability density function, you also need to check that the area of the region underneath its graph is one. This can be done by integration as follows.

$$\text{Area} = \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{5} 0\,dx + \int_{5}^{\infty} \frac{5}{x^2}\,dx = 0 + \left[-\frac{5}{x}\right]_{5}^{\infty} = 0 - (-1) = 1.$$

So the function

$$f(x) = \begin{cases} \dfrac{5}{x^2}, & x \ge 5, \\ 0 & \text{otherwise}, \end{cases}$$

has the properties required of a probability density function.
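To see why the infinite upper limit causes no difficulty, note that the area between 5 and a finite limit U is $1 - \tfrac{5}{U}$, which tends to 1 as U grows. The sketch below evaluates this and, as an independent check of the antiderivative, approximates the same area with the midpoint rule (a numerical method that is not used in the text); the function names are hypothetical.

```python
def area_up_to(upper):
    """Exact area under f(x) = 5/x^2 from 5 to `upper`, using the antiderivative -5/x."""
    return (-5 / upper) - (-5 / 5)


def midpoint_rule(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# The exact area 1 - 5/U creeps up towards 1 as the upper limit U grows.
for upper in (25, 100, 1_000, 1_000_000):
    print(upper, area_up_to(upper))

# A numerical cross-check of the antiderivative on a finite range.
print(midpoint_rule(lambda x: 5 / x**2, 5, 100))  # close to area_up_to(100) = 0.95
```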

Here is a summary of the properties of a probability density function.

The probability density function, f(x), of a continuous random variable X is defined for all real values of x. It has the properties:

(a) $f(x) \ge 0$ for all x,

(b) $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = 1$.

The probability that X lies in the interval $a \le x \le b$ is given by the area under the graph of f(x) between a and b. This area can sometimes be found by using geometrical properties or it can be found from the integral

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

Example 1.2.1
The continuous random variable X has the probability density function given by

$$f(x) = \begin{cases} k(1 + x^2), & -1 \le x \le 1, \\ 0 & \text{otherwise}, \end{cases}$$

where k is a constant.

(a) Find the value of k. (b) Find $P(0.3 \le X \le 0.6)$. (c) Find $P(|X| < 0.2)$.


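(a) Using the second property in the shaded box,

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{1} k(1 + x^2)\,dx = k\left[x + \tfrac13 x^3\right]_{-1}^{1} = k\left(1 + \tfrac13 \times 1^3\right) - k\left((-1) + \tfrac13 \times (-1)^3\right) = k \times \tfrac43 - k \times \left(-\tfrac43\right) = \tfrac83 k.$$

Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, $\tfrac83 k = 1$, giving $k = \tfrac38$.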

(b)
$$P(0.3 \le X \le 0.6) = \int_{0.3}^{0.6} \tfrac38(1 + x^2)\,dx = \tfrac38\left[x + \tfrac13 x^3\right]_{0.3}^{0.6} = \tfrac38\left(0.6 + \tfrac13 \times 0.6^3\right) - \tfrac38\left(0.3 + \tfrac13 \times 0.3^3\right) = 0.136,$$

correct to 3 significant figures.

(c) You met |x|, the modulus of x, in P1 Section 8.3. The statement $|x| < 0.2$ is equivalent to $-0.2 < x < 0.2$.

$$P(|X| < 0.2) = P(-0.2 < X < 0.2) = \int_{-0.2}^{0.2} \tfrac38(1 + x^2)\,dx = \tfrac38\left[x + \tfrac13 x^3\right]_{-0.2}^{0.2} = \tfrac38\left(0.2 + \tfrac13 \times 0.2^3\right) - \tfrac38\left((-0.2) + \tfrac13 \times (-0.2)^3\right) = 0.152.$$
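The three parts can be reproduced exactly with Python's `fractions.Fraction`, which keeps every intermediate value as a ratio of integers. A minimal sketch, assuming the antiderivative $x + \tfrac13 x^3$ used above; `antiderivative` and `prob` are illustrative names.

```python
from fractions import Fraction


def antiderivative(x):
    """x + x^3/3, an antiderivative of 1 + x^2 (before multiplying by k)."""
    return x + x**3 / 3


# (a) k * [x + x^3/3] evaluated from -1 to 1 must equal 1.
k = 1 / (antiderivative(Fraction(1)) - antiderivative(Fraction(-1)))


def prob(a, b):
    """P(a <= X <= b) for f(x) = k(1 + x^2) on [-1, 1]; assumes -1 <= a <= b <= 1."""
    return k * (antiderivative(b) - antiderivative(a))


p_b = prob(Fraction(3, 10), Fraction(6, 10))  # part (b)
p_c = prob(Fraction(-1, 5), Fraction(1, 5))   # part (c): P(|X| < 0.2)
print(k, p_b, p_c)                            # 3/8 1089/8000 19/125
print(float(p_b), float(p_c))                 # 0.136125 0.152
```

Strict and non-strict inequalities give the same probability for a continuous random variable, so `prob` can be used for part (c) even though it is written with ≤.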

Example 1.2.2
It is proposed to model the annual salary, X, measured in thousands of £, paid to salespersons in a large company by the probability density function

$$f(x) = \begin{cases} c x^{-\frac72}, & x \ge 16, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Find the value of c.
(b) Find the probability that a person in this profession chosen at random earns between £20,000 and £30,000 per year.

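(a)
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{16}^{\infty} c x^{-\frac72}\,dx = \left[-\tfrac25 c x^{-\frac52}\right]_{16}^{\infty} = (0) - \left(-\tfrac25 c \times 16^{-\frac52}\right) = \tfrac{1}{2560} c.$$

Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, $\tfrac{1}{2560} c = 1$, giving $c = 2560$.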

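(b)
$$P(20 \le X \le 30) = \int_{20}^{30} 2560 x^{-\frac72}\,dx = \left[2560 \times \left(-\tfrac25\right) x^{-\frac52}\right]_{20}^{30} = 2560 \times \left(-\tfrac25\right) \times 30^{-\frac52} - 2560 \times \left(-\tfrac25\right) \times 20^{-\frac52} = 0.365,$$

correct to 3 significant figures.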
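A short numerical check of both parts, assuming only the tail integral $\int_a^{\infty} x^{-7/2}\,dx = \tfrac25 a^{-5/2}$ used above. Floating-point arithmetic is used because powers such as $20^{-5/2}$ are irrational, so c is rounded before printing; `tail_integral` is a hypothetical helper name.

```python
def tail_integral(a):
    """Integral of x^(-7/2) from a to infinity, which equals (2/5) * a^(-5/2)."""
    return 2 * a ** -2.5 / 5


# (a) c times the integral from 16 to infinity must equal 1.
c = 1 / tail_integral(16)

# (b) P(20 <= X <= 30) = c * (tail from 20 - tail from 30).
p = c * (tail_integral(20) - tail_integral(30))

print(round(c))     # 2560
print(round(p, 3))  # 0.365
```

Writing the probability as a difference of two tail integrals avoids evaluating the antiderivative at infinity more than once.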


Exercise 1A

In all the questions in this exercise, c and k are constants. In this and the following exercises, some questions involve the exponential function $e^x$. If you have not already met this function in P2, you should omit these questions.

1 The probability density function is

$$f(x) = \begin{cases} c\left(1 - \tfrac18 x\right), & 0 \le x \le 8, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Find the value of the constant c. (b) Find PX � 6( ) .

(c) Find $P(4 \le X \le 6)$.

2 The probability density function is

$$f(x) = \begin{cases} kx^2, & 0 \le x \le 3, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Find the value of the constant k .

(b) Find PX � 2( ).
(c) Find $P(1.5 \le X \le 2.5)$.
(d) Given that the probability that X is less than h is 0.2, find the value of h, correct to 2 decimal places.

3 The probability density function fotherwise.

x c x x( ) = +( )

2 2 0 30

� � ,

(a) Find the value of the constant c. (b) Find PX � 1 5.( ) .

4 The probability density function is

$$f(x) = \begin{cases} c(4 - x^2), & -2 \le x \le 2, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Find the value of the constant c. (b) Find PX � 0( ) .

(c) Find PX �1( ). (d) Find P X �1( ) .

(e) Find $P(-0.5 \le X \le 0.5)$.

5 The life, X, of the StayBrite light bulb is modelled by the probability density function

$$f(x) = \begin{cases} k e^{-2x}, & x \ge 0, \\ 0 & \text{otherwise}, \end{cases}$$

where X is measured in thousands of hours.

(a) Find k .

(b) Find the probability that a StayBrite bulb lasts longer than 1000 hours.

(c) Find the probability that a StayBrite bulb lasts less than 500 hours.

6 A computer ink cartridge has a life of X hours. The variable X is modelled by the probability density function

$$f(x) = \begin{cases} k x^{-2}, & x \ge 400, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Find k .

(b) Find the probability that such a cartridge has a life of at least 500 hours.

(c) Find the probability that a cartridge will have to be replaced before 600 hours of use.

(d) Find the probability that two cartridges will have to be replaced before each has been used for 600 hours.


7 The probability density function

fotherwise,

x k x a x a( ) = −( )

2 00

� � ,

is shown in the sketch.

(a) Use the information given in the sketch and the properties of probability density functions to find the values of a and k.

(b) Find P X a� 12( ) .

[Sketch: graph of f(x) against x, with 0, a and 1 marked.]

1.3 The median of a continuous random variable

The median, M, of a continuous random variable is defined as the value which divides the area under the probability density function into two equal halves. Then the probability that X is above the median is equal to the probability that it is below. In mathematical terms the median is defined as follows.

The median, M, of a continuous random variable is that value for which

$$P(X \le M) = \int_{-\infty}^{M} f(x)\,dx = \tfrac12.$$

In simple cases the median can be found by considering symmetry. Fig. 1.14 reproduces Fig. 1.3, which showed the probability density function for the waiting times for the ski-lift. The line x = 2.5, which is shown on the diagram, divides the area under f(x) in half and so the median waiting time, M, is 2.5 minutes.

[Fig. 1.14. Fig. 1.3 with the median, M, shown at x = 2.5.]

Example 1.3.1
Two models are proposed for a garage’s weekly sales, X, of petrol measured in units of 100 000 litres.

The first is
$$f(x) = \begin{cases} 2x, & 0 \le x \le 1, \\ 0 & \text{otherwise}. \end{cases}$$

The second is
$$g(x) = \begin{cases} 12x^3(1 - x^2), & 0 \le x \le 1, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Find the median for the first model.

(b) Show that the median of the second model is the same as that of the first model.


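[Fig. 1.15. Graph for part (a) of Example 1.3.1: f(x) = 2x on 0 ≤ x ≤ 1, with the median M1 and the height 2M1 marked, and the triangle to the left of M1 shaded.]

[Fig. 1.16. Graph for part (b) of Example 1.3.1: g(x) on 0 ≤ x ≤ 1, with M1 marked and the region to the left of M1 shaded.]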

(a) Fig. 1.15 shows the graph of f(x). The median is denoted by $M_1$. The area of the shaded triangle is 0.5. When $x = M_1$, $f(x) = 2M_1$, so

the area of the shaded triangle $= \tfrac12 \times M_1 \times 2M_1 = M_1^2 = 0.5$,

giving $M_1 = \sqrt{\tfrac12} = 0.707$, correct to 3 significant figures.

(b) Fig. 1.16 shows the graph of g(x), with the median, $M_1$, for the first model marked. If the shaded area in Fig. 1.16 is equal to 0.5 then $M_1$ is also the median for the second model.

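The area of the shaded region is

$$\int_0^{M_1} 12x^3(1 - x^2)\,dx = \int_0^{M_1} 12(x^3 - x^5)\,dx = 12\left[\tfrac14 x^4 - \tfrac16 x^6\right]_0^{M_1} = 3M_1^4 - 2M_1^6 = 3\left(M_1^2\right)^2 - 2\left(M_1^2\right)^3.$$

Recalling that $M_1^2 = \tfrac12$, the shaded area is $3 \times \left(\tfrac12\right)^2 - 2 \times \left(\tfrac12\right)^3 = \tfrac34 - \tfrac14 = \tfrac12$.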

Thus the median of the second model is the same as the median of the first model.
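Both medians can be checked numerically. This is a sketch assuming the distribution functions obtained by integrating each model from 0, namely $m^2$ for the first model and $3m^4 - 2m^6$ for the second; the function names are illustrative.

```python
import math


def cdf_first(m):
    """P(X <= m) for the first model, f(x) = 2x on [0, 1]: the triangle area m^2."""
    return m ** 2


def cdf_second(m):
    """P(X <= m) for the second model, g(x) = 12x^3(1 - x^2) on [0, 1]: 3m^4 - 2m^6."""
    return 3 * m ** 4 - 2 * m ** 6


m1 = math.sqrt(0.5)                   # solves m^2 = 0.5
print(round(m1, 3))                   # 0.707
print(cdf_first(m1), cdf_second(m1))  # both equal 0.5 up to floating-point rounding
```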

Example 1.3.2
Find the median salary (to the nearest £100) of the probability density function in Example 1.2.2.

Fig. 1.17 shows the graph of

$$f(x) = \begin{cases} 2560 x^{-\frac72}, & x \ge 16, \\ 0 & \text{otherwise}. \end{cases}$$

The median is indicated by M and the shaded area is equal to 0.5.

Set the upper limit of the integral to M, form an equation and solve it for M as follows.

[Fig. 1.17. Graph of the probability density function for Example 1.3.2: f(x) for x ≥ 16, with the median M marked and the area under the curve between 16 and M shaded.]

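$$\begin{aligned}
P(X \le M) &= \int_{-\infty}^{M} f(x)\,dx = \int_{16}^{M} 2560 x^{-\frac72}\,dx = \left[2560 \times \left(-\tfrac25\right) x^{-\frac52}\right]_{16}^{M} \\
&= \left(2560 \times \left(-\tfrac25\right) \times M^{-\frac52}\right) - \left(2560 \times \left(-\tfrac25\right) \times 16^{-\frac52}\right) = -1024M^{-\frac52} + 1.
\end{aligned}$$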

This probability must equal 0.5, so $-1024M^{-\frac52} + 1 = \tfrac12$, giving $1024M^{-\frac52} = \tfrac12$.

Hence $M^{-\frac52} = \tfrac{1}{2048}$, so $M^{\frac52} = 2048$, giving $M = 21.1$, correct to 3 significant figures.

So the median salary is £21,100 to the nearest £100.
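The closed-form answer can be cross-checked by solving $P(X \le M) = 0.5$ numerically. This sketch uses bisection, a general root-finding technique that is not used in the text; it assumes the distribution function derived above, $1 - 1024M^{-5/2}$ for $M \ge 16$, and the helper names are hypothetical.

```python
def cdf(m):
    """P(X <= m) for the salary model f(x) = 2560 x^(-7/2), x >= 16: 1 - 1024 m^(-5/2)."""
    return 0.0 if m < 16 else 1 - 1024 * m ** -2.5


def median_by_bisection(cdf, lo, hi, tol=1e-10):
    """Find m in [lo, hi] with cdf(m) = 0.5, assuming cdf is increasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


m = median_by_bisection(cdf, 16, 100)
print(round(m, 3), round(2048 ** 0.4, 3))  # numerical and closed-form answers agree
print(f"£{round(m * 1000, -2):,.0f}")       # £21,100
```

Bisection only needs the distribution function to be increasing and to cross 0.5 inside the starting interval; here cdf(16) = 0 and cdf(100) is about 0.99.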

Exercise 1B

1 The probability density function is

$$f(x) = \begin{cases} \tfrac19 x^2, & 0 \le x \le 3, \\ 0 & \text{otherwise}. \end{cases}$$

Find the median value of X.

2 A computer ink cartridge has a life of X hours. The variable X is modelled by the probability density function

$$f(x) = \begin{cases} 400x^{-2}, & x \ge 400, \\ 0 & \text{otherwise}. \end{cases}$$

Find the median lifetime of these cartridges.

3 The probability density function is

$$f(x) = \begin{cases} 2x - 4, & 2 \le x \le 3, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Sketch the graph of f(x).
(b) Find the median value of X.

4 The life, X, of the StayBrite light bulb is modelled by the probability density function

$$f(x) = \begin{cases} 2e^{-2x}, & x \ge 0, \\ 0 & \text{otherwise}, \end{cases}$$

where X is measured in thousands of hours.

(a) Sketch the graph of f(x).
(b) Find the median life of these StayBrite bulbs.

5 The probability density function is

$$f(x) = \begin{cases} \tfrac25\left(1 - \tfrac15 x\right), & 0 \le x \le 5, \\ 0 & \text{otherwise}. \end{cases}$$

(a) Sketch the graph of f(x).
(b) Find the median value of X.


1.4 The expectation of a continuous random variable

The expectation (or mean) of a continuous random variable, X, is defined by

$$E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx. \qquad (1.1)$$

It is not possible to deduce this formula, but the following argument may help you to see why this definition makes sense.

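[Fig. 1.18a. Generalised probability density function: the region under y = f(x) divided into narrow strips of width δx.]

[Fig. 1.18b. Enlarged version of one of the strips in Fig. 1.18a: a strip from x to x + δx, of area δA, with heights y and y + δy at its two edges.]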

Look at Fig. 1.18a. This shows a probability density function, f(x), for a continuous random variable, X. The region underneath y = f(x) has been divided into narrow strips of width δx. Fig. 1.18b shows one of these strips, with some dotted lines added.

The probability that X takes a value between x and x + δx is δA, the area of the strip.

Comparing with the equation $E(X) = \sum x\,P(X = x)$, this narrow strip makes a contribution to E(X) which can be denoted by δE, where

δE is between $x\,\delta A$ and $(x + \delta x) \times \delta A$.

Using the ideas which you met in P1 you can say that δA lies between $y\,\delta x$ and $(y + \delta y)\,\delta x$,

so δE lies between $xy\,\delta x$ and $(x + \delta x)(y + \delta y)\,\delta x$.

Dividing through by δx gives

$\dfrac{\delta E}{\delta x}$ lies between $xy$ and $(x + \delta x)(y + \delta y)$.

When δx tends to 0, $\dfrac{\delta E}{\delta x}$ tends to $\dfrac{dE}{dx}$. Also δy tends to 0, so that y + δy tends to y. It follows that $\dfrac{dE}{dx} = xy$.

Since y = f(x), this can also be written $\dfrac{dE}{dx} = x f(x)$.

Integrating,

$$E = \int_{-\infty}^{\infty} x f(x)\,dx.$$


Note the correspondence between this equation and that for a discrete variable: f(x) dx replaces P(X = x) and the summation is replaced by an integral.
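The strip argument can be imitated numerically by adding up x f(x) δx over many narrow strips. This is a sketch under stated assumptions: it uses the density f(x) = 2x on 0 ≤ x ≤ 1 (the first model of Example 1.3.1), whose mean is $\int_0^1 2x^2\,dx = \tfrac23$, and takes each strip's height at its left edge; the function name `strip_sum_mean` is illustrative.

```python
def f(x):
    """First model of Example 1.3.1: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def strip_sum_mean(f, a, b, n):
    """Sum x * f(x) * dx over n strips of width dx, using each strip's left edge."""
    dx = (b - a) / n
    return sum((a + i * dx) * f(a + i * dx) * dx for i in range(n))


for n in (10, 100, 1_000, 10_000):
    print(n, strip_sum_mean(f, 0.0, 1.0, n))  # approaches 2/3 as the strips narrow
```

With 10 strips the sum is only 0.57, but as the strips narrow it settles towards 2/3, which is the value the integral gives.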

Example 1.4.1
Find the mean salary (to the nearest £100) of the probability density function defined in Example 1.2.2 and compare it with the median salary which was calculated in Example 1.3.2.

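From Equation 1.1,

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{16}^{\infty} x \times 2560 x^{-\frac72}\,dx = \int_{16}^{\infty} 2560 x^{-\frac52}\,dx = \left[2560 \times \left(-\tfrac23\right) x^{-\frac32}\right]_{16}^{\infty} = (0) - \left(2560 \times \left(-\tfrac23\right) \times 16^{-\frac32}\right) = 26\tfrac23.$$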

To the nearest £100, the mean salary is £26,700.

The mean is greater than the median (= £21,100) because the distribution is positively skewed.
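The mean and the comparison with the median can be checked as follows. In this sketch the mean is computed exactly with `fractions.Fraction`, while the median uses the closed form $M = 2048^{2/5}$ from Example 1.3.2 in floating point.

```python
from fractions import Fraction

# Salary model from Example 1.2.2: f(x) = 2560 x^(-7/2) for x >= 16 (thousands of £).
# Then x f(x) = 2560 x^(-5/2), whose integral from 16 to infinity is
# 2560 * (2/3) * 16^(-3/2), and 16^(-3/2) = 1/64.
mean = Fraction(2560) * Fraction(2, 3) / 64
median = 2048 ** 0.4  # M^(5/2) = 2048 from Example 1.3.2

print(mean)                                                      # 80/3, i.e. 26 2/3
print(round(float(mean) * 1000, -2), round(median * 1000, -2))  # 26700.0 21100.0
print(mean > median)                                             # True, consistent with positive skew
```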

1.5 The variance of a continuous random variable

The variance of a continuous random variable, X, is given by

$$\operatorname{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2. \qquad (1.2)$$

This formula shows the same parallel with the corresponding formula for a discrete random variable that was noted in the previous section. The variance of a discrete random variable is given by

$$\operatorname{Var}(X) = \sum x^2\,P(X = x) - \mu^2.$$

As in the expectation formula, f(x) dx replaces P(X = x) and the summation is replaced by an integral.
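For comparison, the discrete formulas can be evaluated exactly for the coin-spinning variable H of Table 1.1. This small sketch is not part of the original text; it uses Python's `fractions.Fraction` so that the results print as exact fractions.

```python
from fractions import Fraction

# Table 1.1: H = number of heads when two coins are spun.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(h * p for h, p in dist.items())                  # sum of x P(X = x)
var = sum(h ** 2 * p for h, p in dist.items()) - mu ** 2  # sum of x^2 P(X = x) - mu^2
print(mu, var)  # 1 1/2
```

Replacing each `p` by f(x) dx and each sum by an integral gives Equations 1.1 and 1.2.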

Example 1.5.1
For the continuous random variable X with probability density function defined by

$$f(x) = \begin{cases} \tfrac34 x(2 - x), & 0 \le x \le 2, \\ 0 & \text{otherwise}, \end{cases}$$

find (a) the mean, (b) the variance, (c) $P(\mu - \sigma < X < \mu + \sigma)$.


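(a) Using Equation 1.1,

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^2 x \times \tfrac34 x(2 - x)\,dx = \int_0^2 \left(\tfrac32 x^2 - \tfrac34 x^3\right) dx = \left[\tfrac32 \times \tfrac13 x^3 - \tfrac34 \times \tfrac14 x^4\right]_0^2 = \left(\tfrac12 \times 2^3 - \tfrac{3}{16} \times 2^4\right) - 0 = 4 - 3 = 1.$$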

If you look at Fig. 1.19, which shows the graph of y = f(x), you will see that there is a much quicker way of arriving at this result. Since this graph is symmetrical about the line x = 1, the mean must be 1.

For a symmetrical probability density function, the mean is most easily found by using symmetry.

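[Fig. 1.19. Graph of the probability density function for Example 1.5.1: f(x) on 0 ≤ x ≤ 2, symmetrical about the line x = 1.]

(b) Using Equation 1.2,

$$\sigma^2 = \operatorname{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \int_0^2 x^2 \times \tfrac34 x(2 - x)\,dx - 1^2 = \int_0^2 \left(\tfrac32 x^3 - \tfrac34 x^4\right) dx - 1 = \left[\tfrac32 \times \tfrac14 x^4 - \tfrac34 \times \tfrac15 x^5\right]_0^2 - 1 = \left(6 - \tfrac{24}{5}\right) - 0 - 1 = 0.2.$$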

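(c)

$$\begin{aligned}
\mathrm{P}(\mu - \sigma < X < \mu + \sigma) &= \mathrm{P}\left(1 - \sqrt{0.2} < X < 1 + \sqrt{0.2}\right) = \int_{1-\sqrt{0.2}}^{1+\sqrt{0.2}} \tfrac34 x(2 - x)\,\mathrm{d}x \\
&= \int_{1-\sqrt{0.2}}^{1+\sqrt{0.2}} \left(\tfrac32 x - \tfrac34 x^2\right)\mathrm{d}x = \left[\tfrac32 \times \tfrac12 x^2 - \tfrac34 \times \tfrac13 x^3\right]_{1-\sqrt{0.2}}^{1+\sqrt{0.2}} \\
&= \left(\tfrac34\left(1 + \sqrt{0.2}\right)^2 - \tfrac14\left(1 + \sqrt{0.2}\right)^3\right) - \left(\tfrac34\left(1 - \sqrt{0.2}\right)^2 - \tfrac14\left(1 - \sqrt{0.2}\right)^3\right) \\
&= \left(\tfrac34 \times 1.447\ldots^2 - \tfrac14 \times 1.447\ldots^3\right) - \left(\tfrac34 \times 0.552\ldots^2 - \tfrac14 \times 0.552\ldots^3\right) \\
&= 0.626 \text{, correct to 3 significant figures.}
\end{aligned}$$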
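The final arithmetic in part (c) is easy to reproduce. This sketch evaluates the antiderivative $\tfrac34 x^2 - \tfrac14 x^3$ found above at the two limits, with $\sigma = \sqrt{0.2}$ from part (b).

```python
import math


def F(x):
    """An antiderivative of f(x) = (3/4) x (2 - x), namely (3/4) x^2 - (1/4) x^3."""
    return 0.75 * x ** 2 - 0.25 * x ** 3


mu = 1.0
sigma = math.sqrt(0.2)  # from Var(X) = 0.2 in part (b)
p = F(mu + sigma) - F(mu - sigma)
print(round(p, 3))  # 0.626
```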

Exercise 1C

1 The probability density function of $X$ is

$$\mathrm{f}(x) = \begin{cases} \tfrac19 x^2 & 0 \le x \le 3,\\ 0 & \text{otherwise.} \end{cases}$$

Find the mean and variance of $X$.


2 The probability density function of $X$ is

$$\mathrm{f}(x) = \begin{cases} 2x - 4 & 2 \le x \le 3,\\ 0 & \text{otherwise.} \end{cases}$$

Find the mean and variance of $X$.

3 The probability density function of $X$ is

$$\mathrm{f}(x) = \begin{cases} \tfrac14\left(1 - \tfrac18 x\right) & 0 \le x \le 8,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Sketch the graph of $\mathrm{f}(x)$. (b) Find the mean and variance of $X$.

4 The mass, $X$ kg, of silicon produced in a manufacturing process is modelled by the probability density function

$$\mathrm{f}(x) = \begin{cases} \tfrac{3}{32}\left(4x - x^2\right) & 0 \le x \le 4,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Sketch the graph of $\mathrm{f}(x)$.

(b) Find the mean and variance of the mass of silicon produced.

5 The EverOn torch battery has a life of $X$ hours. The variable $X$ is modelled by the probability density function

$$\mathrm{f}(x) = \begin{cases} 3000x^{-4} & x \ge 10,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Sketch the graph of $\mathrm{f}(x)$.

(b) Find the mean and variance of the lives of these EverOn torch batteries.

6 A computer ink cartridge has a life of $X$ hours. The variable $X$ is modelled by the probability density function

$$\mathrm{f}(x) = \begin{cases} kx^{-2} & 400 \le x \le 900,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Sketch the graph of $\mathrm{f}(x)$. (b) Show that $k = 720$.

(c) Find the mean and variance of the lives of these cartridges.
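A quick exact check of part (b), as a sketch using Python's `fractions` module: the integral of $x^{-2}$ from 400 to 900 is $\tfrac{1}{400} - \tfrac{1}{900}$, and $k$ must be its reciprocal for the total probability to be 1.

```python
from fractions import Fraction

# Integral of x^(-2) from 400 to 900 is [-1/x] between those limits: 1/400 - 1/900
area_without_k = Fraction(1, 400) - Fraction(1, 900)
k = 1 / area_without_k  # total probability k * area_without_k must equal 1
print(area_without_k, k)  # 1/720 720
```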

Questions 7 and 8 require integration techniques covered in P3.

7* The life, $X$, of the StayBrite light bulb is modelled by the probability density function

$$\mathrm{f}(x) = \begin{cases} 2\mathrm{e}^{-2x} & x \ge 0,\\ 0 & \text{otherwise,} \end{cases}$$

where $X$ is measured in thousands of hours.

Find the mean and variance of the lives of these StayBrite bulbs.

8* The radioactivity of krypton decays according to the probability model

$$\mathrm{f}(x) = \begin{cases} k\mathrm{e}^{-\lambda x} & x \ge 0,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Show that $\lambda = k$.

(b) Find the mean and variance of $X$.
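Part (a) can be explored numerically before it is proved. The sketch below is an illustration only: `total_area` is a hypothetical helper, the values of $k$ are arbitrary, and the midpoint rule stands in for exact integration. The infinite range is truncated at $x = 50/k$, where the remaining tail is negligible.

```python
import math


def total_area(k, lam, upper, n=100_000):
    """Midpoint-rule approximation to the integral of k e^(-lam x) over 0 <= x <= upper."""
    h = upper / n
    return sum(k * math.exp(-lam * (i + 0.5) * h) for i in range(n)) * h


for k in (0.5, 2.0, 3.7):  # arbitrary illustrative values of k
    print(k, round(total_area(k, lam=k, upper=50 / k), 6))      # close to 1 when lam = k
    print(k, round(total_area(k, lam=2 * k, upper=50 / k), 6))  # close to 0.5 when lam = 2k
```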


Miscellaneous exercise 1

1 A continuous random variable, $X$, has probability density function given by

$$\mathrm{f}(x) = \begin{cases} \tfrac18 x & 0 \le x \le 4,\\ 0 & \text{elsewhere.} \end{cases}$$

(a) Calculate $\mathrm{P}(X < 2)$. (b) Calculate the expected value of $X$. (OCR)

2 A continuous random variable, $X$, has the probability density function

$$\mathrm{f}(x) = \begin{cases} \tfrac15 & 0 \le x \le 5,\\ 0 & \text{elsewhere.} \end{cases}$$

Find

(a) the mean, $\mathrm{E}(X)$, (b) $\mathrm{Var}(X)$. (OCR)

3 The time, in minutes, between two consecutive calls to a telephone switchboard is modelled by a continuous random variable, $X$. The probability density function, $\mathrm{f}(x)$, for this random variable is given by

$$\mathrm{f}(x) = \begin{cases} k(10 - x) & 0 \le x \le 10,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Calculate the value of $k$.

(b) Find the mean time, $\mathrm{E}(X)$, between two consecutive calls.

(c) Find $\mathrm{Var}(X)$. (OCR)

4 A continuous random variable, $X$, has the probability density function, $\mathrm{f}(x)$, given by

$$\mathrm{f}(x) = \begin{cases} k(4 - x) & 0 \le x \le 4,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Sketch the probability density function. (b) Determine the value of $k$.

(c) Calculate the probability that $X > 2.5$. (d) Find $\mathrm{E}(X)$. (OCR)

5 A continuous random variable, $X$, has the probability density function

$$\mathrm{f}(x) = \begin{cases} \tfrac12 x & 0 \le x \le 2,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Find the median value of $X$.

(b) Find $\mathrm{E}(X)$. (c) Find $\mathrm{Var}(X)$. (OCR, adapted)

6 A continuous random variable, $U$, is uniformly distributed on $0.5 \le u \le 2.5$, as shown in the diagram.

Diagram: graph of $\mathrm{f}(u)$ against $u$, with $u = 0.5$ and $u = 2.5$ marked on the axis.

(a) Find the probability density function $\mathrm{f}(u)$.

(b) State the mean of $U$.

(c) Use integration to calculate the variance of $U$.

(OCR)


7 The length, in metres, of ‘offcuts’ of wood found in a timber yard can be modelled by a continuous uniform distribution with density function, $\mathrm{f}(x)$, defined as

fotherwise.

x kx( ) =

10 2 0 8

0

. . ,� �

(a) Write down the value of k .

(b) State the mean length.

(c) Calculate the variance of the length.

8 The random variable, $X$, has probability density function

$$\mathrm{f}(x) = \begin{cases} kx^3 & 0 \le x \le 2,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Find the value of $k$.

(b) Find $\mathrm{E}(X)$.

(c) Find $\mathrm{Var}(X)$.

(d) Find the median of the distribution.

(e) Find the probability that an observation lies within one standard deviation of the mean.

(OCR)

9 The random variable, $X$, has probability density function

$$\mathrm{f}(x) = \begin{cases} \lambda x^3 & 0 \le x \le 4,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Find the value of $\lambda$. (b) Find $\mathrm{E}(X)$.

(c) Find $\mathrm{Var}(X)$. (d) Find the probability $\mathrm{P}(1 < X < 2)$. (OCR)

10 The continuous random variable, $X$, has probability density function

$$\mathrm{f}(x) = \begin{cases} \dfrac{k}{x} & 1 \le x \le 2,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Find the value of the constant, $k$.

(b) Find the mean, $\mathrm{E}(X)$.

(c) Find the variance, $\mathrm{Var}(X)$.

(d) Determine the median value of $X$.

(e) Show that the probability that $X$ is less than the mean is $\dfrac{-\ln(\ln 2)}{\ln 2}$. (OCR)
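The result in part (e) can be checked numerically. This sketch approximates each integral with the midpoint rule (an assumed numerical method, not the analytic route the question expects; `midpoint` is a hypothetical helper): it normalises $k/x$ on $1 \le x \le 2$, finds the mean, and compares $\mathrm{P}(X < \mathrm{E}(X))$ with $\dfrac{-\ln(\ln 2)}{\ln 2}$.

```python
import math


def midpoint(g, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of g over a <= x <= b."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


k = 1 / midpoint(lambda x: 1 / x, 1, 2)            # normalise f(x) = k / x on 1 <= x <= 2
mean = midpoint(lambda x: x * (k / x), 1, 2)       # E(X) = integral of x f(x)
p_below_mean = midpoint(lambda x: k / x, 1, mean)  # P(X < E(X))
target = -math.log(math.log(2)) / math.log(2)
print(round(p_below_mean, 6), round(target, 6))
```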

11 An internet surfer suggests that the time ($t$ minutes) that he spends on the internet can be modelled by the probability density function

$$\mathrm{f}(t) = \begin{cases} 0.1\mathrm{e}^{-0.1t} & t \ge 0,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Verify that this is a properly defined probability density function.

(b) Find the probability that the surfer spends less than 4 minutes on the internet.

(c) Find the probability that the surfer spends more than 10 minutes on the internet.
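For part (a), a properly defined probability density function must be non-negative and have total area 1. The sketch below checks both numerically, as an illustration: non-negativity at sample points, and the area with the infinite range truncated at $t = 500$ (an arbitrary cut-off, beyond which the remaining area is $\mathrm{e}^{-50}$). The helper `midpoint` is hypothetical.

```python
import math


def f(t):
    """Proposed density: 0.1 e^(-0.1 t) for t >= 0, zero otherwise."""
    return 0.1 * math.exp(-0.1 * t) if t >= 0 else 0.0


def midpoint(g, a, b, n=200_000):
    """Midpoint-rule approximation to the integral of g over a <= t <= b."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


print(all(f(t) >= 0 for t in range(-10, 501)))  # non-negative at the sample points
print(round(midpoint(f, 0, 500), 6))            # total area, range cut off at t = 500
```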


12 The random variable X has probability density function

$$\mathrm{f}(x) = \begin{cases} a\mathrm{e}^{-ax} & x \ge 0,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Find the median value of $X$.

The above distribution, with $a = 0.8$, is proposed as a model for the length of life, in years, of a species of bird.

(b) Find the expected number out of a total of 50 birds that would fall in the class interval 2–3 years.

13 The continuous random variable X has probability density function

$$\mathrm{f}(x) = \begin{cases} (x - a)(2a - x) & a \le x \le 2a,\\ 0 & \text{otherwise.} \end{cases}$$

(a) Show that $a^3 = 6$. (b) Find $\mathrm{E}(X)$. (OCR)
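Part (a) can be checked numerically: with $a = \sqrt[3]{6}$, the area under $(x - a)(2a - x)$ between $a$ and $2a$ should be 1. This is a midpoint-rule sketch; the helper `midpoint` and its step count are assumptions for illustration.

```python
def midpoint(g, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of g over a <= x <= b."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


a = 6 ** (1 / 3)  # the value of a with a^3 = 6
area = midpoint(lambda x: (x - a) * (2 * a - x), a, 2 * a)
print(round(area, 9))
```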

14 A farmer needs to install a new water-pump. Pumps almost always run perfectly for the first year but thereafter if they fail they are not worth repairing and have to be replaced. They virtually never last more than 9 years. The length of time, in years, that the pumps last can be modelled by the continuous random variable $X$ which has probability density function given by

$$\mathrm{f}(x) = \begin{cases} \dfrac{k}{x} & 1 \le x \le 9,\\ 0 & \text{otherwise,} \end{cases}$$

where $k$ is a constant.

(a) Show that $k = \dfrac{1}{2\ln 3}$.

(b) Find the median length of life of a pump.

(c) Find the probability that a pump lasts between 1 and 2 years only.

(d) The farmer is offered a guarantee to cover the cost of replacing a pump that fails during the second year, at a cost of £300. Given that the pump will cost £1000 to replace if it fails during this year, what advice would you give the farmer about the merits of purchasing the guarantee?

(e) Pumps can be rented for an installation charge of £200 plus £250 per year, payable in advance. The yearly payment is not refundable if the pump fails before the end of the year. The farmer does not purchase the guarantee. Find the probability that a pump, at the end of its life, would have cost more to rent than to buy for £1000. (OCR, adapted)
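A numerical check of part (a), as a sketch: normalising $k/x$ on $1 \le x \le 9$ with the midpoint rule (a hypothetical helper `midpoint`, standing in for exact integration) should reproduce $k = \dfrac{1}{2\ln 3}$.

```python
import math


def midpoint(g, a, b, n=200_000):
    """Midpoint-rule approximation to the integral of g over a <= x <= b."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


k = 1 / midpoint(lambda x: 1 / x, 1, 9)  # normalise f(x) = k / x on 1 <= x <= 9
print(k, 1 / (2 * math.log(3)))
print(math.isclose(k, 1 / (2 * math.log(3)), rel_tol=1e-9))  # True
```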