Statistics and Probability - Solved Assignments - Semester Fall 2007

Assignment 1 Question 1

A district is divided into two areas, viz. an urban area and a rural area. The total population of the district is 271,076, out of which only 46,740 live in the urban area. The total male population of the district is 139,699 and that of the urban area is 23,083. The total unmarried population of the district is 112,352, out of which 36,864 are rural females. In the urban area, unmarried people number 21,072, out of which 12,149 are males.

Prepare a table showing the population of the district by marital status, by residence and by sex.

SOLUTION
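Every cell of the required table follows from the given totals by subtraction. The following minimal Python sketch reproduces that arithmetic. It is an illustration added here, not part of the original solution, and the variable names are hypothetical.

```python
# Given figures from the question
total = 271_076                 # whole district
urban = 46_740
male_total = 139_699            # all males in the district
urban_male = 23_083
unmarried_total = 112_352
rural_unmarried_female = 36_864
urban_unmarried = 21_072
urban_unmarried_male = 12_149

# Every remaining cell follows by subtraction
rural = total - urban
rural_male = male_total - urban_male
rural_female = rural - rural_male
urban_female = urban - urban_male
urban_unmarried_female = urban_unmarried - urban_unmarried_male
rural_unmarried = unmarried_total - urban_unmarried
rural_unmarried_male = rural_unmarried - rural_unmarried_female

# (residence, sex) -> (married, unmarried)
table = {
    ("Rural", "Male"): (rural_male - rural_unmarried_male, rural_unmarried_male),
    ("Rural", "Female"): (rural_female - rural_unmarried_female, rural_unmarried_female),
    ("Urban", "Male"): (urban_male - urban_unmarried_male, urban_unmarried_male),
    ("Urban", "Female"): (urban_female - urban_unmarried_female, urban_unmarried_female),
}

for (area, sex), (married, unmarried) in table.items():
    print(f"{area:<6}{sex:<7}married={married:>7,} unmarried={unmarried:>7,}")
```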

Population of the district by marital status, residence and sex:

                   RURAL                  URBAN
               Male      Female       Male      Female      Total
Married       62,200     70,856      10,934     14,734     158,724
Unmarried     54,416     36,864      12,149      8,923     112,352
Total        116,616    107,720      23,083     23,657     271,076
Area total        224,336                46,740            271,076

Question 2

The following figures give the number of children born to each of 50 women in a certain locality up to the age of 40 years:

1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8, 9, 10, 9, 3, 5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3, 4, 2, 3, 4, 6, 3, 4, 2, 5, 6, 4, 0, 5, 6, 8, 5, 4, 7, 6

a) Make a frequency distribution taking a class interval of size 1.

b) Find the cumulative frequency distribution.

c) Find the relative frequency distribution also.

SOLUTION

a) FREQUENCY DISTRIBUTION

No. of children   Frequency
0                 1
1                 3
2                 4
3                 5
4                 8
5                 10
6                 6
7                 4
8                 3
9                 5
10                1
Total             50

b) CUMULATIVE FREQUENCY DISTRIBUTION

No. of children   Frequency   Cumulative frequency
0                 1           1
1                 3           4
2                 4           8
3                 5           13
4                 8           21
5                 10          31
6                 6           37
7                 4           41
8                 3           44
9                 5           49
10                1           50
Total             50

c) RELATIVE FREQUENCY DISTRIBUTION

No. of children   Frequency   Relative frequency
0                 1           1/50
1                 3           3/50
2                 4           4/50
3                 5           5/50
4                 8           8/50
5                 10          10/50
6                 6           6/50
7                 4           4/50
8                 3           3/50
9                 5           5/50
10                1           1/50
Total             50          1
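All three tables can be produced mechanically from the raw data. The following minimal Python sketch is an illustration, not part of the original solution. It counts each value with `collections.Counter`, accumulates the counts with `itertools.accumulate`, and keeps the relative frequencies exact with `fractions.Fraction`.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

children = [1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8,
            9, 10, 9, 3, 5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3, 4, 2, 3, 4, 6, 3, 4, 2,
            5, 6, 4, 0, 5, 6, 8, 5, 4, 7, 6]

counts = Counter(children)
values = list(range(min(children), max(children) + 1))  # class interval of size 1
freq = [counts[v] for v in values]                      # (a) frequency
cum_freq = list(accumulate(freq))                       # (b) cumulative frequency
rel_freq = [Fraction(f, len(children)) for f in freq]   # (c) relative frequency

for v, f, cf, rf in zip(values, freq, cum_freq, rel_freq):
    print(v, f, cf, rf)
```

`Fraction` prints in lowest terms (4/50 appears as 2/25), but the values are the same as in the table.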

Question 3

a) In which situations are the weighted mean and the arithmetic mean used?

b) From the following data, find the weighted mean.

Items                 Expenditure (Rs.)   Weights
Food                  3,000               20
House rent            600                 8
Clothing              200                 5
Fuel & Electricity    150                 4
Education             100                 2
Miscellaneous         50                  1

SOLUTION

a) Sometimes we want the average of values that are not of equal importance. In that case we assign each value a numerical factor that expresses its relative importance. These factors are known as "weights". When the observations carry weights, we use the weighted mean, ΣWX/ΣW. When all observations are equally important, we use the simple arithmetic mean, ΣX/n. The two are equal when all the weights are equal.

b)

Items                 Expenditure (Rs.) (X)   Weights (W)   WX
Food                  3,000                   20            60,000
House rent            600                     8             4,800
Clothing              200                     5             1,000
Fuel & Electricity    150                     4             600
Education             100                     2             200
Miscellaneous         50                      1             50
Total                                         ΣW = 40       ΣWX = 66,650

$$\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{66650}{40} = 1666.25$$
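The following minimal Python sketch (an illustration, not part of the original solution) reproduces the weighted mean. It also computes the unweighted mean of the same six expenditures, which shows how strongly the large weight on Food pulls the result up.

```python
expenditure = {"Food": 3000, "House rent": 600, "Clothing": 200,
               "Fuel & Electricity": 150, "Education": 100, "Miscellaneous": 50}
weights = {"Food": 20, "House rent": 8, "Clothing": 5,
           "Fuel & Electricity": 4, "Education": 2, "Miscellaneous": 1}

sum_w = sum(weights.values())                                   # ΣW
sum_wx = sum(weights[k] * expenditure[k] for k in expenditure)  # ΣWX
weighted_mean = sum_wx / sum_w

# Simple arithmetic mean of the same values, for comparison
simple_mean = sum(expenditure.values()) / len(expenditure)

print(sum_w, sum_wx, weighted_mean, round(simple_mean, 2))
```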

Assignment 2 Question 1

a. Explain why it is difficult to calculate the mode and the G.M. for the following values: 4, 5, -10, 0, 20.

b. Calculate the harmonic mean for the following data.

x   1   2   3   4
f   2   3   4   1

Solution:

a. The mode is the value that occurs the maximum number of times in the data. Here each value appears only once, so the data have no mode. The geometric mean is defined only for positive values. It is the nth root of the product of the values, computed in practice as the antilog of the mean of the logarithms. In these data one value is zero, which makes the product zero and log 0 undefined. Another value (-10) is negative, and its logarithm is also undefined. So a meaningful G.M. cannot be calculated.

b.

x       f     1/x      f(1/x)
1       2     1        2
2       3     0.5      1.5
3       4     0.333    1.333
4       1     0.25     0.25
Total   10             5.083

Now, as we know:

$$H.M. = \frac{\sum f}{\sum f\left(\frac{1}{x}\right)} = \frac{10}{5.083} = 1.967$$
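Rounding 1/3 to 0.33 slightly understates Σf(1/x), so the exact value is worth checking. The following minimal Python sketch (an illustration, not part of the original solution) uses `fractions.Fraction` to compute the weighted harmonic mean with no rounding.

```python
from fractions import Fraction

x = [1, 2, 3, 4]
f = [2, 3, 4, 1]

sum_f_over_x = sum(Fraction(fi, xi) for xi, fi in zip(x, f))  # Σ f(1/x), exactly 61/12
hm = sum(f) / sum_f_over_x                                     # Σf / Σ f(1/x)

print(sum_f_over_x, float(sum_f_over_x), float(hm))
```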

Question 2

a. Can all quartiles and deciles be expressed as percentiles? Explain.

b. The following data give the number of weeks needed to find a job for 25 older workers who lost their jobs as a result of corporate downsizing.

13 13 17 7 22 22 26 17 13 14
16 7 6 18 20 10 17 11 10 15 16 8 16 21 11

Find the range and the coefficient of dispersion.

Solution:

a. Yes. Percentiles divide the ordered data into 100 equal parts, quartiles into 4 and deciles into 10. Every quartile and every decile therefore coincides with a percentile: Q1 = P25, Q2 = P50 (the median), Q3 = P75, and D1 = P10, D2 = P20, ..., D9 = P90. Each quartile step spans 25 percentiles and each decile step spans 10 percentiles.

b. Ordered array:

6 7 7 8 10 10 11 11 13 13
13 14 15 16 16 16 17 17 17 18
20 21 22 22 26

Range = Xm - X0, where Xm denotes the maximum observation and X0 the minimum observation:

$$\text{Range} = X_m - X_0 = 26 - 6 = 20$$

$$\text{Coefficient of dispersion} = \frac{X_m - X_0}{X_m + X_0} = \frac{26 - 6}{26 + 6} = \frac{20}{32} = 0.625$$
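The following minimal Python sketch (an illustration, not part of the original solution) sorts the raw data and computes both measures. Sorting in code is a convenient way to check a hand-made ordered array; here it confirms that 16 occurs three times and 21 once.

```python
weeks = [13, 13, 17, 7, 22, 22, 26, 17, 13, 14,
         16, 7, 6, 18, 20, 10, 17, 11, 10, 15, 16, 8, 16, 21, 11]

ordered = sorted(weeks)
x_max, x_min = ordered[-1], ordered[0]

data_range = x_max - x_min
coeff_dispersion = (x_max - x_min) / (x_max + x_min)

print(ordered)
print(data_range, coeff_dispersion)
```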

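Question 3

Find the median and the mode from the following distribution.

Daily wages (Rs.)   25-30   30-35   35-40   40-45   45-50   50-55   55-60
f                   1       2       9       10      11      12      5

Solution:

Daily wages (Rs.)   Frequency   Cumulative frequency
25-30               1           1
30-35               2           3
35-40               9           12
40-45               10          22
45-50               11          33
50-55               12          45
55-60               5           50
Total               50

MEDIAN:

Here n/2 = 50/2 = 25. The median class is 45-50, the first class whose cumulative frequency reaches 25. So,

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right) = 45 + \frac{5}{11}(25 - 22) = 45 + \frac{5(3)}{11} = 45 + 1.364 = 46.364$$

where l is the lower boundary of the median class, h the class width, f the frequency of the median class and C the cumulative frequency of the class preceding it.

MODE:

The modal class is 50-55, with the highest frequency fm = 12. The frequencies of the preceding and following classes are f1 = 11 and f2 = 5.

$$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h = 50 + \frac{12 - 11}{(12 - 11) + (12 - 5)} \times 5 = 50 + \frac{5}{8} = 50 + 0.625 = 50.625$$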
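The interpolation formulas for grouped data translate directly into code. The following minimal Python sketch is an illustration, not part of the original solution. It assumes equal class widths, and it assumes the modal class is neither the first nor the last class, so that both neighbouring frequencies exist.

```python
from itertools import accumulate

lower = [25, 30, 35, 40, 45, 50, 55]   # lower class boundaries
freq = [1, 2, 9, 10, 11, 12, 5]
h = 5                                  # common class width
n = sum(freq)
cum = list(accumulate(freq))

# Median class: first class whose cumulative frequency reaches n/2
i = next(k for k, c in enumerate(cum) if c >= n / 2)
cf_before = cum[i - 1] if i > 0 else 0
median = lower[i] + h / freq[i] * (n / 2 - cf_before)

# Modal class: class with the largest frequency (assumed to have neighbours on both sides)
m = freq.index(max(freq))
d1 = freq[m] - freq[m - 1]
d2 = freq[m] - freq[m + 1]
mode = lower[m] + d1 / (d1 + d2) * h

print(round(median, 3), mode)
```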


a) Why do we use the correlation analysis technique?

b) A computer, while computing the correlation coefficient between two variables x and y from 25 pairs of observations, obtained the following results:

n = 25, Σx = 125, Σx² = 650, Σy = 100, Σy² = 460, Σxy = 508

However, at the time of checking it was discovered that two pairs of observations had been copied down as (x, y) = (10, 9) and (8, 6) instead of (15, 7) and (11, 8).

Obtain the correct value of the correlation coefficient between x and y.

Solution:

a. Statistical methods are usually applied to data involving only one variable. Often, however, we need to analyse data on two or more variables to look for a statistical relationship or association between them. Knowing such a relationship is important for making inferences from the covariation between the variables in a given situation. This is done by applying the correlation analysis technique.

b. The corrected values of the terms needed in the formula for Pearson's coefficient are determined as follows:

Corrected Σx = 125 - (10 + 8) + (15 + 11) = 133
Corrected Σy = 100 - (9 + 6) + (7 + 8) = 100
Corrected Σx² = 650 - (10² + 8²) + (15² + 11²) = 832
Corrected Σy² = 460 - (9² + 6²) + (7² + 8²) = 456
Corrected Σxy = 508 - (10×9 + 8×6) + (15×7 + 11×8) = 563

Applying the formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{25(563) - (133)(100)}{\sqrt{\left[25(832) - (133)^2\right]\left[25(456) - (100)^2\right]}} = \frac{775}{\sqrt{(3111)(1400)}} = \frac{775}{2086.963} = 0.371$$

Thus the corrected value of the correlation coefficient between x and y is 0.371.
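The correction amounts to subtracting each wrongly copied pair's contribution from every running sum and adding the correct pair's contribution. The following minimal Python sketch (an illustration, not part of the original solution) applies the correction and then Pearson's formula.

```python
from math import sqrt

n = 25
sx, sy, sxx, syy, sxy = 125, 100, 650, 460, 508   # sums as originally computed

wrong = [(10, 9), (8, 6)]    # pairs as copied down
right = [(15, 7), (11, 8)]   # pairs as they should have been

for (xw, yw), (xr, yr) in zip(wrong, right):
    sx += xr - xw
    sy += yr - yw
    sxx += xr**2 - xw**2
    syy += yr**2 - yw**2
    sxy += xr * yr - xw * yw

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))
print(sx, sy, sxx, syy, sxy, round(r, 3))
```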

Question 2

Differentiate between permutation and combination.

Solution:

A permutation is any ordered subset of a set of n distinct objects. The number of permutations of r objects selected in a definite order from n distinct objects is denoted by the symbol nPr and is given by

$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$

A combination, on the other hand, is any subset of r objects selected without regard to their order from a set of n distinct objects. It is denoted by nCr and is given by

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$

In a permutation order is important, while in a combination order is not important.
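The difference between the two counts is easy to see numerically. The following minimal Python sketch uses hypothetical example values n = 5 and r = 3 that are not in the original. It evaluates both formulas from factorials and cross-checks them against Python's built-in `math.perm` and `math.comb` (available since Python 3.8).

```python
from math import comb, factorial, perm

n, r = 5, 3  # hypothetical example values

p = factorial(n) // factorial(n - r)                   # nPr = n! / (n - r)!
c = factorial(n) // (factorial(r) * factorial(n - r))  # nCr = n! / (r! (n - r)!)

assert p == perm(n, r) and c == comb(n, r)

# Each combination of r objects can be ordered in r! ways, so nPr = r! * nCr
print(p, c, p // c)
```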

Question 2

The data on the profit (in Rs lakh) earned by 60 companies are as follows:

Profits            Below 10   10-20   20-30   30-40   40-50   50 and over
No. of companies   5          12      20      16      5       2

a) Obtain the limits of profit of the central 50 percent of the companies.

b) Calculate Bowley's coefficient of skewness.

Solution:

(a) The calculations for the central 50 percent limits are given below.

Profits (Rs lakh)   Frequency (f)   Cumulative frequency (cf)
Below 10            5               5
10-20               12              17
20-30               20              37
30-40               16              53
40-50               5               58
50 and above        2               60

(I)

Q1 = size of the (n/4)th observation = (60/4)th = 15th observation, which lies in the class 10-20.

$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right) = 10 + \frac{10}{12}(15 - 5) = 10 + 8.33 = 18.33 \text{ lakh}$$

Q3 = size of the (3n/4)th observation = 45th observation, which lies in the class 30-40.

$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right) = 30 + \frac{10}{16}(45 - 37) = 30 + 5 = 35 \text{ lakh}$$

Hence the profits of the central 50 percent of companies lie between Rs 18.33 lakh and Rs 35 lakh.

(II)

Median = size of the (n/2)th = 30th observation, which lies in the class 20-30.

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right) = 20 + \frac{10}{20}(30 - 17) = 20 + 6.5 = 26.5 \text{ lakh}$$

Bowley's coefficient of skewness is:

$$S_k = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{35 + 18.33 - 2(26.5)}{35 - 18.33} = \frac{0.33}{16.67} = 0.020$$

The positive value of Sk indicates that the distribution is positively skewed. The upper quartile lies farther from the median than the lower quartile, so the distribution has a longer tail on the right side. Because Sk is very close to zero, the skewness is only slight.
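Q1, the median and Q3 all come from the same interpolation formula applied at different cumulative positions. The following minimal Python sketch is an illustration, not part of the original solution, and `grouped_quantile` is a hypothetical helper name. It assumes the open-ended classes are 0-10 and 50-60, but neither class is actually used, because all three quartiles fall in closed classes.

```python
from itertools import accumulate

# Assumption: "Below 10" is treated as 0-10 and "50 and over" as 50-60
lower = [0, 10, 20, 30, 40, 50]
freq = [5, 12, 20, 16, 5, 2]
h = 10
n = sum(freq)
cum = list(accumulate(freq))

def grouped_quantile(position):
    """Value at a cumulative position, by linear interpolation within its class."""
    i = next(k for k, c in enumerate(cum) if c >= position)
    cf_before = cum[i - 1] if i > 0 else 0
    return lower[i] + h / freq[i] * (position - cf_before)

q1 = grouped_quantile(n / 4)
median = grouped_quantile(n / 2)
q3 = grouped_quantile(3 * n / 4)
bowley = (q3 + q1 - 2 * median) / (q3 - q1)

print(round(q1, 2), median, q3, round(bowley, 3))
```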


a. In which situations do we use permutations and combinations?

b. An MBA applies for a job in two firms, X and Y. The probability of being selected by firm X is 0.7 and of being rejected by firm Y is 0.5. The probability of at least one of his applications being rejected is 0.6. What is the probability that he will be selected by one of the firms?

c. A husband and wife appear in an interview for two vacancies in the same post. The probability of the husband's selection is 1/7 and that of the wife's selection is 1/5. What is the probability that:

1. both of them will be selected;
2. only one of them will be selected;
3. neither of them will be selected?

Solution:

a. Permutations: When order matters and an object can be chosen more than once, the number of permutations is n^r, where n is the number of objects from which you can choose and r is the number to be chosen. For example, suppose you have the letters A, B, C and D and want the number of ways to arrange them in three-letter patterns. Order matters here: A-B is different from B-A, and both are counted. With repetition allowed there are 4^3 = 64 such patterns.

Combinations: When order does not matter and each object can be chosen only once, the number of combinations is the binomial coefficient

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

where n is the number of objects from which you can choose and r is the number to be chosen. For example, if you have ten numbers and wish to choose 5, you would have 10!/(5!(10-5)!) = 252 ways to choose.

http://www.answers.com/topic/binomial-coefficient

b. Let A and B denote the events that the MBA will be selected by firm X and by firm Y, respectively. Then

P(A) = 0.7, P(Ā) = 1 - 0.7 = 0.3
P(B̄) = 0.5, so P(B) = 1 - 0.5 = 0.5, and P(Ā ∪ B̄) = 0.6

The probability that he will be selected by one of the firms is given by

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \qquad (1)$$

where, by De Morgan's law,

$$P(A \cap B) = 1 - P(\bar{A} \cup \bar{B}) = 1 - 0.6 = 0.4$$

Putting the values into equation (1), we get

$$P(A \cup B) = 0.7 + 0.5 - 0.4 = 0.8$$

c. Let A and B be the events of the husband's and the wife's selection, respectively, and treat the two selections as independent. We are given that P(A) = 1/7 and P(B) = 1/5.

1. The probability that both of them will be selected is:

$$P(A \cap B) = P(A)\,P(B) = \frac{1}{7}\cdot\frac{1}{5} = \frac{1}{35} = 0.029$$

2. The probability that only one of them will be selected is:

$$P[(A \cap \bar{B}) \cup (\bar{A} \cap B)] = P(A)[1 - P(B)] + [1 - P(A)]P(B) = \frac{1}{7}\cdot\frac{4}{5} + \frac{6}{7}\cdot\frac{1}{5} = \frac{4}{35} + \frac{6}{35} = \frac{10}{35} = 0.286$$

3. The probability that neither of them will be selected is:

$$P(\bar{A} \cap \bar{B}) = P(\bar{A})\,P(\bar{B}) = \frac{6}{7}\cdot\frac{4}{5} = \frac{24}{35} = 0.686$$
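The following minimal Python sketch is an illustration, not part of the original solution. It checks parts (b) and (c) with exact fractions. As in the solution, part (c) treats the husband's and wife's selections as independent. The three outcomes in (c) must also add up to 1, which makes a useful consistency check.

```python
from fractions import Fraction

# (b) A = selected by X, B = selected by Y
p_a = Fraction(7, 10)
p_b = 1 - Fraction(5, 10)             # rejected by Y with probability 0.5
p_a_and_b = 1 - Fraction(6, 10)       # complement of "at least one rejection"
p_a_or_b = p_a + p_b - p_a_and_b      # addition law

# (c) independent selections: husband 1/7, wife 1/5
husband, wife = Fraction(1, 7), Fraction(1, 5)
both = husband * wife
only_one = husband * (1 - wife) + (1 - husband) * wife
neither = (1 - husband) * (1 - wife)

print(p_a_or_b, both, only_one, neither)
```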

Question 2

a. The personnel department of a company has records that show the following analysis of its 200 engineers.

Age        Bachelor's degree   Master's degree   Total
Under 30   90                  10                100
30 to 40   20                  30                50
Over 40    40                  10                50
Total      150                 50                200

If one engineer is selected at random from the company, find:

1. the probability that he has only a bachelor's degree;
2. the probability that he has a master's degree, given that he is over 40;
3. the probability that he is under 30, given that he has only a bachelor's degree.

b. Suppose an item is manufactured by three machines X, Y and Z. All three machines have equal capacity and are operated at the same rate. It is known that the percentages of defective items produced by X, Y and Z are 1, 7 and 12 percent, respectively. All the items produced by X, Y and Z are put into one bin. From this bin, one item is drawn at random and is found to be defective. What is the probability that this item was produced on Y?

Solution:

a. Let A, B, C and D denote the events that an engineer is under 30 years of age, is over 40 years of age, has a bachelor's degree only, and has a master's degree, respectively.

1. The probability that he has only a bachelor's degree is:

$$P(C) = \frac{150}{200} = 0.75$$

2. The probability that he has a master's degree, given that he is over 40, is:

$$P(D \mid B) = \frac{P(D \cap B)}{P(B)} = \frac{10/200}{50/200} = \frac{10}{50} = 0.20$$

3. The probability that he is under 30, given that he has only a bachelor's degree, is:

$$P(A \mid C) = \frac{P(A \cap C)}{P(C)} = \frac{90/200}{150/200} = \frac{90}{150} = 0.60$$

b. Let A be the event that the item drawn is defective. The three machines have equal capacity and run at the same rate. So the prior probabilities that an item was produced on X, Y and Z are

P(X) = 1/3, P(Y) = 1/3 and P(Z) = 1/3 (each machine has a 1/3 chance of having produced the item).

We also know that P(A | X) = 0.01, P(A | Y) = 0.07 and P(A | Z) = 0.12.

Now, knowing that the item drawn is defective, we want the probability that it was produced by Y. By Bayes' theorem,

$$P(Y \mid A) = \frac{P(Y)\,P(A \mid Y)}{P(X)\,P(A \mid X) + P(Y)\,P(A \mid Y) + P(Z)\,P(A \mid Z)} = \frac{\frac{1}{3}(0.07)}{\frac{1}{3}(0.01) + \frac{1}{3}(0.07) + \frac{1}{3}(0.12)} = \frac{0.07}{0.20} = 0.35$$

Hence the probability that the defective item was produced on Y is 0.35.
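The following minimal Python sketch (an illustration, not part of the original solution) reproduces both parts with exact fractions. Part (a) reads the probabilities straight from the contingency table. Part (b) applies Bayes' theorem with equal priors and the defect rates of 1, 7 and 12 percent stated in the question.

```python
from fractions import Fraction

# (a) Counts from the personnel table: age group -> (bachelor's only, master's)
counts = {"under 30": (90, 10), "30 to 40": (20, 30), "over 40": (40, 10)}
total = sum(b + m for b, m in counts.values())
bachelor = sum(b for b, _ in counts.values())

p_bachelor = Fraction(bachelor, total)
p_master_given_over40 = Fraction(counts["over 40"][1], sum(counts["over 40"]))
p_under30_given_bachelor = Fraction(counts["under 30"][0], bachelor)

# (b) Bayes' theorem: equal priors, defect rates 1%, 7%, 12%
prior = {"X": Fraction(1, 3), "Y": Fraction(1, 3), "Z": Fraction(1, 3)}
defect_rate = {"X": Fraction(1, 100), "Y": Fraction(7, 100), "Z": Fraction(12, 100)}
p_defective = sum(prior[m] * defect_rate[m] for m in prior)   # total probability
posterior_y = prior["Y"] * defect_rate["Y"] / p_defective

print(p_bachelor, p_master_given_over40, p_under30_given_bachelor, posterior_y)
```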


a) Illustrate the necessary conditions for probability distributions.

b) Given the discrete probability distribution

$$P(X = x) = \binom{4}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{4-x}$$

find the complete probability distribution for x = 0, 1, 2, 3, 4.

Solution:

a)

1. Every possible value of the random variable must have a probability between 0 and 1, i.e. 0 ≤ P(xi) ≤ 1.
2. The sum of all the probabilities must be equal to 1, i.e. ∑P(xi) = 1.

b) Using the given probability distribution we can find the remaining values:

x    P(x) = 4Cx (1/2)^x (1/2)^(4-x)
0    4C0 (1/2)^0 (1/2)^4 = 1/16
1    4/16
2    6/16
3    4/16
4    1/16
∑    1
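The following minimal Python sketch (an illustration, not part of the original solution) builds the distribution exactly with `math.comb` and `fractions.Fraction`. It then checks both necessary conditions from part (a) directly.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)
pmf = {x: comb(4, x) * half**x * half**(4 - x) for x in range(5)}

assert all(0 <= p <= 1 for p in pmf.values())  # condition 1: 0 <= P(x) <= 1
assert sum(pmf.values()) == 1                   # condition 2: probabilities sum to 1

for x, p in pmf.items():
    print(x, p)
```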

Question 2

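A continuous random variable X that can assume values between x = 2 and x = 4 has a density function given by

$$f(x) = \frac{1 + x}{8}, \qquad 2 < x < 4$$

a) Show that P(2 < X < 4) = 1.

b) Find P(X < 3.5).

Solution:

a)

$$P(2 < X < 4) = \int_2^4 \frac{1 + x}{8}\,dx = \frac{1}{8}\left[x + \frac{x^2}{2}\right]_2^4 = \frac{1}{8}\left[\left(4 + \frac{16}{2}\right) - \left(2 + \frac{4}{2}\right)\right] = \frac{1}{8}(12 - 4) = 1$$

b)

$$P(X < 3.5) = \int_2^{3.5} \frac{1 + x}{8}\,dx = \frac{1}{8}\left[x + \frac{x^2}{2}\right]_2^{3.5} = \frac{1}{8}\left[\left(3.5 + \frac{(3.5)^2}{2}\right) - \left(2 + \frac{2^2}{2}\right)\right] = \frac{5.625}{8} = 0.70$$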
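Both parts reduce to evaluating the antiderivative F(x) = (x + x²/2)/8 at the limits. The following minimal Python sketch is an illustration, not part of the original solution, and `antiderivative` and `prob` are hypothetical helper names. It does that evaluation exactly with `fractions.Fraction`.

```python
from fractions import Fraction

def antiderivative(x):
    """F(x) = (x + x**2 / 2) / 8, an antiderivative of f(x) = (1 + x) / 8."""
    x = Fraction(x)
    return (x + x * x / 2) / 8

def prob(a, b):
    """P(a < X < b) for 2 <= a <= b <= 4."""
    return antiderivative(b) - antiderivative(a)

total = prob(2, 4)                      # part (a)
p_below_3_5 = prob(2, Fraction(7, 2))   # part (b): P(X < 3.5) = P(2 < X < 3.5)

print(total, p_below_3_5, float(p_below_3_5))
```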


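Let X1, X2, X3 be a random sample of size 3 from a population with mean μ and variance σ². Consider the following two estimators of the mean μ:

$$T_1 = \frac{X_1 + X_2 + X_3}{3}, \qquad T_2 = \frac{X_1 + 2X_2 + X_3}{4}$$

Which estimator should be preferred (i.e. which is more efficient)?

Solution:

First we examine the property of unbiasedness. T1 is the sample mean X̄, which we know is unbiased:

$$E(T_1) = E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{3}(\mu + \mu + \mu) = \frac{3\mu}{3} = \mu$$

$$E(T_2) = E\left(\frac{X_1 + 2X_2 + X_3}{4}\right) = \frac{1}{4}(\mu + 2\mu + \mu) = \frac{4\mu}{4} = \mu$$

Both estimators are unbiased, so we compare their variances. Because the observations in a random sample are independent,

$$\operatorname{Var}(T_1) = \frac{1}{9}\left[\operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \operatorname{Var}(X_3)\right] = \frac{1}{9}\left(\sigma^2 + \sigma^2 + \sigma^2\right) = \frac{3\sigma^2}{9} = \frac{\sigma^2}{3}$$

$$\operatorname{Var}(T_2) = \frac{1}{16}\left[\operatorname{Var}(X_1) + 4\operatorname{Var}(X_2) + \operatorname{Var}(X_3)\right] = \frac{1}{16}\left(\sigma^2 + 4\sigma^2 + \sigma^2\right) = \frac{6\sigma^2}{16} = \frac{3\sigma^2}{8}$$

Since 1/3 < 3/8, we have

$$\operatorname{Var}(T_1) = \frac{\sigma^2}{3} < \frac{3\sigma^2}{8} = \operatorname{Var}(T_2)$$

Hence T1 is a better (more efficient) estimator than T2.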
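For a linear estimator T = Σ a_i X_i built from independent observations with common mean μ and variance σ², we have E(T) = μ Σ a_i and Var(T) = σ² Σ a_i². The comparison above therefore reduces to arithmetic on the coefficients. The following minimal Python sketch (an illustration, not part of the original solution; the helper names are hypothetical) does that arithmetic.

```python
from fractions import Fraction

# Coefficients of X1, X2, X3 in each estimator
t1 = [Fraction(1, 3)] * 3
t2 = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]

def mean_factor(coeffs):
    """E(T) = mu * sum(a_i); the estimator is unbiased when this equals 1."""
    return sum(coeffs)

def variance_factor(coeffs):
    """Var(T) = sigma^2 * sum(a_i^2) for independent X_i with common variance."""
    return sum(a * a for a in coeffs)

for name, c in (("T1", t1), ("T2", t2)):
    print(name, mean_factor(c), variance_factor(c))

# Relative efficiency of T2 with respect to T1: Var(T1) / Var(T2)
print(variance_factor(t1) / variance_factor(t2))
```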

Question 2

a) Why do we call the standard deviation of a sample statistic the standard error of the statistic?

b) Draw all possible samples of two letters each, without replacement, from the letters of the word "MANAGEMENT". Find the proportion of the letter "M" in each sample. Also construct the sampling distribution of the sample proportion of "M".

Solution: a) The deviation of a sample statistic T from the corresponding population parameter is regarded as an error. The standard deviation of the sampling distribution of T measures the typical size of this error. Hence the standard deviation of a sample statistic is called the standard error of the statistic.

b) Number of possible samples (without replacement):

$$\binom{N}{n} = \binom{10}{2} = 45$$

No.   Sample   p̂       No.   Sample   p̂
1     M,A      1/2     24    N,T      0/2
2     M,N      1/2     25    A,G      0/2
3     M,A      1/2     26    A,E      0/2
4     M,G      1/2     27    A,M      1/2
5     M,E      1/2     28    A,E      0/2
6     M,M      2/2     29    A,N      0/2
7     M,E      1/2     30    A,T      0/2
8     M,N      1/2     31    G,E      0/2
9     M,T      1/2     32    G,M      1/2
10    A,N      0/2     33    G,E      0/2
11    A,A      0/2     34    G,N      0/2
12    A,G      0/2     35    G,T      0/2
13    A,E      0/2     36    E,M      1/2
14    A,M      1/2     37    E,E      0/2
15    A,E      0/2     38    E,N      0/2
16    A,N      0/2     39    E,T      0/2
17    A,T      0/2     40    M,E      1/2
18    N,A      0/2     41    M,N      1/2
19    N,G      0/2     42    M,T      1/2
20    N,E      0/2     43    E,N      0/2
21    N,M      1/2     44    E,T      0/2
22    N,E      0/2     45    N,T      0/2
23    N,N      0/2

The sampling distribution of p̂:

p̂            f     f/45
0            28    28/45
1/2          16    16/45
2/2 = 1      1     1/45
Total        45    45/45 = 1
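Enumerating the samples by letter position keeps the repeated letters (two M, two A, two N, two E) distinct, exactly as in the table. The following minimal Python sketch (an illustration, not part of the original solution) generates all 45 samples with `itertools.combinations` and tallies the sample proportions. The mean of the sampling distribution also equals the population proportion of M, 2/10, as expected for an unbiased estimator.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

letters = list("MANAGEMENT")

# Choose positions rather than letters, so repeated letters stay distinct
samples = list(combinations(range(len(letters)), 2))
p_hat = [Fraction(sum(letters[i] == "M" for i in s), 2) for s in samples]

dist = Counter(p_hat)
for value in sorted(dist):
    print(value, dist[value], Fraction(dist[value], len(samples)))

mean_p_hat = sum(p_hat) / len(samples)   # equals the population proportion 2/10
print(mean_p_hat)
```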

Question 3

a) Ten vegetable cans, all of the same size, have lost their labels. It is known that 5 contain tomatoes and 5 contain corn. If 5 are selected at random, what is the probability that all contain tomatoes? What is the probability that 3 or more contain tomatoes?

b) For a machine making parts, there is a small probability of 0.001 for a part to be defective. The parts are supplied in bundles of 10. Calculate approximately the number of bundles containing no defective, one defective or two defectives in a consignment of 10,000 bundles, given that e^(-0.01) = 0.9900.

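Solution:

a) Let X denote the number of tomato cans among those selected.

Tomato cans   Total cans   Corn cans    Selected cans
K = 5         N = 10       N - K = 5    n = 5

The probability distribution of X is hypergeometric:

$$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$$

All contain tomatoes:

$$P(X = 5) = \frac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}} = \frac{1 \times 1}{252} = \frac{1}{252}$$

3 or more contain tomatoes:

$$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5) = \frac{\binom{5}{3}\binom{5}{2} + \binom{5}{4}\binom{5}{1} + \binom{5}{5}\binom{5}{0}}{\binom{10}{5}} = \frac{100 + 25 + 1}{252} = \frac{126}{252} = 0.5$$

b) Let p = 0.001 and n = 10. Since p is extremely small, we apply the Poisson approximation with

$$\lambda = np = 10 \times 0.001 = 0.01$$

The expected number of bundles with x defectives among N = 10,000 bundles is

$$N \cdot P(x; \lambda) = 10{,}000 \times \frac{e^{-0.01}(0.01)^x}{x!}$$

For x = 0:

$$N \cdot P(0; \lambda) = 10{,}000\, e^{-0.01} = 10{,}000(0.9900) \approx 9900$$

For x = 1:

$$N \cdot P(1; \lambda) = 10{,}000\,\frac{e^{-0.01}(0.01)^1}{1!} = 100(0.9900) = 99.00$$

For x = 2:

$$N \cdot P(2; \lambda) = 10{,}000\,\frac{e^{-0.01}(0.01)^2}{2!} = 0.5(0.9900) = 0.495$$

Ignoring the negligible probability of three or more defectives,

$$P(X \ge 1) \approx P(X = 1) + P(X = 2) = 0.0099 + 0.0000495 = 0.0099495$$

$$N \cdot P(X \ge 1) \approx 10{,}000(0.0099495) \approx 99.5$$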
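The following minimal Python sketch is an illustration, not part of the original solution, and `hypergeom_pmf` is a hypothetical helper name. Part (a) uses exact fractions. Part (b) uses `math.exp` instead of the rounded table value e^(-0.01) = 0.9900, so its expected counts differ from the hand calculation only in the later decimal places.

```python
from fractions import Fraction
from math import comb, exp, factorial

def hypergeom_pmf(x, N=10, K=5, n=5):
    """P(X = x): x tomato cans among n drawn from N cans, K of which hold tomatoes."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

p_all_tomato = hypergeom_pmf(5)
p_three_or_more = sum(hypergeom_pmf(x) for x in range(3, 6))

# (b) Poisson approximation: lambda = n * p per bundle of 10 parts
lam = 10 * 0.001
bundles = 10_000
expected = {x: bundles * lam**x * exp(-lam) / factorial(x) for x in range(3)}

print(p_all_tomato, p_three_or_more)
print({x: round(v, 3) for x, v in expected.items()})
```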


a) Define non-sampling error. How can it be minimized?

b) A continuous manufacturing process produces items whose weights are normally distributed with a mean weight of 800 grams and a standard deviation of 300 grams. A random sample of 16 items is to be drawn from the process.

i. What is the probability that the arithmetic mean of the sample exceeds 900 grams? Interpret the result.

ii. Find the values of the sample mean within which the middle 95 percent of all sample means will fall (z = 1.96).

Solution:

a) Non-sampling errors occur both in complete enumeration and in sample surveys. They include biases and mistakes. Their main sources are an inaccurate definition of the population, defects in the method of interviewing, duplication and substitution, inaccurate responses by the respondents, faulty reporting of facts, and non-response to mail questionnaires. These errors can be controlled by:

- giving a precise definition of the population;
- making an accurate frame;
- improving the method of measurement;
- proper selection of the questionnaire;
- adequate training of the investigators;
- cross-checking;
- following up non-response;
- correct processing of the collected information.

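b (i) We are given the following information:

μ = 800 g, σ = 300 g and n = 16.

Since the population is normally distributed, the distribution of the sample mean is normal with mean and standard deviation equal to

$$\mu_{\bar{x}} = \mu = 800 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{300}{\sqrt{16}} = \frac{300}{4} = 75$$

The required probability is P(x̄ > 900), i.e.

$$P(\bar{x} > 900) = P\left(z > \frac{900 - 800}{75}\right) = P(z > 1.33) = 0.5 - 0.4082 = 0.0918$$

Hence 9.18 percent of all possible samples of size n = 16 will have a sample mean greater than 900 g.

b (ii) Since z = ±1.96 bounds the middle 95 percent of the area under the normal curve, we solve the formula for z for the values of x̄ in terms of the known values:

$$\bar{x}_1 = \mu_{\bar{x}} - z\,\sigma_{\bar{x}} = 800 - 1.96(75) = 653 \text{ g}$$

$$\bar{x}_2 = \mu_{\bar{x}} + z\,\sigma_{\bar{x}} = 800 + 1.96(75) = 947 \text{ g}$$

So the middle 95 percent of all sample means fall between 653 g and 947 g.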
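Python's standard library provides the normal distribution as `statistics.NormalDist` (Python 3.8+), with `cdf` and `inv_cdf` methods. The following minimal sketch (an illustration, not part of the original solution) uses it to reproduce both parts. It uses the unrounded z = 1.3333, so the tail probability comes out near 0.0912 rather than the table's 0.0918, which is based on z rounded to 1.33.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 300, 16
se = sigma / sqrt(n)                  # standard error of the mean

xbar = NormalDist(mu, se)             # sampling distribution of the mean
p_above_900 = 1 - xbar.cdf(900)       # (i) P(xbar > 900)

low = xbar.inv_cdf(0.025)             # (ii) middle 95 percent of sample means
high = xbar.inv_cdf(0.975)

print(se, round(p_above_900, 4), round(low, 1), round(high, 1))
```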

Question 2

a) What should the sample size be to estimate the population mean with 95 percent confidence, a sampling error of 5 and a standard deviation of 20?

b) Suppose we want to estimate the proportion of families in a town that have two or more children. A random sample of 144 families has been chosen, and 48 of them have two or more children. Set up a 95 percent confidence interval estimate of the population proportion of families having two or more children.

Solution:

a) We have e = 5, σ = 20 and z_{α/2} = 1.96 at the 95% confidence level. Thus

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{e^2} = \frac{(1.96)^2(20)^2}{(5)^2} = 61.47$$

So, rounding up, the required sample size is 62.

b) The sample proportion is

$$\hat{p} = \frac{x}{n} = \frac{48}{144} = \frac{1}{3}$$

Using n = 144, p̂ = 1/3 and z = 1.96 for a 95 percent confidence coefficient, we have

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{1}{3} \pm 1.96\sqrt{\frac{(1/3)(2/3)}{144}} = 0.333 \pm 0.077 = (0.256,\ 0.410)$$

Hence, the population proportion of families who have two or more children is likely to be between 0.256 and 0.410.
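The following minimal Python sketch (an illustration, not part of the original solution) reproduces both calculations. It uses `math.ceil` to round the sample size up: any fractional n must be rounded upward, so that the sampling error stays at or below 5.

```python
from math import ceil, sqrt

z = 1.96  # 95 percent confidence

# (a) Sample size for estimating a mean
sigma, e = 20, 5
n_raw = (z * sigma / e) ** 2
n_required = ceil(n_raw)

# (b) Large-sample confidence interval for a proportion
x, n = 48, 144
p_hat = x / n
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(round(n_raw, 2), n_required, round(margin, 3), tuple(round(v, 3) for v in ci))
```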


a) The masses, in grams, of thirteen ball bearings taken at random from a batch are 21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0, 22.5, 26.9, 26.4, 25.8, 23.2, 21.9.

Calculate a 95% confidence interval for the mean mass of the population, supposed normal, from which these masses were drawn (s = 1.77).

Solution: The 95% confidence interval for the population mean is given by

$$\bar{X} \pm t_{(\alpha/2,\,v)}\,\frac{s}{\sqrt{n}}$$

The sum of the 13 masses is ΣX = 314.7, so

$$\bar{X} = \frac{\sum X}{n} = \frac{314.7}{13} = 24.21$$

With s = 1.77 and v = n - 1 = 13 - 1 = 12 degrees of freedom, the table gives t(0.025, 12) = 2.179.

Now,

$$\bar{X} \pm t_{(\alpha/2,\,v)}\frac{s}{\sqrt{n}} = 24.21 \pm 2.179\,\frac{1.77}{\sqrt{13}} = 24.21 \pm 2.179(0.49) = 24.21 \pm 1.07$$

i.e. 23.14 < μ < 25.28, or 23.14 to 25.28 grams.
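The following minimal Python sketch is an illustration, not part of the original solution. It computes the mean and the sample standard deviation (divisor n - 1) with the `statistics` module. The standard library has no t distribution, so the table value 2.179 is hard-coded, as in the solution. The computed s is about 1.766, which agrees with the given s = 1.77.

```python
from math import sqrt
from statistics import mean, stdev

masses = [21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0, 22.5, 26.9, 26.4, 25.8, 23.2, 21.9]

n = len(masses)
xbar = mean(masses)
s = stdev(masses)        # sample standard deviation, divisor n - 1
t_crit = 2.179           # t(0.025, 12) from the t-table (assumed, as in the solution)

half_width = t_crit * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)

print(round(xbar, 2), round(s, 3), tuple(round(v, 2) for v in ci))
```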

b) Ten oil tins are taken at random from an automatic filling machine. The mean weight of the tins is 15.8 kg and the standard deviation is 0.50 kg. Does the sample mean differ significantly from the intended weight of 16 kg?

Solution:

H0: µ = 16
H1: µ ≠ 16

Level of significance:

α = 5% = 0.05

Critical region (v = n - 1 = 9 degrees of freedom):

$$t \le -t_{(\alpha/2,\,v)} = -2.262 \quad\text{or}\quad t \ge t_{(\alpha/2,\,v)} = 2.262$$

Test statistic:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

Calculation:

$$t = \frac{15.8 - 16}{0.5/\sqrt{10}} = \frac{-0.2}{0.158} = -1.26$$

Conclusion:

Since t (cal) = -1.26 does not fall in the critical region, we do not reject H0. Hence the sample mean does not differ significantly from the intended weight of 16 kg.
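The following minimal Python sketch (an illustration, not part of the original solution) reproduces the test statistic and the decision. The critical value 2.262 is hard-coded from the t-table, as in the solution, because the standard library has no t distribution.

```python
from math import sqrt

n, xbar, s, mu0 = 10, 15.8, 0.50, 16
t_crit = 2.262                      # t(0.025, 9), two-sided test at alpha = 0.05

t_stat = (xbar - mu0) / (s / sqrt(n))
reject_h0 = abs(t_stat) >= t_crit   # critical region: |t| >= 2.262

print(round(t_stat, 2), "reject H0" if reject_h0 else "do not reject H0")
```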

Question 2

a) A random sample of 20 students obtained a mean of 72 and a variance of 16 on a college placement test in mathematics. Assuming the scores to be normally distributed, construct a 95% confidence interval for σ².

b) The nicotine content of a certain brand of cigarettes is known to be normally distributed with a variance of 1.3 milligrams. Test the hypothesis that σ² = 1.3 against the alternative that σ² ≠ 1.3, if a random sample of 8 of these cigarettes has a standard deviation s = 1.8. Use the α = 0.05 level of significance.

Solution: a)

We are given x̄ = 72, s² = 16 and n = 20. We know that the confidence interval for σ² is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}}$$

$$\frac{(20-1)16}{\chi^2_{0.025,\,19}} < \sigma^2 < \frac{(20-1)16}{\chi^2_{0.975,\,19}}, \quad\text{i.e.}\quad \frac{304}{\chi^2_{0.025,\,19}} < \sigma^2 < \frac{304}{\chi^2_{0.975,\,19}}$$

Now, using the area table of χ², we get χ²(0.025, 19) = 32.852 and χ²(0.975, 19) = 8.907. Substituting these values, we get

$$\frac{304}{32.852} < \sigma^2 < \frac{304}{8.907}, \quad\text{i.e.}\quad 9.254 < \sigma^2 < 34.130$$

b)

Hypotheses: H0: σ² = 1.3

H1: σ² ≠ 1.3

Level of significance:

α = 0.05

Test statistic:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Critical region (v = n - 1 = 7 degrees of freedom):

$$\chi^2 < \chi^2_{0.975,\,7} = 1.690 \quad\text{or}\quad \chi^2 > \chi^2_{0.025,\,7} = 16.013$$

Computations:

$$\chi^2 = \frac{7(1.8)^2}{1.3} = 17.45$$

Conclusion: Our calculated value χ² = 17.45 is greater than the table value 16.013, so it falls in the critical region. We therefore reject the null hypothesis at the 5% level of significance.
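The following minimal Python sketch (an illustration, not part of the original solution) reproduces both chi-square calculations. The four chi-square critical values are hard-coded from the table used above, because the standard library has no chi-square distribution.

```python
# (a) 95% confidence interval for sigma^2
n, s2 = 20, 16
chi2_upper, chi2_lower = 32.852, 8.907   # chi^2(0.025, 19) and chi^2(0.975, 19), from the table
ci_var = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

# (b) Two-sided test of H0: sigma^2 = 1.3
n2, s, sigma0_sq = 8, 1.8, 1.3
chi2_stat = (n2 - 1) * s**2 / sigma0_sq
crit_low, crit_high = 1.690, 16.013      # chi^2(0.975, 7) and chi^2(0.025, 7), from the table
reject_h0 = chi2_stat < crit_low or chi2_stat > crit_high

print(tuple(round(v, 3) for v in ci_var), round(chi2_stat, 2), reject_h0)
```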
