muhammad-umair
Statistics and Probability - Solved Assignments - Semester Fall 2007
Assignment 1 Question 1
A district is divided into two areas, urban and rural. The total population of the district is 271,076, of which only 46,740 live in the urban area. The total male population of the district is 139,699 and that of the urban area is 23,083. The total unmarried population of the district is 112,352, of which 36,864 are rural females. In the urban area, unmarried people number 21,072, of which 12,149 are males.
Prepare a table showing the population of the district by marital status, by residence and by sex.
SOLUTION
Each missing cell follows by subtraction from the given totals, e.g. rural males = 139,699 - 23,083 = 116,616.

                   RURAL                URBAN
              Male     Female      Male     Female
Married       62200    70856       10934    14734
Unmarried     54416    36864       12149    8923
Total         116616   107720      23083    23657
Area total         224336               46740
District total                271076

Question 2
The following figures give the numbers of children born to 50 women in a certain locality up to the age of 40 years:
1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8,
9, 10, 9, 3, 5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3, 4, 2, 3, 4, 6, 3, 4, 2, 5, 6, 4, 0, 5, 6, 8, 5, 4, 7, 6
a) Make a frequency distribution taking class interval size 1.
b) Find the cumulative frequency distribution.
c) Find the relative frequency distribution also.
SOLUTION
a) FREQUENCY DISTRIBUTION
No. of children   Frequency
0 1
1 3
2 4
3 5
4 8
5 10
6 6
7 4
8 3
9 5
10 1
Total 50
b) CUMULATIVE FREQUENCY DISTRIBUTION
No. of children   Frequency   Cumulative frequency
0 1 1
1 3 4
2 4 8
3 5 13
4 8 21
5 10 31
6 6 37
7 4 41
8 3 44
9 5 49
10 1 50
Total 50
c) RELATIVE FREQUENCY DISTRIBUTION
No. of children   Frequency   Relative frequency
0 1 1/50
1 3 3/50
2 4 4/50
3 5 5/50
4 8 8/50
5 10 10/50
6 6 6/50
7 4 4/50
8 3 3/50
9 5 5/50
10 1 1/50
Total 50 1
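The three tables for Question 2 can be rebuilt mechanically from the raw data. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original solution.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

# Number of children born to each of the 50 women (data from Question 2).
children = [1, 5, 1, 1, 2, 5, 9, 2, 6, 3, 5, 7, 8, 4, 6, 8,
            9, 10, 9, 3, 5, 7, 9, 9, 4, 5, 4, 5, 5, 7, 3, 4, 2, 3,
            4, 6, 3, 4, 2, 5, 6, 4, 0, 5, 6, 8, 5, 4, 7, 6]

counts = Counter(children)
values = sorted(counts)                      # classes of width 1
freq = [counts[v] for v in values]           # a) frequency
cum_freq = list(accumulate(freq))            # b) cumulative frequency
rel_freq = [Fraction(f, len(children)) for f in freq]  # c) relative frequency

for v, f, cf, rf in zip(values, freq, cum_freq, rel_freq):
    print(f"{v:>2} {f:>3} {cf:>3} {float(rf):.2f}")
```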
Question 3
a) In which situations are the weighted mean and the arithmetic mean used?
b) From the following data, find the weighted mean.

Items               Food   House rent   Clothing   Fuel & electricity   Education   Miscellaneous
Expenditure (Rs.)   3000   600          200        150                  100         50
Weights             20     8            5          4                    2           1

SOLUTION
a) Sometimes we want to find the average of values that are not of equal importance. In that case we assign each value a numerical value, known as a “weight”, that expresses its relative importance, and we use the weighted mean. The ordinary arithmetic mean is used when all the values are equally important. The weighted mean and the arithmetic mean are equal when all the weights are equal.
b)
Items                Expenditure (Rs.) (X)   Weight (W)   WX
Food                 3,000                   20           60,000
House rent           600                     8            4,800
Clothing             200                     5            1,000
Fuel & electricity   150                     4            600
Education            100                     2            200
Miscellaneous        50                      1            50
Total                                        ∑W = 40      ∑WX = 66,650

Weighted mean: $\bar{X}_w = \frac{\sum WX}{\sum W} = \frac{66650}{40} = 1666.25$
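The same calculation can be sketched in a few lines of Python. The dictionary names are illustrative; the unweighted mean is added only for comparison and is not part of the original solution.

```python
# Expenditure (Rs.) and weight for each item, taken from the table above.
expenditure = {"Food": 3000, "House rent": 600, "Clothing": 200,
               "Fuel & electricity": 150, "Education": 100, "Miscellaneous": 50}
weights = {"Food": 20, "House rent": 8, "Clothing": 5,
           "Fuel & electricity": 4, "Education": 2, "Miscellaneous": 1}

total_wx = sum(expenditure[item] * weights[item] for item in expenditure)
total_w = sum(weights.values())
weighted_mean = total_wx / total_w

# Plain arithmetic mean of the same expenditures, for comparison only.
simple_mean = sum(expenditure.values()) / len(expenditure)

print(total_w, total_wx, weighted_mean)
```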
Assignment 2 Question 1
a. Explain why it is difficult to calculate the mode and the geometric mean (G.M.) for the following values: 4, 5, -10, 0, 20.
b. Calculate the harmonic mean for the following data.

x   1   2   3   4
f   2   3   4   1

Solution:
a. The mode is the value that occurs the greatest number of times in the data. In this case each value appears only once, so no mode exists. By definition, the geometric mean of a set of n positive values is the nth root of the product of the values. Here one value is negative and one is zero, so the definition does not apply: the product of the values is zero, which would make the G.M. zero whatever the other values are.
b.
x       f    1/x    f(1/x)
1       2    1      2
2       3    0.5    1.5
3       4    0.33   1.32
4       1    0.25   0.25
Total   10          5.07

Now, as we know:
$H.M. = \frac{\sum f}{\sum f(1/x)} = \frac{10}{5.07} = 1.972$
(Here 1/3 is rounded to 0.33; carrying more decimals gives $\sum f(1/x) = 5.083$ and H.M. = 1.967.)
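As a cross-check on the rounding, the harmonic mean can be computed with exact fractions. This is only a sketch; the names `x`, `f` and `hm` are illustrative.

```python
from fractions import Fraction

x = [1, 2, 3, 4]   # values
f = [2, 3, 4, 1]   # frequencies

# Harmonic mean of a frequency distribution: sum(f) / sum(f / x).
sum_f_over_x = sum(Fraction(fi, xi) for fi, xi in zip(f, x))
hm = sum(f) / sum_f_over_x

print(sum_f_over_x, hm, float(hm))
```

The exact value differs slightly from 1.972 because the table rounds 1/3 to 0.33.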
Question 2
a. Can all quartiles and deciles be expressed as percentiles? Explain.
b. The following data give the numbers of weeks needed to find a job for 25 older workers who lost their jobs as a result of corporate downsizing.
13 13 17 7 22 22 26 17 13 14
16 7 6 18 20 10 17 11 10 15
16 8 16 21 11
Find the range and the coefficient of dispersion.
Solution:
a. Yes. Each quartile marks off a further 25% of the ordered data and each decile a further 10%, so every quartile and decile coincides with a percentile: $Q_1 = P_{25}$, $Q_2 = P_{50}$, $Q_3 = P_{75}$, and $D_k = P_{10k}$ for k = 1, ..., 9.
b. Ordered array:
6 7 7 8 10 10 11 11 13 13
13 14 15 16 16 16 17 17 17 18
20 21 22 22 26
Range = $X_m - X_0$, where $X_m$ denotes the largest observation and $X_0$ the smallest observation
= 26 - 6 = 20
Coefficient of dispersion = $\frac{X_m - X_0}{X_m + X_0} = \frac{26 - 6}{26 + 6} = \frac{20}{32} = 0.625$
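A short Python sketch of part b, sorting the raw data rather than relying on a hand-ordered array. Variable names are illustrative.

```python
# Weeks needed to find a job for the 25 workers (data from Question 2b).
weeks = [13, 13, 17, 7, 22, 22, 26, 17, 13, 14,
         16, 7, 6, 18, 20, 10, 17, 11, 10, 15,
         16, 8, 16, 21, 11]

ordered = sorted(weeks)
x_min, x_max = ordered[0], ordered[-1]

data_range = x_max - x_min
coeff_dispersion = (x_max - x_min) / (x_max + x_min)

print(ordered)
print(data_range, coeff_dispersion)
```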
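Question 3
Find the median and mode from the following distribution.

Daily wages (Rs.)   25-30   30-35   35-40   40-45   45-50   50-55   55-60
f                   1       2       9       10      11      12      5

Solution:
Daily wages (Rs.)   Frequency   Cumulative frequency
25-30               1           1
30-35               2           3
35-40               9           12
40-45               10          22
45-50               11          33
50-55               12          45
55-60               5           50
Total               50

MEDIAN:
Here n/2 = 50/2 = 25, so the median lies in the class 45-50, the first class whose cumulative frequency reaches 25.
$Median = l + \frac{h}{f}\left(\frac{n}{2} - F\right) = 45 + \frac{5}{11}(25 - 22) = 45 + \frac{5}{11}(3) = 45 + 1.364 = 46.364$
MODE:
The modal class is 50-55, which has the highest frequency (12).
$Mode = l + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times h = 50 + \frac{12 - 11}{(12 - 11) + (12 - 5)} \times 5 = 50 + \frac{1(5)}{8} = 50 + 0.625 = 50.625$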
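The grouped-data formulas for the median and mode can be sketched as follows. This assumes equal class widths, as in the table; the variable names are illustrative.

```python
from itertools import accumulate

lower = [25, 30, 35, 40, 45, 50, 55]   # lower class boundaries
freq = [1, 2, 9, 10, 11, 12, 5]
h = 5                                   # common class width
n = sum(freq)
cum = list(accumulate(freq))

# Median class: first class whose cumulative frequency reaches n/2.
m = next(i for i, c in enumerate(cum) if c >= n / 2)
F = cum[m - 1] if m > 0 else 0          # cumulative frequency before the median class
median = lower[m] + h / freq[m] * (n / 2 - F)

# Modal class: class with the largest frequency.
k = freq.index(max(freq))
f_prev = freq[k - 1] if k > 0 else 0
f_next = freq[k + 1] if k + 1 < len(freq) else 0
mode = lower[k] + (freq[k] - f_prev) / ((freq[k] - f_prev) + (freq[k] - f_next)) * h

print(round(median, 3), mode)
```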
Assignment 3 Question 1
a) Why do we use the correlation analysis technique?
b) A computer, while computing the correlation coefficient between two variables x and y from 25 pairs of observations, obtained the following results:
$n = 25,\ \sum x = 125,\ \sum x^2 = 650,\ \sum y = 100,\ \sum y^2 = 460,\ \sum xy = 508$
It was, however, discovered at the time of checking that two pairs of observations had been copied down incorrectly:

Copied as (x, y)   Instead of (x, y)
(10, 9)            (15, 7)
(8, 6)             (11, 8)

Obtain the correct value of the correlation coefficient between x and y.
Solution:
a. Statistical methods are often used to analyse data involving only one variable. Often, however, an analysis of data concerning two or more variables is needed to look for any statistical relationship or association between them. Knowledge of such a relationship is important for making inferences from the covariation between variables in a given situation. This is done by applying the correlation analysis technique.
b. The corrected values of the terms needed in the formula for Pearson's coefficient are determined as follows:
Corrected $\sum x = 125 - (10 + 8) + (15 + 11) = 133$
Corrected $\sum y = 100 - (9 + 6) + (7 + 8) = 100$
Corrected $\sum x^2 = 650 - (10^2 + 8^2) + (15^2 + 11^2) = 832$
Corrected $\sum y^2 = 460 - (9^2 + 6^2) + (7^2 + 8^2) = 456$
Corrected $\sum xy = 508 - (10 \times 9 + 8 \times 6) + (15 \times 7 + 11 \times 8) = 563$
Applying the formula
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
$r = \frac{25(563) - (133)(100)}{\sqrt{[25(832) - (133)^2][25(456) - (100)^2]}}$
$r = \frac{775}{\sqrt{(3111)(1400)}} = \frac{775}{2086.96} = 0.371$
Thus the corrected value of the correlation coefficient between x and y is 0.371.
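The correction can be sketched in Python by removing the contribution of the wrongly copied pairs from each sum and adding the contribution of the correct ones. The helper `adjust` and the dictionary keys are hypothetical names introduced for this illustration.

```python
from math import sqrt

n = 25
# Sums as originally (incorrectly) computed.
sums = {"x": 125, "y": 100, "xx": 650, "yy": 460, "xy": 508}

wrong_pairs = [(10, 9), (8, 6)]    # pairs as copied
right_pairs = [(15, 7), (11, 8)]   # pairs as they should have been

def adjust(sums, pairs, sign):
    """Add (sign=+1) or remove (sign=-1) the contribution of each (x, y) pair."""
    for x, y in pairs:
        sums["x"] += sign * x
        sums["y"] += sign * y
        sums["xx"] += sign * x * x
        sums["yy"] += sign * y * y
        sums["xy"] += sign * x * y

adjust(sums, wrong_pairs, -1)
adjust(sums, right_pairs, +1)

numerator = n * sums["xy"] - sums["x"] * sums["y"]
denominator = sqrt((n * sums["xx"] - sums["x"] ** 2) * (n * sums["yy"] - sums["y"] ** 2))
r = numerator / denominator

print(sums, round(r, 3))
```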
Question 2
Differentiate between permutation and combination.
Solution:
A permutation is any ordered subset from a set of n distinct objects. The number of permutations of r objects, selected in a definite order from n distinct objects, is denoted by the symbol $^nP_r$:
$^nP_r = \frac{n!}{(n-r)!}$
A combination is any subset of r objects, selected without regard to their order, from a set of n distinct objects. It is denoted by $^nC_r$:
$^nC_r = \frac{n!}{(n-r)!\,r!}$
In a permutation order is important, while in a combination order is not important.
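A small illustration of the difference, using the standard-library functions `math.perm` and `math.comb` (Python 3.8 or later). The choice n = 5, r = 2 is an arbitrary example, not taken from the assignment.

```python
from math import comb, factorial, perm

n, r = 5, 2   # illustrative values

n_perm = perm(n, r)   # ordered selections: n! / (n - r)!
n_comb = comb(n, r)   # unordered selections: n! / ((n - r)! r!)

# Each combination of r objects can be ordered in r! ways.
print(n_perm, n_comb, n_perm == n_comb * factorial(r))
```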
Question 3
The data on the profit (in Rs lakh) earned by 60 companies are as follows:

Profits            Below 10   10-20   20-30   30-40   40-50   50 and over
No. of companies   5          12      20      16      5       2

a) Obtain the limits of profit of the central 50 percent of companies.
b) Calculate Bowley's coefficient of skewness.
Solution:
(a) Calculations for the central 50% limits are given below.

Profits (Rs in lakh)   Frequency (f)   Cumulative freq. (cf)
Below 10               5               5
10-20                  12              17
20-30                  20              37
30-40                  16              53
40-50                  5               58
50 and above           2               60

$Q_1$ = size of the (n/4)th observation = (60/4)th = 15th observation. It lies in the class 10-20.
$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - C\right) = 10 + \frac{10}{12}(15 - 5) = 10 + 8.33 = 18.33$ lakh
$Q_3$ = size of the (3n/4)th observation = 45th observation. It lies in the class 30-40.
$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - C\right) = 30 + \frac{10}{16}(45 - 37) = 30 + 5 = 35$ lakh
Hence the profit of the central 50 percent of companies lies between Rs 18.33 lakh and Rs 35 lakh.
(b) Median = size of the (n/2)th observation = 30th observation, which lies in the class 20-30.
$Median = l + \frac{h}{f}\left(\frac{n}{2} - C\right) = 20 + \frac{10}{20}(30 - 17) = 20 + 6.5 = 26.5$ lakh
Bowley's coefficient of skewness is:
$S_k = \frac{Q_1 + Q_3 - 2\,Median}{Q_3 - Q_1} = \frac{18.33 + 35 - 2(26.5)}{35 - 18.33} = \frac{0.33}{16.67} = 0.020$
The positive value of $S_k$ indicates that the distribution is positively skewed, i.e. its longer tail lies on the right side of the distribution, although the skewness here is very slight.
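The quartile calculations share one interpolation formula, which can be sketched as a single helper. The function name `grouped_quantile` is hypothetical, and the lower boundary 0 for the open class "Below 10" is an assumption that does not affect the results here.

```python
lower = [0, 10, 20, 30, 40, 50]   # lower boundaries; 0 for "Below 10" is an assumption
freq = [5, 12, 20, 16, 5, 2]
h = 10                             # class width
n = sum(freq)

def grouped_quantile(p):
    """Interpolated position-(p*n) observation: l + (h / f) * (p*n - C)."""
    target = p * n
    cum = 0                        # cumulative frequency before the current class
    for l, f in zip(lower, freq):
        if cum + f >= target:
            return l + h / f * (target - cum)
        cum += f
    raise ValueError("p must lie between 0 and 1")

q1 = grouped_quantile(0.25)
median = grouped_quantile(0.50)
q3 = grouped_quantile(0.75)
bowley = (q1 + q3 - 2 * median) / (q3 - q1)

print(round(q1, 2), median, q3, round(bowley, 3))
```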
Assignment 4 Question 1
a. In which situations do we use permutations and combinations?
b. An MBA applies for a job in two firms X and Y. The probability of his being selected in firm X is 0.7 and of being rejected at Y is 0.5. The probability of at least one of his applications being rejected is 0.6. What is the probability that he will be selected by one of the firms?
c. A husband and wife appear in an interview for two vacancies in the same post. The probability of the husband's selection is 1/7 and that of the wife's is 1/5. What is the probability that:
1. Both of them will be selected
2. Only one of them is selected
3. None of them is selected
Solution:
a. Permutations: When order matters and an object can be chosen more than once, the number of permutations is
$n^r$
where n is the number of objects from which you can choose and r is the number to be chosen. For example, if you have the letters A, B, C, and D and you wish to discover the number of ways to arrange them in three-letter patterns, order matters (e.g., A-B is different from B-A, and both are included as possibilities), so there are $4^3 = 64$ patterns.
Combinations: When the order does not matter and each object can be chosen only once, the number of combinations is the binomial coefficient
$\binom{n}{r} = \frac{n!}{r!(n-r)!}$
where n is the number of objects from which you can choose and r is the number to be chosen.
For example, if you have ten numbers and wish to choose 5, you would have 10!/(5!(10−5)!) = 252 ways to choose.
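b. Let A and B denote the events that the MBA is selected in firm X and in firm Y, respectively. Then
$P(A) = 0.7,\quad P(\bar{A}) = 1 - 0.7 = 0.3$
$P(\bar{B}) = 0.5,\quad P(B) = 1 - 0.5 = 0.5,\quad P(\bar{A} \cup \bar{B}) = 0.6$
The probability that he will be selected by at least one of the firms is given by
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ ------------ (1)
Since "at least one application rejected" is the complement of "selected by both firms",
$P(A \cap B) = 1 - P(\bar{A} \cup \bar{B}) = 1 - 0.6 = 0.4$
Putting values in equation (1) we get
$P(A \cup B) = 0.7 + 0.5 - 0.4 = 0.8$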
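c. Let A and B be the events of the husband's and the wife's selection, respectively, and assume the two selections are independent.
Given that P(A) = 1/7 and P(B) = 1/5:
1. The probability that both of them will be selected is:
$P(A \text{ and } B) = P(A)P(B) = \frac{1}{7} \times \frac{1}{5} = \frac{1}{35} = 0.029$
2. The probability that only one of them will be selected is:
$P(A \text{ and } \bar{B}) + P(\bar{A} \text{ and } B) = P(A)[1 - P(B)] + P(B)[1 - P(A)]$
$= \frac{1}{7}\left(1 - \frac{1}{5}\right) + \frac{1}{5}\left(1 - \frac{1}{7}\right) = \frac{1}{7} \times \frac{4}{5} + \frac{1}{5} \times \frac{6}{7} = \frac{10}{35} = 0.286$
3. The probability that none of them will be selected is:
$P(\bar{A})P(\bar{B}) = \frac{6}{7} \times \frac{4}{5} = \frac{24}{35} = 0.686$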
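Part c can be checked with exact fractions. This sketch assumes, as the multiplication of probabilities in the solution implies, that the two selections are independent; the variable names are illustrative.

```python
from fractions import Fraction

p_husband = Fraction(1, 7)
p_wife = Fraction(1, 5)

# Independence of the two selections is assumed throughout.
p_both = p_husband * p_wife
p_only_one = p_husband * (1 - p_wife) + p_wife * (1 - p_husband)
p_none = (1 - p_husband) * (1 - p_wife)

print(p_both, p_only_one, p_none)
print(p_both + p_only_one + p_none)   # the three cases are exhaustive
```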
Question 2
a. The personnel department of a company has records which show the following analysis of its 200 engineers.

Age        Bachelor's degree only   Master's degree   Total
Under 30   90                       10                100
30 to 40   20                       30                50
Over 40    40                       10                50
Total      150                      50                200

If one engineer is selected at random from the company, find
1. The probability that he has only a bachelor's degree.
2. The probability that he has a master's degree, given that he is over 40.
3. The probability that he is under 30, given that he has only a bachelor's degree.
b. Suppose an item is manufactured by three machines X, Y and Z. All three machines have equal capacity and are operated at the same rate. It is known that the percentages of defective items produced by X, Y and Z are 1, 7, and 12 percent respectively. All the items produced by X, Y and Z are put into one bin. From this bin, one item is drawn at random and is found to be defective. What is the probability that this item was produced on Y?
Solution:
a. Let A, B, C and D denote the events that an engineer is under 30 years of age, is over 40 years of age, has a bachelor's degree only and has a master's degree, respectively.
1. The probability that he has only a bachelor's degree is:
$P(C) = \frac{150}{200} = 0.75$
2. The probability that he has a master's degree, given that he is over 40, is:
$P(D \mid B) = \frac{P(D \cap B)}{P(B)} = \frac{10/200}{50/200} = \frac{10}{50} = 0.20$
3. The probability that he is under 30, given that he has only a bachelor's degree, is:
$P(A \mid C) = \frac{P(A \cap C)}{P(C)} = \frac{90/200}{150/200} = \frac{90}{150} = 0.60$
b. Let A be the event that the item drawn is defective. The prior probabilities that an item was produced on X, Y and Z are
P(X) = 1/3; P(Y) = 1/3 and P(Z) = 1/3 (each machine has a 1/3 chance of being the source).
We also know that $P(A \mid X) = 0.01,\ P(A \mid Y) = 0.07,\ P(A \mid Z) = 0.12$.
Now, knowing that the item drawn is defective, we want the probability that it was produced by Y. That is,
$P(Y \mid A) = \frac{P(Y)\,P(A \mid Y)}{P(X)\,P(A \mid X) + P(Y)\,P(A \mid Y) + P(Z)\,P(A \mid Z)}$
$= \frac{\frac{1}{3}(0.07)}{\frac{1}{3}(0.01) + \frac{1}{3}(0.07) + \frac{1}{3}(0.12)} = \frac{0.07}{0.20} = 0.35$
Hence the probability that the defective item was produced on Y is 0.35.
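The Bayes calculation can be sketched with exact fractions, using the defect rates of 1, 7 and 12 percent stated in the question. The dictionary names are illustrative.

```python
from fractions import Fraction

# Equal capacity and rate: each machine supplies a third of the items.
prior = {machine: Fraction(1, 3) for machine in "XYZ"}
# Defect rates from the question: 1, 7 and 12 percent.
defect_rate = {"X": Fraction(1, 100), "Y": Fraction(7, 100), "Z": Fraction(12, 100)}

# Total probability of drawing a defective item from the bin.
p_defective = sum(prior[m] * defect_rate[m] for m in prior)

# Posterior probability of each machine given a defective item (Bayes' theorem).
posterior = {m: prior[m] * defect_rate[m] / p_defective for m in prior}

print(posterior["Y"], float(posterior["Y"]))
```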
Assignment 5 Question 1
a) Illustrate the necessary conditions for probability distributions.
b) Given the discrete probability distribution
$P(X = x) = \binom{4}{x}(1/2)^x (1/2)^{4-x}$
find the complete probability distribution for x = 0, 1, 2, 3, 4.
Solution:
a)
There must be a finite probability (0 to 1) against every possible value of the random variable, i.e. $0 \le P(x_i) \le 1$.
The sum of all the probabilities must be equal to 1, i.e. $\sum P(x_i) = 1$.
b) Using the given probability distribution we can find the remaining values:

x   $P(x) = \binom{4}{x}(1/2)^x (1/2)^{4-x}$
0   $\binom{4}{0}(1/2)^0 (1/2)^4$ = 1/16
1   4/16
2   6/16
3   4/16
4   1/16
∑   1
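A short sketch that tabulates the distribution exactly and checks both conditions from part a. The name `pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)

# P(X = x) = C(4, x) (1/2)^x (1/2)^(4 - x) for x = 0, ..., 4.
pmf = {x: comb(4, x) * half**x * half**(4 - x) for x in range(5)}

for x, p in pmf.items():
    print(x, p)   # fractions print in lowest terms, e.g. 4/16 appears as 1/4

# Conditions from part a: each probability lies in [0, 1] and they sum to 1.
print(all(0 <= p <= 1 for p in pmf.values()), sum(pmf.values()))
```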
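Question 2
A continuous random variable X that can assume values between x = 2 and x = 4 has a density function given by
$f(x) = \frac{x + 1}{8}$
a) Show that P(2 < X < 4) = 1
b) Find P(X < 3.5)
Solution:
a)
$P(2 < X < 4) = \int_2^4 \frac{x + 1}{8}\,dx = \frac{1}{8}\int_2^4 (x + 1)\,dx = \frac{1}{8}\left[\frac{x^2}{2} + x\right]_2^4$
$= \frac{1}{8}\left(\frac{16}{2} + 4 - \frac{4}{2} - 2\right) = \frac{1}{8}(12 - 4) = 1$
b)
$P(X < 3.5) = \int_2^{3.5} \frac{x + 1}{8}\,dx = \frac{1}{8}\left[\frac{x^2}{2} + x\right]_2^{3.5}$
$= \frac{1}{8}\left(\frac{3.5^2}{2} + 3.5 - \frac{4}{2} - 2\right) = 0.70$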
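Both integrals can be checked exactly by evaluating the antiderivative at the limits. This is a sketch; the helper names `antiderivative` and `prob` are hypothetical.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative of f(x) = (x + 1) / 8, i.e. (x^2 / 2 + x) / 8."""
    return (x * x / 2 + x) / 8

def prob(a, b):
    """P(a < X < b) for 2 <= a <= b <= 4."""
    return antiderivative(Fraction(b)) - antiderivative(Fraction(a))

total = prob(2, 4)                      # part a
p_below = prob(2, Fraction(7, 2))       # part b, with 3.5 written as 7/2

print(total, p_below, float(p_below))
```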
Assignment 6 Question 1
Let $X_1, X_2, X_3$ be a random sample of size 3 from a population with mean $\mu$ and variance $\sigma^2$. Consider the following two estimators of the mean:
$T_1 = \frac{X_1 + X_2 + X_3}{3}, \qquad T_2 = \frac{X_1 + 2X_2 + X_3}{4}$
Which estimator should be preferred (i.e. which is more efficient)?
Solution:
First we examine the property of unbiasedness. $T_1$ is the sample mean $\bar{X}$, which we know is unbiased:
$E(T_1) = E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{3}(3\mu) = \mu$
$E(T_2) = E\left(\frac{X_1 + 2X_2 + X_3}{4}\right) = \frac{1}{4}(\mu + 2\mu + \mu) = \frac{1}{4}(4\mu) = \mu$
Both estimators are unbiased, so we compare their variances. Since the observations are independent,
$Var(T_1) = \frac{1}{9}[Var(X_1) + Var(X_2) + Var(X_3)] = \frac{1}{9}(\sigma^2 + \sigma^2 + \sigma^2) = \frac{3}{9}\sigma^2 = \frac{1}{3}\sigma^2$
$Var(T_2) = \frac{1}{16}[Var(X_1) + 4Var(X_2) + Var(X_3)] = \frac{1}{16}(\sigma^2 + 4\sigma^2 + \sigma^2) = \frac{6}{16}\sigma^2 = \frac{3}{8}\sigma^2$
Since $\frac{1}{3} < \frac{3}{8}$, we have $Var(T_1) < Var(T_2)$.
Hence $T_1$ is a better (more efficient) estimator than $T_2$.
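For a linear estimator $\sum w_i X_i$ of independent observations, the expectation is $\mu \sum w_i$ and the variance is $\sigma^2 \sum w_i^2$. The sketch below applies this to both estimators; the helper names are hypothetical.

```python
from fractions import Fraction

# Coefficients of X1, X2, X3 in each estimator.
weights_t1 = [Fraction(1, 3)] * 3
weights_t2 = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]

def mean_factor(weights):
    """E(T) = mean_factor * mu; the estimator is unbiased when this equals 1."""
    return sum(weights)

def variance_factor(weights):
    """Var(T) = variance_factor * sigma^2 for independent observations."""
    return sum(w * w for w in weights)

print(mean_factor(weights_t1), mean_factor(weights_t2))          # both 1: unbiased
print(variance_factor(weights_t1), variance_factor(weights_t2))  # 1/3 and 3/8
```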
Question 2
a) Why do we call the standard deviation of a sample statistic the standard error of the statistic?
b) Draw all possible samples of two letters each, without replacement, from the letters of the word “Management”. Find the proportion of the letter “M” in each sample. Also construct the sampling distribution of the proportion of “M”.
Solution:
a) The deviation of a sample statistic T from its parameter is considered an error. Hence the standard deviation of a sample statistic is called the standard error of the statistic.
b) Number of possible samples (without replacement) = $^NC_n = {}^{10}C_2 = 45$

No. Sample p̂ No. Sample p̂
1 M,A 1/2 23 N,N 0/2
2 M,N 1/2 24 N,T 0/2
3 M,A 1/2 25 A,G 0/2
4 M,G 1/2 26 A,E 0/2
5 M,E 1/2 27 A,M 1/2
6 M,M 2/2 28 A,E 0/2
7 M,E 1/2 29 A,N 0/2
8 M,N 1/2 30 A,T 0/2
9 M,T 1/2 31 G,E 0/2
10 A,N 0/2 32 G,M 1/2
11 A,A 0/2 33 G,E 0/2
12 A,G 0/2 34 G,N 0/2
13 A,E 0/2 35 G,T 0/2
14 A,M 1/2 36 E,M 1/2
15 A,E 0/2 37 E,E 0/2
16 A,N 0/2 38 E,N 0/2
17 A,T 0/2 39 E,T 0/2
18 N,A 0/2 40 M,E 1/2
19 N,G 0/2 41 M,N 1/2
20 N,E 0/2 42 M,T 1/2
21 N,M 1/2 43 E,N 0/2
22 N,E 0/2 44 E,T 0/2
45 N,T 0/2

The sampling distribution of p̂:
p̂      f    f(p̂)
0      28   28/45
1/2    16   16/45
2/2    1    1/45
Total  45   45/45 = 1
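The 45 samples and the resulting sampling distribution can be enumerated directly. This sketch samples by letter position so that repeated letters (the two M's, A's, N's and E's) count as distinct units, matching the table above; variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

letters = list("MANAGEMENT")

# All samples of size 2 without replacement, chosen by position.
samples = list(combinations(range(len(letters)), 2))

# Proportion of "M" in each sample, tallied into a sampling distribution.
proportions = [Fraction(sum(letters[i] == "M" for i in s), 2) for s in samples]
distribution = Counter(proportions)

for p_hat in sorted(distribution):
    print(p_hat, distribution[p_hat], Fraction(distribution[p_hat], len(samples)))
```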
Question 3
a) Ten vegetable cans, all of the same size, have lost their labels. It is known that 5 contain tomatoes and 5 contain corn. If 5 are selected at random, what is the probability that all contain tomatoes? What is the probability that 3 or more contain tomatoes?
b) For a machine making parts, there is a small probability of 0.001 for a part to be defective. The parts are supplied in bundles of 10. Calculate approximately the number of bundles containing no defective, one defective or two defectives in a consignment of 10,000 bundles, given that $e^{-0.01} = 0.9900$.
Solution:
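a) Let X denote the number of tomato cans in the sample.

Tomato cans   Total cans   Corn cans   Selected cans
K = 5         N = 10       N - K = 5   n = 5

The probability distribution of X is:
$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$
All contain tomatoes:
$P(X = 5) = \frac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}} = \frac{1 \times 1}{252} = \frac{1}{252}$
3 or more contain tomatoes:
$P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5)$
$= \frac{\binom{5}{3}\binom{5}{2}}{\binom{10}{5}} + \frac{\binom{5}{4}\binom{5}{1}}{\binom{10}{5}} + \frac{\binom{5}{5}\binom{5}{0}}{\binom{10}{5}} = \frac{100}{252} + \frac{25}{252} + \frac{1}{252} = \frac{126}{252} = 0.5$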
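The hypergeometric probabilities can be sketched with `math.comb`. The function name `hypergeom_pmf` and its default arguments are illustrative, set to the values of this problem.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, K=5, N=10, n=5):
    """P(X = x): x tomato cans among n drawn from N cans, K of which hold tomatoes."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

p_all_tomato = hypergeom_pmf(5)
p_three_or_more = sum(hypergeom_pmf(x) for x in range(3, 6))

print(p_all_tomato, p_three_or_more)
```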
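b) Let p = 0.001 and n = 10. Since p is extremely small, we apply the Poisson approximation, using
$\lambda = np = 10 \times 0.001 = 0.01$
So the expected number of bundles with x defectives among N = 10,000 bundles is
$N \cdot P(x; \lambda) = 10,000 \times \frac{(0.01)^x e^{-0.01}}{x!}$
For x = 0:
$N \cdot P(0; \lambda) = 10,000 \times \frac{(0.01)^0 e^{-0.01}}{0!} = 10,000 \times e^{-0.01} = 10,000 \times 0.9900 = 9900$
For x = 1:
$N \cdot P(1; \lambda) = 10,000 \times \frac{(0.01)^1 e^{-0.01}}{1!} = 10,000 \times (0.01) e^{-0.01} = 99.00$
For x = 2:
$N \cdot P(2; \lambda) = 10,000 \times \frac{(0.01)^2 e^{-0.01}}{2!} = 0.495$
For one or two defectives together:
$P(1 \le X \le 2) = P(X = 1) + P(X = 2) = 0.0099 + 0.0000495 = 0.0099495$
$N \cdot P(1 \le X \le 2) = 10000(0.0099495) = 99.495$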
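A sketch of the Poisson approximation in Python. It uses `math.exp` rather than the rounded value 0.9900 given in the question, so the results differ slightly in the later digits; variable names are illustrative.

```python
from math import exp, factorial

p, n = 0.001, 10          # defect probability per part, parts per bundle
bundles = 10_000
lam = n * p               # Poisson parameter, lambda = np = 0.01

# Expected number of bundles containing x defectives, x = 0, 1, 2.
expected = {x: bundles * lam**x * exp(-lam) / factorial(x) for x in range(3)}

for x, count in expected.items():
    print(x, round(count, 3))
```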
Assignment 7 Question 1
a) Define non-sampling error. How can it be minimized?
b) A continuous manufacturing process produces items whose weights are normally distributed with a mean weight of 800 grams and a standard deviation of 300 grams. A random sample of 16 items is to be drawn from the process.
i. What is the probability that the arithmetic mean of the sample exceeds 900 grams? Interpret the results.
ii. Find the values of the sample mean within which the middle 95 percent of all sample means will fall. (z = ±1.96)
Solution:
a) Non-sampling errors are common to both complete enumeration and sample surveys. They include biases and mistakes. The main sources of these errors are a poor definition of the population, defects in the method of interviewing, duplication and substitution, inaccurate responses by the respondents, faulty reporting of facts and non-response to mail questionnaires. These errors can be controlled by giving a precise definition of the population, constructing an accurate frame, improving the method of measurement, properly selecting the questionnaire, adequately training the investigators, cross-checking, following up non-response and correctly processing the collected information.
b (i) We are given the following information:
$\mu = 800$ g, $\sigma = 300$ g and n = 16
Since the population is normally distributed, the distribution of the sample mean is normal with mean and standard deviation equal to
$\mu_{\bar{x}} = \mu = 800$
and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{300}{\sqrt{16}} = \frac{300}{4} = 75$
The required probability is $P(\bar{x} > 900)$, i.e.
$P(\bar{x} > 900) = P\left(z > \frac{900 - 800}{75}\right) = P(z > 1.33) = 0.5 - 0.4082 = 0.0918$
Hence, 9.18 percent of all possible samples of size n = 16 will have a sample mean value greater than 900 g.
b (ii) Since z = ±1.96 bounds the middle 95 percent of the area under the normal curve, we solve the formula for z for the value of $\bar{x}$ in terms of the known values:
$\bar{x}_1 = \mu_{\bar{x}} - z\sigma_{\bar{x}} = 800 - 1.96(75) = 653$ g
and
$\bar{x}_2 = \mu_{\bar{x}} + z\sigma_{\bar{x}} = 800 + 1.96(75) = 947$ g
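Both parts can be sketched with `statistics.NormalDist` from the standard library. Because the code does not round z to 1.33 or use a printed table, the tail probability comes out near 0.091 rather than exactly 0.0918; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 300, 16

# Sampling distribution of the mean: normal with standard error sigma / sqrt(n).
std_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu, std_error)

# (i) Probability that the sample mean exceeds 900 g.
p_exceeds = 1 - sampling_dist.cdf(900)

# (ii) Limits containing the middle 95 percent of sample means.
lower = sampling_dist.inv_cdf(0.025)
upper = sampling_dist.inv_cdf(0.975)

print(std_error, round(p_exceeds, 4), round(lower), round(upper))
```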
Question 2
a) What sample size is necessary to estimate the population mean at 95 percent confidence with a sampling error of 5 when the standard deviation equals 20?
b) Suppose we want to estimate the proportion of families in a town which have two or more children. A random sample of 144 families has been chosen and 48 have two or more children. Set up a 95 percent confidence interval estimate of the population proportion of families having two or more children.
Solution:
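a) We have e = 5, σ = 20 and $z_{\alpha/2} = 1.96$ at the 95% confidence level. Thus
$n = \frac{z_{\alpha/2}^2\,\sigma^2}{e^2} = \frac{(1.96)^2 (20)^2}{5^2} = 61.47$
So, rounding up, the required sample size is 62.
b) The sample proportion is
$\hat{p} = \frac{x}{n} = \frac{48}{144} = \frac{1}{3}$
Using the information n = 144, $\hat{p} = \frac{1}{3}$ and $z_{\alpha/2} = 1.96$ at the 95 percent confidence coefficient, we have
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \frac{1}{3} \pm 1.96\sqrt{\frac{(1/3)(2/3)}{144}} = 0.333 \pm 0.077$
i.e. (0.256, 0.410).
Hence, the population proportion of families who have two or more children is likely to be between 0.256 and 0.410.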
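A minimal sketch of both calculations. The critical value 1.96 is hard-coded from the normal table, as in the solution; variable names are illustrative.

```python
from math import ceil, sqrt

z = 1.96   # z value for 95 percent confidence (from the normal table)

# a) Sample size for estimating a mean with sampling error e.
e, sigma = 5, 20
n_exact = (z * sigma / e) ** 2
n_required = ceil(n_exact)      # always round up to the next whole unit

# b) Confidence interval for a proportion.
x, n = 48, 144
p_hat = x / n
half_width = z * sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - half_width, p_hat + half_width)

print(round(n_exact, 2), n_required)
print(round(interval[0], 3), round(interval[1], 3))
```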
Assignment 8 Question 1
a) The masses, in grams, of thirteen ball bearings taken at random from a batch are
21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0, 22.5, 26.9, 26.4, 25.8, 23.2, 21.9
Calculate a 95% confidence interval for the mean mass of the population, supposed normal, from which these masses were drawn (s = 1.77).
Solution:
The 95% confidence interval for the population mean mass is given by:
$\bar{X} \pm t_{(\alpha/2,\,v)}\, s/\sqrt{n}$
The sum of the thirteen observations is
$\sum_{i=1}^{n} X_i = 314.7$
so
$\bar{X} = \frac{\sum X}{n} = \frac{314.7}{13} = 24.21$
With s = 1.77 and v = n - 1 = 13 - 1 = 12,
$t_{(\alpha/2,\,v)} = t_{(.025,\,12)} = 2.179$
Now,
$\bar{X} \pm t_{(\alpha/2,\,v)}\, s/\sqrt{n} = 24.21 \pm 2.179\,\frac{1.77}{\sqrt{13}} = 24.21 \pm 2.179(0.49) = 24.21 \pm 1.07$
i.e. $23.14 < \mu < 25.28$, or 23.14 to 25.28.
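The interval can be reproduced from the raw data with the standard library. The critical value 2.179 is taken from the t table, as in the solution, since the standard library has no t distribution; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

masses = [21.4, 23.1, 25.9, 24.7, 23.4, 24.5, 25.0,
          22.5, 26.9, 26.4, 25.8, 23.2, 21.9]

n = len(masses)
x_bar = mean(masses)
s = stdev(masses)      # sample standard deviation (divisor n - 1)
t_crit = 2.179         # t(0.025, 12) from the t table

half_width = t_crit * s / sqrt(n)
interval = (x_bar - half_width, x_bar + half_width)

print(round(x_bar, 2), round(s, 2), interval)
```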
b) Ten oil tins are taken at random from an automatic filling machine. The mean weight of the tins is 15.8 kg and the standard deviation is 0.50 kg. Does the sample mean differ significantly from the intended weight of 16 kg?
Solution:
$H_0: \mu = 16$
$H_1: \mu \ne 16$
Level of significance:
α = 5 % = 0.05
Critical Region (v = n - 1 = 9):
$t < -t_{(\alpha/2,\,v)} = -2.262$ or $t > t_{(\alpha/2,\,v)} = 2.262$
Test Statistic:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
Calculation:
$t = \frac{15.8 - 16}{0.5/\sqrt{10}} = \frac{-0.2}{0.158} = -1.26$
Conclusion:
Since t (cal) = -1.26 does not fall in the critical region, we do not reject $H_0$. Hence the sample mean does not differ significantly from the intended weight of 16 kg.
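The test statistic and decision can be sketched as follows. The critical value 2.262 is hard-coded from the t table for 9 degrees of freedom; variable names are illustrative.

```python
from math import sqrt

n, x_bar, s, mu_0 = 10, 15.8, 0.50, 16
t_crit = 2.262                      # t(0.025, 9) from the t table

t_stat = (x_bar - mu_0) / (s / sqrt(n))
reject_h0 = abs(t_stat) > t_crit    # two-sided test at the 5% level

print(round(t_stat, 2), reject_h0)
```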
Question 2
a) A random sample of 20 students obtained a mean of 72 and a variance of 16 on a college placement test in mathematics. Assuming the scores to be normally distributed, construct a 95% confidence interval for σ².
b) The nicotine content of a certain brand of cigarettes is known to be normally distributed with a variance of 1.3 milligrams. Test the hypothesis that σ² = 1.3 against the alternative that σ² ≠ 1.3, if a random sample of 8 of these cigarettes has a standard deviation s = 1.8. Use the α = 0.05 level of significance.
Solution:
a)
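Given that $\bar{x} = 72$, $s^2 = 16$, n = 20.
And we know that
$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,v}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,v}}$
$\frac{(20-1)16}{\chi^2_{0.025,\,19}} < \sigma^2 < \frac{(20-1)16}{\chi^2_{0.975,\,19}}$
$\frac{304}{\chi^2_{0.025,\,19}} < \sigma^2 < \frac{304}{\chi^2_{0.975,\,19}}$
Now by using the area table of χ², we get
$\chi^2_{0.025,\,19} = 32.852$ and $\chi^2_{0.975,\,19} = 8.907$
And by substituting the values, we get
$\frac{304}{32.852} < \sigma^2 < \frac{304}{8.907}$
$9.254 < \sigma^2 < 34.130$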
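b)
Hypothesis:
$H_0: \sigma^2 = 1.3$
$H_1: \sigma^2 \ne 1.3$
Level of significance:
α = 0.05
Test Statistic:
$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
Critical Region (v = n - 1 = 7):
$\chi^2 < \chi^2_{1-\alpha/2,\,v} = \chi^2_{0.975,\,7} = 1.690$ or $\chi^2 > \chi^2_{\alpha/2,\,v} = \chi^2_{0.025,\,7} = 16.013$
Computations:
$\chi^2 = \frac{7(1.8)^2}{1.3} = 17.45$
Conclusion:
Since our calculated value of χ² (17.45) is greater than the upper table value (16.013), we reject the null hypothesis at the 5% level of significance.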
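Both parts of Question 2 can be sketched in Python. The chi-square critical values are hard-coded from the table, as in the solution, because the standard library has no chi-square distribution; variable names are illustrative.

```python
# a) 95% confidence interval for the variance (n = 20, s^2 = 16).
n_a, s2_a = 20, 16
chi2_0025_19, chi2_0975_19 = 32.852, 8.907     # table values, 19 d.f.
numerator = (n_a - 1) * s2_a
var_interval = (numerator / chi2_0025_19, numerator / chi2_0975_19)

# b) Two-sided chi-square test of H0: sigma^2 = 1.3 (n = 8, s = 1.8).
n_b, s_b, sigma0_sq = 8, 1.8, 1.3
chi2_0975_7, chi2_0025_7 = 1.690, 16.013       # table values, 7 d.f.
chi2_stat = (n_b - 1) * s_b**2 / sigma0_sq
reject_h0 = chi2_stat < chi2_0975_7 or chi2_stat > chi2_0025_7

print(var_interval)
print(round(chi2_stat, 2), reject_h0)
```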