muhammad-umair
Statistics and Probability Solved Assignments - Semester Spring 2008
Assignment 1
Question 1
(a) Define the following terms: population, sample, parameter, statistic and variable.
Solution:
Population: The collection of all possible observations regarding the problem under consideration.
Sample: A representative part of the population.
Parameter: Any numerical value computed from the population.
Statistic: Any numerical value computed from a sample.
Variable: A characteristic that varies from individual to individual or from object to object.
(b) Count the number of letters in each word of the following passage, and make a frequency distribution of word length.
“The Virtual University of Pakistan delivers education through a judicious combination
of broadcast television and the Internet. VU courses are written in meticulous detail by acknowledged experts in the field. Lectures are then recorded in a professional studio environment and after insertion of slides, movie clips and other material, become ready
for broadcast. Course lectures are broadcast over television and are also made available in the form of multimedia CDs. The multiple formats allows for a high degree of flexibility
for students who may view the lectures at a time of their choosing within a 24 hour period. Additionally, students can use the lectures to review an entire course before their examinations; a facility simply not available in the conventional face to face
environment.”
Solution:
Word length   Tally   Frequency
1 |||| | 6
2 |||| |||| |||| ||| 18
3 |||| |||| |||| |||| |||| 25
4 |||| |||| | 11
5 |||| ||| 8
6 |||| |||| |||| 14
7 |||| | 6
8 |||| |||| |||| 14
9 |||| ||| 8
10 |||| 5
11 |||| 4
12 |||| 5
Total 124
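The counting step can also be sketched in Python. This is only an illustrative sketch: it assumes a "word" is a maximal run of letters or digits (so punctuation is ignored and "24" counts as a two-character word), and it is run on one sentence of the passage rather than the whole text. The function name `word_length_distribution` is hypothetical.

```python
import re
from collections import Counter


def word_length_distribution(text):
    """Return {word length: frequency} for the words in text.

    Assumption: a word is a maximal run of letters or digits,
    so punctuation marks are not counted as letters.
    """
    words = re.findall(r"[A-Za-z0-9]+", text)
    counts = Counter(len(word) for word in words)
    return dict(sorted(counts.items()))


sample = "VU courses are written in meticulous detail."
distribution = word_length_distribution(sample)

# Print a small tally table: length, tally marks, frequency.
for length, freq in distribution.items():
    print(length, "|" * freq, freq)
```

Applying the same function to the full passage gives a table of the same shape as the one above; the exact counts depend on how hyphens, digits and punctuation are treated.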
Question 2
Find the mean, median and mode from the following data
Class Interval Frequency
20-29 6
30-39 15
40-49 21
50-59 29
60-69 25
70-79 22
80-89 11
90-99 9
100-109 3
110-119 1
120-129 2
Solution:
The given data and the required calculations are shown in the following table.
Class Interval   Class Boundaries   Frequency (f)   Mid-Point (x)   fx       Cumulative Freq. (cf)
20-29            19.5-29.5          6               24.5            147      6
30-39            29.5-39.5          15              34.5            517.5    21
40-49            39.5-49.5          21              44.5            934.5    42
50-59            49.5-59.5          29              54.5            1580.5   71     (Modal Class)
60-69            59.5-69.5          25              64.5            1612.5   96
70-79            69.5-79.5          22              74.5            1639     118
80-89            79.5-89.5          11              84.5            929.5    129
90-99            89.5-99.5          9               94.5            850.5    138
100-109          99.5-109.5         3               104.5           313.5    141
110-119          109.5-119.5        1               114.5           114.5    142
120-129          119.5-129.5        2               124.5           249      144
Total                               144                             8888
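Mean:
$$\bar{x} = \frac{\sum fx}{\sum f}$$
Here $\sum fx = 8888$ and $\sum f = 144$, so
$$\bar{x} = \frac{8888}{144} = 61.722$$
Median:
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
Here $n = 144$, $c = 71$, $f = 25$, $h = 10$, $l = 59.5$ (the median class is 60-69, the first class whose cumulative frequency reaches $n/2 = 72$), so
$$\text{Median} = 59.5 + \frac{10}{25}\left(\frac{144}{2} - 71\right) = 59.5 + 0.4 = 59.9$$
Mode:
$$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$$
Here $l = 49.5$, $f_m = 29$, $f_1 = 21$, $f_2 = 25$, $h = 10$ (the modal class is 50-59), so
$$\text{Mode} = 49.5 + \frac{29 - 21}{(29 - 21) + (29 - 25)} \times 10 = 49.5 + \frac{8}{12} \times 10 = 49.5 + 6.67 = 56.17$$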
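The three grouped-data formulas can be checked with a short Python sketch. It assumes equal class widths and a modal class that is neither the first nor the last class; the variable names are illustrative only.

```python
# Lower class boundaries, common class width and frequencies from the table.
h = 10
lower = [19.5 + h * i for i in range(11)]
freq = [6, 15, 21, 29, 25, 22, 11, 9, 3, 1, 2]
n = sum(freq)

# Mean: sum of f * mid-point divided by n.
mid = [lo + h / 2 for lo in lower]
mean = sum(f * x for f, x in zip(freq, mid)) / n

# Median: interpolate inside the first class whose cumulative
# frequency reaches n / 2.
cumulative = 0
for lo, f in zip(lower, freq):
    if cumulative + f >= n / 2:
        median = lo + h / f * (n / 2 - cumulative)
        break
    cumulative += f

# Mode: interpolate inside the class with the largest frequency
# (assumed here to have a neighbour on each side).
m = freq.index(max(freq))
f_m, f_1, f_2 = freq[m], freq[m - 1], freq[m + 1]
mode = lower[m] + (f_m - f_1) / ((f_m - f_1) + (f_m - f_2)) * h

print(round(mean, 3), round(median, 2), round(mode, 2))
# prints: 61.722 59.9 56.17
```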
Assignment 2
Question 1
(a) What is the difference between absolute measures of dispersion and relative measures of dispersion?
(b) The weekly sales of two products A and B were recorded as given below:
Product A 59 75 27 63 27 28 56
Product B 150 200 125 310 330 250 225
Find out which of the two shows greater fluctuation in sales.
Solution (a):
Absolute measures of dispersion describe the amount of variation among the values in a data set by a single number. They are expressed in the same unit of measurement as the data, such as rupees, inches or feet.
Relative measures of dispersion are the ratio of an absolute measure of dispersion to an average, so they are independent of the unit of measurement. Such a ratio is also called a coefficient of dispersion; the coefficient of variation is the most common example.
Solution (b):
For this we find the coefficient of variation (C.V.) of both products. The required calculations are shown below.
          Product A           Product B
          X      X^2          X       X^2
          59     3481         150     22500
          75     5625         200     40000
          27     729          125     15625
          63     3969         310     96100
          27     729          330     108900
          28     784          250     62500
          56     3136         225     50625
Total     335    18453        1590    396250
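For Product A
$$\bar{X} = \frac{\sum X}{n} = \frac{335}{7} = 47.86$$
and
$$S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} = \sqrt{\frac{18453}{7} - \left(\frac{335}{7}\right)^2} = \sqrt{2636.14 - 2290.31} = 18.60$$
Now the coefficient of variation is
$$C.V. = \frac{S}{\bar{X}} \times 100 = \frac{18.60}{47.86} \times 100 = 38.86\%$$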
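For Product B
$$\bar{X} = \frac{\sum X}{n} = \frac{1590}{7} = 227.14$$
and
$$S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2} = \sqrt{\frac{396250}{7} - \left(\frac{1590}{7}\right)^2} = \sqrt{56607.14 - 51593.88} = 70.80$$
Now the coefficient of variation is
$$C.V. = \frac{S}{\bar{X}} \times 100 = \frac{70.80}{227.14} \times 100 = 31.17\%$$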
Conclusion/ Interpretation:
Comparing the C.V.s of products A and B, we see that the C.V. of product A is greater than that of product B. This shows that Product A has the greater fluctuation in sales.
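As a quick check, the two coefficients of variation can be recomputed in Python. This sketch uses the same population-style formula as the solution (dividing by n); the function name is illustrative. The product with the larger coefficient is the one with the greater relative fluctuation.

```python
from math import sqrt


def coefficient_of_variation(values):
    """C.V. in percent, using S = sqrt(sum(x^2)/n - mean^2)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum(x * x for x in values) / n - mean ** 2
    return sqrt(variance) / mean * 100


sales_a = [59, 75, 27, 63, 27, 28, 56]
sales_b = [150, 200, 125, 310, 330, 250, 225]

cv_a = coefficient_of_variation(sales_a)
cv_b = coefficient_of_variation(sales_b)

print(round(cv_a, 2), round(cv_b, 2))
print("More variable product:", "A" if cv_a > cv_b else "B")
```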
Question 2
(a) What is the empirical rule?
(b) Evaluate an appropriate measure of variation for the following data. Also find the coefficient of that measure.
Farm size (acre) No. of farms
Below 40 394
41-80 461
81-120 391
121-160 334
161-200 169
201-240 113
241 and above 148
Solution (a):
Empirical Rule:
For a data set with a symmetrical, bell-shaped distribution (normal curve), the percentage of values that fall within a specified number of standard deviations of the mean is approximately as follows:
$\bar{X} \pm S$ covers approximately 68% of the values in the data set
$\bar{X} \pm 2S$ covers approximately 95% of the values in the data set
$\bar{X} \pm 3S$ covers approximately 100% (99.73%) of the values in the data set
Solution (b):
Since the frequency distribution has open-end class intervals at both extremes, the quartile deviation (Q.D.) is an appropriate measure of variation. Its computation is shown below.
Farm size (acre)   Class Boundaries   No. of farms (f)   Cumulative frequency (cf)
Below 40           Below 40.5         394                394
41-80              40.5-80.5          461                855     (Q1 class)
81-120             80.5-120.5         391                1246
121-160            120.5-160.5        334                1580    (Q3 class)
161-200            160.5-200.5        169                1749
201-240            200.5-240.5        113                1862
241 and above      240.5 and above    148                2010
Total                                 2010
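First we find the first quartile:
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right)$$
Here the $\frac{n}{4}$th value is $\frac{2010}{4} = 502.5$, which lies in the class 41-80, so $f = 461$, $c = 394$, $h = 40$, $l = 40.5$.
$$Q_1 = 40.5 + \frac{40}{461}(502.5 - 394) = 40.5 + 9.41 = 49.91$$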
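And the third quartile:
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right)$$
Here the $\frac{3n}{4}$th value is $\frac{3(2010)}{4} = 1507.5$, which lies in the class 121-160, so $f = 334$, $c = 1246$, $h = 40$, $l = 120.5$.
$$Q_3 = 120.5 + \frac{40}{334}(1507.5 - 1246) = 120.5 + 31.31 = 151.81$$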
Thus the quartile deviation is
$$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{151.81 - 49.91}{2} = 50.95$$
And the coefficient of Q.D. is
$$\text{Coefficient of } Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{151.81 - 49.91}{151.81 + 49.91} = 0.505$$
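The quartile interpolation can be sketched in Python as follows. The helper `grouped_quantile` is a hypothetical name; it marks the open-ended classes with `None` and refuses to interpolate inside them, which is why the quartile deviation still works for this table. Because no intermediate rounding is done, the last digit can differ slightly from the hand calculation.

```python
def grouped_quantile(classes, position):
    """Interpolate the value at a cumulative position in grouped data.

    classes: list of (lower boundary, width, frequency); lower or width
    is None for an open-ended class.
    """
    cumulative = 0
    for lower, width, freq in classes:
        if cumulative + freq >= position:
            if lower is None or width is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + width / freq * (position - cumulative)
        cumulative += freq
    raise ValueError("position exceeds the total frequency")


farm_classes = [
    (None, None, 394),   # below 40.5 (open-ended)
    (40.5, 40, 461),
    (80.5, 40, 391),
    (120.5, 40, 334),
    (160.5, 40, 169),
    (200.5, 40, 113),
    (240.5, None, 148),  # 240.5 and above (open-ended)
]
n = sum(freq for _, _, freq in farm_classes)

q1 = grouped_quantile(farm_classes, n / 4)
q3 = grouped_quantile(farm_classes, 3 * n / 4)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(round(q1, 2), round(q3, 2))
print(round(quartile_deviation, 2), round(coefficient_qd, 3))
```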
Assignment 3
Question 1
(a) Define a set and its properties. Also explain the Venn diagram.
(b) The first four moments of a distribution about the origin are 1, 4, 10 and 46 respectively. Obtain the first four moments about the mean. Also calculate the moment ratios.
Solution:
a)
Set: A set is any well-defined collection or list of distinct objects, e.g. a group of students, the books in a library, the integers between 1 and 100, all human beings on the earth, etc.
Properties of set:
The following are the main properties of a set:
i) Union ii) Intersection iii) Difference
Venn Diagram:
It is a diagram used to represent sets in such a way that the universal set or sample space is represented by a rectangle, while its subsets are represented by circles, e.g.
[Figure: Venn diagram showing the sets A and B as circles inside the rectangle S]
b)
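In the usual notation, we have
$$\mu'_1 = 1, \quad \mu'_2 = 4, \quad \mu'_3 = 10, \quad \mu'_4 = 46$$
The moments about the mean are
$$\mu_1 = 0$$
$$\bar{x} = \mu'_1 = 1 \quad \text{(first moment about the origin)}$$
$$\mu_2 = \mu'_2 - (\mu'_1)^2 = 4 - (1)^2 = 3 = \text{Variance}$$
$$S.D. = \sqrt{\mu_2} = \sqrt{3} = 1.732$$
$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3 = 10 - 3(4)(1) + 2(1)^3 = 0$$
$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4 = 46 - 4(10)(1) + 6(4)(1)^2 - 3(1)^4 = 27$$
As we know, the moment ratios are
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0)^2}{(3)^3} = 0$$
and
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{27}{(3)^2} = \frac{27}{9} = 3$$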
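The conversion from moments about the origin to moments about the mean is easy to verify numerically. The following Python sketch applies the standard conversion formulas; the function name `central_moments` is illustrative.

```python
def central_moments(m1, m2, m3, m4):
    """Convert the first four raw moments into moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


mu2, mu3, mu4 = central_moments(1, 4, 10, 46)

# Moment ratios: beta1 measures skewness, beta2 measures kurtosis.
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(mu2, mu3, mu4)   # prints: 3 0 27
print(beta1, beta2)    # prints: 0.0 3.0
```

With beta1 = 0 and beta2 = 3, the distribution is symmetric and mesokurtic.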
Question 2
(a) In simple linear regression analysis, interpret “a” and “b”.
(b) A company is introducing a job evaluation scheme in which all jobs are graded by points for skill, responsibility, and so on. Monthly pay scales (Rs. in 1000’s) are then drawn up according to the number of points allocated and other factors such as experience and local conditions. To date the company has applied this scheme to 9 jobs:
Job:      A     B     C      D     E     F     G     H     I
Points:   5     25    7      19    10    12    15    28    16
Pay:      3.0   5.0   3.25   6.5   5.5   5.6   6.0   7.2   6.1
(i) Find the least squares line for linking pay scales to points.
(ii) Estimate the monthly pay for a job graded by 20 points.
(iii) Calculate the standard error of estimate.
Solution:
a)
If $y = a + bx$, then
a = the y-intercept, which represents the average value of the dependent variable y when x = 0
b = the slope of the regression line, which represents the expected change in the value of y (either positive or negative) for a unit change in the value of x.
b)
The required calculations are as follows:
        x      y       x^2     y^2        xy
        5      3       25      9          15
        25     5       625     25         125
        7      3.25    49      10.5625    22.75
        19     6.5     361     42.25      123.5
        10     5.5     100     30.25      55
        12     5.6     144     31.36      67.2
        15     6       225     36         90
        28     7.2     784     51.84      201.6
        16     6.1     256     37.21      97.6
Total   137    48.15   2569    273.4725   797.65
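(i)
$$\bar{x} = \frac{\sum x}{n} = \frac{137}{9} = 15.22, \qquad \bar{y} = \frac{\sum y}{n} = \frac{48.15}{9} = 5.35$$
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{9(797.65) - (137)(48.15)}{9(2569) - (137)^2} = 0.133$$
$$a = \bar{y} - b\bar{x} = 5.35 - 0.133(15.22) = 3.33$$
So the required regression line is $y = 3.33 + 0.133x$
(ii) For job grade point x = 20, the estimated average pay scale is given by
$$y = 3.33 + 0.133x = 3.33 + 0.133(20) = 5.99$$
(iii) The standard error of estimate is
$$S_{y.x} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$
$$S_{y.x} = \sqrt{\frac{273.47 - 3.33(48.15) - 0.133(797.65)}{9 - 2}}$$
$$S_{y.x} = \sqrt{\frac{273.47 - 266.43}{7}} = \sqrt{\frac{7.04}{7}} \approx 1.00$$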
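The least squares line, the estimate at 20 points and the standard error of estimate can be recomputed with a short Python sketch. It uses the same formulas as the solution but keeps full precision for a and b, so the results differ slightly from the hand calculation, which rounds b to three decimals first.

```python
from math import sqrt

points = [5, 25, 7, 19, 10, 12, 15, 28, 16]
pay = [3.0, 5.0, 3.25, 6.5, 5.5, 5.6, 6.0, 7.2, 6.1]
n = len(points)

# Column totals, as in the calculation table.
sum_x = sum(points)
sum_y = sum(pay)
sum_xx = sum(x * x for x in points)
sum_yy = sum(y * y for y in pay)
sum_xy = sum(x * y for x, y in zip(points, pay))

# Slope and intercept of the least squares line y = a + b x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Estimated pay for a job graded at 20 points.
estimate_20 = a + b * 20

# Standard error of estimate with n - 2 degrees of freedom.
s_yx = sqrt((sum_yy - a * sum_y - b * sum_xy) / (n - 2))

print(round(a, 3), round(b, 4))
print(round(estimate_20, 2), round(s_yx, 3))
```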
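[Venn diagram for Assignment 4, Question 1(b): events A and B, with 8 outcomes in A only, 6 outcomes in both A and B, and 13 outcomes in B only]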
Assignment 4
Question 1
a. Is the sample space changed/reduced in conditional probability? If yes, why?
b. The Venn diagram indicates the number of outcomes of an experiment corresponding to each event.
Total outcomes = 50
Find (i) $P(A)$ (ii) $P(B)$ (iii) $P(A \cup B)$
c. Two events, A and B, are statistically dependent. If $P(A) = 0.39$, $P(B) = 0.21$, and $P(A \text{ or } B) = 0.47$, find the probability that
(i) Neither A nor B will occur.
(ii) Both A and B will occur.
Solution:
a. Yes, the sample space is changed in conditional probability because some additional information regarding the outcomes of the experiment is given. The effect of such information is to reduce the sample space by excluding some outcomes as impossible which, before receiving the information, were believed possible.
b.
(i)
$$P(A) = \frac{n(A)}{n(S)} = \frac{14}{50} = 0.28$$
(ii)
$$P(B) = \frac{n(B)}{n(S)} = \frac{19}{50} = 0.38$$
(iii)
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{14}{50} + \frac{19}{50} - \frac{6}{50} = \frac{27}{50} = 0.54$$
c.
(i) P(Neither A nor B will occur) $= P(\bar{A} \cap \bar{B})$
$$P(\bar{A} \cap \bar{B}) = 1 - P(A \cup B) = 1 - 0.47 = 0.53$$
(ii) P(Both A and B will occur) $= P(A \cap B)$
$$P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.39 + 0.21 - 0.47 = 0.13$$
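Both parts rest on the addition rule and the complement rule, which can be checked with exact fractions in Python. This is a small sketch; the counts n(A) = 14, n(B) = 19 and n(A and B) = 6 are the ones used in the solution to part b, and the variable names are illustrative.

```python
from fractions import Fraction

# Part b: counts taken from the solution (out of 50 outcomes).
n_total = 50
p_a = Fraction(14, n_total)
p_b = Fraction(19, n_total)
p_a_and_b = Fraction(6, n_total)
p_a_or_b = p_a + p_b - p_a_and_b          # addition rule

# Part c: probabilities given in the question.
pc_a = Fraction(39, 100)
pc_b = Fraction(21, 100)
pc_a_or_b = Fraction(47, 100)
p_neither = 1 - pc_a_or_b                 # complement of the union
p_both = pc_a + pc_b - pc_a_or_b          # addition rule rearranged

print(float(p_a), float(p_b), float(p_a_or_b))
print(float(p_neither), float(p_both))
```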
Question 2
(a) Two cards are selected at random from a pack of 52 cards. What is the probability that the second is a king, if it is known that the first card is (i) a red card (ii) a diamond (iii) a spade or diamond (iv) a picture card?
(b) A company is considering upgrading its computer system, and a major portion of the upgrade is a new operating system. Suppose the probability of a favorable evaluation is 0.65. If the probability that the company will upgrade its system given a favorable evaluation is 0.85, what is the probability that the company will upgrade and receive a favorable evaluation?
Solution:
a.
Let K = king card, R = red card, D = diamond card, SD = spade or diamond card, P = picture card
(i)
$$P(K/R) = \frac{P(R \cap K)}{P(R)} = \frac{2/52}{26/52} = \frac{2}{26} = \frac{1}{13} = 0.077$$
(ii)
$$P(K/D) = \frac{P(D \cap K)}{P(D)} = \frac{1/52}{13/52} = \frac{1}{13} = 0.077$$
(iii)
$$P(K/SD) = \frac{P(SD \cap K)}{P(SD)} = \frac{2/52}{26/52} = \frac{1}{13} = 0.077$$
(iv)
$$P(K/P) = \frac{P(P \cap K)}{P(P)} = \frac{4/52}{12/52} = \frac{4}{12} = 0.333$$
b.
Let
U = the company upgrades the system, F = favorable evaluation
Then we are given
$P(F) = 0.65$ and $P(U/F) = 0.85$
We are to find $P(U \cap F)$.
By the formula of conditional probability,
$$P(U/F) = \frac{P(U \cap F)}{P(F)} \quad \Rightarrow \quad P(U \cap F) = P(U/F) \cdot P(F)$$
Putting in the values,
$$P(U \cap F) = (0.85)(0.65) = 0.5525$$
Assignment 5
Question 1
a. Find the first four moments about the origin, $\mu'_1$, $\mu'_2$, $\mu'_3$, $\mu'_4$:
X      8     12    16    20    24
P(X)   1/8   1/6   3/8   1/4   1/12
b. Find the distribution function of the given p.d.f.
$$f(x) = \frac{1}{2} - \frac{1}{8}x, \quad 0 < x < 4$$
a. Solution:
X            8        12        16         20         24          Total
P(X)         1/8      1/6       3/8        1/4        1/12        1
X P(X)       8/8      12/6      48/8       20/4       24/12       16
X^2 P(X)     64/8     144/6     768/8      400/4      576/12      276
X^3 P(X)     512/8    1728/6    12288/8    8000/4     13824/12    5040
X^4 P(X)     4096/8   20736/6   196608/8   160000/4   331776/12   96192
Now,
$\mu'_1 = E(X) = \sum x \, p(x) = 16$
$\mu'_2 = E(X^2) = \sum x^2 \, p(x) = 276$
$\mu'_3 = E(X^3) = \sum x^3 \, p(x) = 5040$
$\mu'_4 = E(X^4) = \sum x^4 \, p(x) = 96192$
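The four moments about the origin can be reproduced exactly with Python's `fractions` module. The sketch below is illustrative; `raw_moment` is a hypothetical helper name.

```python
from fractions import Fraction

xs = [8, 12, 16, 20, 24]
ps = [Fraction(1, 8), Fraction(1, 6), Fraction(3, 8),
      Fraction(1, 4), Fraction(1, 12)]

# A probability distribution must sum to 1.
total_probability = sum(ps)


def raw_moment(r):
    """r-th moment about the origin: E(X^r) = sum of x^r * p(x)."""
    return sum(x ** r * p for x, p in zip(xs, ps))


moments = [raw_moment(r) for r in range(1, 5)]
print([int(m) for m in moments])   # prints: [16, 276, 5040, 96192]
```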
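b. Solution
The given p.d.f. is
$$f(x) = \frac{1}{2} - \frac{1}{8}x, \quad 0 < x < 4$$
The distribution function is obtained as follows:
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{0}^{x} \left(\frac{1}{2} - \frac{1}{8}t\right)dt$$
$$= \left[\frac{1}{2}t - \frac{1}{8} \cdot \frac{t^2}{2}\right]_{0}^{x}$$
$$= \frac{x}{2}\left(1 - \frac{x}{8}\right), \quad 0 < x < 4$$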
Question 2
a. A continuous random variable X that can assume values only between X = 2 and X = 8 inclusive has a density function given by
$$f(x) = \frac{1}{48}(3 + x)$$
Show that it is a complete p.d.f. Also find E(X).
b. Find the expected number of boys on a committee of 3 selected at random from 4 boys and 3 girls.
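a. Solution
Given that
$$f(x) = \frac{1}{48}(3 + x), \quad 2 < x < 8$$
To be a complete p.d.f., it should satisfy the following property:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
$$\int_{2}^{8} \frac{1}{48}(3 + x)\,dx = \frac{3}{48}\int_{2}^{8} dx + \frac{1}{48}\int_{2}^{8} x\,dx = \frac{3}{48}\Big[x\Big]_{2}^{8} + \frac{1}{48}\left[\frac{x^2}{2}\right]_{2}^{8}$$
$$= \frac{3}{48}(8 - 2) + \frac{1}{48}(32 - 2) = \frac{3}{48}(6) + \frac{1}{48}(30) = 1 \quad \text{(proved)}$$
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{2}^{8} \frac{x}{48}(3 + x)\,dx = \frac{3}{48}\int_{2}^{8} x\,dx + \frac{1}{48}\int_{2}^{8} x^2\,dx$$
$$= \frac{3}{48}\left[\frac{x^2}{2}\right]_{2}^{8} + \frac{1}{48}\left[\frac{x^3}{3}\right]_{2}^{8} = \frac{3}{48}(32 - 2) + \frac{1}{48}\left(\frac{512}{3} - \frac{8}{3}\right)$$
$$= \frac{3}{48}(30) + \frac{1}{48}\left(\frac{504}{3}\right) = 5.375$$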
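b. Solution
Let X represent the number of boys on the committee. Then
$$f(x) = \frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}}, \quad x = 0, 1, 2, 3$$
Now
$$f(0) = \frac{\binom{4}{0}\binom{3}{3}}{\binom{7}{3}} = \frac{1}{35}$$
$$f(1) = \frac{\binom{4}{1}\binom{3}{2}}{\binom{7}{3}} = \frac{12}{35}$$
$$f(2) = \frac{\binom{4}{2}\binom{3}{1}}{\binom{7}{3}} = \frac{18}{35}$$
$$f(3) = \frac{\binom{4}{3}\binom{3}{0}}{\binom{7}{3}} = \frac{4}{35}$$
and
$$E(X) = (0)(1/35) + (1)(12/35) + (2)(18/35) + (3)(4/35) = 1.71$$
OR,
Since X follows the hypergeometric distribution with $n = 3$, $K = 4$, $N = 7$, the p.m.f. of X is
$$P(X = x) = \frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}}, \quad x = 0, 1, 2, 3$$
$$E(X) = \frac{nK}{N} = \frac{3 \times 4}{7} = 1.7143$$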
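Both routes to E(X) can be confirmed in Python with `math.comb` and exact fractions. This is a minimal sketch; `hypergeom_pmf` is an illustrative helper, not a library function.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, n, K, N):
    """P(X = x) when n items are drawn from N, of which K are 'successes'."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


n, K, N = 3, 4, 7   # committee size, boys, total children
pmf = {x: hypergeom_pmf(x, n, K, N) for x in range(n + 1)}

# Expected value from the definition and from the shortcut nK/N.
expected_from_pmf = sum(x * p for x, p in pmf.items())
expected_shortcut = Fraction(n * K, N)

print(pmf)
print(float(expected_from_pmf), float(expected_shortcut))
```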
Assignment 6
Question 1
a. Assume that the time X required for a runner to run a mile is a normal random variable with parameters $\mu$ = 4 minutes 1 second and $\sigma$ = 2 seconds. What is the probability that this athlete will run the mile in more than 3 minutes 55 seconds?
Solution:
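3 minutes 55 seconds = 235 seconds, and $\mu$ = 4 minutes 1 second = 241 seconds.
$$Z = \frac{x - \mu}{\sigma} = \frac{235 - 241}{2} = -3$$
The area between Z = 0 and Z = 3.0 is 0.49865.
Required probability $= P(X > 235) = P(Z > -3) = 0.50 + 0.49865 = 0.99865$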
b. One hundred passengers have made reservations for an airport flight. If the probability that a passenger who has a reservation will not show up is 0.01, what
is the probability that exactly 3 will not show up?
Solution:
This is essentially a binomial experiment with n = 100 and p = 0.01. Since p is very small and n is considerably large, we shall apply the Poisson distribution, using
$\mu = np = (100)(0.01) = 1$
If X represents the number of successes, we have
$$P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}$$
$$P(3; 1) = \frac{e^{-1}(1)^3}{3!}$$
Since $e^{-1} = 0.3679$,
$$P(3; 1) = \frac{0.3679}{6}$$
P(X = 3) = 0.0613
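Both answers in this question can be checked without tables. The Python sketch below computes the normal probability from the error function and compares the Poisson approximation with the exact binomial probability; the helper `normal_cdf` is an illustrative name.

```python
from math import comb, erf, exp, factorial, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


# Part a: P(X > 235) with mu = 241 seconds and sigma = 2 seconds.
p_more_than_235 = 1 - normal_cdf(235, 241, 2)

# Part b: Poisson approximation with mu = np = 1, and the exact binomial.
n, p, x = 100, 0.01, 3
mu = n * p
p_poisson = exp(-mu) * mu ** x / factorial(x)
p_binomial = comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(p_more_than_235, 5))
print(round(p_poisson, 4), round(p_binomial, 4))
```

The two probabilities in part b agree to about three decimal places, which is why the Poisson approximation is acceptable here.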
Question 2
a. Discuss the statement that in a binomial distribution $\mu = 6$ and $\sigma = 2.5$.
Solution:
$$np = 6, \qquad \sqrt{npq} = 2.5$$
Squaring both sides,
$$npq = 6.25$$
Putting $np = 6$ in the above,
$$6q = 6.25$$
$$q = \frac{6.25}{6} = 1.04 > 1$$
Since a probability can never be greater than 1, it is not possible for a binomial distribution to have $\mu = 6$ and $\sigma = 2.5$.
b. The experience of a house agent indicates that he can provide suitable
accommodation for 75% of the clients who come to him. If on a particular
occasion 6 clients approach him independently, calculate the probability that at least 5 clients will get satisfactory accommodation
Solution:
$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad \text{where } q = 1 - p$$
$$P(X \ge 5) = P(X = 5) + P(X = 6)$$
$$P(X \ge 5) = \binom{6}{5}(0.75)^5(0.25)^1 + \binom{6}{6}(0.75)^6(0.25)^0$$
= 0.3560+0.1780
=0.534
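The two binomial terms can be evaluated directly in Python with `math.comb`. This is a small sketch; `binomial_pmf` is an illustrative helper name.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 6, 0.75
p_exactly_5 = binomial_pmf(5, n, p)
p_exactly_6 = binomial_pmf(6, n, p)
p_at_least_5 = p_exactly_5 + p_exactly_6

print(round(p_exactly_5, 4), round(p_exactly_6, 4), round(p_at_least_5, 3))
# prints: 0.356 0.178 0.534
```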
c. A committee of size 3 is selected from 4 men and 2 women. Find the probability distribution by the hypergeometric experiment for the number of men on the committee.
Solution:
Let X denote the number of men on the committee:
No. of men K = 4, Total N = 6, No. of women N - K = 2, No. of selected persons n = 3
$$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$$
X     P(X = x)
1     $\binom{4}{1}\binom{2}{2} / \binom{6}{3} = \frac{4 \times 1}{20} = \frac{4}{20}$
2     $\binom{4}{2}\binom{2}{1} / \binom{6}{3} = \frac{6 \times 2}{20} = \frac{12}{20}$
3     $\binom{4}{3}\binom{2}{0} / \binom{6}{3} = \frac{4 \times 1}{20} = \frac{4}{20}$
Sum   1
Assignment 7
Question 1
a. Differentiate between an estimate and an estimator.
Solution:
Estimate:
An estimate is a numerical value, obtained from sample data, for the unknown parameter.
Estimator:
An estimator is the rule or method that is used to estimate a parameter.
b. A test in statistics was given to 50 girls and 75 boys. The girls made an average grade of 76 with a standard deviation of 6, while the boys made an average of 82 with a standard deviation of 8. Find a 96% confidence interval for the difference $\mu_1 - \mu_2$, where $\mu_1$ is the mean score of all boys and $\mu_2$ is the mean score of all girls who might take this test.
Solution:
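Boys: $n_1 = 75$, $\bar{x}_1 = 82$, $S_1 = 8$
Girls: $n_2 = 50$, $\bar{x}_2 = 76$, $S_2 = 6$
$$1 - \alpha = 0.96 \Rightarrow \alpha = 0.04, \qquad Z_{\alpha/2} = Z_{0.04/2} = Z_{0.02} = 2.05$$
The 96% confidence interval is:
$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
$$(82 - 76) \pm 2.05\sqrt{\frac{(8)^2}{75} + \frac{(6)^2}{50}}$$
$$6 \pm 2.05(1.254)$$
$$6 \pm 2.571$$
$$(6 - 2.571, \; 6 + 2.571)$$
$$(3.429, \; 8.571)$$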
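The interval can be reproduced with a few lines of Python. The sketch below uses the table value Z = 2.05 from the solution rather than computing the quantile, so it mirrors the hand calculation; the variable names are illustrative.

```python
from math import sqrt

# Sample summaries: boys (1) and girls (2).
n1, mean1, s1 = 75, 82, 8
n2, mean2, s2 = 50, 76, 6
z = 2.05   # table value for a 96% confidence level, as used in the solution

difference = mean1 - mean2
standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
margin = z * standard_error

lower, upper = difference - margin, difference + margin
print(round(standard_error, 3), round(margin, 3))
print(round(lower, 3), round(upper, 3))
# prints: 1.254 2.571
#         3.429 8.571
```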
Question 2
a. What is meant by bias?
Bias is the difference between the expected value of the statistic and the true value of the unknown parameter being estimated.
It is defined as:
$B = E(T) - \theta$
where T is the sample statistic used to estimate the population parameter $\theta$.
The bias is positive if $E(T) > \theta$, and negative if $E(T) < \theta$.
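b. In a random sample of 75 axle shafts, 12 have a surface finish that is rougher than the specification will allow. How large a sample is required if we want to be 95% confident that the error in using $\hat{p}$ to estimate p is less than 0.05?
Solution:
$$e = 0.05$$
$$\hat{p} = \frac{12}{75} = 0.16, \qquad \hat{q} = 1 - \hat{p} = 1 - 0.16 = 0.84$$
$$Z_{\alpha/2} = Z_{0.025} = 1.96$$
$$n = \left(\frac{Z_{\alpha/2}}{e}\right)^2 \hat{p}\hat{q} = \left(\frac{1.96}{0.05}\right)^2 (0.16)(0.84) = 206.5$$
$$n \approx 207$$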
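The sample size formula is simple to evaluate in Python. In this sketch the result is rounded up with `math.ceil`, since a sample size must be a whole number at least as large as the computed value.

```python
from math import ceil

z = 1.96            # Z value for 95% confidence
error = 0.05        # maximum allowed error
p_hat = 12 / 75     # sample proportion of rough shafts
q_hat = 1 - p_hat

n_exact = (z / error) ** 2 * p_hat * q_hat
n_required = ceil(n_exact)

print(round(n_exact, 2), n_required)   # prints: 206.52 207
```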
Question 3
a. A random sample of 500 residents of a city is chosen and the number of smokers is noted. It is found that 100 are smokers. Obtain an unbiased estimate of the proportion of smokers in the city.
Solution:
n = 500
x = 100
The sample proportion is an unbiased estimator, and the estimate is
p = x/n = 100/500 = 1/5
b. The manufacturer of a patent medicine claimed that it was at least 90% effective in relieving an allergy for a period of 8 hours. In a sample of 200 people who had the allergy, the medicine provided relief for 160 people. Determine whether the manufacturer’s claim is legitimate.
Solution:
n=200, x=160, p=x/n=160/200=0.8
Hypothesis:
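$H_0: P \ge 0.9$
$H_1: P < 0.9$
Level of significance:
α = 5% = 0.05
Test statistic:
$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}$$
Critical region:
$Z_{0.05} = 1.645$, so we reject $H_0$ if $Z_{\text{calculated}} < -1.645$
Calculation:
$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.8 - 0.9}{\sqrt{\dfrac{(0.9)(0.1)}{200}}}$$
$$Z = \frac{-0.1}{0.0212} = -4.717$$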
Conclusion:
Since Z-calculated =-4.717 lies in critical region, we reject H0 and conclude that manufacturer’s claim is not legitimate.
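The test statistic can be recomputed in Python as a check. This sketch assumes the left-tailed test described above with the critical value -1.645; without intermediate rounding of the standard error, Z comes out as about -4.714 rather than -4.717, which does not change the conclusion.

```python
from math import sqrt

n, x = 200, 160
p_sample = x / n          # observed proportion relieved
p_claimed = 0.9           # proportion under the null hypothesis
q_claimed = 1 - p_claimed

standard_error = sqrt(p_claimed * q_claimed / n)
z = (p_sample - p_claimed) / standard_error

critical_value = -1.645   # left-tailed test at the 5% level
reject_h0 = z < critical_value

print(round(standard_error, 4), round(z, 3), reject_h0)
```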
Assignment 8
Question 1
a. If α = 0.10, how many intervals would be expected to contain μ?
Solution:
In the repeated sampling, we would expect about 90 % of the intervals to contain μ in the long run.
b. A random sample of 10 university professors gave their salaries (in thousand Rs.) as 13, 11, 19, 15, 22, 20, 14, 17, 14, 15. Another random sample of 5 college professors gave their salaries (in thousand Rs.) as 9, 12, 8, 10, 16. Construct a 95% confidence interval for the difference between the means of the salaries of university and college professors, assuming that their population variances are equal.
Solution:
        X1     X1^2    X2    X2^2
        13     169     9     81
        11     121     12    144
        19     361     8     64
        15     225     10    100
        22     484     16    256
        20     400
        14     196
        17     289
        14     196
        15     225
Total   160    2666    55    645
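Now,
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{160}{10} = 16, \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{55}{5} = 11$$
$$s_p^2 = \frac{\left\{\sum x_1^2 - \dfrac{(\sum x_1)^2}{n_1}\right\} + \left\{\sum x_2^2 - \dfrac{(\sum x_2)^2}{n_2}\right\}}{n_1 + n_2 - 2}$$
$$s_p^2 = \frac{\left\{2666 - \dfrac{(160)^2}{10}\right\} + \left\{645 - \dfrac{(55)^2}{5}\right\}}{10 + 5 - 2} = \frac{\{106\} + \{40\}}{13}$$
$$s_p = \sqrt{\frac{146}{13}} = 3.351$$
$$1 - \alpha = 0.95 \Rightarrow \alpha = 0.05, \qquad v = n_1 + n_2 - 2 = 10 + 5 - 2 = 13$$
$$t_{\alpha/2,\,v} = t_{0.05/2,\,13} = t_{0.025,\,13} = 2.16$$
Now, the 95% confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,v} \; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$(16 - 11) \pm 2.16(3.351)\sqrt{\frac{1}{10} + \frac{1}{5}}$$
$$5 \pm 2.16(3.351)(0.548)$$
$$5 \pm 3.965$$
$$1.035 < \mu_1 - \mu_2 < 8.965$$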
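The pooled-variance interval can be checked from the raw salaries with a short Python sketch. It takes the table value t = 2.16 (13 degrees of freedom) from the solution instead of computing the quantile; the helper name `sum_of_squared_deviations` is illustrative.

```python
from math import sqrt

university = [13, 11, 19, 15, 22, 20, 14, 17, 14, 15]
college = [9, 12, 8, 10, 16]
t_value = 2.16   # t(0.025, 13) from the t table, as used in the solution


def sum_of_squared_deviations(values):
    """Sum of x^2 minus (sum of x)^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n


n1, n2 = len(university), len(college)
mean1, mean2 = sum(university) / n1, sum(college) / n2

# Pooled standard deviation with n1 + n2 - 2 degrees of freedom.
pooled_variance = (sum_of_squared_deviations(university)
                   + sum_of_squared_deviations(college)) / (n1 + n2 - 2)
s_p = sqrt(pooled_variance)

margin = t_value * s_p * sqrt(1 / n1 + 1 / n2)
lower, upper = (mean1 - mean2) - margin, (mean1 - mean2) + margin

print(round(s_p, 3), round(margin, 3))
print(round(lower, 3), round(upper, 3))
# prints: 3.351 3.965
#         1.035 8.965
```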
Question 2
Three cards are drawn from an ordinary deck of playing cards, with replacement, and the number y of spades is recorded. After repeating the experiment 64 times, the following
outcomes were recorded:
Y 0 1 2 3
f 21 31 12 0
Test the hypothesis at 1 % level of significance, that the recorded data may be fitted by the binomial distribution with values b(y; 3, 1/4) for y=0,1,2,3
Solution:
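Hypothesis:
$H_0$: The fit is good
$H_1$: The fit is not good
Level of Significance:
α = 0.01
Test statistic:
$$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$$
Calculations:
y       o_i = f   fy    e_i = 64 b(y; 3, 1/4)                  o_i - e_i   (o_i - e_i)^2   (o_i - e_i)^2 / e_i
0       21        0     64 C(3,0) (1/4)^0 (3/4)^3 = 27         -6          36              1.3333
1       31        31    64 C(3,1) (1/4)^1 (3/4)^2 = 27         4           16              0.59259
2       12        24    64 C(3,2) (1/4)^2 (3/4)^1 = 9
3       0         0     64 C(3,3) (1/4)^3 (3/4)^0 = 1
2 and 3 12              10*                                    2           4               0.4
Total   64        55                                                                       χ² = 2.3259
* The classes y = 2 and y = 3 are combined because the expected frequency for y = 3 is less than 5, which leaves k = 3 classes.
Critical region:
No. of degrees of freedom = k - 1 - no. of estimated parameters
= 3 - 1 - 0 = 2 (the binomial distribution has two parameters, and both are provided in the question)
$\chi^2_{0.01,\,2} = 9.21$, so the critical region is $\chi^2 > 9.21$
Conclusion: Since the calculated value of chi-square does not fall in the critical region, we do not reject $H_0$ and conclude that the fit is good.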
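The goodness-of-fit calculation can be sketched in Python as follows. It assumes, as in the table, that the classes y = 2 and y = 3 are merged before the statistic is computed, and it takes the critical value 9.21 from the chi-square table rather than computing it.

```python
from math import comb

observed = [21, 31, 12, 0]          # frequencies for y = 0, 1, 2, 3
trials, n, p = 64, 3, 0.25          # 64 repetitions of b(y; 3, 1/4)

# Expected frequencies under the binomial model.
expected = [trials * comb(n, y) * p ** y * (1 - p) ** (n - y)
            for y in range(n + 1)]

# Merge the last two classes (y = 2 and y = 3) as in the table.
observed_merged = observed[:2] + [observed[2] + observed[3]]
expected_merged = expected[:2] + [expected[2] + expected[3]]

chi_square = sum((o - e) ** 2 / e
                 for o, e in zip(observed_merged, expected_merged))

degrees_of_freedom = len(observed_merged) - 1   # no parameters estimated
critical_value = 9.21                           # chi-square(0.01, 2) from tables
reject_h0 = chi_square > critical_value

print(expected, round(chi_square, 4), reject_h0)
```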