Transcript
Page 1: Statistics and Probability Solved Assignments - Semester Fall 2008

Assignment 1

Question 1:

(Marks: 16) Write the short notes on the following:

Solution:

i) Variable and constant

Variable: A measurable quantity which can vary from one individual or object to another is called a variable.

Constant: A quantity which can assume only one value is called a constant.

ii) Continuous and qualitative variable

Continuous variable: A variable which can assume an infinite number of values within a given range is called a continuous variable, e.g. weight, height, length.

Qualitative variable: A variable that cannot be expressed in numerical form but shows the presence or absence of some attribute is called a qualitative variable, for example marital status, religion, sex.

iii) Population and sample

Population: The collection of all individuals or objects having some common measurable characteristic is called a population.

Sample: A representative part of the population is called a sample.

iv) Primary and secondary data

Primary data: The data published or used by the organization which originally collected them are called primary data.

Secondary data: The data published or used by an organization which they did not collect originally is known as secondary data.

v) Sampling errors and non-sampling errors

Sampling Error: The difference between the estimate derived from the sample and the true population value (the parameter) is technically called the sampling error.

Non-sampling errors: There are certain errors which are not attributable to sampling but arise in the process of data collection, even if a complete count is carried out. Such errors are known as non-sampling errors.

vi) Multiple bar chart and component bar chart


Multiple bar chart: This chart is simply an extension of the simple bar chart. Grouped (adjacent) bars are used to represent related sets of data, and each bar in a group is shaded differently for distinction.

Component bar chart: This chart consists of horizontal or vertical bars which are subdivided into two or more parts. It is used to present data which are subdivisions of totals.

vii) Frequency distribution

Frequency distribution: A frequency distribution is a tabular arrangement of data in which the items are arranged into classes and the number of items falling in each class (called the class frequency) is stated.

viii) Measure of central tendency

Measure of central tendency: A single value which represents the whole data set is called an average. Since an average tends to lie in the center of the data or distribution, it is also called a measure of central tendency.

Question 2:

(Marks: 4) State which of the following represent qualitative data and which represent quantitative data.

i) Religion of the people of the country (qualitative data)

ii) Fee of VU students (quantitative data)

iii) Majority of population like Geo TV (qualitative data)

iv) Inches of rainfall in Lahore city during the last year (quantitative data)

Question 3:

(Marks: 10)

The following data are the weights in pounds of 42 students of Virtual University. Construct a stem-and-leaf display of the data.

135 157 152 189 135 164 146

144 154 153 150 158 168 165

140 132 140 126 146 135 144

147 138 173 161 125 136 176

142 145 149 148 163 147 135

142 150 156 145 128 154 171

Solution:

The stem-and-leaf display of the data is shown below.

Stem | Leaf
12 | 5 6 8
13 | 8 2 6 5 5 5 5
14 | 4 9 6 0 7 8 4 6 2 0 5 2 5 7
15 | 0 7 8 2 4 3 0 6 4
16 | 4 8 3 5 1
17 | 6 3 1
18 | 9

Arranging the leaves of each stem in ascending order gives the ordered display:

Stem | Leaf
12 | 5 6 8
13 | 2 5 5 5 5 6 8
14 | 0 0 2 2 4 4 5 5 6 6 7 7 8 9
15 | 0 0 2 3 4 4 6 7 8
16 | 1 3 4 5 8
17 | 1 3 6
18 | 9
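The ordered display can be checked mechanically. The following is a minimal Python sketch, assuming (as in the solution) that the last digit of each weight is the leaf and the remaining digits form the stem; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

# The 42 weights (in pounds) from Question 3.
weights = [
    135, 157, 152, 189, 135, 164, 146,
    144, 154, 153, 150, 158, 168, 165,
    140, 132, 140, 126, 146, 135, 144,
    147, 138, 173, 161, 125, 136, 176,
    142, 145, 149, 148, 163, 147, 135,
    142, 150, 156, 145, 128, 154, 171,
]


def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using the last digit as the leaf."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    # Sort the stems and the leaves within each stem.
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


display = stem_and_leaf(weights)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Counting the leaves is a quick sanity check: the display should account for all 42 observations.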


Assignment 2

Question 1:

a) What is the difference between Chebyshev’s inequality and the empirical rule (in terms of skewness)?

Solution:

Chebyshev’s inequality and the empirical rule both tell us the proportion of data values that lie within a specified number of standard deviations of the mean.

Chebyshev’s inequality is a general rule that holds for all distributions, symmetric or skewed, and gives the minimum such proportion.

The empirical rule, by contrast, applies only to symmetric (bell-shaped) distributions.

b) The share prices of a company in the Lahore and Islamabad markets during the last ten months are recorded below:

Months | Jan | Feb | March | April | May | Jun | July | Aug | Sep | Oct
Lahore | 105 | 120 | 115 | 118 | 130 | 127 | 109 | 110 | 104 | 112
Islamabad | 108 | 117 | 120 | 130 | 100 | 125 | 125 | 120 | 110 | 135

In which market are the share prices more stable?

Solution:

To compare the stability of the two markets, we compute the coefficient of variation (C.V.) for each city; the market with the smaller C.V. is the more stable.

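For Lahore ($n = 10$, $\sum x = 1150$, $\sum x^2 = 132944$):

$$\bar{x}_{\text{Lahore}} = \frac{\sum x}{n} = \frac{1150}{10} = 115$$

$$S_{\text{Lahore}} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{132944}{10} - (115)^2} = 8.33$$

$$\text{C.V.}_{\text{Lahore}} = \frac{S}{\bar{x}} \times 100 = \frac{8.33}{115} \times 100 = 7.24$$

For Islamabad ($n = 10$, $\sum y = 1190$, $\sum y^2 = 142628$):

$$\bar{y}_{\text{Islamabad}} = \frac{\sum y}{n} = \frac{1190}{10} = 119$$

$$S_{\text{Islamabad}} = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{142628}{10} - (119)^2} = 10.09$$

$$\text{C.V.}_{\text{Islamabad}} = \frac{S}{\bar{y}} \times 100 = \frac{10.09}{119} \times 100 = 8.48$$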

Comparing the coefficients of variation shows that share prices are more stable in the Lahore stock exchange than in Islamabad.
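The comparison can be reproduced with a short Python sketch. It assumes the population form of the standard deviation (divisor n), which is what the solution's figures correspond to; the helper name `coefficient_of_variation` is illustrative.

```python
from math import sqrt

lahore = [105, 120, 115, 118, 130, 127, 109, 110, 104, 112]
islamabad = [108, 117, 120, 130, 100, 125, 125, 120, 110, 135]


def coefficient_of_variation(values):
    """Return (mean, S, C.V. in percent) using divisor n for S."""
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean ** 2
    sd = sqrt(variance)
    return mean, sd, 100 * sd / mean


mean_l, sd_l, cv_l = coefficient_of_variation(lahore)
mean_i, sd_i, cv_i = coefficient_of_variation(islamabad)

# The market with the smaller C.V. is the more stable one.
more_stable = "Lahore" if cv_l < cv_i else "Islamabad"
print(round(cv_l, 2), round(cv_i, 2), more_stable)
```

The C.V. is used rather than the raw standard deviation because it is unit-free and adjusts for the difference in the two means.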

Question 2:

a) Interpret standard deviation.

Solution:

The standard deviation is a very important concept that serves as a basic measure of variability. A small value of the standard deviation indicates that most of the observations in a data set are close to the mean, while a large value implies that the observations are scattered widely about the mean.

b) The following data give the number of passengers traveling by airplane from one city to another in one week.

115 112 129 113 119 124 132 120 110 116

Calculate the mean and standard deviation and determine the percentage of the data that lies between (i) $\bar{x} \pm S$ (ii) $\bar{x} \pm 2S$ (iii) $\bar{x} \pm 3S$. What percentage of the data lies outside these limits?

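Solution: Let x represent the number of passengers traveling by airplane from one city to another in one week.

The calculations for the mean and standard deviation are:

$$\bar{x} = \frac{\sum x}{n} = \frac{1190}{10} = 119$$

$$S = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{142096}{10} - (119)^2} = 6.97$$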

Thus the percentages of data lying within the given limits are:

Interval | Values within interval | %age of values within interval | %age of values falling outside
$\bar{x} \pm S$ = 119 ± 6.97 = (112.03, 125.97) | 113, 115, 116, 119, 120, 124 | (6/10) × 100 = 60% | 40%
$\bar{x} \pm 2S$ = 119 ± 2(6.97) = (105.06, 132.94) | 110, 112, 113, 115, 116, 119, 120, 124, 129, 132 | (10/10) × 100 = 100% | nil
$\bar{x} \pm 3S$ = 119 ± 3(6.97) = (98.09, 139.91) | All values | 100% | nil
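The interval counts can be verified with a small Python sketch. It assumes the same divisor-n standard deviation as the solution; `percent_within` is an illustrative helper name.

```python
from math import sqrt

passengers = [115, 112, 129, 113, 119, 124, 132, 120, 110, 116]


def percent_within(values, k):
    """Percentage of values lying in mean ± k standard deviations."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum(v * v for v in values) / n - mean ** 2)
    lower, upper = mean - k * sd, mean + k * sd
    inside = [v for v in values if lower <= v <= upper]
    return 100 * len(inside) / n


for k in (1, 2, 3):
    inside_pct = percent_within(passengers, k)
    print(k, inside_pct, 100 - inside_pct)
```

Note that 112 falls just below the lower limit 112.03 of the first interval, which is why only six values are counted there.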


Assignment 3

Question 1:

Give short answers to the following:

I. What are moments, and why do we use them?

II. What is meant by kurtosis?

III. Leptokurtic

IV. Platykurtic

V. Normal distribution

VI. Regression

VII. Regressor

VIII. Regressand

Solution:

What are moments, and why do we use them?

Moments are parameters of a distribution which are used for testing its symmetry and normality.

What is meant by kurtosis?

The term kurtosis refers to the degree of peakedness of a distribution.

Leptokurtic:

A distribution having a relatively high peak is called a leptokurtic distribution.

Platykurtic:

A distribution which is flat-topped is called a platykurtic distribution.

Normal distribution:

A distribution which is neither very peaked nor very flat is called a normal or mesokurtic distribution.

Regression:

Regression investigates the dependence of one (dependent) variable on another (independent) variable.

Regressor:

The independent or non-random variable is also referred to as the regressor, the predictor, the regression variable or the explanatory variable.

Regressand:

The dependent or random variable is also referred to as the regressand, the predictand, the response or the explained variable.

Question 2:

If a distribution has mean 1403 and mode 1487, what can you say about its skewness?

Solution:

Mean = 1403, Mode = 1487

The distribution is negatively skewed, because Mean < Mode.

Question 3:

a) Distinguish between permutation and combination.


b) The first four moments of a certain distribution about Y = 17.5 are 0.3, 74, 45 and 12125 respectively. Find out whether the distribution is leptokurtic or platykurtic.

Solution:

a. Permutation:

A permutation is an arrangement of all or part of a set of objects in a definite order. The number of permutations of n distinct objects taken r at a time is

$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$

Combination:

A combination is a selection of objects without regard to their order. The number of combinations of n distinct objects taken r at a time is

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$$

b. The first four moments about Y = 17.5 are:

$$m_1' = 0.3, \quad m_2' = 74, \quad m_3' = 45, \quad m_4' = 12125$$

Moments about the mean:

$$m_1 = 0$$

$$m_2 = m_2' - (m_1')^2 = 74 - 0.09 = 73.91$$

$$m_3 = m_3' - 3 m_2' m_1' + 2 (m_1')^3 = 45 - 3(74)(0.3) + 0.054 = -21.546$$

$$m_4 = m_4' - 4 m_3' m_1' + 6 m_2' (m_1')^2 - 3 (m_1')^4 = 12125 - 4(0.3)(45) + 6(74)(0.3)^2 - 3(0.3)^4$$

$$= 12125 - 54 + 39.96 - 0.0243 = 12110.94$$

$$b_2 = \frac{m_4}{m_2^2} = \frac{12110.94}{(73.91)^2} = 2.22$$

Since $b_2 < 3$, the distribution is platykurtic.
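The conversion from moments about an arbitrary origin to moments about the mean can be sketched in Python as follows. The function name `central_moments` and the variable names are illustrative; the formulas are the standard ones used in the solution.

```python
def central_moments(m1, m2, m3, m4):
    """Convert the first four moments about an arbitrary origin
    into the second, third and fourth moments about the mean."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


# Moments about Y = 17.5 from part (b).
mu2, mu3, mu4 = central_moments(0.3, 74, 45, 12125)

# Moment coefficient of kurtosis; 3 is the mesokurtic reference value.
b2 = mu4 / mu2 ** 2
shape = "leptokurtic" if b2 > 3 else "platykurtic" if b2 < 3 else "mesokurtic"
print(round(mu2, 2), round(mu3, 3), round(mu4, 2), round(b2, 2), shape)
```

The third moment comes out negative, which does not affect the kurtosis conclusion because only the second and fourth moments enter $b_2$.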


Assignment 4

Question 1:

An urn contains 5 white and 7 black balls, and five balls are drawn at random.

a) Find the distribution function of the probability distribution of the number of white balls.

b) Draw the graph of the distribution function.

a. Let X be a random variable representing the number of white balls. Then X takes the values 0, 1, 2, 3, 4, 5. With N = 12 and n = 5, the probabilities are:

$$P(X=0) = \frac{{}^{5}C_{0} \cdot {}^{7}C_{5}}{{}^{12}C_{5}} = \frac{21}{792} = \frac{7}{264}$$

$$P(X=1) = \frac{{}^{5}C_{1} \cdot {}^{7}C_{4}}{{}^{12}C_{5}} = \frac{175}{792}$$

$$P(X=2) = \frac{{}^{5}C_{2} \cdot {}^{7}C_{3}}{{}^{12}C_{5}} = \frac{350}{792} = \frac{175}{396}$$

$$P(X=3) = \frac{{}^{5}C_{3} \cdot {}^{7}C_{2}}{{}^{12}C_{5}} = \frac{210}{792} = \frac{35}{132}$$

$$P(X=4) = \frac{{}^{5}C_{4} \cdot {}^{7}C_{1}}{{}^{12}C_{5}} = \frac{35}{792}$$

$$P(X=5) = \frac{{}^{5}C_{5} \cdot {}^{7}C_{0}}{{}^{12}C_{5}} = \frac{1}{792}$$

Probability Distribution of X

Number of white balls X | Probability f(X)
0 | 21/792
1 | 175/792
2 | 350/792
3 | 210/792
4 | 35/792
5 | 1/792

In order to obtain the distribution function of the probability distribution, we compute the cumulative probabilities as follows:

Number of white balls X | Probability f(X) | Cumulative probability F(X)
0 | 21/792 | 21/792
1 | 175/792 | 196/792
2 | 350/792 | 546/792
3 | 210/792 | 756/792
4 | 35/792 | 791/792
5 | 1/792 | 792/792

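Hence the desired distribution function is:

$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 21/792 & \text{for } 0 \le x < 1 \\ 196/792 & \text{for } 1 \le x < 2 \\ 546/792 & \text{for } 2 \le x < 3 \\ 756/792 & \text{for } 3 \le x < 4 \\ 791/792 & \text{for } 4 \le x < 5 \\ 1 & \text{for } x \ge 5 \end{cases}$$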
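The probabilities and cumulative probabilities can be recomputed exactly with Python's `fractions` module. This is a minimal sketch; the constant names and the helper `pmf` are illustrative and not part of the original solution.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

WHITE, BLACK, DRAWN = 5, 7, 5


def pmf(x):
    """P(X = x): x white balls among DRAWN balls drawn without replacement."""
    return Fraction(comb(WHITE, x) * comb(BLACK, DRAWN - x),
                    comb(WHITE + BLACK, DRAWN))


probs = [pmf(x) for x in range(DRAWN + 1)]
cdf = list(accumulate(probs))  # running totals give F(x) at x = 0, 1, ..., 5

for x, (p, big_f) in enumerate(zip(probs, cdf)):
    # Numerators expressed over the common denominator 792.
    print(x, p * 792, big_f * 792)
```

Exact fractions avoid rounding, so the final cumulative value is exactly 1.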

b. [Graph of the distribution function F(x): a step function that jumps at x = 0, 1, 2, 3, 4, 5 and reaches 1 at x = 5.]

Question 2

Three balls are drawn at random from a box containing 3 blue balls, 2 red balls and 3 green balls. If X represents the number of blue balls and Y the number of red balls drawn, then

a) Make the joint distribution of X and Y

b) Find f(x | 1)

c) Find P(X = 2 | Y = 0)

Solution:

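The joint probability distribution is determined as follows:

$$f(X = x, Y = y) = \frac{{}^{3}C_{x} \cdot {}^{2}C_{y} \cdot {}^{3}C_{3-x-y}}{{}^{8}C_{3}}, \quad \text{where } x = 0, 1, 2, 3 \text{ and } y = 0, 1, 2$$

$$f(x=0, y=0) = \frac{{}^{3}C_{0} \cdot {}^{2}C_{0} \cdot {}^{3}C_{3}}{{}^{8}C_{3}} = \frac{1}{56}$$

$$f(x=0, y=1) = \frac{{}^{3}C_{0} \cdot {}^{2}C_{1} \cdot {}^{3}C_{2}}{{}^{8}C_{3}} = \frac{6}{56}$$

$$f(x=1, y=0) = \frac{{}^{3}C_{1} \cdot {}^{2}C_{0} \cdot {}^{3}C_{2}}{{}^{8}C_{3}} = \frac{9}{56}$$

Similarly we can find the remaining probabilities.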

a. Joint distribution of X and Y

Y \ X | 0 | 1 | 2 | 3 | h(y)
0 | 1/56 | 9/56 | 9/56 | 1/56 | 20/56
1 | 6/56 | 18/56 | 6/56 | 0 | 30/56
2 | 3/56 | 3/56 | 0 | 0 | 6/56
g(x) | 10/56 | 30/56 | 15/56 | 1/56 | 1


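For Part (b):

$$f(x \mid 1) = \frac{f(x, 1)}{h(1)}$$

First we have to find h(1):

h(1) = f(0,1) + f(1,1) + f(2,1) + f(3,1) = 6/56 + 18/56 + 6/56 + 0 = 30/56

Then,

$$f(x \mid 1) = \frac{56}{30} f(x, 1)$$

$$f(0 \mid 1) = \frac{56}{30} f(0, 1) = \frac{56}{30} \left(\frac{6}{56}\right) = \frac{1}{5}$$

$$f(1 \mid 1) = \frac{56}{30} f(1, 1) = \frac{56}{30} \left(\frac{18}{56}\right) = \frac{3}{5}$$

$$f(2 \mid 1) = \frac{56}{30} f(2, 1) = \frac{56}{30} \left(\frac{6}{56}\right) = \frac{1}{5}$$

$$f(3 \mid 1) = \frac{56}{30} (0) = 0$$

x | 0 | 1 | 2 | 3
f(x | 1) | 1/5 | 3/5 | 1/5 | 0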

c.

$$P(X = 2 \mid Y = 0) = \frac{f(x=2, y=0)}{h(0)} = \frac{9/56}{20/56} = \frac{9}{20}$$
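The joint table, the marginal h(y) and both conditional results can be checked with exact fractions. The sketch below is illustrative only; the names `f`, `h` and `conditional_x_given_y` mirror the notation of the solution but are otherwise assumptions.

```python
from fractions import Fraction
from math import comb

BLUE, RED, GREEN, DRAWN = 3, 2, 3, 3
TOTAL = comb(BLUE + RED + GREEN, DRAWN)  # 8C3 = 56 possible samples


def f(x, y):
    """Joint probability of x blue and y red balls among the 3 drawn."""
    green = DRAWN - x - y
    if green < 0 or green > GREEN:
        return Fraction(0)  # impossible combination
    return Fraction(comb(BLUE, x) * comb(RED, y) * comb(GREEN, green), TOTAL)


def h(y):
    """Marginal probability of y red balls."""
    return sum(f(x, y) for x in range(BLUE + 1))


def conditional_x_given_y(x, y):
    """f(x | y) = f(x, y) / h(y)."""
    return f(x, y) / h(y)


part_b = [conditional_x_given_y(x, 1) for x in range(BLUE + 1)]
part_c = conditional_x_given_y(2, 0)
print(part_b, part_c)
```

Summing f(x, y) over every cell is a useful check that the table is a valid joint distribution.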


Assignment 5

Question 1:

Define Poisson process.

Sol: A Poisson process represents a situation where events occur randomly over a specified interval of time or space or length.

a) Given a random variable X with E(X) = 0.63 and Var(X) = 0.2331, find $E(X^2)$.

Sol:

E(X) = 0.63 and Var(X) = 0.2331

$$\text{Var}(X) = E(X^2) - [E(X)]^2$$

Putting the given information into the above formula, we get

$$0.2331 = E(X^2) - (0.63)^2$$

$$0.2331 = E(X^2) - 0.3969$$

$$0.2331 + 0.3969 = E(X^2)$$

$$E(X^2) = 0.63$$

Question 2:

a) When do we deal with the discrete uniform distribution?

Sol: Whenever the various outcomes are equally likely, and of a form such that we have a random variable X with values 0, 1, 2, ..., n, we are dealing with the discrete uniform distribution.

b) A random variable X is normally distributed with $\mu = 50$ and $\sigma^2 = 25$. Find the probability that it will be

I. larger than 54

II. smaller than 57.

Sol:

With $\mu = 50$ and $\sigma^2 = 25$ (so $\sigma = 5$), we have

i) At x = 54

$$Z = \frac{54 - 50}{5} = 0.80$$

Hence, using the table, we have

$$P(X > 54) = P(Z > 0.8) = 0.5 - P(0 \le Z \le 0.8) = 0.5 - 0.2881 = 0.2119$$


ii) At x = 57

$$Z = \frac{57 - 50}{5} = 1.40$$

Therefore, using the table,

$$P(X < 57) = P(Z < 1.40) = 0.5 + P(0 \le Z \le 1.40) = 0.5 + 0.4192 = 0.9192$$
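The table look-ups can be cross-checked numerically. The sketch below uses the error function from Python's standard `math` module to evaluate the normal distribution function; this replaces the printed table and is not the method used in the solution, and `normal_cdf` is an illustrative helper name.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


MU, SIGMA = 50, 5  # sigma is the square root of the variance 25

p_larger_54 = 1 - normal_cdf(54, MU, SIGMA)
p_smaller_57 = normal_cdf(57, MU, SIGMA)
print(round(p_larger_54, 4), round(p_smaller_57, 4))
```

Both values agree with the table-based answers to four decimal places.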

Question 3:

Under which conditions is the Poisson distribution used to approximate the hypergeometric distribution?

Sol:

The Poisson distribution can be used to approximate the hypergeometric distribution when n < 0.05N, n > 20, and p < 0.05.

a) A fair coin is tossed 20 times. Find the probability that the number of heads occurring is between 10 and 14 inclusive by using the normal approximation to the binomial distribution.

Sol: Since n = 20, p = 0.5, q = 1 - p = 0.5,

$$\mu = np = 20(0.5) = 10$$

$$\sigma = \sqrt{npq} = \sqrt{20(0.5)(0.5)} = 2.24$$

For the normal approximation, the interval of discrete values $10 \le X \le 14$ is replaced by the interval $9.5 \le X \le 14.5$, and we compute as follows.

At x = 9.5, we find

$$Z_1 = \frac{9.5 - 10}{2.24} = -0.22$$

At x = 14.5, we find

$$Z_2 = \frac{14.5 - 10}{2.24} = 2.01$$

Hence, by using the table,

$$P(10 \le X \le 14) = P(-0.22 \le Z \le 2.01) = P(-0.22 \le Z \le 0) + P(0 \le Z \le 2.01)$$

$$= 0.0871 + 0.4778 = 0.5649$$
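To see how good the approximation is, the sketch below compares it with the exact binomial probability. The exact calculation and the use of `math.erf` instead of a printed table are additions for illustration, not part of the original solution; `phi` is an assumed helper name.

```python
from math import comb, erf, sqrt

n, p = 20, 0.5
mu = n * p
sigma = sqrt(n * p * (1 - p))


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Normal approximation with continuity correction: 9.5 <= X <= 14.5.
approx = phi((14.5 - mu) / sigma) - phi((9.5 - mu) / sigma)

# Exact binomial probability P(10 <= X <= 14) for a fair coin.
exact = sum(comb(n, k) for k in range(10, 15)) / 2 ** n

print(round(approx, 4), round(exact, 4))
```

The computed approximation differs slightly from 0.5649 because the table-based solution rounds the z values to two decimals; both are close to the exact binomial value of about 0.567.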


Assignment 6

Question 1:

Define the following terms:

Parameter and statistic

A quantity calculated from the population is known as a parameter, whereas a statistic is a quantity calculated from a sample.

Sampling distribution of a statistic

The probability distribution of a sample statistic is the sampling distribution of a statistic.

Unbiased estimator

An estimator is unbiased if the mean of its sampling distribution is equal to the population parameter to be estimated.

Statistical Estimation

Statistical estimation is a procedure of making a judgment about the unknown value of a population parameter by using the sample observations.

Standard Error of a statistic

The standard deviation of the sampling distribution of a statistic is called the standard error of the statistic.

Question 2:

a) A random variable X has the following probability distribution:

x | 4 | 5 | 6
P(x) | 0.3 | 0.5 | 0.2

Find the mean $\mu_{\bar{X}}$ and the standard error $\sigma_{\bar{X}}$ of the mean for a random sample of size 2.

Solution:

x | 4 | 5 | 6 | Sum
P(x) | 0.3 | 0.5 | 0.2 | 1
xP(x) | 1.2 | 2.5 | 1.2 | $\sum xP(x) = 4.9$
x²P(x) | 4.8 | 12.5 | 7.2 | $\sum x^2P(x) = 24.5$

$$\mu = E(X) = \sum xP(x) = 4.9$$

$$\sigma^2 = \text{Var}(X) = \sum x^2 P(x) - \left[\sum xP(x)\right]^2 = 24.5 - (4.9)^2 = 0.49$$

$$\sigma = \sqrt{0.49} = 0.7$$

We know that:

$$\mu_{\bar{X}} = \mu = 4.9$$

$$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} = \frac{0.49}{2} = 0.245$$

$$\sigma_{\bar{X}} = \sqrt{0.245} = 0.495$$
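The result can also be obtained by brute force, enumerating every possible sample of size 2. The sketch below assumes the two observations are drawn independently (sampling with replacement), which is the assumption behind the formula $\sigma^2/n$ used above; the variable names are illustrative.

```python
from itertools import product
from math import sqrt

dist = {4: 0.3, 5: 0.5, 6: 0.2}

mean_of_means = 0.0
second_moment = 0.0
# Every ordered pair (x1, x2) is one possible sample of size 2.
for (x1, p1), (x2, p2) in product(dist.items(), repeat=2):
    xbar = (x1 + x2) / 2
    prob = p1 * p2  # independence of the two draws
    mean_of_means += xbar * prob
    second_moment += xbar ** 2 * prob

std_error = sqrt(second_moment - mean_of_means ** 2)
print(round(mean_of_means, 3), round(std_error, 3))
```

The enumeration reproduces the mean 4.9 and the standard error 0.495 without appealing to the shortcut formulas.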

b) It is known that 3% of the persons living in Gujranwala city have a certain disease. Find the mean and standard error of the sampling distribution of the proportion of diseased persons in a random sample of 500 persons.

Solution: We have the proportion in the population P = 0.03 and the sample size n = 500.

Let the sample proportion be $\hat{P}$.

Then,

$$\mu_{\hat{p}} = P = 0.03$$

and

$$\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}} = \sqrt{\frac{0.03(1 - 0.03)}{500}} = 0.00763$$

Question 3:

a) In a random sample of 500 people eating lunch at a hospital cafeteria on various Fridays, it was found that x = 160 preferred seafood. Find a 95% confidence interval for the actual proportion of people who eat seafood on Fridays at this cafeteria.

Solution:

The point estimate of the population proportion is $\hat{p} = \frac{160}{500} = 0.32$. Using the table we find $z_{0.05/2} = 1.96$. Therefore the confidence interval is

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.32 \pm 1.96 \sqrt{\frac{(0.32)(0.68)}{500}} = 0.32 \pm 0.04 = (0.28, 0.36)$$
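The interval can be reproduced with a few lines of Python. This is a sketch of the large-sample interval used in the solution; the function name `proportion_ci` is illustrative, and the critical value 1.96 is passed in rather than computed.

```python
from math import sqrt


def proportion_ci(successes, n, z):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(160, 500, 1.96)
print(round(low, 2), round(high, 2))
```

Rounded to two decimals, the limits match the interval (0.28, 0.36) obtained above.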

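b) The mean and standard deviation for the quality grade-point averages of a random sample are calculated to be 2.6 and 0.3. How large a sample is required if we want to be 95% confident that our estimate of $\mu$ is not off by more than 0.05?

Solution: We know that

$$n = \left(\frac{z_{\alpha/2} \cdot \hat{\sigma}}{e}\right)^2$$

As given,

$$z_{\alpha/2} = 1.96, \quad \hat{\sigma} = 0.3, \quad e = 0.05$$

By substituting the given values,

$$n = \left(\frac{z_{\alpha/2} \cdot \hat{\sigma}}{e}\right)^2 = \left(\frac{1.96 \times 0.3}{0.05}\right)^2 = 138.3$$

Rounding up to the next whole number so that the error does not exceed 0.05, a sample of size n = 139 is required.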
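The sample-size formula is easy to evaluate directly. In the sketch below, `required_sample_size` is an illustrative helper; rounding up with `ceil` is a common convention, adopted here as an assumption, because it gives the smallest whole n for which the margin of error stays within e.

```python
from math import ceil, sqrt


def required_sample_size(z, sigma, e):
    """Unrounded n from n = (z * sigma / e)^2."""
    return (z * sigma / e) ** 2


n_raw = required_sample_size(1.96, 0.3, 0.05)
n_needed = ceil(n_raw)  # round up so the error bound is actually met

# Margin of error achieved with 138 and with 139 observations.
margin_138 = 1.96 * 0.3 / sqrt(138)
margin_139 = 1.96 * 0.3 / sqrt(139)
print(round(n_raw, 1), n_needed)
```

With 138 observations the margin is marginally above 0.05, while with 139 it is just below, which is why the value 138.3 is rounded up rather than to the nearest integer.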


Assignment 7

Question 1:

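Prove that when n is large, $s^2$ is approximately equal to $S^2$.

Solution:

As we know that

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \implies \sum (x - \bar{x})^2 = (n - 1)s^2$$

whereas

$$S^2 = \frac{\sum (x - \bar{x})^2}{n} \implies \sum (x - \bar{x})^2 = nS^2$$

Hence

$$(n - 1)s^2 = nS^2 \implies S^2 = \frac{n - 1}{n} s^2 = \left(1 - \frac{1}{n}\right) s^2$$

Now as

$$n \to \infty, \quad \frac{1}{n} \to 0$$

Hence, if n is large,

$$S^2 \approx s^2$$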
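The relation $S^2 = \frac{n-1}{n} s^2$ can be illustrated numerically with Python's `statistics` module, where `pvariance` uses divisor n and `variance` uses divisor n - 1. The data sets below are arbitrary (the numbers 1 to n) and serve only as an illustration.

```python
from statistics import pvariance, variance

ratios = {}
for n in (5, 50, 5000):
    data = [float(i) for i in range(1, n + 1)]  # arbitrary illustrative data
    big_s2 = pvariance(data)    # divisor n
    small_s2 = variance(data)   # divisor n - 1
    ratios[n] = big_s2 / small_s2

for n, ratio in ratios.items():
    print(n, ratio, (n - 1) / n)
```

The ratio equals (n - 1)/n for every n and approaches 1 as n grows, which is the content of the proof.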

(a) A random sample of 100 workers with children in day care shows a mean day-care cost of Rs.2650 and a standard deviation of Rs.500. Verify the department’s claim that the mean exceeds Rs.2500 at the 0.05 level with this information.

Step 1:

$$H_0: \mu \le 2500$$

$$H_1: \mu > 2500 \quad \text{(one-sided test)}$$

Step 2:

$$\alpha = 0.05$$

Step 3:

$$z = \frac{\bar{x} - \mu_0}{S / \sqrt{n}} = \frac{2650 - 2500}{500 / \sqrt{100}} = \frac{150}{50} = 3$$

Step 4:

The critical region for $\alpha = 0.05$ is $z > 1.645$.

Step 5:

Since the calculated value z = 3 falls in the critical region, we reject the null hypothesis and accept the alternative hypothesis: the data support the claim that the mean exceeds Rs.2500.
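The test can be sketched in Python as follows. The critical value 1.645 is taken from the solution rather than computed, and `z_statistic` is an illustrative helper name.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, sd, n):
    """z = (x-bar - mu0) / (sd / sqrt(n)) for a large-sample test of a mean."""
    return (sample_mean - mu0) / (sd / sqrt(n))


z = z_statistic(2650, 2500, 500, 100)
Z_CRITICAL = 1.645  # upper 5% point of the standard normal, from the table

reject_h0 = z > Z_CRITICAL  # one-sided (right-tailed) test
print(z, reject_h0)
```

Because the alternative is one-sided, only the upper tail is used for the critical region.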

Question 2:

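(a) A random sample of size n is drawn from a normal population with mean 5 and variance $\sigma^2$. Answer the following:

If s = 15, $\bar{x}$ = 14 and t = 3, what is the value of n?

Solution:

As we know that

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

$$3 = \frac{14 - 5}{15 / \sqrt{n}}$$

$$3 \left(\frac{15}{\sqrt{n}}\right) = 9$$

$$\frac{45}{\sqrt{n}} = 9$$

$$\sqrt{n} = 5$$

$$n = 25$$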

(b) In a poll of college students in a large university, 300 of 400 students living in students’ residences (hostels) approved a certain course of action, whereas 200 of 300 students not living in students’ residences approved it. Compute the 90% confidence interval for the difference between the two proportions.


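Solution:

Let

$$\hat{p}_1 = \frac{300}{400} = 0.75, \quad \hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.75 = 0.25$$

and

$$\hat{p}_2 = \frac{200}{300} = 0.67, \quad \hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.67 = 0.33$$

With $\alpha = 0.10$, $z_{\alpha/2} = z_{0.05} = 1.645$.

The 90% confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = (0.75 - 0.67) \pm 1.645 \sqrt{\frac{(0.75)(0.25)}{400} + \frac{(0.67)(0.33)}{300}}$$

$$= 0.08 \pm 1.645(0.0347) = 0.08 \pm 0.057 = (0.023, 0.137)$$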
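The interval can be recomputed with the sketch below. The helper `diff_proportion_ci` is illustrative; it takes the sample proportions as inputs so that the rounded value 0.67 used in the solution can be passed in directly.

```python
from math import sqrt


def diff_proportion_ci(p1, n1, p2, n2, z):
    """Large-sample confidence interval for p1 - p2."""
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Using the rounded proportions from the solution.
low, high = diff_proportion_ci(0.75, 400, 0.67, 300, 1.645)

# Using the unrounded second proportion 200/300 for comparison.
low_exact, high_exact = diff_proportion_ci(300 / 400, 400, 200 / 300, 300, 1.645)
print(round(low, 3), round(high, 3))
```

With the unrounded proportion 200/300 the interval shifts slightly, to roughly (0.026, 0.141), so the rounding of 0.6667 to 0.67 has a visible effect in the third decimal place.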