
Testing of Hypothesis


SAMPLING DISTRIBUTIONS

Statistical Inference is concerned with making decisions about a population based on the information contained in a random sample from that population.

A sample is a subset of the population, selected to be representative of the larger population. It is essential that any sample is as representative as possible of the population from which it is drawn.

Often, we may want to know things about populations but don’t have data for every person or thing in the population. If a company’s customer service division wanted to learn whether its customers were satisfied, it would not be practical or perhaps even possible to contact every individual who purchased a product. Instead, the company might select a sample or samples of the population.

For instance, we may be interested in the mean fill volume of a can of soft drink. The mean fill volume in the population is required to be 300 millilitres. An engineer takes a random sample of 25 cans and computes the sample average to be $\bar{x} = 298$ millilitres. The engineer will probably decide that the population mean is µ = 300 millilitres, even though the sample mean was 298 millilitres, because he or she knows that the sample mean is a reasonable estimate of µ and that a sample mean of 298 millilitres is very likely to occur even if the true population mean is µ = 300 millilitres. In fact, if the true mean is 300 millilitres, tests of 25 cans made repeatedly, perhaps every five minutes, would produce values of $\bar{x}$ that vary both above and below µ = 300 millilitres.

The sample mean is a statistic; that is, a random variable that depends on the results obtained in each particular sample. Since a statistic is a random variable, it has a probability distribution.

A parameter is a characteristic of a population. A statistic is a characteristic of a sample. Inferential statistics enables us to make an educated guess about a population parameter based on a statistic computed from a sample randomly drawn from that population.

A sampling distribution is the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population. If the sample statistic is the sample mean, then the distribution is the sampling distribution of sample means.

SAMPLING DISTRIBUTION OF MEANS

Suppose that a random sample of size n is taken from a normal population with mean µ and variance $\sigma^2$. Each observation in this sample, say X1, X2, ..., Xn, is then a normally and independently distributed random variable with mean µ and variance $\sigma^2$. By the reproductive property of the normal distribution, the sample mean

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

has a normal distribution with mean

$$\mu_{\bar{X}} = \mu$$

and variance

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}.$$

If we are sampling from a population that has an unknown probability distribution, the sampling distribution of the sample mean will still be approximately normal with mean µ and variance $\sigma^2/n$, if the sample size n is large. This is one of the most useful theorems in statistics, called the central limit theorem.

CENTRAL LIMIT THEOREM

If X1, X2, ..., Xn is a random sample of size n taken from a population (finite or infinite) with mean µ and finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$

as n → ∞, is the standard normal distribution.
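The theorem can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes an Exponential(1) population (mean 1, variance 1, clearly non-normal), and the sample size, number of samples and seed are arbitrary illustrative choices. It draws many samples of size n and compares the mean and variance of the resulting sample means with µ and σ²/n.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from an Exponential(1) population
    (an assumed, illustrative population) and return the sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

n = 30
means = simulate_sample_means(n=n, num_samples=5000)

# The sampling distribution should be centred near mu = 1
# with variance near sigma^2 / n = 1 / 30.
print("mean of sample means:    ", round(statistics.fmean(means), 4))
print("variance of sample means:", round(statistics.variance(means), 4))
print("sigma^2 / n:             ", round(1.0 / n, 4))
```

A histogram of `means` would look roughly bell-shaped even though the individual observations are strongly skewed.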


NOMENCLATURE AND DEFINITIONS

Statistical Hypothesis A statistical hypothesis is a statement about the parameters of one or more populations.

Null Hypothesis (H0) A null hypothesis is a hypothesis that might be falsified on the basis of observed data. The null hypothesis typically proposes a general or default position, such as that there is no relationship between two quantities, or that there is no difference between a treatment and the control.

The null hypothesis is a statement of zero or no change. If the original claim includes equality (≤, =, or ≥), it is the null hypothesis. If the original claim does not include equality (<, ≠, >), then the null hypothesis is the complement of the original claim. The null hypothesis always includes the equal sign.

The decision is based on the null hypothesis.

Alternative Hypothesis (H1 or Ha) Statement which is true if the null hypothesis is false. The type of test (left, right, or two-tail) is based on the alternative hypothesis.

Test statistic Sample statistic used to decide whether to reject or fail to reject the null hypothesis.

Critical region Set of all values which would cause us to reject H0 (Region of rejection).

Critical value(s) The value(s) which separate the critical region from the non-critical region. The critical values are determined independently of the sample statistics.

Significance level (α) The probability of rejecting the null hypothesis when it is true; α = 0.05 and α = 0.01 are common. If no level of significance is given, use α = 0.05.

Decision A statement based upon the null hypothesis. It is either "reject the null hypothesis" or "fail to reject the null hypothesis". We will never accept the null hypothesis.

Conclusion A statement which indicates at what level of significance the null hypothesis is rejected or not rejected.

Statisticians never accept the null hypothesis; we only fail to reject it. In other words, we'll say that it isn't true, or that we don't have enough evidence to say that it isn't true, but we'll never say that it is true, because someone else might come along with another sample which shows that it isn't, and we don't want to be wrong.

ERRORS OF TESTING OF HYPOTHESIS

Type I error Rejecting the null hypothesis when it is true. Usually the more serious error.

Type II error Failing to reject the null hypothesis when it is false.

Alpha (α) Probability of committing a Type I error.

Beta (β) Probability of committing a Type II error.

Decision | H0 True | H0 False
Reject H0 | Type I Error (α) | Correct Assessment
Fail to Reject H0 | Correct Assessment | Type II Error (β)

Which of the two errors is more serious? Type I or Type II?

Since Type I is usually the more serious error, that is the one we concentrate on. We usually pick alpha to be very small (0.05, 0.01). Notice that alpha is not a Type I error; alpha is the probability of committing a Type I error. Likewise, beta is the probability of committing a Type II error.
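The meaning of α as a probability can be illustrated by simulation. The sketch below is an illustration only: it assumes a normal population in which H0 is true (µ = 300, with a hypothetical σ = 4 that does not come from the text), repeatedly runs a two-tailed test on samples of 25, and counts how often H0 is wrongly rejected. The function name and all default values are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=300.0, sigma=4.0, n=25, alpha=0.05,
                      trials=4000, seed=1):
    """Fraction of simulated samples, drawn with H0 true, that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        z0 = (xbar - mu0) / (sigma / math.sqrt(n))
        if abs(z0) > z_crit:
            rejections += 1  # H0 is true here, so this is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print("observed Type I error rate:", rate)
```

The observed rate should land close to the chosen α = 0.05, which is what "probability of committing a Type I error" means in practice.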

TYPE OF TESTS

The type of test is determined by the Alternative Hypothesis (H1)

Left Tailed Test

H1: parameter < value. Notice the inequality points to the left.

Decision Rule: Reject H0 if test statistic < critical value

Right Tailed Test

H1: parameter > value. Notice the inequality points to the right.

Decision Rule: Reject H0 if test statistic > critical value

Two Tailed Test

H1: parameter ≠ value. Notice the inequality points to both sides.

Decision Rule: Reject H0 if test statistic < critical value (left) or test statistic > critical value (right)
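The three decision rules can be sketched in code. This is a minimal illustration for a test statistic that follows the standard normal distribution under H0; the function names `critical_values` and `reject_h0` are hypothetical, and the critical values come from Python's standard-library `statistics.NormalDist` rather than from printed tables.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return (lower, upper) standard normal critical values.
    None marks a side that has no rejection region."""
    z = NormalDist()
    if tail == "left":
        return (z.inv_cdf(alpha), None)
    if tail == "right":
        return (None, z.inv_cdf(1 - alpha))
    if tail == "two":
        return (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2))
    raise ValueError("tail must be 'left', 'right' or 'two'")

def reject_h0(stat, alpha, tail):
    """Apply the decision rule for the given type of test."""
    lower, upper = critical_values(alpha, tail)
    in_left_region = lower is not None and stat < lower
    in_right_region = upper is not None and stat > upper
    return in_left_region or in_right_region

# The same statistic can lead to different decisions under different alternatives.
for tail in ("left", "right", "two"):
    print(tail, reject_h0(2.0, 0.05, tail))
```

With a statistic of 2.0 and α = 0.05, the right-tailed and two-tailed tests reject H0, while the left-tailed test does not.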

THE TESTING PROCESS

Hypothesis testing involves the following general procedure:

1. State the relevant null and alternative hypotheses to be tested.

2. Consider the assumptions being made in doing the test; for example, assumptions about the statistical independence or about the form of the distributions of the observations. This step is equally important, as invalid assumptions will mean that the results of the test are invalid.


3. Compute the relevant test statistic. The distribution of such a statistic under the null hypothesis can be derived from the assumptions. In standard cases this will be a well-known result. For example, the test statistic may follow a Student’s t distribution or a normal distribution. The distribution of the test statistic partitions the possible values of the estimator into those for which the null hypothesis is rejected and those for which it is not.

4. Compare the test-statistic (S) to the relevant critical values (CV) (obtained from tables in standard cases).

5. Decide to either fail to reject the null hypothesis or reject it in favor of the alternative. For example, in a right-tailed test the decision rule is to reject the null hypothesis (H0) if S > CV, and to fail to reject it otherwise.

TESTS FOR MEAN

We will see how to conduct a test of hypothesis for a mean, when the following conditions are met:

- The sampling method is simple random sampling.
- The sample is drawn from a normal or near-normal population.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

- The population distribution is normal.
- The sample data are symmetric, unimodal, without outliers, and the sample size is 15 or less.
- The sample data are moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.
- The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

STATEMENT OF THE HYPOTHESES

The table below shows three sets of hypotheses. Each makes a statement about how the population mean, μ is related to a specified value μ0.

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | μ = μ0 | μ ≠ μ0 | 2
2 | μ ≥ μ0 | μ < μ0 | 1
3 | μ ≤ μ0 | μ > μ0 | 1

The first case of hypotheses is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two cases of hypotheses are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

TESTS FOR MEAN - KNOWN VARIANCE: (Z- Test)

Let X1, X2, ..., Xn be a random sample drawn from a normal population with known variance $\sigma^2$. Using sample data, conduct a one-sample Z-test.

Calculate the test statistic

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

where Z0 has the standard normal distribution when μ = μ0. Compare the calculated value of Z0 with Zα and Zα/2, which are the critical values from the normal distribution corresponding to α and α/2 probabilities, representing one tail and two-tail tests. The decisions are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject null hypothesis if
1 | μ = μ0 | μ ≠ μ0 | 2 | Z0 < -Zα/2 or Z0 > Zα/2
2 | μ ≥ μ0 | μ < μ0 | 1 | Z0 < -Zα
3 | μ ≤ μ0 | μ > μ0 | 1 | Z0 > Zα

EXAMPLE:

Aircrew escape systems are powered by a solid propellant. The burning rate of this propellant is an important product characteristic. Specifications require that the mean burning rate must be 50 centimeters per second. We know that the standard deviation of burning rate is σ = 2 centimeters per second. The experimenter decides to specify a type I error probability or significance level of α = 0.05, selects a random sample of n = 25 and obtains a sample average burning rate of $\bar{x} = 51.3$ centimeters per second. What conclusion should be drawn?

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is μ, the mean burning rate.

2. H0: μ = 50 centimeters per second

3. H1: μ ≠ 50 centimeters per second

4. α = 0.05

5. The test statistic is

$$Z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

6. Reject H0 if Z0 ≥ 1.96 or if Z0 ≤ -1.96. Note that this results from step 4, where we specified α = 0.05 and so the boundaries of the critical region are at $Z_{0.025} = 1.96$ and $-Z_{0.025} = -1.96$ from normal distribution tables.

7. Computations: Since $\bar{x} = 51.3$ and σ = 2,

$$Z_0 = \frac{51.3 - 50}{2/\sqrt{25}} = \frac{1.3}{0.4} = 3.25$$

8. Conclusion: Since Z0 = 3.25 > 1.96 (Zα/2), we reject H0: μ = 50 at the 0.05 level of significance. Stated more completely, we conclude that the mean burning rate differs from 50 centimeters per second, based on a sample of 25 measurements. In fact, there is strong evidence that the mean burning rate is not equal to 50 centimeters per second.

Note: In case of one tail tests:

(i) when the alternative is of the type H1: μ > 50 centimeters per second, the conclusion would be to reject H0: μ ≤ 50 at the 0.05 level of significance, since Z0 = 3.25 > 1.645 (Zα). Stated more completely, we conclude that the mean burning rate exceeds 50 centimeters per second, based on a sample of 25 measurements.

(ii) when the alternative is of the type H1: μ < 50 centimeters per second, the conclusion would be do not reject H0: μ ≥ 50 at the 0.05 level of significance, since Z0 = 3.25 > -1.645 (-Zα). Stated more completely, the sample of 25 measurements provides no evidence that the mean burning rate is less than 50 centimeters per second.
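The burning-rate calculation can be reproduced with a few lines of code. This is a sketch under the example's stated values (x̄ = 51.3, μ0 = 50, σ = 2, n = 25); the helper name `one_sample_z` is hypothetical, and the critical values are computed with the standard-library `statistics.NormalDist` instead of being read from tables.

```python
import math
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """Z statistic for a mean when the population standard deviation is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z0 = one_sample_z(xbar=51.3, mu0=50.0, sigma=2.0, n=25)
z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
z_one_tail = NormalDist().inv_cdf(1 - alpha)      # about 1.645

print("Z0 =", round(z0, 2))
print("two-tailed  (H1: mu != 50): reject H0?", abs(z0) > z_two_tail)
print("right-tailed (H1: mu > 50): reject H0?", z0 > z_one_tail)
print("left-tailed  (H1: mu < 50): reject H0?", z0 < -z_one_tail)
```

The three printed decisions correspond to the main example and to notes (i) and (ii).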

TESTS FOR MEAN - UNKNOWN VARIANCE: (t - Test)

Let X1, X2, … , Xn be a random sample drawn from a normal population with unknown variance $\sigma^2$. Using sample data, conduct a one-sample t-test.

Calculate the test statistic

$$t_0 = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

where s is the sample standard deviation and t0 has a t-distribution with (n-1) degrees of freedom when μ = μ0. Compare the calculated value of t0 with tα, n-1 and tα/2, n-1, which are the critical values from the t-distribution corresponding to α and α/2 probabilities, representing one tail and two-tail tests. The decisions are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject null hypothesis if
1 | μ = μ0 | μ ≠ μ0 | 2 | t0 < -tα/2, n-1 or t0 > tα/2, n-1
2 | μ ≥ μ0 | μ < μ0 | 1 | t0 < -tα, n-1
3 | μ ≤ μ0 | μ > μ0 | 1 | t0 > tα, n-1


EXAMPLE

The increased availability of light materials with high strength has revolutionized the design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads and very thin faces can result in much longer tee shots, especially for players of modest skills. This is due partly to the “spring-like effect” that the thin face imparts to the ball. Firing a golf ball at the head of the club and measuring the ratio of the outgoing velocity of the ball to the incoming velocity can quantify this spring-like effect. The ratio of velocities is called the coefficient of restitution of the club. An experiment was performed in which 15 drivers produced by a particular club maker were selected at random and their coefficients of restitution measured. In the experiment the golf balls were fired from an air cannon so that the incoming velocity and spin rate of the ball could be precisely controlled. It is of interest to determine if there is evidence (with α = 0.05) to support a claim that the mean coefficient of restitution exceeds 0.82. The observations are:

0.8411 0.8191 0.8182 0.8125 0.8750
0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is the mean coefficient of restitution, μ.

2. H0: μ = 0.82

3. H1: μ > 0.82. We want to reject H0 if the mean coefficient of restitution exceeds 0.82.

4. α = 0.05

5. The test statistic is

$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

6. Reject H0 if t0 ≥ 1.761 (t0.05, 14)

7. Computations: Since $\bar{x} = 0.83725$, s = 0.02456, μ0 = 0.82 and n = 15, we have

$$t_0 = \frac{0.83725 - 0.82}{0.02456/\sqrt{15}} = 2.72$$

8. Conclusion: Since t0 = 2.72 > 1.761 (t0.05, 14), we reject H0: μ = 0.82 at the 0.05 level of significance and conclude that the mean coefficient of restitution exceeds 0.82, based on a sample of 15 measurements.

Note: In case of a two tail test:

i. when the alternative is of the type H1: μ ≠ 0.82, the conclusion would be to reject H0: μ = 0.82 at the 0.05 level of significance, since t0 = 2.72 > 2.145 (t0.025, 14). Stated more completely, we conclude that the mean coefficient of restitution is not equal to 0.82, based on a sample of 15 measurements.

In case of a one tail test:

ii. when the alternative is of the type H1: μ < 0.82, the conclusion would be do not reject H0: μ = 0.82 at the 0.05 level of significance, since t0 = 2.72 > -1.761 (-t0.05, 14). Stated more completely, the sample of 15 measurements provides no evidence that the mean coefficient of restitution is less than 0.82.
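The summary statistics and t0 in this example can be recomputed directly from the 15 observations. The sketch below uses only the Python standard library, which has no t-distribution, so the critical values 1.761 and 2.145 are taken from the text (t-tables) rather than computed; the variable names are illustrative.

```python
import math
import statistics

# Coefficients of restitution for the 15 drivers, as listed in the example.
cor = [0.8411, 0.8191, 0.8182, 0.8125, 0.8750,
       0.8580, 0.8532, 0.8483, 0.8276, 0.7983,
       0.8042, 0.8730, 0.8282, 0.8359, 0.8660]

mu0 = 0.82
n = len(cor)
xbar = statistics.fmean(cor)
s = statistics.stdev(cor)  # sample standard deviation (divisor n - 1)
t0 = (xbar - mu0) / (s / math.sqrt(n))

T_ONE_TAIL = 1.761  # t_{0.05,14}, quoted from the text
T_TWO_TAIL = 2.145  # t_{0.025,14}, quoted from the text

print(f"n = {n}, xbar = {xbar:.5f}, s = {s:.5f}, t0 = {t0:.2f}")
print("H1: mu > 0.82  -> reject H0?", t0 >= T_ONE_TAIL)
print("H1: mu != 0.82 -> reject H0?", abs(t0) > T_TWO_TAIL)
print("H1: mu < 0.82  -> reject H0?", t0 <= -T_ONE_TAIL)
```

Working from the raw data rather than rounded summaries avoids small rounding differences in t0.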

PROBLEMS:

1. An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. Suppose a simple random sample of 50 engines is tested. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

2. Bon Air Elementary School has 300 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01.


TESTS FOR A POPULATION PROPORTION

Suppose that a random sample of size n is drawn from a large population and that X (≤ n) observations in this sample belong to a specified class of interest. Then $\hat{p} = X/n$ is a point estimator of the proportion p of the population that belongs to this class. Note that n and p are the parameters of a binomial distribution. When n is relatively large and p is not too close to either 0 or 1, $\hat{p}$ is approximately normal with mean p and variance p(1 - p)/n. For this approximation we require that np and n(1 - p) be greater than 5.

We will consider testing the following hypothesis

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | p = p0 | p ≠ p0 | 2
2 | p ≥ p0 | p < p0 | 1
3 | p ≤ p0 | p > p0 | 1

Let X be the number of observations in a random sample of size n that belong to the class associated with p. Using sample data, conduct a one-sample Z-test as follows:

Calculate the test statistic

$$Z_0 = \frac{X - np_0}{\sqrt{np_0(1 - p_0)}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$

where Z0 has approximately the standard normal distribution when p = p0. Compare the calculated value of Z0 with Zα and Zα/2, which are the critical values from the normal distribution corresponding to α and α/2 probabilities, representing one tail and two-tail tests. The decisions are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject null hypothesis if
1 | p = p0 | p ≠ p0 | 2 | Z0 < -Zα/2 or Z0 > Zα/2
2 | p ≥ p0 | p < p0 | 1 | Z0 < -Zα
3 | p ≤ p0 | p > p0 | 1 | Z0 > Zα

The first case of hypotheses is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two cases of hypotheses are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

EXAMPLE:

A semiconductor manufacturer produces controllers used in automobile engine applications. The customer requires that the process fraction defective at a critical manufacturing step not exceed 0.05 and that the manufacturer demonstrate process capability at this level of quality using α = 0.05. The semiconductor manufacturer takes a random sample of 200 devices and finds that 4 of them are defective. Can the manufacturer demonstrate process capability for the customer?

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is the process fraction defective p.

2. H0: p = 0.05

3. H1: p < 0.05 (this formulation of the problem will allow the manufacturer to make a strong claim about process capability if the null hypothesis H0: p = 0.05 is rejected).

4. α = 0.05

5. The test statistic is

$$Z_0 = \frac{X - np_0}{\sqrt{np_0(1 - p_0)}}$$

6. Reject H0 if Z0 ≤ -1.645. Note that this results from step 4, where we specified α = 0.05 for a one tail test and so the boundary of the critical region is at $-Z_{0.05} = -1.645$ from normal distribution tables.

7. Computations: Since X = 4, n = 200, p0 = 0.05 and $\hat{p} = 4/200 = 0.02$,

$$Z_0 = \frac{4 - 200(0.05)}{\sqrt{200(0.05)(0.95)}} = \frac{-6}{3.082} = -1.95$$

8. Conclusion: Since Z0 = -1.95 < -1.645 (-Zα), we reject H0: p = 0.05 at the 0.05 level of significance and conclude that the process fraction defective is less than 0.05. Hence the process is capable.
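The semiconductor example can be verified numerically. The following sketch uses the example's values (X = 4, n = 200, p0 = 0.05); the helper name `one_proportion_z` is hypothetical, and the critical value is obtained from the standard-library `statistics.NormalDist`. It also checks the rule of thumb that np0 and n(1 - p0) exceed 5 before relying on the normal approximation.

```python
import math
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """Z statistic for H0: p = p0 using the normal approximation to the binomial."""
    return (x - n * p0) / math.sqrt(n * p0 * (1 - p0))

x, n, p0, alpha = 4, 200, 0.05, 0.05

# Rule of thumb for the normal approximation: n*p0 > 5 and n*(1 - p0) > 5.
approximation_ok = n * p0 > 5 and n * (1 - p0) > 5

z0 = one_proportion_z(x, n, p0)
z_crit = NormalDist().inv_cdf(alpha)  # left-tail critical value, about -1.645

print("normal approximation reasonable?", approximation_ok)
print("Z0 =", round(z0, 2), " critical value =", round(z_crit, 3))
print("reject H0 (process capable)?", z0 < z_crit)
```

Note that Z0 ≈ -1.95 clears the one-tailed boundary of -1.645 but would not clear a two-tailed boundary of -1.96, so the choice of alternative matters here.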

PROBLEMS:

1. The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are very satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. Among the sampled customers, 73 percent say they are very satisfied. Based on these findings, can we reject the CEO's hypothesis that 80% of the customers are very satisfied? Use a 0.05 level of significance.

2. The previous problem is stated a little bit differently. Suppose the CEO claims that at least 80 percent of the company's 1,000,000 customers are very satisfied. Again, 100 customers are surveyed using simple random sampling. The result: 73 percent are very satisfied. Based on these results, should we accept or reject the CEO's hypothesis? Assume a significance level of 0.05.


TESTS FOR DIFFERENCE BETWEEN TWO POPULATION PROPORTIONS

We now consider the case where there are two binomial parameters of interest, say p1 and p2. Suppose that two independent random samples of sizes n1 and n2 are taken from two populations, and let X1 and X2 represent the number of observations that belong to the class of interest in samples 1 and 2, respectively. Let

$$\hat{p}_1 = \frac{X_1}{n_1}, \qquad \hat{p}_2 = \frac{X_2}{n_2}.$$

Then we test the following hypotheses:

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | p1 = p2 or p1 - p2 = 0 | p1 ≠ p2 or p1 - p2 ≠ 0 | 2
2 | p1 ≥ p2 or p1 - p2 ≥ 0 | p1 < p2 or p1 - p2 < 0 | 1
3 | p1 ≤ p2 or p1 - p2 ≤ 0 | p1 > p2 or p1 - p2 > 0 | 1

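Calculate the test statistic

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$

where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions and Z has approximately the standard normal distribution.

Under the null hypothesis p1 - p2 = 0 the test statistic becomes

$$Z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$

Compare the calculated value of Z0 with Zα and Zα/2, which are the critical values from the normal distribution corresponding to α and α/2 probabilities, representing one tail and two-tail tests. The decisions are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject null hypothesis if
1 | p1 = p2 or p1 - p2 = 0 | p1 ≠ p2 or p1 - p2 ≠ 0 | 2 | Z0 < -Zα/2 or Z0 > Zα/2
2 | p1 ≥ p2 or p1 - p2 ≥ 0 | p1 < p2 or p1 - p2 < 0 | 1 | Z0 < -Zα
3 | p1 ≤ p2 or p1 - p2 ≤ 0 | p1 > p2 or p1 - p2 > 0 | 1 | Z0 > Zα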

EXAMPLE

Suppose the Acme Drug Company develops a new drug, designed to prevent colds. The company states that the drug is equally effective for men and women. To test this claim, they choose a simple random sample of 100 women and 200 men from a population of 100,000 volunteers. At the end of the study, 38% of the women caught a cold; and 51% of the men caught a cold. Based on these findings, can we reject the company's claim that the drug is equally effective for men and women? Use a 0.05 level of significance.

We may solve this problem by following the eight-step procedure outlined as follows. This is a two tail test.

1. The parameters of interest are p1 and p2, the proportions of men and women, respectively, who caught a cold.

2. H0: p1 = p2

3. H1: p1 ≠ p2

4. α = 0.05

5. The test statistic is

$$Z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$

where $\hat{p}_1 = 0.51$, $\hat{p}_2 = 0.38$, n1 = 200 and n2 = 100.

6. Reject H0 if Z0 ≥ 1.96 or if Z0 ≤ -1.96. Note that this results from step 4, where we specified α = 0.05 and so the boundaries of the critical region are at $Z_{0.025} = 1.96$ and $-Z_{0.025} = -1.96$ from normal distribution tables.

7. Computations: The value of the test statistic is

$$Z_0 = \frac{0.51 - 0.38}{\sqrt{\dfrac{0.51(0.49)}{200} + \dfrac{0.38(0.62)}{100}}} = \frac{0.13}{0.0600} \approx 2.17$$

8. Conclusion: Since Z0 = 2.17 > 1.96 (Zα/2), we reject H0: p1 = p2 at the 0.05 level of significance. Hence we reject the claim that the drug is equally effective for men and women.

Note: In case of one tail tests:

i. when the alternative is of the type H1: p1 > p2, the conclusion would be to reject H0: p1 = p2 at the 0.05 level of significance, since Z0 = 2.17 > 1.645 (Zα), and conclude that a larger proportion of men than women caught a cold; that is, the drug is less effective for men than for women.

ii. when the alternative is of the type H1: p1 < p2, the conclusion would be do not reject H0: p1 = p2 at the 0.05 level of significance, since Z0 = 2.17 > -1.645 (-Zα). The data provide no evidence that a smaller proportion of men than women caught a cold.
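The value Z0 = 2.17 quoted above can be reproduced from the sample proportions. In the sketch below, the function `two_proportion_z` is hypothetical. Its default (`pooled=False`) estimates each proportion's variance separately, which matches the 2.17 in the example; the `pooled=True` option is an addition not taken from the text, showing a common textbook variant that pools both samples under H0 and gives a slightly smaller value. Both lead to the same decision here.

```python
import math

def two_proportion_z(p1_hat, n1, p2_hat, n2, pooled=False):
    """Z statistic for H0: p1 = p2.

    pooled=False: separate variance estimates for each sample.
    pooled=True : a single pooled proportion under H0 (a common variant,
                  included here only for comparison).
    """
    if pooled:
        p = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat) / se

# Sample 1: 200 men, 51% caught a cold. Sample 2: 100 women, 38% caught a cold.
z_unpooled = two_proportion_z(0.51, 200, 0.38, 100)
z_pooled = two_proportion_z(0.51, 200, 0.38, 100, pooled=True)

print("Z0 (separate variances):", round(z_unpooled, 2))
print("Z0 (pooled variance):   ", round(z_pooled, 2))
print("reject H0 at alpha = 0.05 (two-tailed)?", abs(z_unpooled) > 1.96)
```

When the two variants disagree about the decision, the choice of standard error should be stated explicitly alongside the result.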

TESTS FOR DIFFERENCE IN TWO MEANS - KNOWN VARIANCE: (Z- Test)

Let us see how to conduct a hypothesis test for the difference between two means from two independent populations with means μ1 and μ2 and known variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Inferences will be based on two random samples of sizes n1 and n2 respectively.

Let X11, X12, … , X1n1 be a random sample of n1 observations from the population with mean μ1 and variance $\sigma_1^2$, and let X21, X22, … , X2n2 be a random sample of n2 observations from the population with mean μ2 and variance $\sigma_2^2$. Assume that both populations are independent and normal.

Then the random variable

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has a N(0,1) distribution.

We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various hypotheses are stated in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | μ1 = μ2 or μ1 - μ2 = 0 | μ1 ≠ μ2 or μ1 - μ2 ≠ 0 | 2
2 | μ1 ≥ μ2 or μ1 - μ2 ≥ 0 | μ1 < μ2 or μ1 - μ2 < 0 | 1
3 | μ1 ≤ μ2 or μ1 - μ2 ≤ 0 | μ1 > μ2 or μ1 - μ2 > 0 | 1

Given the two sample observations, calculate the sample means $\bar{x}_1$ and $\bar{x}_2$. Under the null hypothesis the test statistic becomes

$$Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Compare the calculated value of Z0 with Zα and Zα/2, which are the critical values from the normal distribution corresponding to α and α/2 probabilities, representing one tail and two-tail tests. The decisions are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject null hypothesis if
1 | μ1 = μ2 or μ1 - μ2 = 0 | μ1 ≠ μ2 or μ1 - μ2 ≠ 0 | 2 | Z0 < -Zα/2 or Z0 > Zα/2
2 | μ1 ≥ μ2 or μ1 - μ2 ≥ 0 | μ1 < μ2 or μ1 - μ2 < 0 | 1 | Z0 < -Zα
3 | μ1 ≤ μ2 or μ1 - μ2 ≤ 0 | μ1 > μ2 or μ1 - μ2 > 0 | 1 | Z0 > Zα

EXAMPLE:

A product developer is interested in reducing the drying time of primer paint. Two formulations of the paint are tested. Formulation-1 is the standard chemistry and Formulation-2 has a new drying ingredient that should reduce the drying time. From experience it is known that the standard deviation of drying time is 8 minutes and this inherent variability should be unaffected by the addition of the new ingredient. Ten specimens are painted with Formulation-1 and another 10 specimens are painted with Formulation-2; the 20 specimens are painted in random order. The two sample average drying times are $\bar{x}_1 = 121$ minutes and $\bar{x}_2 = 112$ minutes respectively. What conclusions can the product developer draw about the effectiveness of the new ingredient, using α = 0.05?

We may solve this problem by following the eight-step procedure outlined as follows.

1. The parameter of interest is the difference in mean drying times, μ1 - μ2.

2. H0: μ1 - μ2 = 0, (or) μ1 = μ2

3. H1: μ1 - μ2 > 0, (or) μ1 > μ2. We want to reject H0 if the new ingredient reduces the drying time.

4. α = 0.05

5. The test statistic is

$$Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

6. Reject H0 if Z0 ≥ 1.645 (Zα). Note that this results from step 4, where we specified α = 0.05 and so the boundary of the critical region is at $Z_{0.05} = 1.645$ from normal distribution tables.

7. Computations: Since $\bar{x}_1 = 121$ minutes, $\bar{x}_2 = 112$ minutes, $\sigma_1^2 = \sigma_2^2 = 8^2 = 64$ minutes² and n1 = n2 = 10, the value of the test statistic is

$$Z_0 = \frac{121 - 112}{\sqrt{\dfrac{64}{10} + \dfrac{64}{10}}} = \frac{9}{3.578} = 2.52$$

8. Conclusion: Since Z0 = 2.52 > 1.645, we reject H0: μ1 - μ2 = 0 at the 0.05 level of significance and conclude that adding the new ingredient to the paint significantly reduces the drying time.

Note: (i) In case of a two tail test, when the alternative hypothesis is of the type H1: μ1 ≠ μ2, (or) H1: μ1 - μ2 ≠ 0, the conclusion would be to reject H0: μ1 - μ2 = 0 at the 0.05 level of significance, since Z0 = 2.52 > 1.96 (Zα/2). That is, the mean drying time is significantly different for the two types of primers.

(ii) In case of a one tail test, when the alternative hypothesis is of the type H1: μ1 < μ2, (or) H1: μ1 - μ2 < 0, the conclusion would be do not reject H0: μ1 - μ2 = 0 at the 0.05 level of significance, since Z0 = 2.52 > -1.645. The data provide no evidence that the mean drying time of Formulation-1 is less than that of Formulation-2.
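The drying-time calculation is short enough to check in code. This sketch uses the example's values (x̄1 = 121, x̄2 = 112, σ1 = σ2 = 8, n1 = n2 = 10); the helper name `two_sample_z` is hypothetical, and critical values are computed with the standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Z statistic for H0: mu1 - mu2 = 0 with known population standard deviations."""
    return (xbar1 - xbar2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

alpha = 0.05
z0 = two_sample_z(xbar1=121.0, xbar2=112.0, sigma1=8.0, sigma2=8.0, n1=10, n2=10)
z_one_tail = NormalDist().inv_cdf(1 - alpha)      # about 1.645
z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

print("Z0 =", round(z0, 2))
print("H1: mu1 > mu2  -> reject H0?", z0 >= z_one_tail)
print("H1: mu1 != mu2 -> reject H0?", abs(z0) > z_two_tail)
print("H1: mu1 < mu2  -> reject H0?", z0 <= -z_one_tail)
```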

TESTS FOR DIFFERENCE IN TWO MEANS - UNKNOWN VARIANCE: (t- Test)

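Let us see how to conduct a hypothesis test for the difference between two means from two independent populations with means μ1 and μ2 and unknown variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Here it is assumed that $\sigma_1^2 = \sigma_2^2 = \sigma^2$. That is, the variances of the two normal populations are unknown but equal. Inferences will be based on two random samples of sizes n1 and n2 respectively.

Let X11, X12, … , X1n1 be a random sample of n1 observations from the population with mean μ1, and let X21, X22, … , X2n2 be a random sample of n2 observations from the population with mean μ2, both with common unknown variance $\sigma^2$. Assume that both populations are independent and normal.

The pooled estimator of the common variance $\sigma^2$ from the samples is

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

where

$$s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}\left(X_{1i} - \bar{X}_1\right)^2 \quad \text{and} \quad s_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}\left(X_{2i} - \bar{X}_2\right)^2$$

are the two sample variances.

Then the random variable

$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has a t-distribution with n1 + n2 - 2 degrees of freedom.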

We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various hypotheses are stated in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | μ1 = μ2 or μ1 - μ2 = 0 | μ1 ≠ μ2 or μ1 - μ2 ≠ 0 | 2
2 | μ1 ≥ μ2 or μ1 - μ2 ≥ 0 | μ1 < μ2 or μ1 - μ2 < 0 | 1
3 | μ1 ≤ μ2 or μ1 - μ2 ≤ 0 | μ1 > μ2 or μ1 - μ2 > 0 | 1

Given the two sample observations, calculate the sample means $\bar{x}_1$ and $\bar{x}_2$ and the pooled estimate sp. Under the null hypothesis the test statistic becomes

$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Compare the calculated value of t0 with tα and tα/2, which are the critical values from the t-distribution with n1 + n2 - 2 degrees of freedom corresponding to α and α/2 probabilities, representing one tail and two-tail tests. The decisions are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject null hypothesis if
1 | μ1 = μ2 or μ1 - μ2 = 0 | μ1 ≠ μ2 or μ1 - μ2 ≠ 0 | 2 | t0 < -tα/2 or t0 > tα/2
2 | μ1 ≥ μ2 or μ1 - μ2 ≥ 0 | μ1 < μ2 or μ1 - μ2 < 0 | 1 | t0 < -tα
3 | μ1 ≤ μ2 or μ1 - μ2 ≤ 0 | μ1 > μ2 or μ1 - μ2 > 0 | 1 | t0 > tα

EXAMPLE:

Two catalysts are being analyzed to determine how they affect the mean yield of a chemical process. Specifically, catalyst-1 is currently in use, but catalyst-2 is acceptable. Since catalyst-2 is cheaper, it should be adopted if it does not change the process yield. A test is run in the pilot plant and the results are shown in the table below. Is there any difference between the mean yields? Use α = 0.05, and assume equal variances.

Observation Number | Catalyst-1 | Catalyst-2
1 | 91.50 | 89.19
2 | 94.18 | 90.95
3 | 92.18 | 90.46
4 | 95.39 | 93.21
5 | 91.79 | 97.19
6 | 89.07 | 97.04
7 | 94.72 | 91.07
8 | 89.21 | 92.75


From the data it can be calculated that $\bar{x}_1$ = 92.255, $\bar{x}_2$ = 92.733, s1 = 2.39 and s2 = 2.98.

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameters of interest are μ1 and μ2, and we want to know if μ1 - μ2 = 0.

2. H0: μ1 - μ2 = 0, (or) μ1 = μ2

3. H1: μ1 - μ2 ≠ 0, (or) μ1 ≠ μ2

4. α = 0.05

5. The test statistic is

$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

6. Reject H0 if t0 > 2.145 (= t0.025,14) (or) if t0 < -2.145 (= -t0.025,14). Note that this results from step 4, where we specified α = 0.05, so the boundaries of the critical region are at ±t0.025,14 = ±2.145 from t-distribution tables (two tail).

7. Computations: We have $\bar{x}_1$ = 92.255, $\bar{x}_2$ = 92.733, s1 = 2.39, s2 = 2.98 and n1 = n2 = 8. Therefore,

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{7(2.39)^2 + 7(2.98)^2}{8 + 8 - 2} = 7.30$$

and sp = √7.30 = 2.70.

Hence the value of the test statistic is

$$t_0 = \frac{92.255 - 92.733}{2.70\sqrt{\frac{1}{8} + \frac{1}{8}}} = -0.35$$

8. Conclusion: Since -2.145 < t0 = -0.35 < 2.145, we do not reject H0: μ1 - μ2 = 0 at the 0.05 level of significance and conclude that there is no strong evidence that catalyst-2 results in a mean yield that differs from the mean yield when catalyst-1 is used.

Note: In case of one-tail tests:

i. when the alternative is of the type H1: μ1 - μ2 > 0 (or) H1: μ1 > μ2, the conclusion would be to not reject H0: μ1 - μ2 = 0, (or) μ1 = μ2 at the 0.05 level of significance, since t0 = -0.35 < 1.761 (= t0.05,14). We conclude that the mean yield with catalyst-1 is not significantly greater than the mean yield with catalyst-2.

ii. when the alternative is of the type H1: μ1 - μ2 < 0 (or) H1: μ1 < μ2, the conclusion would be to not reject H0: μ1 - μ2 = 0, (or) μ1 = μ2 at the 0.05 level of significance, since t0 = -0.35 > -1.761 (= -t0.05,14). We conclude that the mean yield with catalyst-1 is not significantly less than the mean yield with catalyst-2.
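The catalyst calculation can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name `pooled_t_statistic` and the constant `T_CRIT` are illustrative names introduced here, and the critical value is simply copied from the t-table value quoted in the example rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Yield data from the catalyst example.
catalyst_1 = [91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
catalyst_2 = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]


def pooled_t_statistic(x1, x2):
    """Return (t0, sp, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)  # sample standard deviations (divisor n - 1)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    sp = sqrt(sp2)
    # Test statistic under H0: mu1 - mu2 = 0.
    t0 = (mean(x1) - mean(x2)) / (sp * sqrt(1 / n1 + 1 / n2))
    return t0, sp, n1 + n2 - 2


T_CRIT = 2.145  # t0.025,14, taken from the table value quoted in the text

t0, sp, dof = pooled_t_statistic(catalyst_1, catalyst_2)
reject_h0 = abs(t0) > T_CRIT  # two-tail decision rule
print(f"sp = {sp:.2f}, t0 = {t0:.2f}, df = {dof}, reject H0: {reject_h0}")
```

Run on the data above, the sketch gives sp of about 2.70 and t0 of about -0.35 with 14 degrees of freedom, so H0 is not rejected, in line with the hand calculation.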

PROBLEMS

1. Within a school district, students were randomly assigned to one of two Math teachers, Mrs. Smitha and Mrs. Lakshmi. After the assignment, Mrs. Smitha had 30 students, and Mrs. Lakshmi had 25 students. At the end of the year, each class took the same standardized test. Mrs. Smitha's students had an average test score of 78, with a standard deviation of 10; and Mrs. Lakshmi's students had an average test score of 85, with a standard deviation of 15. Test the hypothesis that Mrs. Smitha and Mrs. Lakshmi are equally effective teachers. Use a 0.10 level of significance. (Assume that student performance is approximately normal.)

2. The Acme Company has developed a new battery. The engineer in charge claims that the new battery will operate continuously for at least 7 minutes longer than the old battery. To test the claim, the company selects a simple random sample of 100 new batteries and 100 old batteries. The old batteries run continuously for 190 minutes with a standard deviation of 20 minutes; the new batteries, 200 minutes with a standard deviation of 40 minutes. Test the engineer's claim that the new batteries run at least 7 minutes longer than the old. Use a 0.05 level of significance.


PAIRED t-TEST

The paired t-test is generally used when measurements are taken from the same subject before and after some manipulation, such as injection of a drug. For example, a paired t-test can be used to determine the significance of a difference in blood pressure before and after administration of an experimental drug. The paired t-test may also be used to compare samples that are subjected to different conditions, provided the samples in each pair are otherwise identical. For example, we might test the effectiveness of a water additive in reducing bacterial numbers by sampling water from different sources and comparing bacterial counts in the treated versus untreated water sample. Each different water source would give a different pair of data points.

The number of points in each data set must be the same, and they must be organized in pairs, in which there is a definite relationship between each pair of data points. Clearly for paired t-test, the data is dependent, i.e. there is a one-to-one correspondence between the values in the two samples. For example, same subject measured before and after a process change or same subject measured at different times.

Let (X11, X21), (X12, X22), …, (X1n, X2n) be a set of n paired observations of a sample drawn from two populations with means μ1 and μ2 and variances σ1² and σ2² respectively.

Define the differences between each pair of observations as Dj = X1j - X2j, j = 1, 2, …, n. The Dj's are assumed to be normally distributed with mean μD = μ1 - μ2 and variance σD². Hence testing a hypothesis about the difference between μ1 and μ2 can be accomplished by performing a one-sample t-test on μD.

Then,

$$T = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$$

has a t-distribution with (n-1) degrees of freedom. An estimator of σD² is given by

$$s_D^2 = \frac{1}{n-1}\sum_{j=1}^{n}(d_j - \bar{d})^2,$$

where dj = x1j - x2j and

$$\bar{d} = \frac{1}{n}\sum_{j=1}^{n} d_j.$$

We now consider hypothesis testing on the difference in the means, μ1 - μ2. The various hypotheses are stated in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | μD = 0 | μD ≠ 0 | 2
2 | μD ≥ 0 | μD < 0 | 1
3 | μD ≤ 0 | μD > 0 | 1

Given the pairs of sample observations, calculate $\bar{d}$ and sD². Under the null hypothesis the test statistic becomes

$$t_0 = \frac{\bar{d}}{s_D/\sqrt{n}}$$

where t0 has a t-distribution with (n-1) degrees of freedom. Compare the calculated value of t0 with $t_{\alpha,\,n-1}$ and $t_{\alpha/2,\,n-1}$, which are the critical values from the t-distribution corresponding to α and α/2 probabilities, representing one-tail and two-tail tests. The decisions are given in the following table.

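Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject Null Hypothesis
1 | μD = 0 | μD ≠ 0 | 2 | $t_0 > t_{\alpha/2,\,n-1}$ or $t_0 < -t_{\alpha/2,\,n-1}$
2 | μD ≥ 0 | μD < 0 | 1 | $t_0 < -t_{\alpha,\,n-1}$
3 | μD ≤ 0 | μD > 0 | 1 | $t_0 > t_{\alpha,\,n-1}$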

EXAMPLE:

The following data refer to strength predictions for nine steel plate girders by the Karlsruhe and Lehigh methods. Test whether there is any significant difference between the two methods.

Girder | Karlsruhe Method | Lehigh Method | Difference dj
1 | 1.186 | 1.061 | 0.119
2 | 1.151 | 0.992 | 0.159
3 | 1.322 | 1.063 | 0.259
4 | 1.339 | 1.062 | 0.277
5 | 1.200 | 1.065 | 0.138
6 | 1.402 | 1.178 | 0.224
7 | 1.365 | 1.037 | 0.328
8 | 1.537 | 1.086 | 0.451
9 | 1.559 | 1.052 | 0.507

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameter of interest is the difference in mean strength between the two methods, μD = μ1 - μ2.

2. H0: μD = 0

3. H1: μD ≠ 0

4. α = 0.05

5. The test statistic is

$$t_0 = \frac{\bar{d}}{s_D/\sqrt{n}}$$

6. Reject H0 if t0 > 2.306 (= t0.025,8) (or) if t0 < -2.306 (= -t0.025,8). Note that this results from step 4, where we specified α = 0.05, so the boundaries of the critical region are at ±t0.025,8 = ±2.306 from t-distribution tables (two tail).

7. Computations: We have $\bar{d}$ = 0.2736, sD = 0.1356 and n = 9. Therefore,

$$t_0 = \frac{0.2736}{0.1356/\sqrt{9}} = 6.05$$

8. Conclusion: Since t0 = 6.05 > 2.306 ( = t0.025,8) we reject H0: μD = 0 at the 0.05 level of significance and conclude that the strength prediction methods yield different results.

Note: In case of one-tail tests:

i. when the alternative is of the type H1: μD > 0, the conclusion would be to reject H0: μD = 0 at the 0.05 level of significance, since t0 = 6.05 > 1.860 (= t0.05,8). Specifically, the data indicate that the Karlsruhe method produces, on average, higher strength predictions than does the Lehigh method.

ii. when the alternative is of the type H1: μD < 0, the conclusion would be to not reject H0: μD = 0 at the 0.05 level of significance, since t0 = 6.05 > -1.860 (= -t0.05,8). That is, the data give no evidence that the Karlsruhe method produces, on average, lower strength predictions than does the Lehigh method.
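The paired calculation reduces to a one-sample t-test on the differences, which can be sketched in a few lines of Python. This is an illustrative sketch only: `paired_t_statistic` and `T_CRIT` are names introduced here, the differences are the dj values exactly as tabulated in the example, and the critical value is copied from the t-table value quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev

# Differences d_j (Karlsruhe minus Lehigh) as tabulated in the girder example.
differences = [0.119, 0.159, 0.259, 0.277, 0.138, 0.224, 0.328, 0.451, 0.507]


def paired_t_statistic(d):
    """Return (d_bar, s_D, t0) for a paired t-test of H0: mu_D = 0."""
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation of the differences (divisor n - 1)
    t0 = d_bar / (s_d / sqrt(n))
    return d_bar, s_d, t0


T_CRIT = 2.306  # t0.025,8, taken from the table value quoted in the text

d_bar, s_d, t0 = paired_t_statistic(differences)
reject_h0 = abs(t0) > T_CRIT  # two-tail decision rule
print(f"d_bar = {d_bar:.4f}, s_D = {s_d:.4f}, t0 = {t0:.2f}, reject H0: {reject_h0}")
```

With the tabulated differences, the sketch reproduces the mean difference of about 0.2736, sD of about 0.1356 and t0 of about 6.05, so H0 is rejected at the 0.05 level.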

TEST FOR SINGLE VARIANCE

Suppose that we wish to test the hypothesis that the variance of a normal population σ² equals a specified value, say σ0², or equivalently, that the standard deviation σ is equal to σ0. Let X1, X2, …, Xn be a random sample of n observations from this population.

The table below shows three sets of hypotheses for testing the variance. Each makes a statement about how the population variance σ² is related to a specified value σ0².

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | σ² = σ0² | σ² ≠ σ0² | 2
2 | σ² ≥ σ0² | σ² < σ0² | 1
3 | σ² ≤ σ0² | σ² > σ0² | 1

We use the test statistic

$$\chi_0^2 = \frac{(n-1)S^2}{\sigma_0^2}.$$

Under the null hypothesis H0: σ² = σ0², the statistic χ0² has a chi-square distribution with (n-1) degrees of freedom.

Given a sample of observations, calculate s², the sample variance.

Under the null hypothesis the test statistic becomes

$$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2},$$

where χ0² has a chi-square distribution with (n-1) degrees of freedom. Compare the calculated value with $\chi^2_{\alpha,\,n-1}$ and $\chi^2_{\alpha/2,\,n-1}$ (and the corresponding lower-tail values $\chi^2_{1-\alpha,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$), which are the critical values from the chi-square distribution corresponding to α and α/2 probabilities, representing one-tail and two-tail tests. The decisions are given in the following table.

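Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject Null Hypothesis
1 | σ² = σ0² | σ² ≠ σ0² | 2 | $\chi_0^2 \ge \chi^2_{\alpha/2,\,n-1}$ or $\chi_0^2 \le \chi^2_{1-\alpha/2,\,n-1}$
2 | σ² ≥ σ0² | σ² < σ0² | 1 | $\chi_0^2 \le \chi^2_{1-\alpha,\,n-1}$
3 | σ² ≤ σ0² | σ² > σ0² | 1 | $\chi_0^2 \ge \chi^2_{\alpha,\,n-1}$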

EXAMPLE:

An automatic filling machine is used to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable proportion of bottles will be underfilled or overfilled. Is there evidence in the sample data to suggest that the manufacturer has a problem with underfilled or overfilled bottles? Use α = 0.05 and assume that fill volume has a normal distribution.

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameter of interest is the population variance, σ².

2. H0: σ² = 0.01

3. H1: σ² > 0.01

4. α = 0.05

5. The test statistic is

$$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

6. Reject H0 if χ0² > 30.14 (= $\chi^2_{0.05,\,19}$). Note that this results from step 4, where we specified α = 0.05, so the boundary of the critical region is at $\chi^2_{0.05,\,19}$ = 30.14 from chi-square distribution tables (one tail).

7. Computations: We have s² = 0.0153, n = 20 and σ0² = 0.01. Therefore,

$$\chi_0^2 = \frac{19(0.0153)}{0.01} = 29.07$$

8. Conclusion: Since χ0² = 29.07 < 30.14 (= $\chi^2_{0.05,\,19}$), we do not reject the null hypothesis H0: σ² = 0.01 at the 0.05 level of significance and conclude that there is no strong evidence that the variance of fill volume exceeds 0.01 (fluid ounces)².

Note: (i) In case of a two-tail test, when the alternative hypothesis is of the type H1: σ² ≠ 0.01, the conclusion would be to not reject H0: σ² = 0.01 at the 0.05 level of significance, since χ0² = 29.07 < 32.85 (= $\chi^2_{0.025,\,19}$) and χ0² = 29.07 > 8.91 (= $\chi^2_{0.975,\,19}$), and conclude that there is no strong evidence that the variance of fill volume differs from 0.01 (fluid ounces)².

(ii) In case of a one-tail test, when the alternative hypothesis is of the type H1: σ² < 0.01, the conclusion would be to not reject the null hypothesis H0: σ² = 0.01 at the 0.05 level of significance, since χ0² = 29.07 > 10.12 (= $\chi^2_{0.95,\,19}$), and conclude that there is no strong evidence that the variance of fill volume is less than 0.01 (fluid ounces)².
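The three decision rules for a single variance can be laid side by side in a short Python sketch. The function name `chi_square_variance_statistic` and the constant names are illustrative assumptions introduced here; the critical values are not computed but copied from the chi-square table values quoted in the example (19 degrees of freedom).

```python
def chi_square_variance_statistic(n, sample_variance, sigma0_squared):
    """Return chi0^2 = (n - 1) * s^2 / sigma0^2 for a test on a single variance."""
    return (n - 1) * sample_variance / sigma0_squared


# Table values for 19 degrees of freedom, as quoted in the example.
CHI2_UPPER_05 = 30.14   # upper 5% point (one-tail, H1: variance > 0.01)
CHI2_UPPER_025 = 32.85  # upper 2.5% point (two-tail)
CHI2_LOWER_975 = 8.91   # lower 2.5% point (two-tail)
CHI2_LOWER_95 = 10.12   # lower 5% point (one-tail, H1: variance < 0.01)

chi2_0 = chi_square_variance_statistic(n=20, sample_variance=0.0153, sigma0_squared=0.01)

reject_upper = chi2_0 > CHI2_UPPER_05
reject_two_tail = chi2_0 >= CHI2_UPPER_025 or chi2_0 <= CHI2_LOWER_975
reject_lower = chi2_0 < CHI2_LOWER_95

print(f"chi0^2 = {chi2_0:.2f}")
print(f"reject (H1: > 0.01): {reject_upper}")
print(f"reject (H1: != 0.01): {reject_two_tail}")
print(f"reject (H1: < 0.01): {reject_lower}")
```

For the filling-machine data the statistic is 29.07, and none of the three alternatives leads to rejection of H0 at the 0.05 level.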

TEST FOR EQUALITY OF VARIANCES

Suppose that two independent normal populations are of interest, where the population means and variances, say, μ1, σ1², μ2 and σ2², are unknown. We wish to test the hypothesis about the equality of the two variances, say, H0: σ1² = σ2². Assume that two random samples of sizes n1 and n2 are available from the two populations respectively, and let s1² and s2² be the respective sample variances based on the two samples.

The null and alternative hypotheses are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails
1 | σ1² = σ2² | σ1² ≠ σ2² | 2
2 | σ1² ≥ σ2² | σ1² < σ2² | 1
3 | σ1² ≤ σ2² | σ1² > σ2² | 1

Let X11, X12, …, X1n1 be a random sample of n1 observations from the population with mean μ1 and variance σ1², and let X21, X22, …, X2n2 be a random sample of n2 observations from the population with mean μ2 and variance σ2². Assume that both populations are independent and normal. Let s1² and s2² be the respective sample variances based on the two samples.

Define the ratio

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}.$$

This F statistic has an F-distribution with (n1-1) numerator degrees of freedom and (n2-1) denominator degrees of freedom.

Under the null hypothesis the test statistic becomes

$$F_0 = \frac{s_1^2}{s_2^2}.$$

Compare the calculated value of F0 with Fα and Fα/2 which are the critical values from F-distribution against [(n1-1), (n2-1)] degrees of freedom corresponding to α and α/2 probabilities representing one tail and two-tail tests. The decisions are given in the following table.

Case | Null hypothesis | Alternative hypothesis | Number of tails | Reject Null Hypothesis
1 | σ1² = σ2² | σ1² ≠ σ2² | 2 | F0 ≥ Fα/2 or F0 ≤ F1-α/2
2 | σ1² ≥ σ2² | σ1² < σ2² | 1 | F0 ≤ F1-α
3 | σ1² ≤ σ2² | σ1² > σ2² | 1 | F0 ≥ Fα

EXAMPLE:

In comparing the variability of the tensile strength of two kinds of structural steel, an experiment yielded the following results: n1 = 13, s1² = 19.2, n2 = 16 and s2² = 3.5, where the units of measurement are thousand pounds per square inch. Assuming that the measurements constitute independent random samples from two normal populations, test the null hypothesis σ1² = σ2² against the alternative σ1² ≠ σ2² at the α = 0.05 level of significance.

The problem is solved using the eight-step hypothesis testing procedure as follows.

1. The parameters of interest are σ1² and σ2².

2. H0: σ1² = σ2²

3. H1: σ1² ≠ σ2²

4. α = 0.05

5. The test statistic is

$$F_0 = \frac{s_1^2}{s_2^2}$$

6. Reject H0 if F0 ≥ 2.96 (= Fα/2) or F0 ≤ 0.350 (= F1-α/2), with (12, 15) degrees of freedom. Note that this results from step 4, where we specified α = 0.05 (two tail).

7. Computations: We have s1² = 19.2 and s2² = 3.5. Therefore, F0 = (19.2/3.5) = 5.49.

8. Conclusion: Since F0 = 5.49 > 2.96 (= Fα/2), we reject the null hypothesis H0: σ1² = σ2² and conclude that the variability of the tensile strength of the two kinds of steel is not the same.

Note: In case of one-tail tests:

i. when the alternative is of the type H1: σ1² > σ2², the conclusion would be to reject H0: σ1² ≤ σ2² at the 0.05 level of significance, since F0 = 5.49 > 2.40 (= Fα), and conclude that the variability of the tensile strength of the first kind of steel is greater than that of the second kind of steel.

ii. when the alternative is of the type H1: σ1² < σ2², the conclusion would be to not reject H0: σ1² ≥ σ2² at the 0.05 level of significance, since F0 = 5.49 > 0.371 (= F1-α), and conclude that there is no evidence that the variability of the tensile strength of the first kind of steel is less than that of the second kind of steel.
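The variance-ratio calculation for the steel example is small enough to verify in a few lines of Python. This is a sketch under stated assumptions: `variance_ratio`, `F_UPPER` and `F_LOWER` are names introduced here, and the two critical values are copied from the values quoted in step 6 of the example rather than computed.

```python
def variance_ratio(s1_squared, s2_squared):
    """Return F0 = s1^2 / s2^2 for the test of equality of two variances."""
    return s1_squared / s2_squared


n1, s1_squared = 13, 19.2
n2, s2_squared = 16, 3.5

f0 = variance_ratio(s1_squared, s2_squared)
dof = (n1 - 1, n2 - 1)  # (numerator, denominator) degrees of freedom

# Critical values as quoted in step 6 of the example (two-tail test).
F_UPPER = 2.96
F_LOWER = 0.350

reject_h0 = f0 >= F_UPPER or f0 <= F_LOWER
print(f"F0 = {f0:.2f}, df = {dof}, reject H0: {reject_h0}")
```

The ratio comes to about 5.49 with (12, 15) degrees of freedom, which lies above the upper critical value, so the hypothesis of equal variances is rejected.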

CHI-SQUARE TEST FOR GOODNESS OF FIT

The Chi-Square goodness of fit test is a non-parametric test used to find out whether the observed values of a given phenomenon differ significantly from the expected values. The term goodness of fit refers to comparing the observed sample distribution with the expected probability distribution: the test determines how well a theoretical distribution (such as normal, binomial, or Poisson) fits the empirical distribution. The sample data are divided into intervals, and the number of points that fall into each interval is compared with the expected number of points in that interval.


PROCEDURE FOR CHI-SQUARE GOODNESS OF FIT TEST

1. Set up the hypothesis for Chi-Square goodness of fit test:

a.   Null hypothesis: In Chi-Square goodness of fit test, the null hypothesis assumes that there is no significant difference between the observed and the expected value. In other words, the data follows a specified distribution.

b.   Alternative hypothesis: In Chi-Square goodness of fit test, the alternative hypothesis assumes that there is a significant difference between the observed and the expected value. In other words, the data does not follow a specified distribution.

2. Compute the value of the Chi-Square goodness of fit test statistic using the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where χ² = Chi-Square goodness of fit test statistic, O = observed value and E = expected value.

The test statistic follows, approximately, a χ² distribution with (k - c) degrees of freedom, where k is the number of non-empty cells and c = the number of parameters to be estimated + 1.

3. Degrees of freedom: In the Chi-Square goodness of fit test, the degrees of freedom depend on the distribution fitted to the sample. The following table shows the distribution and the associated degrees of freedom:

Type of distribution | c | Degrees of freedom
Binomial distribution (if p is estimated) | 2 | k - 2
Poisson distribution | 2 | k - 2
Normal distribution | 3 | k - 3

4. Hypothesis testing: Hypothesis testing in the Chi-Square goodness of fit test is the same as in other tests, like the Z-test, t-test, etc. The calculated value of the test statistic is compared with the table value corresponding to (k-c) degrees of freedom at the α level of significance. If the calculated value is greater than the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies. If the calculated value is less than the table value, we do not reject the null hypothesis and conclude that there is no significant difference between the observed and expected frequencies.

EXAMPLES

1. In 200 flips of a coin, one would expect 100 heads and 100 tails. But what if 92 heads and 108 tails are observed? Would we reject the hypothesis that the coin is fair? Or would we attribute the difference between observed and expected frequencies to random fluctuation?

Solution:

Null hypothesis: The frequency of heads is equal to the frequency of tails.

Alternative hypothesis: The frequency of heads is not equal to the frequency of tails.

The expected frequencies in each of the two categories (heads or tails) are not independent. To obtain the expected frequency of tails (100), we need only subtract the expected frequency of heads (100) from the total frequency (200), or 200 - 100 = 100. Thus, given the expected frequency in one of the categories, the expected frequency in the other is readily determined. In other words, only the expected frequency in one of the two categories is free to vary; that is, there is only 1 degree of freedom.

The calculation of the χ² statistic is shown in the table below.

Face | O | E | O-E | (O-E)² | (O-E)²/E
Heads | 92 | 100 | -8 | 64 | 0.64
Tails | 108 | 100 | 8 | 64 | 0.64
Total | 200 | 200 | 0 | | χ² = 1.28

Conclusion: The critical values of χ² for 1 degree of freedom, with α = .05 and α = .01, are 3.841 and 6.635, respectively. As the calculated value of χ² is less than the table value at both the α = .05 and α = .01 levels of significance, we do not reject the null hypothesis and conclude that the data are consistent with a fair coin. That is, the frequency of heads does not differ significantly from the frequency of tails.

2. Suppose we hypothesize that we have an unbiased six-sided die. To test this hypothesis, we roll the die 300 times and observe the frequency of occurrence of each of the faces. Because we hypothesized that the die is unbiased, we expect that the number on each face will occur 50 times. However, suppose we observe frequencies of occurrence as follows:

Face value | 1 | 2 | 3 | 4 | 5 | 6
Occurrence | 42 | 55 | 38 | 57 | 64 | 44

What would we conclude? Is the die biased, or do we attribute the difference to random fluctuation?

Solution:

Null hypothesis: The die is fair. In other words, the frequency of occurrence of each of the six faces of the die is the same.

Alternative hypothesis: The die is not fair. In other words, the frequency of occurrence of each of the six faces of the die is not the same.

There are six possible categories of outcomes: the occurrence of the six faces. Under the assumption that the die is fair, we would expect that the frequency of occurrence of each of the six faces of the die would be 50. Note again that the expected frequencies in each of these categories are not independent. Once the expected frequency for five of the categories is known, the expected frequency of the sixth category is uniquely determined, since the total frequency equals 300. Thus, only the expected frequencies in five of the six categories are free to vary; there are only 5 degrees of freedom associated with this example.

The calculation of the statistic is shown in the table below.

Face value | O | E | O-E | (O-E)² | (O-E)²/E
1 | 42 | 50 | -8 | 64 | 1.28
2 | 55 | 50 | 5 | 25 | 0.50
3 | 38 | 50 | -12 | 144 | 2.88
4 | 57 | 50 | 7 | 49 | 0.98
5 | 64 | 50 | 14 | 196 | 3.92
6 | 44 | 50 | -6 | 36 | 0.72
Total | 300 | 300 | 0 | | χ² = 10.28

Conclusion: The critical values of χ² for 5 degrees of freedom, with α = .05 and α = .01, are 11.070 and 15.086, respectively. As the calculated value of χ² is less than the table value at both the α = .05 and α = .01 levels of significance, we do not reject the null hypothesis and conclude that the data are consistent with a fair die. That is, the frequencies of occurrence of the six faces of the die do not differ significantly.


3. The president of a major university hypothesizes that at least 90 percent of the teaching and research faculty will favour a new university policy on consulting with private and public agencies within the state. Thus, for a random sample of 200 faculty members, the president would expect 0.90 x 200 = 180 to favour the new policy and 0.10 x 200 = 20 to oppose it. Suppose, however, that for this sample 168 faculty members favour the new policy and 32 oppose it. Is the difference between observed and expected frequencies sufficient to reject the president's hypothesis that 90 percent would favour the policy? Or would the differences be attributed to chance fluctuation?

Solution:

Null hypothesis: The faculty favouring the new policy is 90 percent.

Alternative hypothesis: The faculty favouring the new policy is not 90 percent.

The expected number of faculty members who oppose it (20) can be found by subtracting the expected number who supports it (180) from the total number in the sample (200), or 200 - 180 = 20. Thus, given the expected frequency in one of the categories, the expected frequency in the other is readily determined. In other words, only the expected frequency in one of the two categories is free to vary; that is, there is only 1 degree of freedom.

The calculation of the statistic is shown in the table below.

Disposition | O | E | O-E | (O-E)² | (O-E)²/E
Favour | 168 | 180 | -12 | 144 | 0.80
Oppose | 32 | 20 | 12 | 144 | 7.20
Total | 200 | 200 | 0 | | χ² = 8.00

Conclusion: The critical values of χ² for 1 degree of freedom, with α = .05 and α = .01, are 3.841 and 6.635, respectively. As the calculated value of χ² is greater than the table value at both the α = .05 and α = .01 levels of significance, we reject the null hypothesis and conclude that the proportion of faculty favouring the new policy is not 90 percent.
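The three goodness of fit examples share one computation, so they can be checked together. The sketch below is illustrative: `chi_square_gof` and the `examples` dictionary are names introduced here, and the α = .05 critical values are copied from the table values quoted in the conclusions rather than computed.

```python
def chi_square_gof(observed, expected):
    """Return the goodness of fit statistic: sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# (observed, expected, critical value at alpha = .05 as quoted in the text)
examples = {
    "coin": ([92, 108], [100, 100], 3.841),
    "die": ([42, 55, 38, 57, 64, 44], [50] * 6, 11.070),
    "faculty": ([168, 32], [180, 20], 3.841),
}

results = {}
for name, (observed, expected, critical) in examples.items():
    statistic = chi_square_gof(observed, expected)
    results[name] = (statistic, statistic > critical)  # (chi-square, reject H0?)
    print(f"{name}: chi-square = {statistic:.2f}, reject H0: {statistic > critical}")
```

The statistics come to 1.28, 10.28 and 8.00 respectively, so only the faculty example leads to rejection of the null hypothesis at the .05 level, matching the three conclusions.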


CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES

CONTINGENCY TABLES

A frequency table in which a sample is classified according to the distinct classes of two different attributes is called a contingency table. It is often of interest to test the hypothesis that, in the population from which the sample was drawn, the two attributes are independent. An m×n contingency table has m rows and n columns.

A typical m×n contingency table is given below:

Rows (Attribute 2) \ Columns (Attribute 1) | 1 | 2 | ... | j | ... | n | Total
1 | O11 | O12 | ... | O1j | ... | O1n | R1
2 | O21 | O22 | ... | O2j | ... | O2n | R2
... | ... | ... | ... | ... | ... | ... | ...
i | Oi1 | Oi2 | ... | Oij | ... | Oin | Ri
... | ... | ... | ... | ... | ... | ... | ...
m | Om1 | Om2 | ... | Omj | ... | Omn | Rm
Total | C1 | C2 | ... | Cj | ... | Cn | N

The expected frequencies are obtained as follows: The expected frequency of the cell corresponding to the i-th row and j-th column is found by

$$E_{ij} = \frac{R_i \, C_j}{N}$$

Using the observed frequencies given in the contingency table and the expected frequencies found using the above formula, we may test the null hypothesis that,

H0: The two attributes are independent (or) the two attributes are not associated.

H1: The two attributes are not independent (or) the two attributes are associated.

The test statistic to test the above hypothesis is given by

$$\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

or simply, for ease of understanding, $\chi^2 = \sum \frac{(O - E)^2}{E}$.

This statistic has a chi-square distribution with (m-1)(n-1) degrees of freedom.

The decision is to reject the null hypothesis H0 if the calculated value of χ², say χ0², is greater than the table value of χ² at the α level of significance corresponding to (m-1)(n-1) degrees of freedom.


EXAMPLE

The following data were collected in a study on the effectiveness of inoculation for a particular disease. The two attributes in this case are;

Attribute A: whether or not the person was inoculated; and

Attribute B: whether or not they contracted the disease

The 2x2 contingency table of observed frequencies is

Attribute A \ Attribute B | Disease | No disease | Total
Inoculated | 10 | 50 | 60
Not inoculated | 30 | 40 | 70
Total | 40 | 90 | 130

In this case the null hypothesis and alternative hypothesis are stated as,

H0: Contracting the disease is independent of inoculation

H1: Contracting the disease is not independent of inoculation

Expected Frequencies

If the null hypothesis is true, the probability of contracting the disease is the same whether or not a person was inoculated; that is, under the null hypothesis H0 the probability of contracting the disease is estimated as 40/130. Thus, out of the 60 people inoculated we would expect (40 x 60)/130 = 18.5 of them to contract the disease. The expected frequencies are therefore

Expected frequencies | Disease | No disease | Total
Inoculated | 18.5 | 41.5 | 60
Not inoculated | 21.5 | 48.5 | 70
Total | 40 | 90 | 130

The test statistic, computed from the rounded expected frequencies, is

$$\chi_0^2 = \frac{(10 - 18.5)^2}{18.5} + \frac{(50 - 41.5)^2}{41.5} + \frac{(30 - 21.5)^2}{21.5} + \frac{(40 - 48.5)^2}{48.5} \approx 10.5$$

The critical value for a 1% significance level with 1 d.f. is 6.63. The null hypothesis is therefore rejected at this level and it can be concluded that inoculation does have an effect on the probability of contracting the disease. From the contingency table it can be seen that inoculation reduces the risk.

Note: For chi-squared tests, expected frequencies should be at least 5.
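The expected-frequency rule (row total x column total / N) and the resulting statistic can be sketched for a general m×n table in Python. The function names `expected_frequencies` and `chi_square_independence` are illustrative assumptions, and the 1% critical value is copied from the value quoted in the example. Because the expected frequencies are not rounded here, the statistic may differ slightly from a hand calculation based on rounded table entries.

```python
def expected_frequencies(table):
    """Return expected cell counts E_ij = R_i * C_j / N for an m x n table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected


# Observed frequencies from the inoculation example:
# rows = inoculated / not inoculated, columns = disease / no disease.
observed = [[10, 50], [30, 40]]
CHI2_CRIT_01 = 6.63  # 1% critical value for 1 d.f., as quoted in the text

chi2_0, dof, expected = chi_square_independence(observed)
reject_h0 = chi2_0 > CHI2_CRIT_01
print(f"chi-square = {chi2_0:.2f}, df = {dof}, reject H0: {reject_h0}")
```

With unrounded expected frequencies the statistic is about 10.40 on 1 degree of freedom, well above 6.63, so the conclusion that inoculation and disease are not independent is unchanged. The same function also checks the rule of thumb on expected frequencies: every entry of `expected` here exceeds 5.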

PROBLEMS

1. An analysis of accident data was made to determine if the distribution of fatal accidents was dependent on the size of car involved. The following data were collected:

Accident type | Small | Medium | Large
Fatal | 67 | 26 | 16
Non-fatal | 128 | 63 | 46

Test the hypothesis that the probability of a fatality is dependent on the size of car.

2. Low birth weight in babies is defined as weights below 2500 grams. The following table shows the number of low birth weight babies for three groups of mothers, non-smokers, smokers and ex-smokers. Do the results suggest that the smoking habits of the mother have an effect on birth weight?

Birth weight | Non-smoker | Smoker | Ex-smoker
< 2500 grams | 140 | 153 | 27
≥ 2500 grams | 2197 | 1510 | 433

ADDITIONAL PROBLEMS FOR PRACTICE

1. A soft drink manufacturer, situated in western India, wants to know whether there is a difference in product acceptance by sex groups. In a market survey, 58 percent of 200 men questioned liked the product and 50 percent of 150 women questioned liked the product. Is there a significant difference in product acceptance by men and women?

2. The president of the college has reported that the average age of evening students is 35 years. A random sample of 100 evening students was taken and it was found that the average age in the sample was 34 years with an SD of 5 years. At the 1% level, can we conclude that the president's claim is correct?

3. An educator claims that the average IQ of city college students is not more than 110. To test this claim, a random sample of 150 students was taken and given a relevant test. Their average IQ score came to 111.2 with a standard deviation of 7.2. At the 0.05 level of significance, test whether the claim of the educator is justified.

4. A sample of 450 items is taken from a population whose standard deviation is 20. Mean of the sample is 30. Test whether the sample has come from a population with mean 29 at 5% level of significance.

5. A random sample of 400 men and 600 women were asked whether they would like to have a flyover near their residence. 200 men and 325 women were in favour of the proposal. Test the hypothesis that the proportions of men and women in favour of the proposal are the same.

6. A manufacturer claimed that at least 95% of the equipment which he supplied to a factory conformed to specifications. An examination of a sample of 200 pieces of equipment revealed that 18 were faulty. Test his claim at a significance level of 5% and 1%.

7. A die was thrown 9000 times and a throw of 3 or 4 was observed 3240 times. Show that the die cannot be regarded as an unbiased one.

8. In a sample of 600 men from a certain large city, 450 are found to be smokers. In a sample of 900 from another large city, 450 are smokers. Do the data indicate that the cities are significantly different with respect to the prevalence of smoking among men?

9. In a year there were 956 births in town A, of which 52.5% were male, while in towns A and B combined this proportion in a total of 1406 births was 0.496. Is there any significant difference in the proportion of male births in the two towns?

10. In two large populations, there are 30% and 25% fair-haired people, respectively. Is this difference likely to be hidden in samples of 1200 and 900 respectively from the two populations?

11. A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it puts out 3 imperfect articles in a batch of 100. Has the machine improved?

12. Two independent random samples of 30 and 40 individuals trained at two centres provide the examination scores in the following table:

Examination Score Results | Training Centre A | Training Centre B
Sample size | 30 | 40
Mean | 82 | 78
SD | 8 | 10