COMPUTER & DATA ANALYSIS

site.iugaza.edu.ps/ssafi/files/2012/04/Exam-2007-Solution1.pdf

Theoretical Exam

FINAL THEORETICAL EXAMINATION

Monday 15th, 2007

Instructor: Dr. Samir Safi

Name:___________________________________ ID Number:_________________

Instructor:____________

INSTRUCTIONS:

1. Write your name, student ID and section number.

2. You have TWO hours.

3. This exam must be your own work entirely. You cannot talk to or share information with anyone.

PLEASE DON'T WRITE ON THIS TABLE

Student's Points

Question   #1   #2   #3   #4   #5   #6   #7   Total
Point


Question #1: (20 Points)

For each of the situations described below, state the statistical technique and the sample type(s) that you believe are most applicable.

Example: Two independent samples - t test.

1. As part of an attitude survey, a sample of men and women are asked to rate a number of statements on a scale of 1 to 5, according to whether they agree or disagree. We wish to determine whether there is a significant difference between the answers of men and women.

Answer: Two independent samples - Mann-Whitney test.

2. Investors use many "indicators" in their attempts to predict the behavior of the stock market. One of these is the "January indicator." Some investors believe that if the market is up in January, then it will be up for the rest of the year. We wish to determine if there is a relationship between the market's direction in January and the market's direction the rest of the year.

Answer: Chi-square test.

3. Bastien, Inc. has been manufacturing small automobiles that have averaged 50 miles per gallon of gasoline in highway driving. The company has developed a more efficient engine for its small cars and now advertises that its new small cars average more than 50 miles per gallon in highway driving. An independent testing service road-tested 25 of the automobiles. We wish to determine whether or not the manufacturer's advertising campaign is legitimate.

Answer: One sample - t test.

4. The Anderson Company has sent two groups of employees to a privately run program providing word-processing training. One group was from the data-processing department; the other was from the typing pool. At the completion of the program, the Anderson Company received a report showing the class rank for each of its employees. We wish to determine whether there is a performance difference between the two groups in the word-processing program.

Answer: Two independent samples - Mann-Whitney test.

5. A credit company wants to see if there is any difference in the average amount owed by people under 30 years old and by people over 30 years old. Independent random samples of five were taken from both age groups. It can be assumed that the population variances are the same. We wish to determine if there is a difference between the average amounts owed by the two age groups.

Answer: Two independent samples.


6. A large corporation wants to determine whether or not the “typing efficiency” course given at a local college can increase the typing speeds of its word processing personnel. A sample of 6 typists is selected and sent to take the course. We wish to test whether it can be concluded that taking the course will actually increase the average typing speeds of the typists.

Answer: Paired samples.

7. The Excellent Drug Company claims its aspirin tablets will relieve headaches faster than any other aspirin on the market. To determine whether Excellent's claim is valid, random samples of size 15 are chosen from aspirins made by Excellent and the Simple Drug Company. An aspirin is given to each of the 30 randomly selected persons suffering from headaches and the number of minutes required for each to recover from the headache is recorded. We wish to determine whether Excellent's aspirin cures headaches significantly faster than Simple's aspirin.

Answer: Two independent samples.

8. An automobile manufacturer is trying to determine if 5 different types of bumpers differ in their reaction to low-speed collisions. An experiment was conducted where 40 bumpers of each of 5 different types were installed on midsize cars, which were then driven into a wall at 5 miles per hour. The cost of repairing the damage in each case was assessed.

Answer: ANOVA.

9. Is marital status related to health in the elderly? To answer this question, two hundred elderly people whose marital status is known (single, married, widowed, or divorced) are rated as to whether they are in good, fair, or poor health. Is there evidence of a relationship?

Answer: Chi-square test.

10. One company hires employees for its management staff from three local colleges. The company's personnel department has been collecting and reviewing annual performance ratings in an attempt to determine if there are differences in performance among the managers hired from these colleges. Performance-rating data are available from independent samples of seven employees from college A, six employees from college B, and seven employees from college C. We wish to determine whether the three populations are identical with respect to performance evaluations.

Answer: Kruskal-Wallis test.


Question #2: (14 Points)

The following data are metabolic expenditures (amount of energy expended by patients) for 8 patients admitted to a hospital for reasons other than trauma and for 8 patients admitted for trauma (multiple fractures). Using α = .01 and the SPSS output, give an interpretation for each of the following:

Nontrauma 18.7 17.7 21.7 17.8 21.5 19.4 19.5 21.3

Trauma 25.1 38.4 35.9 26.4 25.7 20.3 21.4 24.7

(a) (3 Points) Examine the distribution of these scores. Does it seem normal?

(b) (3 Points) A couple of values are much higher than the rest. Explain why outliers can cause a problem for t-analyses.

(c) (6 Points) Carry out a two-sample t-test comparing the means of the populations.

(d) (3 Points) Do the results of the Wilcoxon test and the usual t-test agree?


SPSS Output for question #2

Independent Samples Test

                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                               F        Sig.           t        df       Sig. (2-tailed)
Equal variances assumed        7.046    .019           -3.181   14       .007
Equal variances not assumed                            -3.181   7.874    .013
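The t and df values in the Independent Samples Test can be checked directly from the sixteen observations. What follows is a minimal sketch in plain Python using only the standard library; the function name `two_sample_t` is illustrative and not part of the exam. If the arithmetic is right, both rows should give t close to -3.181, with 14 degrees of freedom for the pooled row and roughly 7.874 for the unequal-variance (Welch-Satterthwaite) row.

```python
from math import sqrt
from statistics import mean, variance

nontrauma = [18.7, 17.7, 21.7, 17.8, 21.5, 19.4, 19.5, 21.3]
trauma = [25.1, 38.4, 35.9, 26.4, 25.7, 20.3, 21.4, 24.7]

def two_sample_t(x, y):
    """Return (pooled t, pooled df, Welch t, Welch df) for two independent samples."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)      # sample variances (n - 1 denominator)
    diff = mean(x) - mean(y)

    # Equal variances assumed: pool the two variances
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch statistic with Satterthwaite df
    a, b = v1 / n1, v2 / n2
    t_welch = diff / sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_pooled, n1 + n2 - 2, t_welch, df_welch

t_pooled, df_pooled, t_welch, df_welch = two_sample_t(nontrauma, trauma)
print(round(t_pooled, 3), df_pooled, round(t_welch, 3), round(df_welch, 3))
```

The two t statistics coincide because the groups have the same size; only the degrees of freedom, and therefore the p-values, differ between the rows.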

[Figure: Normal Q-Q Plot of ENERGY3. Horizontal axis: Observed Value; vertical axis: Expected Normal Value.]

Ranks

ENERGY
TRAUMA       N     Mean Rank   Sum of Ranks
Nontrauma    8     5.13        41.00
Trauma       8     11.88       95.00
Total        16

Test Statistics (b)

                                   ENERGY
Mann-Whitney U                     5.000
Wilcoxon W                         41.000
Z                                  -2.836
Asymp. Sig. (2-tailed)             .005
Exact Sig. [2*(1-tailed Sig.)]     .003 (a)

a. Not corrected for ties.
b. Grouping Variable: TRAUMA
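The Ranks and Test Statistics tables can be reproduced by ranking the pooled sample of sixteen values. The sketch below is a plain-Python illustration (the helper name `rank_sum_test` is hypothetical). It assigns midranks to tied values and uses the normal approximation without a tie correction; these data contain no ties, so the correction would not change anything here.

```python
from math import sqrt

nontrauma = [18.7, 17.7, 21.7, 17.8, 21.5, 19.4, 19.5, 21.3]
trauma = [25.1, 38.4, 35.9, 26.4, 25.7, 20.3, 21.4, 24.7]

def rank_sum_test(x, y):
    """Return (rank sum of x, Mann-Whitney U for x, z by normal approximation)."""
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # 1-based positions i+1 .. j hold the same value: give it the midrank
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    w = sum(ranks[v] for v in x)                 # Wilcoxon rank sum for x
    u = w - n1 * (n1 + 1) / 2                    # Mann-Whitney U for x
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w, u, z

w, u, z = rank_sum_test(nontrauma, trauma)
print(w, u, round(z, 3))
```

The rank sum for the nontrauma group should match the "Wilcoxon W" entry, and U and Z should match the corresponding rows of the Test Statistics table.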


Question #3: (15 Points)

A chain of convenience stores wanted to test three different advertising policies:

• Policy 1: No advertising.

• Policy 2: Advertise in neighborhoods with circulars.

• Policy 3: Use circulars and advertise in newspapers.

Eighteen stores were randomly selected and divided randomly into three groups of six stores. Each group used one of the three policies. Following the implementation of the policies, sales figures were obtained for each of the stores during a 1-month period.

Using the SPSS output, give an interpretation for each of the following:

1. (3 Points) Test of Homogeneity of Variances.

2. (6 Points) Explain the result of the ANOVA Table.

3. (6 Points) Discuss all the multiple comparisons.



Test of Homogeneity of Variances

DATA
Levene Statistic   df1   df2   Sig.
.841               2     15    .451

ANOVA

DATA
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    115.111          2    57.556        8.534   .003
Within Groups     101.167          15   6.744
Total             216.278          17
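The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The short Python sketch below recomputes them from the printed sums of squares. The helper `f_sf_two_numerator_df` is an illustrative name; it relies on a closed form for the upper-tail F probability that holds only when the numerator has exactly 2 degrees of freedom, which is the case for three groups.

```python
# Sums of squares and degrees of freedom read from the ANOVA table
ss_between, df_between = 115.111, 2
ss_within, df_within = 101.167, 15

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

def f_sf_two_numerator_df(f, df2):
    """P(F > f) for an F distribution with 2 numerator df and df2 denominator df.

    Valid only for 2 numerator df; other cases need the incomplete beta function.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)

p_value = f_sf_two_numerator_df(f_stat, df_within)
print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3), round(p_value, 3))
```

The recomputed values should agree with the Mean Square, F, and Sig. columns up to rounding of the printed sums of squares.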

Multiple Comparisons

Dependent Variable: DATA
Bonferroni

                          Mean Difference                        95% Confidence Interval
(I) GROUP    (J) GROUP    (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
Policy 1     Policy 2     -.6667            1.49938      1.000   -4.7056       3.3723
             Policy 3     -5.6667*          1.49938      .005    -9.7056       -1.6277
Policy 2     Policy 1     .6667             1.49938      1.000   -3.3723       4.7056
             Policy 3     -5.0000*          1.49938      .014    -9.0389       -.9611
Policy 3     Policy 1     5.6667*           1.49938      .005    1.6277        9.7056
             Policy 2     5.0000*           1.49938      .014    .9611         9.0389

*. The mean difference is significant at the .05 level.
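The common standard error in the Bonferroni table follows from the within-groups mean square and the group size, and a pair is flagged as significant exactly when its interval excludes zero. The Python sketch below illustrates both points using numbers read from the SPSS output. The critical t value is not computed from a distribution; it is backed out of the printed interval width, so `implied_t_crit` is an inference from the table rather than an independently derived quantity.

```python
from math import sqrt

# Values read from the ANOVA and Multiple Comparisons tables
ss_within, df_within = 101.167, 15
n_per_group = 6                      # six stores per policy
k = 3                                # number of policies

ms_within = ss_within / df_within
std_error = sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

n_comparisons = k * (k - 1) // 2     # distinct pairs of policies
alpha_per_comparison = 0.05 / n_comparisons

# Half-width recovered from the printed interval for Policy 1 vs Policy 2
half_width = (3.3723 - (-4.7056)) / 2
implied_t_crit = half_width / std_error

mean_diffs = {
    ("Policy 1", "Policy 2"): -0.6667,
    ("Policy 1", "Policy 3"): -5.6667,
    ("Policy 2", "Policy 3"): -5.0000,
}
significant = []
for pair, diff in mean_diffs.items():
    lower, upper = diff - half_width, diff + half_width
    if lower > 0 or upper < 0:       # interval excludes zero
        significant.append(pair)
    print(pair, round(lower, 4), round(upper, 4))
```

Only three of the six table rows are distinct comparisons; the other three are the same pairs with the sign of the difference reversed.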


Question #4: (12 Points)

An experiment was conducted to evaluate the effectiveness of a treatment for tapeworm in the stomachs of sheep. A random sample of 24 worm-infected lambs of approximately the same age and health was randomly divided into two groups. Twelve of the lambs were injected with the drug and the remaining twelve were left untreated. After a 6-month period, the lambs were slaughtered and the following worm counts were recorded:

Drug-Treated Sheep 18 43 28 50 16 32 13 35 38 33 6 7

Untreated Sheep 40 54 26 63 21 37 39 23 48 58 28 39

Using α = .05 and the SPSS output, give an interpretation for each of the following:

a. (3 Points) What's the suitable statistical technique?

b. (6 Points) Test whether the mean number of tapeworms in the stomachs of the treated lambs is less than the mean for untreated lambs.

c. (3 Points) Place and interpret a 95% confidence interval on µ1 - µ2 to assess the size of the difference in the two means.



Group Statistics

Worm counts
CODING                N     Mean      Std. Deviation   Std. Error Mean
Drug-Treated Sheep    12    26.5833   14.36193         4.14593
Untreated Sheep       12    39.6667   13.85859         4.00063


Independent Samples Test

Worm counts
                                               Equal variances   Equal variances
                                               assumed           not assumed
Levene's Test for Equality of Variances
  F                                            .205
  Sig.                                         .655
t-test for Equality of Means
  t                                            -2.271            -2.271
  df                                           22                21.972
  Sig. (2-tailed)                              .033              .033
  Mean Difference                              -13.0833          -13.0833
  Std. Error Difference                        5.76141           5.76141
  95% Confidence Interval of the Difference
    Lower                                      -25.03176         -25.03264
    Upper                                      -1.13491          -1.13403
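The mean difference, its standard error, and the 95% interval in the equal-variances row can be rebuilt from the 24 worm counts. The Python sketch below is one way to do that with the standard library. The constant `T_CRIT_22` is an assumption: it is the usual tabled two-sided 95% critical value of Student's t with 22 degrees of freedom (about 2.074), hard-coded rather than computed.

```python
from math import sqrt
from statistics import mean, variance

treated = [18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7]
untreated = [40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39]

# Assumption: tabled two-sided 95% critical value of t with 22 df
T_CRIT_22 = 2.074

n1, n2 = len(treated), len(untreated)
diff = mean(treated) - mean(untreated)
pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(untreated)) / (n1 + n2 - 2)
std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = diff / std_error
ci = (diff - T_CRIT_22 * std_error, diff + T_CRIT_22 * std_error)

print(round(diff, 4), round(std_error, 5), round(t_stat, 3))
print(tuple(round(limit, 2) for limit in ci))
```

For the one-sided alternative in part (b), the two-tailed Sig. of .033 is halved (to about .0165) because the observed difference lies in the hypothesized direction.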


Paired Samples Statistics

                                Mean      N     Std. Deviation   Std. Error Mean
Pair 1   Drug-Treated Sheep     26.5833   12    14.36193         4.14593
         Untreated Sheep        39.6667   12    13.85859         4.00063

Paired Samples Correlations

                                                 N     Correlation   Sig.
Pair 1   Drug-Treated Sheep & Untreated Sheep    12    .583          .046

Paired Samples Test

Pair 1: Drug-Treated Sheep - Untreated Sheep
Paired Differences
  Mean                                           -13.0833
  Std. Deviation                                 12.88733
  Std. Error Mean                                3.72025
  95% Confidence Interval of the Difference
    Lower                                        -21.2716
    Upper                                        -4.8951
t                                                -3.517
df                                               11
Sig. (2-tailed)                                  .005



Question #5: (11 Points)

Many states are considering lowering the blood alcohol level at which a driver is designated as driving under the influence (DUI) of alcohol. An investigator for a legislative committee designed the following test to study the effect of alcohol on reaction time. Ten participants consumed a specified amount of alcohol. Another group of ten participants consumed the same amount of a nonalcoholic drink, a placebo. The twenty participants' average reaction times (in seconds) to a series of simulated driving situations are reported in the following table. Does it appear that alcohol consumption increases reaction time?

Placebo 0.90 0.37 1.63 0.83 0.95 0.78 0.86 0.61 0.38 1.97

Alcohol 1.46 1.45 1.76 1.44 1.11 3.07 0.98 1.27 2.56 1.32


a. (3 Points) Why is the t test inappropriate for analyzing the data in this study?

b. (6 Points) Use the Wilcoxon rank sum test to test the hypotheses:

H0: The distributions of reaction times for the placebo and alcohol populations are identical.

H1: The distribution of reaction times for the placebo population is shifted to the left of the distribution for the alcohol population. (Larger reaction times are associated with the consumption of alcohol.)

c. (2 Points) Place 95% confidence intervals on the median reaction times for the two groups.



[Figure: Boxplots for the Placebo population and the Alcohol population (N = 10 each), vertical axis from 0.0 to 3.5. Cases 9, 6, 3, and 10 are plotted as individually labeled points.]

[Figure: Normal Q-Q Plot of Placebo population. Horizontal axis: Observed Value; vertical axis: Expected Normal.]

[Figure: Normal Q-Q Plot of Alcohol population. Horizontal axis: Observed Value; vertical axis: Expected Normal.]

Group Statistics

Blood-Alcohol
Code: 1 Placebo, 2:Alcohol    N     Mean     Std. Deviation   Std. Error Mean
Placebo                       10    .9280    .50868           .16086
Alcohol                       10    1.6420   .66416           .21003



Independent Samples Test

Blood-Alcohol
                                               Equal variances   Equal variances
                                               assumed           not assumed
Levene's Test for Equality of Variances
  F                                            .669
  Sig.                                         .424
t-test for Equality of Means
  t                                            -2.699            -2.699
  df                                           18                16.856
  Sig. (2-tailed)                              .015              .015
  Mean Difference                              -.7140            -.7140
  Std. Error Difference                        .26455            .26455
  95% Confidence Interval of the Difference
    Lower                                      -1.26980          -1.27251
    Upper                                      -.15820           -.15549


Ranks

Blood-Alcohol
Code: 1 Placebo, 2:Alcohol    N     Mean Rank   Sum of Ranks
Placebo                       10    7.00        70.00
Alcohol                       10    14.00       140.00
Total                         20

Test Statistics (b)

                                   Blood-Alcohol
Mann-Whitney U                     15.000
Wilcoxon W                         70.000
Z                                  -2.646
Asymp. Sig. (2-tailed)             .008
Exact Sig. [2*(1-tailed Sig.)]     .007 (a)

a. Not corrected for ties.
b. Grouping Variable: Code: 1 Placebo, 2:Alcohol
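Part (c) asks for 95% confidence intervals on the two medians, which the SPSS output does not report. One common distribution-free approach takes a symmetric pair of order statistics and reads the exact coverage from the Binomial(n, 1/2) distribution. The Python sketch below illustrates that approach; the function name `median_ci` is hypothetical. The third placebo value is entered as 1.63, the value implied by the placebo mean of .9280 and the rank sum of 70 in the SPSS output.

```python
from math import comb

placebo = [0.90, 0.37, 1.63, 0.83, 0.95, 0.78, 0.86, 0.61, 0.38, 1.97]
alcohol = [1.46, 1.45, 1.76, 1.44, 1.11, 3.07, 0.98, 1.27, 2.56, 1.32]

def median_ci(sample, conf=0.95):
    """Order-statistic interval (x_(k), x_(n+1-k)) for the median.

    Returns (lower, upper, exact coverage). The coverage equals
    1 - 2 * P(X <= k - 1) for X ~ Binomial(n, 1/2), and k is the largest
    value that keeps the coverage at or above conf.
    """
    xs = sorted(sample)
    n = len(xs)
    alpha = 1 - conf
    k = 0
    tail = 0.0                                   # P(X <= k - 1)
    while k < n // 2:
        next_tail = tail + comb(n, k) / 2 ** n   # P(X <= k)
        if 2 * next_tail > alpha:
            break
        tail = next_tail
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    return xs[k - 1], xs[n - k], 1 - 2 * tail

placebo_ci = median_ci(placebo)
alcohol_ci = median_ci(alcohol)
print(placebo_ci)
print(alcohol_ci)
```

With ten observations the interval runs from the 2nd to the 9th ordered value, and its exact coverage is somewhat above the nominal 95% because the binomial distribution is discrete.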


Question #6: (17 Points)

A team of researchers wants to compare the yields (in pounds) of five different varieties (A, B, C, D, E) of 4-year-old orange trees in one orchard. They obtain a random sample of seven trees of each variety from the orchard.


a. (3 Points) Using tests and plots of the data, determine whether the conditions for using the ANOVA are satisfied.

b. (6 Points) Conduct an ANOVA test of the null hypothesis that the five varieties have the same mean yield.

c. (6 Points) Use the Kruskal-Wallis test to test the null hypothesis that the five varieties have the same yield distributions.

d. (2 Points) Are the conclusions you reached in (b) and (c) consistent?



Tests of Normality

Yield (in pounds)
                  Kolmogorov-Smirnov (a)         Shapiro-Wilk
Code (Yields)     Statistic   df   Sig.          Statistic   df   Sig.
A                 .182        7    .200*         .915        7    .428
B                 .227        7    .200*         .884        7    .243
C                 .161        7    .200*         .958        7    .804
D                 .239        7    .200*         .866        7    .172
E                 .144        7    .200*         .985        7    .980

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

[Figure: Boxplots of Yield (in pounds) by Code (Yields) for varieties A, B, C, D, and E; N = 7 for each variety.]

Test of Homogeneity of Variances

Yield (in pounds)
Levene Statistic   df1   df2   Sig.
5.214              4     30    .003


ANOVA

Yield (in pounds)
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    1096.743         4    274.186       3.730   .014
Within Groups     2205.429         30   73.514
Total             3302.171         34

Ranks

Yield (in pounds)
Code (Yields)   N     Mean Rank
A               7     11.64
B               7     21.36
C               7     26.79
D               7     13.64
E               7     16.57
Total           35

Test Statistics (a, b)

               Yield (in pounds)
Chi-Square     10.011
df             4
Asymp. Sig.    .040

a. Kruskal Wallis Test
b. Grouping Variable: Code (Yields)
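The Kruskal-Wallis statistic can be approximated from the Ranks table alone, since each rank sum is the group size times the mean rank. The Python sketch below applies the textbook formula H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1) to the printed mean ranks. Because those mean ranks are rounded to two decimals, and because SPSS may also apply a tie correction, the result should land close to, but not exactly on, the reported 10.011. The helper `chi2_sf_df4` is an illustrative name for the closed-form upper tail of a chi-square distribution with 4 degrees of freedom.

```python
from math import exp

# Mean ranks read from the Ranks table (rounded to two decimals in the output)
mean_ranks = {"A": 11.64, "B": 21.36, "C": 26.79, "D": 13.64, "E": 16.57}
n_per_group = 7
n_total = n_per_group * len(mean_ranks)          # 35 trees in all

rank_sums = {group: n_per_group * r for group, r in mean_ranks.items()}
h_stat = (12 / (n_total * (n_total + 1))
          * sum(r ** 2 / n_per_group for r in rank_sums.values())
          - 3 * (n_total + 1))

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with exactly 4 df."""
    return exp(-x / 2) * (1 + x / 2)

p_from_h = chi2_sf_df4(h_stat)
p_from_spss = chi2_sf_df4(10.011)                # using the printed Chi-Square value
print(round(h_stat, 3), round(p_from_h, 3), round(p_from_spss, 3))
```

The p-value obtained from the printed Chi-Square value should round to the Asymp. Sig. entry of .040.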


Question #7: (11 Points)

A personnel director for a large, research-oriented firm categorizes colleges and graduates. The director collects data on 156 recent graduates, and has each rated by a supervisor.

                     Rating
School               Outstanding   Average   Poor
Most desirable       21            25        2
Good                 20            36        10
Adequate             4             14        7
Undesirable          3             8         6


a. (8 Points) Can the director safely conclude that there is a relation between school type and rating?

b. (3 Points) Is there any problem in using the χ² approximation?



SCHOOL * RATING Crosstabulation

                                    RATING
SCHOOL                              Outstanding   Average   Poor    Total
Most Desirable   Count              21            25        2       48
                 Expected Count     14.8          25.5      7.7     48.0
Good             Count              20            36        10      66
                 Expected Count     20.3          35.1      10.6    66.0
Adequate         Count              4             14        7       25
                 Expected Count     7.7           13.3      4.0     25.0
Undesirable      Count              3             8         6       17
                 Expected Count     5.2           9.0       2.7     17.0
Total            Count              48            83        25      156
                 Expected Count     48.0          83.0      25.0    156.0

Chi-Square Tests

                                 Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square               15.967 (a)   6    .014
Likelihood Ratio                 16.577       6    .011
Linear-by-Linear Association     13.934       1    .000
N of Valid Cases                 156

a. 2 cells (16.7%) have expected count less than 5. The minimum expected count is 2.72.
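The expected counts, the Pearson chi-square value, and the footnote about small cells can all be recomputed from the observed counts. The Python sketch below does this with the standard library. The function names are illustrative, the counts are those of the SPSS crosstabulation (36 in the Good/Average cell, giving 156 cases), and the p-value helper uses a closed form that applies only to even degrees of freedom.

```python
from math import exp, factorial

# Observed counts: columns are Outstanding, Average, Poor
observed = {
    "Most Desirable": [21, 25, 2],
    "Good": [20, 36, 10],
    "Adequate": [4, 14, 7],
    "Undesirable": [3, 8, 6],
}

def chi_square_independence(table):
    """Return (Pearson chi-square, df, expected counts) for a two-way table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(rows, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("closed form applies to even degrees of freedom only")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

stat, df, expected = chi_square_independence(observed)
p_value = chi2_sf_even_df(stat, df)
cells = [e for row in expected for e in row]
small_cells = sum(1 for e in cells if e < 5)
min_expected = min(cells)
print(round(stat, 3), df, round(p_value, 3), small_cells, round(min_expected, 2))
```

The count of cells with expected frequency below 5 is what part (b) turns on: a frequently quoted rule of thumb asks that no more than 20% of the cells fall below 5 and that none fall below 1.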
