COMPUTER & DATA ANALYSIS
Theoretical Exam
FINAL THEORETICAL EXAMINATION
Monday 15th, 2007
Instructor: Dr. Samir Safi
Name:___________________________________ ID Number:_________________
Instructor:____________
INSTRUCTIONS:
1. Write your name, student ID and section number.
2. You have TWO hours.
3. This exam must be your own work entirely. You cannot talk to or share
information with anyone.
PLEASE DON'T WRITE ON THIS TABLE
Question           #1   #2   #3   #4   #5   #6   #7   Total
Student's Points
Question #1: (20 Points) For each of the situations described below, state the statistical technique and the
type of sample(s) that you believe is the most applicable.
Example: Two independent samples - t test.
1. As part of an attitude survey, a sample of men and women is asked to rate a number
of statements on a scale of 1 to 5, according to whether they agree or disagree. We wish
to determine whether there is a significant difference between the answers of men and
women.
Answer: Two independent samples - Mann-Whitney test.
2. Investors use many "indicators" in their attempts to predict the behavior of the stock
market. One of these is the "January indicator." Some investors believe that if the
market is up in January, then it will be up for the rest of the year. We wish to determine if
there is a relationship between the market's direction in January and its direction for the rest
of the year.
Answer: Chi-square test of independence.
3. Bastien, Inc. has been manufacturing small automobiles that have averaged 50 miles
per gallon of gasoline in highway driving. The company has developed a more efficient
engine for its small cars and now advertises that its new small cars average more than 50
miles per gallon in highway driving. An independent testing service road-tested 25 of the
automobiles. We wish to determine whether or not the manufacturer's advertising
campaign is legitimate.
Answer: One sample - t test.
4. The Anderson Company has sent two groups of employees to a privately run program
providing word-processing training. One group came from the data-processing department; the
other came from the typing pool. At the completion of the program, the Anderson
Company received a report showing the class rank for each of its employees. We wish to
determine whether there is a performance difference between the two groups in the
word-processing program.
Answer: Two independent samples - Mann-Whitney test.
5. A credit company wants to see if there is any difference in the average amount owed
by people under 30 years old and by people over 30 years old. Independent random samples
of five were taken from both age groups. It can be assumed that the population variances
are the same. We wish to determine if there is a difference between the average amounts
owed by the two age groups.
Answer: Two independent samples - t test (equal variances assumed).
6. A large corporation wants to determine whether or not the “typing efficiency” course
given at a local college can increase the typing speeds of its word processing personnel.
A sample of 6 typists is selected and sent to take the course. We wish to test whether
it can be concluded that taking the course will actually increase the average typing speeds
of the typists.
Answer: Paired samples - t test.
7. The Excellent Drug Company claims its aspirin tablets will relieve headaches faster than any
other aspirin on the market. To determine whether Excellent's claim is valid, random samples of
size 15 are chosen from aspirins made by Excellent and the Simple Drug Company. An aspirin is
given to each of the 30 randomly selected persons suffering from headaches and the number of
minutes required for each to recover from the headache is recorded. We wish to determine
whether Excellent's aspirin cures headaches significantly faster than Simple's aspirin.
Answer: Two independent samples - t test.
8. An automobile manufacturer is trying to determine if 5 different types of bumpers
differ in their reaction to low-speed collisions. An experiment was conducted where 40
bumpers of each of 5 different types were installed on midsize cars, which were then
driven into a wall at 5 miles per hour. The cost of repairing the damage in each case was
assessed.
Answer: Five independent samples - ANOVA.
9. Is marital status related to health in the elderly? To answer this question, two hundred
elderly people whose marital status is known (single, married, widowed, or divorced) are
rated as to whether they are in good, fair, or poor health. Is there evidence of a
relationship?
Answer: Chi-square test of independence.
10. One company hires employees for its management staff from three local colleges.
The company's personnel department has been collecting and reviewing annual performance ratings
in an attempt to determine if there are differences in performance among the managers
hired from these colleges. Performance-rating data are available from independent
samples of seven employees from college A, six employees from college B, and seven
employees from college C. We wish to determine whether the three populations are
identical with respect to performance evaluations.
Answer: Three independent samples - Kruskal-Wallis test.
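The choices in Question #1 follow a common textbook rule of thumb: categorical counts call for a chi-square test, and otherwise the number of samples, whether they are paired, and whether normality is plausible decide between t/ANOVA methods and their rank-based alternatives. The function below is a hypothetical helper (not part of the course material) that encodes that simplified rule; real analyses also weigh sample size, outliers and study design.

```python
def choose_test(response, n_groups, paired=False, normal=True):
    """Simplified rule of thumb for picking a test (hypothetical helper).

    response: "categorical" for counts in categories; anything else means a
              measured or ranked response.
    n_groups: number of samples being compared (1, 2, or more).
    paired:   True when two measurements come from the same units.
    normal:   False for ranks/ratings or clearly non-normal data.
    """
    if response == "categorical":
        return "Chi-square test of independence"
    if n_groups == 1:
        return "One sample t test" if normal else "Wilcoxon signed-rank test"
    if n_groups == 2:
        if paired:
            return "Paired t test" if normal else "Wilcoxon signed-rank test"
        return "Two independent samples t test" if normal else "Mann-Whitney test"
    # Three or more independent samples
    return "One-way ANOVA" if normal else "Kruskal-Wallis test"


# Situations 1, 3, 6, 9 and 10 from Question #1
print(choose_test("rating", 2, normal=False))
print(choose_test("mpg", 1))
print(choose_test("typing speed", 2, paired=True))
print(choose_test("categorical", 2))
print(choose_test("rating", 3, normal=False))
```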
Question #2: (14 Points) The following data are metabolic expenditures (amount of energy expended by patients)
for 8 patients admitted to a hospital for reasons other than trauma and for 8 patients
admitted for trauma (multiple fractures). Using $\alpha = .01$ and the SPSS output, give an
interpretation for each of the following:
Nontrauma 18.7 17.7 21.7 17.8 21.5 19.4 19.5 21.3
Trauma 25.1 38.4 35.9 26.4 25.7 20.3 21.4 24.7
(a) (3 Points) Examine the distribution of these scores. Does it seem normal?
(b) (3 Points) A couple of values are much higher than the rest. Explain why
outliers can cause a problem for t-analyses.
(c) (6 Points) Carry out a two-sample t-test comparing the means of the populations.
(d) (3 Points) Do the results of the Wilcoxon test and the usual t-test agree?
SPSS Output for question #2
Independent Samples Test
                               Levene's Test for
                               Equality of Variances    t-test for Equality of Means
                               F        Sig.            t         df       Sig. (2-tailed)
Equal variances assumed        7.046    .019            -3.181    14       .007
Equal variances not assumed                             -3.181    7.874    .013
[Figure: Normal Q-Q Plot of ENERGY3 (Expected Normal Value vs. Observed Value)]
Ranks (ENERGY)
TRAUMA       N     Mean Rank    Sum of Ranks
Nontrauma    8     5.13         41.00
Trauma       8     11.88        95.00
Total        16

Test Statistics(b)
                                    ENERGY
Mann-Whitney U                      5.000
Wilcoxon W                          41.000
Z                                   -2.836
Asymp. Sig. (2-tailed)              .005
Exact Sig. [2*(1-tailed Sig.)]      .003(a)
a. Not corrected for ties.
b. Grouping Variable: TRAUMA
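The t statistic and the rank statistics above can be reproduced from the raw data. The sketch below (standard-library Python; the function names are illustrative) computes the unequal-variance (Welch) t statistic and the Mann-Whitney U, Wilcoxon W and normal-approximation Z without tie correction. It should agree with SPSS on t ≈ -3.181, U = 5, W = 41 and Z ≈ -2.836. Because both groups have n = 8, the pooled and unpooled t statistics coincide; only their degrees of freedom differ.

```python
from math import sqrt
from statistics import mean, variance

nontrauma = [18.7, 17.7, 21.7, 17.8, 21.5, 19.4, 19.5, 21.3]
trauma = [25.1, 38.4, 35.9, 26.4, 25.7, 20.3, 21.4, 24.7]


def welch_t(x, y):
    """Two-sample t statistic with unpooled variances and Welch df."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df


def average_ranks(values):
    """Ranks 1..N, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U, W, Z); W is the rank sum of x, Z has no tie correction."""
    n1, n2 = len(x), len(y)
    w = sum(average_ranks(x + y)[:n1])
    u1 = w - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, w, z


t, df = welch_t(nontrauma, trauma)
u, w, z = mann_whitney(nontrauma, trauma)
print(f"t = {t:.3f}, Welch df = {df:.2f}")
print(f"U = {u:.0f}, W = {w:.0f}, Z = {z:.3f}")
```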
Question #3: (15 Points) A chain of convenience stores wanted to test three different advertising policies:
• Policy 1: No advertising.
• Policy 2: Advertise in neighborhoods with circulars.
• Policy 3: Use circulars and advertise in newspapers.
Eighteen stores were randomly selected and divided randomly into three groups of six
stores. Each group used one of the three policies. Following the implementation of the
policies, sales figures were obtained for each of the stores during a 1-month period.
Using the SPSS output, give an interpretation for each of the following:
1. (3 Points) Test of Homogeneity of Variances.
2. (6 Points) Explain the result of the ANOVA Table.
3. (6 Points) Discuss all the multiple comparisons.
SPSS Output for question #3
Test of Homogeneity of Variances (DATA)
Levene Statistic    df1    df2    Sig.
.841                2      15     .451

ANOVA (DATA)
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    115.111          2    57.556        8.534   .003
Within Groups     101.167          15   6.744
Total             216.278          17
Multiple Comparisons
Dependent Variable: DATA
Bonferroni
                                                                        95% Confidence Interval
(I) GROUP   (J) GROUP   Mean Difference (I-J)   Std. Error   Sig.    Lower Bound   Upper Bound
Policy 1    Policy 2    -.6667                  1.49938      1.000   -4.7056       3.3723
            Policy 3    -5.6667*                1.49938      .005    -9.7056       -1.6277
Policy 2    Policy 1    .6667                   1.49938      1.000   -3.3723       4.7056
            Policy 3    -5.0000*                1.49938      .014    -9.0389       -.9611
Policy 3    Policy 1    5.6667*                 1.49938      .005    1.6277        9.7056
            Policy 2    5.0000*                 1.49938      .014    .9611         9.0389
*. The mean difference is significant at the .05 level.
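The F statistic in the ANOVA table is the ratio of the two mean squares, and the Bonferroni standard error is built from the within-groups mean square. The sketch below recomputes both from the printed sums of squares, assuming six stores per group as stated in the question. For an F distribution with 2 numerator degrees of freedom the upper tail has the closed form (1 + 2F/df2)^(-df2/2), so no statistics library is needed for the p-value.

```python
from math import sqrt

# Values copied from the SPSS ANOVA table for question #3
ss_between, df_between = 115.111, 2
ss_within, df_within = 101.167, 15
n_per_group = 6

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Exact for 2 numerator df: P(F > f) = (1 + 2f/df2) ** (-df2/2)
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

# Standard error of a difference between two group means (equal group sizes)
se_diff = sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

print(f"MS_between = {ms_between:.3f}, MS_within = {ms_within:.3f}")
print(f"F = {f_stat:.3f}, p = {p_value:.4f}, SE of a difference = {se_diff:.5f}")
```

Each Bonferroni interval in the table is then the mean difference plus or minus a t critical value (15 df, level .05 split over the three pairwise comparisons) times this standard error.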
Question #4: (12 Points) An experiment was conducted to evaluate the effectiveness of a treatment for tapeworm
in the stomachs of sheep. A random sample of 24 worm-infected lambs of approximately
the same age and health was randomly divided into two groups. Twelve of the lambs
were injected with the drug and the remaining twelve were left untreated. After a
6-month period, the lambs were slaughtered and the following worm counts were recorded:
Drug-Treated Sheep 18 43 28 50 16 32 13 35 38 33 6 7
Untreated Sheep 40 54 26 63 21 37 39 23 48 58 28 39
Using $\alpha = .05$ and the SPSS output, give an interpretation for each of the following:
a. (3 Points) What's the suitable statistical technique?
b. (6 Points) Test whether the mean number of tapeworms in the stomachs of the treated
lambs is less than the mean for untreated lambs.
c. (3 Points) Place and interpret a 95% confidence interval on $\mu_1 - \mu_2$ to assess the size of
the difference between the two means.
SPSS Output for question #4
Group Statistics (Worm counts)
CODING               N    Mean      Std. Deviation   Std. Error Mean
Drug-Treated Sheep   12   26.5833   14.36193         4.14593
Untreated Sheep      12   39.6667   13.85859         4.00063
Independent Samples Test (Worm counts)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                              F       Sig.            t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       .205    .655            -2.271   22       .033              -13.0833          5.76141                 -25.03176      -1.13491
Equal variances not assumed                           -2.271   21.972   .033              -13.0833          5.76141                 -25.03264      -1.13403
Paired Samples Statistics
Pair 1               Mean      N    Std. Deviation   Std. Error Mean
Drug-Treated Sheep   26.5833   12   14.36193         4.14593
Untreated Sheep      39.6667   12   13.85859         4.00063

Paired Samples Correlations
Pair 1                                 N    Correlation   Sig.
Drug-Treated Sheep & Untreated Sheep   12   .583          .046

Paired Samples Test
                                       Paired Differences
Pair 1                                 Mean       Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
Drug-Treated Sheep - Untreated Sheep   -13.0833   12.88733         3.72025           -21.2716       -4.8951        -3.517   11   .005
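The lambs were divided at random into two independent groups, so the equal-variances (pooled) two-sample t test is the natural reference, and Levene's p = .655 gives no evidence against equal variances. The sketch below recomputes that row from the raw worm counts. The critical value t(.025, 22) = 2.074 is taken from a standard t table and hard-coded as an assumption, so the interval matches SPSS only to about two decimals. For the one-sided alternative (treated mean smaller) the p-value is half the two-sided value, about .033/2 = .017.

```python
from math import sqrt
from statistics import mean, variance

treated = [18, 43, 28, 50, 16, 32, 13, 35, 38, 33, 6, 7]
untreated = [40, 54, 26, 63, 21, 37, 39, 23, 48, 58, 28, 39]

# t(.025, 22) from a t table; an assumed constant, not computed here
T_CRIT_22 = 2.074


def pooled_t(x, y, t_crit):
    """Pooled-variance t statistic and CI for mean(x) - mean(y)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(x) - mean(y)
    return diff, se, diff / se, (diff - t_crit * se, diff + t_crit * se)


diff, se, t, (lo, hi) = pooled_t(treated, untreated, T_CRIT_22)
print(f"diff = {diff:.4f}, SE = {se:.5f}, t = {t:.3f}")
print(f"95% CI for mu1 - mu2: ({lo:.3f}, {hi:.3f})")
```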
Question #5:(11 Points)
Many states are considering lowering the blood alcohol level at which a driver is
designated as driving under the influence (DUI) of alcohol. An investigator for a
legislative committee designed the following test to study the effect of alcohol on
reaction time. Ten participants consumed a specified amount of alcohol. Another group
of ten participants consumed the same amount of a nonalcoholic drink, a placebo. The
twenty participants' average reaction times (in seconds) to a series of simulated driving
situations are reported in the following table. Does it appear that alcohol consumption
increases reaction time?
Placebo 0.90 0.37 1.63 0.83 0.95 0.78 0.86 0.61 0.38 1.97
Alcohol 1.46 1.45 1.76 1.44 1.11 3.07 0.98 1.27 2.56 1.32
Using $\alpha = .05$ and the SPSS output, give an interpretation for each of the following:
a. (3 Points) Why is the t test inappropriate for analyzing the data in this study?
b. (6 Points) Use the Wilcoxon rank sum test to test the hypotheses:
H0: The distributions of reaction times for the placebo and alcohol populations
are identical.
H1: The distribution of reaction times for the placebo population is
shifted to the left of the distribution for the alcohol population. (Larger reaction
times are associated with the consumption of alcohol.)
c. (2 Points) Place 95% confidence intervals on the median reaction times for the
two groups.
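Part (c) asks for distribution-free intervals for the medians. One standard approach, sketched below, uses order statistics: the number of observations below the population median is Binomial(n, 1/2), so for n = 10 the interval from the 2nd to the 9th smallest value has exact coverage 1 - 2(11/1024), about 0.979. The course tables may present the same interval differently, and the function name is illustrative. The third placebo value is entered as 1.63, the value implied by the SPSS output (placebo mean .9280 and rank sum 70).

```python
from math import comb

placebo = [0.90, 0.37, 1.63, 0.83, 0.95, 0.78, 0.86, 0.61, 0.38, 1.97]
alcohol = [1.46, 1.45, 1.76, 1.44, 1.11, 3.07, 0.98, 1.27, 2.56, 1.32]


def median_ci(data, alpha=0.05):
    """Order-statistic CI for the median: returns (lower, upper, coverage)."""
    x = sorted(data)
    n = len(x)

    def lower_tail(k):
        # P(Binomial(n, 1/2) <= k)
        return sum(comb(n, j) for j in range(k + 1)) / 2**n

    # After the loop, k is the largest value with P(Bin <= k - 1) <= alpha/2
    k = 0
    while lower_tail(k) <= alpha / 2:
        k += 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * lower_tail(k - 1)
    # k-th smallest and k-th largest observations
    return x[k - 1], x[n - k], coverage


for name, sample in (("Placebo", placebo), ("Alcohol", alcohol)):
    low, high, cov = median_ci(sample)
    print(f"{name}: ({low:.2f}, {high:.2f}), coverage {cov:.4f}")
```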
SPSS Output for question #5
[Figure: Boxplots of reaction time for the Placebo and Alcohol populations (N = 10 each); cases 9, 6, 3 and 10 are flagged as outliers]
[Figure: Normal Q-Q Plot of Placebo population (Expected Normal vs. Observed Value)]
[Figure: Normal Q-Q Plot of Alcohol population (Expected Normal vs. Observed Value)]
Group Statistics (Blood-Alcohol)
Code (1 = Placebo, 2 = Alcohol)   N    Mean     Std. Deviation   Std. Error Mean
Placebo                           10   .9280    .50868           .16086
Alcohol                           10   1.6420   .66416           .21003
Independent Samples Test (Blood-Alcohol)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                              F       Sig.            t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       .669    .424            -2.699   18       .015              -.7140            .26455                  -1.26980       -.15820
Equal variances not assumed                           -2.699   16.856   .015              -.7140            .26455                  -1.27251       -.15549
Ranks (Blood-Alcohol)
Code (1 = Placebo, 2 = Alcohol)   N    Mean Rank   Sum of Ranks
Placebo                           10   7.00        70.00
Alcohol                           10   14.00       140.00
Total                             20

Test Statistics(b)
                                    Blood-Alcohol
Mann-Whitney U                      15.000
Wilcoxon W                          70.000
Z                                   -2.646
Asymp. Sig. (2-tailed)              .008
Exact Sig. [2*(1-tailed Sig.)]      .007(a)
a. Not corrected for ties.
b. Grouping Variable: Code: 1 Placebo, 2: Alcohol
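The Z value above follows from the placebo rank sum through the normal approximation to the Mann-Whitney U statistic. The sketch below recomputes it from W = 70 and converts it to the one-sided p-value matching H1 (placebo shifted to the left), using math.erfc for the normal CDF. No tie correction is applied; the reaction times contain no tied values.

```python
from math import erfc, sqrt

n1 = n2 = 10
w_placebo = 70.0  # placebo sum of ranks from the SPSS "Ranks" table

u = w_placebo - n1 * (n1 + 1) / 2             # Mann-Whitney U for placebo
mu_u = n1 * n2 / 2                            # E(U) under H0
sigma_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # SD of U under H0, no ties
z = (u - mu_u) / sigma_u


def normal_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * erfc(-x / sqrt(2))


# H1: placebo shifted left, so small placebo ranks (negative z) favour H1;
# the one-sided p-value is the lower tail.
p_one_sided = normal_cdf(z)
print(f"U = {u:.0f}, Z = {z:.3f}, one-sided p = {p_one_sided:.4f}")
```

The result is roughly half the two-sided asymptotic significance (.008) printed by SPSS.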
Question #6: (17 Points)
A team of researchers wants to compare the yields (in pounds) of five different varieties
(A, B, C, D, E) of 4-year-old orange trees in one orchard. They obtain a random sample
of seven trees of each variety from the orchard.
Using $\alpha = .01$ and the SPSS output, give an interpretation for each of the following:
a. (3 Points) Using tests and plots of the data, determine whether the conditions for
using the ANOVA are satisfied.
b. (6 Points) Conduct an ANOVA test of the null hypothesis that the five varieties
have the same mean yield.
c. (6 Points) Use the Kruskal-Wallis test to test the null hypothesis that the five
varieties have the same yield distributions.
d. (2 Points) Are the conclusions you reached in (b) and (c) consistent?
SPSS Output for question #6
Tests of Normality (Yield (in pounds))
                Kolmogorov-Smirnov(a)        Shapiro-Wilk
Code (Yields)   Statistic   df   Sig.        Statistic   df   Sig.
A               .182        7    .200*       .915        7    .428
B               .227        7    .200*       .884        7    .243
C               .161        7    .200*       .958        7    .804
D               .239        7    .200*       .866        7    .172
E               .144        7    .200*       .985        7    .980
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
[Figure: Boxplots of Yield (in pounds) by Code (Yields) for varieties A, B, C, D, E (N = 7 each)]
Test of Homogeneity of Variances (Yield (in pounds))
Levene Statistic    df1    df2    Sig.
5.214               4      30     .003
ANOVA (Yield (in pounds))
                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    1096.743         4    274.186       3.730   .014
Within Groups     2205.429         30   73.514
Total             3302.171         34
Ranks (Yield (in pounds))
Code (Yields)   N    Mean Rank
A               7    11.64
B               7    21.36
C               7    26.79
D               7    13.64
E               7    16.57
Total           35

Test Statistics(a,b)
               Yield (in pounds)
Chi-Square     10.011
df             4
Asymp. Sig.    .040
a. Kruskal Wallis Test
b. Grouping Variable: Code (Yields)
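Both test statistics above can be checked by hand. The sketch below recomputes the Kruskal-Wallis H from the printed mean ranks, H = 12/(N(N+1)) * sum(n_i * Rbar_i^2) - 3(N+1), and the ANOVA F as MS_between / MS_within. Because the mean ranks are rounded to two decimals, and SPSS may also correct for ties, H comes out near 10.00 rather than exactly 10.011. For a chi-square distribution with even df the upper tail has a closed form, used here for the df = 4 reference distribution.

```python
from math import exp, factorial

# Mean ranks and group size from the SPSS "Ranks" table for question #6
mean_ranks = {"A": 11.64, "B": 21.36, "C": 26.79, "D": 13.64, "E": 16.57}
n_i = 7
N = n_i * len(mean_ranks)

# Kruskal-Wallis H without any tie correction
h = 12 / (N * (N + 1)) * sum(n_i * r**2 for r in mean_ranks.values()) - 3 * (N + 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2
    return exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))


p_kw = chi2_sf_even_df(10.011, len(mean_ranks) - 1)  # SPSS statistic, df = 4
f_stat = 274.186 / 73.514                            # MS_between / MS_within

print(f"H from rounded mean ranks = {h:.3f}; p for H = 10.011: {p_kw:.3f}")
print(f"ANOVA F = {f_stat:.3f}")
```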
Question #7: (11 Points) A personnel director for a large, research-oriented firm categorizes colleges as most desirable, good, adequate, or undesirable sources of graduates.
The director collects data on 156 recent graduates and has each one rated by a supervisor.
                  Rating
School            Outstanding   Average   Poor
Most desirable    21            25        2
Good              20            36        10
Adequate          4             14        7
Undesirable       3             8         6
Using $\alpha = .01$ and the SPSS output, give an interpretation for each of the following:
a. (8 Points) Can the director safely conclude that there is a relation between school
type and rating?
b. (3 Points) Is there any problem in using the $\chi^2$ approximation?
SPSS Output for question #7
SCHOOL * RATING Crosstabulation
                                  RATING
SCHOOL                            Outstanding   Average   Poor   Total
Most Desirable   Count            21            25        2      48
                 Expected Count   14.8          25.5      7.7    48.0
Good             Count            20            36        10     66
                 Expected Count   20.3          35.1      10.6   66.0
Adequate         Count            4             14        7      25
                 Expected Count   7.7           13.3      4.0    25.0
Undesirable      Count            3             8         6      17
                 Expected Count   5.2           9.0       2.7    17.0
Total            Count            48            83        25     156
                 Expected Count   48.0          83.0      25.0   156.0
Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             15.967(a)   6    .014
Likelihood Ratio               16.577      6    .011
Linear-by-Linear Association   13.934      1    .000
N of Valid Cases               156
a. 2 cells (16.7%) have expected count less than 5. The minimum expected count is 2.72.
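The expected counts, the Pearson statistic and the small-expected-count warning can all be reproduced from the table of counts. The sketch below uses the Good/Average count of 36 shown in the SPSS crosstabulation, the value consistent with N = 156. It should give a chi-square of about 15.967 with 6 df, p of about .014, and 2 of 12 cells with expected counts below 5 (minimum about 2.72), as in SPSS footnote a. The closed-form tail probability used here is valid only for even df.

```python
from math import exp, factorial

# Rows: school type; columns: Outstanding, Average, Poor
observed = [
    [21, 25, 2],   # Most desirable
    [20, 36, 10],  # Good
    [4, 14, 7],    # Adequate
    [3, 8, 6],     # Undesirable
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Upper-tail chi-square probability (closed form for even df)
half = chi2 / 2
p_value = exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))

cells = [e for row in expected for e in row]
n_small = sum(e < 5 for e in cells)

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print(f"{n_small} of {len(cells)} cells have expected count < 5; min = {min(cells):.2f}")
```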