CHAPTER 5 RESULTS
5.1 Descriptive Statistics
5.1.1 Test Set A
5.1.1.1 Overall Results
Table 5-1 shows the number of test takers, mean, standard deviation, minimum score, maximum score, and KR20 of Test Set A for all test takers. Overall, the present author finds no problematic results in the descriptive statistics. The KR20 may seem somewhat low, considering that GTEC administered in a formal setting usually yields a KR20 higher than 0.9. However, since the test employed in the present study had only 27 items and a sufficient but limited number of test takers, it is unavoidable that its reliability is lower than that of GTEC. Taking this into account, together with the fact that the items were written by the present author without the benefit of item banking, the reliability of 0.768 seems acceptable for the present situation.
Table 5-1 Descriptive statistics and reliability coefficient of Test Set A
No. of Test Takers   Mean   S.D.   Minimum   Maximum   KR20
573                  17.7   4.5    4         27        0.768
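For reference, KR20 can be written as KR20 = k/(k - 1) × (1 - Σ p_i q_i / σ²_X), where k is the number of items, p_i the facility value of item i, q_i = 1 - p_i, and σ²_X the variance of total scores. The following is a minimal Python sketch of that textbook definition, not the software used in the study; the function names are hypothetical. Fed the rounded facility values from Table 5-2 and the S.D. of 4.5 from Table 5-1, the summary-statistic version gives 0.768 to three decimals, although the two-decimal rounding of the inputs means such agreement should not be over-read.

```python
from typing import Sequence


def kr20_from_summary(p_values: Sequence[float], total_sd: float) -> float:
    """KR20 from item facility values (p) and the S.D. of total scores.

    KR20 = k / (k - 1) * (1 - sum(p * q) / variance_of_totals), with q = 1 - p.
    """
    k = len(p_values)
    sum_pq = sum(p * (1.0 - p) for p in p_values)
    return k / (k - 1) * (1.0 - sum_pq / total_sd ** 2)


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """KR20 from a 0/1 matrix (rows = test takers, columns = items).

    Population (divide-by-N) variances are used throughout.
    """
    n = len(responses)
    k = len(responses[0])
    p_values = [sum(row[j] for row in responses) / n for j in range(k)]
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    sum_pq = sum(p * (1.0 - p) for p in p_values)
    return k / (k - 1) * (1.0 - sum_pq / var_total)


# Rounded facility values (PC) of the 27 Test Set A items, in the order of Table 5-2.
SET_A_PC = [0.66, 0.34, 0.37, 0.92, 0.84, 0.46, 0.39, 0.58, 0.36,
            0.68, 0.72, 0.90, 0.37, 0.70, 0.71, 0.81, 0.71, 0.51,
            0.84, 0.71, 0.74, 0.76, 0.78, 0.87, 0.71, 0.68, 0.63]

if __name__ == "__main__":
    # S.D. of 4.5 taken from Table 5-1.
    print(round(kr20_from_summary(SET_A_PC, 4.5), 3))
```

Running the script prints 0.768. With raw response data, `kr20` avoids the rounding issue entirely.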
The minimum score was not zero, whereas the maximum score was a full mark. Ideally, there should be no full marks in a proficiency test, because a full mark indicates that the test could not accurately measure that test taker's ability, which may lie beyond what was tested. However, since this test was not strictly a proficiency test, and since only one test taker obtained a full mark, this aspect of the results was judged not to threaten the reliability of the present study.
東京外国語大学 博士学位論文 Doctoral thesis (Tokyo University of Foreign Studies)
The histogram in Figure 5-1 shows the distribution of test takers' scores as a whole. The skewness was -0.808 and the kurtosis 0.227, which allows the present author to judge that the distribution is close enough to a normal curve, although the peak is slightly to the right. This problem was resolved when the ability groups were determined.
Figure 5-1 Histogram of overall results for Test Set A (frequency by score, 0-30)
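The skewness and kurtosis reported above are moment statistics of the score distribution. The following minimal sketch assumes the population-moment definitions (skewness m3/m2^1.5 and excess kurtosis m4/m2² - 3, which is 0 for a normal curve); the study does not state which estimator its software used, and many packages apply small-sample corrections, so values from this function may differ slightly from those reported. The function name `shape_statistics` and the toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def shape_statistics(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (skewness, excess kurtosis) from population central moments.

    A negative skewness means the longer tail lies toward low scores,
    i.e. the peak sits to the right, as described for Figure 5-1.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


if __name__ == "__main__":
    # Toy scores (not study data) with a long tail toward low scores.
    toy = [4, 12, 16, 18, 19, 20, 20, 21, 22, 23]
    skew, kurt = shape_statistics(toy)
    print(f"skewness {skew:.3f}, excess kurtosis {kurt:.3f}")
```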
5.1.1.2 Item Validation
Facility values (percentage correct) and discrimination indices (point-biserial coefficients) for each item in Test Set A are provided in Table 5-2. Items 1, 2, 7, and 16 seem somewhat problematic when their point-biserial coefficients are examined. One reason could be that, among the four options, the correct answer to each of these items was unclear and hard to distinguish from the other options because of defective construction. Furthermore, the especially low percentages correct for Items 2, 7, and 16 could mean that these items were so difficult that even those who scored well on the test as a whole tended to get them wrong, resulting in low point-biserial coefficients. Overall, however, the figures seemed satisfactory for a test instrument to be employed in this study, and it was decided to retain all the items in the subsequent analyses.
Table 5-2 Percentage correct (PC) and point-biserial coefficient (PBS) of items in Test Set A
Item      PC     PBS
1         0.66   0.16
2         0.34   0.07
3         0.37   0.26
4         0.92   0.36
5         0.84   0.30
6         0.46   0.23
7         0.39   0.00
8         0.58   0.25
9         0.36   0.28
13        0.68   0.49
14        0.72   0.54
15        0.90   0.40
16        0.37   0.04
17        0.70   0.55
18        0.71   0.47
19        0.81   0.53
20        0.71   0.42
21        0.51   0.46
22        0.84   0.51
23        0.71   0.50
24        0.74   0.54
25        0.76   0.49
26        0.78   0.57
27        0.87   0.40
28        0.71   0.55
29        0.68   0.58
30        0.63   0.47
Average   0.66   0.39
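The two indices in Table 5-2 can be computed directly from a 0/1 response matrix. The following is a minimal Python sketch under two assumptions the study does not state: the total score includes the item itself (an uncorrected item-total correlation), and population (divide-by-N) standard deviations are used. It relies on the identity r_pb = (M1 - M0)/s_X × √(pq), where M1 and M0 are the mean total scores of those who answered the item correctly and incorrectly. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def facility_values(responses: Sequence[Sequence[int]]) -> List[float]:
    """Proportion of test takers answering each item correctly (PC)."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def point_biserial(item: Sequence[int], totals: Sequence[float]) -> float:
    """Point-biserial correlation between a 0/1 item and total scores.

    Returns NaN when the coefficient is undefined (everyone or no one
    answered correctly, or all totals are equal).
    """
    n = len(item)
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    mean = sum(totals) / n
    s = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)  # population SD
    if not right or not wrong or s == 0:
        return float("nan")
    p = len(right) / n
    m1 = sum(right) / len(right)
    m0 = sum(wrong) / len(wrong)
    return (m1 - m0) / s * math.sqrt(p * (1.0 - p))


def item_statistics(responses: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """(PC, PBS) for every item, correlating each item with the full total."""
    totals = [sum(row) for row in responses]
    pcs = facility_values(responses)
    return [(pcs[j], point_biserial([row[j] for row in responses], totals))
            for j in range(len(pcs))]
```

Excluding the item from the total (a corrected coefficient) would lower the values somewhat, especially for short tests.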
5.1.1.3 Predetermining the Ability Groups
As explained in 4.3.1, Group A-Low and Group A-High were predetermined on the basis of the overall descriptive statistics for Test Set A. On examining Figure 5-2, the score distribution table for the whole population of Test Set A, the present author noticed something unusual about the test takers who scored 9 or below: they seem to deviate from the rest of the group, forming a small normal curve of their own. Furthermore, when the distribution from 11 to 26 items correct was reviewed, it appeared to form a nearly normal curve. Since the median is 19 items correct, test takers who fall in the zone between lines b and c would be considered to have marginal ability, between those considered to have high ability and those considered to have low ability. When the population percentage was calculated for each zone, 31.6% fell between lines a and b, 28.1% between lines b and c, and 31.4% below line c. Since this grouping allotted roughly equal percentages of test takers to each zone, the present author decided that those who had scored between 11 and 17 would be predetermined to be in Group A-Low and those who had scored between 21 and 27 in Group A-High.

Figure 5-2 Score distribution table for Test Set A
Number    Frequency   Cum.    PR    PCT
Correct               Freq.
(No examinees below this score)
 3         0            0      1     0
 4         2            2      1     0
 5         5            7      1     1
 6         4           11      2     1
 7         9           20      3     2
 8        10           30      5     2
 9         6           36      6     1
10        15           51      9     3      scores 3-10: 51 people (8.9%)
------------------------------------------ line a
11         7           58     10     1
12        24           82     14     4
13        18          100     17     3
14        24          124     22     4
15        30          154     27     5
16        24          178     31     4
17        54          232     40     9      scores 11-17: 181 people (31.6%) <Group A-Low>
------------------------------------------ line b
18        51          283     49     9
19        43          326     57     8      <- median
20        67          393     69    12      scores 18-20: 161 people (28.1%)
------------------------------------------ line c
21        73          466     81    13
22        46          512     89     8
23        31          543     95     5
24        17          560     98     3
25         8          568     99     1
26         4          572     99     1      scores 21-27: 180 people (31.4%) <Group A-High>
(A score of 27 is omitted from the table because its frequency, 1, was below the display threshold of 4.)
Table 5-3 shows the number of test takers, mean, standard deviation, minimum score, and maximum score for Group A-Low and Group A-High.
Table 5-3 Descriptive statistics for Group A-Low and Group A-High
Group    No. of Test Takers   Mean   S.D.   Minimum   Maximum
A-Low    181                  14.8   1.9    11        17
A-High   180                  22.2   1.3    21        27
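The zone sizes in Figure 5-2 and the statistics in Tables 5-1 and 5-3 can be recomputed from the score distribution. The sketch below is an illustrative check rather than the procedure used in the study. It assumes the frequencies listed in Figure 5-2, with the frequencies for scores 18 to 20 derived from the cumulative counts and the single full mark of 27 taken from 5.1.1.1; band boundaries follow lines a, b, and c. Population S.D.s are assumed, since the study does not say which it reported, and the names are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

# Frequencies by number correct for Test Set A, read from Figure 5-2.
# Scores 18-20 are derived from the cumulative counts (283, 326, 393);
# the single full mark (27) is taken from section 5.1.1.1.
FREQ: Dict[int, int] = {
    3: 0, 4: 2, 5: 5, 6: 4, 7: 9, 8: 10, 9: 6, 10: 15,
    11: 7, 12: 24, 13: 18, 14: 24, 15: 30, 16: 24, 17: 54,
    18: 51, 19: 43, 20: 67,
    21: 73, 22: 46, 23: 31, 24: 17, 25: 8, 26: 4, 27: 1,
}

# Score bands delimited by lines a, b, and c in Figure 5-2.
BANDS: Dict[str, Tuple[int, int]] = {
    "scores 3-10": (3, 10),
    "Group A-Low": (11, 17),
    "scores 18-20": (18, 20),
    "Group A-High": (21, 27),
}


def expand(freq: Dict[int, int]) -> List[int]:
    """Turn a frequency table into the list of individual scores."""
    return [score for score, f in sorted(freq.items()) for _ in range(f)]


def band_summary(freq: Dict[int, int], lo: int, hi: int) -> Tuple[int, float, float]:
    """(N, mean, population S.D.) of the scores from lo to hi inclusive."""
    scores = [s for s in expand(freq) if lo <= s <= hi]
    return len(scores), statistics.mean(scores), statistics.pstdev(scores)


if __name__ == "__main__":
    everyone = expand(FREQ)
    print(f"N = {len(everyone)}, mean {statistics.mean(everyone):.1f}, "
          f"S.D. {statistics.pstdev(everyone):.1f}, median {statistics.median(everyone)}")
    for name, (lo, hi) in BANDS.items():
        n, mean, sd = band_summary(FREQ, lo, hi)
        print(f"{name}: {n} ({100 * n / len(everyone):.1f}%), mean {mean:.1f}, S.D. {sd:.1f}")
```

Under these assumptions the script reports N = 573, mean 17.7, S.D. 4.5, and median 19 overall, and 181 (31.6%), mean 14.8, S.D. 1.9 for Group A-Low and 180 (31.4%), mean 22.2, S.D. 1.3 for Group A-High, consistent with Tables 5-1 and 5-3.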
5.1.2 Test Set B
5.1.2.1 Overall Results
Table 5-4 shows the number of test takers, mean, standard deviation, minimum score, maximum score, and KR20 of Test Set B. Overall, the present author finds no problematic results in the descriptive statistics. The KR20 may seem rather low, considering that GTEC and TOEIC administered in a formal setting usually yield a KR20 higher than 0.9. However, since the tests employed in the present study had only 27 items, far fewer than the original tests, and a sufficient but limited number of test takers, the drop in the index seems unavoidable. Judging this not to be a serious problem in the present situation, the present author decided to proceed with this result.
Table 5-4 Descriptive statistics and reliability coefficient of Test Set B
No. of Test Takers   Mean   S.D.   Minimum   Maximum   KR20
257                  16.3   4.0    4         25        0.675
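The argument that a short test lowers KR20 can be made concrete with the Spearman-Brown prophecy formula, ρ* = nρ / (1 + (n - 1)ρ), where n is the factor by which the test is lengthened. This formula is not used in the study; the sketch below is only an illustration, and it rests on the strong assumption that any added items would be parallel to the existing 27. The function names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a test by `length_factor`.

    Assumes the added items are parallel to the existing ones.
    """
    n = length_factor
    return n * reliability / (1.0 + (n - 1.0) * reliability)


def length_factor_for(reliability: float, target: float) -> float:
    """Lengthening factor needed to move from `reliability` to `target`."""
    return target * (1.0 - reliability) / (reliability * (1.0 - target))


if __name__ == "__main__":
    ITEMS = 27
    for name, rel in (("Test Set A", 0.768), ("Test Set B", 0.675)):
        needed = ITEMS * length_factor_for(rel, 0.90)
        print(f"{name}: KR20 {rel} -> about {round(needed)} parallel items for 0.90")
```

Under that assumption, reaching the 0.9 level mentioned for GTEC and TOEIC would take roughly 73 items for Test Set A and 117 items for Test Set B.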
The histogram in Figure 5-3 shows the distribution of test takers' scores. The skewness was -0.340 and the kurtosis 0.250, which allows the present author to judge that the distribution is close enough to a normal curve. The minimum score was not zero, and the maximum score was 25. From this, along with the mean score of 16.3 and the score distribution in Figure 5-3, it was presumed that Test Set B worked well to reflect the reading ability of the population who took this test set.
Figure 5-3 Histogram of results for Test Set B (frequency by score, 0-30)
5.1.2.2 Item Validation
Facility values (percentage correct) and discrimination indices (point-biserial coefficients) for each item in Test Set B are provided in Table 5-5. When their point-biserial coefficients are examined, Items 2 and 4 seem problematic. One explanation could be that, among the four options, the correct answer to each of these items was unclear and difficult to identify because of defective construction. Overall, however, the figures seemed satisfactory for a test instrument to be employed in this study, and all the items were retained in the subsequent analyses.
Table 5-5 Percentage correct (PC) and point-biserial coefficient (PBS) of items in Test Set B
Item      PC     PBS
1         0.73   0.24
2         0.24   0.19
3         0.38   0.26
4         0.95   0.15
5         0.81   0.26
6         0.51   0.23
10        0.65   0.28
11        0.61   0.30
12        0.72   0.36
13        0.72   0.41
14        0.42   0.34
15        0.88   0.30
16        0.40   0.22
17        0.53   0.38
18        0.78   0.26
19        0.44   0.29
20        0.68   0.33
21        0.45   0.46
22        0.63   0.38
23        0.40   0.31
24        0.55   0.25
25        0.77   0.48
26        0.73   0.49
27        0.70   0.46
28        0.66   0.44
29        0.54   0.41
30        0.46   0.30
Average   0.61   0.33
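The items singled out in 5.1.1.2 and 5.1.2.2 can be reproduced with a simple screen on the PBS columns of Tables 5-2 and 5-5. The cutoff of 0.20 is an assumed rule of thumb, not a criterion stated in the study; with it, the screen flags items 1, 2, 7, and 16 in Test Set A and items 2 and 4 in Test Set B, which are the items discussed above. The names below are hypothetical.

```python
from typing import Dict, List

# Point-biserial coefficients copied from Tables 5-2 and 5-5, keyed by item number.
SET_A_PBS: Dict[int, float] = {
    1: 0.16, 2: 0.07, 3: 0.26, 4: 0.36, 5: 0.30, 6: 0.23, 7: 0.00, 8: 0.25,
    9: 0.28, 13: 0.49, 14: 0.54, 15: 0.40, 16: 0.04, 17: 0.55, 18: 0.47,
    19: 0.53, 20: 0.42, 21: 0.46, 22: 0.51, 23: 0.50, 24: 0.54, 25: 0.49,
    26: 0.57, 27: 0.40, 28: 0.55, 29: 0.58, 30: 0.47,
}
SET_B_PBS: Dict[int, float] = {
    1: 0.24, 2: 0.19, 3: 0.26, 4: 0.15, 5: 0.26, 6: 0.23, 10: 0.28, 11: 0.30,
    12: 0.36, 13: 0.41, 14: 0.34, 15: 0.30, 16: 0.22, 17: 0.38, 18: 0.26,
    19: 0.29, 20: 0.33, 21: 0.46, 22: 0.38, 23: 0.31, 24: 0.25, 25: 0.48,
    26: 0.49, 27: 0.46, 28: 0.44, 29: 0.41, 30: 0.30,
}

CUTOFF = 0.20  # assumed rule-of-thumb threshold; the study states no numeric cutoff


def flag_items(pbs: Dict[int, float], cutoff: float = CUTOFF) -> List[int]:
    """Item numbers whose point-biserial coefficient falls below the cutoff."""
    return sorted(item for item, r in pbs.items() if r < cutoff)


if __name__ == "__main__":
    for name, pbs in (("Test Set A", SET_A_PBS), ("Test Set B", SET_B_PBS)):
        mean_pbs = sum(pbs.values()) / len(pbs)
        print(f"{name}: flagged {flag_items(pbs)}, mean PBS {mean_pbs:.2f}")
```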
5.2 Factor Analytic Studies
5.2.1 Group A-Low
A full-information factor analysis was applied to all the items in Test Set A using the responses of Group A-Low. A two-factor solution was adopted because of its interpretability. The correlation between the first and second factors is low (.341), which indicates that an orthogonal (VARIMAX) rotation is preferable. Table 5-6 shows the factor loadings for this group.
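The VARIMAX rotation referred to above turns the factor axes orthogonally so that each item's squared loadings are spread as unevenly as possible across the factors. For two factors the optimal angle has a closed form, φ = ¼ atan2(D - 2AB/p, C - (A² - B²)/p), with u_i = x_i² - y_i², v_i = 2x_iy_i, A = Σu_i, B = Σv_i, C = Σ(u_i² - v_i²), and D = 2Σu_iv_i over p items. The sketch below implements that formula with optional Kaiser row normalization. It assumes an unrotated two-factor loading matrix is already available; it is not the full-information estimation routine used in the study, and the function name and example loadings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Loading = Tuple[float, float]


def varimax_2f(loadings: Sequence[Loading], normalize: bool = True) -> List[Loading]:
    """Orthogonally rotate a two-factor loading matrix to the VARIMAX criterion.

    With normalize=True, rows are scaled to unit length before the angle is
    found (Kaiser normalization) and rescaled afterwards.
    """
    p = len(loadings)
    # Row lengths (square roots of communalities); 1.0 guards all-zero rows.
    h = [(math.hypot(x, y) or 1.0) if normalize else 1.0 for x, y in loadings]
    pts = [(x / hi, y / hi) for (x, y), hi in zip(loadings, h)]
    u = [x * x - y * y for x, y in pts]
    v = [2.0 * x * y for x, y in pts]
    a, b = sum(u), sum(v)
    c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
    # Angle that maximizes the varimax criterion for this pair of factors.
    phi = math.atan2(d - 2.0 * a * b / p, c - (a * a - b * b) / p) / 4.0
    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    return [(hi * (x * cos_phi + y * sin_phi), hi * (-x * sin_phi + y * cos_phi))
            for (x, y), hi in zip(pts, h)]


if __name__ == "__main__":
    # Hypothetical unrotated loadings for four items (not Table 5-6 values).
    unrotated = [(0.60, 0.45), (0.55, 0.35), (0.50, -0.40), (0.45, -0.30)]
    for x, y in varimax_2f(unrotated):
        print(f"{x:6.3f} {y:6.3f}")
```

Because the rotation is orthogonal, each item's communality (the squared length of its loading row) is unchanged; only the split between the two factors moves.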
To inspect the loadings on each factor, the items were rearranged in descending order of their loadings on the first factor. The predetermined question type (see note 11) of each item is indicated in the table as "Q-TYPE" so that the relationship between factor loadings and question types can be examined. The numbers under "P-#" in the table show which passage each test item belongs to.
Table 5-6 Factor Loadings for Test Set A by Group A-Low
11 In order to make reference to these terms simpler, "global-inferential" will be presented as "GI", "local-literal" as "LL", and "local-inferential" as "LI" in the tables and in the discussion from here on.
The first thing to note is that text (context) characteristics did not affect the extraction of factors. Items from different passages had significant loadings on the same factor, and items from the same prompt varied greatly in their loadings.
In looking for a pattern in the loadings on the first factor, one notices that the items with rather high loadings on this factor have smaller item numbers; in other words, they appear early in the test set. Likewise, the items with high negative loadings appear later in the test set. This indicates that the first factor in the loadings for Test Set A, based on the performance of Group A-Low, could be identified as "where a test item is located in the test set", or "item position". This point will be discussed further in Chapter 6.
As for the interpretation of the second factor in the present analysis, the possibility arises that a "literal" type of reading is the underlying attribute. The items that load heavily on the second factor are items 2 and 7. Since their predetermined question types differ, the two items were analyzed further.
Item 7, which was originally categorized as a GI ("global-inferential", see note 11) item, is presented in the test as follows:
7. What is the main topic of this passage?
(A) The possibility of space colonies
(B) Space travel in the twenty-first century
(C) How to become an astronaut
(D) What people think about space exploration
The correct option (D) could be chosen if the test taker could observe that the explanations of the different percentages introduced in the passage are all about "what people think about space exploration", option (D), and that this is the theme of the passage. At the same time, however, it could be supposed that some test takers looked at the first sentence in the passage, "During the last forty years, many studies have been done to learn people's opinions about space exploration," and matched the phrase "people's opinion about space exploration" with what is said in option (D). If this type of reading was used to reach the answer, it is a literal matching of limited (or local) information, and item 7 could be considered an LL ("local-literal", see note 11) item.
Item 2 is also an LL item:
2. The speed at which the seafloor is spreading is
(A) about an inch in 200 million years.
(B) changes according to the year.
(C) half as fast as human fingernails grow.
(D) slower than the scientists can process.
This is indeed an LL item, since the correct option (C) would be chosen when the test taker notices that the last sentence in the passage, "This spreading occurs in half of a speed of how fast fingernails grow," perfectly matches the phrases in option (C). Thus, it is possible that the "local-literal" element explains the second factor.
To further confirm this interpretation, it should also be pointed out that quite a few items bear high negative loadings on the second factor: items 14, 17, 18, and 19.
Items 17 and 18 share the same passage, and they were originally categorized as an LL item and an LI ("local-inferential", see note 11) item, respectively.
17. According to this passage, what do scientists now believe about the ocean depths?
(A) There are many dark-shaded jellies.
(B) Sea color changes with the seasons.
(C) A kind of desert exists in some parts.
(D) Most of the living things there are jellies.
18. We can guess from the passage that
(A) scientists have found that deep-sea is like a watery desert.
(B) scientists don’t know why deep-sea jellies have bright colors.
(C) many jellies in the underwater have their common clear color.
(D) many fish in the deep sea have very bright colors.
For item 17, the correct answer is (D). The sentence "Scientists now believe that jelly animals may be one of the most common types of animal life in the ocean depths," starting on the seventh line, was to be matched with the question, "What do the scientists now believe about the ocean depths?" and with what is written in option (D). However, it seems that, for Group A-Low test takers, linking the phrase "the most common types of animal life" in the passage with the phrase "Most of the living things" in option (D) was an "inferential" type of reading rather than a "matching", which leads us to identify item 17 as an LI item for this group.
With regard to item 18, the correct option (B) would be chosen if the test taker could locate the last sentence, "The reason for these bright colors is a mystery," and infer that "a mystery" means that nobody knows why jellies in the deep sea have bright colors. In other words, this item was constructed to test test takers' ability to make an inference after understanding a small amount of information, and it was therefore labeled as an LI item. If this is what the test takers did, it might be possible to explain the second factor as indeed a "local-literal", or at least a "literal", element, on account of the fact that the items that load negatively on the same factor are perceived to present an "inferential" feature, which lies at the opposite end from "literal".
This proposition is further confirmed when items 14 and 19 are consulted. Item 19 is presented as:
19. What is the main idea of this passage?
(A) Scientists work very hard to make new discoveries.
(B) Important things can be discovered accidentally.
(C) Making scientific discoveries is an easy thing to do.
(D) Sticky materials are useful in today's world.
This item was predetermined to be a GI item because the whole passage was about how Art Fry came up with the idea of stick-ons by luck, conveying the message that "(B) Important things can be discovered accidentally." This answer could also be reached if the first sentence of the passage, "Some discoveries have come as a result of luck-an accident that causes a scientist to look differently at what has occurred," is located, and the phrases "as a result of luck" and "an accident" from the sentence are correctly linked with "accidentally" in option (B). Making this link might require a bit of inferring, so this item could be regarded as an "inferential" item, whether it is categorized as a GI or an LI item.
Item 14 was predetermined to be an LL item:
14. Why was the Great Smoky Mountains National Park built?
(A) People in the East needed a place to take a walk for exercises.
(B) Many kinds of birds and trees were discovered in Smoky Mountains.
(C) Many parks in the West were becoming too crowded with cars.
(D) There were few national parks in the eastern part of the US.
The correct answer is (D), which gives the same explanation as the first sentence in the passage, "In the early 1920s, the new United States National Park Service realized that most of its parks were in the West," in slightly different words. This item was first categorized as an LL item in the item-writing process because the present author thought that this "matching" was "literal" in nature. However, given the population of Group A-Low, the matching here might be considered "inferential", which further suggests that the attribute of the second factor is whether the item elicits a "literal" or an "inferential" type of reading.
5.2.2 Group A-High
A full-information factor analysis was applied to all the items in Test Set A using the responses of Group A-High. The present author adopted a two-factor solution for its interpretability. The correlation of .054 between the first and second factors is low, so an orthogonal (VARIMAX) rotation seemed more appropriate. Table 5-7 shows the factor loadings for this group.
Table 5-7 Factor Loadings for Test Set A by Group A-High
To inspect the loadings on each factor, the items were rearranged in descending order of their loadings on the first factor. The question type (Q-TYPE) of each item is indicated in the table, along with the passage number (P-#), so that the relationship between the factor loadings, question types, and passage numbers can be examined.
As with Group A-Low, the first thing to note is that text (context) characteristics did not affect the extraction of factors. Items from different passages had significant loadings on the same factor, and items from the same prompt varied greatly in their loadings.
In looking for a pattern in the loadings on the first factor, it seems that the items with rather high loadings on this factor have smaller item numbers and those with low loadings have larger item numbers, as was the case with Group A-Low. At the same time, however, one can also observe that the items with rather high loadings on the first factor are those labeled GI for question type, that is, questions that ask for the main idea of the given passage. To make sure that these items actually elicit a GI type of reading, items 7, 4, and 22 were revisited with some test takers after the test was administered, and it was confirmed that they do.
Item 6, labeled as an LI item, also bears a rather high loading on the first factor. Item 6 is presented as follows:
6. We can guess from the passage that
(A) some trees in Muir Woods existed 1,200 years ago.
(B) the redwood trees have been discovered just recently.
(C) redwood trees are very popular in the US.
(D) cutting down of the redwood is not allowed in the US.
This question was written with the intention that, if the test taker could locate the sentence "Some are about 1,200 years of age," on the fifth line of the passage (refer to Test Set A in Appendix A), option (A) would be chosen after inferring that, if the trees are 1,200 years old, they must have existed in Muir Woods 1,200 years ago. This item was constructed to test a test taker's ability to make an inference after understanding a small amount of information.
However, on closer examination of item 6, one thing that draws attention is that, compared to other LI items in Test Set A, test takers had to read and refer to a rather large amount of information in order to discard the incorrect options for this item. This could be considered the type of reading predetermined for a GI type of question; therefore, it could be said that item 6 worked as a GI item in the present analysis, which further allows us to interpret the first factor as the ability to read a rather large amount of information and make inferences from its comprehension.
Item 5, another item which loads heavily on the first factor, shares the passage
with item 6 and asks:
5. What is one reason why redwood trees have existed so long?
(A) They form an unusual forest just outside San Francisco.
(B) They have special covers that protect themselves.
(C) They are very tall, so the fire can’t reach the whole tree.
(D) They are officially protected by the State of California.
The correct answer (B) could be chosen if the test taker could locate the sentence "They contain chemicals which protect them against fire, decay, and insects," which starts on the eighth line of the passage. This item was labeled as an LL (local-literal) type and was constructed to test a test taker's ability to understand a small amount of information with little or no inferring. However, a close review of option (B) shows that, to choose it correctly, test takers had to comprehend (and perhaps infer) that the word "cover" in option (B) means the bark of the tree. Furthermore, the correct option could also be chosen if the test taker located the sentence "One reason is that they are not easily harmed by fire because they have very thick bark, and there is much water in their wood," starting on the sixth line of the passage, and made the same inference about the "bark", which would make this item "LI".
At the same time, one notices that, although the correct answer could be reached by an LI type of reading as examined above, what item 5 asks about is actually the central theme of the passage. On the sixth line, the passage poses the question "How do these trees live so long?" and the rest of the text focuses on answering it. Although this question appears in the middle of the passage, the explanation of why redwood trees have lived so long seems to dominate the discussion, so it could be judged that item 5 requires the main idea to be comprehended. Therefore, item 5 might as well be categorized as a GI item, which further allows us to conclude that the first factor in the present analysis is the ability to perform a GI type of reading comprehension.
If the first factor can be explained by the GI nature of reading, items with high negative loadings could be perceived as items that elicit non-GI, and perhaps LL, types of reading performance. These are items 17, 19, 28, and 30, in descending order of the magnitude of their negative loadings.
Item 28 was predetermined to be a GI item, as is clear from the question.
28. What is the main idea of this passage?
(A) The popularity of national parks is creating problems.
(B) National parks are built as children’s playground.
(C) Pollution is a problem in national parks.
(D) The cost of visiting a national park is increasing.
The passage was about how national parks in the US have problems because too many people are visiting them. A similar proposition is expressed in option (A), which should be chosen if the test taker has correctly comprehended the passage. However, even if the whole passage was not read globally, the correct option could be chosen after reading the first sentence, "The U.S. National Parks Service is trying to solve a difficult problem," along with the first part of the second sentence, "Many national parks have become too popular." If this was the case, it might be more appropriate to consider this item as testing an LL type, or at least a local type, of reading.
A similar case holds true for item 19:
19. What is the main idea of this passage?
(A) Scientists work very hard to make new discoveries.
(B) Important things can be discovered accidentally.
(C) Making scientific discoveries is an easy thing to do.
(D) Sticky materials are useful in today's world.
This item was judged to be a GI item in the factor analytic study of Group A-Low. However, for Group A-High, because of their higher ability and the ease with which they read English, matching the first sentence, "Some discoveries have come as a result of luck-an accident that causes a scientist to look differently at what has occurred," with option (B) had an LL rather than a GI nature. In the same respect, item 17, which was categorized as an LI item for Group A-Low, could now be considered an LL item.
Item 30, which shares the same passage with item 28 above, was constructed to elicit a test taker's LI reading performance.
30. Why is it necessary for some parks to limit the number of visitors?
(A) There aren’t enough parking spaces for all the visitors around the parks.
(B) Having too many visitors has bad influences on the living things in the parks.
(C) They don’t have enough money to hire people as the guides in the parks.
(D) There would be too much traffic on the roads inside and around the parks.
To choose (B) correctly, the test taker had to locate the second-to-last sentence, "The large number of visitors is also a threat to the plant and animal life of the parks," and infer that, if something is "a threat to the plant and animal life of the parks", it "has bad influences on the living things". When the item was revisited with some of the test takers, this seemed to be the case.
Nevertheless, what became clear as items 17, 19, 28, and 30 were revisited was that they were certainly not GI items. In fact, they seem to have elicited a type of reading that could be considered the opposite of GI, sharing a common "local" nature. Therefore, it might as well be concluded that the first factor in the present analysis is indeed the "global", if not global-inferential, element of reading performance.
As for the second factor in the loadings presented in Table 5-7, items 16, 23, and 30 bear high loadings. They come from different reading prompts, so text features cannot account for the factor. Furthermore, the three items have different predetermined question types. Therefore, each item was reexamined to seek a common feature that would help interpret this factor.
Item 16 was predetermined to be a GI question:
16. What is the main idea of this passage?
(A) Scientists believe that the deep sea is like a desert in water.
(B) Scientists learned a lot about jellies in the sea from the sailors.
(C) Scientists discovered a lot about jellies in the ocean depths.
(D) Scientists were surprised to find so many jellies in the deep-sea.
This item was written to elicit a test taker's global comprehension of the passage. The test taker is to read the whole passage and infer that the main idea presented by the author is (C). When this item was revisited with some test takers, this was more or less what they did to reach (C) as the answer, which confirms that item 16 was indeed a GI item. They said that the second sentence, "But with new ways to explore the ocean's depths, we are finding that they are much richer in life than we ever expected," worked as a clue for inferring that the theme of the passage was how scientists are "discovering a lot about jellies in the ocean depths", and that the rest of the passage gave examples to support this theme.
Looking at item 23, which was originally labeled as an LL item, the correct answer (D) would be chosen if the test taker could locate the first sentence in the passage, "About 85% of all animal life consists of insects," and match the phrase "85%" with "the most part" and "consists of" with "forms" in option (D).
23. According to the passage,
(A) some insects eat other insects for food.
(B) some insects make food from oil pool.
(C) insects are usually found near the water.
(D) insects form the most part of animal life.
However, in the analysis after data collection, it became apparent that what had been presumed to be a "literal matching" (an LL type of reading) was actually an "inferring". In other words, interpreting "the most part" in option (D) to mean "85%" and "forms" to mean "consists of" in the original sentence could be considered "inferring" rather than "literal matching". If this is true, item 23 should be called an LI item, and the common feature that items 16 and 23 share is then an "inferential" element. This raises the possibility that the attribute explaining the second factor is an "inferential element".
This proposition is further confirmed when item 30 is examined. In the analysis of the first factor, item 30 was determined to be an LI item, a question type with an "inferential" element. This leads us to affirm that an "inferential" element is the attribute that characterizes the second factor.
Conversely, if the second factor can be explained by the "inferential" nature of reading, items with high negative loadings could be perceived as items that elicit a "non-inferential", or "literal", type of reading performance. These are items 18, 26, and 27, in descending order of the magnitude of their negative loadings.
Item 18 was judged to be an LI item in the factor analytic study of Group A-Low.
18. We can guess from the passage that
(A) scientists have found that deep-sea is like a watery desert.
(B) scientists don’t know why deep-sea jellies have bright colors.
(C) many jellies in the underwater have their common clear color.
(D) many fish in the deep sea have very bright colors.
However, as was the case with items 17 and 19 in examining the first factor in the present analysis, for Group A-High, because of their higher ability and the ease with which they read English, matching the last sentence, "The reason for these bright colors is a mystery," with the correct option (B), which was judged "inferential" for Group A-Low, had an LL rather than an LI nature.
Items 26 and 27 share the same passage and are presented as:
26. Mendez could succeed because his parents
(A) helped him travel around the world.
(B) brought him up very strictly.
(C) put in much money and time.
(D) taught him many kinds of sports.
27. Robert Mendez is
(A) a father of two children
(B) a fisherman from California
(C) a tennis player
(D) a TV star
Item 26 was predetermined to be an LL item because the correct option (C) could be reached if the test taker could locate the sixth and seventh sentences in the passage, “Robert traces his success to his parents’ sacrifices. They invested every spare penny and every spare moment in their sons’ future,” and recognize that, in essence,
what is said by these sentences is what is said in option (C), which makes this an LL item.
In the same way, item 27 is an LL question because the reading performance elicited by this item is the “literal” matching of “a world-class player” and “a tennis racket” from the first two sentences of the passage with option (C). Although item 27 was predetermined to be an LI question because this matching was thought to hold an “inferential” nature, in reality it appears to be a literal matching, which allows the present author to conclude that the second factor in the present analysis is well explained by the “inferential/literal” nature that an item exhibits.
5.2.3 Group B
All the responses of Group B working on Test Set B were analyzed using a full-information factor analysis. After consulting the results, the present author decided to employ a two-factor solution, since it seemed the most appropriate. The correlation between the first and second factors was not very high (.549), which indicates that the orthogonal (VARIMAX) analysis is preferable. Table 5-8 illustrates the factor loadings for this group.
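The thesis reports an orthogonal (VARIMAX) two-factor solution but does not describe the software's rotation algorithm. As a minimal sketch only, the raw varimax criterion (the sum over factors of the variance of the squared loadings) can be maximized for two factors by searching over the single rotation angle. This grid search is a stand-in for Kaiser's closed-form pairwise solution and omits Kaiser row normalization; the function names and the loadings are hypothetical, not the values in Table 5-8.

```python
import math

def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        mean_sq = sum(squares) / p
        total += sum(s ** 2 for s in squares) / p - mean_sq ** 2
    return total

def rotate(loadings, phi):
    """Orthogonal rotation of two-factor loadings by the angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[x * c + y * s, -x * s + y * c] for x, y in loadings]

def varimax_two_factor(loadings, steps=3600):
    """Grid-search the angle in [0, pi/2) that maximizes the varimax criterion.

    A quarter turn only swaps the two factors (with a sign change), so this
    range covers every rotation up to relabelling and reflecting the factors.
    """
    angles = (k * (math.pi / 2) / steps for k in range(steps))
    best = max(angles, key=lambda phi: varimax_criterion(rotate(loadings, phi)))
    return rotate(loadings, best), best

# Hypothetical unrotated loadings for five items (NOT the values in Table 5-8).
unrotated = [[0.60, 0.40], [0.50, 0.50], [0.70, 0.20], [0.30, 0.60], [0.20, 0.70]]
rotated, angle = varimax_two_factor(unrotated)
for row in rotated:
    print([round(v, 3) for v in row])
```

Because the rotation is orthogonal, each item's communality (the sum of its squared loadings) is unchanged; only the distribution of variance across the two factors shifts.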
For the purpose of inspecting the loadings on each factor, the items were rearranged in descending order of their loadings on the first factor. To help identify the relationship between the factor loadings and question types along with passage numbers, the predetermined question type (Q-TYPE) of each test item and the number of the passage to which each item belongs (P-#) are indicated in the table.
It could be said that text (context) characteristics did not affect the extraction of factors. Items from different passages had significant loadings on the same factors, and there was sufficient variation in the loadings for the three items that were constructed for the same prompt.
In seeking particularity in the loadings on the first factor, one notices that the items with larger item numbers bear rather high loadings on the first factor. In other
words, the items that load heavily on the first factor are the items that appear later in the test set. In the same way, the items that appear early in the test set bear high negative loadings. This indicates that the first factor in the factor loadings for Test Set B via the performance of Group B could be determined as “where a test item is located in the test set,” or “item position,” as was the case with Group A-Low. This will be discussed further in Chapter 6.
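The thesis identifies the “item position” factor by inspecting the loadings rather than by reporting a coefficient. One simple way to quantify such a pattern, offered here as an assumption and not as the author's method, is the Pearson correlation between each item's position in the test set and its loading on the first factor: a large positive value would mean later items load higher. The loadings below are hypothetical, not the values in Table 5-8.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical first-factor loadings (NOT the values in Table 5-8),
# keyed by the item's position in the test set.
loadings_by_position = {1: -0.45, 5: -0.30, 10: -0.05, 15: 0.10, 20: 0.25, 25: 0.40, 30: 0.55}

positions = list(loadings_by_position)
loadings = list(loadings_by_position.values())
r = pearson_r(positions, loadings)
print(round(r, 3))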
Table 5-8 Factor Loadings for Test Set B by Group B
As for the interpretation of the second factor in the present analysis, the possibility of an “inferential” type of reading being an attribute arises. The items that load heavily on the second factor are items 14, 15, and 21. The predetermined question types are LL for item 14 and LI for items 15 and 21.
Items 14 and 15 share the same passage and are presented in the test set as follows:
14. Ptarmigan keep warm in the winter by
(A) huddling together on the ground with other birds
(B) building nests in trees
(C) burrowing into dense patches of vegetation
(D) digging tunnels into the snow
15. The author mentions kinglets in line 17 as an example of birds that
(A) protect themselves by nesting in holes
(B) nest with other species of birds
(C) nest together for warmth
(D) usually feed and nest in pairs
In order to correctly answer item 15, which shows the highest loading on the second factor, the test takers are to locate the last and second-to-last sentences, “Body contact reduces the surface area exposed to the cold air, so the birds keep each other warm. Two kinglets huddling together were found to reduce their heat losses by a quarter, and three together saved a third of their heat.” They are to integrate the information given in these sentences to deduce that (C) is the correct answer, which confirms that item 15 is indeed an LI item.
Item 14 was constructed with the intention of eliciting a test taker’s LL type of reading performance. The fifth sentence, “Solitary roosters shelter in dense vegetation or enter a cavity; horned larks dig holes in the ground and ptarmigan burrow into snow banks,” is the key to choosing the correct option (D), and it was supposed that the test takers in this group would match “burrow into snow banks” from the original sentence with “digging tunnels into the snow” in option (D). However, it could also be presumed that this matching required a bit of inferring, since the words used in the targeted phrases are slightly
different, which makes this an LI item.
Item 21 can also be confirmed as an LI item:
21. Why does the author mention Joseph Pulitzer and William Randolph Hearst?
(A) They established New York’s first newspaper.
(B) They published comic strips about the newspaper war.
(C) Their comic strips are still published today.
(D) They owned major competitive newspapers.
The information from two sentences in the passage, “The first full-color comic strip appeared in January 1894 in the New York World, owned by Joseph Pulitzer,” and “The first regular weekly full-color comic supplement, similar to today’s Sunday funnies, appeared two years later, in William Randolph Hearst’s rival New York paper, the Morning Journal,” as well as the phrase “between giants of the American press” from the first sentence and “Both were immensely popular” from the first sentence of the second paragraph, must be integrated to infer that these two people “owned major competitive newspapers” (option (D)).
The fact that items 14, 15, and 21 are all considered to be LI items allows us to claim that the second factor in this analysis can be explained by the “local-inferential” element of reading performance.
5.3 Item Analyses
5.3.1 Selecting items to be analyzed in this part of the study
It is clear from the results of the factor analytic studies in section 5.2 that some of the question types predetermined for each item did not function as expected. At the same time, however, through the qualitative analyses of each item that were done to specify the nature of each factor in sections 5.2.1, 5.2.2, and 5.2.3, new question types were assigned to the items that were strongly characteristic of each factor. In investigating relationships between question types and item difficulty, it seems necessary to proceed with this part of the analysis using only the items whose question types became explicit and coherent in the factor analytic studies above.¹² For this reason, the items incorporated in this part of the analysis are listed in Table 5-9.
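The selection rule in footnote 12 keeps items whose loading on a factor is .400 or above, or -.400 or below. A minimal sketch of that filter follows; the function name and the loadings are hypothetical illustrations, not data from the factor analytic studies.

```python
def select_items(loadings, threshold=0.400):
    """Return items whose loading is >= threshold or <= -threshold, split by sign."""
    positive = sorted(item for item, value in loadings.items() if value >= threshold)
    negative = sorted(item for item, value in loadings.items() if value <= -threshold)
    return positive, negative

# Hypothetical loadings on one factor, keyed by item number (not thesis data).
factor_loadings = {1: 0.52, 2: 0.47, 3: 0.41, 4: -0.48, 5: -0.44, 6: -0.40, 7: 0.12, 8: -0.25}
high, low = select_items(factor_loadings)
print("positive pole:", high)
print("negative pole:", low)
```

Splitting by sign matters here because the two poles of a factor were interpreted as opposite question types (for example, “inferential” versus “literal”).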
In Table 5-9, Group B is excluded because items from Test Set B cannot be incorporated in this part of the analysis, owing to the fact that, for the second factor, only a few items showed high loadings and no item showed a strong negative loading.
Table 5-9 Items adopted for item analyses by their question types and ability groups
Question Type           Group A-Low       Group A-High
Inferential (GI, LI)    14, 17, 18, 19    16, 23, 30
Global (GI)             -                 4, 5, 6, 7, 22
Local (LI, LL)          -                 17, 19, 28, 30
Literal (LL)            2, 7              18, 25, 26, 27
Furthermore, in the factor analytic studies, the items in both Group A-Low and Group A-High exhibited only partial aspects of the question types that were defined
¹² Items with factor loadings of .400 and above or -.400 and below were selected as items that had explicit features of question types and were employed for further analyses with each ability group.
earlier in the present thesis. Therefore, the present author could specify the question types only along the literal/inferential or local/global dimensions, rather than by their full “question types” (e.g., local-inferential). This is why, for Group A-High, item 30 appears twice in Table 5-9: once as an inferential item and again as a local item.
5.3.2 Group A-Low
For each test item in Test Set A, item difficulty was calibrated via Rasch Analysis based on the test performances of the test takers in Group A-Low. RASCAL converged after three loops. The final parameter estimates are presented in Table 5-10. The raw score conversion table, item by person distribution map, test characteristic curve, and test information curve are in Appendix C. The present author found nothing problematic with the test characteristic curve and test information curve, and the item by person distribution map indicated that the difficulty of items in Test Set A was generally equal to the ability estimates of the test takers in Group A-Low.
The value for item difficulty can vary between -3.00 and 3.00, with -3.00 being the easiest and 3.00 the most difficult. The numbers in the “Rank” column indicate the difficulty ranking of each of the 27 items included in Test Set A, with rank 1 being the most difficult.
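The thesis does not describe RASCAL's estimation algorithm. As a minimal sketch, assuming the standard dichotomous Rasch model, the probability that a test taker of ability θ answers an item of difficulty b correctly is P = exp(θ - b) / (1 + exp(θ - b)). The code evaluates this function for three difficulty estimates copied from Table 5-10 and ranks them so that rank 1 is the hardest item. The function names are illustrative, not part of RASCAL.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rank_by_difficulty(difficulties):
    """Rank items so that rank 1 is the hardest (largest b), as in the "Rank" column."""
    ordered = sorted(difficulties, key=lambda item: difficulties[item], reverse=True)
    return {item: rank for rank, item in enumerate(ordered, start=1)}

# Three difficulty estimates taken from Table 5-10 (Group A-Low).
sample = {3: 1.265, 14: 0.035, 4: -2.045}

for item, b in sample.items():
    # Probability of success for a hypothetical test taker of average ability (theta = 0).
    print(item, round(rasch_probability(0.0, b), 3))

# Ranks here are relative to these three items only, not to all 27.
print(rank_by_difficulty(sample))
```

Under this model an item at b = -3.00 would be answered correctly by a test taker at θ = 0 about 95% of the time, and an item at b = 3.00 only about 5% of the time.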
In investigating the relationship between item difficulty and question type, the mean item difficulties of the items selected in Table 5-9 were calculated for each question type and are presented in Table 5-11. The items employed in this part of the analysis were limited to those in Table 5-9 because they loaded heavily on each factor in the factor analytic study and bore explicit features of each question type.
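The means in Table 5-11 are simple averages of the Rasch difficulty estimates of the items in each question-type group. The sketch below recomputes them from the Table 5-10 estimates and the Group A-Low item lists in Table 5-9. The helper name is hypothetical.

```python
# Difficulty estimates for Group A-Low copied from Table 5-10.
difficulty = {2: 0.763, 7: 0.405, 14: 0.035, 17: 0.253, 18: 0.100, 19: -0.549}

# Items grouped by question type as listed for Group A-Low in Table 5-9.
groups = {"literal": [2, 7], "inferential": [14, 17, 18, 19]}

def mean_difficulty(items, estimates):
    """Arithmetic mean of the Rasch difficulty estimates for the given items."""
    return sum(estimates[i] for i in items) / len(items)

means = {label: round(mean_difficulty(items, difficulty), 3) for label, items in groups.items()}
print(means)
```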
For a precise examination of the difference between the mean difficulties of these two groups of items, a t-test was carried out (p < .10; p = .090). This result suggests that, for the population of Group A-Low, “literal” items pose more difficulty than “inferential” items, although the difference is significant only at the .10 level. No analysis of the relationship between question type and item difficulty could be done for “local/global” items, since the factors from the factor analytic studies did not indicate this feature.
Table 5-10 Final Parameter Estimates of test items for Group A-Low
Item#  Difficulty  Rank  Std. Error  Chi-sq.  df  Sc. Diff
1        -0.076     18     0.152     24.899    6     99
2         0.763      4     0.155      5.555    6    107
3         1.265      1     0.168     10.742    6    112
4        -2.045     27     0.255     22.284    6     81
5        -1.270     25     0.193      7.301    6     88
6         0.740      5     0.154      4.057    6    107
7         0.405      9     0.151     25.349    6    104
8         0.188     12     0.150      9.565    6    102
9         1.238      2     0.167      1.796    6    111
13        0.122     13     0.151      5.810    6    101
14        0.035     15     0.151      5.123    6    100
15       -1.650     26     0.219      4.003    6     85
16        0.604      6     0.152     12.121    6    105
17        0.253     11     0.150      6.744    6    102
18        0.100     14     0.151      8.806    6    101
19       -0.549     22     0.162      5.841    6     95
20       -0.032     17     0.152      9.894    6    100
21        1.238      3     0.167      8.678    6    111
22       -0.818     23     0.171      2.087    6     93
23       -0.076     19     0.152      2.394    6     99
24       -0.010     16     0.152      8.168    6    100
25       -0.424     21     0.159      7.846    6     96
26       -0.189     20     0.154      8.111    6     98
27       -1.095     24     0.184      4.450    6     90
28        0.318     10     0.150     12.019    6    103
29        0.449      8     0.151      4.310    6    104
30        0.515      7     0.151      4.856    6    105
Table 5-11 Means of item difficulty for each question type in Group A-Low
Literal
Item#  Difficulty
2       0.763
7       0.405
Mean    0.584
Inferential
Item#  Difficulty
14      0.035
17      0.253
18      0.100
19     -0.549
Mean   -0.040
5.3.3 Group A-High
Item difficulty of each test item in Test Set A was calibrated via Rasch Analysis based on the test performances of the test takers in Group A-High. RASCAL converged after three loops. The final parameter estimates are presented in Table 5-12, and the raw score conversion table, item by person distribution map, test characteristic curve, and test information curve are in Appendix D. Nothing problematic was found with the test characteristic curve and the test information curve, and the item by person distribution map indicated that the difficulty of items in Test Set A was generally lower than the ability estimates of the test takers in Group A-High. The numbers in the “Rank” column indicate the difficulty ranking of each item out of the 27 items included in Test Set A.
Table 5-12 Final Parameter Estimates of test items for Group A-High
Item#  Difficulty  Rank  Std. Error  Chi-sq.  df  Sc. Diff
1         0.982      7     0.180      8.249    5    109
2         2.575      2     0.157      4.634    5    123
3         1.995      5     0.155      4.524    5    118
4        -1.866     25     0.563      1.565    5     83
5        -0.312     14     0.279      3.488    5     97
6         1.740      6     0.157      5.201    5    116
7         2.456      3     0.155      5.399    5    122
8         0.823      9     0.187      4.118    5    107
9         2.041      4     0.154      1.467    5    119
13       -0.105     13     0.256      4.752    5     99
14       -0.770     17     0.339      2.884    5     93
15       -1.583     23     0.492     87.938    5     86
16        2.647      1     0.158      5.898    5    124
17       -0.662     16     0.323      5.814    5     94
18       -0.390     15     0.288      8.646    5     96
19       -1.583     24     0.492      4.278    5     86
20        0.126     11     0.235     14.538    5    101
21        0.951      8     0.181      4.244    5    109
22       -2.263     27     0.682      0.783    5     79
23        0.016     12     0.245      4.522    5    100
24       -0.770     18     0.339      6.776    5     93
25       -0.890     19     0.357      2.161    5     92
26       -1.362     22     0.444      3.335    5     88
27       -1.866     26     0.563      4.698    5     83
28       -1.180     21     0.408     12.565    5     89
29       -1.025     20     0.380      5.544    5     91
30        0.276     10     0.222      6.360    5    103
In Table 5-13, the means of item difficulty for test items selected in Table 5-9
according to their question types are indicated so that the relationship between item
difficulty and question type of the items could be investigated. Only the items from
Table 5-9 were employed in this part of analysis because they were the items that
loaded heavily on each factor in the factor analytic study and bore explicit features of
each question type.
Table 5-13 Means of item difficulty for each question type in Group A-High
Literal
Item#  Difficulty
18     -0.390
25     -0.890
26     -1.362
27     -1.866
Mean   -1.127
Inferential
Item#  Difficulty
16      2.647
23      0.016
30      0.276
Mean    0.980
Local
Item#  Difficulty
17     -0.662
19     -1.583
28     -1.180
30      0.276
Mean   -0.787
Global
Item#  Difficulty
4      -1.866
5      -0.312
6       1.740
7       2.456
22     -2.263
Mean   -0.049
A t-test was carried out for a precise examination of the difference in mean difficulty within each of these two pairs of item groups. The difference was significant between “literal” and “inferential” items (p < .05; p = .045) but not between “local” and “global” items (p > .10; p = .533). This result shows that, for the population of Group A-High, “inferential” items pose more difficulty than “literal” items, but no difference could be found with regard to the local/global nature of reading performance.
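The thesis does not state which t-test was used. Assuming a two-tailed, pooled-variance (Student's) two-sample t-test on the item difficulties, the sketch below recomputes the three comparisons from the rounded values in Tables 5-11 and 5-13. Under that assumption it gives p ≈ .045 and p ≈ .533 for Group A-High, matching the reported values, and p ≈ .094 for Group A-Low, close to the reported .090; the small gap presumably reflects rounding in the table. To keep the example dependency-free, the p-value comes from Simpson integration of the t density. `scipy.stats.ttest_ind`, whose default is `equal_var=True`, is the usual library route.

```python
import math

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    se = math.sqrt(ss / df * (1 / na + 1 / nb))
    return (ma - mb) / se, df

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: integrate the t density over [0, |t|] with Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    s = density(0) + density(abs(t))
    s += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    return 1 - 2 * (s * h / 3)

# Rasch difficulties from Tables 5-11 and 5-13.
comparisons = {
    "A-Low literal vs inferential": ([0.763, 0.405], [0.035, 0.253, 0.100, -0.549]),
    "A-High literal vs inferential": ([-0.390, -0.890, -1.362, -1.866], [2.647, 0.016, 0.276]),
    "A-High local vs global": ([-0.662, -1.583, -1.180, 0.276],
                               [-1.866, -0.312, 1.740, 2.456, -2.263]),
}

results = {}
for label, (a, b) in comparisons.items():
    t, df = student_t(a, b)
    results[label] = (round(t, 3), df, round(two_tailed_p(t, df), 3))
    print(label, results[label])
```

With only two to five items per group, these tests have little statistical power, which is worth keeping in mind when reading the Group A-Low result at the .10 level.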
5.3.4 Group B
Based on the test performances of the test takers in Group B, item difficulty was calibrated via Rasch Analysis for each test item in Test Set B. RASCAL
converged after three loops. The final parameter estimates are presented in Table 5-14. Appendix E includes the raw score conversion table, the item by person distribution map, the test characteristic curve, and the test information curve. No problems were found with the test characteristic curve and the test information curve, and the item by person distribution map indicated that the difficulty of items in Test Set B was generally equal to the ability estimates of the test takers in Group B. The numbers in the “Rank” column of Table 5-14 indicate the difficulty ranking of each item out of the 27 items included in Test Set B.
Table 5-14 Final Parameter Estimates of test items for Group B
As explained in Section 5.3.1, items from Test Set B cannot be incorporated in the analysis of the relationship between question types and item difficulty, since the “item position” factor (or “where a test item is located in the test set”), which accounted for the first factor in the factor analytic study of Group B performances on Test Set B, was very strong, and only a few items showed high loadings on the second factor.