Is the vocabulary learning burden of Japanese really heavier than that of English?
日本語の語彙学習負担は本当に英語よりも大きいか?
Tatsuhiko Matsushita, PhD candidate, Victoria University of Wellington
17th Biennial Conference of the Japanese Studies Association of Australia (JSAA)
Contents 本発表の内容
1. Motives for the study 研究動機
2. Goals and research questions 目的・研究課題
3. Method 方法
4. Results 結果
5. Discussion 考察
6. Conclusion まとめ
• References 引用文献
1. Motives for the study 研究動機 (1)
• Heavy burden in learning Japanese vocabulary? (Tamamura, 1984)
• Text coverage study テキストカバー率の研究
  Text coverage = coverage of word tokens (延べ語数)
• The top (= most frequent) 1000 words cover 60% of the tokens in Japanese magazines (NINJAL: The National Institute for Japanese Language, 1962; 2006)
• The top 1000 words cover over 70% in English (e.g., Carroll, Davies & Richman, 1971)
• To reach 95%/98% text coverage, 9500/20000 words (lexemes 語彙素) are required in Japanese, while only 5000/9000 word families are required in English (Matsushita, 2011; Nation, 2006)
Note: Word family (English) ≒ Lexeme (Japanese)?
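The definition of text coverage above (the share of running word tokens accounted for by the most frequent words) can be sketched in a few lines of Python. The function name `text_coverage` and the toy token list are assumptions made for illustration only; they do not come from the corpora or studies cited on this slide.

```python
from collections import Counter


def text_coverage(tokens, top_n):
    """Share of running tokens accounted for by the top_n most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / len(tokens)


# Toy token list (hypothetical data, not taken from any cited corpus).
tokens = ["the", "cat", "sat", "on", "the", "mat", "the", "cat"]
print(text_coverage(tokens, 1))  # "the" accounts for 3 of the 8 tokens
print(text_coverage(tokens, 2))  # "the" and "cat" account for 5 of the 8 tokens
```

On a real corpus the same calculation would be run over lexemes or word families rather than raw strings, which is exactly the unit-of-counting issue raised in the note above.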
1. Motives for the study 研究動機 (2)
Word family (English) ≒ Lexeme 語彙素 (Japanese)?
• “Word family” as adopted by Nation (2006): Level 6 of Bauer & Nation (1993), which includes derived words with frequent affixes and ‘regular but infrequent affixes’
  e.g. members of abbreviate: abbreviate, abbreviates, abbreviated, abbreviating, abbreviation, abbreviations
• Lexeme as defined by UniDic (Den et al., 2009): the members of the short unit 短単位 of a lexeme
  e.g. 読む - 読み, やはり - やっぱり, 足 - 脚, 受け入れる cf. 短縮/する
• Why is the text coverage in Japanese and English so different?
• Possible explanation: many groups of words with different word origins 語種 but similar meanings (e.g., Akimoto, 2002)
  e.g., 旅館, 宿屋, ホテル
1. Motives for the study 研究動機 (3)
• Questions about the explanation
  • Method: magazine texts? Coverage not including function words?
  • English also has synonyms with different word origins, e.g. liberty-freedom, spirit-soul
  • Nature of Japanese: many transparent compounds composed of Kanji
    e.g. 春季 /shunki/: low frequency word (ranked 28587 in Matsushita (2011))
    春 /haru/: high frequency word (1019, ibid.)
    季節 /kisetsu/: high frequency word (1955, ibid.)
    The meaning of 春季 is not difficult to infer if the meanings of 春 and 季節 are already known (春季 is transparent)
• For such words, learners normally only need to understand the meanings of the components and the word formation rules, either implicitly or explicitly. cf. Harlan (2011)
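1. Motives for the study 研究動機 (4)
cf. Harlan (2011) = the comedian ‘Pakkun’ (パックン)
“Once you have learned a certain number of Kanji, building up your vocabulary actually becomes very easy. Once you have memorized the basic set, you can often apply it to the rest; after learning 100, the next 100 come even faster. After learning 500, the next 500 or 1000 come two or three times as fast.”
“Once you know Kanji, you can work out the meaning of a newly heard word by analyzing it through its Kanji. In 冷蔵庫 (refrigerator), 冷 is ‘to chill’, 蔵 is kura (storehouse), and 庫 is the 庫 of 車庫 (garage), which gives an image of some kind of storage space. Put the three characters together and you can more or less see the meaning.”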
2. Goals and Research Questions 目的・研究課題
Goals:
• To estimate the true learning burden of Japanese vocabulary
• To consider a more efficient order for learning Japanese vocabulary
Research Questions:
1. How many ‘characters’ do learners need to learn to attain a certain level of text coverage of ‘words’?
   Note: the aim is not simply to measure text coverage by character. cf. Chikamatsu et al. (2000)
   Knowing the meaning of the single character 節 is NOT enough to understand the meaning of 季節.
2. Do the characters that provide that level of text coverage (in Q.1) cover all the high frequency words? If not, what Kanji are further required to cover those words? (Is there any discrepancy between word frequencies and character frequencies?)
3. Method 方法 (1) - 1
1) Calculate character frequencies in BCCWJ (the Balanced Corpus of Contemporary Written Japanese 現代日本語書き言葉均衡コーパス, 2009 monitor version: NINJAL, 2009)
2) Give a learning order ranking to each character
   I. Rank the types of character in the order Alphabet, Hiragana, Katakana, Kanji/signs
   II. Rank Kanji by frequency
3) List all words in BCCWJ in their orthographic forms (書字形)
4) Separate each word into characters
5) Give the learning order ranking to each character
6) Calculate the text coverage by filtering the words by the learning order ranking of their characters
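The six steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure actually run on BCCWJ: the Unicode ranges used to classify scripts, the helper names (`script_class`, `char_frequencies`, `learning_order`, `coverage_by_cutoff`) and the toy word-frequency list are all hypothetical.

```python
from collections import Counter


def script_class(ch):
    """Coarse script class by Unicode range (an assumption; the study's classes may differ)."""
    if ch.isascii() and ch.isalpha():
        return 0  # alphabet
    if "\u3041" <= ch <= "\u309f":
        return 1  # Hiragana
    if "\u30a0" <= ch <= "\u30ff":
        return 2  # Katakana
    return 3  # Kanji and signs


def char_frequencies(word_freq):
    """Steps 1 and 4: split each word into characters and weight them by token frequency."""
    counts = Counter()
    for word, freq in word_freq.items():
        for ch in word:
            counts[ch] += freq
    return counts


def learning_order(char_freq):
    """Steps 2 and 5: order by script class, then by descending frequency within a class."""
    ordered = sorted(char_freq, key=lambda c: (script_class(c), -char_freq[c], c))
    return {ch: rank for rank, ch in enumerate(ordered, start=1)}


def coverage_by_cutoff(word_freq, order, cutoff):
    """Step 6: token coverage of words whose characters all have learning rank <= cutoff."""
    total = sum(word_freq.values())
    known = sum(f for w, f in word_freq.items() if all(order[c] <= cutoff for c in w))
    return known / total


# Step 3 stand-in: toy orthographic forms with token frequencies (hypothetical numbers).
word_freq = {"の": 50, "は": 30, "テスト": 5, "日本": 10, "日": 4, "季節": 1}
order = learning_order(char_frequencies(word_freq))

print(coverage_by_cutoff(word_freq, order, 5))  # the 5 phonographic characters only: 0.85
print(coverage_by_cutoff(word_freq, order, 7))  # plus the two most frequent Kanji: 0.99
```

Note that a word counts as covered only when every one of its characters is within the cutoff, which is why this differs from plain character coverage.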
3. Method 方法 (1) - 2
BCCWJ 2009 monitor version (NINJAL, 2009)
• Book corpus (approx. 28 million running words)
• Internet forum site corpus (approx. 5 million running words)
Unit of counting a ‘word’ in this study:
• the short unit (短単位) defined by UniDic (Den et al., 2009)
• the orthographic form (書字形)
  i.e. 書く / 書か / かく, or 足 / 脚, are counted as different orthographic forms but as one lexeme (語彙素)
3. Method 方法 (2)
For RQ. 2:
• Identify the relationship between the Kanji frequency levels and the former JLPT 旧日本語能力検定試験 Kanji levels, to check whether the JLPT Kanji are ranked properly
• Identify the words that are not covered by the high frequency Kanji, and check which Kanji are used in those words
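The second check for RQ. 2 (finding words that fall outside the high frequency Kanji, and the Kanji responsible) might look like the sketch below. The helper names `is_kanji` and `blocking_kanji` are hypothetical, and the ranks in `kanji_rank` are made-up illustrative values, not the frequency ranks obtained from BCCWJ.

```python
def is_kanji(ch):
    """CJK Unified Ideographs block plus the iteration mark 々 (a simplifying assumption)."""
    return "\u4e00" <= ch <= "\u9fff" or ch == "々"


def blocking_kanji(words, kanji_rank, top_n):
    """Map each word to the Kanji in it that rank beyond top_n (or are missing from the ranking)."""
    result = {}
    for word in words:
        missing = [c for c in word if is_kanji(c) and kanji_rank.get(c, top_n + 1) > top_n]
        if missing:
            result[word] = missing
    return result


# Hypothetical ranks for illustration only (NOT the ranks found in the study).
kanji_rank = {"春": 900, "季": 1200, "節": 1100, "入": 50, "門": 600}
words = ["春季", "入門", "季節", "やはり"]

print(blocking_kanji(words, kanji_rank, top_n=1000))
```

With these invented ranks, 春季 is blocked by 季 and 季節 by both of its characters, while 入門 and the Hiragana-only やはり are covered.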
4. Results 結果 (1) - 1
RQ. 1: How many ‘characters’ do learners need to learn to attain a certain level of text coverage of ‘words’?
• 64% of the word tokens (half of which are function words) are covered by the phonographic characters alone (Hiragana, Katakana and alphabet)
• 82%: phonographic characters + top 300 Kanji
• Within the top 1000 Kanji, learning 100 Kanji means potential understanding of 6000 – 7000 types 異なり語 of orthographic forms (3000 – 4000 lexemes)
4. Results 結果 (1) - 2
• 95 - 96%: phonographic characters 表音文字 + top 1000 – 1100 Kanji
  A threshold level for reading comprehension? (Hu & Nation, 2000; Komori et al., 2004)
• 98%: phonographic characters + top 1500 Kanji
4. Results: Number/Ratio of Words (orthographic forms) and Text Coverage by Character Types (+Level of Kanji) in Japanese 日本語の文字タイプ(+漢字レベル)別の語の数/割合とテキストカバー率

Columns:
(1) Type of Character (+Level of Kanji) (*) 文字タイプ(+漢字レベル); A: Alphabet, H: Hiragana, K: Katakana
(2) Number of Words (orthographic forms) by Character Type 文字タイプ(+漢字レベル別)の語数(書字形)
(3) Cumulative Number of Words (orthographic forms) by Character Type 文字タイプ(+漢字レベル別)の累積語数(書字形)
(4) Increase in Text Coverage by the Words Gained by Learning 100 More Kanji 漢字100字増加による語のテキストカバー率増加分
(5) Cumulative Text Coverage by the Words 語の累積テキストカバー率
(6) Increase in Text Coverage by the Characters by Learning 100 More Kanji 漢字100字増加による文字のテキストカバー率増加分
(7) Cumulative Text Coverage by the Characters 文字の累積テキストカバー率

(1)                                    (2)     (3)     (4)     (5)     (6)     (7)
Only Alphabet                        17712   17712    0.7%    0.7%    1.1%    1.1%
Only Hiragana (*)                    20272   37984   59.7%   60.4%   51.9%   52.9%
Mixture of A & H                         1   37985    0.0%   60.4%    0.0%   52.9%
Only Katakana (*)                    49349   87334    3.3%   63.6%    7.3%   60.2%
Mixture of A/H/K                       625   87959    0.0%   63.6%    0.0%   60.2%
Ranking 1-100 Kanji + A, H & K        7187   95146   10.1%   73.8%    9.7%   70.0%
Ranking 101-200 Kanji + A, H & K      7360  102506    5.2%   79.0%    5.8%   75.8%
Ranking 201-300 Kanji + A, H & K      7318  109894    3.6%   82.6%    4.1%   79.9%
Ranking 301-400 Kanji + A, H & K      6636  116530    2.8%   85.4%    3.3%   83.1%
Ranking 401-500 Kanji + A, H & K      6830  123360    2.6%   88.0%    2.9%   86.0%
Ranking 501-600 Kanji + A, H & K      6820  130180    2.0%   90.0%    2.4%   88.4%
Ranking 601-700 Kanji + A, H & K      6585  136765    1.6%   91.6%    1.8%   90.2%
Ranking 701-800 Kanji + A, H & K      6393  143158    1.4%   93.0%    1.6%   91.8%
Ranking 801-900 Kanji + A, H & K      6186  149344    1.1%   94.1%    1.4%   93.2%
Ranking 901-1000 Kanji + A, H & K     5427  154771    1.0%   95.1%    1.2%   94.4%
Ranking 1001-1100 Kanji + A, H & K    4703  159474    0.8%   96.0%    1.0%   95.3%
Ranking 1101-1200 Kanji + A, H & K    4262  163736    0.7%   96.6%    0.8%   96.1%
Ranking 1201-1300 Kanji + A, H & K    4222  167958    0.6%   97.2%    0.7%   96.8%
Ranking 1301-1400 Kanji + A, H & K    3691  171649    0.5%   97.7%    0.5%   97.4%
Ranking 1401-1500 Kanji + A, H & K    3541  175190    0.4%   98.1%    0.4%   97.8%
Ranking 1501-1600 Kanji + A, H & K    2909  178099    0.3%   98.4%    0.4%   98.2%
Ranking 1601-1700 Kanji + A, H & K    2793  180892    0.3%   98.6%    0.3%   98.5%
Ranking 1701-1800 Kanji + A, H & K    2554  183446    0.2%   98.9%    0.3%   98.7%
Ranking 1801-1900 Kanji + A, H & K    2164  185610    0.2%   99.0%    0.2%   98.9%
Ranking 1901-2000 Kanji + A, H & K    1993  187603    0.2%   99.2%    0.2%   99.1%
Ranking 2001-2100 Kanji + A, H & K    1933  189536    0.1%   99.3%    0.1%   99.3%
Ranking 2101-2200 Kanji + A, H & K    1495  191031    0.1%   99.4%    0.1%   99.4%
Ranking 2201-2300 Kanji + A, H & K    1427  192458    0.1%   99.5%    0.1%   99.5%
Ranking 2301-6323 Kanji + A, H & K   15373  207831    0.5%  100.0%    0.5%  100.0%
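The coverage thresholds quoted in the results can be read off the cumulative word coverage column of the table above. The short Python sketch below does that lookup; the dictionary copies the table's cumulative values up to 1600 Kanji, while the name `kanji_needed` is an assumption introduced here for illustration.

```python
# Cumulative token coverage (%) by words, keyed by the number of top-ranked Kanji known
# in addition to alphabet, Hiragana and Katakana (values copied from the table above).
CUMULATIVE_WORD_COVERAGE = {
    0: 63.6, 100: 73.8, 200: 79.0, 300: 82.6, 400: 85.4, 500: 88.0,
    600: 90.0, 700: 91.6, 800: 93.0, 900: 94.1, 1000: 95.1, 1100: 96.0,
    1200: 96.6, 1300: 97.2, 1400: 97.7, 1500: 98.1, 1600: 98.4,
}


def kanji_needed(target_percent):
    """Smallest tabulated number of Kanji whose cumulative coverage reaches the target."""
    for n in sorted(CUMULATIVE_WORD_COVERAGE):
        if CUMULATIVE_WORD_COVERAGE[n] >= target_percent:
            return n
    return None  # target lies beyond the rows copied here


print(kanji_needed(95.0))  # 1000
print(kanji_needed(98.0))  # 1500
```

Because the table is in steps of 100 Kanji, the result is an upper bound at that granularity rather than an exact count.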
4. Results 結果 日本語の単語のテキストカバー率(漢字レベル別/累積)
[Figure: text coverage of Japanese words by Kanji level. Horizontal axis: Kanji frequency ranking bands from 1-100 to 2201-2300 (each + A, H & K). Series: Cumulative Text Coverage by the Words 語の累積テキストカバー率 (axis 50% to 100%); Increased Text Coverage by the Words Increased by Learning 100 More Kanji 漢字100字増加による語のテキストカバー率増加分 (axis 0% to 10%).]
4. Results 結果 (2) - 1
RQ. 2: Do the characters that provide the text coverage in Q.1 cover all the high frequency words? If not, what Kanji are further required to cover those words? (Is there any discrepancy between word frequencies and character frequencies?)
i.e. Can low frequency Kanji be a barrier to learning high frequency words?
Number of Kanji at Different Frequency Levels and the Former JLPT Levels
(columns 4, 3, 2 and 1 are the former JLPT levels)

Kanji Frequency Level      4      3      2      1   Subtotal   Others   Total
1-100                     39     41     20      0        100        0     100
101-300                   27     53    110      9        199        1     200
301-1000                  14     63    437    178        692        8     700
1001-2000                  0      8    187    580        775      225    1000
Others                     0      0      1    159        160     4163    4323
Total                     80    165    755    926       1926     4397    6323
4. Results 結果 (2) - 2
• There is only a narrow gap between the Kanji frequency levels and the former JLPT Kanji levels
• Among the top 1000 Kanji, more than 800 belong to the former JLPT levels 4, 3 and 2
• More than 96% of the word tokens (延べ語数) in general texts will be covered by 1200 Kanji:
  • all Kanji at the former JLPT levels 4, 3 and 2 (total: 1000)
  • + the top 200 Kanji at the former JLPT level 1
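The counts behind these statements can be re-derived from the cross-tabulation of Kanji frequency levels and former JLPT levels given earlier. The Python sketch below copies that table; the variable names are assumptions chosen for illustration.

```python
# Rows: Kanji frequency band. Columns: former JLPT levels 4, 3, 2, 1, then "Others"
# (values copied from the table of Kanji frequency levels and former JLPT levels).
COUNTS = {
    "1-100": (39, 41, 20, 0, 0),
    "101-300": (27, 53, 110, 9, 1),
    "301-1000": (14, 63, 437, 178, 8),
    "1001-2000": (0, 8, 187, 580, 225),
    "Others": (0, 0, 1, 159, 4163),
}
TOP_1000_BANDS = ["1-100", "101-300", "301-1000"]

# Top-1000 Kanji that belong to the former JLPT levels 4, 3 or 2.
levels_4_to_2_in_top_1000 = sum(sum(COUNTS[band][:3]) for band in TOP_1000_BANDS)

# All Kanji at the former JLPT levels 4, 3 and 2, regardless of frequency band.
all_levels_4_to_2 = sum(sum(row[:3]) for row in COUNTS.values())

# Top-1000 Kanji that are level 1 or outside the former JLPT lists.
level_1_or_other_in_top_1000 = sum(sum(COUNTS[band][3:]) for band in TOP_1000_BANDS)

print(levels_4_to_2_in_top_1000)     # 804
print(all_levels_4_to_2)             # 1000
print(level_1_or_other_in_top_1000)  # 196
```

The three results correspond to "more than 800", "total: 1000", and the 196 level 1 and unlisted Kanji that fall within the top 1000.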
4. Results 結果 (2) - 3
Top 196 Kanji at the former JLPT level 1 and others 級外
• Within the top 300: 保義公価基条応態郎 & 々
• Within the top 1000: 張士氏視素護離証企異評提姿井統振吉策影紀為宮江派僕従系衛皇展案松隊施我整及織環響修遺宗昭撃株節源養項興故裁沢端障志激弁益嫌佐司眼密載己債訳症標健納請授挙恵貴徳推描崎抗属盛監傷創徴街善援衆康模敵津拠継隠称尾聖鮮厳攻妙融丈筋帝秘敷驚射壊刑壁染功訴討幕扱脱範契弾診詳房避酸倉充典繰儀至削博瞬仮縁憲択就聴握詩秀柄浜滅拡惑踏華闘微雄維隣如審誘賀郷霊釈黙魔携掲遣艦剣致 & 誰頃藤俺之岡伊阪
4. Results 結果 (2) - 4
• 95% text coverage requires
  • the top 9600 lexemes / top 20749 orthographic forms (types 異なり語数)
  • the top 1000 Kanji + Hiragana, Katakana + alphabet
• Within the top 9600 lexemes, 1700 lexemes are estimated to require Kanji beyond the top 1000
  e.g. 比較、記憶、批判、距離、指摘、希望、分析、韓国、基礎、誕生、監督、雰囲気、卒業、洗濯
• Many of them are often written in Hiragana/Katakana
  e.g. 即ち、駄目、奴、凄い、頑張る、挨拶、嘘、煙草、匂い、只、是非、無駄、喧嘩、噂、伺う
5. Discussion 考察 1)
• For general texts, learners can attain more than 70% comprehension with 95-96% coverage (for English, see Hu & Nation, 2000; for Japanese, see Komori, Mikuni & Kondo, 2004)
• Learning Kanji in order of frequency is a much more efficient way to gain higher text coverage (Zipf’s Law: Zipf, 1949)
• The top 300 – 500 Kanji seem much more essential than the rest
• The top 1000 – 1500 Kanji might be enough for general purposes (with occasional use of a dictionary)
• It may also mean that learning Kanji without reaching the threshold level is of little use…
5. Discussion 考察 2)
• To attain 95% coverage, 1000 Kanji are required; however, some important words are not covered by the top 1000 Kanji
• In other words, some low frequency Kanji are used in high frequency words
• Many of those Kanji have low productivity, that is, they are rarely used in other words, e.g. 雰囲気、卒業、洗濯
• To cover the top 9600 words (lexemes), a further 200 – 500 Kanji are estimated to be required
5. Discussion 考察 3)
• Certainly, the burden of learning characters is heavier in Japanese than in most other languages
• However, the burden of learning Japanese vocabulary may be rather lighter once the learner knows:
  • the 1000-1500 characters
  • the word formation/compounding rules of Kanji
  • the metaphors of Kanji compounds, e.g. 入門: entering a gate → first step, to start training
• This holds despite the fact that text coverage is lower than in English at all word frequency levels
5. Discussion 考察 4)
In other words, it is possible that the number of ‘units of learning’ in Japanese vocabulary is not as large as generally perceived
• It will also be important for students/teachers to learn/teach the association between the different readings (typically On-reading and Kun-reading) of each Kanji, to reduce the burden of learning Japanese vocabulary
  e.g. 入門 /nyuRmoN/: 入る /hairu/ + 門 /moN/
  入る (freq. ranking: 117) is more likely to be learned earlier than 入門 (freq. ranking: 6369) (Matsushita, 2011)
• Without this kind of association, learners have to learn more words separately
6. Conclusion まとめ
• About 64% (63.6%) of the tokens in the BCCWJ texts are covered without Kanji (but half of them are function words)
• To attain 95% coverage, 1000 Kanji are required; however, some important words are not covered by the top 1000 Kanji
• To cover those words, a few hundred more Kanji will be required
• Text coverage in Japanese is generally lower than in English, i.e. Japanese requires more words to be learned
• However, many Japanese words are composed of a limited number of Kanji; therefore, the burden of learning Japanese vocabulary may not be as heavy as the text coverage studies suggest, once the learner knows:
  • the 1000-1500 characters
  • the form, meaning and compounding rules of Kanji
  • the metaphors of Kanji compounds
  • the association between the different readings (e.g. On-reading and Kun-reading) of each Kanji
These slides will be uploaded to the site shown below.
「松下言語学習ラボ」 (Matsushita Language Learning Lab) http://www.wa.commufa.jp/~tatsum/
You can find the site by searching Google for the keywords 松下 (Matsushita) and 言語 (language).
References 引用文献 1)
Akimoto, M. (秋元美晴). (2002). よくわかる語彙 [Understanding Vocabulary]. Tokyo: Alc (アルク).
Bauer, L. & Nation, P. (1993). Word families. International Journal of Lexicography, 6(4), 253-279.
Carroll, J. B., Davies, P., & Richman, B. (1971). Word Frequency Book. New York: Houghton Mifflin, Boston American Heritage.
Chikamatsu, N., Yokoyama, S., Nozaki, H., Long, E., & Fukuda, S. (2000). A Japanese logographic character frequency list for cognitive science research. Behavior Research Methods, Instruments, & Computers, 32(3), 482-500.
Den, Y. (伝康晴), Yamada, A. (山田篤), Ogura, H. (小椋秀樹), Koiso, H. (小磯花絵), & Ogiso, T. (小木曽智信). (2009). UniDic. Version 1.3.11. Downloaded from http://www.tokuteicorpus.jp/dist/
References 引用文献 2)
Harlan, P. (パトリック・ハーラン). (2011). ゼロからの日本語学習と僕の好きな日本のカルチャー (Learning Japanese from zero, and the Japanese culture I like). Cited from http://www.wochikochi.jp/topstory/2011/04/packun.php
Hu, M. H. & Nation, P. (2000). Vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403-430.
Komori, K. (小森和子), Mikuni, J. (三國純子), & Kondo, A. (近藤安月子). (2004). 文章理解を促進する語彙知識の量的側面 ―既知語率の閾値探索の試み― (What percentage of known words in a text facilitates reading comprehension: a case study for exploration of the threshold of known words coverage). 日本語教育 [Teaching Japanese as a Foreign Language], 125, 83-92.
References 引用文献 3)
Matsushita, T. (松下達彦). (2011). 日本語を読むための語彙データベース (The Database for Reading Japanese). Downloaded from http://www.geocities.jp/tatsum2003/
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63(1), 59-82.
NINJAL: The National Institute for Japanese Language (国立国語研究所). (1962). 現代雑誌 90 種の用字・用語 第一分冊 総記および語彙表 (Vocabulary and Chinese characters in ninety magazines of today: (Volume I) General description & vocabulary frequency tables). Tokyo: Shuuei Shuppan (秀英出版).
References 引用文献 4)
NINJAL: The National Institute for Japanese Language (国立国語研究所). (2006). 現代雑誌 200 万字言語調査語彙表 (The vocabulary lists from the language survey of contemporary magazines with two million running characters). Downloaded from http://www.kokken.go.jp/katsudo/seika/goityosa/index.html
Tamamura, F. (玉村文郎). (1984). 語彙の研究と教育(上) [Research and teaching of vocabulary (Vol. 1)]. Tokyo: The National Institute for Japanese Language (国立国語研究所).
Zipf, G. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. New York: Hafner.