Is the vocabulary learning burden of Japanese really heavier than that of English? (日本語の語彙学習負担は本当に英語よりも大きいか?) Tatsuhiko Matsushita, PhD candidate, Victoria University of Wellington. 17th Biennial Conference of the Japanese Studies Association of Australia (JSAA)


Page 1: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

Is the vocabulary learning burden of Japanese really heavier than that of English?
日本語の語彙学習負担は本当に英語よりも大きいか?

Tatsuhiko Matsushita
PhD candidate, Victoria University of Wellington

17th Biennial Conference of the Japanese Studies Association of Australia (JSAA)

Page 2: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

Contents 本発表の内容

1. Motives for the study (研究動機)
2. Goals and research questions (目的・研究課題)
3. Method (方法)
4. Results (結果)
5. Discussion (考察)
6. Conclusion (まとめ)
• References (引用文献)

Page 3: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

1. Motives for the study 研究動機 (1)

• Heavy burden in learning Japanese vocabulary? (Tamamura, 1984)
• Text coverage study (テキストカバー率の研究)
  Text coverage = coverage of word tokens (延べ語数)
• The top (= most frequent) 1000 words cover 60% in Japanese magazines (NINJAL: The National Institute for Japanese Language, 1962; 2006).
• The top 1000 words cover over 70% in English (e.g., Carroll, Davies & Richman, 1971).
• To reach 95%/98% text coverage, 9500/20000 words (lexemes 語彙素) are required in Japanese, while only 5000/9000 word families are required in English (Matsushita, 2011; Nation, 2006).

Note: Word family (English) ≒ Lexeme (Japanese)?
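The definition of text coverage above can be illustrated with a minimal sketch. It assumes a text is already split into word tokens; the function name `text_coverage` and the toy token list are hypothetical and are not taken from any corpus cited in these slides.

```python
from collections import Counter

def text_coverage(tokens, top_n):
    """Share of running tokens accounted for by the top_n most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / len(tokens)

# Toy token list (hypothetical data, for illustration only).
tokens = ["the", "cat", "the", "dog", "the", "cat", "a", "bird", "the", "a"]

for n in (1, 2, 3):
    print(n, text_coverage(tokens, n))
```

Because a few word types account for most tokens, coverage rises quickly for the first few types and then flattens, which is the pattern the coverage figures on this slide describe.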

Page 4: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

1. Motives for the study 研究動機 (2)

Word family (English) ≒ Lexeme 語彙素 (Japanese)?
• “Word family” as adopted by Nation (2006)
• Level 6 of Bauer & Nation (1993), including derived words with frequent affixes and ‘regular but infrequent affixes’
  e.g. members of abbreviate: abbreviate, abbreviates, abbreviated, abbreviating, abbreviation, abbreviations
• Lexeme as defined by UniDic (Den et al., 2009): members of the short unit (短単位) of a lexeme
  e.g. 読む - 読み, やはり - やっぱり, 足 - 脚, 受け入れる; cf. 短縮/する

• Why is the text coverage in Japanese and English so different?
• Possible explanation: many groups of words with different word origins (語種) but similar meanings (e.g., Akimoto, 2002)
  e.g., 旅館, 宿屋, ホテル

Page 5: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

1. Motives for the study 研究動機 (3)

• Questions about the explanation
• Method: magazine texts? Coverage not including function words?
• English also has synonyms with different word origins, e.g. liberty-freedom, spirit-soul
• Nature of Japanese: many transparent compounds composed of Kanji
  e.g. 春季 /shunki/: low frequency word (ranked 28587 in Matsushita (2011))
       春 /haru/: high frequency word (1019, ibid.)
       季節 /kisetsu/: high frequency word (1955, ibid.)
  It is not difficult to infer the meaning of 春季 if the meanings of 春 and 季節 are already known (春季 is transparent).
• For such words, learners normally only need to understand the meanings of the components and the word formation rules, either implicitly or explicitly. cf. Harlan (2011)

Presenter's note (Tatsu): When calculated on a predominantly book-based text with function words included, the figure comes to 73% (Matsushita, 2010).
Page 6: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

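1. Motives for the study 研究動機 (4)

cf. Harlan (2011) = the comedian ‘Pakkun’ (パックン)

"Once you have learned Kanji to a certain extent, building your vocabulary actually becomes very easy. Once you have learned a basic number of them, you can often apply what you know; if you learn 100, learning the next 100 gets even faster. Once you have learned 500, the next 500 or 1000 go two or three times as fast."

"Once you know Kanji, you can work out the meaning of a newly heard word by analyzing it in terms of its Kanji. In 冷蔵庫 (refrigerator), 冷 means 'to chill' (冷やす), 蔵 is kura (a storehouse), and 庫 is the 庫 of 車庫 (garage), which suggests something like a storage room. Put the three characters together and you can more or less grasp the meaning."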

Page 7: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

2. Goals and Research Questions 目的・研究課題

Goals:
• To estimate the true learning burden of Japanese vocabulary
• To consider a more efficient order for learning Japanese vocabulary

Research Questions:
1. How many ‘characters’ do learners need to learn to attain a certain level of text coverage of ‘words’?
   Note: the aim is not to measure the simple text coverage by characters. cf. Chikamatsu et al. (2000)
   Knowing the meaning of the single character 節 is NOT enough to understand the meaning of 季節.
2. Do the characters which provide that level of text coverage (in RQ 1) cover all the high frequency words? If not, what Kanji are further required to cover the words? (Is there any discrepancy between the word frequencies and the character frequencies?)

Page 8: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

3. Method 方法 (1)-1

1) Calculate character frequencies in BCCWJ (the Balanced Corpus of Contemporary Written Japanese, 現代日本語書き言葉均衡コーパス; 2009 monitor version: NINJAL, 2009)
2) Give a learning order ranking to each character
   I. Rank the types of character in the order Alphabet, Hiragana, Katakana, and Kanji/signs
   II. Rank Kanji by frequency
3) List all words in their orthographic forms (書字形) in BCCWJ
4) Separate each word into characters
5) Give the learning order ranking to each character
6) Calculate the text coverage by filtering the characters of the words by learning order ranking
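The six steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually run on BCCWJ: all phonographic characters are treated as already known (level 0), Kanji are detected by a simple Unicode range check, signs are ignored, and the word frequencies are invented toy values. The names `is_kanji`, `kanji_ranks`, `required_level` and `coverage_at` are hypothetical.

```python
from collections import Counter

def is_kanji(ch):
    # Assumption of this sketch: CJK Unified Ideographs block, plus the iteration mark 々.
    return "\u4e00" <= ch <= "\u9fff" or ch == "々"

def kanji_ranks(word_freqs):
    """Steps 1-2: rank Kanji by token frequency (1 = most frequent)."""
    counts = Counter()
    for word, freq in word_freqs.items():
        for ch in word:
            if is_kanji(ch):
                counts[ch] += freq
    return {ch: rank for rank, (ch, _) in enumerate(counts.most_common(), start=1)}

def required_level(word, ranks):
    """Steps 4-5: the highest Kanji rank in the word; 0 if it contains no Kanji."""
    return max((ranks[ch] for ch in word if is_kanji(ch)), default=0)

def coverage_at(word_freqs, ranks, top_n_kanji):
    """Step 6: share of tokens whose characters all fall within the top_n_kanji."""
    total = sum(word_freqs.values())
    covered = sum(freq for word, freq in word_freqs.items()
                  if required_level(word, ranks) <= top_n_kanji)
    return covered / total

# Toy orthographic-form frequencies (invented, NOT from BCCWJ).
word_freqs = {"の": 50, "する": 20, "テレビ": 5, "春": 10, "季節": 8, "春季": 2, "節": 5}
ranks = kanji_ranks(word_freqs)

for n in range(4):
    print(n, coverage_at(word_freqs, ranks, n))
```

A word counts as covered only when every one of its Kanji is within the learned set, which is why this differs from simple coverage by characters: knowing 節 alone does not cover 季節.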

Page 9: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

3. Method 方法 (1)-2

BCCWJ 2009 monitor version (NINJAL, 2009)
• Book corpus (approx. 28 million running words)
• Internet forum site corpus (approx. 5 million running words)

Unit of counting a ‘word’ used for this study:
• the short unit (短単位) defined by UniDic (Den et al., 2009)
• the orthographic form (書字形)
  i.e. 書く / 書か / かく, or 足 / 脚, are counted as different orthographic forms but as one lexeme (語彙素)

Page 10: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

3. Method 方法 (2)

For RQ 2:
• Identify the relationship between the Kanji frequency levels and the former JLPT (旧日本語能力検定試験) Kanji levels, to check whether the JLPT Kanji are ranked properly
• Identify the words which are not covered by the high frequency Kanji, and check which Kanji are used in those words
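The second step for RQ 2 might be sketched like this: given a Kanji frequency ranking, list the words that are not covered by the top N Kanji together with the Kanji that block them. The function name `blocking_kanji` is hypothetical, and the ranks in `kanji_rank` are made-up placeholders chosen only to make the example run; they are not the BCCWJ ranks.

```python
def blocking_kanji(words, kanji_rank, top_n):
    """Map each word not covered by the top_n Kanji to the Kanji that block it."""
    blocked = {}
    for word in words:
        # Characters missing from kanji_rank (e.g. kana) are treated as already known.
        missing = [ch for ch in word if kanji_rank.get(ch, 0) > top_n]
        if missing:
            blocked[word] = missing
    return blocked

# Placeholder ranks for illustration only (NOT taken from the corpus).
kanji_rank = {"卒": 1500, "業": 50, "洗": 1200, "濯": 2500, "気": 30}

print(blocking_kanji(["卒業", "洗濯", "気", "する"], kanji_rank, 1000))
```

Counting how often each blocking Kanji appears across the blocked words would then show which extra Kanji unlock the most high frequency words.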

Page 11: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results 結果 (1)-1

RQ 1: How many ‘characters’ do learners need to learn to attain a certain level of text coverage of ‘words’?
• 64% of the words (half of them function words) are covered by the phonographic characters alone (Hiragana, Katakana and alphabet)
• 82%: phonographic characters + top 300 Kanji
• Each additional 100 Kanji learned within the top 1000 means potential understanding of 6000–7000 types (異なり語) of orthographic forms (3000–4000 lexemes)

Page 12: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results 結果 (1)-2

• 95–96%: phonographic characters (表音文字) + top 1000–1100 Kanji
  Threshold level for reading comprehension? (Hu & Nation, 2000; Komori et al., 2004)
• 98%: phonographic characters + top 1500 Kanji

Page 13: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results: Number/Ratio of Words (orthographic forms) and Text Coverage by Character Types (+Level of Kanji) in Japanese
日本語の文字タイプ(+漢字レベル)別の語の数/割合とテキストカバー率

A: Alphabet, H: Hiragana, K: Katakana

Columns:
(1) Type of character (+ level of Kanji) (*)
(2) Number of words (orthographic forms) by character type
(3) Cumulative number of words (orthographic forms) by character type
(4) Increase in text coverage by words from learning 100 more Kanji
(5) Cumulative text coverage by words
(6) Increase in text coverage by characters from learning 100 more Kanji
(7) Cumulative text coverage by characters

(1) | (2) | (3) | (4) | (5) | (6) | (7)
Only Alphabets | 17712 | 17712 | 0.7% | 0.7% | 1.1% | 1.1%
Only Hiragana (*) | 20272 | 37984 | 59.7% | 60.4% | 51.9% | 52.9%
Mixture of A & H | 1 | 37985 | 0.0% | 60.4% | 0.0% | 52.9%
Only Katakana (*) | 49349 | 87334 | 3.3% | 63.6% | 7.3% | 60.2%
Mixture of A/H/K | 625 | 87959 | 0.0% | 63.6% | 0.0% | 60.2%
Ranking 1-100 Kanji + A, H & K | 7187 | 95146 | 10.1% | 73.8% | 9.7% | 70.0%
Ranking 101-200 Kanji + A, H & K | 7360 | 102506 | 5.2% | 79.0% | 5.8% | 75.8%
Ranking 201-300 Kanji + A, H & K | 7318 | 109894 | 3.6% | 82.6% | 4.1% | 79.9%
Ranking 301-400 Kanji + A, H & K | 6636 | 116530 | 2.8% | 85.4% | 3.3% | 83.1%
Ranking 401-500 Kanji + A, H & K | 6830 | 123360 | 2.6% | 88.0% | 2.9% | 86.0%
Ranking 501-600 Kanji + A, H & K | 6820 | 130180 | 2.0% | 90.0% | 2.4% | 88.4%
Ranking 601-700 Kanji + A, H & K | 6585 | 136765 | 1.6% | 91.6% | 1.8% | 90.2%
Ranking 701-800 Kanji + A, H & K | 6393 | 143158 | 1.4% | 93.0% | 1.6% | 91.8%
Ranking 801-900 Kanji + A, H & K | 6186 | 149344 | 1.1% | 94.1% | 1.4% | 93.2%
Ranking 901-1000 Kanji + A, H & K | 5427 | 154771 | 1.0% | 95.1% | 1.2% | 94.4%
Ranking 1001-1100 Kanji + A, H & K | 4703 | 159474 | 0.8% | 96.0% | 1.0% | 95.3%
Ranking 1101-1200 Kanji + A, H & K | 4262 | 163736 | 0.7% | 96.6% | 0.8% | 96.1%
Ranking 1201-1300 Kanji + A, H & K | 4222 | 167958 | 0.6% | 97.2% | 0.7% | 96.8%
Ranking 1301-1400 Kanji + A, H & K | 3691 | 171649 | 0.5% | 97.7% | 0.5% | 97.4%
Ranking 1401-1500 Kanji + A, H & K | 3541 | 175190 | 0.4% | 98.1% | 0.4% | 97.8%
Ranking 1501-1600 Kanji + A, H & K | 2909 | 178099 | 0.3% | 98.4% | 0.4% | 98.2%
Ranking 1601-1700 Kanji + A, H & K | 2793 | 180892 | 0.3% | 98.6% | 0.3% | 98.5%
Ranking 1701-1800 Kanji + A, H & K | 2554 | 183446 | 0.2% | 98.9% | 0.3% | 98.7%
Ranking 1801-1900 Kanji + A, H & K | 2164 | 185610 | 0.2% | 99.0% | 0.2% | 98.9%
Ranking 1901-2000 Kanji + A, H & K | 1993 | 187603 | 0.2% | 99.2% | 0.2% | 99.1%
Ranking 2001-2100 Kanji + A, H & K | 1933 | 189536 | 0.1% | 99.3% | 0.1% | 99.3%
Ranking 2101-2200 Kanji + A, H & K | 1495 | 191031 | 0.1% | 99.4% | 0.1% | 99.4%
Ranking 2201-2300 Kanji + A, H & K | 1427 | 192458 | 0.1% | 99.5% | 0.1% | 99.5%
Ranking 2301-6323 Kanji + A, H & K (all) | 15373 | 207831 | 0.5% | 100.0% | 0.5% | 100.0%
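As a small worked check, the cumulative text coverage by words in the table can be searched for the smallest Kanji level that reaches a target coverage. The percentages below are copied from the table; the function name `kanji_needed` and the list name are hypothetical, and the lookup only resolves to the 100-Kanji bands that the table reports.

```python
# (number of top-ranked Kanji learned, cumulative text coverage by words in %),
# copied from the table; 0 Kanji means alphabet, Hiragana and Katakana only.
CUMULATIVE_WORD_COVERAGE = [
    (0, 63.6), (100, 73.8), (200, 79.0), (300, 82.6), (400, 85.4),
    (500, 88.0), (600, 90.0), (700, 91.6), (800, 93.0), (900, 94.1),
    (1000, 95.1), (1100, 96.0), (1200, 96.6), (1300, 97.2), (1400, 97.7),
    (1500, 98.1), (1600, 98.4), (1700, 98.6), (1800, 98.9), (1900, 99.0),
    (2000, 99.2), (2100, 99.3), (2200, 99.4), (2300, 99.5), (6323, 100.0),
]

def kanji_needed(target_percent):
    """Smallest tabulated Kanji level whose cumulative word coverage meets the target."""
    for n_kanji, coverage in CUMULATIVE_WORD_COVERAGE:
        if coverage >= target_percent:
            return n_kanji
    return None

for target in (80.0, 95.0, 98.0):
    print(target, kanji_needed(target))
```

With the tabulated values, the 95% and 98% targets are first met at the top 1000 and top 1500 Kanji respectively.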

Page 14: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results 結果: Text coverage of Japanese words by Kanji level and cumulative (日本語の単語のテキストカバー率(漢字レベル別/累積))

[Figure: two series plotted against the Kanji ranking bands from "Ranking 1-100 Kanji + A, H & K" to "Ranking 2201-2300 Kanji + A, H & K". Series 1: increase in text coverage by words from learning 100 more Kanji (漢字100字増加による語のテキストカバー率増加分), axis 0.0%–10.0%. Series 2: cumulative text coverage by words (語の累積テキストカバー率), axis 50.0%–100.0%.]

Page 15: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results 結果 (2)-1

RQ 2: Do the characters which provide the text coverage in RQ 1 cover all the high frequency words? If not, what Kanji are further required to cover the words? (Is there any discrepancy between the word frequencies and the character frequencies?)

i.e. Can low frequency Kanji be a barrier to learning high frequency words?

Page 16: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

Number of Kanji at Different Frequency Levels and the Former JLPT Levels

Kanji frequency level | JLPT 4 | JLPT 3 | JLPT 2 | JLPT 1 | Subtotal | Others | Total
1-100 | 39 | 41 | 20 | 0 | 100 | 0 | 100
101-300 | 27 | 53 | 110 | 9 | 199 | 1 | 200
301-1000 | 14 | 63 | 437 | 178 | 692 | 8 | 700
1001-2000 | 0 | 8 | 187 | 580 | 775 | 225 | 1000
Others | 0 | 0 | 1 | 159 | 160 | 4163 | 4323
Total | 80 | 165 | 755 | 926 | 1926 | 4397 | 6323
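The counts in this table can be summed directly to see how closely the Kanji frequency levels match the former JLPT levels. The sketch below copies the JLPT columns of the table; the names `KANJI_BY_BAND` and `count` are hypothetical.

```python
JLPT_LEVELS = (4, 3, 2, 1)

# Number of Kanji per frequency band at former JLPT levels 4, 3, 2, 1 (from the table).
KANJI_BY_BAND = {
    "1-100": (39, 41, 20, 0),
    "101-300": (27, 53, 110, 9),
    "301-1000": (14, 63, 437, 178),
    "1001-2000": (0, 8, 187, 580),
    "Others": (0, 0, 1, 159),
}

def count(bands, levels):
    """Total number of Kanji in the given frequency bands at the given JLPT levels."""
    return sum(KANJI_BY_BAND[band][JLPT_LEVELS.index(level)]
               for band in bands for level in levels)

top_1000 = ("1-100", "101-300", "301-1000")
print(count(top_1000, (4, 3, 2)))       # top-1000 Kanji at former JLPT levels 4-2
print(count(KANJI_BY_BAND, (4, 3, 2)))  # all Kanji at former JLPT levels 4-2
print(count(top_1000, (1,)))            # top-1000 Kanji at former JLPT level 1
```

The first sum gives 804 and the second gives 1000, so roughly four fifths of the top 1000 Kanji by frequency fall within the former JLPT levels 4 to 2.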

Page 17: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results 結果 (2)-2

• There is only a narrow gap between the Kanji frequency levels and the former JLPT Kanji levels
• Among the top 1000 Kanji, more than 800 are Kanji at the former JLPT levels 4, 3 and 2
• More than 96% of the word tokens (延べ語数) in general texts will be covered by 1200 Kanji, consisting of:
  • all Kanji at the former JLPT levels 4, 3 and 2 (total: 1000)
  • + the top 200 Kanji at the former JLPT level 1

Page 18: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results 結果 (2)-3

Top 196 Kanji at the former JLPT level 1 and others (級外, outside the JLPT levels)
• Within the top 300: 保義公価基条応態郎 & 々
• Within the top 1000: 張士氏視素護離証企異評提姿井統振吉策影紀為宮江派僕従系衛皇展案松隊施我整及織環響修遺宗昭撃株節源養項興故裁沢端障志激弁益嫌佐司眼密載己債訳症標健納請授挙恵貴徳推描崎抗属盛監傷創徴街善援衆康模敵津拠継隠称尾聖鮮厳攻妙融丈筋帝秘敷驚射壊刑壁染功訴討幕扱脱範契弾診詳房避酸倉充典繰儀至削博瞬仮縁憲択就聴握詩秀柄浜滅拡惑踏華闘微雄維隣如審誘賀郷霊釈黙魔携掲遣艦剣致 & 誰頃藤俺之岡伊阪

Page 19: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

4. Results 結果 (2)-4

• 95% text coverage requires:
  • the top 9600 lexemes / top 20749 orthographic forms (types 異なり語数)
  • the top 1000 Kanji + Hiragana, Katakana and alphabet
• Within the top 9600 lexemes, 1700 lexemes are estimated to require Kanji beyond the top 1000
  e.g. 比較、記憶、批判、距離、指摘、希望、分析、韓国、基礎、誕生、監督、雰囲気、卒業、洗濯
• Many of them are often written in Hiragana/Katakana
  e.g. 即ち、駄目、奴、凄い、頑張る、挨拶、嘘、煙草、匂い、只、是非、無駄、喧嘩、噂、伺う

Page 20: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

5. Discussion 考察 1)

• For general texts, learners can attain more than 70% comprehension with 95–96% coverage (for English, see Hu & Nation, 2000; for Japanese, see Komori, Mikuni & Kondo, 2004)
• Learning Kanji in order of frequency is a much more efficient way to gain higher text coverage (Zipf’s Law: Zipf, 1949)
• The top 300–500 Kanji seem much more essential
• The top 1000–1500 Kanji might be enough for general purposes (with occasional use of a dictionary)
• It may also mean that learning Kanji without reaching the threshold level is of little use…

Page 21: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

5. Discussion 考察 2)

• Also, to attain 95% coverage, 1000 Kanji are required; however, some important words are not covered by the top 1000 Kanji
• In other words, some low frequency Kanji are used in high frequency words
• Many of those Kanji have low productivity, that is, they are rarely used in other words, e.g. 雰囲気、卒業、洗濯
• To cover the top 9600 words (lexemes), a further 200–500 Kanji are estimated to be required

Page 22: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

5. Discussion 考察 3)

• Certainly, the burden of learning Japanese characters is heavier than in most other languages
• However, the burden of learning Japanese vocabulary may be rather lighter once the learner knows:
  • the 1000–1500 characters
  • the word formation/compounding rules of Kanji
  • the metaphors of Kanji compounds, e.g. 入門: entering a gate → first step, to start training
• This holds despite the fact that the text coverage is lower than in English at all word frequency levels

Page 23: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

5. Discussion 考察 4)

In other words, it is possible that:
• the number of ‘units of learning Japanese vocabulary’ is not as large as generally perceived
• It will also be important for students/teachers to learn/teach the association of different readings (typically On-reading and Kun-reading) of each Kanji, to reduce the burden of learning Japanese vocabulary
  e.g. 入門 /nyuRmoN/ ← 入る /hairu/ + 門 /moN/
  入る (freq. ranking: 117) is more likely to be learned earlier than 入門 (freq. ranking: 6369) (Matsushita, 2011)
• Without this kind of association, learners have to learn more words separately

Page 24: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

6. Conclusion まとめ

• 63% of BCCWJ texts are covered without Kanji (but half of them are function words)
• To attain 95% coverage, 1000 Kanji are required; however, some important words are not covered by the top 1000 Kanji
• To cover those words, a further few hundred Kanji will be required
• Text coverage in Japanese is generally lower than in English, i.e. Japanese requires more words to be learned
• However, many Japanese words are composed of a limited number of Kanji; therefore, the burden of learning Japanese vocabulary may not be as heavy as the text coverage studies suggest, once the learner knows:
  • the 1000–1500 characters
  • the form, meaning and compounding rules of Kanji
  • the metaphors of Kanji compounds
  • the association of different readings (e.g. On-reading and Kun-reading) of each Kanji

Page 25: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

These slides will be uploaded in the site shown below.

「松下言語学習ラボ」http://www.wa.commufa.jp/~tatsum/

You can find the site by Google with the key words of 松下 (Matsushita) and 言語 (language).

Page 26: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

References 引用文献 1)

Akimoto, M. (秋元美晴). (2002). よくわかる語彙 [Understanding Vocabulary]. Tokyo: Alc (アルク).

Bauer, L. & Nation, P. (1993). Word families. International Journal of Lexicography, 6(4), 253-279.

Carroll, J. B., Davies, P., & Richman, B. (1971). Word Frequency Book. New York: Houghton Mifflin, Boston American Heritage.

Chikamatsu, N., Yokoyama, S., Nozaki, H., Long, E., & Fukuda, S. (2000). A Japanese logographic character frequency list for cognitive science research. Behavior Research Methods, Instruments, & Computers, 32(3), 482-500.

Den, Y. (伝康晴), Yamada, A. (山田篤), Ogura, H. (小椋秀樹), Koiso, H. (小磯花絵), & Ogiso, T. (小木曽智信). (2009). UniDic. Version 1.3.11. Downloaded from http://www.tokuteicorpus.jp/dist/

Page 27: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

References 引用文献 2)

Harlan, P. (パトリック・ハーラン). (2011). ゼロからの日本語学習と僕の好きな日本のカルチャー (Learning Japanese from zero, and the Japanese culture I like). Cited from http://www.wochikochi.jp/topstory/2011/04/packun.php

Hu, M. H. & Nation, P. (2000). Vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403-430.

Komori, K. (小森和子) , Mikuni, J. (三國純子) , & Kondo, A. (近藤安月子) . (2004). 文章理解を促進する語彙知識の量的側面 ―既知語率の閾値探索の試み― (What percentage of known words in a text facilitates reading comprehension: a case study for exploration of the threshold of known words coverage). 日本語教育 [Teaching Japanese as a Foreign Language], 125, 83-92.

Page 28: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

References 引用文献 3)

Matsushita, T. (松下達彦). (2011). 日本語を読むための語彙データベース (The Database for Reading Japanese). Downloaded from http://www.geocities.jp/tatsum2003/

Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63(1), 59-82.

NINJAL: The National Institute for Japanese Language ( 国立国語研究所 ). (1962). 現代雑誌 90 種の用字・用語 第一分冊 総記および語彙表 (Vocabulary and Chinese characters in ninety magazines of today: (Volume I) General description & vocabulary frequency tables). Tokyo: Shuuei Shuppan ( 秀英出版 ).

Page 29: Tatsuhiko  Matsushita PhD candidate,  Victoria University of Wellington

References 引用文献 4)

NINJAL: The National Institute for Japanese Language (国立国語研究所). (2006). 現代雑誌 200 万字言語調査語彙表 (The vocabulary lists from the language survey of contemporary magazines with two million running characters). Downloaded from http://www.kokken.go.jp/katsudo/seika/goityosa/index.html

Tamamura, F. ( 玉村文郎 ). (1984). 語彙の研究と教育(上) . Tokyo: The National Institute for Japanese Language ( 国立国語研究所 ).

Zipf, G. (1949). Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. New York: Hafner.