Frequency RU


  • 7/28/2019 Frequency RU


    .. , ..

    1

    , , . , , . , ,.

    Word Frequencies in Written and Spoken English (Leech et al. 2001) ., . , . , , , , , , , .

    : .. (1963), .. (1977), . (1993) ., (400 1 ) : .

    ,, ( 1999, . 2003, . 1996), . , (Josselson 1953), ( 1970), ( . 2008). , , ; ,

    , ,.

    . , Davies 2005 Davies & Gardner 2010., - , (). . ,.


    2 , 19502007 . , , 92 . .

    (http://www.ruscorpora.ru) ( 2003, 2005, 20062008 .), 2001. XVIII XXI ( ), , , , , ,, , .

    , , . . . , , (British National Corpus), (Corpus del español), (Český národní korpus) . , (, , . .) . .

    (, ),, . ( 2005). (, ..), (, ..), . , : ()1, ( , , ..), (, , ..),, , (, ..) (, ..). - , , , , , ( 100 ). 54 , : . . ( ,), (, , ..) .

    . . : , ,

    1 ., , -. . , ,, -.


    ()., , . 1.

    . 1.

    ,

    ,

    .

    39.04% 45 150 317 35 150 521 2 418

    42.21% 48 818 173 39 739 644 27 390

    , 16.96% 19 618 518 15 478 151 7 495

    . . 11.30% 13 067 152 3 994

    1.62% 1 872 482 1 075

    1.49% 1 727 363 133

    1.44% 1 664 804 488

    0.57% 659 707 1 232

    0.48% 556 291 439

    0.26% 295 206 134

    0.88% 1 017 568 758 407 1 005

    (.. ) 0.90% 1 037 468 827 580 61

    100% 115 642 044 91 954 303 38 369

    , , (), , , , . . , , . , . ; 5%.

    3

    (400 . , ): . , , 1970( . 1972), , 16001700 400 . -. ,


    : (. . ), (. . ).

    2 , 150 , ( . Sharoff 2006). , ,, ( 200500 ), . (, , ). ( ),2.

    . 2. (, ipm)

    202 364 138 436 428

    609 1094 1058 756 818

    69 1 15 11

    499 421 250 282 292

    193 110 75 78

    415 632 595 503 650

    58 242 135 91 110

    , . , , . , . . , . , , ,

    .3 , , , .

    2 .

    , , .3 (Church 2000), - whelk

    problem (Kilgarriff 1997). (19831989 .). , , 1989, , . Whelk , .


    2 , , -. , , , . , . , - ( BNC ), . , ., , , , Cieri & Liberman 2002), .

    , ., - , (), , , . , . , 25 . , . , (. , , 1970, ), (), (/, /), (, ) . . (. 5 ), 91 982 416 , , , , . , 115 642 044 ( [ ,, ----] ).

    686 566 (, ), 1 729 928 , 564 555 70 931 , . 270 498 , 203 185 0, 106 874 . 16.5%, 100 37%, 1 000 60%, 2 000 69%, 10 000 85% (.. 6.7).

    4 ,

    4. 1.

    , ipm (instances per million words). , , . , 55 400 . , 364 39 653 , ipm


    137.5, 364.0 435.6, . ipm 92 ( 92 ). , F(ipm) 92, , 805.8 ipm x 92 = 74 134 .
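    The conversion between ipm and absolute frequency used in the worked example (805.8 ipm x 92 = 74 134) can be sketched as follows. This is a minimal illustration, assuming a corpus of roughly 92 million words as stated in the text; the function names and the constant are hypothetical and not part of the dictionary.

```python
# Approximate corpus size in millions of words, taken from the text.
# Any other corpus would need its own value (an assumption of this sketch).
CORPUS_SIZE_MILLIONS = 92


def count_to_ipm(count: int, corpus_millions: float = CORPUS_SIZE_MILLIONS) -> float:
    """Convert an absolute frequency to instances per million words (ipm)."""
    return count / corpus_millions


def ipm_to_count(ipm: float, corpus_millions: float = CORPUS_SIZE_MILLIONS) -> int:
    """Convert ipm back to an approximate absolute frequency."""
    return round(ipm * corpus_millions)


if __name__ == "__main__":
    print(ipm_to_count(805.8))  # 74134, the figure quoted in the text
```

    Because ipm values are rounded to one decimal place, the recovered absolute frequency is only approximate.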

    , , . 1, 2 .., 10 000 . , , , (. 2). ipm 1, , , 1 000(, , ), 120 ipm. 6.7. 6: , 1 000 0.6094, , 1 1 000 ( ) 60.94% ; 50 000 93% .

    (. 4), ,, (. 6).

    4. 2. R (range) (D)

    , . , , ,

    , (. ). , , .

    (ARF, Average Reduced Frequency), (Čermák & Křen 2005). (, , , Lyne 1985) D, . (Juilland's D, . Juilland et al. 1970; . Gries 2008).

    D

    :

    , , (. . , n), .

    n (, 100 , 90 ). , (, ) .

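    \[
    D = 100 \times \left(1 - \frac{\sigma}{\mu \sqrt{n - 1}}\right)
    \]

    where \mu is the mean frequency of the word across the n segments and \sigma is the standard deviation of its frequency over those segments.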
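    Juilland's D and the range R discussed in this section can be computed from a word's frequencies in the individual corpus segments. The sketch below assumes equal-sized segments and the population standard deviation, following the usual statement of Juilland's D (Juilland et al. 1970; Gries 2008); the function names are illustrative, and the dictionary's own implementation may differ in detail.

```python
import math
import statistics


def juilland_d(segment_freqs):
    """Juilland's D scaled to 0..100 for one word.

    segment_freqs holds the word's frequency in each of n equal-sized
    corpus segments (an assumption of this sketch).
    """
    n = len(segment_freqs)
    if n < 2:
        raise ValueError("at least two segments are required")
    mean = statistics.fmean(segment_freqs)
    if mean == 0:
        raise ValueError("the word does not occur in the corpus")
    # Coefficient of variation: population standard deviation over the mean.
    v = statistics.pstdev(segment_freqs) / mean
    return 100 * (1 - v / math.sqrt(n - 1))


def word_range(segment_freqs):
    """R (range): the number of segments in which the word occurs."""
    return sum(1 for f in segment_freqs if f > 0)


if __name__ == "__main__":
    even = [5] * 100           # same frequency in every segment
    clumped = [10] + [0] * 99  # all occurrences in a single segment
    print(juilland_d(even), word_range(even))
    print(round(juilland_d(clumped), 6), word_range(clumped))
```

    A word spread evenly over all segments gets D = 100, while a word confined to a single segment gets D close to 0, which is why D complements R: two words with the same range can differ sharply in how evenly their occurrences are distributed.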


    R (range) , . / 0 ( ) 1 ( )., D , , 100, , , 0.4

    , (R=100), 5381.4 ipm, , 97. 100 , 395.0ipm, , 76. , 10.2 ipm, 916 () 3 9, 9.

    ( 1) ipm, R D, (), . , . , , . , , (, ), . , , ( ). , , R. , , ( , 47 ). , , , R=71. , .

    D, , , ,, (Lyne 1986)., , ( 25 ipm), D 46, 78, 97, , () .

    D R (range) , : . , : (R=91) , 400 (D=28).

    , D , , D . . :

    4 Leech et al. 2001.

    100. .


    D (. . ). , , , .

    4. 3. LLscore ()

    . , . ,, , , , , ,, . ,5.

    (log

    likelihood), :

                      subcorpus    rest of corpus    total
    word frequency    a            b                 a+b
    corpus size       c            d                 c+d

    G2 (LLscore) :

    , b, c, d, E1E2 (. Rayson & Garside 2000).

    ( ), . , , 10 , , 5 500 . , .

    , .

    15.31, 99% , (Rayson & Garside, 2000).

    , , . ( ipm) ( ). (15 ipm 10ipm ), (a) (a+b) , . , .

    5 2003: 17-19 .

    \[
    E_1 = \frac{c\,(a+b)}{c+d}; \qquad E_2 = \frac{d\,(a+b)}{c+d}
    \]

    \[
    G^2 = 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right)
    \]


    , ipm ( 10:1).

    . 3. LLscore

    . .

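                                        (1)            (2)           (3)            (4)
    Frequency in subcorpus (a)          300             30            30             30
    Frequency in whole corpus (a+b)    1000            100           100           1000
    Subcorpus size (c)           20 000 000     20 000 000     2 000 000      2 000 000
    Whole corpus size (c+d)     100 000 000    100 000 000    10 000 000    100 000 000
    ipm in subcorpus                     15            1.5            15             15
    ipm in whole corpus                  10              1            10             10
    E1                                  200             20            20             20
    E2                                  800             80            80            980
    LL                                56.34           5.63          5.63           4.43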

    (300 , ) (15.31).
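    The LL-score values in Table 3 can be reproduced with a short calculation. The sketch below follows the log-likelihood formulation of Rayson & Garside (2000) cited in the text, taking as input the word's frequency in the subcorpus (a) and in the whole corpus (a+b), together with the subcorpus size (c) and the whole corpus size (c+d). The function name and parameter names are illustrative assumptions.

```python
import math


def ll_score(freq_sub, freq_all, size_sub, size_all):
    """Log-likelihood (G2) of a word's frequency in a subcorpus
    against the remainder of the corpus."""
    a = freq_sub
    b = freq_all - freq_sub   # occurrences outside the subcorpus
    c = size_sub
    d = size_all - size_sub   # size of the rest of the corpus
    e1 = c * (a + b) / (c + d)  # expected frequency in the subcorpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the rest

    def term(observed, expected):
        # A zero observed count contributes nothing to the sum.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    return 2 * (term(a, e1) + term(b, e2))


if __name__ == "__main__":
    cases = [
        (300, 1000, 20_000_000, 100_000_000),
        (30, 100, 20_000_000, 100_000_000),
        (30, 100, 2_000_000, 10_000_000),
        (30, 1000, 2_000_000, 100_000_000),
    ]
    for case in cases:
        print(round(ll_score(*case), 2))  # 56.34, 5.63, 5.63, 4.43
```

    Only the first case exceeds the 15.31 threshold, although all four have the same 15:10 ratio of ipm values: the score depends on the absolute number of observations, not just on the relative frequencies.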

    ( D) . , , (Kilgarriff, 2005). , 195060 ( ),

    ( , , ,).

    5

    5.1

    , : , . (Zipf 1935) -

    (r,) (f): f ≈ k / r,

    k , ( ), , ( , , ; . . 1975). 1: .
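    The rank-frequency relation attributed to Zipf (1935), in which frequency is roughly inversely proportional to rank, can be illustrated with a toy calculation. The sketch below estimates the constant k as the mean of frequency times rank and uses it to predict frequencies; this simple estimator and the function names are assumptions made for illustration, not the procedure used for the dictionary.

```python
def zipf_constant(freqs_desc):
    """Estimate k in f = k / r as the mean of f * r.

    freqs_desc lists word frequencies sorted in descending order,
    so the item at position r (counting from 1) has rank r.
    """
    if not freqs_desc:
        raise ValueError("frequency list is empty")
    products = [f * r for r, f in enumerate(freqs_desc, start=1)]
    return sum(products) / len(products)


def zipf_predict(k, rank):
    """Frequency predicted for a given rank under f = k / r."""
    return k / rank


if __name__ == "__main__":
    # Toy data that follows the law exactly with k = 1200.
    freqs = [1200, 600, 400, 300, 240, 200]
    k = zipf_constant(freqs)
    print(k, zipf_predict(k, 4))
```

    Real frequency lists deviate from this idealised curve, particularly at the very top and in the long tail of rare words.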


    . 1: ().

    , . , 20 000 , 30 000, -.

    , . ,

    , . 100 5 (ipm), 13 000 (460 ). , , , , . , , , , ?, , , 40 .

    2.6 ipm ( 2,20 000 ) 0.4 ipm ( 1, 50 000 , 33).

    5.2

    () . : , (. , , , ) ; (, , , , , ,, ).



    1) , , . , , ,, . (., ).

    2) , , , ., , , ,

    , , , , ,

    , . : , , . , , . , , ;, , .3) pluralia tantum, , , . , , , , , , , , , , , . .

    5.4

    1 2, 1 2, , . . , (), , . , ,

    , (. , )., , :

    . 4.

    Lemma PoS F(ipm) R D Doc

    s 32.6 100 93 952 v 8.7 95 93 511

    . , ., , , , . , ,, , , ( ), ( ), (), .

    , , : , . . ,, . , (,. , .), ,.


    , ,', . (. ). .

    , , VS . , , ( /, /), 8 , , . .

    6

    :

    1. ()2. ()3. ()3.1..3.1.. 3.2..3.2.. 3.3..3.3..

    3.4..3.4.. 4. 5. 5.1. 5.2. 5.3. 5.4. 5.5. (, , , )5.6. 5.7.

    6. 7.

    ( 1) , PoS, F(ipm), R (range), D () Doc, . 50 000 () . (. , ), (*). 1 , , ipm


    , 100, 1000, 100000 .. , .

    . 5. 1 ()

    Lemma PoS F(ipm) R D Doc

    s 0.5 15 63 22

    v 0.4 18 72 25

    v 1.0 51 85 76

    adv 0.7 41 84 54

    , ( 2), Rank, , PoS, F(ipm) (19501969 , 19701989 , 19902007 ) 8. 20 000

    .

    . 6. 2 ()

    .

    Lemma PoS F(ipm) 1950-1960

    1970-

    1980

    1990-

    2000

    1950-

    1960

    1970-

    1980

    1990-

    2000

    s 32.5 6.4 11.0 15.7 22.2 23.9 64.5

    s 32.5 2.1 4.0 16.0 10.4 74.5 52.1

    . 7. ,

    19501969 19701989 19902007

    : . 5 642 070 7 818 865 21 756 323

    309 585 1 524

    : . 674 566 2 725 968 34 950 394

    509 623 26 264

    , 19902000 (.. 7), 60 ., , , , .

    8 ,, , 1975-2003 , 1900-2000-.


    , ( 1). 2.6 ipm, .

    ( 3) , , . 5 000 . F(ipm). , .

    . 8. 3.4 (:)

    Lemma PoS F(sp)

    s 22.7

    a 26.5

    (. . 4). , 9 , , , , . , . , F(all) ipm, , ipm, LLscore.

    . 9. 3.4 ()

    Lemma PoS F(all) F(sp) LL

    part 1114.6 17208.0 50672

    part 787.5 11847.0 34394

    part 1785.1 15698.6 32662

    ( 4) 5 ipm, , ( 20 ). . , , ,., :

    . 10. 4 ()

    Word F(ipm)

    3504.1

    631.5

    5.5

    276.9

    45.7

    9 , , , , . , , , , . .


    , , ipm , 100,1000, 100000.. , .

    5 ( ) : , , , ( . .), , , (,, , ). F(ipm) ( ) Rank. 1 . , .

    . 11. 5.7 (

    : )

    Lemma PoS F(sp)

    pr 147.3

    conj 134.9

    ( 6) , , , . 6.1 F(abs) (%) . 6.26.5 , , . ; F(abs) Rank. 6.6 (, ).

    6.7 : (Rank) (Coverage). , , ( 1) 3.6% , . . 3.6% , 12 6.7% , 110 16.6% , 93% 150000.
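    The coverage figures by rank (for example, the share of running text accounted for by the most frequent 1 000 lemmas) are cumulative sums over the sorted frequency list. A minimal sketch, assuming a plain list of absolute frequencies; the function name and the optional `total_tokens` parameter are hypothetical.

```python
from itertools import accumulate


def coverage_by_rank(freqs, total_tokens=None):
    """Return cumulative text coverage for ranks 1..N.

    freqs: absolute frequencies of the listed words (any order).
    total_tokens: size of the corpus; defaults to the sum of freqs,
    which overstates coverage if the list is truncated.
    """
    ordered = sorted(freqs, reverse=True)
    total = total_tokens if total_tokens is not None else sum(ordered)
    return [running / total for running in accumulate(ordered)]


if __name__ == "__main__":
    toy = [20, 50, 30]
    print(coverage_by_rank(toy))                    # [0.5, 0.8, 1.0]
    print(coverage_by_rank(toy, total_tokens=200))  # [0.25, 0.4, 0.5]
```

    Passing the full corpus size as `total_tokens` matters for a truncated list: a dictionary limited to the top lemmas never reaches 100% coverage of the corpus.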

    6.8 . 1100, 101200 . ., NT(im), NT(n) NT(nf). , . 6.9 : (L), (Example), (N) ipm (F) (all) (im), (n), (nf) (sp). .

    ( 7)., , . 1993 , .


    , ipm . ( Doc, R D) . ., , 150 (1.6 ipm) 50 .

    , 90 , . , , , , . , , , .. , ,, , , , .., (,, ), (15). , , , , , .

    7 , 2 500 . 1, F(ipm), R D Doc.

    . 12. 7 ()

    Lemma F(ipm) R D Doc

    9.1 72 88 372

    52.4 99 67 522

    12.0 90 87 275

    115.9 100 91 3387 11.3 57 82 305

    , (. , ), (*). (, ,, , / ). (), (),(), (), /, /.. , .

    , F(ipm).

    ***

    . .. (), (Universitetet i Tromsø, ), (University of Leeds, ), . . , , ,


    , . .. ,, , .. . .. , .. ,.. , .. , .. , .. , .. , , . .. .. , .. , .. , C.. , .. ,.. , .. , .. , .. ,.. .. , , .. , . .. .

    (66). .

    . .. (http://dict.ruslang.ru).

    .., .. , .. (1975). //. 2. 1. . 920. http://kudrinbi.ru/public/442/.

    .., .. , .. (.) (1996). . 4. : .

    .. (1977). : . .; 4.:.: , 2003.

    .. (.) (1977). . .: .

    (.) (1993). (Lönngren, Lennart. The Frequency Dictionary of Modern Russian). Acta Univ. Ups., Studia Slavica Upsaliensia 32. Uppsala.

    .., .., .. (2007). // .. (.), 2007. : . . . 118125.

    20032005: 20032005: . .: , 2005.


    20062008: 20062008. .: ,2009.

    .., .. , .. (1972). . .:.

    .. (2005). ? // 20032005. . .: . . 620.

    .. (1999). (.. )// 99 . , 1999. . . 2. . 230236.

    C.O. (2005). // 20032005. . .: . .6288.

    .., .. (1998). // '98 . , 1998. .2. . 547552.

    .. (2004). www.aot.ru // : 2004. . http://www.dialog-21.ru/Archive/2004/Sokirko.pdf.

    .., .. (2005). ( ) // 2005..:ndex. . 8094.

    .. (1970). . .: .

    .., .., .. (2003). . .:.

    .., .., .. (2008). (1990). .:.

    .. (2003). //. 2. 5. . 819.

    .. (1963). ..

    Čermák, František & Michal Křen (2005). New generation corpus-based frequency dictionaries: The case of Czech // International Journal of Corpus Linguistics, 10. P. 453–467.

    Čermák, František, Michal Křen et al. (2004). Frekvenční slovník češtiny. Praha: NLN.


    Church, Kenneth W. (2000). Empirical estimates of adaptation: the chance of two Noriegas is closer to p/2 than p^2 // Proceedings of the 17th conference on Computational linguistics. Saarbrücken, Germany, 2000. P. 180–186.

    Cieri, Christopher & Mark Liberman (2002). Language resources creation and distribution at the Linguistic Data Consortium // Proceedings of LREC'02. Las Palmas, Spain, 2002. P. 1327–1333.

    Davies, Mark (2005). A Frequency Dictionary of Spanish: Core Vocabulary for Learners. London–N.Y.: Routledge.

    Davies, Mark & Dee Gardner (2010). A Frequency Dictionary of American English: Word Sketches, Collocates, and Thematic Lists. London–N.Y.: Routledge. http://www.wordfrequency.info/

    Gries, Stefan Th. (2008). Dispersions and adjusted frequencies in corpora // International Journal of Corpus Linguistics 13, 4. P. 403–437.

    Josselson, Harry H. (1953). The Russian word count and frequency analysis of grammatical categories of Standard Literary Russian. Detroit: Wayne University Press.

    Juilland, Alphonse, Dorothy Brodin & Catherine Davidovitch (1970). Frequency dictionary of French words. The Hague–Paris: Mouton.

    Kilgarriff, Adam (1997). Putting frequencies in the dictionary // International Journal of Lexicography, 10(2). P. 135–155.

    Kilgarriff, Adam (2005). Language is never ever ever random // Corpus Linguistics and Linguistic Theory 1 (2). P. 263–276. http://www.kilgarriff.co.uk/Publications/2005-K-lineer.pdf

    Leech, Geoffrey, Paul Rayson & Andrew Wilson (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

    Lyne, Anthony A. (1985). The vocabulary of French business correspondence: word frequencies, collocations and problems of lexicometric method. Genève: Slatkine, Paris: Champion. (Travaux de linguistique quantitative, 23).

    Lyne, Anthony A. (1986). In Praise of Juilland's 'D'; a contribution to the empirical evaluation of various measures of dispersion applied to word frequencies // Ch. Muller (ed.) Méthodes quantitatives et informatiques dans l'étude des textes. Genève–Paris. P. 588–595.

    Rayson, Paul & Roger Garside (2000). Comparing corpora using frequency profiling // Proceedings of the Comparing Corpora Workshop at ACL 2000. Hong Kong, 2000. P. 1–6.

    Sharoff, Serge (2006). Creating general-purpose corpora using automated search engine queries // Baroni, Marco, Silvia Bernardini (eds.): WaCky! Working papers on the Web as Corpus. Bologna: Gedit. P. 63–98. http://wackybook.sslmit.unibo.it.

    Zipf, George Kingsley (1935). The Psycho-Biology of Language: An Introduction to Dynamic Philology. Boston: Houghton Mifflin.