7/28/2019 Frequency RU
1/21
.. , ..
1
, , . , , . , ,.
Word Frequencies in Written and Spoken English (Leech et al. 2001) ., . , . , , , , , , , .
: .. (1963), .. (1977), . (1993) ., (400 1 ) : .
,, ( 1999, . 2003, . 1996), . , (Josselson 1953), ( 1970), ( . 2008). , , ; ,
, ,.
. , Davies 2005 Davies & Gardner 2010., - , (). . ,.
2 , 1950-2007 . , , 92 . .
(http://www.ruscorpora.ru) ( 2003, 2005, 20062008 .), 2001. XVIII XXI ( ), , , , , ,, , .
, , . . . , , (British National Corpus), (Corpus del Español), (Český národní korpus) . , (, , . .) . .
(, ),, . ( 2005). (, ..), (, ..), . , : ()1, ( , , ..), (, , ..),, , (, ..) (, ..). - , , , , , ( 100 ). 54 , : . . ( ,), (, , ..) .
. . : , ,
1 ., , -. . , ,, -.
()., , . 1.
. 1.
,
,
.
39.04% 45 150 317 35 150 521 2 418
42.21% 48 818 173 39 739 644 27 390
, 16.96% 19 618 518 15 478 151 7 495
. . 11.30% 13 067 152 3 994
1.62% 1 872 482 1 075
1.49% 1 727 363 133
1.44% 1 664 804 488
0.57% 659 707 1 232
0.48% 556 291 439
0.26% 295 206 134
0.88% 1 017 568 758 407 1 005
(.. ) 0.90% 1 037 468 827 580 61
100% 115 642 044 91 954 303 38 369
, , (), , , , . . , , . , . ; 5%.
3
(400 . , ): . , , 1970( . 1972), , 16001700 400 . -. ,
: (. . ), (. . ).
2 , 150 , ( . Sharoff 2006). , ,, ( 200500 ), . (, , ). ( ),2.
. 2. (, ipm)
202 364 138 436 428
609 1094 1058 756 818
69 1 15 11
499 421 250 282 292
193 110 75 78
415 632 595 503 650
58 242 135 91 110
, . , , . , . . , . , , ,
.3 , , , .
2 .
, , .3 (Church 2000), - whelk problem (Kilgarriff 1997). (1983-1989 .). , , 1989, , . Whelk , .
2 , , -. , , , . , . , - ( BNC ), . , ., , , , Cieri & Liberman 2002), .
, ., - , (), , , . , . , 25 . , . , (. , , 1970, ), (), (/, /), (, ) . . (. 5 ), 91 982 416 , , , , . , 115 642 044 ( [ ,, ----] ).
686 566 (, ), 1 729 928 , 564 555 70 931 , . 270 498 , 203 185 0, 106 874 . 16.5%, 100 37%, 1 000 60%, 2 000 69%, 10 000 85% (.. 6.7).
4 ,
4. 1.
, ipm(instances per million words). , , . , 55 400 . , 364 39 653 , ipm
137.5, 364.0 435.6, . ipm 92 ( 92 ). , F(ipm) 92, , 805.8 ipm x 92 = 74 134 .
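The conversion between ipm (instances per million words) and absolute counts used in the example above can be sketched as follows. This is a minimal illustration, assuming a corpus of 92 million words as in the text; the function names are hypothetical and not part of the dictionary.

```python
def ipm_to_count(ipm: float, corpus_size_millions: float) -> int:
    """Approximate absolute number of occurrences for a given ipm value."""
    return round(ipm * corpus_size_millions)


def count_to_ipm(count: int, corpus_size_words: int) -> float:
    """Relative frequency in instances per million words."""
    return count / corpus_size_words * 1_000_000


# The worked example from the text: 805.8 ipm in a 92-million-word corpus.
print(ipm_to_count(805.8, 92))
```

Because ipm values are rounded to one decimal place, converting back and forth recovers the absolute count only approximately.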
, , . 1, 2 .., 10 000 . , , , (. 2). ipm 1, , , 1 000(, , ), 120 ipm. 6.7. 6: , 1 000 0.6094, , 1 1 000 ( ) 60.94% ; 50 000 93% .
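The cumulative coverage figures quoted above (for instance, 0.6094 for the first 1 000 lemmas) are running sums of frequencies divided by the corpus total. A minimal sketch, assuming a plain list of lemma frequencies; the toy numbers are invented for illustration and the function name is hypothetical:

```python
from itertools import accumulate


def coverage_by_rank(frequencies):
    """Return cumulative text coverage for ranks 1..N (most frequent first)."""
    ordered = sorted(frequencies, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


# Toy frequency list (not dictionary data): five lemmas, 100 tokens in total.
toy_frequencies = [50, 30, 10, 5, 5]
coverage = coverage_by_rank(toy_frequencies)
print(coverage)
```

Here `coverage[k - 1]` is the share of running text covered by the `k` most frequent lemmas.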
(. 4), ,, (. 6).
4. 2. R (range) (D)
, . , , ,
, (. ). , , .
(ARF, Average Reduced Frequency), (Čermák & Křen 2005). (, , , Lyne 1985) D, . (Juilland's D, . Juilland et al. 1970; . Gries 2008).
D
:
, , (. . , n), .
n (, 100 , 90 ). , (, ) .
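$$D = 100 \times \left(1 - \frac{\sigma}{\mu\,\sqrt{n-1}}\right)$$

Here $\mu$ is the mean frequency of the word across the corpus segments, $\sigma$ is the standard deviation of its frequency over those segments, and $n$ is the number of segments.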
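Juilland's D (Juilland et al. 1970) and the range R can be computed from a word's per-segment frequencies. The sketch below is an illustration, not the authors' code: it assumes equal-sized segments, uses the population standard deviation (so that D is 0 when all occurrences fall in one segment), and takes R to be the number of segments in which the word occurs. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def juilland_d(segment_freqs):
    """Juilland's D on a 0..100 scale for per-segment frequencies."""
    n = len(segment_freqs)
    mu = mean(segment_freqs)
    if n < 2 or mu == 0:
        return 0.0  # D is undefined here; 0.0 is a convention of this sketch
    sigma = pstdev(segment_freqs)  # population standard deviation (assumption)
    return 100 * (1 - sigma / (mu * sqrt(n - 1)))


def word_range(segment_freqs):
    """R: number of segments in which the word occurs at least once."""
    return sum(1 for f in segment_freqs if f > 0)


even = [5] * 100             # same frequency in every segment
clustered = [100] + [0] * 99  # all occurrences in a single segment
print(juilland_d(even), word_range(even))
print(juilland_d(clustered), word_range(clustered))
```

An evenly spread word gets D = 100 and R = 100, while a word confined to one segment gets D close to 0 and R = 1.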
R (range) , . / 0 ( ) 1 ( )., D , , 100, , , 0.4
, (R=100), 5381.4 ipm, , 97. 100 , 395.0ipm, , 76. , 10.2 ipm, 916 () 3 9, 9.
( 1) ipm, R D, (), . , . , , . , , (, ), . , , ( ). , , R. , , ( , 47 ). , , , R=71. , .
D, , , ,, (Lyne 1986)., , ( 25 ipm), D 46, 78, 97, , () .
D R (range) , : . , : (R=91) , 400 (D=28).
, D , , D . . :
4 Leech et al. 2001.
100. .
D (. . ). , , , .
4. 3. LLscore ()
. , . ,, , , , , ,, . ,5.
(log-likelihood), :

a    b    a+b
c    d    c+d

G2 (LLscore) :
, b, c, d, E1E2 (. Rayson & Garside 2000).
( ), . , , 10 , , 5 500 . , .
, .
15.31, 99% , (Rayson & Garside 2000).
, , . ( ipm) ( ). (15 ipm 10ipm ), (a) (a+b) , . , .
5 2003: 17-19 .
$$E_1 = \frac{c\,(a+b)}{c+d}; \qquad E_2 = \frac{d\,(a+b)}{c+d}$$

$$G^2 = 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right)$$
, ipm ( 10:1).
. 3. LLscore

                (1)                        (2)                        (3)                      (4)
Frequency       300 / 1 000                30 / 100                   30 / 100                 30 / 1 000
Corpus size     20 000 000 / 100 000 000   20 000 000 / 100 000 000   2 000 000 / 10 000 000   2 000 000 / 100 000 000
ipm             15 / 10                    1.5 / 1                    15 / 10                  15 / 10
E1              200                        20                         20                       20
E2              800                        80                         80                       980
LL              56.34                      5.63                       5.63                     4.43
(300 , ) (15.31).
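The LL-score values in Table 3 can be reproduced with the log-likelihood statistic of Rayson & Garside (2000). The sketch below rests on an assumed reading of the table: the first figure of each pair is taken as the subcorpus value and the second as the whole-corpus value, so that b and d are obtained by subtraction. That reading reproduces the E1, E2 and LL rows, but it is an inference from the numbers, and the function name is hypothetical.

```python
from math import log


def ll_score(a: float, b: float, c: float, d: float) -> float:
    """Log-likelihood (G2) for a word with frequencies a and b
    in two corpora of sizes c and d."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    total = 0.0
    if a > 0:
        total += a * log(a / e1)
    if b > 0:
        total += b * log(b / e2)
    return 2 * total


# (subcorpus freq, whole-corpus freq, subcorpus size, whole-corpus size)
examples = [
    (300, 1000, 20_000_000, 100_000_000),
    (30, 100, 20_000_000, 100_000_000),
    (30, 100, 2_000_000, 10_000_000),
    (30, 1000, 2_000_000, 100_000_000),
]
scores = [
    ll_score(sub_f, all_f - sub_f, sub_n, all_n - sub_n)
    for sub_f, all_f, sub_n, all_n in examples
]
print([round(s, 2) for s in scores])
```

Only the first example exceeds the 15.31 threshold mentioned in the text, although three of the four cases show the same 15:10 ratio of ipm values.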
( D) . , , (Kilgarriff, 2005). , 195060 ( ),
( , , ,).
5
5.1
, : , . (Zipf 1935) -
(r,) (f): $f = k / r$,
k , ( ), , ( , , ; . . 1975). 1: .
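Zipf's law (Zipf 1935) relates the rank r of a word to its frequency f through a constant k. One common way to check it on a frequency list is a least-squares fit on a log-log scale. The sketch below is an illustration only: the exponent s is a generalisation added here (the classic law corresponds to s = 1), the data are synthetic, and the function name is hypothetical.

```python
from math import exp, log


def fit_zipf(frequencies):
    """Fit log f = log k - s * log r by ordinary least squares.

    `frequencies` must be positive and sorted from most to least frequent.
    Returns the pair (k, s).
    """
    xs = [log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    k = exp(mean_y - slope * mean_x)
    return k, -slope


# Synthetic list that follows f = 1000 / r exactly (not dictionary data).
synthetic = [1000 / rank for rank in range(1, 51)]
k, s = fit_zipf(synthetic)
print(round(k, 3), round(s, 3))
```

On real frequency lists the fitted exponent typically drifts away from 1 at the top and the tail of the list, which is the kind of deviation the plot of rank against frequency makes visible.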
. 1: ().
, . , 20 000 , 30 000, -.
, . ,
, . 100 5 (ipm), 13 000 (460 ). , , , , . , , , , ?, , , 40 .
2.6 ipm ( 2,20 000 ) 0.4 ipm ( 1, 50 000 , 33).
5.2
() . : , (. , , , ) ; (, , , , , ,, ).
1) , , . , , ,, . (., ).
2) , , , ., , , ,
, , , , ,
, . : , , . , , . , , ;, , .3) pluralia tantum, , , . , , , , , , , , , , , . .
5.4
1 2, 1 2, , . . , (), , . , ,
, (. , )., , :
. 4.
Lemma PoS F(ipm) R D Doc
s 32.6 100 93 952
v 8.7 95 93 511
. , ., , , , . , ,, , , ( ), ( ), (), .
, , : , . . ,, . , (,. , .), ,.
, ,', . (. ). .
, , VS . , , ( /, /), 8 , , . .
6
:
1. ()2. ()3. ()3.1..3.1.. 3.2..3.2.. 3.3..3.3..
3.4..3.4.. 4. 5. 5.1. 5.2. 5.3. 5.4. 5.5. (, , , )5.6. 5.7.
6. 7.
( 1) , PoS, F(ipm), R (range), D () Doc, . 50 000 () . (. , ), (*). 1 , , ipm
, 100, 1000, 100000 .. , .
. 5. 1 ()
Lemma PoS F(ipm) R D Doc
s 0.5 15 63 22
v 0.4 18 72 25
v 1.0 51 85 76
adv 0.7 41 84 54
, ( 2), Rank, , PoS, F(ipm) (1950-1969 , 1970-1989 , 1990-2007 ) 8. 20 000
.
. 6. 2 ()
.
Lemma  PoS  F(ipm)  1950-1960  1970-1980  1990-2000  1950-1960  1970-1980  1990-2000
       s    32.5    6.4        11.0       15.7       22.2       23.9       64.5
       s    32.5    2.1        4.0        16.0       10.4       74.5       52.1
. 7. ,
         1950-1969    1970-1989    1990-2007
: .      5 642 070    7 818 865   21 756 323
               309          585        1 524
: .        674 566    2 725 968   34 950 394
               509          623       26 264
, 19902000 (.. 7), 60 ., , , , .
8 ,, , 1975-2003 , 1900-2000-.
, ( 1). 2.6 ipm, .
( 3) , , . 5 000 . F(ipm). , .
. 8. 3.4 (:)
Lemma PoS F(sp)
s 22.7
a 26.5
(. . 4). , 9 , , , , . , . , F(all) ipm, , ipm, LLscore.
. 9. 3.4 ()
Lemma PoS F(all) F(sp) LL
part 1114.6 17208.0 50672
part 787.5 11847.0 34394
part 1785.1 15698.6 32662
( 4) 5 ipm, , ( 20 ). . , , ,., :
. 10. 4 ()
Word F(ipm)
3504.1
631.5
5.5
276.9
45.7
9 , , , , . , , , , . .
, , ipm , 100,1000, 100000.. , .
5 ( ) : , , , ( . .), , , (,, , ). F(ipm) ( ) Rank. 1 . , .
. 11. 5.7 (
: )
Lemma PoS F(sp)
pr 147.3
conj 134.9
( 6) , , , . 6.1 F(abs) (%) . 6.26.5 , , . ; F(abs) Rank. 6.6 (, ).
6.7 : (Rank) (Coverage). , , ( 1) 3.6% , . . 3.6% , 12 6.7% , 110 16.6% , 93% 150000.
6.8 . 1100, 101200 . ., NT(im), NT(n) NT(nf). , . 6.9 : (L), (Example), (N) ipm (F) (all) (im), (n), (nf) (sp). .
( 7)., , . 1993 , .
, ipm . ( Doc, R D) . ., , 150 (1.6 ipm) 50 .
, 90 , . , , , , . , , , .. , ,, , , , .., (,, ), (15). , , , , , .
7 , 2 500 . 1, F(ipm), R D Doc.
. 12. 7 ()
Lemma F(ipm) R D Doc
9.1 72 88 372
52.4 99 67 522
12.0 90 87 275
115.9 100 91 3387
11.3 57 82 305
, (. , ), (*). (, ,, , / ). (), (),(), (), /, /.. , .
, F(ipm).
***
. .. (), (Universitetet i Tromsø, ), (University of Leeds, ), . . , , ,
, . .. ,, , .. . .. , .. ,.. , .. , .. , .. , .. , , . .. .. , .. , .. , C.. , .. ,.. , .. , .. , .. ,.. .. , , .. , . .. .
(66). .
. .. (http://dict.ruslang.ru).
.., .. , .. (1975). //. 2. 1. . 920. http://kudrinbi.ru/public/442/.
.., .. , .. (.) (1996). . 4. : .
.. (1977). : . .; 4.:.: , 2003.
.. (.) (1977). . .: .
(.) (1993). (Lönngren, Lennart. The Frequency Dictionary of Modern Russian). Acta Univ. Ups., Studia Slavica Upsaliensia Uppsala 32. Uppsala.
.., .., .. (2007). // .. (.), 2007. : . . . 118125.
20032005: 20032005: . .: , 2005.
20062008: 20062008. .: ,2009.
.., .. , .. (1972). . .:.
.. (2005). ? // 20032005. . .: . . 620.
.. (1999). (.. )// 99 . , 1999. . . 2. . 230236.
C.O. (2005). // 20032005. . .: . .6288.
.., .. (1998). // '98 . , 1998. .2. . 547552.
.. (2004). www.aot.ru // : 2004. . http://www.dialog-21.ru/Archive/2004/Sokirko.pdf.
.., .. (2005). ( ) // 2005..:ndex. . 8094.
.. (1970). . .: .
.., .., .. (2003). . .:.
.., .., .. (2008). (1990). .:.
.. (2003). //. 2. 5. . 819.
.. (1963). ..
Čermák, František & Michal Křen (2005). New generation corpus-based frequency dictionaries: The case of Czech // International Journal of Corpus Linguistics, 10. P. 453-467.
Čermák, František, Michal Křen et al. (2004). Frekvenční slovník češtiny. Praha: NLN.
Church, Kenneth W. (2000). Empirical estimates of adaptation: the chance of two Noriegas is closer to p/2 than p^2 // Proceedings of the 17th conference on Computational linguistics. Saarbrücken, Germany, 2000. P. 180-186.
Cieri, Christopher & Mark Liberman (2002). Language resources creation and distribution at the Linguistic Data Consortium // Proceedings of LREC02. Las Palmas, Spain, 2002. P. 1327-1333.
Davies, Mark (2005). A Frequency Dictionary of Spanish: Core Vocabulary for Learners. London-N.Y.: Routledge.
Davies, Mark & Dee Gardner (2010). A Frequency Dictionary of American English: Word Sketches, Collocates, and Thematic Lists. London-N.Y.: Routledge. http://www.wordfrequency.info/
Gries, Stefan Th. (2008). Dispersions and adjusted frequencies in corpora // International Journal of Corpus Linguistics 13, 4. P. 403-437.
Josselson, Harry H. (1953). The Russian word count and frequency analysis of grammatical categories of Standard Literary Russian. Detroit: Wayne University Press.
Juilland, Alphonse, Dorothy Brodin & Catherine Davidovitch (1970). Frequency dictionary of French words. The Hague-Paris: Mouton.
Kilgarriff, Adam (1997). Putting frequencies in the dictionary // International Journal of Lexicography, 10(2). P. 135-155.
Kilgarriff, Adam (2005). Language is never ever ever random // Corpus Linguistics and Linguistic Theory 1 (2). P. 263-276. http://www.kilgarriff.co.uk/Publications/2005-K-lineer.pdf
Leech, Geoffrey, Paul Rayson & Andrew Wilson (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lyne, Anthony A. (1986). In Praise of Juilland's 'D'; a contribution to the empirical evaluation of various measures of dispersion applied to word frequencies // Ch. Muller (ed.) Méthodes quantitatives et informatiques dans l'étude des textes. Genève-Paris. P. 588-595.
Lyne, Anthony A. (1985). The vocabulary of French business correspondence: word frequencies, collocations and problems of lexicometric method. Genève: Slatkine, Paris: Champion. (Travaux de linguistique quantitative, 23).
Rayson, Paul & Roger Garside (2000). Comparing corpora using frequency profiling // Proceedings of the Comparing Corpora Workshop at ACL 2000. Hong Kong, 2000. P. 1-6.
Sharoff, Serge (2006). Creating general-purpose corpora using automated search engine queries // Baroni, Marco, Silvia Bernardini (eds.): WaCky! Working papers on the Web as Corpus. Bologna: Gedit. P. 63-98. http://wackybook.sslmit.unibo.it.
Zipf, George Kingsley (1935). The Psycho-Biology of Language: An Introduction to Dynamic Philology. Boston: Houghton Mifflin.