Choices over time
Some methodological issues in research into current change
Bas Aarts, Jo Close and Sean Wallis
Survey of English Usage
University College London
{b.aarts, j.close, s.wallis}@ucl.ac.uk

Introducing DCPSE
• The Diachronic Corpus of Present-day Spoken English
  – orthographically transcribed spoken BrE
  – fully parsed, searchable with ICECUP and FTFs
  – 400,000 words each from
    • LLC (‘Survey Corpus’)
    • ICE-GB
  – balanced by text category
  – not evenly distributed by year
    • LLC: samples from 1958-1977
    • ICE-GB: 1990-1992

What can a parsed corpus tell us?
• Parsed corpora contain tree diagrams
  – Use Fuzzy Tree Fragment (FTF) queries to get data
  – An FTF: [Figure: example FTF]
  – A matching case in a tree: [Figure: corpus tree with the matching nodes]

will vs. shall
• Barber (1964)
  – “[T]he distinctions formerly made between shall and will are being lost, and will is coming increasingly to be used instead of shall.”
• Mair and Leech (2006)
  – lexical counts in Brown family of corpora (written)
    • BrE and AmE: shall falls (~50%) with time
    • Transatlantic convergence: AmE and BrE are distinct in the 1960s but not distinct in the 1990s

              will   shall
1960s BrE    2,798     355
1990s BrE    2,723     200
1960s AmE    2,702     267
1990s AmE    2,402     150
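
A minimal sketch of how the convergence claim might be checked against the four rows of counts above: a 2 × 2 χ² test of variety (BrE, AmE) against choice (will, shall), run separately for each period. This is an illustration only, not necessarily the calculation behind the slide; the function name `chi2_2x2` and the 0.05 threshold (critical value 3.841 at one degree of freedom) are assumptions made here.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Assumed threshold: chi-square critical value for p = 0.05 with 1 degree of freedom.
CRITICAL_05_DF1 = 3.841

# (will, shall) counts per variety, taken from the table on the slide.
counts = {
    "1960s": {"BrE": (2798, 355), "AmE": (2702, 267)},
    "1990s": {"BrE": (2723, 200), "AmE": (2402, 150)},
}

results = {}
for period, varieties in counts.items():
    chi2 = chi2_2x2(*varieties["BrE"], *varieties["AmE"])
    results[period] = chi2
    verdict = "distinct" if chi2 > CRITICAL_05_DF1 else "not distinct"
    print(f"{period}: chi2 = {chi2:.2f} -> BrE and AmE {verdict}")
```

On these counts the 1960s comparison exceeds the threshold and the 1990s comparison does not, which is the pattern the convergence bullet describes.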

will vs. shall
• Questions...
  – Are will and shall true alternates in each case?
    • what about will not, shall not, won’t, shan’t and interrogative forms?
    • do we include ’ll?
• Mair and Leech cite log-likelihood of words
  – a kind of χ² for [{x, x′}, {N−x, N′−x′}] (x, x′ = frequency of item; N, N′ = corpus size)
  – it tells us that shall is less frequent in the later corpus
  – it does not tell us whether will is replacing shall

N = 1M

              will    N−will
1960s BrE    2,798   997,202
1990s BrE    2,723   997,277

             shall   N−shall
1960s BrE      355   999,645
1990s BrE      200   999,800
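
The word-against-corpus comparison can be sketched as a log-likelihood (G²) statistic over the table [{x, x′}, {N−x, N′−x′}]. The code below is a hedged illustration using the BrE counts above with N = N′ = 1,000,000; the function name `g2_word_vs_corpus` and the 3.841 threshold (p = 0.05, one degree of freedom) are assumptions of this sketch, not details taken from Mair and Leech.

```python
import math

def g2_word_vs_corpus(x1, x2, n1, n2):
    """Log-likelihood G2 for [[x1, n1 - x1], [x2, n2 - x2]]."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    grand_total = n1 + n2
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if observed[i][j] > 0:  # a zero cell contributes nothing
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    return 2 * g2

N = 1_000_000            # corpus size stated on the slide
CRITICAL_05_DF1 = 3.841  # assumed threshold: p = 0.05, 1 degree of freedom

g2_shall = g2_word_vs_corpus(355, 200, N, N)    # 1960s BrE vs 1990s BrE
g2_will = g2_word_vs_corpus(2798, 2723, N, N)

print(f"shall: G2 = {g2_shall:.2f}")
print(f"will:  G2 = {g2_will:.2f}")
```

Each test compares one word with everything else in the corpus, so the two results are independent of one another: a fall in shall per million words says nothing, by itself, about what speakers or writers chose in its place.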

will vs. shall
• We’ve reanalysed the data using χ² for [{x, x′}, {y, y′}] (y, y′ = frequency of the alternate item), so that will and shall are compared with each other rather than with the rest of the corpus
  – Can we show a change in use in speech?
  – Can we show change over this period?
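
As a sketch of the reanalysis idea, the Brown-family counts quoted earlier (will and shall, 1960s and 1990s) can be placed in a 2 × 2 table [{x, x′}, {y, y′}] and tested with χ², alongside the proportion p(shall | {shall, will}). The helper names and the 3.841 threshold (p = 0.05, one degree of freedom) are assumptions for illustration; the authors' own figures are not shown on the slide.

```python
def chi2_alternation(x1, y1, x2, y2):
    """Pearson chi-square for [[x1, y1], [x2, y2]]: item x against its alternate y."""
    n = x1 + y1 + x2 + y2
    return n * (x1 * y2 - y1 * x2) ** 2 / (
        (x1 + y1) * (x2 + y2) * (x1 + x2) * (y1 + y2)
    )

def proportion(x, y):
    """Share of x within the choice set {x, y}."""
    return x / (x + y)

CRITICAL_05_DF1 = 3.841  # assumed threshold: p = 0.05, 1 degree of freedom

# (shall, will) per period: written corpora, counts quoted from Mair and Leech.
data = {
    "BrE": {"1960s": (355, 2798), "1990s": (200, 2723)},
    "AmE": {"1960s": (267, 2702), "1990s": (150, 2402)},
}

summary = {}
for variety, periods in data.items():
    p_early = proportion(*periods["1960s"])
    p_late = proportion(*periods["1990s"])
    chi2 = chi2_alternation(*periods["1960s"], *periods["1990s"])
    summary[variety] = (p_early, p_late, chi2)
    print(f"{variety}: p(shall) {p_early:.3f} -> {p_late:.3f}, chi2 = {chi2:.2f}")
```

Unlike the per-million-word comparison, this test asks directly whether the share of shall within the pair has changed, which is the question of replacement.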

will vs. shall vs. ’ll (DCPSE)
• Use parsing to find plausible alternates
  – Create FTFs like this for shall, will and ’ll [Figure: example FTF]
  – Then create FTFs for shall not and will not
    • Subtract from first set of results (a different experiment)
  – These counts exclude
    • negative forms: shall not, shan’t, will not, won’t
    • subject-auxiliary inversion

will vs. shall vs. ’ll (DCPSE)
• Consider the three-way alternation
• Most variation is for shall

           shall    will     ’ll   TOTAL   χ²(shall)   χ²(will)   χ²(’ll)
LLC          124     501     663   1,288       15.71       2.16      0.01
ICE-GB        46     544     638   1,228       16.48       2.26      0.01
TOTAL        170   1,045   1,301   2,516

χ² = 36.63 (s)
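
The per-cell χ² values in the three-way table can be reproduced with a short sketch: each cell contributes (observed − expected)² / expected, where expected = row total × column total / grand total. The function name `chi2_contributions` is a label chosen here; the counts are the DCPSE figures from the slide.

```python
def chi2_contributions(table):
    """Return the (O - E)^2 / E contribution of every cell in an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(o - rt * ct / n) ** 2 / (rt * ct / n) for o, ct in zip(row, col_totals)]
        for row, rt in zip(table, row_totals)
    ]

# Rows: LLC, ICE-GB. Columns: shall, will, 'll.
table = [
    [124, 501, 663],
    [46, 544, 638],
]

cells = chi2_contributions(table)
total = sum(sum(row) for row in cells)
shall_share = sum(row[0] for row in cells) / total

for name, row in zip(("LLC", "ICE-GB"), cells):
    print(name, [round(value, 2) for value in row])
print(f"total chi2 = {total:.2f}, share from the shall column = {shall_share:.0%}")
```

The shall column accounts for most of the total, which is what "most variation is for shall" refers to.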

will vs. shall vs. ’ll (DCPSE)
• If will and ’ll behave similarly, group them

            will     ’ll   TOTAL   χ²(will)   χ²(’ll)
LLC          501     663   1,164       0.58      0.47
ICE-GB       544     638   1,182       0.58      0.47
TOTAL      1,045   1,301   2,346

χ² = 2.11 (ns)

[Diagram: hierarchy of alternates, with shall set against will+’ll, and will+’ll dividing into will and ’ll. shall, set aside in this test: LLC 124, ICE-GB 46, TOTAL 170]

will vs. shall vs. ’ll (DCPSE)
• If will and ’ll behave similarly, group them

           shall   will+’ll   TOTAL   χ²(shall)   χ²(will+’ll)
LLC          124      1,164   1,288       15.71           1.14
ICE-GB        46      1,182   1,228       16.48           1.19
TOTAL        170      2,346   2,516

χ² = 34.52 (s)

shall as a proportion of the alternates: 9.7% (LLC), 3.7% (ICE-GB)

[Diagram: hierarchy of alternates, with shall set against will+’ll, and will+’ll dividing into will and ’ll]
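
The grouping step can be sketched as two 2 × 2 tests: first will against ’ll, to see whether the pair behaves alike across the two subcorpora, then shall against the merged will+’ll. The function name and the 3.841 threshold (p = 0.05, one degree of freedom) are assumptions of this sketch; the counts come from the tables above.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

CRITICAL_05_DF1 = 3.841  # assumed threshold: p = 0.05, 1 degree of freedom

# DCPSE counts per subcorpus.
llc = {"shall": 124, "will": 501, "ll": 663}
ice_gb = {"shall": 46, "will": 544, "ll": 638}

# Step 1: do will and 'll differ between LLC and ICE-GB?
chi2_will_ll = chi2_2x2(llc["will"], llc["ll"], ice_gb["will"], ice_gb["ll"])

# Step 2: merge will and 'll, then test shall against the merged category.
llc_grouped = llc["will"] + llc["ll"]
ice_grouped = ice_gb["will"] + ice_gb["ll"]
chi2_shall = chi2_2x2(llc["shall"], llc_grouped, ice_gb["shall"], ice_grouped)

print(f"will vs 'll:       chi2 = {chi2_will_ll:.2f}")
print(f"shall vs will+'ll: chi2 = {chi2_shall:.2f}")
```

The first statistic falls below the threshold and the second well above it, so merging will and ’ll loses little information while isolating the change in shall.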

shall over time (DCPSE)
• Proportion of alternates that are shall, by year
[Figure: p(shall | {shall, will, ’ll}) plotted by year, 1955 to 1995, for the LLC and ICE-GB samples; vertical axis from 0 to 0.4]
• Error bars based on Poisson
[Diagram: observed proportion p (x̄ = p) with an interval extending z·s⁻ below p and z·s⁺ above it]
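
The slide does not give the formula behind its error bars, nor the per-year counts. As a stand-in, the sketch below computes a Wilson score interval, a different but commonly used asymmetric interval for a proportion, on the subcorpus totals from the earlier table (124 of 1,288 in LLC; 46 of 1,228 in ICE-GB). The choice of the Wilson interval, z = 1.96 and the function name are all assumptions made here, not the authors' method.

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for the proportion k / n (z = 1.96 for about 95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# shall out of {shall, will, 'll}, per subcorpus.
llc_lo, llc_hi = wilson_interval(124, 1288)
ice_lo, ice_hi = wilson_interval(46, 1228)

print(f"LLC:    p = {124 / 1288:.3f}, interval ({llc_lo:.3f}, {llc_hi:.3f})")
print(f"ICE-GB: p = {46 / 1228:.3f}, interval ({ice_lo:.3f}, {ice_hi:.3f})")
```

The interval is not symmetric about p, which is consistent with the separate lower and upper widths drawn on the slide. Yearly samples are much smaller than these totals, so their intervals would be correspondingly wider.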
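
Focusing on true alternation
• Aim: to focus on true alternation
  – minimise other sources of variation
• Consider changing use of the progressive
[Diagram: nested baselines for the progressive, from ‘all words’ through ‘VP’ to ‘progressivisable VP’, which divides into VP(¬prog) and VP(prog); the narrower baselines are marked ‘better’ and the innermost set ‘true alternates’]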

The progressive (DCPSE)
• FTF to retrieve progressives from DCPSE
• Identifying the alternates (see Smitterberg 2005; Aarts, Close & Wallis forthcoming)
  – VP(prog)
    • Exclude be going to future (automatic)
  – VP(¬prog)
    • Exclude imperatives, infinitives (benefits of using a parsed corpus)

The progressive over time (DCPSE)
• The rise of the English progressive in spoken English (as a proportion of alternates)
[Figure: p(VP(prog) | {VP(prog), VP(¬prog)}) plotted by year, 1955 to 1995, for the LLC and ICE-GB samples; vertical axis from 0 to 0.07]

Conclusions
• We focus on true alternation to investigate if replacement is occurring by considering:
  – variation (over time) where there is a choice
  – hierarchies of alternates
    • as with {shall, {will, ’ll}}
• This can be difficult
  – Requires a linguistic argument
  – May require careful examination of cases
• It is extensible to other types of experiment, e.g. interaction between choices

References
• Aarts, Bas, Jo Close and Sean Wallis (forthcoming) Recent changes in the use of the progressive construction in English. In: Bert Cappelle and Naoaki Wada (eds.) Festschrift for (secret).
• Barber, Charles (1964) Linguistic change in present-day English. Edinburgh: Oliver & Boyd.
• Mair, Christian and Geoffrey Leech (2006) Current changes in English syntax. In: Bas Aarts and April McMahon (eds.) The Handbook of English Linguistics. Malden, MA: Blackwell. 318-342.
• Nelson, Gerald, Sean Wallis and Bas Aarts (2002) Exploring natural language: working with the British component of the International Corpus of English. Amsterdam: John Benjamins.
• Smitterberg, Erik (2005) The Progressive in 19th-Century English: A Process of Integration. (Language and Computers: Studies in Practical Linguistics 54.) Amsterdam: Rodopi.
Bas Aarts, Jo Close and Sean Wallis
{b.aarts, j.close, s.wallis}@ucl.ac.uk
www.ucl.ac.uk/english-usage