Magnus Huber - The Old Bailey Corpus: Spoken English in the 18th and 19th Centuries

The Old Bailey Corpus

Spoken English in the 18th and 19th centuries

The use of historical court records in the investigation of language change

Digital History Seminar, 21 February 2012

Magnus Huber

Department of English

University of Giessen

Otto-Behaghel-Str. 10B

D-35394 Giessen, Germany

magnus.huber@anglistik.uni-giessen.de

Structure

1. Introduction

1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics

1.2 The Proceedings of the Old Bailey

1.3 Turning the Proceedings into a linguistic corpus

2. How linguistically accurate is OBC?

2.1 Comparison with alternative accounts

2.2 Language event and its representation

2.3 Internal consistency: negative contraction

2.4 Sociolinguistic potential: relative clauses

3. Brief summary

Definition of linguistic corpus

Generally speaking, a (usually large) collection of machine-readable texts used as a database in linguistic analyses

Importance of spoken language

Spoken language precedes written language

1. Introduction

1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics

[Figure: Percentage of (ng):[n] by social class and sex (Female, Male); x-axis: MMC, LMC, UWC, MWC, LWC; y-axis: 0-100. Source: Peter Trudgill (1974), The social differentiation of English in Norwich]

MMC middle middle class

LMC lower middle class

UWC upper working class

MWC middle working class

LWC lower working class

Example: drinking, (ng):[n] = [drɪnkɪn]

Historical linguistics: language change

ye > you in subject position

when ye come set it in sech rewle as ye seeme best (1465)

And thus in hast fare you hartely well (1545)

Sociohistorical linguistics

Gender-related change: ye > you

1.2 The Proceedings of the Old Bailey

• Old Bailey = London's Central Criminal Court

• meets 8 times/year, from 1830s 10 times/year

• "Proceedings" published 1674-1913

• start as a commercial enterprise: publishers send scribes into courtroom

• proceedings taken down in shorthand

• sold privately by publishers

• City of London gains more and more control during 18th century

• 2100+ volumes

• ca. 200,000 trials

• ca. 134 million words

www.oldbaileyonline.org

<unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier><source>173305100002</source><header>Sarah Sanders, theft: specified place, 10 May 1733.</header> <pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info>

<p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p>

<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]

Original computerized Proceedings (Sheffield)

Sociolinguistically useful XML tags in the Sheffield Proceedings

• name: <given>Sarah</given> <surname>Sanders</surname>

• year: <identifier>t17180110-1</identifier>

• gender: <defend gender="f">

• age: <age>43</age>

• profession: <deflabel>Servant</deflabel>

• origin: <crimeloc>Tottenham</crimeloc>
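As a rough illustration of how these tags could be read out, here is a minimal Python sketch. It assumes a small, well-formed excerpt (the `SAMPLE` string is a hypothetical cut-down of the Sarah Sanders record) and reads the year from the first four digits after the `t` in the identifier, as suggested by `t17330510-1` for the trial of 10 May 1733. The function name `trial_metadata` is illustrative only and is not part of the Sheffield or OBC tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of a Proceedings trial record.
SAMPLE = """<trial>
  <identifier>t17330510-1</identifier>
  <p><person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>
  was indicted. The Prisoner was my <deflabel>Servant</deflabel>.</p>
</trial>"""


def trial_metadata(xml_text):
    """Collect name, year, gender and profession of the defendant."""
    root = ET.fromstring(xml_text)
    defend = root.find(".//defend")
    identifier = root.findtext(".//identifier", default="")
    label = root.findtext(".//deflabel")
    return {
        "given": (defend.findtext("given") or "").strip(),
        "surname": (defend.findtext("surname") or "").strip(),
        "gender": defend.get("gender"),
        # Assumption: identifiers look like 't' + YYYYMMDD + '-' + trial number.
        "year": int(identifier[1:5]) if identifier else None,
        "profession": label.strip() if label else None,
    }


meta = trial_metadata(SAMPLE)
print(meta)
```

Real records carry many more tags and may lack some of these, so a production script would need to handle missing elements for every field, not only for the profession.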

1.3 Turning the Proceedings into a linguistic corpus of early spoken English

<speech>

Tagging spoken language

• Need for automatic annotation

• Perl script identifying non-linguistic patterns indicating spoken language in the original proceedings

– layout

– metalinguistic information

• Linguistic markers indicating spoken language? > 1st and 2nd person pronouns

Automatic speech tagging

e.g. "Q. – A."-sequences

Q. Did you see him on Sunday night? - A. Yes, at Walworth, on Sunday night, the 12th of January, at one o'clock - I am sure of that.</p>

<speech> </speech>

<speech>

</speech>
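The slides mention a Perl script but do not show it. The following Python sketch only illustrates the general idea for "Q. – A." sequences. The regular expression, the function name `tag_qa_speech` and the decision to leave the "Q."/"A." labels outside the `<speech>` element are all assumptions, not the OBC procedure.

```python
import re

# Assumed pattern: a "Q." or "A." label opens an utterance, which runs until
# the next " - Q." / " - A." label, a closing </p>, or the end of the text.
QA_PATTERN = re.compile(
    r"\b([QA])\.\s+(.*?)(?=\s+-\s+[QA]\.\s|</p>|$)", re.DOTALL
)


def tag_qa_speech(paragraph):
    """Wrap each question or answer in <speech> tags, keeping the label outside."""
    return QA_PATTERN.sub(
        lambda m: f"{m.group(1)}. <speech>{m.group(2)}</speech>", paragraph
    )


example = (
    "Q. Did you see him on Sunday night? - A. Yes, at Walworth, on Sunday "
    "night, the 12th of January, at one o'clock - I am sure of that.</p>"
)
tagged = tag_qa_speech(example)
print(tagged)
```

Note that the dash inside the answer ("one o'clock - I am sure") is not treated as a turn boundary, because it is not followed by a "Q." or "A." label. A pattern this simple would still mis-tag initials such as "A. Smith", which is one reason why layout and metalinguistic cues need to be combined.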

Sociobiographical speech event annotation

The New Bailey Tag Assistant

<xml>
  <document name="19100426">
    . . .
    <speaker id="271">
      <sex>m</sex>
      <age></age>
      <given>Thomas</given>
      <surname>Tuckey</surname>
      <occupation>Warder</occupation>
      <occupation2></occupation2>
      <hiscolabel>Prison Guard</hiscolabel>
      <hiscocode>58930</hiscocode>
      <hiscolabel2></hiscolabel2>
      <hiscocode2></hiscocode2>
      <crimescene></crimescene>
      <birthplace></birthplace>
      <workplace>Wormwood Scrubs Prison</workplace>
      <placeofresidence></placeofresidence>
      <role>witness</role>
    </speaker>
    . . .
  </document>
</xml>

Social data file

• XML format

• attributes of every speaker in OBC

• plus: scribe, printer, publisher
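A social data file of this kind can be turned into a lookup table keyed by speaker id. The sketch below is a minimal Python illustration under the assumption that the file follows the layout of the speaker 271 record shown for the Tag Assistant. The `SOCIAL_DATA` string is a shortened, hypothetical excerpt and `load_speakers` is an illustrative name, not part of the OBC tools.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical excerpt modelled on the speaker record for id 271.
SOCIAL_DATA = """<document name="19100426">
  <speaker id="271">
    <sex>m</sex>
    <age></age>
    <given>Thomas</given>
    <surname>Tuckey</surname>
    <occupation>Warder</occupation>
    <hiscolabel>Prison Guard</hiscolabel>
    <hiscocode>58930</hiscocode>
    <workplace>Wormwood Scrubs Prison</workplace>
    <role>witness</role>
  </speaker>
</document>"""


def load_speakers(xml_text):
    """Map each speaker id to a dict of its attributes."""
    root = ET.fromstring(xml_text)
    speakers = {}
    for speaker in root.iter("speaker"):
        # Empty elements such as <age></age> are stored as None.
        speakers[speaker.get("id")] = {
            child.tag: (child.text or "").strip() or None for child in speaker
        }
    return speakers


speakers = load_speakers(SOCIAL_DATA)
print(speakers["271"])
```

Keeping empty fields as `None` rather than dropping them makes it easy to count how often a given attribute, for example age, is actually known.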

2. How linguistically accurate is OBC?

2.1 Comparison with alternative accounts

e.g. trial of John Ayliffe, 17591024-27, vs. the alternative account The tryal at large of John Ayliffe

Proceedings (718 words): Thomas. I am clerk to Mr Jones, a Stationer in the Temple.
Tryal (1290 words): Henry Thomas. I am clerk to Mr Jones, a Stationer, in the Temple.

Proceedings: Hargrave. By Mr Ayliffe: I saw him seal and deliver it.
Tryal: Walter Hargrave. By Mr Ayliffe. – I saw him sign, seal, and deliver it, as his act and deed.

Proceedings: ./. (no equivalent passage)
Tryal: John Fannen. I am not sure; but to the best of my remembrance, it was sometime the beginning of December last, at Mr Fox's house.

Proceedings (718 words): Hargrave. Because he said he was not willing Mr Fox should know of it?
Tryal (1290 words): Walter Hargrave. The reason Mr Ayliffe gave, was, that he would not on any account have it come to Mr Fox's ears.

Proceedings: Thomas. I can't particularly say that; sometimes we leave a blank by the gentlemens desire, perhaps they may add another covenant, or something of that sort, I can't recollect the reason for that.
Tryal: Henry Thomas. I cannot positively say. – We sometimes leave out the conclusion by gentlemen's desire, in order that they may add a covenant, or some such thing, if it should be thought necessary; but I cannot particularly recollect the reason why the conclusion was omitted in this case.

2.2 Language event ↔ written representation

Letters: formulation → writing

Trial proceedings (e.g. Old Bailey Proceedings): speech event → perception by scribe → shorthand script → expanding shorthand → proof reading → type setting

Gurney (1752) Brachygraphy: or short-writing

'to take a Speech, or Sermon verbatim, as a Person talks in common' (p. 3)

Scribes

Thomas Gurney (1749-1770)

Joseph Gurney (1770-1782)

Recording linguistic details

• no distinction between inflected and uninflected auxiliaries

= 'may' or 'mayst'

= 'can' or 'canst'

= 'should' or 'shouldst'

• dot placed on the top left of the noun phrase = allomorphs a and an

• auxiliary contractions: 'you will' (you w-il) vs. 'you'll' (you-l), but │ 'it will' ~ 'twill' (│ = <t> and it)

2.3 Internal consistency: negative contraction

e.g. do not > don't, need not > needn't, was not > wasn't

[Figure: NEG contraction in % (y-axis 0-18) by period: 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1880-1913; N = 1,344,244]

Negative contraction in the OBC, 1732-1912: 1. Lexeme?

AUX form      % contr.          N
do not            28.9    189,776
will not          27.7     17,302
shall not         20.6      4,172
cannot            13.3    106,005
are not            3.2     11,552
dare not           3.1        260
need not           0.6      2,136
did not            0.4    429,143
does not           0.4      9,539
have not           0.4     44,038
could not          0.2     85,361
is not             0.2     47,142
must not           0.2      1,620
would not          0.2     52,123
had not            0.1     72,395
has not            0.1      9,244
should not         0.1     20,192
was not            0.1     64,574
may not            0.0      1,271
might not          0.0      2,404
ought not          0.0      1,221
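The slides do not spell out how the "% contr." column was computed. A plausible reading, stated here as an assumption, is that N is the number of full plus contracted tokens of an auxiliary and the percentage is the share of contracted tokens. The Python sketch below counts both forms in a short invented text. The helper `contraction_rate` and the sample sentences are hypothetical.

```python
import re


def contraction_rate(text, full, contracted):
    """Return (% contracted, N) for one auxiliary, e.g. 'do not' vs. "don't"."""
    flags = re.IGNORECASE
    n_full = len(re.findall(rf"\b{re.escape(full)}\b", text, flags=flags))
    n_contr = len(re.findall(rf"\b{re.escape(contracted)}\b", text, flags=flags))
    total = n_full + n_contr
    rate = 100.0 * n_contr / total if total else 0.0
    return rate, total


# Invented sample text, not taken from the Proceedings.
sample = (
    "I do not know him. I don't know where he lives. "
    "I did not see it. I do not recollect the day."
)
do_rate, do_n = contraction_rate(sample, "do not", "don't")
did_rate, did_n = contraction_rate(sample, "did not", "didn't")
print(f"do not: {do_rate:.1f}% contracted (N = {do_n})")
print(f"did not: {did_rate:.1f}% contracted (N = {did_n})")
```

Counting on raw strings like this ignores spelling variants (for example "do n't" or "can not"), which a real corpus query would have to normalise first.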

Negative contraction in the OBC, 1732-1912: 2. Frequency?

Negative contraction in the OBC, 1732-1912: 3. Tense?

Explaining the absence of negative contraction

• combination of phonology and genre

• n't is phonetically reduced, less salient than not

• do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]

can-can't vs. could-couldn't

will-won't vs. would-wouldn't

shall-shan't vs. should-shouldn't

• negative contraction is (near) absent where the context (e.g. change in the stem vowel in the negative) does not allow disambiguation

Hierarchy of perceptive difference between positive and negative contracted forms

                  V change   C change/addition   Score
do-don('t)            1              1             2
will-won('t)          1              1             2
shall-shan('t)        0.5            1             1.5
can-can('t)           0.5            0             0.5
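The score in this hierarchy is the sum of the vowel-change and consonant-change values. The short Python sketch below recomputes it and sets it against the contraction rates reported in the negative contraction table (do not 28.9, will not 27.7, shall not 20.6, cannot 13.3). The variable and function names are illustrative, and the final check of whether the two orderings agree is an addition for illustration, not an analysis from the slides.

```python
# (V change, C change/addition) per pair, as given in the hierarchy table.
PAIRS = {
    "do-don('t)": (1, 1),
    "will-won('t)": (1, 1),
    "shall-shan('t)": (0.5, 1),
    "can-can('t)": (0.5, 0),
}

# % contracted per auxiliary, copied from the negative contraction table.
RATES = {
    "do-don('t)": 28.9,
    "will-won('t)": 27.7,
    "shall-shan('t)": 20.6,
    "can-can('t)": 13.3,
}


def perceptual_score(v_change, c_change):
    """Score = vowel change + consonant change/addition."""
    return v_change + c_change


scores = {pair: perceptual_score(*values) for pair, values in PAIRS.items()}

# Sort by score (highest first) and check that the rates fall in the same order.
ranked = sorted(scores, key=lambda pair: -scores[pair])
rates_in_rank_order = [RATES[pair] for pair in ranked]
same_order = rates_in_rank_order == sorted(rates_in_rank_order, reverse=True)

for pair in ranked:
    print(f"{pair:16} score {scores[pair]:>3}  contracted {RATES[pair]}%")
print("orderings agree:", same_order)
```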

2.4 Sociolinguistic potential: relative clauses

• random extracts of speech events from OBC: 20,000 words/decade (10,000 w. each for m + f)

• 2500+ relative clauses, of which 1533 restrictive

            1720-1779     %   1780-1839     %   1840-1913     %      ∑      %
that              259  53.8         240  45.4         136  26.0    635   41.4
zero              107  22.2         118  22.3         201  38.4    426   27.8
which              70  14.6          97  18.3          92  17.6    259   16.9
who                38   7.9          69  13.0          89  17.0    196   12.8
whom                6   1.2           2   0.4           5   1.0     13    0.8
whose               1   0.2           3   0.6           0   0.0      4    0.3
∑                 481               529               523          1533
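The percentages in this table are column shares: each count divided by the total number of restrictive relative clauses in its period. A minimal Python sketch that recomputes them from the raw counts (the names `COUNTS`, `PERIODS` and `column_shares` are illustrative):

```python
PERIODS = ("1720-1779", "1780-1839", "1840-1913")

# Raw counts per relativizer and period, copied from the table.
COUNTS = {
    "that": (259, 240, 136),
    "zero": (107, 118, 201),
    "which": (70, 97, 92),
    "who": (38, 69, 89),
    "whom": (6, 2, 5),
    "whose": (1, 3, 0),
}


def column_shares(counts):
    """Return per-period totals and each relativizer's share in percent."""
    totals = [sum(column) for column in zip(*counts.values())]
    shares = {
        name: [round(100 * n / total, 1) for n, total in zip(row, totals)]
        for name, row in counts.items()
    }
    return totals, shares


totals, shares = column_shares(COUNTS)
for name, row in shares.items():
    cells = ", ".join(f"{period}: {pct}%" for period, pct in zip(PERIODS, row))
    print(f"{name:6} {cells}")
print("totals:", dict(zip(PERIODS, totals)))
```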

Diagram 1 Distribution of that with regard to animacy of the head

              1720-1779   1780-1839   1840-1913
non-human           121         164         105
human               137          76          31

1720-1779 vs 1780-1839 p = 0.000

1720-1779 vs 1840-1913 p = 0.000

1780-1839 vs 1840-1913 p = 0.070
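The slides do not say which significance test lies behind the p-values. As one possibility, the Python sketch below runs a Pearson chi-square test without continuity correction on the 2x2 tables formed by two periods and the two animacy categories of Diagram 1. The choice of test is an assumption, so the resulting p-values may differ slightly from those on the slide. The function name `chi_square_2x2` is illustrative.

```python
import math

# Counts of "that" by animacy of the head, copied from Diagram 1.
NON_HUMAN = {"1720-1779": 121, "1780-1839": 164, "1840-1913": 105}
HUMAN = {"1720-1779": 137, "1780-1839": 76, "1840-1913": 31}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def compare(period_1, period_2):
    """Test whether the human/non-human split differs between two periods."""
    return chi_square_2x2(
        NON_HUMAN[period_1], HUMAN[period_1],
        NON_HUMAN[period_2], HUMAN[period_2],
    )


early = compare("1720-1779", "1780-1839")
late = compare("1780-1839", "1840-1913")
print(f"1720-1779 vs 1780-1839: chi2 = {early[0]:.2f}, p = {early[1]:.3f}")
print(f"1780-1839 vs 1840-1913: chi2 = {late[0]:.2f}, p = {late[1]:.3f}")
```

Under this assumption the first comparison is clearly significant and the second is not at the 5% level, which is the same pattern the slide reports.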

Diagram 2 Distribution of that and pronominal relativizers with human heads

         1720-1779   1780-1839   1840-1913
PRN             49          72          93
that           137          76          31

1720-1779 vs 1780-1839: p = 0.000

1720-1779 vs 1840-1913: p = 0.000

1780-1839 vs 1840-1913: p = 0.000

Diagram 3 Relativizers by gender (excl. genitives)

        1720-1779     1780-1839     1840-1913
          f     m       f     m       f     m
PRN      43    71      56   112      66   119
zero     53    54      66    52     110    73
that    124   134     108   132      72    64

p-values shown under the three periods on the chart: p = 0.135, p = 0.001, p = 0.000

f 1720-1779 vs 1780-1839: p = 0.135 m 1720-1779 vs 1780-1839: p = 0.033

f 1720-1779 vs 1840-1913: p = 0.000 m 1720-1779 vs 1840-1913: p = 0.000

f 1780-1839 vs 1840-1913: p = 0.000 m 1780-1839 vs 1840-1913: p = 0.000

Diagram 4 Zero relativizer by gender (excl. genitives)

         1720-1779     1780-1839     1840-1913
           f     m       f     m       f     m
other    167   205     164   244     138   173
zero      53    54      66    52     110    73

f 1720-1779 vs 1780-1839: p = 0.268 m 1720-1779 vs 1780-1839: p = 0.326

f 1720-1779 vs 1840-1913: p = 0.000 m 1720-1779 vs 1840-1913: p = 0.022

f 1780-1839 vs 1840-1913: p = 0.000 m 1780-1839 vs 1840-1913: p = 0.001

Thank you

References

• Gurney, Thomas. 1752. Brachygraphy: or short-writing. 2nd ed. London: [no publisher].

• Nevalainen, Terttu & Raumolin-Brunberg, Helena (eds). 1996. Sociolinguistics and language history: studies based on the corpus of early English correspondence. Amsterdam: Rodopi.

• Trudgill, Peter. 1974. The Social Differentiation of English in Norwich. Cambridge: Cambridge University Press.

• van Leeuwen, Marco H.D., Ineke Maas and Andrew Miles. 2002. HISCO: Historical international standard classification of occupations. Leuven: Leuven University Press.
