Key Cluster Patterns in Shakespeare 2009 Aston Symposium 22 May 2009 Mike Scott


Key Cluster Patterns in Shakespeare

2009 Aston Symposium
22 May 2009
Mike Scott

…in pursuit of the…

"cunning'st pattern of excelling nature" (Othello)

or

but sound and fury signifying nothing?

Abstract

Key words (KWs) in Shakespeare plays have been shown to belong to certain category-types, such as theme-related KWs and character-related KWs.

Other KWs, generally the more interesting ones, seem to be pointers to other patterns indicative of quite specific features of the language, or of the status of characters or of individual sub-themes.

It may be that there is a tension between global KWs and much more localised, "bursty" ones in this regard. 

The presentation turns attention now to key word clusters, that is n-grams which are shown to occur distinctively in each individual play, or in the speeches of an individual character. The diverse types of patterns are what will be explored here.

Are n-grams a mere coincidence of relatively frequent words co-occurring frequently so that they are but sound and fury signifying nothing?

Alas poor Yorick!
Double, double toil and trouble
And thereby hangs a tale
Friends, Romans, countrymen, lend me your ears
A blinking idiot
Beggar'd all description

yet

Crystal & Crystal (2002) only list one-word headwords

Aims

• take previous key word (KW) analysis of Shakespeare plays up one level

• by examining KW clusters

… a proviso

no claim to illuminate understanding of the plays,

the objective being to understand more about keyness and key words

Clusters

sequences of consecutive words repeatedly found in corpora

Biber's "bundles"

n-grams

no guarantee they are "phrases"

In WordSmith, n is between 2 and 8
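A minimal sketch of cluster counting, assuming the text has already been split into a list of tokens. The function names `clusters` and `repeated_clusters` are illustrative only and are not WordSmith's own.

```python
from collections import Counter


def clusters(tokens, n):
    """Count every run of n consecutive tokens (an n-gram)."""
    if not 2 <= n <= 8:
        raise ValueError("n is expected to be between 2 and 8")
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def repeated_clusters(tokens, n, min_freq=2):
    """Keep only the clusters found at least min_freq times."""
    return {c: f for c, f in clusters(tokens, n).items() if f >= min_freq}


line = "NEVER NEVER NEVER NEVER NEVER".split()
print(repeated_clusters(line, 2))  # {'NEVER NEVER': 4}
```

Nothing here checks that a cluster is a "phrase"; it only records that the words sit next to each other repeatedly.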

Why bother?

increasing awareness that words don't act alone…

…but hang about in gangs

and anyway some inconsistencies, e.g.
"behind" v. "in front of"
"France" v. "Saudi Arabia" v. "United Arab Emirates"

So how should we think about words?

When you pick up a word,

you pick up another two

or three….

Keyness

A word is said to be "key" if

a) it occurs in the text at least as many times as the user has specified as a Minimum Frequency

b) its frequency in the text when compared with its frequency in a reference corpus is such that the statistical probability as computed by an appropriate procedure is smaller than or equal to a p value specified by the user.

(WordSmith manual)
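The two conditions can be sketched as follows. The "appropriate procedure" is assumed here to be the two-term log-likelihood statistic often used in corpus comparison, with its p value read from a chi-square distribution with 1 degree of freedom; the definition above does not commit to that choice. The function names and the counts in the example are invented for illustration.

```python
import math


def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """Two-term log-likelihood (G2) for one item in a text vs. a reference corpus."""
    total = freq_text + freq_ref
    expected_text = size_text * total / (size_text + size_ref)
    expected_ref = size_ref * total / (size_text + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_text, expected_text), (freq_ref, expected_ref)):
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return max(2 * g2, 0.0)  # guard against tiny negative rounding errors


def p_value(g2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2))


def is_key(freq_text, size_text, freq_ref, size_ref, min_freq=2, max_p=0.001):
    """Condition a) minimum frequency, then condition b) p value."""
    if freq_text < min_freq:
        return False
    return p_value(log_likelihood(freq_text, size_text, freq_ref, size_ref)) <= max_p


# Invented counts: 30 hits in a 25,000-word text, 100 in an 800,000-word reference.
print(is_key(30, 25_000, 100, 800_000))
```

As written the test is two-sided: it does not say whether the item is unusually frequent or unusually rare in the text.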

KW Clusters

re-interpreting "word" to include "cluster"

so the questions are

1. How much overlap is there between KWs and KW clusters?

2. What (if anything) do key clusters show that KWs don't?

Procedures

with the 1916 OUP Shakespeare corpus at my site

• build one overall "index" which knows the positions and neighbours of each word in all 37 plays

• compute 2-word clusters using the index

• build one individual index for each of the plays

• compute 2-word clusters for each play using its index

Procedures (cont.)

• repeat previous steps for all lengths of cluster 2 to 5

result =

• 38 indexes

• 37 × 4 = 148 individual play cluster wordlists

• 4 cluster wordlists for the set of 37 plays
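A rough sketch of the index idea, assuming an index is simply a map from each word to the places it occurs, so that its neighbours can be looked up from the position. WordSmith's actual index format is not described here, and the two toy "plays" are invented.

```python
from collections import defaultdict


def build_index(plays):
    """Map each word to the (play, position) pairs where it occurs."""
    index = defaultdict(list)
    for title, tokens in plays.items():
        for position, word in enumerate(tokens):
            index[word].append((title, position))
    return index


def clusters_from_index(index, plays, n):
    """Count n-word clusters by extending each indexed position to the right."""
    counts = defaultdict(int)
    for hits in index.values():
        for title, position in hits:
            tokens = plays[title]
            if position + n <= len(tokens):
                counts[" ".join(tokens[position:position + n])] += 1
    return dict(counts)


# Invented toy data standing in for the plays.
plays = {"toy play A": "I AM A MAN I AM".split(), "toy play B": "I AM NOT".split()}

overall = build_index(plays)                                   # one overall index
per_play = {t: build_index({t: toks}) for t, toks in plays.items()}  # one per play

# One cluster wordlist per play and per cluster length 2 to 5.
wordlists = {
    (title, n): clusters_from_index(per_play[title], plays, n)
    for title in plays
    for n in range(2, 6)
}
print(clusters_from_index(overall, plays, 2)["I AM"])  # 3
```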

single-word list (all the plays)

N Word Freq. % Texts %

1 THE 26,831 3.29 37 100.00

2 AND 24,110 2.95 37 100.00

3 I 20,536 2.51 37 100.00

4 TO 19,155 2.35 37 100.00

5 OF 15,997 1.96 37 100.00

6 A 13,980 1.71 37 100.00

7 YOU 13,855 1.70 37 100.00

8 MY 12,283 1.50 37 100.00

9 THAT 10,760 1.32 37 100.00

10 IN 10,569 1.29 37 100.00

pure grammar
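The columns of such a list can be reproduced along these lines. This is a sketch that assumes the first % is the item's share of all running words and the second % is the share of texts containing it; the two toy texts are invented.

```python
from collections import Counter


def wordlist(plays, top=10):
    """Rows of (N, Word, Freq., %, Texts, %) for the most frequent items."""
    freq = Counter()
    texts = Counter()
    for tokens in plays.values():
        freq.update(tokens)
        texts.update(set(tokens))  # each text counts once per word
    running_words = sum(freq.values())
    rows = []
    for rank, (word, f) in enumerate(freq.most_common(top), start=1):
        rows.append((
            rank,
            word,
            f,
            round(100 * f / running_words, 2),
            texts[word],
            round(100 * texts[word] / len(plays), 2),
        ))
    return rows


toy = {"A": "THE KING AND THE FOOL".split(), "B": "THE FOOL".split()}
for row in wordlist(toy, top=2):
    print(row)
```

Under that reading, an item found in 36 of the 37 plays gets 100 × 36 / 37 = 97.30 in the last column.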

2-word clusters

N Word Freq. % Texts %

1 I AM 1,858 0.23 37 100.00

2 MY LORD 1,685 0.21 36 97.30

3 I HAVE 1,628 0.20 37 100.00

4 I WILL 1,582 0.19 37 100.00

5 IN THE 1,582 0.19 37 100.00

6 TO THE 1,518 0.19 37 100.00

7 OF THE 1,376 0.17 37 100.00

8 IT IS 1,079 0.13 37 100.00

9 TO BE 971 0.12 37 100.00

10 THAT I 914 0.11 37 100.00

I + AUX

incomplete prepositional phrases

3-word clusters

N Word Freq. % Texts %

1 I PRAY YOU 250 0.03 34 91.89

2 I WILL NOT 214 0.03 36 97.30

3 I KNOW NOT 162 0.02 36 97.30

4 I DO NOT 160 0.02 33 89.19

5 I AM A 141 0.02 35 94.59

6 I AM NOT 139 0.02 34 91.89

7 MY GOOD LORD 132 0.02 29 78.38

8 AND I WILL 129 0.02 34 91.89

9 I WOULD NOT 126 0.02 34 91.89

10 THIS IS THE 122 0.01 36 97.30

negatives

4-word clusters

N Word Freq. Texts %

1 WITH ALL MY HEART 47 21 56.76

2 I KNOW NOT WHAT 39 20 54.05

3 GIVE ME YOUR HAND 34 19 51.35

4 I DO BESEECH YOU 33 17 45.95

5 GIVE ME THY HAND 31 22 59.46

6 I DO NOT KNOW 29 17 45.95

7 I WOULD NOT HAVE 26 18 48.65

8 AY MY GOOD LORD 25 13 35.14

9 WHAT IS THE MATTER 25 13 35.14

10 GIVE ME LEAVE TO 24 18 48.65

requesting etc., social interactions

5-word clusters

N Word Freq. Texts %

1 I AM GLAD TO SEE 16 9 24.32

2 I THANK YOU FOR YOUR 12 11 29.73

3 FOR MINE OWN PART I 10 8 21.62

4 I HAD RATHER BE A 9 8 21.62

5 WITH ALL MY HEART AND 9 8 21.62

6 AM GLAD TO SEE YOU 8 5 13.51

7 AS I AM A GENTLEMAN 8 6 16.22

8 I PRAY YOU TELL ME 8 7 18.92

9 KNOW NOT WHAT TO SAY 8 8 21.62

10 SO I TAKE MY LEAVE 8 7 18.92

social formulae

Procedures (cont.)

• compare the 2-cluster wordlists of each play with the 2-cluster wordlist of all the plays

• repeat for 3-, 4- and 5-word clusters

• 37 × 4 = 148 key cluster lists

KW settings

• p value = 0.001

• minimum frequency = 2

• negative KW clusters excluded

Key 3-clusters in Lear

just a title

N Concordance

1 night. Have you not spoken 'gainst the Duke of Cornwall? He's coming hither,

2 father, and given him notice that the Duke of Cornwall and Regan his duchess

3 and foolish. Holds it true, sir, that the Duke of Cornwall was so slain? Most

4 Gloucester, I'd speak with the Duke of Cornwall and his wife. Well, my
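Concordance lines like these can be produced with a simple key-word-in-context routine. The sketch below assumes whitespace tokenisation and case-insensitive matching; `concordance` and its `width` parameter are illustrative names.

```python
def concordance(tokens, cluster, width=6):
    """Return each hit of the cluster with `width` tokens of context either side."""
    target = [word.lower() for word in cluster.split()]
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if [word.lower() for word in tokens[i:i + n]] == target:
            left = " ".join(tokens[max(0, i - width):i])
            hit = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append(f"{left} [{hit}] {right}".strip())
    return lines


text = "Gloucester, I'd speak with the Duke of Cornwall and his wife."
for line in concordance(text.split(), "duke of cornwall", width=3):
    print(line)  # speak with the [Duke of Cornwall] and his wife.
```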

repetition!

When we are born, we cry that we are come

To this great stage of fools. This' a good block!

It were a delicate stratagem to shoe

A troop of horse with felt; I'll put it in proof,

And when I have stol'n upon these sons-in-law,

Then, kill, kill, kill, kill, kill, kill! (Lear)

more repetition!

And my poor fool is hang'd! No, no, no life!
Why should a dog, a horse, a rat, have life,
And thou no breath at all? Thou'lt come no more,
Never, never, never, never, never!
Pray you, undo this button: thank you, sir.
Do you see this? Look on her, look, her lips,
Look there, look there!

</LEAR><STAGE DIR><Dies.></STAGE DIR>

Character-specific

the foul fiend (Edgar)
Tom's a cold (Edgar)
i' the middle (Fool)

theme of the play

dost thou know? thou know me?

speech-specific, rhythmic

Have more than thou showest,
Speak less than thou knowest,
Lend less than thou owest,
Ride more than thou goest,
Learn more than thou trowest,
Set less than thou throwest;
Leave thy drink and thy whore,
And keep in-a-door,
And thou shalt have more
Than two tens to a score

RQ 1 (How much overlap is there between KWs and KW clusters?)

Procedure

For selected plays (Hamlet, Romeo, Henry IV part 1, As You Like It):

1. Save the column of single word KWs as a plain text file
2. Save the column of 2-cluster KWs as a separate file too
3. Save the columns of 3-, 4- and 5-cluster KWs likewise
4. Make wordlists of these "texts"
5. Compute "detailed consistency" of these wordlists
6. Use "Set" function to classify items which appear in various listings
7. Identify the percentage of words which appear in the KW-cluster lists but not in the single word KW listings & vice-versa
8. Identify items which appear in numerous listings.
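The steps above amount to set operations over the word types in each listing, which can be sketched without WordSmith. The functions below are stand-ins for its "detailed consistency" and "Set" features, not reproductions of them; the denominator used for the percentage is an assumption, and the listings are invented.

```python
def words_in(listing):
    """Split a column of KWs or KW clusters into its distinct word types."""
    return {word for item in listing for word in item.split()}


def overlap(single_kws, cluster_listings):
    """Compare word types in the single KW list with those in the cluster lists."""
    single = words_in(single_kws)
    clustered = set().union(*(words_in(l) for l in cluster_listings.values()))
    all_types = single | clustered
    return {
        "all_types": len(all_types),
        "only_in_clusters": sorted(clustered - single),
        "only_in_single": sorted(single - clustered),
        "share_only_in_clusters": round(100 * len(clustered - single) / len(all_types)),
    }


def consistency(listings):
    """For each word type, count how many listings it appears in."""
    counts = {}
    for listing in listings.values():
        for word in words_in(listing):
            counts[word] = counts.get(word, 0) + 1
    return counts


# Invented listings for illustration only.
single = ["APOTHECARY", "THOU", "JULE"]
cluster_lists = {"2": ["THOU ART", "MY LADY"], "3": ["TELL ME WHERE"]}
print(overlap(single, cluster_lists))
print(consistency({"1": single, **cluster_lists})["THOU"])  # 2
```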

Romeo and Juliet

43% of the KWs (207 - 117 = 90) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.

2s not found in the single KW list include high frequency grammar items (THE, MY, AT, TO etc.)

2s which are not found elsewhere in any cluster include SHALL

3s not found elsewhere include TELL, WHERE

4s not found elsewhere include COMMEND

types in KW list but not in KW clusters (A-C)

AH, ALACK, AN, APOTHECARY, BED, BENVOLIO, CAPULET, CLOUDS, CORDS, CORSE

Common to 4 or 5 KW listings

HER, O, SILVER, A, ART, BOTH, JULE, LADY, PLAGUE, SOUND, THOU, THY, WITH, YOUR

As You Like It

48% of the KWs (190 - 98 = 92) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.

2s not found in the single KW list include high freq. grammar items (THE, OF, FOR, AND)

2s which are not found elsewhere include HIM, WHO

3s not found elsewhere include AT, WOULD

types in KW list but not in KW clusters (A-C)

ADAM, ALIENA, AMBLES, AUDREY, BEARDS, CELIA, CHARLES, CLOWN, COUNTERFEITED, COURTIER'S, COVERED, COZ, CURED

Henry IV part 1

43% of the KWs (204 - 117 = 87) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.

2s not found in the single KW list include high frequency grammar items (IN, TO, YOU) but also SIR, TRUE

2s which are not found elsewhere include TWO, FEAR, FIRE, CUDGEL

3s not found elsewhere include WELL, WHY, FATHER

4s not found elsewhere include GIVE, ARE, DOOR, LET

types in KW list but not in KW clusters (A-C)

AFOOT, BANISH, BARDOLPH, CLIFTON, COMPULSION, COUNTERFEIT, COWARD

Hamlet

44% of the KWs (140 - 79 = 61) come into the 2-, 3-, 4- or 5-word KW clusters but are absent from the single KW list.

2s not found in the single KW list include high freq. grammar items (MY, AND, OF) but also GOOD

2s which are not found elsewhere include FROM, O, OUR, IS, IN

3s not found elsewhere include HOW, LIFE, EXCEPT, YOUR, REVENGE, NOT, OWN

types in KW list but not in KW clusters (A-C)

ACT, ARGAL, BERNARDO, CLOSES, CUSTOM

Common to 3 or 4 KW listings

NUNNERY, A, HAMLET, HAVE, I, IT, LORD, OPHELIA, THE, TO, WAGER

RQ 1: How much overlap is there between KWs and KW clusters?

More than 50% of the single-word KWs are in the clusters

but the clusters add some 40% or more extra words

not all additions are grammatical

Key clusters tail off at 4 or 5
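The percentages quoted for the four plays can be rechecked from the pairs of numbers given on their slides. The sketch assumes that the first number in each pair is the total of KW word types across all the listings and the second is the number found in the single KW list; the slides do not spell this out.

```python
# (total, in single KW list) as given for each play
reported = {
    "Romeo and Juliet": (207, 117),
    "As You Like It": (190, 98),
    "Henry IV part 1": (204, 117),
    "Hamlet": (140, 79),
}


def extra_share(total, in_single):
    """Count and rounded percentage of types found only in the cluster lists."""
    extra = total - in_single
    return extra, round(100 * extra / total)


for play, (total, in_single) in reported.items():
    extra, percent = extra_share(total, in_single)
    print(f"{play}: {extra} extra types, {percent}%")
```

Under that reading the clusters add between 43% and 48% of the types, which is where "some 40% or more" comes from.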

at 4 KWs, which play is this?

midsummer night's dream

all's well that ends well

antony & cleopatra

"bursty" keyness?

bursts (1)

midsummer night's dream

bursts (2)

julius caesar

bursts (3)

macbeth

bursts of burstiness

as you like it

compare burstinesses?
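One rough way to put a number on a burst, offered only as a sketch: cut the play into equal segments, count a cluster's hits in each, and see what share of all hits falls in the busiest segment. This measure is an assumption made for illustration and is not the method behind the plots; the toy text is invented.

```python
def dispersion(tokens, cluster, segments=10):
    """Count hits of the cluster in each of `segments` equal stretches of the text."""
    target = cluster.split()
    n = len(target)
    counts = [0] * segments
    starts = len(tokens) - n + 1  # number of possible start positions
    for i in range(starts):
        if tokens[i:i + n] == target:
            counts[i * segments // starts] += 1
    return counts


def burst_share(counts):
    """Share of all hits that fall in the single busiest segment."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


# Invented text: filler with one localised run of repetition.
tokens = ["x"] * 40 + "never never never never never".split() + ["x"] * 55
hits = dispersion(tokens, "never never")
print(hits)               # [0, 0, 0, 0, 4, 0, 0, 0, 0, 0]
print(burst_share(hits))  # 1.0
```

A share near 1.0 points to a very localised burst; a share near 1 / segments points to a cluster spread evenly through the play.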

king lear 2s (part)

3s and 4s

king lear

Conclusions

1. How much overlap is there between KWs and KW clusters?

Only a moderate amount; they highlight different aspects of the play

2. What (if anything) do key clusters show that KWs don't?

At the extremes they may highlight songs and very localised bursts in the play but by no means always or only this

<SHALLOW> It is well said, in faith, sir; and it is well said indeed too. 'Better accommodated!' it is good; yea indeed, is it: good phrases are surely and ever were, very commendable. Accommodated! it comes of accommodo: very good; a good phrase.

</SHALLOW>

<BARDOLPH> Pardon me, sir; I have heard the word. 'Phrase,' call you it? By this good day, I know not the phrase; but I will maintain the word with my sword to be a soldier-like word, and a word of exceeding good command, by heaven. Accommodated; that is, when a man is, as they say, accommodated; or, when a man is, being, whereby, a' may be thought to be accommodated, which is an excellent thing.

</BARDOLPH>

References

• Crystal, David & Ben Crystal, 2002. Shakespeare's words. London: Penguin.

Join us in Liverpool