Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014

In this lecture...

• Stylistics and style
• Combining stylistics and corpus linguistics
• Examples of studies combining corpus linguistics and stylistics
  – Analysis of genres
  – Analysis of the works by particular authors
  – Analysis of individual texts
  – Analysis of variation inside texts
• Corpus tools
  – WMatrix

Stylistics

Stylistics is the study of literature using methods, theories and concepts from linguistics (Leech and Short 2007: 1).

It is “[...] the study of the relationship between linguistic form and literary function [...]” (Leech and Short 2007: 3).

Linguistic style

‘Style is a way in which language is used’ (Leech and Short 2007: 31)

‘[S]tyle consists in choices made from the repertoire of the language.’ (Leech and Short 2007: 31)

‘Stylistic choice is limited to those aspects of linguistic choice which concern alternative ways of rendering the same subject matter’ (Leech and Short 2007: 31)

e.g. horse vs. steed (two ways of referring to the same animal), but not horse vs. dog (different subject matter)

• Style and genre, e.g. science fiction, romance novels, etc.
• Style and author
• Style and text
• Style and parts of texts (e.g. the narration or the speech of different characters)

Ways of analysing style

• Analyst’s intuitions
• ‘Manual’ comparative analysis

Style and comparison: ‘Even if style is defined as that variety of language which correlates with context, the recognition and analysis of styles are squarely based on comparison. The essence of variation, and thus of style, is difference, and differences cannot be analysed and described without comparison.’ (Enkvist 1973: 21)

• Comparative analysis done manually
  – OK for shorter texts/extracts
• Comparative analysis using computers
  – Corpus linguistic methods/tools
  – Especially useful for longer texts, e.g. prose fiction

Combining corpus linguistics and stylistics

• The ‘corpus turn’ (Leech and Short 2007: 284)
• An ongoing trend in stylistics to use methods and tools from corpus linguistics for the analysis of literary and other texts
• Usually referred to as corpus stylistics
• Other terms:
  – digital stylistics (Louw 2008)
  – electronic text analysis (Adolphs 2006)

Examples of studies

• Combining corpus linguistics and stylistics
  – Analysis of genres
  – Analysis of the works by particular authors
  – Analysis of individual texts
  – Analysis of variation inside texts

Genre style

• Biber (1988): multivariate statistical techniques
  – factor analysis
  – many different variables
  – variables = linguistic features (e.g. passive constructions)

• e.g. narrative versus non-narrative texts
  – important variables = past tense verbs, 3rd person pronouns, perfect aspect, present participle clauses
  – high scores = narrative
  – low scores = non-narrative
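The dimension-score idea can be illustrated with a simplified sketch. It assumes each text's features have already been counted and normalised per 1,000 words, standardises each feature across the texts (z-scores) and sums the z-scores of the four features listed above. Biber's actual procedure derives the features from a factor analysis and also subtracts negatively loading features; the feature names and the rates below are hypothetical toy values, not Biber's data.

```python
from statistics import mean, stdev

# Hypothetical names for the four narrative features listed on the slide.
NARRATIVE_FEATURES = [
    "past_tense_verbs",
    "third_person_pronouns",
    "perfect_aspect",
    "present_participle_clauses",
]

def z_scores(values):
    """Standardise a list of rates against their own mean and standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def dimension_scores(texts, features=NARRATIVE_FEATURES):
    """texts maps a text name to {feature: rate per 1,000 words}.
    Returns {text name: sum of that text's z-scores over the features}."""
    names = list(texts)
    scores = {name: 0.0 for name in names}
    for feature in features:
        standardised = z_scores([texts[name][feature] for name in names])
        for name, z in zip(names, standardised):
            scores[name] += z
    return scores

# Toy, made-up rates for illustration only.
toy = {
    "novel_extract": {"past_tense_verbs": 40, "third_person_pronouns": 30,
                      "perfect_aspect": 5, "present_participle_clauses": 3},
    "news_report": {"past_tense_verbs": 20, "third_person_pronouns": 15,
                    "perfect_aspect": 3, "present_participle_clauses": 1},
    "academic_prose": {"past_tense_verbs": 5, "third_person_pronouns": 3,
                       "perfect_aspect": 1, "present_participle_clauses": 0.5},
}
scores = dimension_scores(toy)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name:15} {scores[name]:+.2f}")
```

Higher totals place a text nearer the narrative end of the dimension; with real data the scores spread along a continuum rather than falling into two groups.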

A range not a dichotomy

[Figure: text types ranked along the narrative / non-narrative dimension, with the most narrative text types at the top and the least narrative at the bottom]

There is a whole range of text types in the middle: it is not just a two-way distinction.

Note also that spoken and written genres are mixed together along the dimension.

Genre style – direct speech

Corpus-based study of speech, writing and thought presentation (Semino and Short 2004)

A corpus of approximately 260,000 words of (late) 20th-century written British English:

• 120 text samples of approximately 2,000 words each, amounting to a total of 258,348 words

The corpus is divided into three sections:

– prose fiction (87,709 words)
– newspaper news reports (83,603 words)
– biography and autobiography (87,036 words)

Each genre section is further divided into ‘serious’ and ‘popular’ sub-sections.

• The corpus was tagged manually (DS = direct speech; NRS = narrator’s representation of speech), e.g.:

<sptag cat=NRS next=DS s=0.37 w=7>The theme park’s manager, Mike Slattery said: <sptag cat=DS next=NRS s=1.63 w=18>‘By closing Crinkley Bottom, the council has shot Morecambe in the foot. And I’m out of a job.’
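Once a corpus is annotated like this, counts per category can be extracted automatically. A minimal sketch, assuming the unquoted attribute syntax shown in the example (cat, next, s, w); judging from the example, w appears to record the number of words in each span (7 and 18). The function names are hypothetical.

```python
import re
from collections import Counter

# Matches an opening <sptag ...> element and captures its attribute string.
SPTAG = re.compile(r"<sptag\s+([^>]*)>")
# Matches unquoted key=value attribute pairs such as cat=DS or w=18.
ATTR = re.compile(r"(\w+)=([^\s>]+)")

def parse_sptags(text):
    """Return one dict of attributes per <sptag> element found in the text."""
    return [dict(ATTR.findall(m.group(1))) for m in SPTAG.finditer(text)]

def summarise(tags):
    """Count tags per category and total their w (word) values per category."""
    instances = Counter(t["cat"] for t in tags)
    words = Counter()
    for t in tags:
        words[t["cat"]] += int(t["w"])
    return instances, words

sample = ("<sptag cat=NRS next=DS s=0.37 w=7>The theme park's manager, "
          "Mike Slattery said: <sptag cat=DS next=NRS s=1.63 w=18>'By closing "
          "Crinkley Bottom, the council has shot Morecambe in the foot. "
          "And I'm out of a job.'")
instances, words = summarise(parse_sptags(sample))
print(dict(instances))  # {'NRS': 1, 'DS': 1}
print(dict(words))      # {'NRS': 7, 'DS': 18}
```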

Section of the corpus     Number of instances of DS
Whole corpus              2,974
Fiction                   1,569
Press                     770
(Auto)biography           635

Fiction sub-section       Number of instances of DS
Serious                   629
Popular                   940
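Because the three sections differ slightly in size, the raw counts are easier to compare once normalised. The sketch below divides each DS count by the word count of its section (taken from the corpus description) and scales the result to a rate per 10,000 words. The serious and popular sub-sections cannot be normalised this way because their word counts are not given here.

```python
# DS counts from the table and section sizes from the corpus description.
DS_COUNTS = {"Fiction": 1569, "Press": 770, "(Auto)biography": 635}
SECTION_WORDS = {"Fiction": 87709, "Press": 83603, "(Auto)biography": 87036}

def rate_per_10k(count, words):
    """Normalised frequency: instances per 10,000 words."""
    return count / words * 10_000

# Sanity checks: the sections add up to the whole-corpus figures.
assert sum(DS_COUNTS.values()) == 2974
assert sum(SECTION_WORDS.values()) == 258348

rates = {s: rate_per_10k(DS_COUNTS[s], SECTION_WORDS[s]) for s in DS_COUNTS}
for section, rate in rates.items():
    print(f"{section:16} {rate:6.1f} DS per 10,000 words")
```

This gives roughly 179 instances of DS per 10,000 words in fiction, against about 92 in the press section and 73 in (auto)biography.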

Authorial style

• Studies attempting to ‘fingerprint’ authors: i.e. to identify linguistic items that distinguish the works by one author from those of others.

• Burrows (1987): study of Jane Austen’s novels focusing on closed-class words, such as the, and, of, a and to.

• Burrows found that these words can distinguish the works of different authors, different novels, and even the words spoken by different characters.
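A minimal sketch of the kind of measurement involved, assuming plain-text input and a simple lower-case tokeniser: it reports how often each of the five closed-class words occurs per 1,000 tokens. Burrows's own work compared many more words using multivariate statistics; the point here is only that such profiles are cheap to compute and can then be compared across texts. The opening sentence of Pride and Prejudice serves as sample input, and the function name is hypothetical.

```python
import re
from collections import Counter

# The five closed-class words named above.
FUNCTION_WORDS = ["the", "and", "of", "a", "to"]

def function_word_profile(text, words=FUNCTION_WORDS):
    """Return each function word's rate per 1,000 tokens of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: 1000 * counts[w] / len(tokens) for w in words}

opening = ("It is a truth universally acknowledged, that a single man in "
           "possession of a good fortune, must be in want of a wife.")
for word, rate in function_word_profile(opening).items():
    print(f"{word:>4} {rate:7.1f}")
```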

• Hoover (2002) studied a series of corpora containing chunks from novels by different authors.

• For example, he looked at a corpus containing the first 30,000 words of 29 novels by 17 different authors.

• The distribution of the 300 most frequent words in the corpus as a whole correctly clusters 15 out of 17 novels.
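Hoover's approach starts from the frequencies of the most frequent words (MFW) in the corpus as a whole. The sketch below is a deliberately simplified stand-in for his cluster analysis: it builds a relative-frequency profile over the n most frequent words and, for each text, reports the other text whose profile is closest by Manhattan (city-block) distance. The function names and the toy texts are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def most_frequent_words(texts, n):
    """The n most frequent words in all texts pooled together."""
    pooled = Counter()
    for text in texts.values():
        pooled.update(tokenize(text))
    return [word for word, _ in pooled.most_common(n)]

def profile(text, vocab):
    """Relative frequency of each vocabulary word in one text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def nearest_neighbours(texts, n=300):
    """Map each text to the other text with the most similar MFW profile."""
    vocab = most_frequent_words(texts, n)
    profiles = {name: profile(text, vocab) for name, text in texts.items()}
    return {
        name: min((other for other in profiles if other != name),
                  key=lambda other: manhattan(profiles[name], profiles[other]))
        for name in profiles
    }

texts = {
    "A1": "the cat of the house and the dog of the farm",
    "A2": "the end of the road and the edge of the wood",
    "B1": "it was late and he was tired it was dark",
    "B2": "he was sure it was over he was glad it was",
}
print(nearest_neighbours(texts, n=5))
```

Here the two texts built on the, of and and pair up, as do the two built on was, it and he. With real novels, n would be in the hundreds, as with Hoover's 300 most frequent words.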

• An analysis of the most frequent word sequences (n-grams) can also be useful, e.g.:
  – of the
  – in the
  – to the
  – it was
  – he was
  – and the
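Frequency lists of n-grams like these are straightforward to produce. A minimal sketch, assuming a simple lower-case tokeniser that ignores punctuation and sentence boundaries; the function names are hypothetical.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return an iterator over successive n-word tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def top_ngrams(text, n=2, k=10):
    """The k most frequent n-grams in the text, with their frequencies."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(gram) for gram in ngrams(tokens, n)).most_common(k)

sample = "He was in the room and he was at the door."
print(top_ngrams(sample, n=2, k=1))  # [('he was', 2)]
```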

• Mahlberg (2007, 2009, 2012)
• Corpus stylistics and Dickens’s fiction
• Also shows that analysis of frequent word sequences (clusters) can be useful
• Clusters containing body parts:
  – “his hands in his pockets”
  – “his head on one side”
  – “his hands upon his”
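Clusters are repeated word sequences, so the body-part examples suggest a simple filter. A minimal sketch, assuming a hypothetical short list of body-part nouns and a minimum frequency of 2: it counts every five-word cluster containing one of those nouns and keeps the repeated ones. A real study would use a fuller word list and a much larger text; this is not Mahlberg's own procedure.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of body-part nouns.
BODY_PARTS = {"hand", "hands", "head", "eye", "eyes", "face", "arm", "arms"}

def body_part_clusters(text, n=5, min_freq=2, body_parts=BODY_PARTS):
    """Repeated n-word clusters that contain at least one body-part noun."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(g) for g in grams if body_parts.intersection(g))
    return [(cluster, freq) for cluster, freq in counts.most_common()
            if freq >= min_freq]

sample = ("He stood with his hands in his pockets. Later he stood with "
          "his hands in his pockets again.")
for cluster, freq in body_part_clusters(sample):
    print(freq, cluster)
```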

Text style

• Stubbs’s (2005) study of Joseph Conrad’s Heart of Darkness, first published in 1899.

• Marlow, the protagonist and first-person narrator, tells of how he was contracted to travel up a river in the Belgian Congo in order to find an ivory trader called Kurtz, who was the subject of stories of madness and suspect practices. Marlow finds Kurtz, but Kurtz dies on the journey back down the river.

• Main themes (Stubbs 2005: 8-9):
  – ‘hypocrisy of the colonizers’
  – ‘unreliability of progress and civilization’
  – ‘breakdowns in communication’
  – light vs. dark
  – restraint vs. frenzy
  – appearance vs. reality
  – Marlow’s ‘unreliable and distorted knowledge’

• Used WordSmith Tools (Scott 2007)
• Compared the novel with a reference corpus of fictional texts of around 700,000 words
• Words over-used in the novel relative to the reference corpus include: seemed, mystery, darkness, absurd, horror, terror, desolation
• Several of these words concern uncertainty, perception and knowledge
• They coincide with some of the novel’s themes
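Key-word lists like this rest on a statistical comparison of each word's frequency in the study text with its frequency in a reference corpus. A minimal sketch using log-likelihood (Dunning's G²), one widely used keyness statistic; 3.84 is the critical value for p < 0.05 with one degree of freedom. The function names and toy counts are hypothetical, and WordSmith's own settings and output may differ.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """a, b: frequency of a word in the study and reference corpus;
    c, d: total tokens in the study and reference corpus."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def key_words(study, reference, min_ll=3.84, top=20):
    """Words over-used in study relative to reference (both word -> count)."""
    c, d = sum(study.values()), sum(reference.values())
    results = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep only positive (over-used) key words
            ll = log_likelihood(a, b, c, d)
            if ll >= min_ll:
                results.append((word, a, round(ll, 2)))
    results.sort(key=lambda r: r[2], reverse=True)
    return results[:top]

study = Counter({"darkness": 10, "the": 90})
reference = Counter({"the": 100})
print(key_words(study, reference))  # [('darkness', 10, 13.86)]
```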

• Stubbs shows how the application of corpus methods can provide:
  – further justification for well-established interpretations
  – new insights into the language and meaning potential of the text

Text style: variation inside texts

• Culpeper (2002) used WordSmith Tools to do a key-word analysis of the speech of the main characters in Romeo and Juliet

• A file with the words spoken by each character was compared to a ‘reference corpus’ containing the words of all the other characters.

• Findings are relevant to an understanding of how the characters are linguistically constructed (characterisation).
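The set-up of comparing each character against everyone else can be sketched by pooling the other characters' word counts into the reference corpus. The sketch assumes each character's speech has already been extracted and counted; the data below are invented toy counts, not figures from the play, and log-likelihood is used as the keyness statistic (Culpeper's own settings may differ). Function names are hypothetical.

```python
import math
from collections import Counter

def ll(a, b, c, d):
    """Log-likelihood for frequencies a, b in corpora of c and d tokens."""
    e1, e2 = c * (a + b) / (c + d), d * (a + b) / (c + d)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)

def character_key_words(speech, character, top=10):
    """speech maps each character to a Counter of the words they speak.
    Compares one character against all other characters pooled together."""
    study = speech[character]
    reference = Counter()
    for other, counts in speech.items():
        if other != character:
            reference.update(counts)
    c, d = sum(study.values()), sum(reference.values())
    scored = [(w, ll(a, reference[w], c, d)) for w, a in study.items()
              if a / c > reference[w] / d]  # over-used words only
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top]

speech = {
    "Juliet": Counter({"if": 5, "love": 5}),
    "Romeo": Counter({"love": 5, "light": 5}),
    "Nurse": Counter({"lady": 10}),
}
for word, score in character_key_words(speech, "Juliet"):
    print(f"{word:6} {score:.2f}")
```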

Juliet’s key-words (raw frequencies in brackets):

If (31), Or (25), Sweet (16), Be (59), News (9), My (92), Night (27), I (138), Would (20), Yet (18), Thou (71), Words (5), Name (11), Nurse (20), Tybalt’s (6), Send (7), Husband (7), That (82), Swear (5)

Key-words such as if, or, would, yet can be related to Juliet’s tendency to express uncertainty and anxiety throughout the play:

‘I fear it is: and yet, methinks, it should not, For he hath still been tried a holy man’ (IV.iii) [Context: Juliet wonders whether the Friar has supplied a sleeping potion or poison]
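Interpreting key words like these means returning to their contexts, usually through a concordance. A minimal key-word-in-context (KWIC) sketch, assuming a simple tokeniser that drops punctuation; the function name and the window of four words on each side are arbitrary, hypothetical choices.

```python
import re

def kwic(text, node, width=4):
    """Return one concordance line per occurrence of node, with width words either side."""
    tokens = re.findall(r"[\w']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines

speech = ("I fear it is: and yet, methinks, it should not, "
          "For he hath still been tried a holy man")
for line in kwic(speech, "yet"):
    print(line)
```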

Corpus tools

Corpus tools make comparison relatively easy:
• WordSmith Tools (Scott 2007)
• WMatrix (Rayson 2009)
• AntConc (Anthony 2011)
• MLCT (Piao)

Summary

• Style is the way in which language is used.
• The notion of ‘style’ is fundamentally based on comparison.
• Corpus linguistic methods are relevant to the analysis of style in fiction/literature.
• They have been applied to the analysis of genres, authors and texts.
• Manual analysis and interpretation of the output from corpus tools is needed.

[...] ‘corpus stylistics’ is not purely a quantitative study of literature. Rather, it is still a qualitative stylistic approach to the study of the language of literature, combined with or supported by corpus-based quantitative methods and technology. (Ho 2011: 10)

References

Culpeper, J. (2009) “Keyness: words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet”, International Journal of Corpus Linguistics, 14(1): 29-59.

Ho, Y. (2011) Corpus Stylistics in Principles and Practice: A Stylistic Exploration of John Fowles’ The Magus. London: Continuum.

Leech, G. (2008) Language in Literature: Style and Foregrounding. Harlow, UK: Pearson.

Louw, B. (2008) “Consolidating empirical method in data-assisted stylistics: towards a corpus-attested glossary of literary terms”, in Zyngier, S., Bortolussi, M., Chesnokova, A. and Auracher, J. (eds.) Directions in Empirical Literary Studies, pp. 243-264. Amsterdam: Benjamins.

Mahlberg, M. (2007) “Clusters, key clusters and local textual functions in Dickens”, Corpora, 2(1): 1-31.

Mahlberg, M. (2009) “Corpus stylistics and the Pickwickian watering-pot”, in Baker, P. (ed.) Contemporary Corpus Linguistics, pp. 47-63. London: Continuum.

Mahlberg, M. (2012) Corpus Stylistics and Dickens’s Fiction. London: Routledge.

McIntyre, D. (2010) “Dialogue and characterization in Quentin Tarantino’s Reservoir Dogs: a corpus stylistic analysis”, in McIntyre, D. and Busse, B. (eds.) Language and Style, pp. 162-182. Basingstoke: Palgrave.

McIntyre, D. and Walker, B. (2010) “How can corpora be used to explore the language of poetry and drama?”, in McCarthy, M. and O’Keeffe, A. (eds.) The Routledge Handbook of Corpus Linguistics. London: Routledge.

Widdowson, H. G. (2008) “The novel features of text: corpus analysis and stylistics”, in Gerbig, A. and Mason, O. (eds.) Language, People, Numbers: Corpus Stylistics and Society, pp. 293-304. Amsterdam: Rodopi.

WMatrix

• Web-based corpus tool
• Developed by Paul Rayson at Lancaster University
• Automated grammatical and semantic analysis of texts/corpora
• A web-based front end for CLAWS and USAS

Using a web interface:
• Texts are uploaded onto the Wmatrix server (at Lancaster)
• The upload procedure automatically adds:
  (i) grammatical or part-of-speech (POS) tags;
  (ii) semantic tags

• CLAWS grammatical (POS) tagger
  CLAWS = Constituent Likelihood Automatic Word-tagging System

• USAS semantic tagger
  USAS = UCREL Semantic Analysis System

• (UCREL = University Centre for Computer Corpus Research on Language)

USAS

• Assigns tags to each word using a hierarchical framework of categorization

• Based originally on McArthur’s (1981) Longman Lexicon of Contemporary English

The 21 Top Level Semantic Categories of the USAS Tag-set

A   GENERAL & ABSTRACT TERMS
B   THE BODY & THE INDIVIDUAL
C   ARTS & CRAFTS
E   EMOTION
F   FOOD & FARMING
G   GOVERNMENT & PUBLIC DOMAIN
H   ARCHITECTURE, HOUSING & THE HOME
I   MONEY & COMMERCE (IN INDUSTRY)
K   ENTERTAINMENT
L   LIFE & LIVING THINGS
M   MOVEMENT, LOCATION, TRAVEL, TRANSPORT
N   NUMBERS & MEASUREMENT
O   SUBSTANCES, MATERIALS, OBJECTS, EQUIPMENT
P   EDUCATION
Q   LANGUAGE & COMMUNICATION
S   SOCIAL ACTIONS, STATES & PROCESSES
T   TIME
W   WORLD & ENVIRONMENT
X   PSYCHOLOGICAL ACTIONS, STATES & PROCESSES
Y   SCIENCE & TECHNOLOGY
Z   NAMES & GRAMMAR
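The hierarchy is encoded in the tags themselves: a letter for the top-level field, followed by numbers for progressively finer sub-divisions (e.g. G, G1, G1.1). A minimal sketch that recovers the levels from a tag string, assuming tags of that shape and ignoring trailing markers such as + or -, which USAS uses for things like positive and negative poles. The function name is hypothetical.

```python
import re

# The 21 top-level USAS categories, as listed above.
TOP_LEVEL = {
    "A": "GENERAL & ABSTRACT TERMS", "B": "THE BODY & THE INDIVIDUAL",
    "C": "ARTS & CRAFTS", "E": "EMOTION", "F": "FOOD & FARMING",
    "G": "GOVERNMENT & PUBLIC DOMAIN", "H": "ARCHITECTURE, HOUSING & THE HOME",
    "I": "MONEY & COMMERCE (IN INDUSTRY)", "K": "ENTERTAINMENT",
    "L": "LIFE & LIVING THINGS", "M": "MOVEMENT, LOCATION, TRAVEL, TRANSPORT",
    "N": "NUMBERS & MEASUREMENT", "O": "SUBSTANCES, MATERIALS, OBJECTS, EQUIPMENT",
    "P": "EDUCATION", "Q": "LANGUAGE & COMMUNICATION",
    "S": "SOCIAL ACTIONS, STATES & PROCESSES", "T": "TIME",
    "W": "WORLD & ENVIRONMENT", "X": "PSYCHOLOGICAL ACTIONS, STATES & PROCESSES",
    "Y": "SCIENCE & TECHNOLOGY", "Z": "NAMES & GRAMMAR",
}

# A capital letter, optionally followed by dot-separated numbers (e.g. G1.1).
TAG = re.compile(r"([A-Z])(\d+(?:\.\d+)*)?")

def usas_levels(tag):
    """Split a tag such as 'G1.1' into its top-level name and ['G', 'G1', 'G1.1'].
    Anything after the numeric part (e.g. a trailing '+' or '-') is ignored."""
    m = TAG.match(tag)
    if m is None or m.group(1) not in TOP_LEVEL:
        raise ValueError(f"not a USAS-style tag: {tag!r}")
    letter, digits = m.groups()
    levels = [letter]
    if digits:
        parts = digits.split(".")
        levels += [letter + ".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return TOP_LEVEL[letter], levels

print(usas_levels("G1.1"))   # ('GOVERNMENT & PUBLIC DOMAIN', ['G', 'G1', 'G1.1'])
print(usas_levels("E4.1-"))  # ('EMOTION', ['E', 'E4', 'E4.1'])
```

Grouping tagged words by any of these levels makes it possible to count semantic fields at a coarse or a fine granularity.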

G - Government and the public domain

G1      Government, politics and elections
G1.1    Government, etc.
G1.2    Politics
G2      Crime, law and order
G3      War, defence and the army: weapons

Allows analysis of texts at:

– the word level
– the grammatical level (POS)
– the semantic level

Allows text comparison at:

– the word level
– the grammatical level (POS)
– the semantic level

Keyness

• Word level – key-words
• Grammatical level – key-POS
• Semantic level – key-concepts
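Wmatrix computes keyness at all three levels itself; the sketch below only illustrates that the same comparison can be run on any of the three annotation layers. It assumes each text is already available as a list of (word, POS tag, semantic tag) triples, a hypothetical representation rather than Wmatrix's export format, and the tags in the toy data are invented placeholders, not real CLAWS or USAS output. Log-likelihood is used as the keyness statistic.

```python
import math
from collections import Counter

# Position of each annotation layer in a (word, POS, semantic tag) triple.
LEVELS = {"word": 0, "pos": 1, "semantic": 2}

def counts_at(tokens, level):
    """Frequency list of words, POS tags or semantic tags."""
    i = LEVELS[level]
    return Counter(tok[i] for tok in tokens)

def log_likelihood(a, b, c, d):
    """Log-likelihood for an item with frequencies a and b in corpora of c and d tokens."""
    e1, e2 = c * (a + b) / (c + d), d * (a + b) / (c + d)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)

def key_items(study_tokens, ref_tokens, level):
    """Items over-used in the study text at the given level, highest score first."""
    study, ref = counts_at(study_tokens, level), counts_at(ref_tokens, level)
    c, d = sum(study.values()), sum(ref.values())
    scored = [(item, log_likelihood(a, ref[item], c, d))
              for item, a in study.items() if a / c > ref[item] / d]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Invented toy data with placeholder tags.
study = [("dark", "ADJ", "SEM_LIGHT"), ("night", "NOUN", "SEM_TIME"),
         ("gloom", "NOUN", "SEM_LIGHT"), ("the", "DET", "SEM_GRAM")]
reference = [("the", "DET", "SEM_GRAM"), ("day", "NOUN", "SEM_TIME"),
             ("a", "DET", "SEM_GRAM"), ("run", "VERB", "SEM_MOVE")]
for level in LEVELS:
    print(level, [item for item, _ in key_items(study, reference, level)])
```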