Corpus Linguistics and Stylistics
PALA Summer School, Maribor, 2014
In this lecture...
• Stylistics and style
• Combining stylistics + corpus linguistics
• Examples of studies combining corpus linguistics and stylistics
  – Analysis of genres
  – Analysis of the works by particular authors
  – Analysis of individual texts
  – Analysis of variation inside texts
• Corpus Tools
  – WMatrix
Stylistics
Stylistics is the study of literature using methods, theories and concepts from linguistics (Leech and Short 2007: 1).
It is “[...] the study of the relationship between linguistic form and literary function [...]” (Leech and Short 2007: 3).
Linguistic style
‘Style is a way in which language is used’ (Leech and Short 2007: 31)
‘[S]tyle consists in choices made from the repertoire of the language.’ (Leech and Short 2007: 31)
Linguistic style
‘Stylistic choice is limited to those aspects of linguistic choice which concern alternative ways of rendering the same subject matter’ (Leech and Short 2007: 31)
e.g. horse vs. steed but not horse vs. dog
Linguistic style
• Style and genre, e.g. science fiction, romance novels, etc.
• Style and author
• Style and text
• Style and parts of texts (e.g. the narration or speech of different characters)
Ways of analysing style
• Analyst’s intuitions
• ‘Manual’ comparative analysis
Ways of analysing style
Style and comparison
‘Even if style is defined as that variety of language which correlates with context, the recognition and analysis of styles are squarely based on comparison. The essence of variation, and thus of style, is difference, and differences cannot be analysed and described without comparison.’ (Enkvist 1973: 21)
Ways of analysing style
• Comparative analysis – manually
  – OK for shorter texts/extracts
• Comparative analysis – using computers:
  – Corpus linguistic methods/tools
  – Especially useful for longer texts – prose fiction
Combining corpus linguistics and stylistics
• The ‘corpus turn’ (Leech and Short 2007: 284).
• On-going trend in stylistics to use methods and tools from corpus linguistics for the analysis of literary and other texts.
• Usually referred to as corpus stylistics
• Other terms:
  – digital stylistics (Louw 2008)
  – electronic text analysis (Adolphs 2006)
Examples of studies
• Combining corpus linguistics and stylistics
  – Analysis of genres
  – Analysis of the works by particular authors
  – Analysis of individual texts
  – Analysis of variation inside texts
Genre style
• Biber (1988)
  – multivariate statistical techniques
  – factor analysis
  – many different variables
  – variables = linguistic features (e.g. passive constructions)
• e.g. narrative versus non-narrative texts
  – important variables = past tense verbs, 3rd person pronouns, perfect aspect, present participle clauses
  – High scores = narrative
  – Low scores = non-narrative
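As a rough illustration of how a score on a dimension like this can be computed, the sketch below standardises the rate of each feature across a set of texts and sums the resulting z-scores per text. The text names and feature rates are invented for illustration, and the sketch leaves out the factor analysis that selects the features in the first place.

```python
from statistics import mean, pstdev

# Hypothetical rates per 1,000 words (invented numbers, not from Biber 1988).
FEATURES = ["past_tense", "third_person_pronouns", "perfect_aspect",
            "present_participle_clauses"]
RATES = {
    "novel_extract": [85.0, 60.0, 12.0, 3.0],
    "news_report": [50.0, 30.0, 9.0, 1.5],
    "academic_article": [20.0, 10.0, 5.0, 0.5],
}

def dimension_scores(rates):
    """Sum, for each text, the z-scores of its feature rates."""
    names = list(rates)
    n_features = len(next(iter(rates.values())))
    scores = {name: 0.0 for name in names}
    for i in range(n_features):
        column = [rates[name][i] for name in names]
        # Population standard deviation is used here for simplicity.
        mu, sigma = mean(column), pstdev(column)
        for name in names:
            scores[name] += (rates[name][i] - mu) / sigma
    return scores

SCORES = dimension_scores(RATES)
for name, score in sorted(SCORES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {score:+.2f}")
```

With these made-up rates the novel extract gets the highest score and the academic article the lowest, with the news report in between.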
A range not a dichotomy
[Figure: text-types ranked along the narrative / non-narrative dimension, with the top and the bottom text-types marked]
• There exists a whole range of text-types in the middle: it’s not just a two-way distinction.
• Note also: spoken and written genres are mixed together along the dimension.
Genre style – direct speech
Corpus-based study of speech, writing and thought presentation (Semino and Short 2004)
Genre style – direct speech
Corpus of 260,000 (approx) words of (late) 20th century written British English
• 120 text samples
• 2,000 (approx) words each, amounting to a total of 258,348 words
Genre style – direct speech
Corpus divided into three sections:
– prose fiction (87,709 words)
– newspaper news reports (83,603 words)
– biography and autobiography (87,036 words)
Each genre section is further divided into a ‘serious’ and a ‘popular’ sub-section.
Genre style – direct speech
• Corpus tagged – manually
<sptag cat=NRS next=DS s=0.37 w=7>The theme park’s manager, Mike Slattery said: <sptag cat=DS next=NRS s=1.63 w=18>‘By closing Crinkley Bottom, the council has shot Morecambe in the foot. And I’m out of a job.’
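Annotation in this format can be read back automatically. The sketch below is one possible way to do it, assuming each `<sptag ...>` element carries space-separated `key=value` attributes and governs the text up to the next tag. The reading of `w` as a word count is inferred from the example itself and is not stated in the source.

```python
import re

SAMPLE = (
    "<sptag cat=NRS next=DS s=0.37 w=7>The theme park’s manager, Mike Slattery said: "
    "<sptag cat=DS next=NRS s=1.63 w=18>‘By closing Crinkley Bottom, the council has "
    "shot Morecambe in the foot. And I’m out of a job.’"
)

TAG = re.compile(r"<sptag\s+([^>]*)>")

def parse_sptags(text):
    """Return one dict per tag: its attributes plus the stretch of text it governs."""
    matches = list(TAG.finditer(text))
    stretches = []
    for i, match in enumerate(matches):
        attrs = dict(pair.split("=", 1) for pair in match.group(1).split())
        # The stretch runs from the end of this tag to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        attrs["text"] = text[match.end():end].strip()
        stretches.append(attrs)
    return stretches

STRETCHES = parse_sptags(SAMPLE)
DS_COUNT = sum(1 for stretch in STRETCHES if stretch["cat"] == "DS")
for stretch in STRETCHES:
    print(stretch["cat"], stretch["w"], len(stretch["text"].split()))
```

Counting the stretches whose `cat` is `DS` is the kind of tally that the frequency tables for direct speech rely on.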
Genre style – direct speech
Section of the corpus    Number of instances of DS
Whole corpus             2,974
Fiction                  1,569
Press                    770
(Auto)biography          635

Fiction sub-section      Number of instances of DS
Serious                  629
Popular                  940
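Because the three sections differ slightly in size, raw counts are easier to compare once they are normalised. The short sketch below converts the DS counts into rates per 1,000 words, using the section sizes given for the corpus. The choice of 1,000 words as the base is an assumption made for illustration.

```python
# Section sizes in words and raw DS counts, as reported for the corpus.
WORDS = {"Fiction": 87_709, "Press": 83_603, "(Auto)biography": 87_036}
DS = {"Fiction": 1_569, "Press": 770, "(Auto)biography": 635}

def per_thousand(count, total_words):
    """Normalised frequency: occurrences per 1,000 words."""
    return 1000 * count / total_words

RATES = {section: per_thousand(DS[section], WORDS[section]) for section in WORDS}
for section, rate in RATES.items():
    print(f"{section:16s} {rate:5.2f} DS per 1,000 words")
```

The ranking of the sections stays the same after normalisation: fiction first, then press, then (auto)biography.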
Authorial style
• Studies attempting to ‘fingerprint’ authors: i.e. to identify linguistic items that distinguish the works by one author from those of others.
• Burrows (1987): study of Jane Austen’s novels focusing on closed-class words, such as the, and, of, a and to.
• Burrows found that these words can distinguish the works of different authors, different novels, and even the words spoken by different characters.
Authorial style
• Hoover (2002) studied a series of corpora containing chunks from novels by different authors.
• For example, he looked at a corpus containing the first 30,000 words of 29 novels by 17 different authors.
• The distribution of the 300 most frequent words in the corpus as a whole correctly clusters 15 out of 17 novels.
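A minimal sketch of the underlying idea: take the most frequent words in the corpus as a whole, record how often each occurs in every text, and compare the resulting profiles. The texts below are invented placeholders, and a plain Manhattan distance stands in for the cluster analysis used in studies of this kind.

```python
import re
from collections import Counter

# Placeholder texts, invented for illustration only (not from any novel).
TEXTS = {
    "author_A_text_1": "the cat and the dog and the bird",
    "author_A_text_2": "the sun and the moon and the star",
    "author_B_text_1": "of a man of a town of a hill",
}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def frequency_profiles(texts, top_n):
    """Relative frequency, in each text, of the corpus-wide top_n words."""
    tokens = {name: tokenize(text) for name, text in texts.items()}
    corpus_counts = Counter(tok for toks in tokens.values() for tok in toks)
    vocab = [word for word, _ in corpus_counts.most_common(top_n)]
    profiles = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        profiles[name] = [counts[word] / len(toks) for word in vocab]
    return vocab, profiles

def manhattan(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

VOCAB, PROFILES = frequency_profiles(TEXTS, top_n=4)
SAME = manhattan(PROFILES["author_A_text_1"], PROFILES["author_A_text_2"])
DIFF = manhattan(PROFILES["author_A_text_1"], PROFILES["author_B_text_1"])
print(VOCAB, round(SAME, 3), round(DIFF, 3))
```

In a real study `top_n` would be much larger (e.g. the 300 words mentioned above) and the texts would be long enough for the frequencies to be stable.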
Authorial style
• An analysis of the most frequent word sequences (n-grams) can also be useful, e.g.
  – of the
  – in the
  – to the
  – it was
  – he was
  – and the
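Counting word sequences is straightforward to sketch. The example below slides a window of n words over a tokenised text and counts each sequence. The sample sentences are invented, and the simple tokeniser ignores sentence boundaries, which a real study might want to respect.

```python
import re
from collections import Counter

# Invented sample text, for illustration only.
TEXT = ("It was the end of the day and he was in the garden. "
        "It was late, and the light in the garden was low.")

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """All contiguous sequences of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

TOKENS = tokenize(TEXT)
BIGRAM_COUNTS = Counter(ngrams(TOKENS, 2))
FIVE_GRAMS = ngrams(TOKENS, 5)

for sequence, count in BIGRAM_COUNTS.most_common(3):
    print(count, sequence)
```

Setting `n` to 5 gives longer sequences of the kind sometimes called clusters.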
Authorial style
• Mahlberg (2007, 2009, 2012)
• Corpus stylistics and Dickens’s fiction
• Also shows that analysis of frequent word sequences (clusters) can be useful.
• Clusters containing body parts
  – “his hands in his pockets”
  – “his head on one side”
  – “his hands upon his”
Text style
• Stubbs’s (2005) study of Joseph Conrad’s Heart of Darkness, first published in 1899.
• Marlow, the protagonist and first-person narrator, tells of how he was contracted to travel up a river in the Belgian Congo, in order to find an ivory trader called Kurtz, who was the subject of stories of madness and suspect practices. However, Kurtz dies while travelling back down the river.
• Main themes
  – ‘hypocrisy of the colonizers’
  – ‘unreliability of progress and civilization’
  – ‘breakdowns in communication’
  – Light vs. dark
  – Restraint vs. frenzy
  – Appearance vs. reality
  – Marlow’s ‘unreliable and distorted knowledge’
(Stubbs 2005: 8-9)
Text style
• Used WordSmith Tools (Scott 2007)
• Compared one novel with a corpus of fictional texts of around 700,000 words
• Overused words in novel include: seemed, mystery, darkness, absurd, horror, terror, desolation
• Several words concern uncertainty, perception and knowledge.
• These coincide with some of the novel’s themes
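Whether a word is ‘overused’ relative to a reference corpus is usually decided with a keyness statistic. The sketch below uses log-likelihood, one statistic commonly used for this purpose; which statistic was used in the study itself is not stated here. The word counts and the size of the novel are hypothetical, and only the reference corpus size of around 700,000 words comes from the description above.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness.

    a, b: frequency of the word in the study text and in the reference corpus.
    c, d: total number of tokens in the study text and in the reference corpus.
    """
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_a)
    if b > 0:
        total += b * math.log(b / expected_b)
    return 2 * total

STUDY_SIZE = 40_000        # hypothetical size of the novel in tokens
REFERENCE_SIZE = 700_000   # approximate size of the reference corpus

# Hypothetical counts for one word: 25 in the novel, 40 in the reference corpus.
KEYNESS = log_likelihood(25, 40, STUDY_SIZE, REFERENCE_SIZE)
OVERUSED = 25 / STUDY_SIZE > 40 / REFERENCE_SIZE

# 3.84 is the critical value for p < 0.05 with one degree of freedom.
print(round(KEYNESS, 2), OVERUSED, KEYNESS > 3.84)
```

A word whose relative frequency is the same in both corpora gets a score of zero; the score grows as the two relative frequencies diverge, and the comparison of relative frequencies shows whether the word is overused or underused.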
Text style
• Stubbs shows how the application of corpus methods can provide:
  – further justification for well-established interpretations,
  – new insights into the language and meaning potential of the text.
Text style: variation inside texts
• Culpeper (2002) used WordSmith Tools to do a key-word analysis of the speech of the main characters in Romeo and Juliet
• A file with the words spoken by each character was compared to a ‘reference corpus’ containing the words of all the other characters.
• Findings are relevant to an understanding of how the characters are linguistically constructed (characterisation).
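The comparison set-up described above (one character’s words against everyone else’s) can be sketched as follows. The character names and tokens are hypothetical placeholders, not lines from the play, and the sketch stops at building the two frequency lists that a key-word procedure would then compare.

```python
from collections import Counter

# Hypothetical placeholder tokens per speaker (not taken from the play).
SPEECH = {
    "CharacterA": ["if", "i", "would", "yet", "if"],
    "CharacterB": ["love", "i", "she", "love"],
    "CharacterC": ["sir", "you", "i", "sir"],
}

def target_and_reference(speech, character):
    """Word counts for one character and for all other characters combined."""
    target = Counter(speech[character])
    reference = Counter()
    for other, tokens in speech.items():
        if other != character:
            reference.update(tokens)
    return target, reference

TARGET, REFERENCE = target_and_reference(SPEECH, "CharacterA")
print(TARGET.most_common(2), sum(REFERENCE.values()))
```

Repeating this for every character gives each one its own reference corpus, so a word is only key for a character if it stands out against the rest of the cast.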
Text style: variation inside texts
Juliet’s key-words (raw frequencies in brackets):
If (31), Or (25), Sweet (16), Be (59), News (9), My (92), Night (27), I (138), Would (20), Yet (18), Thou (71), Words (5), Name (11), Nurse (20), Tybalt’s (6), Send (7), Husband (7), That (82), Swear (5)
Text style: variation inside texts
Key-words such as if, or, would, yet can be related to Juliet’s tendency to express uncertainty and anxiety throughout the play:
‘I fear it is: and yet, methinks, it should not, For he hath still been tried a holy man’ (IV.iii.)
[Context: Juliet wonders whether the Friar has supplied a sleeping potion or poison]
Corpus tools
Corpus tools make comparison relatively easy
• WordSmith Tools (Scott 2007)
• WMatrix (Rayson 2009)
• AntConc (Anthony 2011)
• MLCT (Piao)
Summary
• Style is the way in which language is used.
• The notion of ‘style’ is fundamentally based on comparison.
• Corpus linguistic methods are relevant to the analysis of style in fiction/literature.
• They have been applied to the analysis of genres, authors and texts.
• Manual analysis and interpretation of the output from corpus tools is needed.
Summary
[...] ‘corpus stylistics’ is not purely a quantitative study of literature. Rather, it is still a qualitative stylistic approach to the study of the language of literature, combined with or supported by corpus-based quantitative methods and technology. (Ho 2011: 10)
References
Culpeper, J. (2009) “Keyness: words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet”, International Journal of Corpus Linguistics 14(1): 29-59.
Ho, Y. (2011) Corpus Stylistics in Principles and Practice: A Stylistic Exploration of John Fowles’ The Magus. London: Continuum.
Leech, G. (2008) Language in Literature: Style and Foregrounding. Harlow, UK: Pearson.
Louw, B. (2008) “Consolidating Empirical method in data-assisted stylistics: Towards a corpus-attested glossary of literary terms”, in Zyngier, S., Bortolussi, M., Chesnokova, A. and Auracher, J. (eds.) Directions in Empirical Literary Studies, pp. 243-264. Amsterdam: Benjamins.
Mahlberg, M. (2007) “Clusters, Key Clusters and local textual functions in Dickens”, Corpora 2(1): 1-31.
Mahlberg, M. (2009) “Corpus Stylistics and the Pickwickian watering-pot”, in Baker, P. (ed.) Contemporary Corpus Linguistics, pp. 47-63. London: Continuum.
Mahlberg, M. (2012) Corpus Stylistics and Dickens’s Fiction. London: Routledge.
McIntyre, D. (2010) “Dialogue and Characterization in Quentin Tarantino’s Reservoir Dogs: A Corpus Stylistic Analysis”, in McIntyre, D. and Busse, B. (eds.) Language and Style, pp. 162-182. Basingstoke: Palgrave.
McIntyre, D. and Walker, B. (2010) “How can corpora be used to explore the language of poetry and drama?”, in McCarthy, M. and O’Keeffe, A. (eds.) The Routledge Handbook of Corpus Linguistics. London: Routledge.
Widdowson, H. G. (2008) “The Novel Features of Text. Corpus Analysis and Stylistics”, in Gerbig, A. and Mason, O. (eds.) Language, People, Numbers: Corpus Stylistics and Society, pp. 293-304. Amsterdam: Rodopi.
WMatrix
• Web-based corpus tool
• Developed by Paul Rayson at Lancaster University
• Automated grammatical and semantic analysis of texts/corpora
• A web-based front end for CLAWS and USAS
WMatrix
Using a web interface:
• Texts are uploaded onto the Wmatrix server (at Lancaster)
• The upload procedure automatically adds
  (i) Grammatical or Part of Speech (POS) tags;
  (ii) Semantic tags
WMatrix
• CLAWS grammatical (POS) tagger
  CLAWS = Constituent Likelihood Automatic Word-tagging System
• USAS semantic tagger
  USAS = UCREL Semantic Analysis System
• (UCREL = University Centre for Computer Corpus Research on Language)
WMatrix
USAS
• Assigns tags to each word using a hierarchical framework of categorization
• Based originally on McArthur’s (1981) Longman Lexicon of Contemporary English
The 21 Top Level Semantic Categories of the USAS Tag-set
A  GENERAL & ABSTRACT TERMS
B  THE BODY & THE INDIVIDUAL
C  ARTS & CRAFTS
E  EMOTION
F  FOOD & FARMING
G  GOVERNMENT & PUBLIC DOMAIN
H  ARCHITECTURE, HOUSING & THE HOME
I  MONEY & COMMERCE (IN INDUSTRY)
K  ENTERTAINMENT
L  LIFE & LIVING THINGS
M  MOVEMENT, LOCATION, TRAVEL, TRANSPORT
N  NUMBERS & MEASUREMENT
O  SUBSTANCES, MATERIALS, OBJECTS, EQUIPMENT
P  EDUCATION
Q  LANGUAGE & COMMUNICATION
S  SOCIAL ACTIONS, STATES & PROCESSES
T  TIME
W  WORLD & ENVIRONMENT
X  PSYCHOLOGICAL ACTIONS, STATES & PROCESSES
Y  SCIENCE & TECHNOLOGY
Z  NAMES & GRAMMAR
WMatrix
G - Government and the public domain
  G1 Government, politics and elections
    G1.1 Government, etc.
    G1.2 Politics
  G2 Crime, law and order
  G3 War, defence and the army: weapons
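The hierarchical shape of the tags means that a fine-grained tag can be rolled up to a broader category. The sketch below assumes a tag consists of a capital letter, a number, and optional dotted subdivisions (as in G1.2); any further markers that real tagger output may carry are not handled. The labels are the ones listed for the G field above.

```python
import re
from collections import Counter

LABELS = {
    "G": "Government and the public domain",
    "G1": "Government, politics and elections",
    "G1.1": "Government, etc.",
    "G1.2": "Politics",
    "G2": "Crime, law and order",
    "G3": "War, defence and the army: weapons",
}

TAG_PATTERN = re.compile(r"([A-Z])(\d+)((?:\.\d+)*)")

def ancestors(tag):
    """Path from the top-level field down to the tag, e.g. G1.2 -> G, G1, G1.2."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    letter, number, rest = match.groups()
    current = letter + number
    path = [letter, current]
    for part in rest.split(".")[1:]:
        current += "." + part
        path.append(current)
    return path

def roll_up(tags, depth):
    """Count tags after truncating each to at most `depth` levels (1 = top-level field)."""
    return Counter(ancestors(tag)[:depth][-1] for tag in tags)

# Hypothetical sequence of tags, as might be assigned to words in a text.
TAGS = ["G1.1", "G1.2", "G2", "G1.2", "G3"]
BY_FIELD = roll_up(TAGS, 1)
BY_SUBFIELD = roll_up(TAGS, 2)
print([LABELS[t] for t in ancestors("G1.2")])
print(BY_FIELD, BY_SUBFIELD)
```

Rolling counts up in this way lets a comparison be run at whichever level of the hierarchy is most informative for the texts at hand.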
WMatrix
Allows analysis of texts at:
– the word level
– the grammatical level (POS)
– and the semantic level
WMatrix
Allows text comparison at:
– the word level
– the grammatical level (POS)
– and the semantic level
WMatrix
Keyness
• Word level – Key-words
• Grammatical level – Key-POS
• Semantic level – Key-concepts