
Ig ihr 2012


Page 1: Ig ihr 2012

Using texts to explore historical texts: Examples from Lake District literature and the Registrar General’s Reports

Ian Gregory

Lancaster University

Acknowledgements:

Alistair Baron, Patricia Murrieta-Flores, Andrew Hardie, and Paul Rayson (Lancaster)

Claire Grover (Edinburgh) – providing access to the geo-referenced Histpop data

Richard Deswarte – help with the HistPop data

Page 2: Ig ihr 2012

What is GIS?

Page 3: Ig ihr 2012

Change in Infant Mortality in England & Wales, 1851-2001

[Figure: infant mortality rate (IMR), vertical axis 0 to 180, plotted at ten-year intervals from 1851 to 2001]

Page 4: Ig ihr 2012

Traditional HGIS: Infant mortality decline in England & Wales, 1851-1911

[Figure: % of national rate (vertical axis, -30 to 30) by decade, 1850s to 1900s, with legend entries 1 to 8]

Source: Gregory (2008) Annals of the Assoc. of American Geographers

Page 5: Ig ihr 2012

Distant Reading

Graphs (p. 16) Maps (p. 55) Trees (p. 73)

Moretti (2005) Graphs, Maps, Trees

Page 6: Ig ihr 2012

Literary Mapping of the Lakes

• British Academy funded pilot project with David Cooper and Sally Bushell

• Two tours of the Lake District

– Thomas Gray, 1769 (9,000 words): Proto-Picturesque

– ST Coleridge, 1802 (10,000 words): Romantic

• Aims:

– Can we create a GIS of text?

– What can it offer to literary research?

• Method:

– Texts typed up by hand

– Places tagged manually

– Conversion

– Analysis

Page 7: Ig ihr 2012

Place names coded in XML

<p in_text="Y">On Sunday Augt. 1st - half after 12 I had a Shirt, cravat, 2 pair of
Stockings, a little paper &amp; half a dozen Pens, a German Book (Voss's Poems)
&amp; a little Tea &amp; Sugar, with my Night Cap, packed up in my natty green
oil-skin, neatly squared, and put into my <format format_type="I">net</format>
Knapsack / and the Knap-sack on my back &amp; the Besom stick in my hand, which
for want of a better, and in spite of <person>Mrs C.</person> &amp;
<person>Mary</person>, who both raised their voices against it, especially as I left
the Besom scattered on the Kitchen Floor, off I sallied - over the
Bridge<my_comment><pl_name visited="Y">Greta Bridge,
Keswick</pl_name></my_comment>, thro' the Hop-Field, thro' the
<pl_name visited="Y">Prospect Bridge</pl_name> at
<pl_name visited="Y">Portinscale</pl_name>, so on by the tall Birch that grows out of the
center of the huge Oak, along into <pl_name visited="Y">Newlands</pl_name>--
<pl_name visited="Y">Newlands</pl_name>is indeed a lovely Place-the houses…
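As a rough illustration of the "Conversion" step, place-names tagged this way could be pulled out of the markup as sketched below. This assumes the full file is well-formed XML; the sample string is a shortened, closed version of the passage above, and `extract_places` is a hypothetical helper, not the project's own conversion code.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample modelled on the tagging scheme above.
# The real file and its root element are not shown in the slides.
SAMPLE = (
    '<p in_text="Y">off I sallied - over the Bridge'
    '<my_comment><pl_name visited="Y">Greta Bridge, Keswick</pl_name></my_comment>, '
    'thro\' the <pl_name visited="Y">Prospect Bridge</pl_name> at '
    '<pl_name visited="Y">Portinscale</pl_name>, along into '
    '<pl_name visited="Y">Newlands</pl_name></p>'
)


def extract_places(xml_text):
    """Return (place_name, visited) pairs in document order."""
    root = ET.fromstring(xml_text)
    places = []
    for elem in root.iter("pl_name"):
        # Collapse any line breaks or repeated spaces inside the tag.
        name = " ".join("".join(elem.itertext()).split())
        places.append((name, elem.get("visited") == "Y"))
    return places


for name, visited in extract_places(SAMPLE):
    print(name, visited)
```

Keeping the `visited` attribute alongside each name is what would allow separate "All mentions" and "Visits" layers to be built later.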

Page 8: Ig ihr 2012

Convert to a GIS

OS 1:50,000 gazetteer – all places on 1:50,000 maps

• Accuracy

• Spelling problems

• Disambiguation
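The spelling and disambiguation problems listed above can be illustrated with a small sketch. It assumes a gazetteer held as a dictionary of names to candidate coordinates; the entries and coordinates below are made-up placeholders rather than OS values, and choosing the candidate nearest the previously located place is one possible heuristic, not necessarily the one used in the project.

```python
import difflib

# Hypothetical mini-gazetteer: name -> candidate (easting, northing) pairs.
# Coordinates are illustrative placeholders, not Ordnance Survey values.
GAZETTEER = {
    "Keswick": [(326000, 523000)],
    "Portinscale": [(325000, 523500)],
    "Newlands": [(323000, 520000), (405000, 555000)],  # ambiguous name
}


def match_place(name, cutoff=0.8):
    """Return the gazetteer name closest in spelling to `name`, or None."""
    if name in GAZETTEER:
        return name
    close = difflib.get_close_matches(name, list(GAZETTEER), n=1, cutoff=cutoff)
    return close[0] if close else None


def disambiguate(candidates, previous):
    """Pick the candidate nearest to the previously located place."""
    if previous is None:
        return candidates[0]
    return min(
        candidates,
        key=lambda c: (c[0] - previous[0]) ** 2 + (c[1] - previous[1]) ** 2,
    )


def georeference(names):
    """Return (matched_name, coordinate) pairs; coordinate is None if unmatched."""
    located, previous = [], None
    for raw in names:
        name = match_place(raw)
        if name is None:
            located.append((raw, None))
            continue
        previous = disambiguate(GAZETTEER[name], previous)
        located.append((name, previous))
    return located


print(georeference(["Keswick", "Portinscalle", "Newlands", "Atlantis"]))
```

Unmatched names are kept with a `None` coordinate so that they can be checked by hand, which speaks to the accuracy concern.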

Page 9: Ig ihr 2012

Coleridge & Gray in a GIS

Page 10: Ig ihr 2012

Smoothed surface of Gray’s places

All mentions Visits

Page 11: Ig ihr 2012

Smoothed surface of Coleridge’s places

All mentions Visits

Class intervals are 10 equal intervals of the “All mentions” values. Bandwidth = 10 km
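A smoothed surface of this kind can be approximated as sketched below. The slides give the bandwidth (10 km) and the ten equal class intervals but not the kernel, so a Gaussian kernel is assumed here; the point coordinates, grid extent and cell size are hypothetical.

```python
import math


def kernel_density(points, x, y, bandwidth=10_000.0):
    """Gaussian kernel density estimate at (x, y); coordinates in metres."""
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return total / (2 * math.pi * bandwidth ** 2 * len(points))


def density_grid(points, xmin, ymin, xmax, ymax, cell=5_000.0, bandwidth=10_000.0):
    """Evaluate the density at the centre of each grid cell, row by row."""
    grid = []
    y = ymin + cell / 2
    while y < ymax:
        row = []
        x = xmin + cell / 2
        while x < xmax:
            row.append(kernel_density(points, x, y, bandwidth))
            x += cell
        grid.append(row)
        y += cell
    return grid


def equal_interval_class(value, max_value, n_classes=10):
    """Class index 0..n_classes-1 for equal intervals between 0 and max_value."""
    if max_value <= 0:
        return 0
    return min(int(value / max_value * n_classes), n_classes - 1)


# Hypothetical place-name mentions; a repeated point is a place mentioned twice.
MENTIONS = [(326000, 523000), (326000, 523000), (325000, 523500), (340000, 505000)]
GRID = density_grid(MENTIONS, 300000, 490000, 360000, 540000)
PEAK = max(max(row) for row in GRID)
CLASSES = [[equal_interval_class(v, PEAK) for v in row] for row in GRID]
```

Classifying the "Visits" surface against the maximum of the "All mentions" surface, rather than its own maximum, is what would keep the two maps comparable.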

Page 12: Ig ihr 2012

Comparing Coleridge and Gray

Green: Only in Gray

Yellow: Evenly in both

Red: Only in Coleridge

All mentions Visits

Page 13: Ig ihr 2012

Mapping Emotional Response

Gray Coleridge

Page 14: Ig ihr 2012

Physical Characteristics of Tours

[Figure: Population density of places mentioned, on normal (0 to 700) and logged (1 to 1000) scales, for four groups: STC not visited, STC visited, Gray not visited, Gray visited]

[Figure: Altitude of mentions, one panel each for Gray and Coleridge: % of mentions (0 to 70) by height band (0 to 99, 100 to 199, 200 to 299, 300 to 399, 400 to 499, 500 to 599, 600 to 699, 700 to 799, 800+), split into Visited and Didn't visit/Unclear]

Page 15: Ig ihr 2012

Close Reading with Internet Mapping

http://www.lancs.ac.uk/mappingthelakes http://www.lancs.ac.uk/mappingthelakes/v2

Page 16: Ig ihr 2012

The Histpop Collection

• Covers the printed Census reports and the Registrar General’s Annual Reports, 1801-1937

• Nearly 13,000,000 words

• Georeferenced by C. Grover (University of Edinburgh)

• This study is concerned only with the Registrar General’s Reports, 1851-1911

• Total: 3,750,000 words

• England & Wales: 2,000,000 words

• http://www.histpop.org

Page 17: Ig ihr 2012

Dot maps of place-name instances

Page 18: Ig ihr 2012

Place-name instances, 1850s

[Figure panels: density smoothing; cluster identification by standard deviations of density]

www.histpop.org
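Cluster identification by standard deviations of density might look like the sketch below. The slide does not state the cut-off, so the threshold of two standard deviations above the mean is an assumption, and the grid values are invented for illustration.

```python
import statistics


def cluster_cells(grid, threshold_sd=2.0):
    """Return (row, col) of cells whose density exceeds mean + threshold_sd * SD."""
    values = [v for row in grid for v in row]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # a flat surface has no clusters
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, v in enumerate(row)
        if (v - mean) / sd > threshold_sd
    ]


# Hypothetical density surface with a single hot spot.
DENSITY = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.1, 0.1],
    [0.1, 0.1, 5.0, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
print(cluster_cells(DENSITY))
```

Adjacent cells returned by `cluster_cells` would then be grouped into the numbered clusters used in later slides.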

Page 19: Ig ihr 2012

Extract place-names

By word frequency          By kernel density
Place            Cnt       Place                               Density   Cnt
North Shields    300       Bermondsey                          .5849       6
London           294       Newington                           .5842       4
Durham           207       Spitalfields                        .5835       1
Nottingham       193       Whitechapel                         .5835       1
Liverpool        171       Stepney                             .5823       2
Hawarden         145       Rotherhithe                         .5809       5
Grantham         131       London                              .5803     294
Cardington       125       Shoreditch                          .5794       1
Linslade         121       Bethnal Green                       .5788       4
Wakefield        121       Camberwell                          .5787      12
                           58th: Southwick (nr Sunderland)     .3498       1

Page 20: Ig ihr 2012

Collocation

• “In Southwick and Monkwearmouth offensive nuisances abound.”

• “At Royton, in Oldham, where the drainage was imperfect, typhoid fever was prevalent”

• “The deaths in the Liverpool workhouse, in the Mount Pleasant sub-district of Liverpool, were above 100 more than in the same period of the two previous years, owing chiefly to an epidemic of measles among children of German emigrants temporarily located in this institution; there were also 101 deaths from typhus, nearly all of which occurred in the workhouse.”
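Collocation of this kind can be sketched as counting the words that fall within a fixed window of each place-name. The window of five tokens and the single-token place-name set are assumptions made for illustration; multi-word names such as "West Bromwich" would need to be joined into one token first.

```python
import re
from collections import Counter

# The first two example sentences quoted above.
SENTENCES = [
    "In Southwick and Monkwearmouth offensive nuisances abound.",
    "At Royton, in Oldham, where the drainage was imperfect, typhoid fever was prevalent",
]
PLACE_NAMES = {"southwick", "monkwearmouth", "royton", "oldham"}


def tokenize(text):
    """Lower-case word tokens, keeping internal hyphens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def collocates(sentences, place_names, window=5):
    """Count non-place words within `window` tokens of each place-name."""
    counts = {}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok not in place_names:
                continue
            lo, hi = max(0, i - window), i + window + 1
            nearby = [
                t
                for j, t in enumerate(tokens[lo:hi], start=lo)
                if j != i and t not in place_names
            ]
            counts.setdefault(tok, Counter()).update(nearby)
    return counts


for place, words in collocates(SENTENCES, PLACE_NAMES).items():
    print(place, words.most_common(3))
```

Restricting the window to the sentence keeps a place-name from picking up words that describe somewhere else.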

Page 21: Ig ihr 2012

KWIC of “West Bromwich”
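A keyword-in-context (KWIC) concordance lines up every occurrence of a search term with the text on either side. A minimal sketch follows; the sample text is invented for illustration, since the concordance itself is not reproduced in the slides, and the 30-character context width is an arbitrary choice.

```python
import re

# Invented sample text, not a quotation from the Registrar General's Reports.
TEXT = (
    "Measles was prevalent in West Bromwich during the quarter. "
    "In the sub-district of West Bromwich fewer deaths were registered."
)


def kwic(text, keyword, width=30):
    """Return one aligned context line per occurrence of `keyword`."""
    lines = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords form a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


for line in kwic(TEXT, "West Bromwich"):
    print(line)
```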

Page 22: Ig ihr 2012

Most common words in clusters

• Uses Mutual Information scores – top 10 for each cluster, excluding place-names, numbers, and punctuation

• 1 (North-East): Fog, took [changes in rainfall or temperature took place], largest [changes in weather], least [as largest], dense [weather related], greatest [weather], observatory, Asiatic [cholera], Halos [lunar or solar], thunder. WEATHER

• 2 (Wakefield): Falls, rain, seen [meteorological phenomena or “swallows”], reading, fell [snow or rain], number [met. readings], June, March. WEATHER

• 3 (South Lancs): declining [marriages, births or mortality], incorporated [boundary changes], noted [health or weather], cubic [cubic feet – earth movement for sanitation], workhouse, sail [Irish emigrants sailing from Liverpool], observatory, aurora, salutary [salutary effects that led to death], took [weather]. MIXED

• 4 (Oxon to Beds): cuckoo [was first heard], infirmary, Regius Professor, intermittent [intermittent fevers], sleet, solar, halos, least [rainfall or temperature], heard [thunder], thunder. WEATHER

• 5 (London): changed [changed water supply], anemometer, exclusively [supplied by one water company], hospital, command [front matter], Junction [Grand Junction Water Company], Company [almost always water company], pipes, Bills [Bills of Mortality], asylum, sewage. WATER SUPPLY
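The slides do not give the exact formula behind the Mutual Information scores. The sketch below assumes the form common in corpus linguistics, log2(observed / expected), where the expected count is the word's frequency in the whole corpus scaled to the size of the cluster's text. The token lists, the `min_freq` filter and the function names are all hypothetical.

```python
import math
from collections import Counter


def mi_scores(cluster_tokens, corpus_tokens, min_freq=2):
    """MI score per word: log2(observed in cluster / expected from corpus).

    Assumes cluster_tokens is drawn from corpus_tokens, so every cluster
    word also occurs in the corpus.
    """
    cluster_counts = Counter(cluster_tokens)
    corpus_counts = Counter(corpus_tokens)
    n_cluster, n_corpus = len(cluster_tokens), len(corpus_tokens)
    scores = {}
    for word, observed in cluster_counts.items():
        # isalpha() drops numbers and punctuation tokens.
        if observed < min_freq or not word.isalpha():
            continue
        expected = corpus_counts[word] * n_cluster / n_corpus
        scores[word] = math.log2(observed / expected)
    return scores


def top_words(scores, exclude=(), n=10):
    """Highest-scoring words, skipping any in `exclude` (e.g. place-names)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked if w not in exclude][:n]


# Toy data for illustration only.
CORPUS = "fog fog rain the the the the water water company".split()
CLUSTER = ["fog", "fog", "the", "the"]
SCORES = mi_scores(CLUSTER, CORPUS)
print(top_words(SCORES))
```

Because MI favours rare words, a minimum-frequency filter such as `min_freq` is usually needed to keep one-off words from dominating the top ten.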

Page 23: Ig ihr 2012

“Company” in Cluster 5

Page 24: Ig ihr 2012

Mentions of diseases collocating to place-names

Mentions, 1850-1911:

Disease           Mentions
Diarrhoea         1555
Diphtheria        1261
Dysentery          332
Measles           1513
Scarlet fever      964
Smallpox           333
Whooping cough      23

[Figure: bar chart of mentions per disease (frequency axis 0 to 1600)]

[Figure: Mentions of diseases from 1850 to 1910. Diseases related to placenames: mentions (0 to 700) by decade, 1850 to 1910, one series each for whooping cough, smallpox, dysentery, scarlet fever, diphtheria, measles and diarrhoea]

Page 25: Ig ihr 2012

Places that collocate with “measles”

www.histpop.org

Page 26: Ig ihr 2012

Comparing texts with statistics

[Figure: % (0 to 40) by urban level (1 to 8) for three series: mentions of measles, districts, population]

Columns: urban level, % of national population (1911), sample areas

1 9.4 Stow on the Wold (Glou), Whitchurch (Hants.), Hexham (N’humb), Oakham (Rutland), Northallerton (N.Rid.), Holbeach (Lincs)

2 13.0 Cockermouth (Cumb), Chippenham (Wilts), Bridport (Dorset), Bangor (Carn), Alton (Hants), Pembroke (Pembs)

3 17.8 Guildford (Surrey), Redruth (Corn), York (E.Rid), Bucklow (Chesh), Chorley (Lancs), Maidstone (Kent)

4 18.7 Swansea, Canterbury, Hastings, Rochdale, Bolton, Wolverhampton

5 18.0 Sheffield, Leeds, Oxford, Southampton, Coventry, Edmonton (Mdlsex)

6 11.9 Exeter, Hull, Nottingham, Portsmouth, Leicester, Salford (Lancs)

7 9.0 Most of London, also Manchester, Liverpool and Birmingham

8 2.1 Only London, mainly East End

Page 27: Ig ihr 2012

Do mentions of “Diarrhoea, dysentery and cholera” correlate with deaths from these diseases?

                                                         IMRchdidy   Mchdiady
Kendall's tau_b   IMRchdidy   Correlation Coefficient    1.000       .225**
                              Sig. (1-tailed)            .           .000
                              N                          626         626
                  Mchdiady    Correlation Coefficient    .225**      1.000
                              Sig. (1-tailed)            .000        .
                              N                          626         626
Spearman's rho    IMRchdidy   Correlation Coefficient    1.000       .290**
                              Sig. (1-tailed)            .           .000
                              N                          626         626
                  Mchdiady    Correlation Coefficient    .290**      1.000
                              Sig. (1-tailed)            .000        .
                              N                          626         626

**. Correlation is significant at the 0.01 level (1-tailed).
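For readers who want to see how the two rank correlations are computed, a plain-Python sketch follows. The input values are invented, not the 626 district observations, and in practice a statistics package (for example `scipy.stats.spearmanr` and `scipy.stats.kendalltau`) would normally be used, which also reports significance.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either variable."""
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from both denominator terms
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + ties_x) * (untied + ties_y))


# Invented values: a mortality rate and a count of mentions for five districts.
RATE = [1, 2, 3, 4, 5]
MENTIONS = [2, 1, 4, 3, 5]
print(spearman_rho(RATE, MENTIONS), kendall_tau_b(RATE, MENTIONS))
```

Rank-based measures suit this comparison because counts of mentions are heavily skewed, with most districts mentioned rarely and a few mentioned often.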

Page 28: Ig ihr 2012

Geographical Text Analysis

• Combining Corpus Linguistics and GIS allows us to follow two approaches:

– 1. Geographical approach:

• Ask which places the corpus is talking about

• Identify place-names in the areas that the corpus concentrates on

• Find out what it is saying about these places

– 2. Theme of interest approach:

• Find out which places are associated with our theme

• Find out what it is saying in relation to this theme

• Find out what other themes are associated with these places

• Compare the geography of place-name mentions with statistical evidence to explore biases in the sources

Page 29: Ig ihr 2012

Further work

• HistPop

• BL’s 19th Century Newspapers

• Other sources