Paolo Monella
In the Tower of Babel
Modelling primary sources of multi-testimonial textual transmissions
Digital Classicist & Institute of Classical Studies Seminar 2012
London, 20 July 2012
2
Outline
● The 'Babel issue'
– The 'Babel issue' in a nutshell
– Comparing texts
– In the Tower
● Two issues
A. Comparison at the linguistic layer
B. Comparison at the graphemic layer
4
The 'Babel issue' in a nutshell
● Each primary source uses a different writing (encoding, semiotic) system
● We want to compare the texts they bear
6
Comparing texts
● Each primary source uses a different writing (encoding, semiotic) system
● We want to compare the texts they bear
– Textual Criticism
– Processing (e.g. cross-corpus search)
7
Comparing texts
[Diagram: the same word in three sources. MS A: ꝑ̱ui’ (G); MS B: peruius (G); print edition C: pervius (G).]
8
Comparing texts
[Diagram: a query, pervius (G), run against MS A ꝑ̱ui’ (G) and MS B peruius (G).]
9
Comparing texts
● Simplification: same writing system (no 'Babel' issue)
– simple variant (at graphemic layer)
[Diagram: grapheme-by-grapheme alignment of MS A wyfe (G) with MS B wyffe (G): w ↔ w, y ↔ y, f ↔ f, (nothing) ↮ f, e ↔ e.]
10
Comparing texts
● Simplification: same writing system (no 'Babel' issue)
– no variant (at linguistic layer)
[Diagram: MS A wyfe (G) → wyfe (L); MS B wyffe (G) → wyffe (L); both → wyfe (W). G = grapheme, L = letter, W = word.]
11
Comparing texts
● The same text (reading)?
– Comparing texts at different layers
[Diagram: comparing at the graphemic layer, MS A wyfe (G) / wyfe (L) differs from MS B wyffe (G) / wyffe (L); comparing at the linguistic layer, both are the same word, wyfe (W).]
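The two comparisons can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `SPELLING_TO_WORD` and both function names are hypothetical, and the mapping from spelling to word is supplied by hand rather than computed.

```python
# Hypothetical regularization table: each attested spelling is mapped by
# hand to the inflected word it is taken to represent.
SPELLING_TO_WORD = {"wyfe": "wyfe", "wyffe": "wyfe"}


def same_at_graphemic_layer(reading_a: str, reading_b: str) -> bool:
    """Compare the readings as sequences of graphemes (the spelling)."""
    return reading_a == reading_b


def same_at_linguistic_layer(reading_a: str, reading_b: str) -> bool:
    """Compare the inflected words that the two spellings stand for."""
    return SPELLING_TO_WORD[reading_a] == SPELLING_TO_WORD[reading_b]


ms_a, ms_b = "wyfe", "wyffe"
print(same_at_graphemic_layer(ms_a, ms_b))   # a variant in spelling
print(same_at_linguistic_layer(ms_a, ms_b))  # no variant in the text
```

The same pair of readings is a variant at one layer and not at the other, which is why the layer of comparison has to be stated explicitly.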
12
Comparing texts
● Types of edition in the print age
– Graphemic layer (wyfe / wyffe): diplomatic edition (Pierazzo), “archive” edition (Vanhoutte)
– Linguistic layer (wyfe as a word): critical edition (Pierazzo), “museum” edition (Vanhoutte)
13
Comparing texts
● Digital edition
– Digital diplomatic edition (Pierazzo)
– Open source critical edition (Bodard-Garcés)
– Digital edition as model: sub-systems (Orlandi)
17
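In the Tower
[Diagram: one word in three sources. Graphemes: MS A ꝑ̱ui’ (G), MS B peruius (G), print edition C pervius (G). Letters: perUiUs (L) in MS A and MS B, pervius (L) in C. Word: peruius (W).]
● Different writing systems (graphemes):
A: ꝑ̱ (G), u (G) (no u (G) / v (G) distinction)
B: u (G) (no u (G) / v (G) distinction)
C: u (G), v (G)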
18
In the Tower
● Different alphabets (letters):
A: u (L) (no u (L) / v (L) distinction)
B: u (L) (no u (L) / v (L) distinction)
C: u (L), v (L)
19
In the Tower
● Same text at the linguistic layer (inflected word)
“Pervius”, nom. sing. masc. of lemma “pervius, -a, -um”
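A minimal sketch of how the three sources could be brought to a common letter layer, assuming each source has its own grapheme-to-letter table. All identifiers (`per_abbr`, `us_abbr`, the table and function names) are hypothetical; `U` stands for the single letter of an alphabet that does not distinguish u from v, as in perUiUs.

```python
# Per-source tables from grapheme identifiers to letters (hypothetical).
# "U" is the undifferentiated u/v letter of MS A and MS B.
GRAPHEME_TO_LETTERS = {
    "MS A": {"per_abbr": "per", "u": "U", "i": "i", "us_abbr": "Us"},
    "MS B": {"p": "p", "e": "e", "r": "r", "u": "U", "i": "i", "s": "s"},
    "PR C": {"p": "p", "e": "e", "r": "r", "u": "u", "v": "v",
             "i": "i", "s": "s"},
}

# The reading of each source as a sequence of grapheme identifiers.
READINGS = {
    "MS A": ["per_abbr", "u", "i", "us_abbr"],
    "MS B": ["p", "e", "r", "u", "i", "u", "s"],
    "PR C": ["p", "e", "r", "v", "i", "u", "s"],
}


def to_letters(source: str) -> str:
    """Expand the graphemes of one source into that source's letters."""
    table = GRAPHEME_TO_LETTERS[source]
    return "".join(table[g] for g in READINGS[source])


def letters_compatible(a: str, b: str) -> bool:
    """True if two letter strings agree, treating 'U' as 'u or v'."""
    if len(a) != len(b):
        return False

    def same(x: str, y: str) -> bool:
        if x == y:
            return True
        return "U" in (x, y) and {x, y} <= {"U", "u", "v"}

    return all(same(x, y) for x, y in zip(a, b))


print(to_letters("MS A"), to_letters("MS B"), to_letters("PR C"))
print(letters_compatible(to_letters("MS A"), to_letters("PR C")))
```

The comparison never looks at the graphemes of two sources side by side: each source is first interpreted within its own system, and only the resulting letters are compared.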
22
In the Tower
● Digital edition: formalisation
[Diagram: MS A, MS B and print edition C, each encoded as code (C); the MS A encoding uses the entities &#A751; and &us;.]
In the Tower
● Life in the Tower of Babel (recapitulation):
– Each builder, a language
  ● Each primary source, a semiotic system
– ...yet we want builders to interact
  ● Textual Criticism
  ● Processing (e.g. cross-corpus search)
– ...at different layers
  ● Graphs, allographs, graphemes, linguistic...
– ...all of a sudden, all builders are robots!
  ● Formalisation (no human intuition)
25
Two issues
A. Comparison at linguistic layer
– Substantial readings (wyfe (W) ≠ lyf (W))
– Critical edition
– The 'text' (inflected words)
B. Comparison at graphemic layer
– Accidentals (wyfe (G) ≠ wyffe (G))
– Diplomatic edition
– The 'spelling' (orthography, graphemes)
26
Two issues
A. Comparison at linguistic layer
– How do you identify elements at the linguistic layer (inflected words) digitally?
  ● The Canterbury Tales Project: “Regularized spelling”
  ● Tito Orlandi: Linguistic entities
27
Two issues
B. Comparison at graphemic layer
– Can you compare graphemes across different graphemic systems?
  ● The Canterbury Tales Project: Unicode; corpus-wide modelling of the graphemic system
  ● Tito Orlandi: MS-wide complete modelling of the graphemic system (and of other systems at other textual layers)
28
Two issues
● The Canterbury Tales Project
● Tito Orlandi, Informatica Testuale, Laterza: Roma 2010
● My own project of an experimental edition from the Anthologia Latina
30
A. Comparison at linguistic layer
● Why compare words (at the linguistic layer)? The Canterbury Tales Project:
– Textual criticism
  ● Relations between MSS
– Processing
  ● Indexing (“Spelling database”)
[Diagram: MS A wyfe (G) and MS B wyffe (G) both correspond to the word wyfe (W).]
31
A. Comparison at linguistic layer
● From Grapheme to Word, not Lemma
– Inflected word (wyfe (W) ≠ wyves (W))
  ● Not as an element of language (langue)
  ● But as an element of that text (parole)
– “Substantial reading”
– Critical edition
32
A. Comparison at linguistic layer
● From Grapheme to Word: semi-automatic
– Not computable so far
– Linguistic competence required
  ● Grapheme → Letter: ambiguous graphemes
    Ion (G) → Jon (L) / Ion (L) → Jon (W) / Ion (W)
    ꝑ̱do (G) → perdo (L) / prodo (L) → perdo (W) / prodo (W)
33
A. Comparison at linguistic layer
● From Grapheme to Word: semi-automatic
– Not computable so far
– Linguistic competence required
  ● Letter → Word: homographs
    est (G) → est (L) → est “she is” (W) / est “she eats” (W)
34
A. Comparison at linguistic layer
● From Grapheme to Word: semi-automatic
– Not computable so far
– Linguistic competence required
  ● Letter → Word: different spellings
    wyfe (G) / wyffe (G) → wyfe (L) / wyffe (L) → wyfe (W)
35
A. Comparison at linguistic layer
● From Grapheme to Word: semi-automatic
– Not computable so far
– Linguistic competence required
– 'Context': interpretation of the whole text
  ● Linguistic and higher-level competence (Jon / Ion)
36
A. Comparison at linguistic layer
● From Grapheme to Word: semi-automatic
– Semi-automatic procedures (human-driven, computer-assisted)
– “The computer collation program we are using (Collate) permits regularization as part of the collation process” [...] “regularized spelling” (Robinson-Solopova →)
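A human-driven, computer-assisted procedure of this kind might be sketched as follows. This is not how Collate works; it is only an illustration, and the candidate table, the function `resolve` and the `choose` callback are all hypothetical. The program settles the unambiguous cases and hands the ambiguous ones to an editor.

```python
from typing import Callable

# Hypothetical candidate table: graphemic forms and the words an editor
# might read them as (the forms are the examples used in the slides).
CANDIDATES = {
    "Ion": ["Jon", "Ion"],   # ambiguous: needs linguistic competence
    "wyffe": ["wyfe"],       # unambiguous spelling variant
}


def resolve(form: str, choose: Callable[[str, list[str]], str]) -> str:
    """Return the word for a graphemic form, asking a human when needed."""
    candidates = CANDIDATES.get(form, [form])
    if len(candidates) == 1:
        return candidates[0]         # decided automatically
    return choose(form, candidates)  # decided by the editor


def editor(form: str, candidates: list[str]) -> str:
    """Stand-in for an interactive prompt: here the editor picks 'Jon'."""
    return "Jon" if "Jon" in candidates else candidates[0]


print([resolve(form, editor) for form in ["wyffe", "Ion", "lyf"]])
```

In a real workflow `editor` would be an interactive step informed by the context of the whole text; the point of the sketch is only where the human decision enters.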
37
A. Comparison at linguistic layer
● How do you identify elements at the linguistic layer (inflected words) digitally?
– The Canterbury Tales Project: “Regularized spelling”
– Tito Orlandi: Linguistic entities
[Diagram: MS A wyfe (G), MS B wyffe (G) and the word wyfe (W), shown against binary code (C): 1000101, 0010100.]
38
A. Comparison at linguistic layer
● Canterbury: “Regularized spelling”
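[Diagram: MS A wyfe (G) and MS B wyffe (G) → word wyfe (W). The word is identified through its regularized graphemes w, y, f, e (G_reg), standing for the letters w, y, f, e (L) and encoded as the code points U+0077, U+0079, U+0066, U+0065 (C).]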
39
A. Comparison at linguistic layer
● Orlandi: Discrete linguistic entities in a database
[Diagram: MS A wyfe (G) and MS B wyffe (G) → word wyfe (W), encoded as a database identifier: ID_09834 (C).]
40
A. Comparison at linguistic layer
● Orlandi: Discrete linguistic entities in a database → How do you identify them?
[Diagram: MS A wyfe (G) and MS B wyffe (G) → word wyfe (W), identified by Lemma [wyfe] plus Morph [sing] (C).]
41
A. Comparison at linguistic layer
● Orlandi: Discrete linguistic entities in a database → How do you identify them?
[Diagram: MS A deorum (G) → deorum (W); MS B devm (G) → deum (W). Both words receive the same identifier: [deus1][gen.pl.masc] (C).]
42
A. Comparison at linguistic layer
● Orlandi: Discrete linguistic entities in a database → How do you identify them?
[Diagram: MS A deorum (G) → deorum (W); MS B devm (G) → deum (W). The two identifiers now differ: [deus1][gen.pl.masc.1] (C) and [deus1][gen.pl.masc.2] (C).]
43
A. Comparison at linguistic layer
● Orlandi: Discrete linguistic entities in a database → How do you identify them?
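[Diagram: MS A deorum (G) → deorum (W); MS B devm (G) → deum (W). Each word is identified by its regularized spelling together with lemma and morphology:
deum (G_reg) → U+0064 U+0065 U+0075 U+006D (C), with [deus1][gen.pl.masc] (C);
deorum (G_reg) → U+0064 U+0065 U+006F U+0072 U+0075 U+006D (C), with [deus1][gen.pl.masc] (C).]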
44
A. Comparison at linguistic layer
● Orlandi/Monella: is it worth it?
[Diagram: devm (G) → deum (W). Canterbury encodes the word as deum (G_reg); Orlandi/Monella as deum (G_reg) plus [deus1] [gen.plur.masc] (C).]
45
A. Comparison at linguistic layer
MS A] Pater devm; MS B] Timeo devm
deum (W) ≠ deum (W)
– Canterbury: MS A deum (G_reg) = MS B deum (G_reg)
– Orlandi/Monella: MS A deum (G_reg) [deus1] [gen.plur.masc] (C) ≠ MS B deum (G_reg) [deus1] [acc.sing.masc] (C)
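The difference between the two identification strategies can be sketched as below. The class `WordToken` and both comparison functions are hypothetical names introduced for illustration; the lemma and morphology labels are the ones shown on the slide.

```python
from typing import NamedTuple


class WordToken(NamedTuple):
    """One inflected word as recorded for one manuscript (hypothetical)."""
    regularized: str  # regularized spelling
    lemma: str        # lemma identifier
    morph: str        # morphological analysis


# "Pater devm" (MS A) and "Timeo devm" (MS B), as on the slide.
ms_a = WordToken("deum", "deus1", "gen.plur.masc")
ms_b = WordToken("deum", "deus1", "acc.sing.masc")


def same_by_regularized_spelling(a: WordToken, b: WordToken) -> bool:
    """Identify a word by its regularized spelling alone."""
    return a.regularized == b.regularized


def same_by_lemma_and_morphology(a: WordToken, b: WordToken) -> bool:
    """Identify a word by lemma plus morphological analysis."""
    return (a.lemma, a.morph) == (b.lemma, b.morph)


print(same_by_regularized_spelling(ms_a, ms_b))  # homographs look equal
print(same_by_lemma_and_morphology(ms_a, ms_b))  # homographs are told apart
```

Only the second comparison separates the homographs, at the cost of recording a lemma and a morphological analysis for every word.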
46
A. Comparison at linguistic layer
● It's only worth it
– if words are not their regularised spelling, i.e.
– if homograph words exist
deum (W) ≠ deum (W): Pater deum / Timeo deum
47
A. Comparison at linguistic layer
● It's only worth it
– if substantial readings are not their regularised spelling, i.e.
– if homograph substantial readings exist (do they? → Concept of “reading”)
MS A: Pater, devm Neptvnvm odi
MS B: Pater deum, Neptunum odi
deum ≠ deum?
48
A. Comparison at linguistic layer
MS A: Pater, devm Neptvnvm odi
MS B: Pater deum, Neptunum odi
deum ≠ deum?
– Canterbury, MS A vs MS B: deum (G_reg) = deum (G_reg)
– Orlandi/Monella, MS A vs MS B: deum (G_reg) [deus1] [gen.plur.masc] (C) ≠ deum (G_reg) [deus1] [acc.sing.masc] (C)
49
A. Comparison at linguistic layer
MS A: Pater, devm Neptvnvm odi
MS B: Pater deum, Neptunum odi
deum = deum?
– Canterbury, MS A vs MS B: deum (G_reg) = deum (G_reg)
– Orlandi/Monella, MS A vs MS B: deum (G_reg) [deus1] [gen.plur.masc] (C) ≠ deum (G_reg) [deus1] [acc.sing.masc] (C)
50
A. Comparison at linguistic layer
● Normally done via regularised spelling
● Inflected words are not their regularised spelling (homographs)
● Are there homograph substantial readings (deum)?
● If so:
– No regularised spelling
– But formal identifiers for inflected words (spelling, lemma, identifier)
52
B. Comparison at graphemic layer
● Why compare spellings (at the graphemic layer)?
– Historical Linguistics
  ● Evidence for lexical, morphological and phonetic evolution
53
B. Comparison at graphemic layer
● Why compare spellings (at the graphemic layer)?
– Palaeography
  ● Indirect evidence for letters (Benskin: letter þ thorn →)
– Overlapping with Historical Linguistics (Emiliano: “Scripto-linguistic change” →)
  ● Though only graphemes
– Not allographs (graphetes, s/ſ) or graphs (bitmap)
54
B. Comparison at graphemic layer
● Why compare spellings (at the graphemic layer)?
– Textual Criticism
  ● 'Orthographic' apparatus (The Hengwrt Chaucer Digital Facsimile, Collation function →)
  ● “Although for most manuscripts collation of the regularized text will produce sufficient information to place those manuscripts in genetic relation to one another [...]” (Robinson-Solopova →)
55
B. Comparison at graphemic layer
● The loue of love
● Loue: “A hill or mountain” ≠ Love: “Love”
● Middle English alphabet: u ≠ v
[Diagram: the alphabet: a (L) ... u (L), v (L) ...]
56
B. Comparison at graphemic layer
MS A: The loue of love (G)
MS B: The loue of loue (G)
[Diagram: shared alphabet a (L), u (L), v (L).]
58
B. Comparison at graphemic layer
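MS A: The lou(G)e of lov(G)e
MS B: The lou(G)e of lou(G)e
[Diagram: the letters a, u, v (L) are written with the graphemes a, u, v (G) in MS A, but with only the graphemes a, u (G) in MS B.]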
59
B. Comparison at graphemic layer
MS A: The lou(G)e of lov(G)e
MS B: The lou(G)e of lou(G)e
● Back in the Tower
– “Grapheme. (1) A minimally distinctive unit of writing in the context of a particular writing system” (Unicode Glossary →).
– De Saussure: relational nature of signs within a semiotic system
– The u (G) of MS A is a different grapheme from the u (G) of MS B: the former is in contrast with v (G), while the latter is not
65
B. Comparison at graphemic layer
● How do you encode graphemes digitally?
66
B. Comparison at graphemic layer
● TEI
– Unicode → →
● The Canterbury Tales Project
– Corpus-wide definition of the graphemic system →
67
B. Comparison at graphemic layer
[Diagram: Unicode encoding of the u/v graphemes. MS A: lou(G)e → U+0075 (C), lov(G)e → U+0076 (C). MS B: lou(G)e → U+0075 (C), lou(G)e → U+0075 (C).]
● MS A lov(G)e vs MS B lou(G)e (U+0076 vs U+0075): no match, different graphemes
● MS A lou(G)e vs MS B lou(G)e (U+0075 vs U+0075): match, same grapheme
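A comparison based only on Unicode code points can be sketched as follows (the function names are hypothetical). It shows what such a comparison can and cannot see: it reports equality of code points and knows nothing about the graphemic system each manuscript uses.

```python
def code_points(text: str) -> list[str]:
    """Return the Unicode code point of each character, e.g. 'U+0075'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def compare_by_code_point(a: str, b: str) -> list[bool]:
    """Position-by-position equality of two equally long spellings."""
    return [x == y for x, y in zip(code_points(a), code_points(b))]


# MS A "love" against MS B "loue": the third grapheme does not match.
print(code_points("love"))
print(code_points("loue"))
print(compare_by_code_point("love", "loue"))

# MS A "loue" against MS B "loue": every position matches, although the
# u of MS B belongs to a system with no u/v contrast.
print(compare_by_code_point("loue", "loue"))
```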
70
B. Comparison at graphemic layer
● Orlandi
– MS-wide complete definition of graphemic system
  ● Not corpus-wide (Canterbury)
  ● Not world-wide (Unicode)
71
B. Comparison at graphemic layer
<teiHeader>
  <encodingDesc>
    <charDecl>
      <char xml:id="uv"/> <!-- graphemes -->
      <glyph xml:id="long_s"/> <!-- allographs -->
    </charDecl>
  </encodingDesc>
</teiHeader>
72
B. Comparison at graphemic layer
<body>
  <p>
    <g ref="#uv"/>
    <!-- or, better, declare an entity:
         <!ENTITY uv '<g ref="#uv"/>'>
         and write: -->
    &uv;
  </p>
</body>
73
B. Comparison at graphemic layer
[Diagram: in MS A, u (G) stands for u (L) and v (G) for v (L); in MS B, the single grapheme u (G) stands for both u (L) and v (L).]
74
B. Comparison at graphemic layer
MS A: <charDecl> <char xml:id="a"/> <char xml:id="u"/> <char xml:id="v"/> </charDecl>; the two graphemes are encoded as U+0075 (C) and U+0076 (C)
MS B: <charDecl> <char xml:id="a"/> <char xml:id="uv"/> </charDecl>; the grapheme is encoded as &uv; (C) in both words
75
B. Comparison at graphemic layer
MS B:
<charDecl>
  <char xml:id="uv">
    <charName>SMALL LATIN U OR V</charName>
    <desc>U-shaped when lowercase, V-shaped when uppercase. Content: either small letter Latin u or small letter Latin v</desc>
  </char>
</charDecl>
76
B. Comparison at graphemic layer
MS B:
<charDecl>
  <char xml:id="uv">
    <charProp>
      <localName>Expression</localName>
      <value>U+0075</value>
    </charProp>
    <charProp>
      <localName>Content</localName>
      <value>u|v</value>
    </charProp>
  </char>
</charDecl>
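One way to read the Expression and Content properties is as a per-manuscript table that a program can query. The sketch below assumes such tables have already been extracted from each manuscript's declaration; the dictionaries and the function `may_correspond` are hypothetical, and the MS A entries are inferred from its u/v distinction rather than quoted from a declaration.

```python
# Hypothetical MS-wide grapheme tables: for each grapheme identifier, its
# expression (the code point used to display it) and its content (the
# letters it can stand for).
MS_A = {
    "u": {"expression": "U+0075", "content": {"u"}},
    "v": {"expression": "U+0076", "content": {"v"}},
}
MS_B = {
    "uv": {"expression": "U+0075", "content": {"u", "v"}},
}


def may_correspond(table_a: dict, id_a: str, table_b: dict, id_b: str) -> bool:
    """Two graphemes from different systems may correspond if their
    contents share at least one letter."""
    return bool(table_a[id_a]["content"] & table_b[id_b]["content"])


# The u of MS A and the uv of MS B share an expression, but not a content.
print(MS_A["u"]["expression"] == MS_B["uv"]["expression"])
print(MS_A["u"]["content"] == MS_B["uv"]["content"])

# The v of MS A may correspond to the uv of MS B despite different shapes.
print(may_correspond(MS_A, "v", MS_B, "uv"))
```

Comparing contents instead of expressions moves the comparison from the shape of the sign to the letters it can represent, which is where the two systems can actually be related.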
78
B. Comparison at graphemic layer
[Diagram: two minims (ıı) can be the grapheme n (G), standing for n (L), or the grapheme u (G), standing for u (L).]
Two minims, lowercase
Expression (shape): indistinguishable. Content (letter): n or u.
The reader does not identify the grapheme from its shape, but by guessing its content.
Graphematic information is conveyed not by graphic information but by linguistic information (context): the scribe relied on the reader's linguistic competence to tell the graphemes apart.
79
B. Comparison at graphemic layer
[Diagram: MS A has the graphemes a, u, v (G). In MS B the grapheme u is split into u-u (G), standing for u (L), and u-v (G), standing for v (L).]
80
B. Comparison at graphemic layer
MS A: <charDecl> <char xml:id="a"/> <char xml:id="u"/> <char xml:id="v"/> </charDecl>; encoded as U+0075 (C) and U+0076 (C)
MS B: <charDecl> <char xml:id="a"/> <char xml:id="uu"/> <char xml:id="uv"/> </charDecl>; encoded as &u-u; (C) and &u-v; (C)
81
B. Comparison at graphemic layer
● Fun
● But how do you compare graphemes now?
83
B. Comparison at graphemic layer
● RDF/OWL? (Or other relationship-defining technology)
84
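B. Comparison at graphemic layer
● When the game gets tough...
– Different alphabets (Middle English, thorn letter)
[Diagram: one system has the letters a, h, t (L) and the graphemes a, h, t (G); the other has the letters a, h, t, þ (L) and the graphemes a, h, t, þ (G).]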
85
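B. Comparison at graphemic layer
● When the game gets tough...
– Different alphabets, shared phonetics (Middle English, thorn letter)
[Diagram: the two alphabets, a, h, t (L) and a, h, t, þ (L), each with its own graphemes (G), are related through a shared phonetic layer: a, h, t, þ (P).]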
86
B. Comparison at graphemic layer
● When the game gets tougher...
– No 1:1 grapheme / letter ratio (Huitfeldt →)
  ● Brevigraphs as graphemes (up to 40%)
  ● “Grapheme. (1) A minimally distinctive unit of writing in the context of a particular writing system” (Unicode Glossary →).
[Diagram: the letters a, p, e, r (L) are written with the graphemes a, p, e, r (G) in one system, and with a, p, e, r plus the brevigraph ꝑ̱ (G) in another.]
88
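B. Comparison at graphemic layer
● When the game gets most tough...
– (Classicists start to play)
– Different alphabets, different phonetics (Ancient Latin)
[Diagram: an alphabet with the letters a, u, v (L) and graphemes a, u, v (G) next to an alphabet with the letters a, v (L) and graphemes a, v (G); the phoneme inventories differ as well: a, u, w (P) and a, u, w, v (P).]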
89
B. Comparison at graphemic layer
● When the game gets most tough...
– (Classicists start to play)
– Different alphabets, different phonetics (Ancient Latin)
– Still possible to formalise relationships
90
B. Comparison at graphemic layer
● Description of graphemic system
– MS-wide
– Complete
● Formalisation of relationships between graphemes in different graphemic systems
– Involving higher-level entities (letters, phonemes)
● Intelligent searches, intelligent results
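As an illustration of what such a search could look like, the sketch below runs a letter-level query against two manuscripts, each interpreted through its own grapheme table. The tables, the toy corpus and the function names are hypothetical; graphemes not listed in a table are assumed to stand for the letter of the same shape.

```python
# Hypothetical MS-wide tables: the letters each listed grapheme can stand for.
CONTENT = {
    "MS A": {"u": {"u"}, "v": {"v"}},
    "MS B": {"u": {"u", "v"}},
}

# Toy corpus, one list of graphemic words per manuscript.
CORPUS = {
    "MS A": ["the", "loue", "of", "love"],
    "MS B": ["the", "loue", "of", "loue"],
}


def word_matches(query: str, ms: str, graphemes: str) -> bool:
    """True if the letters of the query can be read from the graphemes."""
    if len(query) != len(graphemes):
        return False
    table = CONTENT[ms]
    return all(q in table.get(g, {g}) for q, g in zip(query, graphemes))


def search(query: str) -> list[tuple[str, int]]:
    """Return (manuscript, word position) for every possible match."""
    return [
        (ms, position)
        for ms, words in CORPUS.items()
        for position, word in enumerate(words)
        if word_matches(query, ms, word)
    ]


print(search("love"))
print(search("loue"))
```

A query for the letters "love" finds the word spelled loue in MS B but not the one spelled loue in MS A, because only in MS B can the grapheme u stand for the letter v.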