21st Century Classics Gregory Crane Professor and Chair Department of Classics Adjunct Professor of...
- Slide 1
- 21st Century Classics Gregory Crane Professor and Chair
Department of Classics Adjunct Professor of Computer Science
Winnick Family Chair of Technology and Entrepreneurship
- Slide 2
- Changing Face of Education
- Slide 3
- The Gauntlet
- Slide 4
- Science Majors The Classes have grown more challenging and
demanding
- Slide 5
- Science Majors The Classes have grown more challenging and
demanding Education has focused upon recognizable problems of
various levels
- Slide 6
- Science Majors The Classes have grown more challenging and
demanding Education has focused upon recognizable problems of
various levels Concrete methods for contribution at multiple
levels
- Slide 7
- Laboratory Culture Freshman Term Paper 2009 Linguistic Patterns
in Cambridge, MA 1930 Atlas of the City Interviews digitally
recorded in 2009 Phonetic patterns analyzed with open source
software Study of linguistic change over 80 years Production of new
knowledge
- Slide 8
- The world is already digital
- Slide 9
- 2009 Tenure Track Job Candidates must have a strong record of
scholarship and teaching and a commitment to expand the use of
Greek and Latin within Tufts and beyond. Candidates should be
comfortable teaching classes at any level in either language. The
department seeks a candidate who can advance the study of classics
within an interdisciplinary context. We especially welcome
candidates who can support contributions to and original research
by undergraduates as well as MA students within the field of
Classics. Where are the digital humanities in this job
description?
- Slide 10
- 2009 Tenure Track Job Candidates must have a strong record of
scholarship and teaching and a commitment to expand the use of
Greek and Latin within Tufts and beyond. Candidates should be
comfortable teaching classes at any level in either language. The
department seeks a candidate who can advance the study of classics
within an interdisciplinary context. We especially welcome
candidates who can support contributions to and original research
by undergraduates as well as MA students within the field of
Classics. Where are the digital humanities in this job
description?
- Slide 11
- Changing scales of research
- Slide 12
- Depth Quality Scale Breadth
- Slide 13
- Quality
- Slide 14
- Depth
- Slide 15
- Machine actionable interpretation
- Slide 16
- Scale
- Slide 17
- How much Latin? Classical Latin (200 BCE - 500 CE) PHI Disk c. 5
million (through c. 200 CE) Total corpus c. 50 million (probably less)
Current working collection 9,000 books from 27,000 Latin books
dated 380 million words that really are Latin Total corpus of Latin
through 1800 Billions of words
- Slide 18
- A Classic Lexicon Project
- Slide 19
- 1894 -- work begins
- Slide 20
- A Classic Lexicon Project 1894 -- work begins 10 million slips
with keyword in context
- Slide 21
- A Classic Lexicon Project 1894 -- work begins 10 million slips
with keyword in context 2010 -- current status
- Slide 22
- A Classic Lexicon Project 1894 -- work begins 10 million slips
with keyword in context 2010 -- current status 20 FTE at work
- Slide 23
- A Classic Lexicon Project 1894 -- work begins 10 million slips
with keyword in context 2010 -- current status 20 FTE at work C.
67% of the lexicon complete
- Slide 24
- A Classic Lexicon Project 1894 -- work begins 10 million slips
with keyword in context 2010 -- current status 20 FTE at work C.
67% of the lexicon complete 2050? -- completion of the project
- Slide 25
- A Classic Lexicon Project 1894 -- work begins 10 million slips
with keyword in context 2010 -- current status 20 FTE at work C.
67% of the lexicon complete 2050? -- completion of the project What
do we do with a billion words?
- Slide 26
- A Classic Lexicon Project 1894 -- work begins 10 million slips
with keyword in context 2010 -- current status 20 FTE at work C.
67% of the lexicon complete 2050? -- completion of the project What
do we do with a billion words? 10 billion words?
- Slide 27
- A Scalable Lexicon Project
- Slide 28
- Breadth
- Slide 29
- Time
- Slide 30
- Space
- Slide 31
- Digital Humanities balance two forces Absolute necessity to
work with far more content than we can ever read and far more
languages than we could ever learn.
- Slide 32
- Digital Humanities balance two forces Absolute necessity to
work with far more content than we can ever read and far more
languages than we could ever learn. The need to read slowly and to
think about every word and phrase from every angle.
- Slide 33
- Philological Reading Philology is that venerable art which
requires of those who honor her one thing above all: to turn aside,
to take one's time, to become still and slow.... Precisely for this
reason, she is more necessary today than ever, precisely on this
account, she attracts and enchants us most powerfully, in an age of
"work," which is to say, haste, the unseemly and sweating hurry
that wants to be "done" with everything right away, even with every
old and new book. She herself will not so easily be done with
anything, she instructs reading well, that means, slowly, deeply,
carefully, regardfully, looking forward and backward, with second
thoughts, with doors left open, reading with delicate fingers and
eyes.... F. Nietzsche, Morgenröte (1881)
- Slide 34
- One answer The re-emergence of editing as a primary activity
The definition of tasks that have tangible value in the real world and
that begin to be accessible at an early stage
- Slide 35
- One answer The re-emergence of editing as a primary activity
The definition of tasks that have tangible value in the real world and
that begin to be accessible at an early stage Example: the
commented edition and translation as undergraduate thesis
- Slide 36
- Venetus A MS of Homer
- Slide 37
- Diplomatic Edition by a class
- Slide 38
- What are new elements in editing?
- Slide 39
- Annotations not predicated on error
- Slide 40
- Editions --> Visualization: Perseus Herodotus --> Hestia
Proj.
- Slide 41
- Syntactic Analysis (Treebanks) (Homer, Il. 6.1)
- Slide 42
- Machine actionable interpretation
- Slide 43
- Slide 44
- Iliad 6 Treebank by a class
- Slide 45
- Another answer
- Slide 46
- Expository narrative Machine actionable annotations
- Slide 47
- 21st Century Classics
- Slide 48
- Who represents Greece and Rome?
- Slide 49
- Slide 50
- Slide 51
- Slide 52
- From Rabat to Kandahar
- Slide 53
- Who founded Kandahar and what was its original name?
- Slide 54
- Alexander the Great Alexandria
- Slide 55
- Who was the most important classicist of the 20th century?
- Slide 56
- Sometimes political philosophers do have an impact... Plato's
Republic and the Guardians The Islamic Republic of Iran and the
Guardianship of Islamic Jurists
- Slide 57
- How would you go about studying the impact of Plato in Islamic
thought?
- Slide 58
- Slide 59
- Slide 60
- Classics at the U of C
- Slide 61
- Particular emphasis on The School of Alexandria and its
influence The Translation Movement from Greek into Syriac and
Arabic The Relations of the Ancient Arabs and the Greco- Roman
World The Translation of Arabic into Latin and its effect upon the
literary Renaissance
- Slide 62
- Classicists
- Slide 63
- Hisham and Farouk at Furman
- Slide 64
- Where is the English?
- Slide 65
- Slide 66
- Goals Learner corpora -- how much have you mastered? How much
can you transfer to new material? Customized assessment of
corpus/competence User portfolios Aggregation of increasingly
sophisticated contributions Undergraduate research projects
Automatically linked to relevant texts, sites, objects
- Slide 67
- Goals for 2010/2011 Canonical Text Services Protocol middleware
for DuraCloud Open Greek and Latin exams for students in the
English-speaking world based upon student-defined corpora.
- Slide 68
- Thank you!
- Slide 69
- Categories of Development Transform existing research
Integrated Papyri, Homer Multitext Enable new areas of research
More people using papyrological data Physical access -- done
Intellectual access -- can be addressed
- Slide 70
- Transforming Classics Enhancing what scholars can do
- Slide 71
- Transforming Classics Enhancing what scholars can do Lowering
barriers to entry
- Slide 72
- Transforming Classics Enhancing what scholars can do Lowering
barriers to entry Developing a global, multilingual, multiethnic
intellectual community
- Slide 73
- Transforming Classics Enhancing what scholars can do Lowering
barriers to entry Developing a global, multilingual, multiethnic
intellectual community
- Slide 74
- Funded Projects Greek and Latin Treebanks (Cantus) Greco-Arabic
(Mellon) Mining a Million Books (NSF) Digging into Data
(NEH/JISC/SSHRC) Google Digital Humanities Hellespont: Arachne and
Perseus -- DFG/NEH
- Slide 75
- What can you do?
- Slide 76
- Build up a portfolio of what Greek and/or Latin you have
mastered
- Slide 77
- What can you do? Build up a portfolio of what Greek and/or
Latin you have mastered Ask for an evaluation of your knowledge of
this corpus and of Greek and Latin
- Slide 78
- What can you do? Build up a portfolio of what Greek and/or
Latin you have mastered Ask for an evaluation of your knowledge of
this corpus and of Greek and Latin Look for ways to make a tangible
contribution
- Slide 79
- What can you do? Build up a portfolio of what Greek and/or
Latin you have mastered Ask for an evaluation of your knowledge of
this corpus and of Greek and Latin Look for ways to make a tangible
contribution Treebank -- how many sentences?
- Slide 80
- What can you do? Build up a portfolio of what Greek and/or
Latin you have mastered Ask for an evaluation of your knowledge of
this corpus and of Greek and Latin Look for ways to make a tangible
contribution Treebank -- how many sentences? XML tagging? GIS
analysis?
- Slide 81
- What can you do? Think about an MA thesis that is a publishable
contribution.
- Slide 82
- What can you do? Think about an MA thesis that is a publishable
contribution. Publish an inscription, a medieval text, a canonical
work
- Slide 83
- What can you do? Think about an MA thesis that is a publishable
contribution. Publish an inscription, a medieval text, a canonical
work Analyze some data about a word, a text, a site, a topic
- Slide 84
- What can you do? Think about an MA thesis that is a publishable
contribution. Publish an inscription, a medieval text, a canonical
work Analyze some data about a word, a text, a site, a topic Do
something!
- Slide 85
- Good luck!
- Slide 86
- Treebanks and Parallel Text Analysis David Bamman The Perseus
Project
- Slide 87
- Parallel Text Analysis Driven in large part by statistical MT
for modern languages (French/English, German/English,
Arabic/English etc). Parliamentary proceedings (Canadian Hansards,
Europarl, UN) Legal/government docs (JRC Acquis) Historical texts
have often been translated many times into several different
languages. Perseus: 4.9M Greek/6.8M English; 3.4M Latin/5M
English.
- Slide 88
- Parallel Texts The Internet Archive alone contains editions of
Horace's Odes in eight different languages Latin: carpe diem quam
minimum credula postero (Horace, Ode 1.11) English: Seize the
present; trust tomorrow e'en as little as you may (Conington 1872)
French: Cueille le jour, et ne crois pas au lendemain (De Lisle
1887) Early Modern French: Jouissez donc en repos du jour present,
& ne vous attendez point au lendemain (Dacier 1681) Italian: tu
l'oggi goditi: e gli stolti al domani s'affidino (Chiarini 1916)
Spanish: Coge este dia, dando muy poco credito al siguiente (Campos
and Minguez 1783) Portuguese: colhe o dia, do de amanh a mui pouco
confiando (Duriense 1807) German: Pflücke des Tags Blüten, und nie
traue dem morgenden (Schmidt 1820)
- Slide 89
- Dynamic Lexicon http://nlp.perseus.tufts.edu/lexicon
- Slide 90
- Sense Discovery SMT based on Brown et al. (1990) Different
senses for a word in one language are translated by different words
in another. English "bank": financial institution = French banque;
side of a river = French rive (e.g., la rive gauche)
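A minimal sketch of the sense-discovery idea on this slide, assuming word alignments are already available as (source word, target word) pairs. The toy pairs, the `min_share` threshold, and the function name `translation_senses` are hypothetical illustrations, not Perseus data or code.

```python
from collections import Counter, defaultdict

def translation_senses(aligned_pairs, min_share=0.2):
    """Group the translations of each source word; keep frequent ones as candidate senses.

    aligned_pairs: iterable of (source_word, target_word) pairs from a word aligner.
    min_share: hypothetical cutoff; translations rarer than this share are dropped.
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1

    senses = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        senses[src] = {
            tgt: n / total
            for tgt, n in tgt_counts.items()
            if n / total >= min_share
        }
    return senses

# Toy aligned pairs (invented for illustration).
pairs = [
    ("bank", "banque"), ("bank", "banque"), ("bank", "rive"),
    ("bank", "banque"), ("bank", "rive"), ("river", "fleuve"),
]
senses = translation_senses(pairs)
print(senses["bank"])  # two candidate senses: "banque" and "rive"
```

Each distinct translation that survives the cutoff is treated as evidence for a separate sense of the source word.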
- Slide 91
- Progressive Alignment Sentence level: Moore's Bilingual Sentence
Aligner (Moore 2002) aligns sentences that are 1-1 translations of
each other w/ high precision (98.5% on a corpus of 10K
English-Hindi sentences) Word level: MGIZA++ (Gao and Vogel 2008)
parallel version of: GIZA++ (Och and Ney 2003) - implementation of
IBM Models 1-5.
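To illustrate what word-level alignment estimates, here is a minimal sketch of IBM Model 1 only, the simplest of the five models, trained with expectation-maximization on a toy corpus. It is not GIZA++ or MGIZA++ code; it omits the NULL word, and the Latin/English sentence pairs and function names are invented for illustration.

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=20):
    """Estimate t[(tgt, src)] = P(tgt | src) with EM (IBM Model 1, no NULL word).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    """
    tgt_vocab = {w for _, tgt in corpus for w in tgt}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (tgt, src) links
        total = defaultdict(float)  # expected count of links per src word
        # E-step: distribute each target word over the source words in its sentence.
        for src, tgt in corpus:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-normalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def best_translation(t, src_word):
    """Return the target word with the highest P(tgt | src_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == src_word}
    return max(candidates, key=candidates.get)

# Toy sentence pairs (invented for illustration).
corpus = [
    (["rosa", "alba"], ["white", "rose"]),
    (["rosa", "rubra"], ["red", "rose"]),
    (["toga", "alba"], ["white", "toga"]),
]
t = ibm_model1(corpus)
print(best_translation(t, "rosa"), best_translation(t, "alba"))
```

Because "rosa" co-occurs with "rose" in two sentences and "alba" with "white" in two, EM concentrates probability on those pairs even though no alignment is given in advance.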
- Slide 92
- Tufts cluster: 40 nodes, each w/ two 2.83 GHz Quad-Core Xeon
processors (= 320 cores). Impact: two 1M-word alignments
(English->Greek, Greek->English) on a single 2 GHz Mac Pro: 15
hours. Two (simultaneous) 5M-word alignments on the computing cluster
using the multi-threaded version (i.e., on one 8-core node): 45
minutes.
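The timings on this slide can be turned into a rough throughput comparison. The sketch below assumes that "1M" and "5M" are the number of words in each alignment run and that the two runs in each case are counted together; the function name is hypothetical.

```python
def words_per_hour(total_words, hours):
    """Throughput of an alignment job in words per hour."""
    return total_words / hours

# Two 1M-word alignments in 15 hours on the single Mac Pro.
mac_pro = words_per_hour(2 * 1_000_000, 15)
# Two 5M-word alignments in 45 minutes (0.75 hours) on one 8-core cluster node.
cluster_node = words_per_hour(2 * 5_000_000, 0.75)

speedup = cluster_node / mac_pro
print(f"{mac_pro:,.0f} vs {cluster_node:,.0f} words/hour, about {speedup:.0f}x")
```

Under these assumptions the cluster node handles five times the data in one twentieth of the time, roughly a hundredfold gain in throughput.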
- Slide 93
- Multilingual Alignment Word-level alignment of Homer's
Odyssey
- Slide 94
- Latin/Greek English Senses
- Slide 95
- English Greek/Latin Senses
- Slide 96
- Use #1: Automatic Bilingual Dictionaries
http://nlp.perseus.tufts.edu/lexicon
- Slide 97
- Use #2: Interlinear translations
- Slide 98
- Use #2: Interlinear translations
- Slide 99
- Use #3: Bootstrapping Multilingual Digital Library
http://www.perseus.tufts.edu
- Slide 100
- Multilingual Digital Libraries http://www.worldofdante.org
- Slide 101
- TEI XML Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et
Sequana dividit. Horum omnium fortissimi sunt (The Garonne river
separates the Gauls from the Aquitani and the Marne and the Seine
(rivers) separate them from the Belgae. The bravest of all of these
are )
- Slide 102
- Solution: Markup Transfer 1. Alignment of the source document
with the target document in a cascading process: document ->
sentence -> word 2. Projection of XML tags in the source document
to the target document in a way that exploits the linguistic
similarity of the text pair.
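The projection step can be sketched as follows, assuming the word alignment has already been computed. This is a simplified illustration, not the Perseus implementation: the abridged token lists, the hand-written alignment, the choice of the TEI `placeName` tag, and the function names are all assumptions, and the renderer only handles non-overlapping spans.

```python
def project_spans(spans, alignment):
    """Project tagged spans from source tokens onto target tokens.

    spans: list of (start, end, tag) over source token indices (end inclusive).
    alignment: dict mapping a source index to a list of target indices.
    Returns (start, end, tag) spans over target token indices.
    """
    projected = []
    for start, end, tag in spans:
        targets = [j for i in range(start, end + 1) for j in alignment.get(i, [])]
        if targets:  # skip spans whose words have no aligned counterpart
            projected.append((min(targets), max(targets), tag))
    return projected

def render(tokens, spans):
    """Write tokens back out with XML tags around each (non-overlapping) span."""
    opens = {start: tag for start, _, tag in spans}
    closes = {end: tag for _, end, tag in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in opens:
            tok = f"<{opens[i]}>{tok}"
        if i in closes:
            tok = f"{tok}</{closes[i]}>"
        out.append(tok)
    return " ".join(out)

# Abridged version of the Caesar sentence; alignment written by hand.
latin = ["Gallos", "ab", "Aquitanis", "Garumna", "flumen", "dividit"]
english = ["The", "Garonne", "river", "separates", "the", "Gauls", "from", "the", "Aquitani"]
alignment = {0: [5], 1: [6], 2: [8], 3: [1], 4: [2], 5: [3]}

source_spans = [(3, 4, "placeName")]  # "Garumna flumen" tagged in the Latin
target_spans = project_spans(source_spans, alignment)
print(render(english, target_spans))
```

The tag placed on "Garumna flumen" ends up around "Garonne river" without anyone marking up the English by hand.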
- Slide 103
- Bootstrapping a Multilingual DL Expands depth of translations
in a collection to expand the reach of inquiry.
- Slide 104
- Treebanks Annotated corpora where the syntactic role and head
of each word in a sentence is made explicit.
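One way to picture such an annotation is as a list of tokens, each recording its head and its syntactic role. The sketch below is an assumed, simplified representation, not the treebank's actual file format; the class, field names, and the labels `PRED` and `OBJ` are chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Token:
    id: int        # position in the sentence, starting at 1
    form: str      # the word as it appears in the text
    head: int      # id of the word this token depends on; 0 marks the root
    relation: str  # syntactic role (illustrative labels)

# "carpe diem": the verb is the root, "diem" is its object.
sentence = [
    Token(1, "carpe", 0, "PRED"),
    Token(2, "diem", 1, "OBJ"),
]

def dependents(sentence, head_id):
    """Return the forms of all tokens attached to the given head."""
    return [tok.form for tok in sentence if tok.head == head_id]

def is_tree(sentence):
    """Check that there is exactly one root and every token reaches it without a cycle."""
    heads = {tok.id: tok.head for tok in sentence}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for start in heads:
        seen, current = set(), start
        while current != 0:
            if current in seen or current not in heads:
                return False
            seen.add(current)
            current = heads[current]
    return True

print(dependents(sentence, 1), is_tree(sentence))
```

Making head and role explicit for every word is what lets a query such as "all objects of this verb" be answered mechanically.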
- Slide 105
- Historical treebanks Most recent research and investment in
treebanks has focused on modern languages, but treebanks for
historical languages are now arising as well: Middle English (Kroch
and Taylor 2000) Medieval Portuguese (Rocio et al. 2000) Classical
Chinese (Huang et al. 2002) Old English (Taylor et al. 2003) Early
Modern English (Kroch et al. 2004) Latin (Bamman and Crane 2006,
Passarotti 2007) Ugaritic (Zemánek 2007) New Testament Greek, Latin,
Gothic, Armenian, Church Slavonic (Haug and Jøhndal 2008)
- Slide 106
- Prague Arabic Dependency Treebank
- Slide 107
- Latin Dependency Treebank
  Author       Words
  Caesar       1,488
  Cicero       6,229
  Sallust      12,311
  Vergil       2,613
  Jerome       8,382
  Ovid         4,789
  Petronius    12,474
  Propertius   4,857
  Total        53,143
- Slide 108
- Ancient Greek Dependency Treebank
  Work                     Words
  Aeschylus (complete)     48,158
  Hesiod, Works and Days   6,303
  Homer, Iliad             38,390
  Homer, Odyssey           99,353
  Total                    192,204
- Slide 109
- Building Treebanks Solicit annotations from two independent
annotators; reconcile differences between them. Background: ranges
from advanced undergraduates to PhDs and professors, with the
majority being students in graduate programs in Classics. Average
speed: 124 words per hour. Interannotator accuracy: attachment
(ATT), label (LAB), labeled attachment (LABATT):
  Work             ATT     LAB     LABATT
  Hesiod, W&D      85.1%   85.9%   79.5%
  Homer, Iliad     87.1%   83.2%   79.3%
  Homer, Odyssey   87.5%   85.7%   80.9%
  Total            87.4%   85.3%   80.6%
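The three agreement figures can be computed along these lines. The slide does not spell out the exact formula, so this sketch assumes each score is the share of words on which the two annotators agree: on the head (ATT), on the label (LAB), or on both (LABATT). The function name, labels, and toy annotations are hypothetical.

```python
def agreement(annotator_a, annotator_b):
    """Return (ATT, LAB, LABATT) agreement between two annotations of one sentence.

    Each annotation is a list of (head, label) pairs, one per word, in text order.
    """
    if len(annotator_a) != len(annotator_b) or not annotator_a:
        raise ValueError("annotations must cover the same, non-empty word list")
    n = len(annotator_a)
    pairs = list(zip(annotator_a, annotator_b))
    att = sum(a[0] == b[0] for a, b in pairs) / n     # same head
    lab = sum(a[1] == b[1] for a, b in pairs) / n     # same label
    labatt = sum(a == b for a, b in pairs) / n        # same head and label
    return att, lab, labatt

# Toy four-word sentence annotated twice (invented for illustration).
first = [(2, "SBJ"), (0, "PRED"), (2, "OBJ"), (3, "ATR")]
second = [(2, "SBJ"), (0, "PRED"), (2, "ADV"), (2, "ATR")]

att, lab, labatt = agreement(first, second)
print(f"ATT {att:.1%}  LAB {lab:.1%}  LABATT {labatt:.1%}")
```

LABATT can never exceed either ATT or LAB, which matches the pattern in the figures reported on the slide.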
- Slide 110
- Student Contributions...
- Slide 111
- Syntax in the Dynamic Lexicon
- Slide 112
- URLs
  Treebank data: http://nlp.perseus.tufts.edu/syntax/treebank/
  Treebank annotation environment: http://nlp.perseus.tufts.edu/hopper/
  Translation information: http://nlp.perseus.tufts.edu/hopper/sense.jsp
  Greek lexicon: http://nlp.perseus.tufts.edu/lexicon/