21st Century Classics Gregory Crane Professor and Chair Department of Classics Adjunct Professor of Computer Science Winnick Family Chair of Technology and Entrepreneurship
Slide 2
Changing Face of Education
Slide 3
The Gauntlet
Slide 6
Science Majors
The classes have grown more challenging and demanding
Education has focused upon recognizable problems of various levels
Concrete methods for contribution at multiple levels
Slide 7
Laboratory Culture
Freshman Term Paper 2009: Linguistic Patterns in Cambridge, MA
1930 Atlas of the City
Interviews digitally recorded in 2009
Phonetic patterns analyzed with open source software
Study of linguistic change over 80 years
Production of new knowledge
Slide 8
The world is already digital
Slide 9
2009 Tenure Track Job
Candidates must have a strong record of scholarship and teaching and a commitment to expand the use of Greek and Latin within Tufts and beyond. Candidates should be comfortable teaching classes at any level in either language. The department seeks a candidate who can advance the study of classics within an interdisciplinary context. We especially welcome candidates who can support contributions to and original research by undergraduates as well as MA students within the field of Classics.
Where are the digital humanities in this job description?
Slide 11
Changing scales of research
Slide 12
Depth Quality Scale Breadth
Slide 13
Quality
Slide 14
Depth
Slide 15
Machine actionable interpretation
Slide 16
Scale
Slide 17
How much Latin?
Classical Latin (200 BCE - 500 CE)
PHI Disk: c. 5 million words (through c. 200 CE)
Total corpus: c. 50 million words (probably less)
Current working collection
9,000 books from 27,000 Latin books dated
380 million words that really are Latin
Total corpus of Latin through 1800
Billions of words
Slide 26
A Classic Lexicon Project
1894 -- work begins
10 million slips with keyword in context
2010 -- current status
20 FTE at work
c. 67% of the lexicon complete
2050? -- completion of the project
What do we do with a billion words? 10 billion words?
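The "keyword in context" slips mentioned above have a direct computational analogue. The following is a minimal sketch only, assuming plain whitespace-and-punctuation tokenization; the function name `kwic`, the `width` parameter, and the sample sentence are illustrative choices, not part of the lexicon project's own workflow.

```python
import re

def kwic(text, keyword, width=3):
    """Return one (left context, keyword, right context) 'slip' per occurrence."""
    # Naive tokenization: lowercase runs of word characters (an assumption).
    tokens = re.findall(r"\w+", text.lower())
    slips = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            slips.append((left, tok, right))
    return slips

sample = "Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit."
for left, kw, right in kwic(sample, "flumen", width=2):
    print(f"{left:>25} [{kw}] {right}")
```

A real lexicon workflow would also need lemmatization, so that inflected forms of the same word land on the same stack of slips; that step is left out here.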
Slide 27
A Scalable Lexicon Project
Slide 28
Breadth
Slide 29
Time
Slide 30
Space
Slide 32
Digital Humanities balance two forces
Absolute necessity to work with far more content than we can ever read and far more languages than we could ever learn.
The need to read slowly and to think about every word and phrase from every angle.
Slide 33
Philological Reading Philology is that venerable art which
requires of those who honor her one thing above all: to turn aside,
to take one's time, to become still and slow.... Precisely for this
reason, she is more necessary today than ever, precisely on this
account, she attracts and enchants us most powerfully, in an age of
"work," which is to say, haste, the unseemly and sweating hurry
that wants to be "done" with everything right away, even with every
old and new book. She herself will not so easily be done with
anything, she instructs reading well, that means, slowly, deeply,
carefully, regardfully, looking forward and backward, with second
thoughts, with doors left open, reading with delicate fingers and
eyes.... F. Nietzsche, Morgenröte (1881)
Slide 35
One answer
The re-emergence of editing as a primary activity
The definition of tasks that have tangible value in the real world and that begin to be accessible at an early stage
Example: the commented edition and translation as undergraduate thesis
Who founded Kandahar and what was its original name?
Slide 54
Alexander the Great
Alexandria
Slide 55
Who was the most important classicist of the 20th century?
Slide 56
Sometimes political philosophers do have an impact...
Plato's Republic and the Guardians
The Islamic Republic of Iran and the Guardianship of Islamic Jurists
Slide 57
How would you go about studying the impact of Plato in Islamic
thought?
Slide 58
Slide 59
Slide 60
Classics at the U of C
Slide 61
Particular emphasis on
The School of Alexandria and its influence
The Translation Movement from Greek into Syriac and Arabic
The Relations of the Ancient Arabs and the Greco-Roman World
The Translation of Arabic into Latin and its effect upon the literary Renaissance
Slide 62
Classicists
Slide 63
Hisham and Farouk at Furman
Slide 64
Where is the English?
Slide 65
Slide 66
Goals
Learner corpora -- how much have you mastered? How much can you transfer to new material?
Customized assessment of corpus/competence
User portfolios
Aggregation of increasingly sophisticated contributions
Undergraduate research projects
Automatically linked to relevant texts, sites, objects
Slide 67
Goals for 2010/2011
Canonical Text Services Protocol middleware for DuraCloud
Open Greek and Latin exams for students in the English-speaking world, based upon student-defined corpora.
Slide 68
Thank you!
Slide 69
Categories of Development
Transform existing research
Integrated Papyri, Homer Multitext
Enable new areas of research
More people using papyrological data
Physical access -- done
Intellectual access -- can be addressed
Slide 72
Transforming Classics
Enhancing what scholars can do
Lowering barriers to entry
Developing a global, multilingual, multiethnic intellectual community
Slide 74
Funded Projects
Greek and Latin Treebanks (Cantus)
Greco-Arabic (Mellon)
Mining a Million Books (NSF)
Digging into Data (NEH/JISC/SSHRC)
Google Digital Humanities
Hellespont: Arachne and Perseus -- DFG/NEH
Slide 80
What can you do?
Build up a portfolio of what Greek and/or Latin you have mastered
Ask for an evaluation of your knowledge of this corpus and of Greek and Latin
Look for ways to make a tangible contribution
Treebank -- how many sentences? XML tagging? GIS analysis?
Slide 84
What can you do?
Think about an MA thesis that is a publishable contribution.
Publish an inscription, a medieval text, a canonical work
Analyze some data about a word, a text, a site, a topic
Do something!
Slide 85
Good luck!
Slide 86
Treebanks and Parallel Text Analysis
David Bamman
The Perseus Project
Slide 87
Parallel Text Analysis
Driven in large part by statistical MT for modern languages (French/English, German/English, Arabic/English, etc.).
Parliamentary proceedings (Canadian Hansards, Europarl, UN)
Legal/government docs (JRC Acquis)
Historical texts have often been translated many times into several different languages.
Perseus: 4.9M Greek/6.8M English; 3.4M Latin/5M English.
Slide 88
Parallel Texts
The Internet Archive alone contains editions of Horace's Odes in eight different languages
Latin: carpe diem quam minimum credula postero (Horace, Ode 1.11)
English: Seize the present; trust tomorrow e'en as little as you may (Conington 1872)
French: Cueille le jour, et ne crois pas au lendemain (De Lisle 1887)
Early Modern French: Jouissez donc en repos du jour present, & ne vous attendez point au lendemain (Dacier 1681)
Italian: tu l'oggi goditi: e gli stolti al domani s'affidino (Chiarini 1916)
Spanish: Coge este dia, dando muy poco credito al siguiente (Campos and Minguez 1783)
Portuguese: colhe o dia, do de amanhã mui pouco confiando (Duriense 1807)
German: Pflücke des Tags Blüten, und nie traue dem morgenden (Schmidt 1820)
Sense Discovery
SMT based on Brown et al. (1990)
Different senses for a word in one language are translated by different words in another.
Bank (English)
financial institution = French banque
side of a river = French rive (e.g., la rive gauche)
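The idea that different translations point to different senses can be sketched by counting, for each source word, the target words it is aligned to. This is only an illustration: the function name `translation_profile` and the toy aligned pairs are invented for the example, and real input would come from a word aligner rather than a hand-written list.

```python
from collections import Counter, defaultdict

def translation_profile(aligned_pairs):
    """Map each source word to a Counter of the target words it aligns with."""
    profile = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        profile[source_word][target_word] += 1
    return profile

# Hypothetical English->French word alignments (toy data, not corpus output).
pairs = [
    ("bank", "banque"),
    ("bank", "rive"),
    ("bank", "banque"),
    ("river", "rivière"),
]

profile = translation_profile(pairs)
for word, translations in profile.items():
    # A word with several well-attested translations is a candidate
    # for having several senses.
    print(word, translations.most_common())
```

Here "bank" shows two distinct translations (banque, rive) while "river" shows one, which is the signal the slide describes.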
Slide 91
Progressive Alignment
Sentence level: Moore's Bilingual Sentence Aligner (Moore 2002) aligns sentences that are 1-1 translations of each other with high precision (98.5% on a corpus of 10K English-Hindi sentences)
Word level: MGIZA++ (Gao and Vogel 2008), a parallel version of GIZA++ (Och and Ney 2003), an implementation of IBM Models 1-5.
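To give a feel for what the word-level step estimates, here is a minimal sketch of IBM Model 1 trained with expectation-maximization. It is not the GIZA++ or MGIZA++ implementation: it omits the NULL word and Models 2-5, and the function name `ibm_model1` and the two-sentence toy bitext are assumptions made for illustration.

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word).

    bitext is a list of (source_tokens, target_tokens) sentence pairs.
    Simplified: no NULL source word.
    """
    target_vocab = {tw for _, tgt in bitext for tw in tgt}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (target, source) links
        total = defaultdict(float)   # expected count of links per source word
        # E-step: distribute each target word over the source words.
        for src, tgt in bitext:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts per source word.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t

# Toy Latin->English bitext (hypothetical, for illustration only).
bitext = [
    (["carpe", "diem"], ["seize", "day"]),
    (["carpe", "rosam"], ["seize", "rose"]),
]
t = ibm_model1(bitext)
print(round(t[("seize", "carpe")], 3), round(t[("day", "diem")], 3))
```

Because "carpe" co-occurs with "seize" in both pairs, the table gradually shifts probability so that "diem" prefers "day" and "rosam" prefers "rose", even though no pair was ever labeled at the word level.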
Slide 92
Tufts cluster
40 nodes, each with two 2.83 GHz Quad-Core Xeon processors (= 320 cores)
Impact
Two 1M word alignments (English->Greek, Greek->English) on a single 2 GHz Mac Pro: 15 hours
Two (simultaneous) 5M word alignments on the computing cluster using the multi-threaded version (i.e., on one 8-core node): 45 minutes.
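The figures above can be checked with a few lines of arithmetic. This sketch assumes that "1M" and "5M" are the word counts processed per alignment run and that throughput (words per hour) is a fair basis for comparison; alignment time does not necessarily scale linearly with corpus size, so the ratio is indicative only.

```python
# Core count stated on the slide: 40 nodes x 2 processors x 4 cores.
nodes, processors_per_node, cores_per_processor = 40, 2, 4
total_cores = nodes * processors_per_node * cores_per_processor

# Throughput on the single Mac Pro: two 1M-word alignments in 15 hours.
mac_words, mac_hours = 2 * 1_000_000, 15.0
# Throughput on one 8-core cluster node: two 5M-word alignments in 45 minutes.
node_words, node_hours = 2 * 5_000_000, 0.75

speedup = (node_words / node_hours) / (mac_words / mac_hours)
print(total_cores, round(speedup))
```

Under these assumptions a single cluster node processes roughly one hundred times as many words per hour as the Mac Pro run.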
Slide 93
Multilingual Alignment
Word-level alignment of Homer's Odyssey
Slide 94
Latin/Greek English Senses
Slide 95
English Greek/Latin Senses
Slide 96
Use #1: Automatic Bilingual Dictionaries
http://nlp.perseus.tufts.edu/lexicon
Slide 97
Use #2: Interlinear translations
Slide 99
Use #3: Bootstrapping Multilingual Digital Library
http://www.perseus.tufts.edu
Slide 100
Multilingual Digital Libraries http://www.worldofdante.org
Slide 101
TEI XML
Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit. Horum omnium fortissimi sunt ...
(The Garonne river separates the Gauls from the Aquitani, and the Marne and the Seine (rivers) separate them from the Belgae. The bravest of all of these are ...)
Slide 102
Solution: Markup Transfer
1. Alignment of the source document with the target document in a cascading process: document -> sentence -> word
2. Projection of XML tags in the source document to the target document in a way that exploits the linguistic similarity of the text pair.
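The projection step can be sketched as follows, assuming the word alignment is already available. Everything here is illustrative: the functions `project_tags` and `render`, the span representation (tag, first token index, last token index), and the hand-written alignment are assumptions, and the sketch handles only simple, non-overlapping tags.

```python
def project_tags(source_spans, alignment):
    """Carry tagged spans from source tokens to target tokens.

    source_spans: list of (tag, start, end) over source token indexes, end inclusive.
    alignment: dict mapping a source token index to a list of target token indexes.
    """
    projected = []
    for tag, start, end in source_spans:
        targets = [j for i in range(start, end + 1) for j in alignment.get(i, [])]
        if targets:  # unaligned spans are dropped rather than guessed
            projected.append((tag, min(targets), max(targets)))
    return projected

def render(tokens, spans):
    """Wrap the tokens covered by each span in an XML-style tag."""
    out = list(tokens)
    for tag, start, end in spans:
        out[start] = f"<{tag}>{out[start]}"
        out[end] = f"{out[end]}</{tag}>"
    return " ".join(out)

latin = ["Gallos", "ab", "Aquitanis", "Garumna", "flumen", "dividit"]
english = ["The", "Garonne", "river", "separates", "the", "Gauls", "from", "the", "Aquitani"]
# Hypothetical word alignment (Latin index -> English indexes), written by hand.
alignment = {0: [5], 2: [8], 3: [1], 4: [2], 5: [3]}
# One tagged span in the Latin: "Garumna" marked as a place name.
latin_spans = [("placeName", 3, 3)]

english_spans = project_tags(latin_spans, alignment)
print(render(english, english_spans))
```

Taking the minimum and maximum aligned index keeps each projected tag contiguous in the target, at the cost of occasionally covering an extra word when the alignment is discontinuous.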
Slide 103
Bootstrapping a Multilingual DL
Expands depth of translations in a collection to expand the reach of inquiry.
Slide 104
Treebanks
Annotated corpora where the syntactic role and head of each word in a sentence is made explicit.
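One way to picture "role and head made explicit" is a record per word that stores the position of its head and a relation label. The sketch below is an assumed, simplified representation, not the treebank's actual file format; the class name `Token`, the helper functions, and the labels chosen for the two-word example are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Token:
    id: int        # 1-based position in the sentence
    form: str      # the word as it appears in the text
    head: int      # id of the syntactic head; 0 marks the root
    relation: str  # syntactic role label

# "carpe diem": the verb is the root, the noun depends on it as its object.
sentence = [
    Token(1, "carpe", 0, "PRED"),
    Token(2, "diem", 1, "OBJ"),
]

def root(sentence):
    """Return the token that has no head inside the sentence."""
    return next(tok for tok in sentence if tok.head == 0)

def dependents(sentence, head_id):
    """Return all tokens attached directly to the given head."""
    return [tok for tok in sentence if tok.head == head_id]

print(root(sentence).form, [tok.form for tok in dependents(sentence, 1)])
```

Storing only a head index and a label per word is enough to recover the whole dependency tree, which is why this flat layout is convenient for annotation and for querying.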
Slide 105
Historical treebanks
Most recent research and investment in treebanks has focused on modern languages, but treebanks for historical languages are now arising as well:
Middle English (Kroch and Taylor 2000)
Medieval Portuguese (Rocio et al. 2000)
Classical Chinese (Huang et al. 2002)
Old English (Taylor et al. 2003)
Early Modern English (Kroch et al. 2004)
Latin (Bamman and Crane 2006, Passarotti 2007)
Ugaritic (Zemánek 2007)
New Testament Greek, Latin, Gothic, Armenian, Church Slavonic (Haug and Jøhndal 2008)
Ancient Greek Dependency Treebank
Work                        Words
Aeschylus (complete)       48,158
Hesiod, Works and Days      6,303
Homer, Iliad               38,390
Homer, Odyssey             99,353
Total                     192,204
Slide 109
Building Treebanks
Solicit annotations from two independent annotators; reconcile differences between them.
Background: ranges from advanced undergraduates to PhDs and professors, with the majority being students in graduate programs in Classics.
Average speed: 124 words per hour.
Interannotator accuracy: attachment (ATT), label (LAB), labeled attachment (LABATT):
                    ATT      LAB      LABATT
Hesiod, W&D         85.1%    85.9%    79.5%
Homer, Iliad        87.1%    83.2%    79.3%
Homer, Odyssey      87.5%    85.7%    80.9%
Total               87.4%    85.3%    80.6%
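The three agreement figures can be computed per word once both annotations are in hand. The slide does not spell out the exact procedure, so this sketch assumes the simplest reading: ATT is the share of words given the same head by both annotators, LAB the share given the same label, and LABATT the share where both match. The function name `agreement` and the four-word toy annotations are invented for illustration.

```python
def agreement(annotation_a, annotation_b):
    """Return (ATT, LAB, LABATT) for two annotations of the same sentence(s).

    Each annotation is a list of (head, label) pairs, one per word, in text order.
    """
    if len(annotation_a) != len(annotation_b):
        raise ValueError("annotations must cover the same words")
    n = len(annotation_a)
    pairs = list(zip(annotation_a, annotation_b))
    att = sum(a[0] == b[0] for a, b in pairs) / n      # same head
    lab = sum(a[1] == b[1] for a, b in pairs) / n      # same label
    labatt = sum(a == b for a, b in pairs) / n         # same head and label
    return att, lab, labatt

# Hypothetical annotations of a four-word sentence by two annotators.
annotator_1 = [(0, "PRED"), (1, "OBJ"), (1, "ADV"), (3, "ATR")]
annotator_2 = [(0, "PRED"), (1, "OBJ"), (2, "ADV"), (3, "ADV")]

att, lab, labatt = agreement(annotator_1, annotator_2)
print(att, lab, labatt)
```

LABATT can never exceed either ATT or LAB, which matches the pattern in the table, where the labeled attachment column is always the lowest of the three.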
Slide 110
Student Contributions...
Slide 111
Syntax in the Dynamic Lexicon
Slide 112
URLs
Treebank data: http://nlp.perseus.tufts.edu/syntax/treebank/
Treebank annotation environment: http://nlp.perseus.tufts.edu/hopper/
Translation information: http://nlp.perseus.tufts.edu/hopper/sense.jsp
Greek lexicon: http://nlp.perseus.tufts.edu/lexicon/