21st Century Classics Gregory Crane Professor and Chair Department of Classics Adjunct Professor of Computer Science Winnick Family Chair of Technology and Entrepreneurship

  • Slide 1
  • 21st Century Classics Gregory Crane Professor and Chair Department of Classics Adjunct Professor of Computer Science Winnick Family Chair of Technology and Entrepreneurship
  • Slide 2
  • Changing Face of Education
  • Slide 3
  • The Gauntlet
  • Slide 4
  • Science Majors The Classes have grown more challenging and demanding
  • Slide 5
  • Science Majors The Classes have grown more challenging and demanding Education has focused upon recognizable problems of various levels
  • Slide 6
  • Science Majors The Classes have grown more challenging and demanding Education has focused upon recognizable problems of various levels Concrete methods for contribution at multiple levels
  • Slide 7
  • Laboratory Culture. Freshman Term Paper 2009: Linguistic Patterns in Cambridge, MA. 1930 Atlas of the City. Interviews digitally recorded in 2009. Phonetic patterns analyzed with open source software. Study of linguistic change over 80 years. Production of new knowledge.
  • Slide 8
  • The world is already digital
  • Slide 9
  • 2009 Tenure Track Job Candidates must have a strong record of scholarship and teaching and a commitment to expand the use of Greek and Latin within Tufts and beyond. Candidates should be comfortable teaching classes at any level in either language. The department seeks a candidate who can advance the study of classics within an interdisciplinary context. We especially welcome candidates who can support contributions to and original research by undergraduates as well as MA students within the field of Classics. Where are the digital humanities in this job description?
  • Slide 10
  • 2009 Tenure Track Job Candidates must have a strong record of scholarship and teaching and a commitment to expand the use of Greek and Latin within Tufts and beyond. Candidates should be comfortable teaching classes at any level in either language. The department seeks a candidate who can advance the study of classics within an interdisciplinary context. We especially welcome candidates who can support contributions to and original research by undergraduates as well as MA students within the field of Classics. Where are the digital humanities in this job description?
  • Slide 11
  • Changing scales of research
  • Slide 12
  • Depth Quality Scale Breadth
  • Slide 13
  • Quality
  • Slide 14
  • Depth
  • Slide 15
  • Machine actionable interpretation
  • Slide 16
  • Scale
  • Slide 17
  • How much Latin? Classical Latin (200 BCE - 500 CE): PHI disk c. 5 million words (through c. 200 CE); total corpus c. 50 million (probably less). Current working collection: 9,000 books from 27,000 Latin books dated; 380 million words that really are Latin. Total corpus of Latin through 1800: billions of words.
  • Slide 18
  • A Classic Lexicon Project
  • Slide 19
  • 1894 -- work begins
  • Slide 20
  • A Classic Lexicon Project 1894 -- work begins 10 million slips with keyword in context
  • Slide 21
  • A Classic Lexicon Project 1894 -- work begins 10 million slips with keyword in context 2010 -- current status
  • Slide 22
  • A Classic Lexicon Project 1894 -- work begins 10 million slips with keyword in context 2010 -- current status 20 FTE at work
  • Slide 23
  • A Classic Lexicon Project 1894 -- work begins 10 million slips with keyword in context 2010 -- current status 20 FTE at work C. 67% of the lexicon complete
  • Slide 24
  • A Classic Lexicon Project 1894 -- work begins 10 million slips with keyword in context 2010 -- current status 20 FTE at work C. 67% of the lexicon complete 2050? -- completion of the project
  • Slide 25
  • A Classic Lexicon Project 1894 -- work begins 10 million slips with keyword in context 2010 -- current status 20 FTE at work C. 67% of the lexicon complete 2050? -- completion of the project What do we do with a billion words?
  • Slide 26
  • A Classic Lexicon Project 1894 -- work begins 10 million slips with keyword in context 2010 -- current status 20 FTE at work C. 67% of the lexicon complete 2050? -- completion of the project What do we do with a billion words? 10 billion words?
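The "slips with keyword in context" that the lexicon project collected by hand correspond to what is now usually called a KWIC concordance. A minimal sketch of the idea follows, assuming plain whitespace-and-punctuation tokenization and exact matching on the word form (no lemmatization, so inflected forms of the same lexeme are not grouped). The function name `kwic` and the `width` parameter are hypothetical, and the sample sentence is the Caesar passage quoted later in this deck.

```python
import re

def kwic(text, keyword, width=3):
    """Return one (left context, keyword, right context) 'slip' per occurrence."""
    tokens = re.findall(r"\w+", text.lower())
    slips = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            slips.append((left, tok, right))
    return slips

sample = "Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit."
for left, kw, right in kwic(sample, "flumen", width=2):
    print(f"{left:>25} [{kw}] {right}")
```

At the scale of a billion words the same loop would run over a streamed corpus rather than a single string, but the unit of evidence (one word plus its immediate context) is unchanged.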
  • Slide 27
  • A Scalable Lexicon Project
  • Slide 28
  • Breadth
  • Slide 29
  • Time
  • Slide 30
  • Space
  • Slide 31
  • Digital Humanities balance two forces Absolute necessity to work with far more content than we can ever read and far more languages than we could ever learn.
  • Slide 32
  • Digital Humanities balance two forces Absolute necessity to work with far more content than we can ever read and far more languages than we could ever learn. The need to read slowly and to think about every word and phrase from every angle.
  • Slide 33
  • Philological Reading Philology is that venerable art which requires of those who honor her one thing above all: to turn aside, to take one's time, to become still and slow.... Precisely for this reason, she is more necessary today than ever, precisely on this account, she attracts and enchants us most powerfully, in an age of "work," which is to say, haste, the unseemly and sweating hurry that wants to be "done" with everything right away, even with every old and new book. She herself will not so easily be done with anything, she instructs reading well, that means, slowly, deeply, carefully, regardfully, looking forward and backward, with second thoughts, with doors left open, reading with delicate fingers and eyes.... F. Nietzsche, Morgenröte (1881)
  • Slide 34
  • One answer: The re-emergence of editing as a primary activity. The definition of tasks that have tangible value in the real world and that begin to be accessible at an early stage.
  • Slide 35
  • One answer: The re-emergence of editing as a primary activity. The definition of tasks that have tangible value in the real world and that begin to be accessible at an early stage. Example: the commented edition and translation as undergraduate thesis.
  • Slide 36
  • Venetus A MS of Homer
  • Slide 37
  • Diplomatic Edition by a class
  • Slide 38
  • What are new elements in editing?
  • Slide 39
  • Annotations not predicated on error
  • Slide 40
  • Editions --> Visualization: Perseus Herodotus --> Hestia Proj.
  • Slide 41
  • Syntactic Analysis (Treebanks) (Homer, Il. 6.1)
  • Slide 42
  • Machine actionable interpretation
  • Slide 43
  • Slide 44
  • Iliad 6 Treebank by a class
  • Slide 45
  • Another answer
  • Slide 46
  • Expository narrative Machine actionable annotations
  • Slide 47
  • 21st Century Classics
  • Slide 48
  • Who represents Greece and Rome?
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • From Rabat to Kandahar
  • Slide 53
  • Who founded Kandahar and what was its original name?
  • Slide 54
  • Alexander the Great Alexandria
  • Slide 55
  • Who was the most important classicist of the 20th century?
  • Slide 56
  • Sometimes political philosophers do have an impact... Plato's Republic and the Guardians; the Islamic Republic of Iran and the Guardianship of Islamic Jurists
  • Slide 57
  • How would you go about studying the impact of Plato in Islamic thought?
  • Slide 58
  • Slide 59
  • Slide 60
  • Classics at the U of C
  • Slide 61
  • Particular emphasis on: The School of Alexandria and its influence; The Translation Movement from Greek into Syriac and Arabic; The Relations of the Ancient Arabs and the Greco-Roman World; The Translation of Arabic into Latin and its effect upon the literary Renaissance
  • Slide 62
  • Classicists
  • Slide 63
  • Hisham and Farouk at Furman
  • Slide 64
  • Where is the English?
  • Slide 65
  • Slide 66
  • Goals: Learner corpora -- how much have you mastered? How much can you transfer to new material? Customized assessment of corpus/competence. User portfolios. Aggregation of increasingly sophisticated contributions. Undergraduate research projects. Automatically linked to relevant texts, sites, objects.
  • Slide 67
  • Goals for 2010/2011 Canonical Text Services Protocol middleware for DuraCloud Open Greek and Latin exams for students in the English-speaking world based upon student-defined corpora.
  • Slide 68
  • Thank you!
  • Slide 69
  • Categories of Development. Transform existing research: Integrated Papyri, Homer Multitext. Enable new areas of research: more people using papyrological data. Physical access -- done. Intellectual access -- can be addressed.
  • Slide 70
  • Transforming Classics Enhancing what scholars can do
  • Slide 71
  • Transforming Classics Enhancing what scholars can do Lowering barriers to entry
  • Slide 72
  • Transforming Classics Enhancing what scholars can do Lowering barriers to entry Developing a global, multilingual, multiethnic intellectual community
  • Slide 73
  • Transforming Classics Enhancing what scholars can do Lowering barriers to entry Developing a global, multilingual, multiethnic intellectual community
  • Slide 74
  • Funded Projects: Greek and Latin Treebanks (Cantus); Greco-Arabic (Mellon); Mining a Million Books (NSF); Digging into Data (NEH/JISC/SSHRC); Google Digital Humanities; Hellespont: Arachne and Perseus -- DFG/NEH
  • Slide 75
  • What can you do?
  • Slide 76
  • Build up a portfolio of what Greek and/or Latin you have mastered
  • Slide 77
  • What can you do? Build up a portfolio of what Greek and/or Latin you have mastered Ask for an evaluation of your knowledge of this corpus and of Greek and Latin
  • Slide 78
  • What can you do? Build up a portfolio of what Greek and/or Latin you have mastered Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution
  • Slide 79
  • What can you do? Build up a portfolio of what Greek and/or Latin you have mastered Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution Treebank -- how many sentences?
  • Slide 80
  • What can you do? Build up a portfolio of what Greek and/or Latin you have mastered Ask for an evaluation of your knowledge of this corpus and of Greek and Latin Look for ways to make a tangible contribution Treebank -- how many sentences? XML tagging? GIS analysis?
  • Slide 81
  • What can you do? Think about an MA thesis that is a publishable contribution.
  • Slide 82
  • What can you do? Think about an MA thesis that is a publishable contribution. Publish an inscription, a medieval text, a canonical work
  • Slide 83
  • What can you do? Think about an MA thesis that is a publishable contribution. Publish an inscription, a medieval text, a canonical work Analyze some data about a word, a text, a site, a topic
  • Slide 84
  • What can you do? Think about an MA thesis that is a publishable contribution. Publish an inscription, a medieval text, a canonical work Analyze some data about a word, a text, a site, a topic Do something!
  • Slide 85
  • Good luck!
  • Slide 86
  • Treebanks and Parallel Text Analysis David Bamman The Perseus Project
  • Slide 87
  • Parallel Text Analysis: Driven in large part by statistical MT for modern languages (French/English, German/English, Arabic/English, etc.). Parliamentary proceedings (Canadian Hansards, Europarl, UN). Legal/government docs (JRC Acquis). Historical texts have often been translated many times into several different languages. Perseus: 4.9M Greek/6.8M English; 3.4M Latin/5M English.
  • Slide 88
  • Parallel Texts: The Internet Archive alone contains editions of Horace's Odes in eight different languages
      Latin: carpe diem quam minimum credula postero (Horace, Ode 1.11)
      English: Seize the present; trust tomorrow e'en as little as you may (Conington 1872)
      French: Cueille le jour, et ne crois pas au lendemain (De Lisle 1887)
      Early Modern French: Jouissez donc en repos du jour present, & ne vous attendez point au lendemain (Dacier 1681)
      Italian: tu l'oggi goditi: e gli stolti al domani s'affidino (Chiarini 1916)
      Spanish: Coge este dia, dando muy poco credito al siguiente (Campos and Minguez 1783)
      Portuguese: colhe o dia, do de amanhã a mui pouco confiando (Duriense 1807)
      German: Pflücke des Tags Blüten, und nie traue dem morgenden (Schmidt 1820)
  • Slide 89
  • Dynamic Lexicon http://nlp.perseus.tufts.edu/lexicon
  • Slide 90
  • Sense Discovery: SMT based on Brown et al. (1990). Different senses for a word in one language are translated by different words in another. Bank (English): financial institution = French banque; side of a river = French rive (e.g., la rive gauche)
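The sense-discovery idea on this slide can be sketched in a few lines: if a word-aligned parallel corpus is available, the distribution of target-language words aligned to a source word gives candidate senses. The sketch below is an illustration under stated assumptions, not the Dynamic Lexicon's implementation: the aligned pairs are invented toy data, and the names `translation_profile`, `candidate_senses`, and the `min_share` threshold are hypothetical.

```python
from collections import Counter, defaultdict

def translation_profile(aligned_pairs):
    """Map each source word to a Counter of the target words aligned to it."""
    profile = defaultdict(Counter)
    for source, target in aligned_pairs:
        profile[source][target] += 1
    return profile

def candidate_senses(profile, word, min_share=0.2):
    """Return (translation, share) pairs that account for at least min_share of uses."""
    counts = profile[word]
    total = sum(counts.values())
    return [(t, c / total) for t, c in counts.most_common() if c / total >= min_share]

# Invented toy alignments (English -> French), for illustration only.
pairs = [
    ("bank", "banque"), ("bank", "banque"), ("bank", "rive"),
    ("bank", "banque"), ("bank", "rive"), ("river", "fleuve"),
]
profile = translation_profile(pairs)
for translation, share in candidate_senses(profile, "bank"):
    print(f"bank -> {translation}: {share:.0%}")
```

The threshold is there because real alignments are noisy: a translation seen once in thousands of occurrences is more likely an alignment error than a distinct sense.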
  • Slide 91
  • Progressive Alignment. Sentence level: Moore's Bilingual Sentence Aligner (Moore 2002) aligns sentences that are 1-1 translations of each other w/ high precision (98.5% on a corpus of 10K English-Hindi sentences). Word level: MGIZA++ (Gao and Vogel 2008), a parallel version of GIZA++ (Och and Ney 2003), an implementation of IBM Models 1-5.
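To make the word-level step more concrete, here is a minimal sketch of IBM Model 1, the simplest of the IBM models the slide mentions, trained with expectation-maximization on sentence-aligned pairs. It is a teaching sketch and not how GIZA++ or MGIZA++ are implemented: it omits the NULL word and Models 2-5, and the three-sentence Latin/English corpus is invented for illustration. The names `ibm_model1` and `best_translation` are hypothetical.

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """corpus: list of (source_tokens, target_tokens) sentence pairs.
    Returns t where t[(target, source)] approximates P(target | source)."""
    target_vocab = {tw for _, tgt in corpus for tw in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)          # start from a uniform table
    for _ in range(iterations):
        count = defaultdict(float)            # expected counts c(target, source)
        total = defaultdict(float)            # expected counts c(source)
        for src, tgt in corpus:
            for tw in tgt:
                # E-step: share each target word among the source words
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize expected counts into probabilities
        t = defaultdict(float, {pair: c / total[pair[1]] for pair, c in count.items()})
    return t

def best_translation(t, source_word):
    """Most probable target word for source_word under the table t."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

# Invented toy corpus; Model 1 ignores word order within each pair.
corpus = [
    (["magnum", "flumen"], ["great", "river"]),
    (["magnum", "oppidum"], ["great", "town"]),
    (["parvum", "oppidum"], ["small", "town"]),
]
table = ibm_model1(corpus)
for word in ["magnum", "flumen", "oppidum", "parvum"]:
    print(word, "->", best_translation(table, word))
```

Even on three sentences the co-occurrence pattern is enough to pull each Latin word toward its English counterpart, which is why sentence alignment has to come first in the cascade.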
  • Slide 92
  • Tufts cluster: 40 nodes, each w/ two 2.83 GHz Quad-Core Xeon processors (= 320 cores). Impact: two 1M word alignments (English->Greek, Greek->English) on a single 2 GHz Mac Pro: 15 hours. Two (simultaneous) 5M word alignments on the computing cluster using the multi-threaded version (i.e., on one 8-core node): 45 minutes.
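The figures on this slide can be checked with a little arithmetic. The sketch below recomputes the core count and compares words aligned per hour in the two setups. Treating throughput as words per hour is an assumption made here for comparison only; alignment cost need not scale linearly with corpus size, so the ratio is a rough indication rather than a benchmark. The function name `words_per_hour` is hypothetical.

```python
# Cluster size as stated on the slide: 40 nodes x 2 processors x 4 cores.
nodes, processors_per_node, cores_per_processor = 40, 2, 4
total_cores = nodes * processors_per_node * cores_per_processor

def words_per_hour(alignments, millions_of_words_each, hours):
    """Total words aligned divided by wall-clock hours."""
    return alignments * millions_of_words_each * 1_000_000 / hours

single_machine = words_per_hour(2, 1, 15)      # two 1M-word alignments in 15 hours
one_cluster_node = words_per_hour(2, 5, 0.75)  # two 5M-word alignments in 45 minutes
speedup = one_cluster_node / single_machine

print(total_cores, "cores in total")
print(f"throughput ratio: roughly {speedup:.0f}x")
```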
  • Slide 93
  • Multilingual Alignment: Word-level alignment of Homer's Odyssey
  • Slide 94
  • Latin/Greek English Senses
  • Slide 95
  • English Greek/Latin Senses
  • Slide 96
  • Use #1: Automatic Bilingual Dictionaries http://nlp.perseus.tufts.edu/lexicon
  • Slide 97
  • Use #2: Interlinear translations
  • Slide 98
  • Use #2: Interlinear translations
  • Slide 99
  • Use #3: Bootstrapping Multilingual Digital Library http://www.perseus.tufts.edu
  • Slide 100
  • Multilingual Digital Libraries http://www.worldofdante.org
  • Slide 101
  • TEI XML: Gallos ab Aquitanis Garumna flumen, a Belgis Matrona et Sequana dividit. Horum omnium fortissimi sunt ... (The Garonne river separates the Gauls from the Aquitani and the Marne and the Seine (rivers) separate them from the Belgae. The bravest of all of these are ...)
  • Slide 102
  • Solution: Markup Transfer. 1. Alignment of the source document with the target document in a cascading process: document -> sentence -> word. 2. Projection of XML tags in the source document to the target document in a way that exploits the linguistic similarity of the text pair.
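The projection step can be sketched as follows, assuming the word alignment has already been produced. The alignment pairs below are written by hand for the opening of the Caesar sentence quoted on the previous slide, and only one tag is projected. `placeName` is a real TEI element, but the function names `project_tags` and `render` and the data layout are hypothetical, not the project's actual code.

```python
def project_tags(source_tags, alignment, target_len):
    """source_tags: {source index: tag name}.
    alignment: list of (source index, target index) word links.
    Returns a list with one tag name (or None) per target token."""
    target_tags = [None] * target_len
    for s, t in alignment:
        if s in source_tags and target_tags[t] is None:
            target_tags[t] = source_tags[s]
    return target_tags

def render(tokens, tags):
    """Wrap each tagged token in an XML element of that name."""
    return " ".join(
        f"<{tag}>{tok}</{tag}>" if tag else tok for tok, tag in zip(tokens, tags)
    )

latin = ["Gallos", "ab", "Aquitanis", "Garumna", "flumen"]
english = ["The", "Garonne", "river", "separates", "the", "Gauls", "from", "the", "Aquitani"]
latin_tags = {3: "placeName"}                            # Garumna is marked up in the source
alignment = [(3, 1), (4, 2), (0, 5), (1, 6), (2, 8)]     # hand-written, for illustration

print(render(english, project_tags(latin_tags, alignment, len(english))))
```

A real projection would also need to handle tags that span several words and source words with no aligned counterpart; this sketch covers only the one-to-one case.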
  • Slide 103
  • Bootstrapping a Multilingual DL Expands depth of translations in a collection to expand the reach of inquiry.
  • Slide 104
  • Treebanks Annotated corpora where the syntactic role and head of each word in a sentence is made explicit.
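One way to picture such an annotation is as a list of tokens, each recording the word it depends on (its head) and the syntactic role it plays. The sketch below is a minimal data structure under that assumption, with a check that the heads form a single rooted tree. The field names, the convention that head 0 marks the root, and the labels `PRED` and `OBJ` on the two-word example are illustrative choices here, not a statement of any treebank's exact file format.

```python
from dataclasses import dataclass

@dataclass
class Token:
    id: int          # 1-based position in the sentence
    form: str        # the word as it appears in the text
    head: int        # id of the governing word; 0 marks the root
    relation: str    # syntactic role with respect to the head

def is_valid_tree(tokens):
    """True if there is exactly one root, every head exists, and there are no cycles."""
    heads = {t.id: t.head for t in tokens}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    if any(h != 0 and h not in heads for h in heads.values()):
        return False
    for start in heads:
        seen, current = set(), start
        while current != 0:              # walk up towards the root
            if current in seen:
                return False             # revisiting a node means a cycle
            seen.add(current)
            current = heads[current]
    return True

def dependents(tokens, head_id):
    """Forms of the words directly governed by head_id."""
    return [t.form for t in tokens if t.head == head_id]

# "carpe diem": the verb is the root, the noun is its object.
sentence = [Token(1, "carpe", 0, "PRED"), Token(2, "diem", 1, "OBJ")]
print(is_valid_tree(sentence), dependents(sentence, 1))
```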
  • Slide 105
  • Historical treebanks: Most recent research and investment in treebanks has focused on modern languages, but treebanks for historical languages are now arising as well: Middle English (Kroch and Taylor 2000); Medieval Portuguese (Rocio et al. 2000); Classical Chinese (Huang et al. 2002); Old English (Taylor et al. 2003); Early Modern English (Kroch et al. 2004); Latin (Bamman and Crane 2006, Passarotti 2007); Ugaritic (Zemánek 2007); New Testament Greek, Latin, Gothic, Armenian, Church Slavonic (Haug and Jøhndal 2008)
  • Slide 106
  • Prague Arabic Dependency Treebank
  • Slide 107
  • Latin Dependency Treebank
      Author        Words
      Caesar        1,488
      Cicero        6,229
      Sallust      12,311
      Vergil        2,613
      Jerome        8,382
      Ovid          4,789
      Petronius    12,474
      Propertius    4,857
      Total        53,143
  • Slide 108
  • Ancient Greek Dependency Treebank
      Work                         Words
      Aeschylus (complete)        48,158
      Hesiod, Works and Days       6,303
      Homer, Iliad                38,390
      Homer, Odyssey              99,353
      Total                      192,204
  • Slide 109
  • Building Treebanks: Solicit annotations from two independent annotators; reconcile differences between them. Background: ranges from advanced undergraduates to PhDs and professors, with the majority being students in graduate programs in Classics. Average speed: 124 words per hour. Interannotator accuracy for attachment (ATT), label (LAB), and labeled attachment (LABATT):
      Work              ATT      LAB      LABATT
      Hesiod, W&D       85.1%    85.9%    79.5%
      Homer, Iliad      87.1%    83.2%    79.3%
      Homer, Odyssey    87.5%    85.7%    80.9%
      Total             87.4%    85.3%    80.6%
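The three agreement measures can be sketched as simple proportions over the tokens of a sentence annotated twice. This assumes the usual definitions (share of tokens on which both annotators chose the same head, the same label, or both), which the slide does not spell out. The two toy annotations and the function name `agreement` are invented for illustration.

```python
def agreement(annotation_a, annotation_b):
    """Each annotation is a list of (head, label) pairs, one per token.
    Returns the share of tokens on which the two annotators agree."""
    if len(annotation_a) != len(annotation_b):
        raise ValueError("annotations must cover the same tokens")
    pairs = list(zip(annotation_a, annotation_b))
    n = len(pairs)
    return {
        "ATT": sum(a[0] == b[0] for a, b in pairs) / n,   # same head
        "LAB": sum(a[1] == b[1] for a, b in pairs) / n,   # same label
        "LABATT": sum(a == b for a, b in pairs) / n,      # same head and label
    }

# Invented annotations of a four-word sentence by two annotators.
annotator_1 = [(2, "ATR"), (0, "PRED"), (2, "OBJ"), (3, "ATR")]
annotator_2 = [(2, "ATR"), (0, "PRED"), (2, "SBJ"), (2, "ATR")]

for name, value in agreement(annotator_1, annotator_2).items():
    print(f"{name}: {value:.1%}")
```

Labeled attachment can never exceed either of the other two scores, since it requires agreement on both head and label, which matches the pattern in the table above.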
  • Slide 110
  • Student Contributions...
  • Slide 111
  • Syntax in the Dynamic Lexicon
  • Slide 112
  • URLs: Treebank data: http://nlp.perseus.tufts.edu/syntax/treebank/ ; Treebank annotation environment: http://nlp.perseus.tufts.edu/hopper/ ; Translation information: http://nlp.perseus.tufts.edu/hopper/sense.jsp ; Greek lexicon: http://nlp.perseus.tufts.edu/lexicon/