Fieldwork and Grammaticography in a Digital World
Joshua Wilbur
Freiburg Research Group in Saami Studies • Universität Freiburg
Descriptive Grammars and Typology • University of Helsinki
28 March 2019

Fieldwork and Grammaticography in a Digital World · 2019-04-15 · corpus building/extension using a script1 that: 1. tokenizes the orthographic representation 2. sends each token



Overview
•  background
•  fieldwork
•  grammaticography
•  other advances
•  outlook

BACKGROUND (aka: contextualization)

Pite Saami
•  Uralic > Finno-Ugric > Saamic
•  spoken by ~40 individuals from Arjeplog/Árjepluovve in Swedish Lapland
•  aka: Arjeplog-Saami, bidumsámegiella
•  nearly all speakers are at least 50
•  all speakers are bilingual (Pite Saami and Swedish/Arjeplogsmål)
•  no official orthography (yet ...), but a working standard
•  no media
•  Swedish dominates everyday life
•  hardly being passed on to younger generations


Pite Saami
larger linguistic studies:
•  Halász 1893 (in Hungarian)
•  Lagercrantz 1926 (in German)
•  Ruong 1943 (in German)
•  Lehtiranta 1992 (in Finnish)
•  Wilbur 2014 (in English)
•  Sjaggo 2015 (in Swedish)
other materials:
•  extensive collection of heritage materials (ISOF, Uppsala)
•  dictionary (Pite Saami -> Swedish/English) and proposed orthographic rules (2016)
•  online lexical database
•  online orthographic rules (including spell checker (in beta))
•  smartphone app in the works
recent linguistics projects:
•  Documentation (2008-2015; materials archived at ELAR and TLA)
•  Lexicography (2016)
•  Syntactic structures (2016-present)


Pite Saami
-> each fieldwork situation is unique!
•  significant aspects of mine include:
  •  an accessible modern technological infrastructure on-site
  •  a previous history of linguistics work
  •  extensive language technology tools for closely-related languages
  •  messy but extant orthographic “tradition” when I started

FIELDWORK in a digital world

tools for fieldwork
•  in the old days: notebook and pencil
•  nowadays:
  –  recording equipment
  –  laptop
  –  digital backup capacity (even in the cloud)
  –  transcription software (ELAN)
  –  mobile phones
  –  social media (e.g.: for staying in contact, data source)
  –  grammaticography software (e.g. FLEx for interlinearization)
  –  language technology …

data collection and fieldwork
•  modern, affordable digital recording technologies (especially video) allow fieldworkers to capture not just language but the entire human event
  –  more complete documentation, potentially useful beyond linguistics*
why not use:
•  body cameras
•  drones
•  surround-sound microphones
•  360° cameras
•  3-D cameras ...
*cf. Rießler & Wilbur 2017

(re-)collecting old data (heritage harvesting)
•  OCR (optical character recognition)
  •  embedded text (more than just scanning!)
  •  can be exported (e.g. to ELAN)
  •  can be part of a corpus*
*cf. Partanen & Rießler 2019

(re-)collecting old data (heritage harvesting)
•  HTR (handwritten text recognition)
  •  embedded text (more than just scanning!)
  •  can be exported (e.g. to ELAN)
  •  can be part of a corpus*
  •  much more complex than OCR, thus it currently requires much more training data before it’s useful
*cf. Transkribus project (Kahle et al. 2017); also Blokland et al. 2019 for a brief discussion

GRAMMATICOGRAPHY in a digital world

brief history of grammaticography
•  1/3 of the Boasian trilogy …
•  Payne 1997, Mosel 2006, Aikhenvald 2015, etc.
•  Nordhoff 2008: Electronic Reference Grammars for Typology: Challenges and Solutions
•  Implemented grammars (incorporation in corpus and computational linguistics)

digital tools for grammaticography
•  Toolbox, FLEx
  •  good for concatenative morphology
    •  play, play-s, play-ed, play-er, play-er-s
  •  not so good for non-linear morphology
    •  sing, sing-s, sang, sung
What do you do when non-linear morphology is the default in your language?


digital tools for grammaticography
•  Toolbox, FLEx
•  other, digital approaches ...

juällge ‘foot/leg’

         SG          PL
NOM      juällge     juolge
GEN      juolge      julgij
ACC      juolgev     julgijt
ILL      juallgáj    julgijda
INESS    juolgen     julgijn
ELAT     juolgest    julgijst
COM      julgijna    julgij
ABESS    juolgedak   juolgedaga
ESS      juallgen

4 stem allomorphs: juällg- juolg- juallg- julg-

What do you do when non-linear morphology is the default in your language?
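The paradigm of juällge ‘foot/leg’ can be written down as data to make the problem concrete: every cell pairs one of the four stem allomorphs with a suffix. The following is a minimal Python sketch only. The split of each form into stem and suffix is an assumption read off the table for illustration, not an analysis taken from the talk, and the names `PARADIGM`, `form`, `stem_allomorphs` and `cells_by_stem` are hypothetical.

```python
# Paradigm of juällge 'foot/leg' as (stem allomorph, suffix) pairs.
# ASSUMPTION: the stem/suffix segmentation is inferred from the table
# for illustration; it is not taken from the talk.
PARADIGM = {
    ("NOM", "SG"): ("juällg", "e"),
    ("NOM", "PL"): ("juolg", "e"),
    ("GEN", "SG"): ("juolg", "e"),
    ("GEN", "PL"): ("julg", "ij"),
    ("ACC", "SG"): ("juolg", "ev"),
    ("ACC", "PL"): ("julg", "ijt"),
    ("ILL", "SG"): ("juallg", "áj"),
    ("ILL", "PL"): ("julg", "ijda"),
    ("INESS", "SG"): ("juolg", "en"),
    ("INESS", "PL"): ("julg", "ijn"),
    ("ELAT", "SG"): ("juolg", "est"),
    ("ELAT", "PL"): ("julg", "ijst"),
    ("COM", "SG"): ("julg", "ijna"),
    ("COM", "PL"): ("julg", "ij"),
    ("ABESS", "SG"): ("juolg", "edak"),
    ("ABESS", "PL"): ("juolg", "edaga"),
    ("ESS", None): ("juallg", "en"),  # the table lists a single essive form
}


def form(case, number=None):
    """Return the inflected form for a paradigm cell."""
    stem, suffix = PARADIGM[(case, number)]
    return stem + suffix


def stem_allomorphs():
    """Return the set of distinct stems used across the paradigm."""
    return {stem for stem, _suffix in PARADIGM.values()}


def cells_by_stem():
    """Group paradigm cells by the stem allomorph they select."""
    grouped = {}
    for cell, (stem, _suffix) in PARADIGM.items():
        grouped.setdefault(stem, []).append(cell)
    return grouped
```

A tool that only strips suffixes from a single invariant stem would treat these four stems as unrelated lexemes, which is the difficulty the question above points at.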

implemented grammars
•  aka “precise” grammars
  –  self-validating
•  computer-processable
  –  but only borderline human-readable (at least from a traditionalist perspective)
  –  computational linguists, typically HPSG
•  analyze linguistic structures
•  implementation --> parse and tag a corpus
cf. new Language Science Press series “Implemented Grammars”
Siegel et al. 2016

implemented grammar (FST/CG) for Pite Saami
•  Giellatekno infrastructure:
  –  FST – Finite State Transducer¹
  –  CG – Constraint Grammar²
•  automatic annotations in ELAN …
Giellatekno: the research group for Saami language technology at the University of Tromsø
¹ Beesley & Karttunen 2003; ² Didriksen 2007–2018, Karlsson 1990; Karlsson et al. 1995

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
Constraint Grammar (CG) → for removing ambiguities in FST output
formalism:
•  lexc (lexicon, PoS, linear morphology)
•  twolc (non-linear morphology)
•  cg3 (syntax)
Uses orthographic standard!

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
formalism:
•  lexc (lexicon, PoS, linear morphology)
•  twolc (non-linear morphology)
Output analyses:

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
input: word form
output: word form  lemma+PoS+Morphology
juällge → juällge  juällge+N+Sg+Nom
julgijd → julgijd  juällge+N+Pl+Acc

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
formalism:
•  lexc (lexicon, PoS, linear morphology)
juällge  juällge+N+Sg+Nom
julgijd  juällge+N+Pl+Acc

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
formalism:
•  twolc (non-linear morphology)
juällge  juällge+N+Sg+Nom
julgijd  juällge+N+Pl+Acc

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for generating word forms
(it works in both directions)
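A real FST is compiled from lexc and twolc sources. The sketch below only imitates its input/output behaviour with a lookup table, to illustrate what “works in both directions” means: the same set of pairs serves both analysis and generation. The two pairs are the examples shown on the slides; the names `PAIRS`, `analyze` and `generate` are hypothetical and do not correspond to any Giellatekno tool.

```python
from collections import defaultdict

# (surface form, analysis) pairs taken from the slides.
# ASSUMPTION: a plain lookup table stands in for the compiled transducer.
PAIRS = [
    ("juällge", "juällge+N+Sg+Nom"),
    ("julgijd", "juällge+N+Pl+Acc"),
]

# Index the same pairs in both directions.
ANALYSES = defaultdict(list)  # surface form -> analyses
FORMS = defaultdict(list)     # analysis -> surface forms
for surface, analysis in PAIRS:
    ANALYSES[surface].append(analysis)
    FORMS[analysis].append(surface)


def analyze(word_form):
    """Analysis direction: word form -> list of lemma+PoS+morphology strings."""
    return list(ANALYSES.get(word_form, []))


def generate(analysis):
    """Generation direction: lemma+PoS+morphology string -> list of word forms."""
    return list(FORMS.get(analysis, []))
```

Both functions return lists because a word form may have several analyses, which is the ambiguity problem that disambiguation has to resolve.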

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
BUT: how to deal with morphologically ambiguous word forms? (disambiguation)

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
Constraint Grammar (CG) → for removing ambiguities in FST output
formalism:
•  lexc (lexicon, PoS, linear morphology)
•  twolc (non-linear morphology)
•  cg3 (syntax)
example: rules describing dependency between adpositions and genitive case

implemented grammar (FST/CG) for Pite Saami
infrastructure:
Finite State Transducer (FST) → for analyzing word forms
Constraint Grammar (CG) → for removing ambiguities in FST output
formalism:
•  lexc (lexicon, PoS, linear morphology)
•  twolc (non-linear morphology)
•  cg3 (syntax)
output (analyses)


disambiguation example
FST output:
daj        DET+GEN+PL, DET+COM+PL, PRON+GEN+PL, PRON+COM+PL
tjurvij    antler+GEN+PL, antler+COM+PL
nala       onto
gähttjat   look+INF


disambiguation example
daj          tjurvij         nala   gähttjat
da-j         tjurvi-j        nala   gähttja-t
DET-GEN.PL   antler-GEN.PL   onto   look-INF
‘to look at those antlers’ [pit100405b.011]

FST output:
CG syntactic disambiguation:
•  postpositions govern genitive NPs
     SELECT Gen IF (*1C Po BARRIER NoNP);
•  pronouns are not embedded in an NP
     REMOVE Pron IF (*1C N BARRIER NPNH);
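The effect of the two CG rules on the example can be traced with a toy re-implementation in Python. This is a sketch under stated assumptions, not CG-3 itself: the `BARRIER` conditions are ignored, `*1C` is simplified to “some cohort further right whose remaining readings all carry the tag”, and the tag names (`Det`, `Pron`, `N`, `Po`, `V`, `Gen`, `Com`, `Pl`, `Inf`) as well as all function names are assumptions made for illustration.

```python
def example_sentence():
    """Ambiguous FST-style readings for 'daj tjurvij nala gähttjat'."""
    return [
        {"form": "daj", "readings": [("Det", "Gen", "Pl"), ("Det", "Com", "Pl"),
                                     ("Pron", "Gen", "Pl"), ("Pron", "Com", "Pl")]},
        {"form": "tjurvij", "readings": [("N", "Gen", "Pl"), ("N", "Com", "Pl")]},
        {"form": "nala", "readings": [("Po",)]},
        {"form": "gähttjat", "readings": [("V", "Inf")]},
    ]


def careful_match(cohort, tag):
    """'C' (careful): every remaining reading of the cohort carries the tag."""
    return all(tag in reading for reading in cohort["readings"])


def found_to_the_right(sentence, position, tag):
    """Simplified '*1C tag': scan every cohort right of position (no BARRIER)."""
    return any(careful_match(cohort, tag) for cohort in sentence[position + 1:])


def select(sentence, target, context):
    """SELECT target IF (*1C context): keep only readings carrying target."""
    for i, cohort in enumerate(sentence):
        keep = [r for r in cohort["readings"] if target in r]
        # Never leave a cohort without readings.
        if keep and found_to_the_right(sentence, i, context):
            cohort["readings"] = keep


def remove(sentence, target, context):
    """REMOVE target IF (*1C context): drop readings carrying target."""
    for i, cohort in enumerate(sentence):
        keep = [r for r in cohort["readings"] if target not in r]
        # Never remove the last remaining reading.
        if keep and found_to_the_right(sentence, i, context):
            cohort["readings"] = keep
```

Applying `select(sentence, "Gen", "Po")` and then `remove(sentence, "Pron", "N")` to `example_sentence()` leaves one reading each for daj and tjurvij, matching the glossed line DET-GEN.PL antler-GEN.PL onto look-INF.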

implemented grammars
pros:
•  entirely digital (easy copying, versioning, etc.)
•  computer-processable
•  can analyze AND generate (useful for practical tools, e.g. teaching apps)
•  accuracy can be tested on real empirical data
•  prose can be included (as <!-- comments -->)
•  further use in other, digital applications ...
cons:
•  requires significant technical knowhow to learn and to implement
•  not very human-readable, especially for non-specialists
  –  prose is only included as <!-- comments -->
  –  not ideal for standard average typologists
  –  not even close to ideal for most non-linguists


further use in other, digital applications ...
•  spell-checkers
•  grammar-checkers
•  teaching materials (e.g. apps)
•  in documentary linguistics / endangered language descriptions
  –  automatic tokenization and annotation for corpora

further use in other, digital applications ...
•  tier structure in ELAN corpora (Freiburg-style)
including annotations for:
•  Lemma
•  Part of speech
•  Morphological categories
•  Gloss

further use in other, digital applications ...
•  tier structure in ELAN corpora (Freiburg-style)
corpus building/extension using a script¹ that:
1.  tokenizes the orthographic representation
2.  sends each token through FST
3.  removes ambiguities using CG
4.  adds an English gloss
5.  inserts this output into ELAN
benefits:
•  saves time
•  avoids inconsistencies
•  can be updated automatically
More details in talk at 11:30 in room 13 by Blokland, Partanen and Rießler
¹ cf. Blokland et al. 2015; Gerstenberger et al. 2016; Gerstenberger et al. 2017
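The five steps of the script can be sketched as a small Python pipeline. This is an illustration under assumptions, not the script cited in the footnote: the FST lookup, the CG disambiguation and the gloss lookup are passed in as plain functions, the tokenizer is a simple regular expression, and instead of writing an ELAN file the function returns one row per reading with the fields Lemma, Part of speech, Morphological categories and Gloss. All names (`tokenize`, `annotate`, the row keys) are hypothetical.

```python
import re


def tokenize(text):
    """Step 1: split the orthographic representation into word tokens."""
    return re.findall(r"\w+", text)


def annotate(text, analyze, disambiguate, gloss):
    """Run steps 1-4 and return rows ready to be written to annotation tiers.

    analyze(token)        -> list of 'lemma+PoS+Tag+...' strings (stands in for the FST)
    disambiguate(cohorts) -> cohorts with fewer readings (stands in for CG)
    gloss(lemma)          -> English gloss string
    """
    # Step 2: send each token through the analyzer; mark unknown tokens with '?'.
    cohorts = [{"form": t, "readings": analyze(t) or [t + "+?"]}
               for t in tokenize(text)]
    # Step 3: remove ambiguities.
    cohorts = disambiguate(cohorts)
    rows = []
    for cohort in cohorts:
        for reading in cohort["readings"]:
            lemma, pos, *morph = reading.split("+")
            rows.append({
                "token": cohort["form"],
                "lemma": lemma,
                "pos": pos,
                "morph": "+".join(morph),
                "gloss": gloss(lemma),  # Step 4: add an English gloss.
            })
    # Step 5 (inserting the rows into ELAN tiers) is left out of this sketch.
    return rows
```

Keeping the analyzer, disambiguator and gloss lookup as parameters mirrors the benefit named above: when the grammar improves, the same corpus can be re-annotated automatically.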

summary of digital grammaticography
requires:
•  time to learn the formalism and set up the infrastructure
•  understanding of grammatical structures
•  string-based representation of language
main benefits:
•  can be freely accessible online
•  possibility to publish (hopefully getting academic recognition, cf. LangSci Press series)
•  export data for use in other tools and disciplines
•  spell-checker
•  lexicographic materials (including smartphone apps)
•  corpus building
•  teaching materials
•  increased status for the language
•  more accessible to other disciplines, e.g. via text search
main drawbacks:
•  not terribly human-accessible
•  not taught traditionally in General Linguistics programs

OTHER ADVANCES in digital technologies

new language technologies
•  automatic segmentation, e.g.:
  –  Autosegmenteerija 2.0
•  Estonian autosegmentation forced-alignment tested on Pite Saami with surprisingly accurate results:

new language technologies
•  speech recognition, e.g.:
  –  Common Voice (moz://a) in community development for a number of smaller languages (e.g.: Erzya, Komi-Zyrian, ...)

new language technologies
•  automatic implemented grammar production
  –  LinGO Grammar Matrix
     http://matrix.ling.washington.edu/customize/matrix.cgi


new speech technologies
•  relevant technologies being developed continuously
•  leading to a significant increase in efficiency for corpus building
-> better grammatical descriptions

OUTLOOK

outlook
•  digital tools can provide powerful advantages for both fieldwork and (especially) grammaticography and documentation
•  but: they require knowhow that goes beyond a typical linguist’s training
•  I’m not saying this is for everyone, and realistically only parts will be relevant for a few. The point is: Digital technologies should be considered, too!

References

Aikhenvald, Alexandra Y. (2015). The art of grammar. A practical guide. Oxford: Oxford University Press.
Beesley, Kenneth R. & Lauri Karttunen (2003). Finite State Morphology. Stanford: Center for the Study of Language and Information.
Blokland, Rogier, Ciprian Gerstenberger, Marina Fedina, Niko Partanen, Michael Rießler, & Joshua Wilbur (2015). “Language documentation meets language technology”. In: First International Workshop on Computational Linguistics for Uralic Languages, 16th January, 2015, Tromsø, Norway. Proceedings of the workshop. Ed. by Tommi A. Pirinen, Francis M. Tyers, & Trond Trosterud. Septentrio Conference Series 2015:2. Tromsø: The University Library of Tromsø, pp. 8–18.
Blokland, Rogier, Niko Partanen, Michael Rießler, & Joshua Wilbur (2019). “Using computational approaches to integrate endangered language legacy data into documentation corpora. Past experiences and challenges ahead”. In: Proceedings of the Workshop on Computational Methods for Endangered Languages. Vol. 2. Honolulu: Association for Computational Linguistics, pp. 24–30.
Didriksen, Tino (2007–2018). Constraint grammar manual. 3rd version of the CG formalism variant. GrammarSoft ApS.
Gerstenberger, Ciprian, Niko Partanen, Michael Rießler, & Joshua Wilbur (2016). “Utilizing language technology in the documentation of endangered Uralic languages”. In: Northern European Journal of Language Technology 4, pp. 29–47.
Gerstenberger, Ciprian, Niko Partanen, Michael Rießler, & Joshua Wilbur (2017). “Instant annotations. Applying NLP methods to the annotation of spoken language documentation corpora”. In: International Workshop on Computational Linguistics for Uralic languages (IWCLUL 2017). Ed. by Tommi A. Pirinen, Michael Rießler, Trond Trosterud, & Francis M. Tyers. St. Petersburg: Association for Computational Linguistics, pp. 25–36.
Halász, Ignácz (1893). Népköltési gyűjtemény. A Pite Lappmark arjepluogi egyházkerületéből. Vol. 5. Svéd-Lapp Nyelv. Budapest: Magyar tudományos akadémia.
Kahle, Philip, Sebastian Colutto, Günter Hackl, & Günter Mühlberger (2017). “Transkribus. A Service Platform for Transcription, Recognition and Retrieval of Historical Documents”. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 04, pp. 19–24.
Karlsson, Fred (1990). “Constraint Grammar as a framework for parsing unrestricted text”. In: Proceedings of the 13th International Conference of Computational Linguistics. Ed. by Hans Karlgren. Vol. 3. Helsinki, pp. 168–173.
Karlsson, Fred, Atro Voutilainen, Juha Heikkilä, & Arto Anttila, eds. (1995). Constraint Grammar. A language-independent system for parsing unrestricted text. Natural Language Processing 4. Berlin: Mouton de Gruyter.
Lagercrantz, Eliel (1926). Sprachlehre des Westlappischen nach der Mundart von Arjeplog. Suomalais-ugrilaisen Seuran Toimituksia 55. Helsinki: Suomalais-Ugrilainen Seura.
Lehtiranta, Juhani (1992). Arjeplogin saamen äänne- ja taivutusopin pääpiirteet. Suomalais-ugrilaisen Seuran toimituksia 212. Helsinki: Suomalais-Ugrilainen Seura.
Mosel, Ulrike (2006). “Grammaticography. The art and craft of writing grammars”. In: Catching language. The standing challenge of grammar writing. Ed. by Felix Ameka, Alan Dench, & Nicholas Evans. Trends in linguistics: studies and monographs 167. Berlin: Mouton de Gruyter, pp. 41–68.
Nordhoff, Sebastian (2008). “Electronic Reference Grammars for Typology: Challenges and Solutions”. In: Language Documentation and Conservation 2.2, pp. 296–324.
Partanen, Niko & Michael Rießler (2019). “An OCR system for the Unified Northern Alphabet”. In: International Workshop on Computational Linguistics for Uralic languages (IWCLUL 2019). Tartu: Association for Computational Linguistics, pp. 77–89.
Payne, Thomas E. (1997). Describing morphosyntax. A guide for field linguists. Cambridge: Cambridge University Press.
Rießler, Michael & Joshua Wilbur (2017). “Documenting endangered oral histories of the Arctic. A proposed symbiosis for language documentation and oral history research, illustrated by Saami and Komi examples”. In: Oral history meets linguistics. Ed. by Erich Kasten, Katja Roller, & Joshua Wilbur. Exhibitions and Symposia. Fürstenberg: Kulturstiftung Sibirien, pp. 31–64.
Ruong, Israel (1943). Lappische Verbalableitung dargestellt auf Grundlage des Pitelappischen. Uppsala: Almqvist och Wiksell.
Siegel, Melanie, Emily M. Bender, & Francis Bond (2016). Jacy. An Implemented Grammar of Japanese. CSLI Studies in Computational Linguistics. Stanford: CSLI Publications.
Sjaggo, Ann-Charlotte (2015). Pitesamisk grammatik. En jämförande studie med lulesamiska. Senter for samiske studiers skriftserie 20. Tromsø: Septentrio Academic Publishing.
Wilbur, Joshua (2014). A grammar of Pite Saami. Studies in Diversity Linguistics 5. Berlin: Language Science Press.
Wilbur, Joshua, ed. (2016). Pitesamisk ordbok samt stavningsregler. Samica 2. Freiburg: Albert-Ludwigs-Universität Freiburg.

Gijtov adnet!
gijtov          adnet
gijto-v         adne-t
thank-ACC.SG    have-PL.IMP

Joshua Wilbur
Pite Saami Syntax Project
Freiburg Research Group in Saami Studies
joshua.wilbur@skandinavistik.uni-freiburg.de

with special thanks to Michael Rießler, Niko Partanen, Rogier Blokland and Ciprian Gerstenberger for ideas, collaboration and inspiration