Language Documentation & Archiving Heidi Johnson The Archive of the Indigenous Languages of Latin America (AILLA) The University of Texas at Austin

Language Documentation & Archiving · 2013. 7. 30. · Acknowledgements Language Digitization Project Conference 2003 EMELD Working Group on Resource Archiving: Gary Holton, ANLC

Language Documentation & Archiving

Heidi Johnson
The Archive of the Indigenous Languages of Latin America (AILLA)
The University of Texas at Austin

Acknowledgements
Language Digitization Project Conference 2003
EMELD Working Group on Resource Archiving:
  • Gary Holton, ANLC
  • Heidi Johnson, AILLA
  • Nick Thieberger, PARADISEC
  • Gary Simons, SIL International
  • Wallace Hooper, Indiana University
  • Susan Hooyenga, University of Michigan

http://emeld.org

A little history
  • Boasian tradition: grammar, dictionary, collection of texts.
  • Linguists gave field materials to museums & libraries, e.g. the Smithsonian, seeking a permanent home for endangered language materials.
  • Museums & libraries were not really able to preserve recordings, other than by storing them in a cool, dark place.

History, cont.
  • Anything that can be published was & is a distillation - the product of analysis: secondary/tertiary resources.
  • Hitherto there was no feasible means of preserving OR publishing primary materials.
  • The new millennium: digital archives can preserve and/or publish anything.

What is an archive?
  • Archive: a trusted repository created and maintained by an institution with a demonstrated commitment to permanence and the long-term preservation of archived resources.
  • Collection: the body of documentary materials created by researchers and native speakers. Serves as the basis for research & education. Will be deposited in an archive.

Why should you archive?
  • To preserve recordings of endangered/minority languages for future generations.
  • To facilitate the re-use of primary materials (recordings, databases, field notes) for:
      • language maintenance & revitalization programs;
      • typological, historical, comparative studies;
      • any kind of linguistic, anthropological, psychological, etc. study that you yourself won't do.

More reasons to archive
  • To foster development of both oral and written literatures for endangered languages.
  • To make known what documentation there is for which languages.
  • To build your CV and get credit for all your hard work.

Archiving is a form of publishing
  • Even if the resources are restricted, the metadata is public.
  • Get credit for fieldwork in the early stages: list Archived Resources on your CV.
  • Cite data from archived resources.
  • Give consultants proper credit for their work and their creations.

Citing archived resources
Sánchez Morales, Germán. (1994). "Satornino y los soldados." [online] Heidi Johnson, (Res.) http://www.ailla.utexas.org: Archive of the Indigenous Languages of Latin America. Access=public. ZOH001R010.
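
The citation above follows a fixed order of fields, so it can be assembled from a metadata record. The sketch below is one possible way to do that in Python; the class and field names are invented for illustration and are not part of any AILLA tool.

```python
from dataclasses import dataclass


@dataclass
class ArchivedResource:
    """Fields needed to cite one archived resource (names are hypothetical)."""
    creator: str       # speaker or author, "Surname, Given name"
    year: int
    title: str
    researcher: str    # the researcher, marked "(Res.)" in the citation
    archive_url: str
    archive_name: str
    access: str        # e.g. "public"
    resource_id: str   # the archive's ID for the resource


def format_citation(r: ArchivedResource) -> str:
    """Join the fields in the order used by the example citation."""
    return (
        f'{r.creator}. ({r.year}). "{r.title}." [online] '
        f"{r.researcher}, (Res.) {r.archive_url}: {r.archive_name}. "
        f"Access={r.access}. {r.resource_id}."
    )
```

Filling the record with the values from the example citation reproduces that citation string; other archives may prefer a different field order.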

What should you archive?
  • Recordings of discourse - audio and/or video - in as wide a range of genres as your community employs.
  • Always get permission for everything:
      • recording
      • archiving
      • excerpting, publishing, etc.

Things you should archive
  • public events: ceremonies, oratory, dances, chants
  • narratives: historical, traditional, myths, personal, children's stories, ...
  • instructions: how to build a house, how to weave a mat, how to catch a fish, ...
  • literature: oral or written, poetry, any creative work
  • conversations: anything that's not gossip or too personal, e.g. what we did last spring festival

More things you should archive
  • transcriptions, translations, & annotations of recordings
  • field notes, elicitation lists, orthographies - anything other people might find useful
  • datasets, databases, spreadsheets - your secondary (unpublishable) materials
  • sketches of all kinds: grammar, ethnography
  • photographs

Things you should not archive
  • Anything that would cause injury, arrest, or embarrassment to the speakers. Example: Pamela Munro's interviews with Zapotecs in L.A. about entering the U.S. illegally.
  • Sacred works with highly restricted uses. But talk to people about safe ways to preserve such works, if they want.

How should you manage your collection?
  • Corpus management rule #1: Label everything you produce with RUTHLESS CONSISTENCY.
  • Corpus management rule #2: Set up a system before you leave & test it along with your equipment. (Tape your friends and relatives to try things out.)

1. Find an archive & get their guidelines
  • DOBES, for their grant recipients: http://www.mpi.nl/DOBES
  • Regional archives: AILLA, ANLC, PARADISEC, others? (See AILLA's Links page)
  • Note: it's not either/or, it's both/all.
  • If there isn't one, write to any one of us, we'll help you.

2. Identify your archival objects
  • Not necessarily the same as a file or a tape.
  • Language documentation materials typically come in related sets, or bundles.
  • Be aware of relations among materials as you create them so you can label them correctly and keep them together.

Relations among items
  • derivation: e.g. a transcription is derived from a recording
  • series: e.g. a long recording that spans several tapes/discs
  • part-whole: e.g. video & audio recordings made simultaneously of the same event
  • association: (fuzzy) e.g. photographs of the narrator of a recording, commentaries
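
The four relation types above can be recorded explicitly while you work, so that a bundle stays together. The following is a minimal sketch in Python, not an implementation of any archive's data model; the `Bundle` class, its methods, and the item labels in the example are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    """The four relation types named on the slide."""
    DERIVATION = "derivation"
    SERIES = "series"
    PART_WHOLE = "part-whole"
    ASSOCIATION = "association"


@dataclass
class Bundle:
    """A set of related items that should be labelled and kept together."""
    label: str
    items: set = field(default_factory=set)
    links: list = field(default_factory=list)  # (source, relation, target)

    def relate(self, source: str, relation: Relation, target: str) -> None:
        """Record that `source` stands in `relation` to `target`."""
        self.items.update((source, target))
        self.links.append((source, relation, target))

    def related_to(self, item: str) -> list:
        """Return (relation, other_item) pairs for every link involving `item`."""
        found = []
        for source, relation, target in self.links:
            if source == item:
                found.append((relation, target))
            elif target == item:
                found.append((relation, source))
        return found


if __name__ == "__main__":
    bundle = Bundle("story1")  # hypothetical labels throughout
    bundle.relate("story1_tx", Relation.DERIVATION, "story1_au1")
    bundle.relate("story1_au1", Relation.SERIES, "story1_au2")
    print(bundle.related_to("story1_au1"))
```

Keeping the links as plain tuples makes it easy to copy them into whatever metadata schema the archive asks for.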

3. Labelling field materials
Nothing could possibly be more important than labelling every single item you produce - track, tape, disc, notebook, file slip, digital file, photograph - with RUTHLESS CONSISTENCY.

Example 1: AILLA resource ID
ZOH001R040I001.mp3

ZOH  = language code
001  = deposit number (first deposit)
R040 = 40th resource in that deposit
I001 = 1st item in that resource
.mp3 = what kind of file

If you have an archive, write and ask them for labelling guidelines.
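
A label built this way can be split back into its parts mechanically. The sketch below parses the example ID with a regular expression in Python. The field widths (three letters, three digits per number) are inferred from the example alone and are an assumption, not AILLA's published specification; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example on the slide; the fixed widths are an
# assumption. The item part and the file extension are treated as optional.
RESOURCE_ID = re.compile(
    r"^(?P<language>[A-Z]{3})"
    r"(?P<deposit>\d{3})"
    r"R(?P<resource>\d{3})"
    r"(?:I(?P<item>\d{3}))?"
    r"(?:\.(?P<extension>[A-Za-z0-9]+))?$"
)


class ResourceId(NamedTuple):
    language: str
    deposit: int
    resource: int
    item: Optional[int]
    extension: Optional[str]


def parse_resource_id(label: str) -> ResourceId:
    """Split a label such as ZOH001R040I001.mp3 into its components."""
    match = RESOURCE_ID.match(label)
    if match is None:
        raise ValueError(f"not a recognised resource ID: {label!r}")
    item = match.group("item")
    return ResourceId(
        language=match.group("language"),
        deposit=int(match.group("deposit")),
        resource=int(match.group("resource")),
        item=int(item) if item is not None else None,
        extension=match.group("extension"),
    )
```

Rejecting anything that does not match, rather than guessing, is what keeps a labelling scheme consistent over a whole collection.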

Example 2: participant initials plus a media type code

gsm1_au1   audio part 1
gsm1_au2   audio part 2
gsm1_db    shoebox interlin of the audio
gsm1_tx1   text, misc notes
gsm1_ph1   photo of Germán
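
Generating labels from a function, rather than typing them by hand, is one way to keep a scheme like this consistent. The Python sketch below assumes the labels decompose as initials, a session number, an underscore, a two-letter media code, and an optional part number; that reading of the examples, and the function name, are assumptions.

```python
from typing import Optional

# Media type codes taken from the examples above.
MEDIA_CODES = {
    "au": "audio",
    "db": "Shoebox database",
    "tx": "text",
    "ph": "photo",
}


def make_label(initials: str, session: int, media: str,
               part: Optional[int] = None) -> str:
    """Build a label such as gsm1_au2 from its components."""
    if not initials.isalpha():
        raise ValueError(f"initials must be letters only: {initials!r}")
    if media not in MEDIA_CODES:
        raise ValueError(f"unknown media code: {media!r}")
    base = f"{initials.lower()}{session}_{media}"
    return base if part is None else f"{base}{part}"
```

For instance, `make_label("GSM", 1, "au", 2)` yields the second audio label in the list above, whatever capitalisation was typed.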

Example 3: label by media unit, recordings are primary

md1t1     - minidisc 1, track 1
md1t1.db  - shoebox database for that text
nb1       - field notebook 1
ds19.xls  - spreadsheet dataset (e.g. verb roots)
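
Whatever scheme you choose, a short script can flag labels that drift from it. The Python sketch below checks labels against patterns read off the four examples above; the prefixes come from the slide, but the regular expressions themselves are an assumption about how the scheme generalises.

```python
import re

# One pattern per kind of item in the scheme (assumed generalisations).
PATTERNS = [
    re.compile(r"^md\d+t\d+(\.[a-z0-9]+)?$"),  # minidisc N, track M, optional extension
    re.compile(r"^nb\d+$"),                     # field notebook N
    re.compile(r"^ds\d+\.[a-z0-9]+$"),          # dataset N with a file extension
]


def nonconforming(labels):
    """Return the labels that match none of the scheme's patterns."""
    return [label for label in labels
            if not any(p.match(label) for p in PATTERNS)]
```

Running this over a directory listing before each backup is a cheap way to catch a stray `MD1-T1` or `notebook 1` while you still remember what it was.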

Metadata I
Catalog information for digital resources. Supports:
  • archive & collection management
  • protection of sensitive materials
  • searching
  • use of resources by many people
  • proper citation of archived resources

Metadata II: Minimum info
  • Speakers' full names (plus alias if you want to anonymize in text).
  • Language: Be specific! Zoque of San Miguel Chimalapa, Oaxaca, Mexico.
  • Date of creation: YYYY-MM-DD. Use the primary (recording) date for the bundle.
  • Place of creation: Be specific: village, state, country, or river valley, region, country…
  • Access restrictions & instructions, if necessary.
  • Genre keyword: dependent on choice of schema.
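
The minimum fields above fit in a small record that can check itself before deposit. This is a sketch in Python under stated assumptions: the class name, field names, and the wording of the checks are invented here, and neither IMDI nor OLAC defines this structure.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MinimalMetadata:
    """The minimum catalog information for one bundle (hypothetical layout)."""
    label: str       # same label as the resource itself
    speakers: list   # full names
    language: str    # be specific, e.g. variety and location
    created: str     # YYYY-MM-DD, the primary recording date
    place: str
    genre: str       # keyword from the chosen schema
    access: str      # e.g. "public" or "restricted"
    access_instructions: Optional[str] = None

    def problems(self) -> list:
        """Return a list of human-readable problems; empty means complete."""
        issues = []
        if not self.speakers:
            issues.append("no speakers listed")
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", self.created):
            issues.append("date of creation is not YYYY-MM-DD")
        else:
            try:
                date.fromisoformat(self.created)
            except ValueError:
                issues.append("date of creation is not a real date")
        for name in ("label", "language", "place", "genre", "access"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        return issues
```

Returning a list of problems instead of raising on the first one lets you fix a whole batch of records in one pass.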

Metadata III
  • Choose either IMDI or OLAC schema. If you have an archive, use the one they tell you.
  • LABEL every metadata entry with the same label you use for the resource.
  • List every related item in the metadata.
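
Because metadata entries and resources share one label, the two sets can be compared directly. A minimal Python sketch follows; the function name and the keys of the returned dictionary are hypothetical.

```python
def cross_check(resource_labels, metadata_labels):
    """Compare resource labels with metadata labels and report mismatches."""
    resources = set(resource_labels)
    entries = set(metadata_labels)
    return {
        # resources that have no catalog entry yet
        "missing_metadata": sorted(resources - entries),
        # catalog entries that point at no known resource
        "orphan_metadata": sorted(entries - resources),
    }
```

Both lists should be empty before a deposit goes to the archive.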

IMDI: www.mpi.nl/IMDI
Session bundle = resource
  • Title, date, place, description
  • Depositor (you): contact info
  • Project: name, director, sponsor, etc.
  • Participants: role, demographic data, contact
  • Resources: provenance, formats, relations, etc.
  • Content: context, genre, narrative description, etc.
  • References: relevant publications

OLAC: www.language-archives.org/
Archival object definition is up to you
  • Contributors / creators
  • Title, date, description
  • Resource info: formats
  • Relation to other objects
  • Subject - linguistic subfield
  • Type.linguistic = genre

Corpus management tools
  • From MPI: IMDI Browser & IMDI Dataentry.
  • I have a Shoebox 2.0 template that needs porting to Shoe 5.0 (?).
  • Someday, we'll do a Filemaker Pro one.
  • Otherwise, use any database or spreadsheet or Word template and create your own.

Intellectual property rights
  • Define a policy concerning IPR and develop a consistent practice for obtaining consent, e.g., forms and/or recorded statements.
  • Learn how to talk to your consultants about IPR. Ask other researchers who have worked in your region or language community.
  • Note the IPR status of each resource and each item in the metadata.

Formats

               Text (a grammar)   Audio (a recording)   Video (a film)
working        ms / MS Word       minidisc              ??
presentation   pdf / html         mp3                   ??
archival       tiff / XML         wav 44.1/16           mp2
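
The archival audio entry, wav 44.1/16, can be verified before deposit. The Python sketch below uses the standard-library `wave` module and reads "44.1/16" as a 44.1 kHz sample rate with 16-bit samples; that reading, and the function name, are assumptions rather than a stated archive requirement.

```python
import wave


def is_archival_wav(path, rate=44100, sample_width_bytes=2):
    """Return True if the WAV file has the expected sample rate and bit depth.

    The defaults assume 44.1 kHz and 16-bit (2-byte) samples.
    """
    with wave.open(path, "rb") as wav_file:
        return (wav_file.getframerate() == rate
                and wav_file.getsampwidth() == sample_width_bytes)
```

`wave.open` raises `wave.Error` for files that are not uncompressed WAV, so a caller checking a mixed directory may want to catch that and report the file separately.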

Archive-quality formats are:
  • non-proprietary; that is, the encoding is in the public domain;
  • able to support forward migration to new formats;
  • portable, re-useable, repurposeable;
  • the best possible reproduction of the original.

When should you archive?
  • As soon as you get back from the field:
      • to prevent accidental damage or loss;
      • to get back handy presentation formats;
      • to build your CV even before you are ready to publish results.
  • If not then, as soon as possible.
  • At the very least, mention your data and an archive in your will.

Archive your data
  • We encourage you to archive recordings ASAP and add transcriptions, translations, annotations, etc. later.
  • Secondary materials are generally reproducible; the primary recordings are not!
  • Students should password-protect their data until they finish their theses.

Useful websites
  • DELAMAN: http://www.delaman.org/
  • IMDI: http://www.mpi.nl/ISLE
  • OLAC: http://www.language-archives.org
  • EMELD: http://emeld.org
  • AILLA: http://www.ailla.utexas.org/links.html
  • Write to me: [email protected]