1 LingDy February 14, 2012 TUFS, Tokyo David Nathan Endangered Languages Archive Hans Rausing Endangered Languages Project SOAS, University of London Data management (part 2)

LingDy, February 14, 2012

TUFS, Tokyo

David Nathan, Endangered Languages Archive

Hans Rausing Endangered Languages Project, SOAS, University of London

Data management (part 2)

Also (for Part 2)

- creating a catalogue/inventory/index
- metadocumentation
- data/file versions
- transferring data
- sharing data
- backup
- character encoding

Different types of metadata

there are many types of metadata

different types of materials may have different metadata, e.g.:
- metadata for photos and videos may have technical parameters and lists of the people appearing
- metadata for transcriptions may have the date, version, who transcribed, and notes on progress
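As a rough illustration of type-specific metadata, the sketch below gives videos and transcriptions different record shapes. The field names, filenames and speaker labels are hypothetical; only the kinds of information (technical parameters, people appearing, date, version, transcriber, progress notes) come from the slide.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical record types: each material type gets only the fields that make sense for it.

@dataclass
class VideoMetadata:
    filename: str
    resolution: str                      # technical parameter, e.g. "1920x1080"
    frame_rate: float                    # technical parameter, frames per second
    people_appearing: list[str] = field(default_factory=list)

@dataclass
class TranscriptionMetadata:
    filename: str
    date: str                            # ISO date string, e.g. "2012-02-14"
    version: int
    transcriber: str
    progress_notes: str = ""

video = VideoMetadata("fugu_20120214.mp4", "1920x1080", 25.0, ["Speaker A", "Speaker B"])
transcript = TranscriptionMetadata("fugu_20120214.eaf", "2012-02-14", 2, "Speaker C",
                                   "first 10 minutes transcribed")

for record in (video, transcript):
    print(asdict(record))                # plain dicts, easy to write to a catalogue
```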

Your collection catalogue

first, define your collection/corpus/project as a coherent (logical) set of materials

your collection catalogue/inventory/index is a type of metadata: it should list and describe all the files in your collection

it usually contains the categories of information that are relevant to many files
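A first draft of such a catalogue can be generated automatically and then completed by hand. This is a minimal sketch, assuming the collection lives in one folder tree; the columns (`file`, `type`, `size_bytes`, `description`) are a hypothetical starting set, and the catalogue should be written outside the collection folder so it does not list itself on a re-run.

```python
import csv
from pathlib import Path

FIELDS = ["file", "type", "size_bytes", "description"]   # hypothetical starting columns

def build_catalogue(collection_dir, catalogue_path):
    """Write one CSV row per file found under collection_dir and return the rows."""
    root = Path(collection_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "file": path.relative_to(root).as_posix(),   # path inside the collection
                "type": path.suffix.lower().lstrip("."),      # e.g. "wav", "eaf", "jpg"
                "size_bytes": path.stat().st_size,
                "description": "",                            # to be filled in by hand
            })
    with open(catalogue_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `build_catalogue("my_collection", "catalogue.csv")` would list every file under a (hypothetical) `my_collection` folder; further columns such as speaker or genre can then be added for the categories relevant to your own files.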

Your collection catalogue

you could have one large catalogue that covers every file, or

you could have a catalogue that is subdivided according to types of files, and/or groups of resources

there is no “one size fits all” solution!
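One way to have both options is to maintain a single master catalogue and derive the subdivided ones from it, so the two never drift apart. The sketch below assumes a master catalogue CSV with a column named `type`; the output names `catalogue_<type>.csv` are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path

def split_catalogue(catalogue_path, out_dir, key="type"):
    """Split a master catalogue CSV into one sub-catalogue per value of the `key` column."""
    groups = defaultdict(list)
    with open(catalogue_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[key] or "unknown"].append(row)   # empty values go to "unknown"
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for value, rows in groups.items():
        with open(out / f"catalogue_{value}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    return {value: len(rows) for value, rows in groups.items()}   # rows per sub-catalogue
```

Grouping by another column (for instance a hypothetical `session` column) would subdivide by groups of resources instead of file types.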

Examples

Making an “active” catalogue

this is not necessary, but may be useful

if you use a spreadsheet, you can embed links to the actual files to make your collection easier to use
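A sketch of how such links could be generated rather than typed: it writes a CSV whose `link` column holds `HYPERLINK` formulas pointing at each file's absolute `file://` address. This assumes a desktop spreadsheet program with a `HYPERLINK` function (Excel and LibreOffice Calc both have one); whether formulas in a CSV are evaluated on opening depends on the program and its import settings, and the links break if the collection folder is moved.

```python
import csv
from pathlib import Path

def write_active_catalogue(collection_dir, catalogue_path):
    """Write a CSV with a clickable HYPERLINK formula for every file; return the file count.

    Assumes filenames contain no double quotes, which would break the formula.
    """
    root = Path(collection_dir).resolve()
    count = 0
    with open(catalogue_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "link"])
        for path in sorted(root.rglob("*")):
            if path.is_file():
                uri = path.as_uri()                        # absolute file:// URI
                name = path.relative_to(root).as_posix()
                writer.writerow([name, f'=HYPERLINK("{uri}","{name}")'])
                count += 1
    return count
```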

Metadocumentation

you should keep an up-to-date description of the methods, conventions and abbreviations you use, so that somebody else could fully understand (and use) your data and methods in your absence

example
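If part of the metadocumentation is kept in a machine-readable form, it can also be used to check the data. In this hypothetical sketch the documented gloss abbreviations live in a dictionary (the entries are Leipzig-style examples chosen for illustration), and a function reports any abbreviation used in a gloss line that the documentation does not yet explain.

```python
import re

# Hypothetical metadocumentation: every abbreviation used in glosses, with its meaning.
CONVENTIONS = {
    "1SG": "first person singular",
    "PST": "past tense",
    "NEG": "negation",
}

# Treat all-capital tokens, optionally starting with digits (e.g. "PST", "1SG"), as abbreviations.
ABBREV = re.compile(r"\b[0-9]*[A-Z][A-Z0-9]*\b")

def undocumented(gloss_line, conventions=CONVENTIONS):
    """Return the abbreviations in gloss_line that the conventions do not describe."""
    return sorted(set(ABBREV.findall(gloss_line)) - set(conventions))

print(undocumented("go-PST 1SG-NEG-FUT"))   # ['FUT']
```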

Data/file versions

the need to distinguish or keep versions depends on your purposes

one method is suffixing the filename, e.g.
- fugu1.txt, fugu2.txt, or
- fugu_1.txt, fugu_2.txt

which of the above methods is better?
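Two considerations when choosing, shown as a small sketch: plain character-by-character sorting (what Python's `sorted` does, as do many other tools) puts version 10 before version 2 unless numbers are zero-padded, and a separator such as `_` makes the version number unambiguous even when a base name ends in a digit (compare `text21.txt` with `text2_1.txt`). The `split_version` helper is hypothetical.

```python
unpadded = ["fugu1.txt", "fugu2.txt", "fugu10.txt"]
padded = ["fugu_01.txt", "fugu_02.txt", "fugu_10.txt"]

print(sorted(unpadded))   # ['fugu1.txt', 'fugu10.txt', 'fugu2.txt']
print(sorted(padded))     # ['fugu_01.txt', 'fugu_02.txt', 'fugu_10.txt']

def split_version(filename):
    """Split 'name_NN.ext' into (name, version); the last underscore marks the version."""
    stem, _, _ext = filename.rpartition(".")
    name, _, version = stem.rpartition("_")
    return name, int(version)

print(split_version("fugu_10.txt"))   # ('fugu', 10)
```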

Data/file versions

fugu_14022013.txt
fugu_20130214.txt
14022013_fugu.txt
20130214_fugu.txt

which of the above methods would be best?

note: do not rely on system dates! (the dates your computer records for a file can change when the file is copied or re-saved)
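A small sketch of why the year-month-day order (ISO 8601 basic format, `YYYYMMDD`) is often preferred: with the date written into the name, sorting by filename also sorts by date, whereas `DDMMYYYY` does not. The dates and the `dated_name` helper are hypothetical; whether the date goes before or after the name depends on whether you want files grouped by date or by topic.

```python
from datetime import date

def dated_name(base, day, ext="txt", date_first=False):
    """Build a filename carrying a YYYYMMDD date stamp, before or after the base name."""
    stamp = day.strftime("%Y%m%d")
    return f"{stamp}_{base}.{ext}" if date_first else f"{base}_{stamp}.{ext}"

days = [date(2013, 2, 14), date(2012, 3, 20)]

ddmmyyyy = [f"fugu_{d.strftime('%d%m%Y')}.txt" for d in days]
yyyymmdd = [dated_name("fugu", d) for d in days]

print(sorted(ddmmyyyy))   # ['fugu_14022013.txt', 'fugu_20032012.txt']  (2013 before 2012)
print(sorted(yyyymmdd))   # ['fugu_20120320.txt', 'fugu_20130214.txt']  (chronological)
print(dated_name("fugu", days[0], date_first=True))   # 20130214_fugu.txt
```

Because the date lives in the filename, it survives copying, unlike the modification date kept by the file system.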

Data/file versions

do you need to keep every version? often it is fine to keep the “original” plus the current version

if the information is regularly updated or corrected, you can keep one filename and put the dates in the document itself, or record them in a catalogue/metadata file

a series of files may have inherent value, e.g. your transcriptions/annotations show how your understanding and analysis change, so date and keep those files

or use different tiers in ELAN?
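The "original plus current, one filename, dates in a metadata file" approach can be sketched as two small helpers. Both are hypothetical: `keep_original` copies a file once to `NAME_original.EXT` before it is first edited, and `record_update` appends a dated note to a CSV change log while the data file keeps its name.

```python
import csv
import shutil
from datetime import date
from pathlib import Path

def keep_original(data_file):
    """Copy data_file to NAME_original.EXT once; later calls leave that copy untouched."""
    src = Path(data_file)
    dst = src.with_name(f"{src.stem}_original{src.suffix}")
    if not dst.exists():
        shutil.copy2(src, dst)   # copy2 also tries to preserve file metadata
    return dst

def record_update(data_file, log_file, note, when=None):
    """Append a (file, date, note) row to a CSV change log instead of renaming the data file."""
    when = when or date.today()
    log = Path(log_file)
    is_new = not log.exists()
    with open(log, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["file", "date", "note"])   # header only for a new log
        writer.writerow([Path(data_file).name, when.isoformat(), note])
```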

Transferring data

ensure your computer is not a “walled garden”

you can use:
- drives/devices (but avoid DVDs!!)
- email
- upload (where available)
- sending links
- the “cloud”, e.g. Dropbox

issues include cost, potential viruses and assuring the integrity of copies, but these are generally not a big problem
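A common way to assure the integrity of copies is to compare checksums of the original and the copy. This sketch uses SHA-256 from Python's standard `hashlib`; the choice of SHA-256 is an assumption (other checksums such as MD5 are also widely used for this), and the sender can also pass the digest along so the recipient can check it.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_is_intact(original, copy):
    """True if the copy has exactly the same bytes as the original."""
    return sha256_of(original) == sha256_of(copy)
```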

Sharing

can we work in a shared, collaborative space?
- Dropbox
- Google Docs
- blogs, Tumblr, etc. can have shared “authors”, and contributors with controlled roles

Character encoding

if your document contains any characters other than those on a US keyboard, use a Unicode (UTF) character encoding, preferably UTF-8

how can I tell whether the characters in my MS Word document are encoded as UTF-8?
- save as plain text and check the encoding options
- copy the text into a plain text editor such as Notepad++
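Once text has been saved as plain text, a short script can check it. This sketch reports whether a file decodes as UTF-8 and which non-ASCII characters it contains. It decodes with `utf-8-sig` so that the byte-order mark some Windows editors write is accepted; note that a file containing only ASCII is also valid UTF-8, and a file in an older encoding usually, but not always, fails the check.

```python
from pathlib import Path

def check_utf8(path):
    """Return (is_valid_utf8, sorted list of non-ASCII characters) for a plain-text file."""
    data = Path(path).read_bytes()
    try:
        text = data.decode("utf-8-sig")   # also strips a leading byte-order mark, if present
    except UnicodeDecodeError:
        return False, []
    return True, sorted({ch for ch in text if ord(ch) > 127})
```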

Character encoding

useful tools Notepad++ http://notepad-plus-plus.org/

SIL ViewGlyph http://scripts.sil.org/cms/scripts/page.php?item_id=ViewGlyph_home

BabelMap http://www.babelstone.co.uk/software/babelmap.html

ExSite9 http://www.intersect.org.au/exsite9
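For quick checks without a separate tool, Python's standard `unicodedata` module can list each character's code point and Unicode name, similar to what character viewers show. The example strings are illustrative: they look identical on screen but one is a single precomposed character and the other a base letter plus a combining accent, which matters when searching or comparing data.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>")) for ch in text]

precomposed = "\u00e1"   # á as one character
decomposed = "a\u0301"   # a + combining acute accent

for ch, code, name in describe(precomposed) + describe(decomposed):
    print(code, name)
# U+00E1 LATIN SMALL LETTER A WITH ACUTE
# U+0061 LATIN SMALL LETTER A
# U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                 # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```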

Your projects

discuss in groups: what are the problems or weaknesses in our “data management plan” or data management methods?