1 Report from MPI Team Roman Skiba Peter Wittenburg DOBES Workshop Frankfurt April 2003

Report from MPI Team

Roman Skiba, Peter Wittenburg

DOBES Workshop, Frankfurt, April 2003

Data types

• Tapes
• Audio, Video (DV-PAL, DV-NTSC, VHS, DAT, MD)
• other material: 8mm movies, reel to reel audio, slides, photos
• DMFs (mpeg1, mpeg2, wav)
• Metadata (IMDI-sessions, IMDI-corpusstructures)
• Session media
  • mpeg1, wav - for further processing
  • mpeg2 - for archiving
  • Html - as a container for text, pictures and photos (jpeg)
  • PDF - as a container for text, pictures and photos (jpeg)
• Info files (pdf, txt, html)
• Annotations (EAF, shoebox)

Statistics

Raw data: tapes, DMFs and other media.

Project/Language | Tapes    | DMFs     | Other data digitized, converted or delivered
AWETI            | 71       | 70       | -
CHACO            | 17       | 17       | -
KUIKURO          | 36       | 37       | Slides
LACANDON         | 22       | 22       | -
SA-MN            | 17 (+?)  | 17 (+?)  | VCDs
SVAN             | 4        | 4        | -
TSOVA-TUSH       | 5        | 5        | -
UDI              | 2        | 2        | -
TEOP             | 25       | 25       | Lexicon, grammar
TOFA             | 42       | 42       | PDF
TRUMAI           | 88       | 90       | "Reel to reel", 8mm movies, slides, grammar
WAIMA            | 24       | 24       | -
WICHITA          | 7        | 7        | Lexicon examples
Total            | 360      | 372      |

Statistics II

Corpus units: metadata, media files, annotations.

Project/Language | IMDI-files | Sessions  | Integrated IMDI | Integrated annotations
AWETI            | 35 (+?)    | 42 (+36)  | 35              |
KUIKURO          | 73         | 73        | 3               | 1
LACANDON         | 87         | 67        | 67              |
TEOP             | 31         | 30        | 11              |
TOFA             | 169        | 82        | 43              | 14
TRUMAI           | 189        | 181       | 187             | 34
TSOVA-TUSH       | 7          | 0         | 0               |
WICHITA          | 2          | 2         | 0               |
Total            | 593        | 513       | 346             | 49
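The Total row of the corpus-unit statistics can be reproduced from the listed rows. The following is a minimal Python sketch, not part of the original slides. It assumes that the parenthesised AWETI figure (+36) counts toward the session total and that blank annotation cells mean zero; the open "(+?)" on the AWETI IMDI files is ignored. The names `ROWS` and `column_totals` are illustrative.

```python
# Rows as listed on the slide:
# (project, IMDI files, sessions, extra sessions in parentheses,
#  integrated IMDI, integrated annotations).
# Blank annotation cells are entered as 0 (an assumption of this sketch).
ROWS = [
    ("AWETI", 35, 42, 36, 35, 0),
    ("KUIKURO", 73, 73, 0, 3, 1),
    ("LACANDON", 87, 67, 0, 67, 0),
    ("TEOP", 31, 30, 0, 11, 0),
    ("TOFA", 169, 82, 0, 43, 14),
    ("TRUMAI", 189, 181, 0, 187, 34),
    ("TSOVA-TUSH", 7, 0, 0, 0, 0),
    ("WICHITA", 2, 2, 0, 0, 0),
]


def column_totals(rows):
    """Sum each column; parenthesised extra sessions count toward sessions."""
    return {
        "imdi_files": sum(r[1] for r in rows),
        "sessions": sum(r[2] + r[3] for r in rows),
        "integrated_imdi": sum(r[4] for r in rows),
        "annotations": sum(r[5] for r in rows),
    }


totals = column_totals(ROWS)
print(totals)
# {'imdi_files': 593, 'sessions': 513, 'integrated_imdi': 346, 'annotations': 49}
```

Under these assumptions the computed sums agree with the Total row, which suggests the session total already includes the 36 additional AWETI sessions.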

Digitizing problems

• Recording problems
  • due to non-continuous time code
  • due to long play mode
  • due to stills between moving pictures (!)
• Communication problems
  • Maarten handles all communication with great care
• Money problems (due to budget cuts we have to be more careful with expenses - less copying etc)

Audio/Video Archiving

• many discussions with archivists, in particular about audio (Austrian/German audio/phonogram archive, EMELD)
• point at LREC meeting: MP3 and ATRAC (Minidisc) are not ideal, but are acceptable for listening to and normal analysis of speech (discussed type of reduction and effects)
• attitude now:
  • any MD/MP3 file is reformatted to PCM in the archive
  • strong recommendation to researchers to use 16 bit linear PCM HF
  • get the best quality you can - new devices such as DENON
  • what are slightly higher equipment costs in relation to the total budget?
  • miniaturization can be a problem
• DENON Recorder
  • 192 MB flash cards (or even more)
  • linear PCM 768 kbps: stereo = 16 min / mono = 32 min
  • MP3 (MPEG2 layer 2) 64 kbps: factor 12 => mono ~ 6 h
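The recording times quoted for the DENON recorder follow from card size and bitrate. The sketch below is not from the slides; it assumes that 768 kbps is the rate of one 48 kHz, 16 bit channel (the slide does not state the sampling rate), that stereo doubles this rate, and that 1 MB = 10^6 bytes. The function name `recording_minutes` is illustrative.

```python
def recording_minutes(card_mb: float, kbps: float) -> float:
    """Minutes of audio that fit on a card of card_mb megabytes at kbps kilobits/s."""
    bits = card_mb * 1_000_000 * 8        # assumes 1 MB = 10**6 bytes
    seconds = bits / (kbps * 1000)
    return seconds / 60


CARD_MB = 192
PCM_MONO_KBPS = 768                       # assumed: 48 kHz x 16 bit x 1 channel
PCM_STEREO_KBPS = 2 * PCM_MONO_KBPS       # assumed: two such channels
COMPRESSED_KBPS = 64                      # compressed rate quoted on the slide

mono = recording_minutes(CARD_MB, PCM_MONO_KBPS)
stereo = recording_minutes(CARD_MB, PCM_STEREO_KBPS)
compressed = recording_minutes(CARD_MB, COMPRESSED_KBPS)

print(f"PCM mono:    {mono:.1f} min")
print(f"PCM stereo:  {stereo:.1f} min")
print(f"64 kbps:     {compressed / 60:.1f} h (factor {compressed / mono:.0f})")
```

With these assumptions the results come out slightly above the rounded figures on the slide (about 33 min mono, 17 min stereo, and between 6 and 7 hours at 64 kbps), and the factor of 12 is simply 768 / 64.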

Video Digitization in the Field

• audio no problem
• video digitization at MPI was and is a success story
• but slow cycle time - therefore digitization in the field

Diagram labels:
• DV camera
• DV encoding: 3.4 MB/sec, 1 h = 20 GB; proprietary, limited software support
• MPEG1 encoding: 1.5 Mbps, 1 h = 1 GB; to work with
• copies: MPEG2 copy (~6 Mbps), MPEG1 copy (~1 Mbps), MPEG4 copy (0.5 - …), etc.

• MPEG2 widely accepted archive standard, various frontend codecs
• still compressed - new standard will come in future
• need your tapes (copies) and the MD file to create MPEG2 versions
• use camera in continuous mode !!!! then batch segmentation
• adapted workflows necessary

Further diagram labels: I-link; good old mail; conversion; Tsunami; tests with MPEG camera not ok
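The bitrates quoted for the MPEG copies translate into storage per hour of video. The following Python sketch is an illustration only: it counts the raw stream at the nominal bitrate, ignores container overhead, and uses 1 GB = 10^9 bytes, so the slide's own per-hour figures are coarser roundings. The function name `gigabytes_per_hour` is an assumption of this sketch.

```python
def gigabytes_per_hour(mbps: float) -> float:
    """Raw stream size for one hour at mbps megabits/s, in GB (10**9 bytes)."""
    bits_per_hour = mbps * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


# Nominal bitrates taken from the slide (approximate values).
STREAMS = {
    "MPEG2 copy (~6 Mbps)": 6.0,
    "MPEG1 encoding (1.5 Mbps)": 1.5,
    "MPEG1 copy (~1 Mbps)": 1.0,
}

for name, mbps in STREAMS.items():
    print(f"{name}: about {gigabytes_per_hour(mbps):.2f} GB per hour")
```

On this estimate an MPEG2 archive copy needs roughly six times the storage of an MPEG1 working copy, which is the trade-off behind keeping separate copies for archiving and for everyday work.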

Access to Archive - short-term

Access to the DoBeS archive I

Current state

• Digital data transport via
  • Mail (DMF, session media)
  • FTP (all data) with password and user ID
  • Email (metadata, annotations, infos)
  • IMDI Browser (metadata, infos)

Access to the DoBeS archive II

Testing new ways

• Digital data transport via
  • IMDI Browser (all integrated data types), with password and user ID
  • HTML corpus (all data types), with password and user ID
• Remote access

Access to the DoBeS archive III

Future scenario

• Short term solution
  • To open all data types of a team for the IMDI Browser (media, annotations etc.)
• Long term solution
  • File access (user IDs and passwords) administered by the teams

Access to Archive - long-term

Archive Access Single Person

the single person solution - the (almost) ideal world

all in one single personal box

Archive Access Single Institute

the single institute solution - the (almost) ideal world

all in one single big box for an institute

a little more tricky - not all may access everything, but one controlling instance

fast networks available

Archive Access SI+Web

the single institute solution with Internet access - the (almost) ideal world

all in one single big box for all

much more tricky - not all may access everything

still one controlling instance

but can be faked and slow networks for video

control delegation necessary

Archive Access DOBES Goal

even more tricky - not all may access everything and everywhere?

several controlling instances - need trust mechanisms

control delegation even more necessary

stability of paths???

[Diagram: archives AILLA, SOAS, DOBES, ??]

DOBES Archive Access

[Diagram labels: client; management clients; users & groups; URID - ACL mapping; URID - path mapping; resource domain (streaming servers, http servers); resource. Annotations: "URID", "PID", "URL+", "check on valid ticket", "check whether user is allowed to access resource".]
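The components named in the diagram suggest a lookup chain: validate the ticket, check the user against the access control list for the requested URID, then map the URID to a location in the resource domain. The Python sketch below is one possible reading of that chain, not the DOBES implementation. The class `Archive`, its fields, the ticket format and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical in-memory model of the mappings shown in the diagram."""
    acl: dict = field(default_factory=dict)      # URID -> set of groups allowed to read
    paths: dict = field(default_factory=dict)    # URID -> URL in the resource domain
    groups: dict = field(default_factory=dict)   # user -> set of groups
    tickets: dict = field(default_factory=dict)  # ticket -> user

    def resolve(self, urid: str, ticket: str) -> str:
        """Return the URL for urid if the ticket is valid and the user may read it."""
        user = self.tickets.get(ticket)          # check on valid ticket
        if user is None:
            raise PermissionError("invalid ticket")
        allowed_groups = self.acl.get(urid, set())
        if not (self.groups.get(user, set()) & allowed_groups):
            raise PermissionError(f"{user} may not access {urid}")
        return self.paths[urid]                  # URID - path mapping


# Illustrative data only; identifiers and URL are made up for this sketch.
archive = Archive(
    acl={"urid:demo-1": {"team"}},
    paths={"urid:demo-1": "http://archive.example.org/media/demo-1.mpg"},
    groups={"alice": {"team"}, "bob": {"public"}},
    tickets={"t-alice": "alice", "t-bob": "bob"},
)
print(archive.resolve("urid:demo-1", "t-alice"))
```

Keeping the ACL mapping separate from the path mapping means a resource can be moved between servers without touching its access rules, which is one way to address the concern about path stability.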

DOBES Archive Access

essentials

• online archive managers have write (delete) access (consistency, otherwise complex check-in & versioning system)
• question: who has read access rights?
  • researchers/archivist define access policy - incl. management???
  • access per usage request (temporary) or per person/group?
  • do we need person groups (team members, researchers, community members, …)?
  • access patterns per info type (MD, video, audio, annotations, others)
• as was stated - everyone has to accept CoC and copyright statement!
• what about logo and watermarking?
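The open questions above (groups, access per info type, temporary access per request, mandatory acceptance of the CoC) can be made concrete with a small data structure. The Python sketch below shows one way such a policy could be represented. It is purely illustrative: the rights assigned to each group are invented for the example and are not a DOBES policy, and the names `POLICY` and `may_read` are assumptions.

```python
from datetime import datetime, timedelta

INFO_TYPES = ("MD", "video", "audio", "annotations", "others")

# Standing read rights per group. These values are invented for illustration.
POLICY = {
    "team members": set(INFO_TYPES),
    "researchers": {"MD", "annotations"},
    "community members": {"MD", "audio", "video"},
}


def may_read(group, info_type, accepted_coc, temporary_grants=(), now=None):
    """Decide read access from CoC acceptance, group rights and temporary grants.

    temporary_grants is a sequence of (info_type, expiry) pairs issued in
    response to a usage request.
    """
    if not accepted_coc:                      # everyone has to accept the CoC
        return False
    if info_type in POLICY.get(group, set()):
        return True
    now = now or datetime.now()
    return any(t == info_type and expires > now for t, expires in temporary_grants)


now = datetime(2003, 4, 1, 12, 0)
grant = [("video", now + timedelta(days=7))]

print(may_read("researchers", "MD", True))                 # True: standing right
print(may_read("researchers", "video", True))              # False: no right
print(may_read("researchers", "video", True, grant, now))  # True: temporary grant
print(may_read("team members", "video", False))            # False: CoC not accepted
```

A table of this kind would let teams define patterns per info type once, while temporary grants cover individual usage requests without creating new groups.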

Collaborations of DOBES Archivist

Collaborations I

• DELAN (Digital Endangered Languages Archive Network)
  • AILLA, DOBES, ELAR-SOAS, PARADISEC, …
  • link to and support from UNESCO?
• joint web portal with links: AILLA?
  • general information, eNEWS archive
  • Electronic Newsletter: DOBES
  • Electronic Preprint Server: LL?
  • Advice + FAQ: AILLA?
  • Training & Revitalization etc: SOAS
  • E & L, CoC: PARADISEC
  • Archive Access: ?
  • Long-term Storage: DOBES
• pressure group
• joint fund raising activities
• Adopt a Language activity ??

Collaborations II

• E-Meld
  • joint developers workshop
  • joint CV editor by MPI
  • perhaps joint lexicon tool - interest on both sides (start after Easter with real person power at MPI)
  • close exchange with Arizona group about Ontology (Terry & Scott)
  • joint international workshop on lexicon schemas and registries
• INTERA (Integrated European Language Resource Area)
  • integration of all metadata about all LR
  • automatic search for useful tools
• ECHO (European Cultural Heritage Online)
  • additional language resources from archives into MD pool
  • interoperability issues with domains such as Ethnology, …
• TYPOWEB (proposal to EU)
  • project to define an open distributed typology framework
  • inclusion of DOBES and SOAS teams as testers (if they like)
  • a number of excellent typologists, field linguists and 2 technology p
• LanguageWeb (proposal to EU): knowledge basis for lang tech
• CHaSE (proposal to EU): open tech framework for cultural heritage
• data-GRID initiatives (to come): network for fast data exchange

DOBES Training Course

Training Courses

• date 2-6 June
• everyone is invited - in particular new teams
• all new teams showed interest - want much practical stuff
• planning content now - any comment is welcome
• will distribute the new schedule soon
• “old” teams are invited to present topics / experience reports / …
• open to SOAS teams
• will carry out training courses in Germany together with GBS (Nikolaus Himmelmann)
