EMELD Resource Conversion WG 20030713
Toward Best Practice for Language Resource Conversion
EMELD 2003 Working Group on Resource Conversion
Working Group
Baden Hughes, Chilin Shih (co-chairs)
Helen Aristar-Dry, Steven Bird, Reinhard Hiss, Will Lewis, Barbara Need, Steven Weinberger
Objectives
Consider the methodology for, and make recommendations about, the conversion of legacy (possibly non-digital) language resources into enduring best practice (BP) formats
Examine ongoing conversion processes and identify issues in the conversion of digital language resources in working contexts
Methodology
Focus on high-level principles that pervade general language resource conversion problems, rather than on format-specific conversion issues
Acceptance that appropriate technical expertise probably already exists “somewhere” but needs to be adapted to the EMELD context
Subject Matter
Content and Structure
• Metadata
• Text
• Audio
• Video
• Still Images
Physical Media
Hardware / Software
Core Values
Bird & Simons (2003) “Seven Dimensions …”: content, format, discovery and preservation
Motivation to ensure the persistence and longevity of archive-quality digital objects
Principles …1
Ignorance is not bliss! Not every user needs to be a technical expert, but every user should be assisted, according to their context and functional requirements, in accessing sufficient information to make an informed choice
Conversion issues will affect institutions and individuals at many levels, particularly in terms of the resources available to address them
Principles …2
Conversion and Archiving
• The best available copy should be archived according to BP
• Format neutrality with respect to use involves effort but is essential to ensure long-term viability
• Archiving practice will imply resource conversion for preservation purposes
• Consistency in conversion methodology is inherently better than random variation
Principles …3
Conversion and Re-Use
• Requirements for re-use vary between agents and purposes
• Most (all?) conversion processes inherently involve some degree of information loss, so the number of format conversions undertaken should be kept to an absolute minimum
• Where possible, converted materials should include information about their digital lineage
• Additional information pertaining to the language resource may be located separately from the resource itself, and it also needs to be preserved
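To illustrate the lineage and minimal-conversion principles, here is a minimal sketch in Python of a lineage record that could travel with a converted resource. It is a hypothetical structure, not an EMELD or OLAC specification: the names `LineageRecord` and `ConversionStep`, their fields, and the example formats and tool name are all assumptions made for illustration. Each step records the source and target formats, the tool used, the date, and any known information loss, and the record refuses a step that does not continue from the current format, so the chain can always be traced back to the original.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConversionStep:
    """One step in a resource's digital lineage (hypothetical structure)."""
    source_format: str      # format before this conversion
    target_format: str      # format produced by this conversion
    tool: str               # hardware/software used (hypothetical name in the example)
    date: str               # ISO 8601 date of the conversion
    known_losses: str = ""  # free-text note on information known to be lost


@dataclass
class LineageRecord:
    """Hypothetical lineage record kept alongside a converted resource."""
    resource_id: str
    original_format: str
    steps: List[ConversionStep] = field(default_factory=list)

    @property
    def current_format(self) -> str:
        # The format of the latest copy, or the original if never converted.
        return self.steps[-1].target_format if self.steps else self.original_format

    @property
    def conversion_count(self) -> int:
        # Each conversion is a chance for information loss; keep this low.
        return len(self.steps)

    def add_step(self, step: ConversionStep) -> None:
        # A step must continue from the current format, so the chain is unbroken.
        if step.source_format != self.current_format:
            raise ValueError(
                f"step starts from {step.source_format!r}, "
                f"expected {self.current_format!r}"
            )
        self.steps.append(step)

    def summary(self) -> str:
        chain = [self.original_format] + [s.target_format for s in self.steps]
        return " -> ".join(chain)


# Example usage with hypothetical values.
record = LineageRecord("example-001", "cassette tape")
record.add_step(ConversionStep("cassette tape", "WAV", "hypothetical-digitiser", "2003-07-13"))
print(record.summary())  # cassette tape -> WAV
```

Keeping the count of steps explicit makes it easy to see when a resource has passed through more conversions than necessary; where the lineage record is stored (embedded or alongside the resource) is left open, in line with the point that related information may live separately from the resource itself.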
A Pragmatic Approach to BP …1
The lineage of digital language resources may have included processes that fall short of optimal practice
BP may not realistically be achievable in all contexts (constraints such as time, money, equipment, expertise, inclination …)
Some practices have an inherently higher potential to cause conversion and archiving issues
Significant incentives need to be offered to induce change in language data management practices towards BP: would you prefer to choose BP, or to be forced to adopt BP when you lose data?
A Pragmatic Approach to BP …2
Software choice will affect the longevity of language resource data
Ideological debates about software development methodologies are often misleading when considering longevity and preservation
An absolute ranking of practices on a scale from worst to best is not transparent (context-sensitive, moving target …)
Ongoing Work Items …1
Identify and review core documents on BP formats, including accessible recommendations for different audiences
Identify and review software tools that enable conversion according to BP principles (this is not necessarily a democratic system!)
Develop accessible case studies of typical language resource conversion problems, critique them, and provide advice on how to achieve BP in these contexts
Ongoing Work Items …2
Examine how physical media choices can affect the retention or loss of information, and the implications for the language resource conversion process
Promulgate resource conversion as a pervasive issue to be considered in many other BP contexts
Observations Relevant to Other Working Groups
Resource Archiving
• Good archiving practice will consider resource conversion as a fundamental issue
• Infrastructural constraints may significantly increase the risk of information loss
Resource Creation
• BP at the data collection point reduces the risk of information loss in any conversion process
• Conversion implications need to be considered when selecting an appropriate tool for the data and functionality types required
Observations Relevant to EMELD
EMELD needs to consider the longevity and persistence implications for ongoing archiving functions, particularly with reference to the “long term”; this may include adequate financial resourcing
Logistical Recommendations
Creation of Communities of Expertise within the EMELD framework to advise on working group topics (cf. Ask-A-Linguist), including experts from outside linguistics
Creation of Working Group email lists for ongoing work in these areas
User reviews and solutions section for tools and processes within the EMELD School site