Succeed WP3 Validation and Take-up of Tools at the "Succeed in Digitisation. Spreading Excellence" Conference.
Succeed WP3 – Validation and take-up of tools
Katrien Depuydt (INL), Stefan Eickeler, Sebastian Kirch (IAIS)
Succeed is supported by the European Union under FP7-ICT and coordinated by Universidad de Alicante.
Objectives
Many tools and linguistic resources were developed in research and
development programs supporting the digitisation of cultural heritage
Still, too few are used in productive environments
Succeed’s approach to support the take-up of these tools:
1. Identify existing tools and resources
2. Identify libraries willing to use and evaluate tools
3. Define criteria to validate and evaluate tools
4. Provide training material for tools
5. Provide support to libraries using and evaluating tools
6. Produce a blueprint for validation and take-up of tools
Survey of tools
Training material Evaluation
1. SURVEY AND SELECTION OF TOOLS
Survey of tools
Brief description and goals
Produce a survey of existing tools, ground truth data and lexicon data for digitisation
Select candidate tools for implementation at cultural heritage institutions
Survey of tools
Methodology used to achieve the
objectives
1. Taxonomy for categorisation based on
a simplified digitisation workflow
2. Definition of attributes e.g. how a tool
can be used in the digitisation process
3. Online Spreadsheet to collect and
organise tools
4. Assessment and further selection into a
shortlist of tools
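The four steps above can be illustrated with a minimal sketch. It assumes a tool record that carries a workflow category plus a few attributes, and a two-stage filter (knock-out check, then quality threshold) that produces the shortlist. The stage names, the attributes, the knock-out rules and the 0-5 quality score are all hypothetical. The slides do not list the actual taxonomy or criteria.

```python
from dataclasses import dataclass

# Hypothetical stages of a simplified digitisation workflow.
# The slides name the idea of such a taxonomy but not its categories.
WORKFLOW_STAGES = (
    "image capture",
    "image enhancement",
    "text recognition",
    "post-correction",
    "enrichment",
)


@dataclass
class ToolRecord:
    """One row of the survey spreadsheet (all attributes are assumptions)."""
    name: str
    stage: str            # category in the workflow taxonomy
    licence: str          # e.g. "open source" or "commercial"
    maintained: bool      # is the tool still maintained?
    documented: bool      # is usable documentation available?
    quality_score: int    # hypothetical 0-5 score given by the assessors


def passes_knockout(tool: ToolRecord) -> bool:
    """Hypothetical knock-out check: drop tools that fail any basic rule."""
    return tool.stage in WORKFLOW_STAGES and tool.maintained and tool.documented


def shortlist(tools: list[ToolRecord], min_score: int = 3) -> list[ToolRecord]:
    """Keep tools that pass the knock-out check and reach the minimum score,
    ordered by descending score and then by name."""
    kept = [t for t in tools if passes_knockout(t) and t.quality_score >= min_score]
    return sorted(kept, key=lambda t: (-t.quality_score, t.name))
```

Calling `shortlist(records)` on a list of `ToolRecord` objects returns the reduced list. In practice the knock-out rules and the threshold would be replaced by whatever criteria the assessing partners agree on.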
Selection of tools
First selection: knock-out criteria (three steps)
Further selection: (expertise partners)
Task 1 Survey of tools
Summary of outcomes
Categorised list of 213 research and commercial tools
Available in an online database and frequently updated
Shortlist with the most relevant tools based on a quality assessment
An overview of existing ground truth material and lexicon data has
been produced.
http://impact.dlsi.ua.es/digitisation/tools-resources/tools-for-text-digitisation/
2. VALIDATION PARAMETERS
1st Project Review – WP3
Validation parameters
Brief description and goals
Define validation parameters and procedures for the implementation of
tools in productive environments (per task carried out by using a tool)
Validate each tool (or group of tools) based on these criteria
Work out evaluation work plans and test scenarios in cooperation with libraries based on their requirements
Validation parameters
Methodology used to achieve the objectives
1. Definition of evaluation template structure
2. Tool selection by libraries
3. Creation and compilation of evaluation material: separate evaluation forms
per task/tool type and a common usability evaluation form
4. Distribution of evaluation material to participating libraries
Validation parameters
Summary of outcomes
Described evaluation procedures and produced 9 evaluation forms per task
Worked out evaluation and test scenarios as a “work plan” together with the participating libraries
Blueprint for take-up and validation
3. TAKE-UP SUPPORT
Take-up support
Brief description and goals
Support the integration, take-up and validation of digitisation tools and
resources
Tool implementation at four participant libraries and nine external libraries
(of 16 potential external libraries at the start of the project, 9 were retained)
Assistance for the adaptation/application of the tools to specific domains
and/or languages
Take-up support
Methodology used to achieve the objectives
1. Each library installs, on average, two tools and tests their performance and
usability in a productive environment according to the predefined
validation criteria
2. Some consortium libraries will test existing linguistic resources for
enhancement of textual information retrieval
3. The technical partners (IAIS, INL, PSNC, UA) will provide online assistance
for the adaptation of the tools to specific domains and languages
4. The technical partners will report on the results based on the information
provided by the libraries
External Libraries
Library | Country | Selected Tools
Wielkopolska Biblioteka Cyfrowa | Poland | Scan Tailor, JHOVE2, ImageMagick
General Historical Library of Salamanca | Spain | GIMP, OmniPage
Wroclaw University Library | Poland | Scan Tailor, Tesseract OCR
University Library of Bratislava | Slovak Republic | Scan Tailor, ImageMagick
National Library of Finland | Finland | Newspaper segmentation, Korrektor, Document Deskewer
Library of the University of Granada | Spain | Scan Tailor, Alchemy API
University Library of Leuven | Belgium | Abbyy FRE, NERT
University Library of Antwerp | Belgium | NE Attestation tool, NLTK (NE), Stanford (NE)
University Library of Darmstadt | Germany | Newspaper segmentation, Korrektor, Document Deskewer
Internal Libraries
Library | Country | Selected Tools
Biblioteca Virtual Miguel de Cervantes | Spain | Abbyy FRE, Geometric correction: Page Curl, COBaLT, Lexicon as Webservice
Bibliothèque nationale de France | France | DBPedia Spotlight, Evaluation Tool for OCR, Lexicon as Webservice
Koninklijke Bibliotheek | Netherlands | Lexicon as Webservice, NLTK, NERT
The British Library | United Kingdom | Evaluation Tool for OCR, Stanford (NE), Lexicon as Webservice
Take-up support
Summary of outcomes
Involved 9 external libraries in the project to perform tool evaluation, each of them committed to evaluating at least 2 tools
Collected libraries’ digitisation requirements
Consulted libraries in defining interesting use cases for evaluation
Provided remote assistance for the take-up of tools selected by the libraries
Take-up support
Remote assistance for technical support: Assistance for the integration and
adaptation of the tools to specific domains, languages and use cases
Implementation studies (final report): Elaboration of blueprint on validation
and take-up process for tools and resources
Case studies from the implementation experiences produced
4. TRAINING
Training
Brief description and goals
Produce documentation and training material for the tools to be validated. This material must help the participating libraries take up the tools in their productive environments.
Provide training on specific tools to external stakeholders.
Organise on-site training workshops depending on libraries’ requirements
Training
Methodology used to achieve the objectives
1. Document structure of training material
2. Tool selection by libraries
3. Distribution of work among WP3 partners according to their expertise and
knowledge of the selected tools
4. Creation and compilation of training material
5. Distribution of training material to participating libraries
Training
Summary of outcomes
Prepared training materials for 19 tools (separate document, online SCORM + DigitWiki)
Organised TPDL tutorial attracting experts from digital libraries from around the world
Participated in hackathons
5. CONCLUSIONS
Conclusions
Evaluation work of each participating library: see the libraries’ presentations
Conclusions
Blueprint for evaluation
General recommendations for evaluation by libraries:
a. Translate requirements into detailed use case (including detailed
description of data + data format)
b. Acquire or produce test data
c. Determine tools
d. Produce work plan
e. Verify use case with internal and external experts (Tool providers, CoC)
If no test data can be produced, adapt use case
If the plan breaks down into too many steps, adapt use case
If necessary, change tool selection
f. Document the evaluation (evaluation forms)
g. Use experienced technical staff
Conclusions
Blueprint for evaluation
General recommendations for tool providers:
a. Provide a clear description of the purpose of the tool
b. Provide a clear description of the formats the tool can handle
c. Provide a clear description of the type of material the tool can handle with
reasonable results; provide information on performance where possible
d. Provide a clear step-by-step description of the complete procedure that
should be followed to get the best possible result, including training and
tuning of parameters.
e. Provide compact documentation if possible
f. Minimise interdependency of parts of documentation
Thank you!