DELIVERABLE
Project Acronym: EFG1914
Grant Agreement number: 297266
Project Title: EFG1914
D6.3: Report on quantity and type of items made available to EFG and Europeana
Revision: 1.0
Date of submission: 24 March 2014
Author(s): Franco Zoppi (CNR-ISTI)
Dissemination Level: Public
Project co-funded by the European Commission within the ICT Policy Support Programme
REVISION HISTORY AND STATEMENT OF ORIGINALITY
Revision History
| Revision No. | Date | Author | Organisation | Description |
|---|---|---|---|---|
| 1.0 | 20.03.2014 | Franco Zoppi | CNR-ISTI | |
| 1.0 | 24.03.2014 | Julia Welter | DIF | Introduction |
Statement of originality:
This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.
TABLE OF CONTENTS
1 EXECUTIVE SUMMARY
2 QUANTITY OF ITEMS DELIVERED TO EFG AND EUROPEANA
2.1 AV materials
2.2 NonAV materials
3 METADATA SUBMISSION AND INGESTION WORKFLOW IN EFG/EFG1914
1 Executive summary
This deliverable gives an overview of the type and quantity of content made available to EFG
and Europeana within the EFG1914 project. The aim of the EFG1914 project was to digitise and
give online access to 665 hours of film as well as to 5’600 non-AV items from and about the
First World War. The project started in February 2012, and the digitised content was planned
to be fully available by the project end in February 2014, in time for the start of the
international commemoration of the 100th anniversary of World War One.
The digitisation activities were successful and made it possible to put not only 665 but 701
hours of content online. A further 39 hours, digitised outside the EFG1914 project, could
also be aggregated and made available. In line with the goals specified in the project’s
Description of Work (DOW), EFG1914 was able to provide online access to this content through
www.europeanfilmgateway.eu. The 740 hours of film made available correspond to 2’870
individual film titles (items). Most of the data had also been delivered to Europeana by the
end of the project (693,7 hours, corresponding to 2’624 titles). Owing to technical problems,
not all of the content could be published on Europeana yet; the remaining content is expected
to be harvested in the next Europeana harvesting interval in April 2014.
In addition to being available through EFG and Europeana, the content digitised in EFG1914
is also available on Europeana’s dedicated World War One sub-portal
www.europeana1914-1918.eu, where it is displayed alongside World War One-related material
from the EFG1914 sister projects Europeana Collections 1914-1918 [1] and
Europeana1914-1918 [2]. Together with the content from national libraries and private
persons, the materials collected within EFG1914 form one of the largest online repositories
related to the First World War, making it an attractive resource for researchers as well as
the general public.
The following chapter provides a tabular overview of the number of hours and number of items
made available to EFG and Europeana. Information on the individual collections made
available by the archives can be found in D1.3 “Final Digitisation Progress Report” [3] and
in D5.3 “Type and quantity of non-AV material digitised” [4]. Chapter three describes the
ingestion workflow and sums up the individual steps taken by the archives involved, by
project coordinator DIF and by technical partner CNR-ISTI.
[1] http://www.europeana-collections-1914-1918.eu/
[2] http://www.europeana1914-1918.eu
[3] http://project.efg1914.eu/wp-content/uploads/2014/03/EFG1914_D1.3_Digitisation_progress_report_to_the_Commission_final.pdf
[4] http://project.efg1914.eu/wp-content/uploads/2014/03/EFG1914_D5.1_Type_quantity_nonAV_material_final.pdf
2 Quantity of items delivered to EFG and Europeana
2.1 AV materials
This table gives a detailed breakdown of the number of titles and number of hours delivered
by the archives and ingested into EFG and Europeana by CNR-ISTI. All items listed below have
the item type VIDEO.
| Partner | Number of items available on EFG | Number of items available on Europeana | Number of hours according to DOW | Number of hours available on EFG | Number of hours delivered to Europeana |
|---|---|---|---|---|---|
| Arhiva Națională de Filme (ANF) | 12 | 12 | 3,0 | 3,0 | 3,0 |
| Cineteca di Bologna (CCB) | 111 | 111 | 27,0 | 27,0 | 27,0 |
| Cineteca Del Friuli (CDF) | 32 | 32 | 11,0 | 8,8 | 8,8 |
| Archives Françaises du Film – Centre National de la Cinématographie (CNC) | 141 | 141 | 29,0 | 30,2 | 30,2 |
| Cinémathèque Royale de Belgique (CRB) | 178 | 178 | 60,0 | 60,4 | 60,4 |
| Det Danske Filminstitut (DFI) | 106 | 106 | 50,0 | 52,2 | 52,2 |
| Deutsches Filminstitut (DIF) | 173 | 0 | 38,0 | 39,5 | 0 |
| Deutsche Kinemathek (DK) | 41 | 41 | 40,0 | 39,0 | 39,0 |
| Estonian Film Archives (EFA) | 6 | 6 | 1,0 | 2,0 | 2,0 |
| EYE Film Institute Netherlands (EYE) | 291 | 291 | 100,0 | 100,0 | 100,0 |
| Fondazione Cineteca Italiana (FCI) | 77 | 77 | 20,0 | 21,6 | 21,6 |
| Filmoteca Española (FE) | 53 | 53 | 13,0 | 13,1 | 13,1 |
| Filmarchiv Austria (FAA) | 10 | 10 | 4,0 | 4,1 | 4,1 |
| Institut Valencià de l’Audiovisual i de la Cinematografia (IVAC) | 8 | 8 | 6,0 | 6,5 | 6,5 |
| Imperial War Museums (IWM) | 1086 | 1086 | 190,0 | 196,5 | 196,5 |
| Jugoslovenska Kinoteka (JK) | 63 | 63 | 30,0 | 30,5 | 30,5 |
| Cinecittà Luce (LUCE) | 25 | 25 | 15,0 | 15,0 | 15,0 |
| Hungarian National Digital Archive (MANDA) | 11 | 11 | 3,0 | 3,0 | 3,0 |
| Národní filmový archiv (NFA) | 17 | 17 | 2,5 | 3,7 | 3,7 |
| Nasjonalbiblioteket (NNB) | 66 | 66 | 15,0 | 21,9 | 21,9 |
| Österreichisches Filmmuseum (OFM) | 48 | 48 | 8,0 | 8,0 | 8,0 |
| TOTAL EFG1914 DIGITISATION | 2’591 | 2’382 | 665,5 | 686,0 | 646,5 |
| Imperial War Museums | 171 | 171 | - | 38,0 | 38,0 |
| Bundesarchiv Filmarchiv | 37 | 0 | - | 7,0 | 0 |
| Museo Nazionale del Cinema | 18 | 18 | - | 4,0 | 4,0 |
| Landesfilmsammlung Baden-Württemberg | 19 | 19 | - | 2,0 | 2,0 |
| National Library of Scotland | 34 | 34 | - | 3,2 | 3,2 |
| TOTAL ADDITIONAL WWI CONTENT AGGREGATED | 279 | 242 | - | 54,2 | 47,2 |
| TOTAL CONTENT AGGREGATED FOR EFG | 2’870 | 2’624 | 665,5 | 740,2 | 693,7 |
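As a quick consistency check on the "hours available on EFG" column, the following minimal Python sketch sums the per-partner figures and compares them with the stated subtotals and grand total. The figures are copied from the table above; the helper and variable names are illustrative only.

```python
from decimal import Decimal

def hours(text):
    """Parse a decimal-comma figure such as '8,8' into a Decimal."""
    return Decimal(text.replace(",", "."))

# Hours available on EFG per partner (EFG1914 digitisation), in table order.
efg1914_hours = [hours(h) for h in (
    "3,0 27,0 8,8 30,2 60,4 52,2 39,5 39,0 2,0 100,0 21,6 "
    "13,1 4,1 6,5 196,5 30,5 15,0 3,0 3,7 21,9 8,0").split()]

# Hours of additional WWI content aggregated on EFG, in table order.
additional_hours = [hours(h) for h in "38,0 7,0 4,0 2,0 3,2".split()]

digitised_total = sum(efg1914_hours)
additional_total = sum(additional_hours)
grand_total = digitised_total + additional_total

print(digitised_total, additional_total, grand_total)
```

The three sums agree with the table rows "TOTAL EFG1914 DIGITISATION" (686,0), "TOTAL ADDITIONAL WWI CONTENT AGGREGATED" (54,2) and "TOTAL CONTENT AGGREGATED FOR EFG" (740,2).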
2.2 Non-AV materials
This table gives a detailed breakdown of the number of non-AV items delivered by the archives
and ingested into EFG and Europeana by CNR-ISTI. All items listed below have the item type
IMAGE or TEXT.
| Archive | Item type | Items to deliver according to DOW | Items digitised | Items delivered to EFG | Items delivered to Europeana |
|---|---|---|---|---|---|
| EYE | Film posters | 850 | 916 | 916 | 916 |
| DIF | Film stills, set photos | 800 | 845 | 361 | 0 |
| DIF | Film posters | 0 | 8 | 8 | 0 |
| DIF | Articles from film journals | 100 | 245 | 245 | 0 |
| DIF | Other text material like film programmes, manuscripts or advertising material | 0 | 144 | 144 | 0 |
| DIF | Stereoscopic glass plates | 0 | 98 | 98 | 0 |
| DFI | Journal editions | 150 | 161 | 161 | 161 |
| FAA | Articles from film journals | 100 | 100 | 0 | 0 |
| IVAC | Film programmes | 20 | 2 | 2 | 2 |
| NFA | Photos | 388 | 395 | 395 | 395 |
| NFA | Pages of periodicals and other paper documents | 1’800 | 116 | 116 | 116 |
| EFA | Photos | 2’500 | 2’345 | 2’345 | 2’345 |
| CCB | Photos | 500 | 500 | 500 | 500 |
| DK | Photos | 200 | 209 | 209 | 209 |
| DK | Posters | 40 | 99 | 99 | 99 |
| DK | Programmes | 40 | 2 | 2 | 2 |
| TOTAL | | 5’688 [5] | 6’185 | 5’601 | 4’745 |
As mentioned in the introduction, some of the content could not be harvested by Europeana
yet due to technical problems. It is expected that the remaining content will be harvested by
Europeana in April 2014.
3 Metadata submission and ingestion workflow in EFG/EFG1914
The following section provides an overview of how metadata are submitted and ingested into
EFG/EFG1914 (in the following, simply “EFG”) and Europeana. Specific guidelines and
manuals can be found in the “EFG Data Provider Handbook” [6] developed in the EFG project.
The metadata are federated in the EFG Information Space, which consists of the common
database and a set of software services. Archives submitting metadata to EFG deliver them via
XML exports or OAI-PMH harvesting. EFG only ingests and indexes the institutions’ metadata,
while the digital objects remain on the websites of the individual institutions. The
collected or “aggregated” metadata (data on digital resources, links to previews,
filmographic and biographic information) serve several purposes. A reduced set of the
collected metadata is displayed in the portal (www.europeanfilmgateway.eu): only information
about the digital objects themselves and the names and titles related to them. The EFG
metadata schema of the common database, however, can store nearly all kinds of information
found in film archival databases. Another main purpose is that the collected metadata are
converted and forwarded to the Europeana portal in a different format
(currently both “ESE - Europeana Semantic Elements” and “EDM - Europeana Data Model”).
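To make the OAI-PMH delivery route more concrete, here is a minimal Python sketch of how a harvester might build a ListRecords request and read the record identifiers from a response. The verb, the metadataPrefix argument and the response namespace come from the OAI-PMH 2.0 protocol; the base URL, the prefix value "efg", the function names and the sample response are assumptions made for illustration and do not describe the actual EFG configuration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

def record_identifiers(response_xml):
    """Return the header identifiers of all records in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [header.findtext("oai:identifier", namespaces=OAI_NS)
            for header in root.iterfind(".//oai:record/oai:header", OAI_NS)]

# Hypothetical archive endpoint and a hand-written sample response.
url = list_records_url("https://archive.example.org/oai", "efg")
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:film-1</identifier></header></record>
    <record><header><identifier>oai:example:film-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
ids = record_identifiers(sample)
```

A real harvester would also fetch the URL over HTTP and follow the resumptionToken element until the full record set has been retrieved; both are left out here to keep the sketch self-contained.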
[5] The total number of 5’600 objects indicated in the DOW did not include the text objects from NFA as it was unclear to how many individual items the number of pages would translate.
[6] http://www.efgproject.eu/downloads/EFG_DataProviderHandbook_final.zip
Overview of the Data Ingestion Process
The figure “The EFG Data Ingestion Process” provides an overview of the entire ingestion
process. The workflow is summarised in a schematic diagram whose steps show the actors, the
type of data provided and the formats that can be used, as well as the support and tools
supplied to carry out those activities. The individual steps are described below. They define
not only the ingestion process adopted for the EFG partner archives, but also the workflow
for new contributors:
Step 1: Prepare Content Delivery
The Deutsches Filminstitut (DIF) and the Data Provider (DP) define the type of data and
metadata to be exported, the scheduling of the exports, and the overall procedure to be
followed.
Step 2: Define Metadata Mapping Rules
The DP and DIF prepare a document which specifies how each native metadata element
must be mapped into the EFG metadata schema. The mapping rules are passed to CNR-ISTI,
which performs a preliminary check of their conformance to the EFG metadata schema. This
check may require the rules to be revised in direct interaction between the DP, DIF and
CNR-ISTI. CNR-ISTI provides an import filter which performs the actual conversion of the
native metadata structure into the EFG metadata schema. This process usually goes through
several cycles in order to achieve the best possible conversion quality.
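A mapping rule of this kind can be pictured as a lookup from native element names to target elements. The Python sketch below is illustrative only: the native field names and the target element names are invented for the example and are not taken from the EFG metadata schema, and it does not reproduce the import filters provided by CNR-ISTI.

```python
# Hypothetical mapping rules: native field name -> target schema element.
MAPPING_RULES = {
    "Titel": "title",
    "Jahr": "productionYear",
    "Land": "countryOfReference",
}

def apply_mapping(native_record, rules=MAPPING_RULES):
    """Convert one native record; return the mapped record and the unmapped fields."""
    mapped, unmapped = {}, []
    for field, value in native_record.items():
        target = rules.get(field)
        if target is None:
            unmapped.append(field)  # reported back so the rules can be revised
        else:
            mapped[target] = value
    return mapped, unmapped

mapped, unmapped = apply_mapping(
    {"Titel": "Example newsreel", "Jahr": "1916", "Signatur": "X-123"})
```

Returning the unmapped fields alongside the converted record mirrors the review cycles described above: each pass shows which native elements still lack a rule.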
Step 3: Define Matching Rules
This process aims at defining the rules for converting the metadata values used by each DP
into the values accepted by the EFG controlled vocabularies. The DP prepares the matching
tables, which define how the values from the individual archive are to be transformed into
the controlled EFG vocabularies. Guidelines for the vocabulary matching work can be found in
the “EFG Data Provider Handbook”.
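A matching table can be sketched as a simple value lookup. In the Python fragment below, the controlled vocabulary, the archive's native values and the function name are all hypothetical; the actual EFG vocabularies are documented in the handbook.

```python
# Hypothetical controlled vocabulary and one archive's matching table.
CONTROLLED_TERMS = {"Newsreel", "Documentary", "Fiction"}
MATCHING_TABLE = {
    "Wochenschau": "Newsreel",
    "Dokumentarfilm": "Documentary",
    "Spielfilm": "Fiction",
}

def match_value(native_value, table=MATCHING_TABLE):
    """Return the controlled term for a native value, or None if it is unmatched."""
    term = table.get(native_value.strip())
    return term if term in CONTROLLED_TERMS else None
```

For example, `match_value("Wochenschau")` yields `"Newsreel"`, while a value missing from the table yields `None` and would need a new entry in the matching table.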
[Figure: The EFG Data Ingestion Process. Flowchart running from START to END through nine steps: Step 1, Prepare Content Delivery; Step 2, Define Metadata Mapping Rules; Step 3, Define Matching Rules; Step 4, Deliver Data and Metadata; Step 5, Ingestion into the EFG pre-production Information Space; Step 6, Data Quality Control; Step 7, Data Cleaning and Enrichment; Step 8, Ingestion into EFG production Information Space; Step 9, Export to Europeana. Further boxes in the figure: Check metadata records with the Content Checker; Apply Vocabulary Checker to check vocabulary terms; Apply Rotten Checker to check incorrect data values; Modify records with the Metadata Editor.]
Step 4: Deliver Data and Metadata in the Appropriate Format
The DP delivers the exports either as XML files or through OAI-PMH harvesting. CNR-ISTI
receives the data and metadata.
Step 5: Data Ingestion into the EFG Pre-Production Information Space
CNR-ISTI implements the software modules corresponding to the mapping and matching
rules defined in steps 2 and 3, then ingests the metadata into the EFG pre-production
Information Space. The ingestion requires that the import filter or filters for the specific
metadata are applied. It also requires that thumbnails for image and text items, if not
provided, are automatically generated from the source data.
Step 6: Perform Data Quality Control
The matching tables established in step 3 are passed to CNR-ISTI in order to initiate the
controlled vocabulary check on the ingested metadata values, which starts with a preliminary
automatic conversion of the metadata values according to the matching rules. Incorrect
records and incorrect element values can be viewed through the Vocabulary Checker Tool.
Once the ingestion has been completed, CNR-ISTI reports that the metadata are available in
the EFG pre-production Information Space. The new metadata can be viewed through a
curation tool specifically developed for EFG: the EFG Content Checker Tool. With the help of
this tool, DIF and the DP check whether the metadata were properly ingested into the EFG
pre-production Information Space. Problems are immediately reported to CNR-ISTI. In addition,
the DP and DIF check the quality of the converted values with the Vocabulary Checker Tool,
which makes it possible to identify all metadata records containing elements that do not
conform to the EFG controlled vocabularies. If an error is identified, DIF arranges the
revision of the Mapping or Matching Rules. In this case the procedure is repeated starting
from step 2 or step 3.
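The kind of check performed in this step can be illustrated with a few lines of Python. This is not the Vocabulary Checker Tool itself, whose implementation the report does not describe; the record layout, the element name and the vocabulary terms are assumptions made for the example.

```python
def nonconforming_records(records, element, allowed_terms):
    """Return (record id, offending value) pairs for values outside the vocabulary."""
    return [(rec["id"], rec.get(element))
            for rec in records
            if rec.get(element) not in allowed_terms]

# Hypothetical ingested records.
records = [
    {"id": "rec-1", "genre": "Newsreel"},
    {"id": "rec-2", "genre": "Wochenschau"},  # native value that was never matched
    {"id": "rec-3"},                          # element missing altogether
]
problems = nonconforming_records(records, "genre", {"Newsreel", "Documentary"})
```

Here `problems` lists `rec-2` and `rec-3`, which is the kind of report that would send the DP and DIF back to the mapping rules (step 2) or the matching tables (step 3).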
Step 7: Data Cleaning and Enrichment
As soon as metadata mapping and cleaning are completed, the final metadata cleaning and
enrichment is performed manually by the contributing institution. The DP can use the
Metadata Editor to correct the metadata values and to enrich the metadata manually. With the
newly created Vocabulary Editor Tool, new vocabulary terms can be applied: terms can be
added to a vocabulary incrementally, modified or deleted, and synonyms are managed as well.
In addition, the Cleaning Rules Editor allows rules to be defined, associated with an XPath,
and applied by the system to implement the cleaning phase of the aggregation workflow.
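A cleaning rule bound to an XPath could look roughly like the following Python sketch. It uses the limited XPath subset supported by the standard library's ElementTree; the record structure, the rule (collapsing whitespace in titles) and the function name are hypothetical and are not taken from the Cleaning Rules Editor.

```python
import xml.etree.ElementTree as ET

def apply_cleaning_rule(root, xpath, clean):
    """Apply `clean` to the text of every element selected by `xpath`.

    Returns the number of elements whose text was actually changed.
    """
    changed = 0
    for element in root.findall(xpath):
        if element.text is not None:
            cleaned = clean(element.text)
            if cleaned != element.text:
                element.text = cleaned
                changed += 1
    return changed

# Hypothetical record and rule: collapse whitespace in every title element.
record = ET.fromstring(
    "<record><title>  The   Battle of  the Somme </title><year>1916</year></record>")
n = apply_cleaning_rule(record, ".//title", lambda text: " ".join(text.split()))
```

Keeping the XPath and the cleaning function separate means the same function can be reused for several elements, and each rule can be reviewed on its own.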
Step 8: Data Ingestion into the EFG Information Space
When the data enrichment and cleaning process is completed, the DP approves the export of
its data from the pre-production Information Space into the production Information Space.
From this moment the data become visible in the EFG portal and can also be exported to
Europeana.
NOTE: the iteration of steps 2-4 and 5-8 is particularly critical in the case of new DPs.
It should be noted that eight “extra” DPs (not planned in the original DOW) were integrated
during this second year, which required considerable extra effort.
Step 9: Export from EFG to Europeana
The export of data from EFG to Europeana is implemented by CNR-ISTI in conformance with the
rules defined in the Europeana Data Provider Handbook [7]. It is performed through the
OAI-Publishing Service. This service transforms the EFG records into ESE or EDM records, and
supports OAI harvesting on demand. By using ESE/EDM, the EFG aggregator functions as a
kind of “filter” through which a certain set of data elements from the EFG Data Provider is
forwarded. For an example of how records from EFG providers are delivered to Europeana,
please refer to the “EFG Data Provider Handbook”.
Since the release of EDM (2013), a new EFG OAI Publisher has been designed to serve OAI
records more efficiently and to support the generation of new metadata formats.
As to efficiency, the new OAI Publisher uses a dedicated index instance with an
optimized metadata structure. These changes bring two main benefits:
• Reduce the load on the main index of the EFG Aggregator.
• Allow a more efficient management of the OAI records.
Another important feature is the customization of the generation of new metadata formats.
This is achieved by supplying an XSLT stylesheet that transforms the main format (in the case
of EFG, the main metadata format is ESE) into a new format.
The OAI Publisher applies this XSLT at run time without creating a new version of the same
item in the index, thus reducing disk space usage. The new EDM mapping has been implemented
using this approach.
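The idea of deriving additional formats at request time, rather than storing one copy per format, can be sketched as follows. This Python fragment is only an analogy: a plain function stands in for the XSLT stylesheet, and the store, the format names and the element names are invented for the example and do not reflect the ESE or EDM schemas or the OAI Publisher's code.

```python
import xml.etree.ElementTree as ET

# Only the main-format records are stored (hypothetical, simplified content).
STORE = {"rec-1": "<record><title>Example newsreel</title></record>"}

def to_other_format(main_xml):
    """Stand-in for an XSLT stylesheet: re-wrap the title in a new root element."""
    source = ET.fromstring(main_xml)
    target = ET.Element("other")
    ET.SubElement(target, "label").text = source.findtext("title")
    return ET.tostring(target, encoding="unicode")

# Format name -> transformation applied when a record is requested.
TRANSFORMS = {"main": lambda xml: xml, "other": to_other_format}

def get_record(identifier, metadata_prefix):
    """Serve a record in the requested format without storing a second copy."""
    return TRANSFORMS[metadata_prefix](STORE[identifier])

converted = get_record("rec-1", "other")
```

The trade-off is some extra processing on every request in exchange for a single stored copy per item, which matches the disk space argument made above.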
Finally, a user interface (see picture hereafter) has been provided to help the user in
configuring and customizing the OAI Publisher
(reachable at: http://node2.d.efg.research-infrastructures.eu:8280/efg-is/mvc/ui/oaiConfig.do).
[7] URL to the Europeana Data Provider Handbook: http://www.version1.europeana.eu/web/guest/technical-requirements/
Roles and Responsibilities in the Data Ingestion Process
The Deutsches Filminstitut informs the DP once its data have been published in the
Europeana portal (http://www.europeana.eu). The DP should then check its data and report
any problems it encounters to the Deutsches Filminstitut.
During the ingestion process, the DP closely interacts with the Deutsches Filminstitut,
which is responsible for:
• assisting the new provider during the ingestion process
• the writing of the mapping rules
• the quality check of the mappings and matchings in the internal EFG database
(pre-production environment).
The technology partner CNR-ISTI is responsible for:
• the ingestion of the provided data
• the export of the data from EFG to Europeana.
The DP is responsible for:
• the approval of the submission method, mapping and matching to be implemented.
• the setup of the submission infrastructure and methods at its site.
• the delivery of the data sets.
• the drafting of a vocabulary matching.
• the approval of the correct representation of its data sets in the EFG and Europeana
portals.