
DELIVERABLE

Project Acronym: EFG1914

Grant Agreement number: 297266

Project Title: EFG1914

D6.3: Report on quantity and type of items made available to EFG and Europeana

Revision 1.0 Date of submission 24 March 2014

Author(s) Franco Zoppi (CNR-ISTI)

Dissemination Level Public


Project co-funded by the European Commission within the ICT Policy Support Programme

REVISION HISTORY AND STATEMENT OF ORIGINALITY

Revision History

Revision No. | Date | Author | Organisation | Description
1.0 | 20.03.2014 | Franco Zoppi | CNR-ISTI |
1.0 | 24.03.2014 | Julia Welter | DIF | Introduction

Statement of originality:

This deliverable contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.


TABLE OF CONTENTS

1 EXECUTIVE SUMMARY
2 QUANTITY OF ITEMS DELIVERED TO EFG AND EUROPEANA
2.1 AV materials
2.2 NonAV materials
3 METADATA SUBMISSION AND INGESTION WORKFLOW IN EFG/EFG1914


1 Executive summary

This deliverable gives an overview of the type and quantity of content made available to EFG and Europeana within the EFG1914 project. The aim of the EFG1914 project was to digitise and give online access to 665 hours of film as well as to 5’600 non-AV items from and about the First World War. The project started in February 2012, and it was foreseen that the digitised content should be completely available in time for the project end in February 2014 and thus for the start of the international commemoration of the 100th anniversary of World War One.

The digitisation activities were successful and made it possible to make not only 665 but 701 hours of content available online. A further 39 hours, digitised outside the EFG1914 project, could also be aggregated and made available. In line with the goals specified in the project’s Description of Work, EFG1914 was able to provide online access to the named number of hours through www.europeanfilmgateway.eu. The 740 hours of film made available equal 2’870 individual film titles (items). The majority of the data was delivered to Europeana by the end of the project as well (693,7 hours equalling 2’624 titles). Due to some technical problems, not all of the content could be published on Europeana yet, but it will be possible to solve this issue with the next Europeana harvesting interval in April 2014.

In addition to being available through EFG and Europeana, the content digitised in EFG1914 is also available on Europeana’s sub-portal dedicated to World War One, www.europeana1914-1918.eu, where it is displayed alongside World War One-related material from the EFG1914 sister projects Europeana Collections 1914-1918 [1] and Europeana1914-1918 [2]. Together with the content from national libraries and private persons, the materials collected within EFG1914 form one of the largest online repositories related to the First World War, making it an attractive resource for researchers as well as the general public.

The following chapter provides a tabular overview of the number of hours and number of items made available to EFG and Europeana. Information on the individual collections made available by the archives can be found in D1.3 “Final Digitisation Progress Report” [3] as well as in D5.3 “Type and quantity of non-AV material digitised” [4]. Chapter three describes the ingestion workflow and sums up the individual steps taken by the archives involved as well as by project coordinator DIF and technical partner CNR-ISTI.

[1] http://www.europeana-collections-1914-1918.eu/
[2] http://www.europeana1914-1918.eu
[3] http://project.efg1914.eu/wp-content/uploads/2014/03/EFG1914_D1.3_Digitisation_progress_report_to_the_Commission_final.pdf
[4] http://project.efg1914.eu/wp-content/uploads/2014/03/EFG1914_D5.1_Type_quantity_nonAV_material_final.pdf


2 Quantity of items delivered to EFG and Europeana

2.1 AV materials

This table gives a detailed breakdown of the number of titles and number of hours delivered by the archives and ingested into EFG and Europeana by CNR-ISTI. All items listed below have the item type VIDEO.

Partner | Items available on EFG | Items available on Europeana | Hours according to DOW | Hours available on EFG | Hours delivered to Europeana
Arhiva National de Filme (ANF) | 12 | 12 | 3,0 | 3,0 | 3,0
Cineteca di Bologna (CCB) | 111 | 111 | 27,0 | 27,0 | 27,0
Cineteca Del Friuli (CDF) | 32 | 32 | 11,0 | 8,8 | 8,8
Archives Francaises du Film – Centre National de la Cinematography (CNC) | 141 | 141 | 29,0 | 30,2 | 30,2
Cinémathèque Royale de Belgique (CRB) | 178 | 178 | 60,0 | 60,4 | 60,4
Det Danske Filminstitut (DFI) | 106 | 106 | 50,0 | 52,2 | 52,2
Deutsches Filminstitut (DIF) | 173 | 0 | 38,0 | 39,5 | 0
Deutsche Kinemathek (DK) | 41 | 41 | 40,0 | 39,0 | 39,0
Estonian Film Archives (EFA) | 6 | 6 | 1,0 | 2,0 | 2,0
EYE Film Institute Netherlands (EYE) | 291 | 291 | 100,0 | 100,0 | 100,0
Fondazione Cineteca Italiana (FCI) | 77 | 77 | 20,0 | 21,6 | 21,6
Filmoteca Española (FE) | 53 | 53 | 13,0 | 13,1 | 13,1
Filmarchiv Austria (FAA) | 10 | 10 | 4,0 | 4,1 | 4,1
Institut Valencià de l’Audiovisual i de la Cinematografia (IVAC) | 8 | 8 | 6,0 | 6,5 | 6,5
Imperial War Museums (IWM) | 1086 | 1086 | 190,0 | 196,5 | 196,5
Jugoslovenska Kinoteka (JK) | 63 | 63 | 30,0 | 30,5 | 30,5
Cinecittà Luce (LUCE) | 25 | 25 | 15,0 | 15,0 | 15,0
Hungarian National Digital Archive (MANDA) | 11 | 11 | 3,0 | 3,0 | 3,0
Národní filmový archiv (NFA) | 17 | 17 | 2,5 | 3,7 | 3,7
Nasjonalbiblioteket (NNB) | 66 | 66 | 15,0 | 21,9 | 21,9
Österreichisches Filmmuseum (OFM) | 48 | 48 | 8,0 | 8,0 | 8,0
TOTAL EFG1914 Digitisation | 2’591 | 2’382 | 665,5 | 686,0 | 646,5
Imperial War Museums | 171 | 171 | - | 38,0 | 38,0
Bundesarchiv Filmarchiv | 37 | 0 | - | 7,0 | 0
Museo Nazionale del Cinema | 18 | 18 | - | 4,0 | 4,0
Landesfilmsammlung Baden-Württemberg | 19 | 19 | - | 2,0 | 2,0
National Library of Scotland | 34 | 34 | - | 3,2 | 3,2
TOTAL ADDITIONAL WWI CONTENT AGGREGATED | 279 | 242 | - | 54,2 | 47,2
TOTAL CONTENT AGGREGATED FOR EFG | 2’870 | 2’624 | 665,5 | 740,2 | 693,7
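The relation between the three total rows can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the project tooling. The helper name `parse_number` is an assumption introduced for illustration, and it only handles the number format used in this report (apostrophe as thousands separator, comma as decimal separator, "-" for no value). It checks that the two subtotal rows add up to the grand total row.

```python
from decimal import Decimal

def parse_number(text: str) -> Decimal:
    """Parse a number written as in this report, e.g. "2’870" or "693,7".

    Hypothetical helper: apostrophes (’ or ') are thousands separators,
    a comma is the decimal separator, and "-" means "no value" (treated as 0).
    """
    cleaned = text.replace("’", "").replace("'", "").replace(",", ".")
    return Decimal("0") if cleaned == "-" else Decimal(cleaned)

# Subtotal and total rows copied from the AV table, in column order:
# items on EFG, items on Europeana, hours per DOW, hours on EFG,
# hours delivered to Europeana.
digitised = ["2’591", "2’382", "665,5", "686,0", "646,5"]
additional = ["279", "242", "-", "54,2", "47,2"]
total = ["2’870", "2’624", "665,5", "740,2", "693,7"]

# Each grand total must equal the sum of the two subtotals above it.
for d, a, t in zip(digitised, additional, total):
    assert parse_number(d) + parse_number(a) == parse_number(t)
```

`Decimal` is used instead of `float` so that sums such as 686,0 + 54,2 compare exactly.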

2.2 NonAV materials

This table gives a detailed breakdown of the number of non-AV items delivered by the archives and ingested into EFG and Europeana by CNR-ISTI. All items listed below have the item type IMAGE or TEXT.

Archive | Item type | Items to deliver according to DOW | Items digitised | Items delivered to EFG | Items delivered to Europeana
EYE | Film posters | 850 | 916 | 916 | 916
DIF | Film stills, set photos | 800 | 845 | 361 | 0
DIF | Film posters | 0 | 8 | 8 | 0
DIF | Articles from film journals | 100 | 245 | 245 | 0
DIF | Other text material like film programmes, manuscripts or advertising material | 0 | 144 | 144 | 0
DIF | Stereoscopic glass plates | 0 | 98 | 98 | 0
DFI | Journal editions | 150 | 161 | 161 | 161
FAA | Articles from film journals | 100 | 100 | 0 | 0
IVAC | Film programmes | 20 | 2 | 2 | 2
NFA | Photos | 388 | 395 | 395 | 395
NFA | Pages of periodicals and other paper documents | 1’800 | 116 | 116 | 116
EFA | Photos | 2’500 | 2’345 | 2’345 | 2’345
CCB | Photos | 500 | 500 | 500 | 500
DK | Photos | 200 | 209 | 209 | 209
DK | Posters | 40 | 99 | 99 | 99
DK | Programmes | 40 | 2 | 2 | 2
TOTAL | | 5’688 [5] | 6’185 | 5’601 | 4’745
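Footnote 5 explains why the DOW total is lower than the plain sum of the DOW column. The arithmetic can be sketched as follows in Python; the figures are copied from the non-AV table, while the variable names are illustrative only.

```python
# Items to deliver according to the DOW, keyed by (archive, item type).
dow_targets = {
    ("EYE", "Film posters"): 850,
    ("DIF", "Film stills, set photos"): 800,
    ("DIF", "Film posters"): 0,
    ("DIF", "Articles from film journals"): 100,
    ("DIF", "Other text material"): 0,
    ("DIF", "Stereoscopic glass plates"): 0,
    ("DFI", "Journal editions"): 150,
    ("FAA", "Articles from film journals"): 100,
    ("IVAC", "Film programmes"): 20,
    ("NFA", "Photos"): 388,
    ("NFA", "Pages of periodicals and other paper documents"): 1800,
    ("EFA", "Photos"): 2500,
    ("CCB", "Photos"): 500,
    ("DK", "Photos"): 200,
    ("DK", "Posters"): 40,
    ("DK", "Programmes"): 40,
}

# The NFA text objects were specified in pages rather than items and are
# therefore left out of the DOW total (see footnote 5).
excluded = ("NFA", "Pages of periodicals and other paper documents")

dow_total = sum(n for key, n in dow_targets.items() if key != excluded)
assert dow_total == 5688
```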

As mentioned in the introduction, some of the content could not be harvested by Europeana yet due to technical problems. It is expected that the remaining content will be harvested by Europeana in April 2014.

3 Metadata submission and ingestion workflow in EFG/EFG1914

The following section provides an overview of how metadata are submitted and ingested into EFG/EFG1914 (in the following, simply “EFG”) and Europeana. Specific guidelines and manuals can be found in the “EFG Data Provider Handbook” [6] developed in the EFG project.

These metadata are federated in the EFG Information Space, which consists of the common database and some software services. Archives submitting metadata to EFG deliver them via XML exports or OAI-PMH harvesting. EFG only ingests and indexes the institutions' metadata, while the digital objects remain on the websites of the individual institutions. The collected or “aggregated” metadata (data on digital resources, links to previews, filmographic and biographic information) serve several purposes. A reduced set of the collected metadata is displayed in the portal (www.europeanfilmgateway.eu): only information about the digital objects themselves, as well as names and titles related to them, is displayed. The EFG metadata schema of the common database, however, can store nearly all kinds of information that can be found in film archival databases. Another main purpose is that the collected metadata are converted and forwarded to the Europeana portal in a different format (currently both “ESE - Europeana Semantic Elements” and “EDM - Europeana Data Model”).

[5] The total number of 5’600 objects indicated in the DOW did not include the text objects from NFA as it was unclear to how many individual items the number of pages would translate.
[6] http://www.efgproject.eu/downloads/EFG_DataProviderHandbook_final.zip

Overview of the Data Ingestion Process


The figure below provides an overview of the entire ingestion process. The workflow is introduced with a schematic overview whose steps show the actors, the type of data provided and the possible formats used, as well as the support and tools supplied to implement those activities. The individual steps are described in the following; they define not only the ingestion process adopted for the EFG partner archives, but also the workflow for new contributors:

Step 1: Prepare Content Delivery

The Deutsches Filminstitut (DIF) and the Data Provider (DP) define the type of data and metadata to be exported, the scheduling of the exports, and the overall procedure to be followed.

Step 2: Define Metadata Mapping Rules

The DP and DIF prepare a document which specifies how each native metadata element must be mapped into the EFG metadata schema. The mapping rules are passed to CNR-ISTI, which performs a preliminary check of their conformance to the EFG metadata schema. Checking the mapping rules may require their revision and direct interaction between the DP, DIF and CNR-ISTI. CNR-ISTI provides an import filter which performs the actual conversion of the native metadata structure into the EFG metadata schema. This process goes through several cycles in order to achieve the best quality of the conversion.
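As an illustration of what such mapping rules amount to, the sketch below applies a small rule table to one native record. It is a minimal Python example under stated assumptions: the native field names (`Titel`, `Regie`, `Jahr`, `Signatur`) and the target element names (`title`, `director`, `productionYear`) are hypothetical and are not taken from the EFG metadata schema, and the code is not the import filter provided by CNR-ISTI.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping rules: native field name -> target element name.
MAPPING_RULES: Dict[str, str] = {
    "Titel": "title",
    "Regie": "director",
    "Jahr": "productionYear",
}

def apply_mapping(native_record: Dict[str, str],
                  rules: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return the mapped record and the native fields that no rule covers."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for field, value in native_record.items():
        target = rules.get(field)
        if target is None:
            unmapped.append(field)  # candidate for a revised rule set
        else:
            mapped[target] = value
    return mapped, unmapped

record = {"Titel": "Example title", "Jahr": "1916", "Signatur": "A-17"}
mapped, unmapped = apply_mapping(record, MAPPING_RULES)
```

Fields reported as unmapped correspond to the cases in which the rules would need another revision cycle.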

Step 3: Define Matching Rules

This process aims at defining the rules to convert the metadata values used by each DP into the values accepted by the EFG controlled vocabularies. The DP prepares the matching tables, which define how the values from the individual archive shall be transformed into the controlled EFG vocabularies. Guidelines for the vocabulary matching work can be found in the “EFG Data Provider Handbook”.
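A matching table can be pictured as a lookup from local values to controlled terms. The following Python sketch shows the idea; the vocabulary terms and the local values are invented for illustration and are not the actual EFG controlled vocabularies.

```python
from typing import Optional

# Hypothetical controlled vocabulary and matching table for one element.
CONTROLLED_TERMS = {"Documentary", "Newsreel", "Fiction"}
MATCHING_TABLE = {
    "Dokumentarfilm": "Documentary",
    "Wochenschau": "Newsreel",
    "Spielfilm": "Fiction",
}

def match_value(local_value: str) -> Optional[str]:
    """Convert a local value into a controlled term, or None if unmatched."""
    value = local_value.strip()
    if value in CONTROLLED_TERMS:
        return value  # already conforms to the vocabulary
    return MATCHING_TABLE.get(value)
```

A return value of `None` marks a local value for which the matching table still lacks an entry.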


[Figure: The EFG Data Ingestion Process. Flowchart running from START to END through Steps 1 to 9, with the boxes: Prepare Content Delivery; Define Metadata Mapping Rules; Define Matching Rules; Deliver Data and Metadata; Ingestion into the EFG pre-production Information Space; Data Quality Control; Check metadata records with the Content Checker; Apply Vocabulary Checker to check vocabulary terms; Apply Rotten Checker to check incorrect data values; Data Cleaning and Enrichment; Modify records with the Metadata Editor; Ingestion into EFG production Information Space; Export to Europeana.]

Step 4: Deliver Data and Metadata in the Appropriate Format

The DP delivers the exports either as XML files or through OAI-PMH harvesting. CNR-ISTI receives the data and metadata.
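For readers unfamiliar with OAI-PMH, the sketch below shows the two building blocks of a harvest: constructing a `ListRecords` request and reading the resumption token that links one page of results to the next. It is a minimal Python example using only the standard library. The base URL in the usage note is a placeholder, and `oai_dc` is used only because every OAI-PMH repository must support it; the text above does not state which metadata format the providers exposed.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumption token is an exclusive argument: apart from the verb,
        # no other argument may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    return token.text if token is not None and token.text else None
```

A harvester would fetch `list_records_url("https://archive.example.org/oai")`, process the returned records, and repeat the request with the token from `next_token` until it returns `None`.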


Step 5: Data Ingestion into the EFG Pre-Production Information Space

CNR-ISTI implements the software modules corresponding to the mapping and matching rules defined in steps 2 and 3, then ingests the metadata into the EFG pre-production Information Space. The ingestion requires that the import filter(s) for the specific metadata be applied. It also requires that thumbnails for image and text items, if not provided, be automatically generated from the source data.

Step 6: Perform Data Quality Control

The matching tables established in step 3 are passed to CNR-ISTI in order to initiate the controlled vocabulary check on the ingested metadata values through a preliminary automatic conversion of the metadata values according to the matching rules. Incorrect records and incorrect element values can be viewed through the Vocabulary Checker Tool.

Once the ingestion has been completed, CNR-ISTI reports that the metadata are available in the EFG pre-production Information Space. The new metadata can be viewed through a curation tool specifically developed for EFG: the EFG Content Checker Tool. With the help of this tool, DIF and the DP check whether the metadata were properly ingested into the EFG pre-production Information Space. Problems are immediately reported to CNR-ISTI. In addition, the DP and DIF check the quality of the converted values with the Vocabulary Checker Tool. This tool makes it possible to review all metadata records containing elements that do not conform to the EFG controlled vocabularies. If an error is identified, DIF arranges the revision of the Mapping or Matching Rules. In this case the procedure is iterated starting from step 2 or step 3.
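The kind of report such a vocabulary check produces can be sketched in a few lines of Python. This is an illustration only and not the Vocabulary Checker Tool itself; the record layout, the element name `genre` and the vocabulary terms are assumptions.

```python
from typing import Dict, List, Set, Tuple

def non_conforming(records: List[dict],
                   vocabularies: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """List (record id, element, value) for values outside the vocabularies.

    `vocabularies` maps an element name to its set of accepted terms;
    elements without a vocabulary are not checked.
    """
    problems = []
    for record in records:
        for element, value in record["elements"].items():
            accepted = vocabularies.get(element)
            if accepted is not None and value not in accepted:
                problems.append((record["id"], element, value))
    return problems

vocabularies = {"genre": {"Documentary", "Newsreel", "Fiction"}}
records = [
    {"id": "rec-1", "elements": {"genre": "Newsreel", "title": "A"}},
    {"id": "rec-2", "elements": {"genre": "Wochenschau", "title": "B"}},
]
problems = non_conforming(records, vocabularies)
```

Each reported triple points to a value that needs either a new matching-table entry or a manual correction.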

Step 7: Data Cleaning and Enrichment

As soon as metadata mapping and cleaning are completed, the final metadata cleaning and enrichment is performed manually by the contributing institution. The DP can use the Metadata Editor to correct the metadata values and to enrich the metadata manually. Via a newly created Vocabulary Editor Tool, new vocabulary terms can be applied. Terms can be incrementally added to a vocabulary, modified or deleted. Synonyms are managed as well. In addition, the Cleaning Rules Editor allows the definition of rules that are associated with an XPath and applied by the system to implement the cleaning phase of the aggregation workflow.
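The idea of a cleaning rule bound to an XPath can be sketched as follows. This is a minimal Python example and not the Cleaning Rules Editor: the element names and the two rules are hypothetical, and the standard library's `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical cleaning rules: an XPath selecting elements, and a function
# returning the cleaned text for each selected element.
CLEANING_RULES = [
    (".//title", lambda text: " ".join(text.split())),       # collapse whitespace
    (".//productionYear", lambda text: text.strip(" []?")),  # drop brackets and "?"
]

def clean(record_xml: str, rules=CLEANING_RULES) -> str:
    """Apply every cleaning rule to the record and return the cleaned XML."""
    root = ET.fromstring(record_xml)
    for xpath, fix in rules:
        for element in root.findall(xpath):
            if element.text:
                element.text = fix(element.text)
    return ET.tostring(root, encoding="unicode")
```

Keeping the rules in a plain list means that a rule can be added or removed without touching the function that applies them.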

Step 8: Data Ingestion into the EFG Information Space

When the data enrichment and cleaning process is completed, the DP approves the export of its data from the pre-production Information Space into the production Information Space. From this moment the data become visible in the EFG portal and can also be exported to Europeana.

NOTE: the iteration of steps 2-4 and 5-8 is particularly critical in the case of new DPs. It should be noted that 8 “extra” DPs (not planned in the original DoW) were integrated during this second year, which required considerable extra effort.


Step 9: Export from EFG to Europeana

The export of data from EFG to Europeana is implemented by CNR-ISTI in conformance with the rules defined in the Europeana Data Provider Handbook [7]. It is performed through the OAI-Publishing Service. This service transforms the EFG records into ESE or EDM records and supports OAI harvesting on demand. By using ESE/EDM, the EFG aggregator functions as a kind of "filter" through which a certain set of data elements from the EFG Data Provider is forwarded. For an example of how records from EFG Providers are delivered to Europeana, please refer to the “EFG Data Provider Handbook”.
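The "filter" role can be pictured as keeping a fixed set of elements and dropping the rest. The Python sketch below assumes, for illustration, a record that already uses ESE element names as keys. The names listed are genuine ESE elements, but which elements EFG actually forwards is not specified here, so the selection is an assumption.

```python
# Elements forwarded to Europeana in this sketch (assumed selection).
FORWARDED_ELEMENTS = (
    "dc:title",
    "dc:creator",
    "europeana:type",
    "europeana:provider",
    "europeana:isShownAt",
    "europeana:rights",
)

def to_forwarded_record(record: dict) -> dict:
    """Keep only the elements that are forwarded; drop everything else."""
    return {key: record[key] for key in FORWARDED_ELEMENTS if key in record}
```

For example, a record carrying an additional archive-internal field would be forwarded without that field.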

Since the release of EDM (2013), a new EFG OAI Publisher has been designed to be more efficient in providing OAI records and to support the generation of new metadata formats.

As to efficiency, the new OAI Publisher uses a dedicated index instance with an optimized metadata structure. These changes bring two main benefits:

• Reduce the load on the main index of the EFG Aggregator.
• Allow a more efficient management of the OAI records.

Another important feature is the customization of the generation of new metadata formats. This is achieved by supplying an XSLT program that transforms the main format (in the case of EFG, the main metadata format is ESE) into a new format.

The OAI Publisher applies this XSLT at run time without creating a new version of the same item in the index, thus reducing disk space usage. The new EDM mapping has been implemented relying on this approach.
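The design choice described above, storing each record once and deriving other formats at request time, can be sketched as follows. This is a minimal Python illustration, not the OAI Publisher: plain Python functions stand in for the XSLT programs, and the format names `main` and `derived` as well as the record content are hypothetical.

```python
from typing import Callable, Dict

# Records are stored in one main format only (plain dicts in this sketch).
INDEX: Dict[str, dict] = {"rec-1": {"title": "Example title", "type": "VIDEO"}}

# Derived formats are registered as transformations of the main format.
# In the service described above this role is played by an XSLT.
TRANSFORMS: Dict[str, Callable[[dict], dict]] = {
    "main": lambda record: record,
    "derived": lambda record: {key.upper(): value for key, value in record.items()},
}

def get_record(identifier: str, metadata_prefix: str) -> dict:
    """Return a record in the requested format, transformed at request time."""
    transform = TRANSFORMS[metadata_prefix]
    return transform(INDEX[identifier])
```

Adding a new output format then means registering one more transformation, while the index keeps a single copy of each record.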

Finally, a user interface (see picture hereafter) has been provided to help the user in configuring and customizing the OAI Publisher (reachable at: http://node2.d.efg.research-infrastructures.eu:8280/efg-is/mvc/ui/oaiConfig.do).

[7] URL to the Europeana Data Provider Handbook: http://www.version1.europeana.eu/web/guest/technical-requirements/


Roles and Responsibility in the Data Ingestion Process

The Deutsches Filminstitut informs the DP once its data have been published in the Europeana portal (http://www.europeana.eu). The DP should perform a check and report to the Deutsches Filminstitut if it encounters any problems with its data.

During the ingestion process, the DP closely interacts with the Deutsches Filminstitut, which is responsible for:

• assisting the new provider during the ingestion process
• the writing of the mapping rules
• the quality check of the mappings and matchings in the internal EFG database (pre-production environment).

The technology partner CNR-ISTI is responsible for:

• the ingestion of the provided data

• the export of the data from EFG to Europeana.

The DP is responsible for:

• the approval of the submission method, mapping and matching to be implemented.
• the setup of the submission infrastructure and methods at its site.
• the delivery of the data sets.
• the drafting of a vocabulary matching.
• the approval of the correct representation of its data sets in the EFG and Europeana portals.