Europeana’s and aggregators’ ingestion workflows
Cécile Devarenne, Operations Officer
Aggregators Forum Workshop, Den Haag, 18th May 2015
Content
• Europeana Aggregation workflow
• Europeana Ingestion & data processing workflow
• Overview of tools
• Data flows and tasks
• Workshop talk 1: common and specific
• Workshop talk 2: the good, the bad
• Workshop talk 3: gathering ideas
Submission of data and publication cycles
• Operations officers work on a monthly cycle
• Each month, data needs to be submitted by the 21st to be included in the coming cycle and published by the 15th to 20th of the following month
• A dataset takes on average 40 minutes to process
• Around 200 datasets are processed by the Operations officers for each publication cycle
• Datasets go through a full flow of operations before they are production ready
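The cycle rules above can be sketched as a small date calculation. This is only an illustration: it assumes that "15/20th" means a publication window from the 15th to the 20th, and that anything submitted after the 21st rolls over to the next cycle. The function names are hypothetical.

```python
from datetime import date
from typing import Tuple

SUBMISSION_DEADLINE_DAY = 21   # from the slide: submit by the 21st
PUBLICATION_DAYS = (15, 20)    # from the slide: published by the 15th/20th
MINUTES_PER_DATASET = 40       # average quoted on the slide
DATASETS_PER_CYCLE = 200       # approximate figure quoted on the slide


def publication_window(submitted: date) -> Tuple[date, date]:
    """Return the (earliest, latest) expected publication dates.

    Assumption: a submission after the 21st rolls over to the next cycle.
    """
    months_ahead = 1 if submitted.day <= SUBMISSION_DEADLINE_DAY else 2
    # Work in "months since year 0" so that December wraps into January.
    month_index = submitted.year * 12 + (submitted.month - 1) + months_ahead
    year, month = divmod(month_index, 12)
    month += 1
    return (date(year, month, PUBLICATION_DAYS[0]),
            date(year, month, PUBLICATION_DAYS[1]))


def processing_hours_per_cycle() -> float:
    """Rough processing load per cycle, from the two averages on the slide."""
    return MINUTES_PER_DATASET * DATASETS_PER_CYCLE / 60
```

With the figures on the slide, 200 datasets at 40 minutes each come to roughly 133 hours of processing per cycle.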
Steps to get data ingested
• Data flows:
  • IMPORT
  • MAP/EDIT/TRANSFORM
  • VALIDATE
  • ENRICH
  • PUBLISH
IMPORT
• Manage/structure data for imported collections (CRM & UIM):
  • Dataset entries created and monitored
• Harvest data records grouped in datasets (Repox):
  • OAI-PMH and HTTP protocols; XML data
  • No incremental harvesting
  • Storage of data in Repox’s database
• Ongoing developments:
  • New version of Repox, which will be shared by Europeana and The European Library
  • Repox’s harvested data stored in the Europeana Cloud
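A full (non-incremental) OAI-PMH harvest of one dataset can be sketched as below. This is not Repox code: it is a minimal illustration using only the Python standard library, and the endpoint URL, the `oai_dc` default prefix, and the function names are assumptions. The `ListRecords` verb, the `set` and `metadataPrefix` arguments, and the `resumptionToken` paging mechanism come from the OAI-PMH 2.0 protocol.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if token:
        # resumptionToken is an exclusive argument: no other parameters allowed.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:record", OAI_NS)
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def harvest(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Yield every record of a set; always a full harvest, never incremental."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, set_spec, token)
        with urllib.request.urlopen(url) as response:
            records, token = parse_page(response.read())
        yield from records
        if not token:  # an absent or empty token marks the last page
            break
```

Because no `from` or `until` date is sent, every run retrieves the whole set again, which matches the "no incremental harvesting" point above.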
MAP/EDIT/TRANSFORM (Mint)
• Map and transform from source (ESE, EDM External) to target (EDM Internal) (Mint):
  • User interface, drag and drop functionality
  • All mappings are stored
  • The last versions of the incoming and transformed data are stored in Mint’s database
• Clean data (Mint)
  • User interface, functions are applied to the data
  • Statistics and preview functionality help with the quality checks
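To make the map, clean, and statistics steps concrete, here is a minimal sketch in plain Python. It does not use Mint or reflect its internals. Records are simplified to flat dictionaries of field name to list of values, which ignores the class structure of EDM, and the mapping table is an assumed, partial list of ESE-to-EDM property correspondences chosen for illustration.

```python
from collections import Counter

# Assumed subset of ESE-to-EDM property correspondences, for illustration only.
ESE_TO_EDM = {
    "europeana:isShownAt": "edm:isShownAt",
    "europeana:isShownBy": "edm:isShownBy",
    "europeana:object": "edm:object",
    "europeana:provider": "edm:provider",
    "europeana:dataProvider": "edm:dataProvider",
    "europeana:rights": "edm:rights",
    "europeana:type": "edm:type",
}


def transform(record):
    """Rename mapped fields, strip whitespace, and drop empty values."""
    out = {}
    for field, values in record.items():
        # Unmapped fields (for example dc:*) pass through unchanged.
        target = ESE_TO_EDM.get(field, field)
        cleaned = [v.strip() for v in values if v and v.strip()]
        if cleaned:
            out.setdefault(target, []).extend(cleaned)
    return out


def field_statistics(records):
    """Count how many records carry each field, as a simple quality check."""
    return Counter(field for record in records for field in record)
```

A field that appears in far fewer records than expected in `field_statistics` is the kind of signal the statistics and preview views are meant to surface before publication.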
MAP/EDIT/TRANSFORM (UIM)
• Itemize and create/manage Europeana identifiers for permalinks to your records in Europeana (UIM plugin)
• One record per ProvidedCHO
• From http://data.theeuropeanlibrary.org/BibliographicResource/3000118920655 to http://data.europeana.eu/item/9200338/BibliographicResource_3000118920655
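The identifier rewrite in the example above can be reproduced with a few lines of Python. The rule used here (drop the host, join the path segments with underscores, prefix with the dataset number) is inferred from that single example only; the actual UIM plugin may normalise other characters, and `europeana_identifier` is a hypothetical name.

```python
from urllib.parse import urlparse

EUROPEANA_ITEM_BASE = "http://data.europeana.eu/item"


def europeana_identifier(source_uri, dataset_id):
    """Derive a Europeana item URI from a provider URI (inferred rule)."""
    path = urlparse(source_uri).path.strip("/")
    local_id = path.replace("/", "_")  # assumption: "/" becomes "_"
    return f"{EUROPEANA_ITEM_BASE}/{dataset_id}/{local_id}"
```

Calling it with the provider URI from the slide and the dataset number `9200338` yields the Europeana URI shown on the slide.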
VALIDATE
• Validate data automatically against the EDM Internal schema (Mint and UIM):
  • XSD schema and Schematron rules (mandatory elements, types of values)
  • Only valid records are saved in UIM’s database: invalid records are discarded
  • Checks on unique identifiers within a dataset: duplicate records are also discarded
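The discard behaviour described above can be sketched without an XSD or Schematron engine. The sketch below stands in for the real schema with two simple checks, on flat dictionary records with a hypothetical `id` key. The list of mandatory fields is an illustrative subset, not the EDM Internal schema itself.

```python
# Illustrative subset of mandatory fields; the real rules live in the schema.
MANDATORY_FIELDS = ("edm:dataProvider", "edm:provider", "edm:rights", "edm:type")
ALLOWED_TYPES = {"TEXT", "IMAGE", "SOUND", "VIDEO", "3D"}


def validation_errors(record):
    """Return a list of problems: missing mandatory fields, bad edm:type values."""
    errors = [f"missing {f}" for f in MANDATORY_FIELDS if not record.get(f)]
    for value in record.get("edm:type", []):
        if value not in ALLOWED_TYPES:
            errors.append(f"invalid edm:type {value!r}")
    return errors


def filter_valid(records):
    """Keep valid records with a unique identifier; discard the rest."""
    seen, kept, discarded = set(), [], []
    for record in records:
        errors = validation_errors(record)
        identifier = record.get("id")
        if not identifier:
            errors.append("missing identifier")
        elif identifier in seen:
            errors.append("duplicate identifier")
        if errors:
            discarded.append((identifier, errors))
        else:
            # Only kept records reserve their identifier: first valid one wins.
            seen.add(identifier)
            kept.append(record)
    return kept, discarded
```

Returning the discarded records with their reasons, rather than dropping them silently, makes it easier to report back to the data provider.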
ENRICH (semantic enrichment)
• Data enrichment against external datasets
• Dereference (UIM plugin):
  • generate additional contextual data in EDM from links to ontologies exposed as linked data
  • maintain mappings between the vocabularies to be dereferenced and EDM
• Enrich (UIM plugin):
  • generate additional contextual data (links to external resources) from analysis of the provided data
  • maintain the corpus of resources against which the EDM data is enriched; maintain enrichment rules
• Media links:
  • Cache thumbnails
  • Extract technical metadata from links to digital objects
• Extract hierarchies:
  • Hierarchical data (several objects within a dataset related to one another to reflect a hierarchy, e.g. a book and its chapters) is processed through our hierarchical objects plugin
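As an illustration of the "Enrich" step, the sketch below matches literal values in a record against a small corpus of external resources and attaches the matching links. Everything specific in it is hypothetical: the vocabulary entries, the `example.org` URIs, the choice of fields to analyse, and the `enrichment:links` output field. It shows the general idea of rule-based matching only, not the UIM plugin's behaviour.

```python
# Hypothetical corpus: normalised label -> URI of an external resource.
VOCABULARY = {
    "paris": "http://example.org/place/paris",
    "amsterdam": "http://example.org/place/amsterdam",
}
ENRICHED_FIELDS = ("dc:subject", "dcterms:spatial")  # assumed enrichment rule


def enrich(record, vocabulary=VOCABULARY):
    """Return a copy of the record with links for values found in the vocabulary."""
    links = []
    for field in ENRICHED_FIELDS:
        for value in record.get(field, []):
            uri = vocabulary.get(value.strip().lower())
            if uri and uri not in links:
                links.append(uri)
    enriched = dict(record)  # shallow copy: the provided data stays untouched
    if links:
        enriched["enrichment:links"] = links
    return enriched
```

Keeping the added links in a separate field keeps the provided data distinguishable from what was generated during enrichment.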
PUBLISH
• Deploy content monthly
• New and updated data retrievable on europeana.eu and via the API
Workshop talk 1: common and specific
• Do you agree with these broad categories and believe they also apply to your data workflow? Is it the case for all?
• Could you list tasks under each category illustrating your work? Example: creation of persistent identifiers, link checking
• Are these tasks all common to everyone at your table?
• Which tasks on your list are specific to you and not represented at your table?
• 20 minutes
Workshop talk 2: the good, the bad
• What are the principal issues you encounter within your workflows?
• Are these issues shared with other partners at your table?
• What is the impact of the connections of workflows (data provider to aggregator to Europeana) on your data processing tasks?
• 20 minutes
Workshop talk 3: gathering ideas
• What kind of concrete changes from Europeana could improve your data processing work?
• What steps in your current workflow could you use help with? (validation, preview, …)
• Are there any tools you use already that you could recommend to everyone?
• All feedback and questions are welcome!
• 20 minutes
Guidance and help
• Europeana Professional: http://pro.europeana.eu/share-your-data/
• Content inbox, for all ingestion & metadata related matters: [email protected]
Thank you!
Cécile Devarenne, Chiara Latronico, Marie-Claire Dangerfield, Pablo Uceda Gomez, Jeroen Cichy