The MINT Mapping Tool and the MoRe Aggregator
Vassilis Tzouvaras, Dimitris Gavrilis
National Technical University of Athens
Digital Curation Unit - IMIS, Athena Research Center
LoCloud is funded by the European Commission's ICT Policy Support Programme
Cultural Heritage Content
• Diversity of cultural heritage content
– Numerous metadata schemas are used to annotate content (LIDO, CIDOC-CRM, EAD, METS)
• Massive digitization and annotation activities are in progress
• Need for interoperability
MINT Mapping Tool
• Lets users map their own metadata schemas to reference domain models
• Follows a typical web-based architecture
• It was developed for ATHENA, but it is currently used for EUScreen, CARARE, Judaica, ECLAP, DCA and Linked Heritage
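As a rough illustration of what mapping a provider schema to a reference model involves, the sketch below applies a field-to-field mapping table to a single record. It is not MINT's API: the field names, the `MAPPING` table and `apply_mapping` are hypothetical and chosen only for this example.

```python
# Hypothetical mapping from a provider's own field names to target-schema elements.
MAPPING = {
    "titel": "dc:title",
    "maker": "dc:creator",
    "jaar": "dc:date",
}

def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename mapped fields; source fields without a mapping are dropped."""
    target = {}
    for source_field, value in record.items():
        target_field = mapping.get(source_field)
        if target_field is not None:
            # Target elements are repeatable, so values are collected in lists.
            target.setdefault(target_field, []).append(value)
    return target

source_record = {"titel": "Nachtwacht", "maker": "Rembrandt", "inventaris": "SK-C-5"}
mapped = apply_mapping(source_record, MAPPING)
```

Here `inventaris` has no entry in the table, so it does not appear in `mapped`; a real mapping would need an explicit decision for every source field.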
MINT 2 – What’s new?
• The backend was reconstructed for better performance
– The maximum file size for imports was increased
• The frontend was updated
– New interface
– Workflow is integrated in UI
– Facilitated browsing of input and target schema
MoRe Overall Architecture
[Architecture diagram. Labeled components: Registry; Apache Cassandra cluster; Fedora-commons; Temporary storage; Vocabulary services; Storage; JMS logging; Messaging; Core services; Enrichment service management; Entity matching / NLP; Geocoding / Historic place names; REST; External enrichment services; Publish service management; OAI-PMH; RDF Store; Elastic Search; Archive]
Cloud architecture
• De-centralized
• Scalable
• Four cloud environments
– Storage
– Monitoring & logging
– Core services deployment
– Enrichment services deployment
Distributed
• Enrichment services run in:
– Austria
– Spain
– Greece
– Lithuania
– Slovenia
– Norway
• Scalability can be facilitated through a virtualization infrastructure
Workflow
[Workflow diagram. Sources: OAI-PMH, LoCloud Collections, Wikimedia, MINT. Steps: Harvest, Validate, Ingest, Transform, Enrich, Index, Publish, plus Delete and Reject. Outputs: OAI-PMH, Archive, RDF Store, SolR. Also labeled: Omeka.]
Intermediate Schemas
[Diagram with two schema lists]
• First list: Dublin Core, LIDO, CARARE, EAD, ESE, EDM
• Second list: Dublin Core, LIDO, CARARE, EAD, ESE, EDM, OMEKA-XML, OGD
Core services
• Harvesting
• Validation
• Ingestion
• Transformation
• Enrichment
• Previewing
• Publishing
Core services: Harvesting
• Harvests content from metadata sources
– OAI-PMH repository
– MINT
– LoCloud Collections
– Wikimedia
• Multiple schemas are supported
– OAI_DC
– CARARE
– CARARE 2.0
– LIDO
– EAD
– EDM
– ESE
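Harvesting from an OAI-PMH repository amounts to paging through `ListRecords` responses. The following is a minimal sketch of that loop's two building blocks, not MoRe's harvester: the function names and the `example.org` endpoint are assumptions, and the sample response is hand-written so the code runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=OAI_NS)
    return ids, (token or None)  # an empty token element marks the last page

# Hand-written sample page (hypothetical identifiers and token).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai")
ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A full harvester would fetch `first_url`, parse the page, and repeat with `next_url` until no resumption token is returned.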
Core services: Validation
• Validates incoming information packages
• Executes validation schemes
• Validation micro-services
– Structure
– Schema
– Linking
– Schematron rules
• Flexible
• How it is used in MoRe
– Pre-validation
– Post-validation
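One way to picture a validation scheme is as an ordered list of small checks whose error messages are collected. The sketch below assumes that shape; the function names are hypothetical, and `check_required` is only a simple stand-in for real schema or Schematron validation, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

def check_structure(xml_text):
    """Structure check: the package must be well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return []
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

def check_required(xml_text, required=("title", "identifier")):
    """Stand-in for schema/Schematron rules: required child elements must be present."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return []  # already reported by the structure check
    present = {child.tag for child in root}
    return [f"missing element: {name}" for name in required if name not in present]

def validate(xml_text, scheme):
    """Run each micro-service in the scheme and collect all error messages."""
    errors = []
    for micro_service in scheme:
        errors.extend(micro_service(xml_text))
    return errors

# A hypothetical scheme run before ingestion; a post-validation scheme
# could reuse the same checks against the transformed record.
PRE_VALIDATION = [check_structure, check_required]
good = validate("<record><title>A</title><identifier>1</identifier></record>", PRE_VALIDATION)
bad = validate("<record><title>A</title></record>", PRE_VALIDATION)
broken = validate("<record>", PRE_VALIDATION)
```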
Core services: Ingestion
• Ingests content into storage
• Uses the storage layer API
• Pluggable drivers for attaching different technologies / repositories
– Apache Cassandra
– Filesystem-based
– Fedora-commons
• Versioning support
• Complex digital object support
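The pluggable-driver idea can be sketched as an abstract interface that ingestion code depends on, with one implementation per backend. Everything named here (`StorageDriver`, `InMemoryDriver`, `ingest`) is hypothetical; the actual MoRe storage layer API is not shown in these slides.

```python
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    """Hypothetical storage-layer interface; one subclass per backend."""

    @abstractmethod
    def put(self, object_id: str, content: bytes) -> int:
        """Store content and return the new version number."""

    @abstractmethod
    def get(self, object_id: str, version: int = -1) -> bytes:
        """Return a stored version (the latest by default)."""

class InMemoryDriver(StorageDriver):
    """Toy backend; a Cassandra, filesystem or Fedora driver would expose the same methods."""

    def __init__(self):
        self._versions = {}

    def put(self, object_id, content):
        history = self._versions.setdefault(object_id, [])
        history.append(content)
        return len(history)  # versions are numbered from 1

    def get(self, object_id, version=-1):
        history = self._versions[object_id]
        return history[-1] if version == -1 else history[version - 1]

def ingest(driver: StorageDriver, object_id: str, content: bytes) -> int:
    """Ingestion only talks to the interface, so the backend can be swapped."""
    return driver.put(object_id, content)

driver = InMemoryDriver()
v1 = ingest(driver, "obj-1", b"<record>original</record>")
v2 = ingest(driver, "obj-1", b"<record>enriched</record>")
```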
Content Model
• Digital objects comprise data streams
• Each data stream can hold any kind of information
– XML/RDF, image, video, documents, etc.
• Each different representation of an information object is stored as a different data stream
• Each curation action generates a new version
– Transformation, Enrichment
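The content model described above could be represented roughly as follows. This is a sketch under assumptions: the class names, the `action` label and the MIME types are illustrative, not taken from MoRe.

```python
from dataclasses import dataclass, field

@dataclass
class Version:
    action: str      # curation action that produced it, e.g. "ingest", "transform"
    content: bytes

@dataclass
class DataStream:
    mime_type: str
    versions: list = field(default_factory=list)

    def latest(self) -> Version:
        return self.versions[-1]

@dataclass
class DigitalObject:
    object_id: str
    datastreams: dict = field(default_factory=dict)  # name -> DataStream

    def add_version(self, name, mime_type, action, content):
        """Append a version; the data stream is created on first use."""
        stream = self.datastreams.setdefault(name, DataStream(mime_type))
        stream.versions.append(Version(action, content))
        return len(stream.versions)

obj = DigitalObject("obj-1")
obj.add_version("LIDO", "application/xml", "ingest", b"<lido/>")
obj.add_version("EDM", "application/rdf+xml", "transform", b"<rdf/>")
obj.add_version("EDM", "application/rdf+xml", "enrich", b"<rdf enriched='true'/>")
```

The original LIDO representation and the derived EDM representation live in separate data streams, and the enrichment step adds a second version to the EDM stream rather than overwriting the first.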
Core services: Transformation
• Transforms entire information packages into the Europeana Data Model (EDM), or any other schema
• Multiple transformation routines
– Per schema
– Per project
– Per provider
• Users can attach a rights statement
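Having routines per schema, per project and per provider implies some lookup rule for choosing one. The sketch below assumes the most specific routine wins (provider, then project, then schema default); that precedence, the `ROUTINES` registry and the stylesheet names are assumptions for illustration only.

```python
def pick_routine(routines, schema, project=None, provider=None):
    """Prefer a provider-specific routine, then project-specific, then the schema default."""
    candidates = (
        (schema, project, provider),
        (schema, project, None),
        (schema, None, None),
    )
    for key in candidates:
        if key in routines:
            return routines[key]
    raise KeyError(f"no transformation routine for schema {schema!r}")

# Hypothetical registry; values would normally be stylesheets or callables.
ROUTINES = {
    ("LIDO", None, None): "lido-to-edm.xsl",
    ("LIDO", "LoCloud", None): "lido-to-edm-locloud.xsl",
    ("LIDO", "LoCloud", "museum-x"): "lido-to-edm-museum-x.xsl",
}

specific = pick_routine(ROUTINES, "LIDO", "LoCloud", "museum-x")
project_level = pick_routine(ROUTINES, "LIDO", "LoCloud", "other-provider")
default = pick_routine(ROUTINES, "LIDO")
```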
Core services: Enrichment
• The generic enrichment service executes the enrichment micro-services
– Hides the complexity from the user by using enrichment plans
– Integrates seamlessly with the MoRe UI
• Virtual enrichment driver
– Allows developers/creative industries to create their own enrichment services and declare/use them within MoRe
Core services: Previewing
• Preview the XML record information for all datastreams
• Preview the record in HTML (using the Europeana style sheet)
Core services: Publishing
• Publishes transformed / enriched information
• Internal OAI-PMH provider
• XML export
• Publishes directly to RDF repositories
– Sesame
– Virtuoso
• SolR index server
Enrichment micro-services
• Thematic
– Thesauri collections
– Vocabulary matching
– Background links
• Spatial
– Geo-normalization
– Geo-coding
– Reverse geo-coding
– Historic place names
• Other
– Language identification
[Diagram labels: SKOS Thesauri, GeoNames, DBpedia, Wikipedia]
Enrichment Plan
• Enrichment micro-services are combined into enrichment workflows, called enrichment plans
• Each enrichment plan applies to a specific schema
• Each enrichment plan executes enrichment micro-services in a specific order
[Diagram: an enrichment plan listing Language identification, Vocabulary matching, Geo-normalization, Geo-coding]
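An enrichment plan, as described above, is tied to one schema and runs its micro-services in a fixed order. A minimal sketch of that idea follows; `EnrichmentPlan` and the two toy micro-services are hypothetical stand-ins, not MoRe code, and the vocabulary URI is made up.

```python
from dataclasses import dataclass

@dataclass
class EnrichmentPlan:
    schema: str
    steps: list  # micro-services, executed in list order

    def run(self, record: dict) -> dict:
        if record.get("schema") != self.schema:
            raise ValueError(f"plan targets {self.schema}, got {record.get('schema')}")
        for step in self.steps:
            record = step(record)
        return record

def identify_language(record):
    # Toy rule standing in for a real language-identification service:
    # any Greek character marks the title as Greek, otherwise English.
    is_greek = any("\u0370" <= ch <= "\u03ff" for ch in record["title"])
    return {**record, "language": "el" if is_greek else "en"}

def match_vocabulary(record):
    # Toy vocabulary with a single hypothetical term URI.
    vocabulary = {"temple": "http://example.org/vocab/temple"}
    hits = [uri for term, uri in vocabulary.items() if term in record["title"].lower()]
    return {**record, "subjects": hits}

plan = EnrichmentPlan(schema="EDM", steps=[identify_language, match_vocabulary])
enriched = plan.run({"schema": "EDM", "title": "Temple of Poseidon"})
```

Each step returns a new record, so a later step can rely on fields added by an earlier one, which is why the order of `steps` matters.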
Enrichment Plan
• Each enrichment plan defines run-time parameters for specific services
– Content based
– Example: add subject collection A only if term X or Y is matched
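A content-based run-time parameter of this kind can be read as a condition evaluated against the record while the plan runs. The sketch below assumes an earlier step has stored its matches under a `matched_terms` key; that key, the `collections` key and the function name are hypothetical.

```python
def add_collection_if_matched(record, collection, trigger_terms):
    """Add `collection` only when at least one trigger term was matched earlier in the plan."""
    matched = set(record.get("matched_terms", []))
    if matched & set(trigger_terms):
        collections = record.get("collections", []) + [collection]
        return {**record, "collections": collections}
    return record  # no trigger term matched: the record is left unchanged

hit = add_collection_if_matched({"matched_terms": ["X", "Z"]}, "A", ["X", "Y"])
miss = add_collection_if_matched({"matched_terms": ["Z"]}, "A", ["X", "Y"])
```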