The MINT Mapping Tool
The MoRe Aggregator
Vassilis Tzouvaras, Dimitris Gavrilis
National Technical University of Athens
Digital Curation Unit - IMIS, Athena Research Center
LoCloud is funded by the European Commission's ICT Policy Support Programme
Cultural Heritage Content
• Diversity of cultural heritage content
– Numerous metadata schemas to annotate content (LIDO, CIDOC-CRM, EAD, METS)
• Massive digitization and annotation activities are in progress
• Need for interoperability
MINT Mapping Tool
• Provides users the ability to perform a mapping of their own metadata schemas to reference domain models
• Follows a typical web-based architecture
• It was developed for ATHENA, but it is currently used for EUScreen, CARARE, Judaica, ECLAP, DCA and Linked Heritage
MINT 2 – What’s new?
• The backend was reconstructed for better performance
– File size for imports is extended
• The frontend was updated
– New interface
– Workflow is integrated in UI
– Facilitated browsing of input and target schema
MORe Overall Architecture
[Architecture diagram. Labels: Registry, Apache Cassandra cluster, Fedora-commons, Temporary storage, Vocabulary services, Storage, JMS logging, Messaging, Core services, Enrichment service management, Entity matching / NLP, Geocoding / Historic place names, REST, External enrichment services, Publish service management, OAI-PMH, RDF Store, Elastic Search, Archive]
Cloud architecture
• De-centralized
• Scalable
• Four cloud environments
– Storage
– Monitoring & logging
– Core services deployment
– Enrichment services deployment
Distributed
• Enrichment services run in:
– Austria
– Spain
– Greece
– Lithuania
– Slovenia
– Norway
• Scalability can be facilitated through a virtualization infrastructure
Workflow
[Workflow diagram. Labels: OAI-PMH, LoCloud Collections, Wikimedia, MINT, Harvest, Ingest, Transform, Enrich, Publish, OAI-PMH, Archive, RDF Store, Solr, Validate, Index, Delete, Reject, Omeka]
Intermediate Schemas
[Diagram. Schemas shown: Dublin Core, LIDO, CARARE, EAD, ESE, EDM (this group appears twice in the diagram), OMEKA-XML, OGD]
Core services
• Harvesting
• Validation
• Ingestion
• Transformation
• Enrichment
• Previewing
• Publishing
Core services: Harvesting
• Harvests content from metadata sources
– OAI-PMH repository
– MINT
– LoCloud Collections
– Wikimedia
• Multiple schemas are supported
– OAI_DC
– CARARE
– CARARE 2.0
– LIDO
– EAD
– EDM
– ESE
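To illustrate harvesting from an OAI-PMH repository, here is a minimal sketch of a ListRecords request and response parser. It is not MoRe's harvester: the function names, the example base URL and the sample response are assumptions made for illustration. Only the `verb`, `metadataPrefix` and `resumptionToken` arguments and the XML namespace come from the OAI-PMH 2.0 protocol.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix alongside it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response used only to exercise the parser offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would request `list_records_url(base)` first, then keep requesting with the returned token until the response carries no token.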
Core services: Validation
• Validates incoming information packages
• Executes validation schemes
• Validation micro-services
– Structure
– Schema
– Linking
– Schematron rules
• Flexible
• How it is used in MoRe:
– Pre-validation
– Post-validation
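A validation scheme that chains micro-services could look roughly like the sketch below. The function names and the "title" rule are hypothetical and only stand in for real schema or Schematron checks; nothing here reflects MoRe's actual interfaces.

```python
import xml.etree.ElementTree as ET


def check_structure(xml_text):
    """Structure check: the package must be well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return []
    except ET.ParseError as exc:
        return [f"structure: {exc}"]


def check_required_title(xml_text):
    """Hypothetical rule standing in for a schema or Schematron check."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return []  # malformed input is already reported by check_structure
    has_title = any(
        el.tag.endswith("title") and (el.text or "").strip() for el in root.iter()
    )
    return [] if has_title else ["rule: record has no non-empty title"]


def run_validation_scheme(xml_text, checks):
    """Run every check in order and collect all error messages."""
    errors = []
    for check in checks:
        errors.extend(check(xml_text))
    return errors


# A scheme is just an ordered list of micro-services (an assumption of this sketch).
SCHEME = [check_structure, check_required_title]
```

The same scheme runner could be called before ingestion (pre-validation) and again after transformation (post-validation), with a different list of checks each time.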
Core services: Ingestion
• Ingests content into storage
• Uses the storage layer API
• Pluggable drivers for attaching different technologies / repositories
– Apache Cassandra
– Filesystem-based
– Fedora-commons
• Versioning support
• Complex digital object support
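The pluggable-driver idea can be sketched as a small interface with interchangeable back ends. The class and method names below are assumptions for illustration, not the storage layer API of MoRe; an in-memory driver and a filesystem driver stand in for back ends such as Apache Cassandra or Fedora-commons.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageDriver(ABC):
    """Hypothetical driver interface: one implementation per storage technology."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    def __init__(self):
        self._objects = {}

    def put(self, object_id, data):
        self._objects[object_id] = data

    def get(self, object_id):
        return self._objects[object_id]


class FilesystemDriver(StorageDriver):
    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, object_id, data):
        # Assumes object_id is a plain file name (no path separators).
        (self.root / object_id).write_bytes(data)

    def get(self, object_id):
        return (self.root / object_id).read_bytes()


def ingest(driver, object_id, data):
    """Store a package through whichever driver is plugged in, then verify it."""
    driver.put(object_id, data)
    return driver.get(object_id) == data
```

Because `ingest` only depends on the interface, swapping the back end means passing a different driver, for example `ingest(FilesystemDriver(tempfile.mkdtemp()), "pkg-1", b"<record/>")`.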
Core services: Content Model
• Digital objects comprise data streams
• Each data stream can hold any kind of information
– XML/RDF, image, video, documents, etc.
• Each different representation of an information object is stored as a different data stream
• Each curation action generates a new version
– Transformation, enrichment
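One possible reading of this content model is sketched below. The class names are hypothetical, and the choice to keep versions per data stream (rather than per object) is an assumption of the sketch, since the slides do not say where versions are tracked.

```python
from dataclasses import dataclass, field


@dataclass
class DataStream:
    """One representation of an information object (e.g. EDM XML, a thumbnail)."""
    name: str
    versions: list = field(default_factory=list)  # (curation action, content) pairs

    def add_version(self, action, content):
        """Record the result of a curation action; returns the new version number."""
        self.versions.append((action, content))
        return len(self.versions)

    def latest(self):
        return self.versions[-1][1]


@dataclass
class DigitalObject:
    """A digital object is a set of named data streams."""
    object_id: str
    streams: dict = field(default_factory=dict)

    def stream(self, name):
        """Return the named data stream, creating it on first use."""
        return self.streams.setdefault(name, DataStream(name))
```

Under these assumptions, a transformation followed by an enrichment leaves two versions on the same stream, and earlier content remains available in `versions`.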
Core services: Transformation
• Transforms entire information packages into the Europeana Data Model (EDM), or any other schema
• Multiple transformation routines
– Per schema
– Per project
– Per provider
• Users can attach a rights statement
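How a routine might be chosen when several are registered can be sketched as a lookup. Everything here is an assumption: the slides do not state a precedence, so this sketch picks the most specific match (provider, then project, then schema), and the routine file names are made up.

```python
def select_routine(routines, schema, project=None, provider=None):
    """Return the most specific routine registered for a package.

    Assumed precedence (not confirmed by the source): provider > project > schema.
    """
    for key in (("provider", provider), ("project", project), ("schema", schema)):
        if key[1] is not None and key in routines:
            return routines[key]
    raise LookupError(f"no transformation routine registered for schema {schema!r}")


# Hypothetical registry; the file names are placeholders.
ROUTINES = {
    ("schema", "LIDO"): "lido-to-edm.xsl",
    ("project", "LoCloud"): "locloud-to-edm.xsl",
    ("provider", "museum-a"): "museum-a-to-edm.xsl",
}
```

With this registry, a LIDO package from an unknown provider falls back to the schema-level routine, while one from `museum-a` gets the provider-specific routine.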
Core services: Enrichment
• The generic enrichment service facilitates the execution of the enrichment micro-services
– Hides the complexity from the user by using enrichment plans
– Provides seamless integration with the UI of MoRe
• Virtual Enrichment driver
– Allows developers/creative industries to create their own enrichment services and declare/use them within MoRe
Core services: Previewing
• Preview the XML record information for all data streams
• Preview the record in HTML (using the Europeana style sheet)
Core services: Publishing
• Publish transformed / enriched information
– Internal OAI-PMH provider
– XML export
– Publish directly to RDF repositories (Sesame, Virtuoso)
– Solr index server
Enrichment micro-services
• Thematic
– Thesauri collections
– Vocabulary matching
– Background links
• Spatial
– Geo normalization
– Geo coding
– Reverse geo-coding
– Historic place names
• Other
– Language identification
[Diagram labels: SKOS Thesauri, GeoNames, DBpedia, Wikipedia]
Enrichment Plan
• Enrichment micro-services are used within enrichment workflows, called enrichment plans
• Each enrichment plan applies to a specific schema
• Each enrichment plan executes enrichment micro-services in a specific order
[Diagram: an enrichment plan listing Language identification, Vocabulary matching, Geo-normalization, Geo-coding]
Enrichment Plan
• Each enrichment plan defines run-time parameters for specific services
– Content based
[Diagram: an enrichment plan listing Language identification, Vocabulary matching, Geo-normalization, Geo-coding, annotated "Add subject collection A only if term X or Y is matched"]
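The enrichment plan idea (one schema, an ordered list of micro-services, content-based run-time parameters) can be sketched as follows. The record layout, service functions and parameter names are all hypothetical; the two toy services only stand in for real enrichment micro-services.

```python
def vocabulary_matching(record, params):
    """Toy service: match words of the description against a vocabulary."""
    vocabulary = params.get("vocabulary", set())
    words = set(record.get("description", "").lower().split())
    record["matched_terms"] = sorted(words & vocabulary)
    return record


def add_subject_collection(record, params):
    """Content-based rule: add the collection only if a trigger term matched."""
    if set(record.get("matched_terms", [])) & set(params["only_if_matched"]):
        record.setdefault("collections", []).append(params["collection"])
    return record


def run_plan(plan, record):
    """Execute the plan's services in order; a plan applies to one schema only."""
    if record["schema"] != plan["schema"]:
        raise ValueError("plan does not apply to this schema")
    for service, params in plan["steps"]:
        record = service(record, params)
    return record


# Hypothetical plan: order matters, since the second step reads the first's output.
PLAN = {
    "schema": "EDM",
    "steps": [
        (vocabulary_matching, {"vocabulary": {"amphora", "mosaic", "coin"}}),
        (add_subject_collection,
         {"collection": "A", "only_if_matched": ["amphora", "mosaic"]}),
    ],
}
```

Here a record describing a "Roman mosaic floor" ends up in collection A, while one describing a "Silver coin" matches the vocabulary but not the trigger terms, so no collection is added.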
Dashboard
• Package organization
• Package overview
• Package lifecycle overview
• Preview
• Metadata completeness & statistics
• Enrichment services overview
• Direct access to 27 thesauri
• Create & (re)use subject collections