Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Metadatacuration:hands-onsession
CMDIandMetadataCurationTaskForces
CLARINCentre&Developersmeeting4-5June2018
Utrecht,TheNetherlands
CLARIN 1
Prerequisites
• Java1.8- https://java.com/en/download/
• Internetconnection
CLARIN 2
Menu
1. Viewyour recordsinthe VLO2. Viewyour harvest (and its log)3. Getyour records
1. From the tarball2. Harvest them
4. Curation module1. Lookatthe website2. Runlocally
5. CMDIbestpractices1. Checkyour profiles2. Checkyour records
6. Structural queries1. Loadyour records/validation reports into BaseX2. Some useful XQueries
7. Inspect the mapping8. VLO9. Fixingproblems,butwhere?10. What’s missing?
CLARIN 3
Viewyour recordsinthe VLO
• Filterthe recordsbased onyour endpoint:- _oaiEndpointURI:
• https://vlo.clarin.eu/search?q=_oaiEndpointURI:https://clarin-pl.eu/oai/request
- Endpoints?centres.clarin.eu/oai_pmh
• Filterthe recordsbased onaprofile:- _componentProfile:
• https://vlo.clarin.eu/search?q=_componentProfile:LINDAT_CLARIN• Note:use the profilenameinstead ofits ID!
CLARIN 4
Viewyour harvest (and its log)
• Not inproduction yet,butlocal preview- will replace https://vlo.clarin.eu/data/
• Paged lists• Filteronendpoints and/orrecords• Seethe logofaharvest
CLARIN 5
Getyour records
1. From the tarball1. https://vlo.clarin.eu/data/resultsets/2. tar xjf clarin.tar.bz2
results/cmdi/DANS_CMDI_Provider3. Note:just clicking the tarball might freeze your Mac!
2. Harvest them1. https://github.com/clarin-eric/oai-harvest-manager/releases2. Editproviderssectionofresources/config-test.xml3. run-harvester.sh workdir=`pwd`
resources/config-test.xml
CLARIN 6
Curation module
1. Lookatthe website1. https://clarin.oeaw.ac.at/curate/
2. Runlocally1. https://github.com/clarin-eric/clarin-curation-module2. curation.jar (goo.gl/Cx4h3N )3. Create your own specific copyofconfig.properties4. java -jar curation.jar -config
config.properties -c -path results/cmdi/ARCHE
CLARIN 7
CMDIbestpractices
1. https://www.clarin.eu/content/cmdi-best-practice-guide2. Schematron rules (schematron.com)
1. https://github.com/TheLanguageArchive/SchemAnon/releases2. Also supported by oXygen orother XMLeditors3. Also easyto define your own rules
3. Checkyour profiles1. Identify the profiles you’re using
1. https://github.com/clarin-eric/FindProfiles/releases2. java -jar findProfiles.jar -e=xml
clarin/results/cmdi/The_Language_Archive/2. wget -O profile.xml
https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.x/profiles/clarin.eu:cr1:p_1505397653795/xml && java -jar SchemAnon.jarhttps://raw.githubusercontent.com/clarin-eric/cmdi-toolkit/develop/src/main/resources/toolkit/sch/cmd-component-best-practices.sch profile.xml
CLARIN 8
CMDIbestpractices
4. Checkthe records1. java -jar SchemAnon.jar
https://raw.githubusercontent.com/clarin-eric/cmdi-toolkit/develop/src/main/resources/toolkit/sch/cmd-record-best-practices.schclarin/results/cmdi/The_Language_Archive/xml
2. Note:use the -s optionto savethe SVRLreport
5. Validate the records1. https://github.com/clarin-eric/cmdi-instance-validator/releases2. cmdi-validator results/cmdi/IMS_Repository/3. Note:use the -s optionto use another Schematron file
CLARIN 9
Structural queries
1. Loadyour records/validation reports into BaseX1. basex.org orbrew install basex
2. Create anewdatabaseand importyour records/reports
2. XQuery (w3.org/XML/Query)declare namespace cmd="http://www.clarin.eu/cmd/1";
declare namespace svrl="http://purl.oclc.org/dsdl/svrl";
…- goo.gl/CEZtTm
Notes1. You can use the namespace wildcard(*:element)to dealwith
(many)profilespecific namespaces2. You can use base-uri() to getthe filenameofamatchingrecord3. BaseX hasuseful modules,butalso FunctX (xqueryfunctions.com)4. Aproblem that occurs often might be acandidate for aSchematron
rule
CLARIN 10
Inspect the mapping
1. Identify the profiles you’re using1. https://github.com/clarin-eric/FindProfiles/releases2. java -jar findProfiles.jar -e=xml
clarin/results/cmdi/The_Language_Archive/
2. Inspect the mapping1. https://github.com/clarin-eric/VLO-mapping2. https://cmdi.clarin.eu/mapping/
CLARIN 11
VLO
1. Curation VLO(to be updated)1. https://vlo.minerva.arz.oeaw.ac.at/vlo
2. Request an importinthe beta VLO1. [email protected]
3. Doalocal VLOimport1. https://gitlab.com/CLARIN-ERIC/compose_vlo#run-the-
importer-to-ingest-cmdi-metadata-into-the-vlo
4. Runthe importer onone record1. https://github.com/clarin-eric/VLO/blob/master/vlo-
importer/src/main/java/eu/clarin/cmdi/vlo/importer/MetadataMapper.java• CLASSPATH="vlo-importer-4.2-SNAPSHOT-importer.jar" javaeu.clarin.cmdi.vlo.importer.MetadataMapper -c VloConfig.xml -r test.xml
CLARIN 12
Fixingproblems,butwhere?
• Your records- Typos inyour records- Inconsistencies inyour records
• Consider adopting acommon(CLARIN/CLAVAS)vocabulary- Facetmapping problems
• Can you fixthem inyour profile(s)?• Orprovide feedbackto the MetadataCuration TF([email protected])
- Valuemapping problems• Provide feedbackto the MetadataCuration TF ([email protected])
• Others records- reportthem viathe VLOfeedbackbutton
CLARIN 13
What’s missing?
• OAIViewer- history- include alocal curation run- general log- mailto technical contactwhen number ofharvested recordsdrop
• VLOimporter- reportshowing the mappings applied
• VLO- _componentProfileURI
• profilenamemight not be unique!- Centerfacet
• filterto all recordsfrom one center,possible multipleendpoints- showoriginal value
• More?
CLARIN 14
Questions
• MetadataCuration Taskforces- [email protected]
• CMDITaskforce- [email protected]
• CMDIfirstaid kit- clarin.eu/sites/default/files/CMDI-first-aid-kit.pdf
• MenzoWindhouwer- [email protected]
CLARIN 15