15
Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June 2018 Utrecht, The Netherlands CLARIN 1

Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Metadatacuration:hands-onsession

CMDIandMetadataCurationTaskForces

CLARINCentre&Developersmeeting4-5June2018

Utrecht,TheNetherlands

CLARIN 1

Page 2: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Prerequisites

• Java1.8- https://java.com/en/download/

• Internetconnection

CLARIN 2

Page 3: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Menu

1. Viewyour recordsinthe VLO2. Viewyour harvest (and its log)3. Getyour records

1. From the tarball2. Harvest them

4. Curation module1. Lookatthe website2. Runlocally

5. CMDIbestpractices1. Checkyour profiles2. Checkyour records

6. Structural queries1. Loadyour records/validation reports into BaseX2. Some useful XQueries

7. Inspect the mapping8. VLO9. Fixingproblems,butwhere?10. What’s missing?

CLARIN 3

Page 4: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Viewyour recordsinthe VLO

• Filterthe recordsbased onyour endpoint:- _oaiEndpointURI:

• https://vlo.clarin.eu/search?q=_oaiEndpointURI:https://clarin-pl.eu/oai/request

- Endpoints?centres.clarin.eu/oai_pmh

• Filterthe recordsbased onaprofile:- _componentProfile:

• https://vlo.clarin.eu/search?q=_componentProfile:LINDAT_CLARIN• Note:use the profilenameinstead ofits ID!

CLARIN 4

Page 5: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Viewyour harvest (and its log)

• Not inproduction yet,butlocal preview- will replace https://vlo.clarin.eu/data/

• Paged lists• Filteronendpoints and/orrecords• Seethe logofaharvest

CLARIN 5

Page 6: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Getyour records

1. From the tarball1. https://vlo.clarin.eu/data/resultsets/2. tar xjf clarin.tar.bz2

results/cmdi/DANS_CMDI_Provider3. Note:just clicking the tarball might freeze your Mac!

2. Harvest them1. https://github.com/clarin-eric/oai-harvest-manager/releases2. Editproviderssectionofresources/config-test.xml3. run-harvester.sh workdir=`pwd`

resources/config-test.xml

CLARIN 6

Page 7: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Curation module

1. Lookatthe website1. https://clarin.oeaw.ac.at/curate/

2. Runlocally1. https://github.com/clarin-eric/clarin-curation-module2. curation.jar (goo.gl/Cx4h3N )3. Create your own specific copyofconfig.properties4. java -jar curation.jar -config

config.properties -c -path results/cmdi/ARCHE

CLARIN 7

Page 8: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

CMDIbestpractices

1. https://www.clarin.eu/content/cmdi-best-practice-guide2. Schematron rules (schematron.com)

1. https://github.com/TheLanguageArchive/SchemAnon/releases2. Also supported by oXygen orother XMLeditors3. Also easyto define your own rules

3. Checkyour profiles1. Identify the profiles you’re using

1. https://github.com/clarin-eric/FindProfiles/releases2. java -jar findProfiles.jar -e=xml

clarin/results/cmdi/The_Language_Archive/2. wget -O profile.xml

https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.x/profiles/clarin.eu:cr1:p_1505397653795/xml && java -jar SchemAnon.jarhttps://raw.githubusercontent.com/clarin-eric/cmdi-toolkit/develop/src/main/resources/toolkit/sch/cmd-component-best-practices.sch profile.xml

CLARIN 8

Page 9: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

CMDIbestpractices

4. Checkthe records1. java -jar SchemAnon.jar

https://raw.githubusercontent.com/clarin-eric/cmdi-toolkit/develop/src/main/resources/toolkit/sch/cmd-record-best-practices.schclarin/results/cmdi/The_Language_Archive/xml

2. Note:use the -s optionto savethe SVRLreport

5. Validate the records1. https://github.com/clarin-eric/cmdi-instance-validator/releases2. cmdi-validator results/cmdi/IMS_Repository/3. Note:use the -s optionto use another Schematron file

CLARIN 9

Page 10: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Structural queries

1. Loadyour records/validation reports into BaseX1. basex.org orbrew install basex

2. Create anewdatabaseand importyour records/reports

2. XQuery (w3.org/XML/Query)declare namespace cmd="http://www.clarin.eu/cmd/1";

declare namespace svrl="http://purl.oclc.org/dsdl/svrl";

…- goo.gl/CEZtTm

Notes1. You can use the namespace wildcard(*:element)to dealwith

(many)profilespecific namespaces2. You can use base-uri() to getthe filenameofamatchingrecord3. BaseX hasuseful modules,butalso FunctX (xqueryfunctions.com)4. Aproblem that occurs often might be acandidate for aSchematron

rule

CLARIN 10

Page 11: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Inspect the mapping

1. Identify the profiles you’re using1. https://github.com/clarin-eric/FindProfiles/releases2. java -jar findProfiles.jar -e=xml

clarin/results/cmdi/The_Language_Archive/

2. Inspect the mapping1. https://github.com/clarin-eric/VLO-mapping2. https://cmdi.clarin.eu/mapping/

CLARIN 11

Page 12: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

VLO

1. Curation VLO(to be updated)1. https://vlo.minerva.arz.oeaw.ac.at/vlo

2. Request an importinthe beta VLO1. [email protected]

3. Doalocal VLOimport1. https://gitlab.com/CLARIN-ERIC/compose_vlo#run-the-

importer-to-ingest-cmdi-metadata-into-the-vlo

4. Runthe importer onone record1. https://github.com/clarin-eric/VLO/blob/master/vlo-

importer/src/main/java/eu/clarin/cmdi/vlo/importer/MetadataMapper.java• CLASSPATH="vlo-importer-4.2-SNAPSHOT-importer.jar" javaeu.clarin.cmdi.vlo.importer.MetadataMapper -c VloConfig.xml -r test.xml

CLARIN 12

Page 13: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Fixingproblems,butwhere?

• Your records- Typos inyour records- Inconsistencies inyour records

• Consider adopting acommon(CLARIN/CLAVAS)vocabulary- Facetmapping problems

• Can you fixthem inyour profile(s)?• Orprovide feedbackto the MetadataCuration TF([email protected])

- Valuemapping problems• Provide feedbackto the MetadataCuration TF ([email protected])

• Others records- reportthem viathe VLOfeedbackbutton

CLARIN 13

Page 14: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

What’s missing?

• OAIViewer- history- include alocal curation run- general log- mailto technical contactwhen number ofharvested recordsdrop

• VLOimporter- reportshowing the mappings applied

• VLO- _componentProfileURI

• profilenamemight not be unique!- Centerfacet

• filterto all recordsfrom one center,possible multipleendpoints- showoriginal value

• More?

CLARIN 14

Page 15: Metadata curation: hands-on session - CLARIN · 6/5/2018  · Metadata curation: hands-on session CMDI and Metadata Curation Task Forces CLARIN Centre & Developers meeting 4-5 June

Questions

• MetadataCuration Taskforces- [email protected]

• CMDITaskforce- [email protected]

• CMDIfirstaid kit- clarin.eu/sites/default/files/CMDI-first-aid-kit.pdf

• MenzoWindhouwer- [email protected]

CLARIN 15