Interoperability Standards and Specifications
Report
December 11, 2016
Deliverable Code: D5.2
Version: 1.2
Dissemination level: Public
First version of the interoperability standards and specification report that guides interoperability considerations within and beyond the OpenMinTeD project.
H2020-EINFRA-2014-2015 / H2020-EINFRA-2014-2
Topic: EINFRA-1-2014 Managing, preserving and computing with big research data
Research & Innovation action
Grant Agreement 654021
Document Description
D5.2 – Interoperability Standards and Specifications Report
WP5 – Interoperability Framework
WP participating organisations: UKP-TUDA, ARC, UNIMAN, INRA, OU, USFD, UvA, UoG, GESIS
Contractual Delivery Date: 07/2016
Actual Delivery Date: 12/2016
Nature: Report
Version: 1.2
Public Deliverable
Preparation slip (name, organisation, date)
From: Richard Eckart de Castilho (UKP-TUDA), Mouhamadou Ba (INRA), Penny Labropoulou (ARC), Thomas Margoni (UoG), Giulia Dore (UoG), Wim Peters (USFD), Matthew Shardlow (UNIMAN), Piotr Przybyla (UNIMAN), Jacob Carter (UNIMAN); 28/07/2016
Edited by: Richard Eckart de Castilho (UKP-TUDA); 11/12/2016
Reviewed by: Robert Bossy (INRA), Vangelis Floros (GRNET), John McNaught (UNIMAN); 25/10/2016
Approved by: Natalia Manola (ARC); 11/12/2016
For delivery: Mike Hatzopoulos (ARC); 13/12/2016
Document change record (issue, item, reason for change, author/editor, organisation)
V0.1, Draft version: Initial version sent for comments (Richard Eckart de Castilho, UKP-TUDA)
V0.2, Draft version: Incorporated comments from reviewers (Richard Eckart de Castilho, UKP-TUDA)
V1.0, First version: Finalizing document (Richard Eckart de Castilho, UKP-TUDA)
V1.1, Revised version: Incorporating feedback from the approving instance (Richard Eckart de Castilho, UKP-TUDA)
V1.2, Revised version: Incorporating feedback from the approving instance (Richard Eckart de Castilho, UKP-TUDA)
Table of Contents
1. Introduction
    1.1 Methodology
    1.2 Working groups
2. Summary Reports
    2.1 WG1 – Resource metadata
    2.2 WG2 – Knowledge resources
    2.3 WG3 – IPR and licensing
    2.4 WG4 – Annotation and workflows
    2.5 Publications
    2.6 External experts
3. Scenarios
4. Requirements
    4.1 Requirement structure
    4.2 Requirements overview
5. Compliance
    5.1 Compliance levels
    5.2 Compliance assessments
6. Actions
    6.1 WG1
    6.2 WG2
    6.3 WG3
    6.4 WG4
7. List of attachments
8. Appendix
    8.1 OpenMinTeD component classification (draft)
    8.2 WG1 - Inventory of metadata schemas and related efforts
    8.3 WG3 – Compatibility matrix: summary
Table of Tables
Table 1 - Requirements in status "draft"
Table 2 - Requirements in status "final"
Table 3 – Requirements in status “deprecated”
Table 4 - Assessed products and consulted sources
Table 5 – WG 1 summary of actions to improve compliance
Table 6 – WG 2 summary of actions to improve compliance
Table 7 – WG 3 summary of actions to improve compliance
Table 8 – WG 4 summary of actions to improve compliance
Table 9 - Compatibility Matrix (draft version 1.0): Contents
Table 10 - Compatibility Matrix (draft version 1.0): Software
Table 11 - Compatibility Matrix (draft version 1.0): Terms of Service
Table 12 - Compatibility Matrix (draft version 2.0): Contents
Table 13 - Compatibility Matrix (draft version 2.0): Software
Table 14 – Compatibility Matrix (draft version 2.0): Terms of Service
Disclaimer
This document contains description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules so, prior to using its content please contact the consortium head for approval.
In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.
The authors of this document have taken any available measure in order for its content to be accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document hold any sort of responsibility that might occur as a result of using its content.
This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.
The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the member states cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors. (http://europa.eu.int/)
OpenMinTeD is a project funded by the European Union (Grant Agreement No 654021).
Acronyms
ARC: Athena Research Center; see ILSP
CC: Creative Commons (https://creativecommons.org)
CCR: CLARIN Concept Registry (https://www.clarin.eu/ccr)
CLARIN: Common Language Resources and Technology Infrastructure (https://www.clarin.eu)
CM: Compatibility Matrix
D(number): (Project) deliverable
ELRA: European Language Resources Association (http://www.elra.info)
FOSS: Free and open-source software (https://en.wikipedia.org/wiki/Free_and_open-source_software)
INRA: French National Institute for Agricultural Research
ILSP: Institute for Language and Speech Processing (ILSP/"Athena" R.C.) aka ARC, Greece
JATS: Journal Article Tag Suite (https://jats.nlm.nih.gov)
KR: Knowledge resource
NACTEM: National Centre for Text Mining, University of Manchester, UK
NIST: National Institute of Standards and Technology, USA (https://www.nist.gov)
NLP: Natural Language Processing
M(number): Month counting from project start
MS(number): (Project) milestone
ODRL: Open Digital Rights Language (https://www.w3.org/community/odrl/)
OLIA: Ontologies of Linguistic Annotation (http://www.acoli.informatik.uni-frankfurt.de/resources/olia/)
OWL: Web Ontology Language (https://en.wikipedia.org/wiki/Web_Ontology_Language)
LAPPS Grid: Language Application Grid (http://www.lappsgrid.org)
LDC: Linguistic Data Consortium (https://www.ldc.upenn.edu)
LR: Language Resource
LT: Language Technology
RDF: Resource Description Framework (https://en.wikipedia.org/wiki/Resource_Description_Framework)
SKOS: Simple Knowledge Organisation System (https://en.wikipedia.org/wiki/Simple_Knowledge_Organisation_System)
TDM: Text and Data Mining
TheSOZ: Thesaurus for the Social Sciences (http://lod.gesis.org/thesoz/de.html)
UIMA: Unstructured Information Management Architecture; usually referring to the reference implementation Apache UIMA (https://uima.apache.org)
UNIMAN: University of Manchester, UK
UKP-TUDA: Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt, Germany
UoG: University of Glasgow, UK
USFD: University of Sheffield, UK
WG: Working group
WP: Work package
XSD: XML Schema Definition (https://en.wikipedia.org/wiki/XML_Schema_(W3C))
Publishable Summary
The goal of the Interoperability Standards and Specifications report is to assess and improve interoperability between relevant products from the TDM and NLP domains, in particular those involved and associated with the OpenMinTeD project. The process underlying the document is designed to closely involve internal and external stakeholders in the definition of requirements necessary to achieve better interoperability, with the aim also of committing these stakeholders to actually perform the necessary adjustments to their respective systems.
This document is the first in a series of three. It will be updated in M20 (D5.3) and M26 (D5.4). This report focusses on presenting a high-level overview of the progress achieved within the reporting period and on actions planned for the next period. The actual work documents released during the reporting period are provided as attachments to this deliverable.
1. Introduction
In this section, we briefly revisit the methodology by which WP5.2 operates and the constitution of the interoperability working groups.

1.1 Methodology
In the milestone document MS5 “Working groups external experts list and work methodology”, we outlined the methodology for building the interoperability specification. Since MS5 is presently only available on the project-internal wiki, we repeat the key concepts here:
1. Scenario definition – The WGs have created a set of 17 interoperability scenarios that highlight different aspects of interoperability from the perspective of the individual WGs (Section 3).
2. Analysis – The discussion of these scenarios together with external experts was the focus of the first OpenMinTeD Interoperability Workshop (cf. MS10 “Working groups interim meeting report I”). Furthermore, the scenarios have been analysed in the WGs to generate the requirements presented in Section 4.2.
3. Prototype – Several efforts are being undertaken to assess the feasibility and effort of implementing the proposed requirements and to provide further insight for their refinement. These efforts are listed in the summary reports (Section 2) of the respective WGs in the subsection “Progress in current period”.
4. Evaluation – In order to evaluate the proposed requirements, we identified relevant products (e.g. TDM frameworks of the involved project partners) and assessed their compliance with our requirements. Based on the evaluation, we generated a set of actions designed to improve compliance with the interoperability requirements. These actions are meant to serve as a roadmap for the next reporting period. More details on the compliance assessment are provided in Section 5.2.
5. Specification – The requirements specification is a living document, continually updated as requirements are created, refined, and deprecated. Section 4.1 provides additional information about the requirement lifecycle.
The process was designed to ensure the participation of all stakeholders. It also pays attention to tightly involving those stakeholders that may later need to adjust their products in order to become compliant with our requirements.
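The requirement lifecycle mentioned in step 5 can be pictured with a minimal sketch. The three statuses ("draft", "final", "deprecated") are the ones used in the requirement tables of this report; the allowed transitions, the class and field names and the identifiers are assumptions made purely for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    FINAL = "final"
    DEPRECATED = "deprecated"


# Assumed transitions: the report names the statuses but not the allowed moves.
ALLOWED = {
    Status.DRAFT: {Status.FINAL, Status.DEPRECATED},
    Status.FINAL: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


@dataclass
class Requirement:
    identifier: str      # hypothetical identifier, e.g. "REQ-1"
    working_group: str   # e.g. "WG1"
    summary: str
    status: Status = Status.DRAFT

    def move_to(self, new_status: Status) -> None:
        """Change the status, rejecting moves outside the assumed lifecycle."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status


def by_status(requirements):
    """Group requirement identifiers by status, one group per status."""
    groups = {status: [] for status in Status}
    for requirement in requirements:
        groups[requirement.status].append(requirement.identifier)
    return groups
```

Grouping by status mirrors the way the requirements overview presents one table per status; a real implementation would follow the lifecycle rules of Section 4.1.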
1.2 Working groups
Four working groups (WG) consisting of project members and external experts are contributing to the OpenMinTeD Interoperability Standards and Specification series of deliverables. These WGs are:
• WG1 – Resource metadata
• WG2 – Knowledge resources
• WG3 – IPR and licensing
• WG4 – Annotation and workflows
2. Summary Reports
In this section, we provide short summary reports for each of the interoperability working groups covering the following aspects:
• Mission statement – short updated summary of the working group’s mission statement
• Mode of operation – each of the working groups opted for slightly different modes of operations due to the heterogeneous scientific backgrounds and working habits
• Progress in current period – short summary of the progress and achievements from the current reporting period
• Tasks planned for next period – summary roadmap of the tasks planned for the next reporting period
• State of operations – short self-assessment of the current state of operations

The interoperability working groups are:
• WG1 – Resource metadata
• WG2 – Knowledge resources
• WG3 – IPR and licensing
• WG4 – Annotation and workflows
2.1 WG1 – Resource metadata
2.1.1 Mission statement
The focus of WG1 lies on the metadata required for describing resources targeted by the OpenMinTeD project in order to ensure their discoverability and achieve interoperability between them. TDM involves a wide range of resource types: the resources to be mined (scholarly publications in the project), the text mining/language processing software per se and ancillary knowledge resources used for its operation (e.g. annotation schemas, linguistic tagsets, ontological resources used for annotating the resources to be mined, annotated textual corpora).
To describe these resources a core set of metadata elements can be used to capture their common properties (e.g. administrative information, such as contact details and identification data), while various sets of elements encode the particular properties they display (e.g. size and format for content resources vs. input specifications for components). Since processing activities involve the interaction of these resources, a subset of the resources' properties need to be described with the same vocabulary (e.g. the language of the contents of a publication and the language a tool or service can process, or the domain of a thesaurus and the domain of a publication). The definition and harmonisation of these metadata elements is the main objective of WG1. This endeavour is further hampered by the fact that these resources are the object of work for experts coming from different disciplines, with different theoretical backgrounds and conceptualisation of their work, often using different terms for the same or similar concepts. The clarification of these concepts and their semantic mapping as a means to establish a "common" vocabulary poses a challenge for WG1.
Interoperability for WG1 is, therefore, sought at two levels:
• per resource type - i.e. mapping metadata elements used by different schemas to describe the same property,
• across resource types - i.e. ensuring that the same metadata elements are used to describe their intersecting features.
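The two levels can be illustrated with a small sketch. It assumes two made-up schemas (one for publications, one for software components) whose element names are mapped onto a shared vocabulary; all schema names, element names and values below are hypothetical and do not reflect the OpenMinTeD Reference Metadata Schema.

```python
# Hypothetical mappings: schema-specific element name -> shared element name.
MAPPINGS = {
    "publication_schema": {"lang": "language", "subject": "domain", "title": "name"},
    "component_schema": {
        "inputLanguage": "language",
        "applicationDomain": "domain",
        "label": "name",
    },
}


def harmonise(record, schema):
    """Rename the elements of a record to the shared vocabulary.

    Returns the harmonised record and the elements that have no mapping yet,
    so that gaps in the mapping stay visible instead of being dropped.
    """
    mapping = MAPPINGS[schema]
    harmonised, unmapped = {}, {}
    for element, value in record.items():
        if element in mapping:
            harmonised[mapping[element]] = value
        else:
            unmapped[element] = value
    return harmonised, unmapped


def same_language(publication, component):
    """Compare an intersecting feature once both records share one vocabulary."""
    return publication["language"] == component["language"]


publication, _ = harmonise(
    {"title": "Example article", "lang": "en", "subject": "agriculture"},
    "publication_schema",
)
component, leftover = harmonise(
    {"label": "Example tagger", "inputLanguage": "en", "threads": 4},
    "component_schema",
)
```

Once both records use the shared element name `language`, checking whether a component can process a publication becomes a plain comparison; elements without a mapping (here `threads`) are returned separately for later review.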
2.1.2 Mode of operation
WG1 brings together experts from the different communities involved in the project, combining expertise in various fields: publishers, aggregators of scholarly publications, infrastructure specialists, developers of language processing and/or text mining services, experts in the creation and/or representation of language and knowledge resources, metadata specialists, legal experts etc.
The group holds regular teleconference calls to which internal and external experts are invited; depending on the topic of the discussion, the attendance varies, with a nucleus of the experts always present and a further set joining when the discussion relates to their particular expertise. Representatives from the other OpenMinTeD WPs (e.g. on use cases) also attend the meetings when in line with their interests. Additional meetings outside the regular schedule, dedicated to specific issues (e.g. metadata schemas of publications), have also been held. In addition, close collaboration is sought with the other three working groups to ensure that their requirements as regards metadata encoding are properly met; attendance of their teleconference calls and their working documents provide the appropriate input.
The group has created an inventory of the metadata schemas and related efforts (taking into account the Deliverable D5.1 – Interoperability Landscaping Report) that are of most interest for the WG1 objectives - cf. Section 8.2. The discussions of the group have focused on the contents of these schemas and on the interoperability requirements that were extracted from WP5.2 scenarios (cf. Section 4). Finally, a document in the form of a working report is collectively drafted.
2.1.3 Progress in current period
WG1 has produced the following:
• a set of 21 requirements for interoperability (https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-spec.html#WG1) of the metadata descriptions between the various resource types. 8 additional requirements were generated but left for reconsideration for the next reporting period;
• a selection of resources, that will be directly involved in the project given that they belong to the partners or identified by them as standard for our purposes, has been assessed for compliance to the requirements:
o OpenAIRE (https://www.openaire.eu), CORE (https://core.ac.uk) and Frontiers schemas for describing publications;
o TheSOZ (http://lod.gesis.org/thesoz/en.html), AGROVOC (http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus), JATS (https://jats.nlm.nih.gov), OLIA (http://www.acoli.informatik.uni-frankfurt.de/resources/olia/) and LAPPS Grid (http://www.lappsgrid.org) as knowledge resources;
o a set of standard licences (e.g. CC, FOSS licences);
o AlvisNLP (http://www.quaero.org/module_technologique/alvis-nlp-alvis-natural-language-processing/), Argo (http://argo.nactem.ac.uk), DKPro Core (https://dkpro.github.io/dkpro-core/) and ILSP software components (http://nlp.ilsp.gr/ws/);
• an inventory of metadata schemas, vocabularies and ontologies used for the description of these resource types (cf. Section 8.2);
• discussions taking as a base the main metadata schemas used by the consortium partners, i.e. META-SHARE for corpora, knowledge resources and software components, and OpenAIRE and CORE for publications; these, together with the WG1 interoperability requirements, the description of components performed by WG4 and the requirements of WG2, have been used as the basis for the Reference Metadata Schema of OpenMinTeD;
• a proposal for the classification of software components used in TDM processes (Section 8.1);
• the first version of the OpenMinTeD Reference Metadata Schema in XSD (https://openminted.github.io/openminted-site/releases/omtd-share/1.0.0/html/index.html); the schema covers in a harmonized way all the resource types of OpenMinTeD and caters for their satellite entities; the schema is currently under review by the members of all WGs and planned to be used for the first version of the registry;
• Publication (jointly with WG3): Legal Interoperability Issues in the Framework of the OpenMinTeD Project: A Methodological Overview (for details on the publication and a link, see Section 2.5)

2.1.4 Tasks planned for next period
WG1 will continue to work along the action lines that it has already initiated:
• finalise the interoperability requirements: a set of requirements has already been identified but the discussion showed that they are not yet mature enough to be formulated with unanimous consent; we also expect that analysis of other sources (e.g. WP4 requirements, use of the schema in the registry) will generate more requirements;
• update the inventory of schemas and vocabularies, as required;
• continue the compliance assessments and recommend ways of improving metadata descriptions based on their outcomes;
• evaluate the reference metadata schema by creating descriptions from scratch or by mappings from the currently used schemas for the resources of the consortium to populate the registry and improve it accordingly;
• document the schema in a user-friendly way and formulate recommendations and guidelines that will promote interoperability as identified in the requirements;
• work on mappings and walkthroughs with the most popular metadata schemas, according to the principles set by WG2.

2.1.5 State of operations
WG1 has established a very good level of operation and is on schedule.
2.2 WG2 – Knowledge resources
2.2.1 Mission statement
Working Group 2 targets the interoperability of knowledge resources. Knowledge is specific information that is relevant for the linguistic and conceptual interpretation of text and the content exchange between TDM modules. This information is either exploited or produced by TDM modules and tools. The definition encompasses a variety of resource subtypes:
• Language resources such as annotated text corpora;
• Ancillary resources for conceptual/linguistic interpretation within the TDM workflow such as lexicons, term banks, ontologies, thesauri and dictionaries.
• Processing resources that produce knowledge, such as text mining tools/services that provide part-of-speech tags, dependency relations, vocabulary lookup and statistical classification.
WG2 aims to:
• Tackle semantic interoperability issues when integrating knowledge resources (linguistic, terminological and ontological resources) with TDM workflows. These issues arise a) when the same domain concept is defined in different ways in different knowledge resources, b) when the concept is modelled differently within a TDM component and a related knowledge resource. One example would be two ontologies on chemical compounds specifying these compounds at different levels of detail and using different controlled vocabularies. Another example would be that the part-of-speech tag necessary to disambiguate a word during a dictionary lookup comes from a different tagset in the dictionary than the one produced by an automatic part-of-speech tagger.
• Define a specification for the representation of knowledge resource types such as lexicons, terminological sources, thesauri, ontologies, annotated corpora and tool outputs.
• Establish interoperability across different resources/tools for the purpose of their exploitation by text and data mining (TDM) applications.
The focus of this group is on ensuring the discoverability, interoperability and consistency of linguistic, terminological and ontological content at the granular representation level of individual knowledge elements. This knowledge is either contained within resources or produced by language processing and text mining tools. Its interoperability will foster common understanding, data sharing and reuse.
For this purpose, the group will establish a network of (de facto) standard reference vocabularies for the representation and linking of information elements required for interoperable text consumption and processing.
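The part-of-speech example from the aims above can be sketched as follows, assuming that both the tagger's tagset and the dictionary's tagset have been linked to a common reference vocabulary. The link triples loosely imitate a SKOS-style "exactMatch" relation; every identifier, tag and dictionary entry is invented for illustration and does not come from OLIA or any other real vocabulary.

```python
# Hypothetical links: (local concept, relation, reference concept).
LINKS = [
    ("tagger:NN", "exactMatch", "ref:Noun"),
    ("tagger:VB", "exactMatch", "ref:Verb"),
    ("dict:n", "exactMatch", "ref:Noun"),
    ("dict:v", "exactMatch", "ref:Verb"),
]

# Hypothetical dictionary keyed by (word form, dictionary tag).
DICTIONARY = {
    ("saw", "dict:n"): "a cutting tool",
    ("saw", "dict:v"): "past tense of 'see'",
}


def to_reference(concept):
    """Return the reference concept linked to a local concept, or None."""
    for source, relation, target in LINKS:
        if source == concept and relation == "exactMatch":
            return target
    return None


def translate(concept, target_prefix):
    """Translate a concept into another vocabulary via the reference network."""
    reference = to_reference(concept)
    if reference is None:
        return None
    for source, relation, target in LINKS:
        if (
            target == reference
            and relation == "exactMatch"
            and source.startswith(target_prefix + ":")
        ):
            return source
    return None


def lookup(word, tagger_tag):
    """Disambiguate a dictionary lookup using the tag produced by the tagger."""
    dictionary_tag = translate(tagger_tag, "dict")
    return DICTIONARY.get((word, dictionary_tag))
```

Because each vocabulary is linked once to the reference concepts rather than pairwise to every other vocabulary, adding a further tagset only requires links to the reference network.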
2.2.2 Mode of operation
The group consists of a number of internal experts, a subset of whom attends regular monthly teleconferences. The work that is discussed and distributed over the participating partners is done by means of the collaborative drafting of a document whose structure reflects that of the present report and its future iterations. We also use collaboratively maintained spreadsheets for the collection of knowledge resource schemas, linking information, and requirement formulation. External experts are mostly consulted on a personal basis, and targeted for their particular expertise.
2.2.3 Progress in current period
WG2 has worked on the interoperability of Knowledge Resources (KRs): resources containing, producing or representing knowledge. A first set of schemas has been selected, which can be divided up into several subtypes:
• Schemas associated with OpenMinTeD contributed by OpenMinTeD partners (i.e. UIMA-based platforms, ALVIS, and GATE)
• Strategic, i.e. widely used and interconnected (de facto) standard vocabularies for linguistic/terminological/ontological metadata
• Representative set of use case driven schemas
• Publication: Tackling Resource Interoperability: Principles, Strategies and Models (for details on the publication and a link, see Section 2.5)
Interoperability between these schemas is in the process of being defined in terms of a number of linking relations based on OWL (https://www.w3.org/OWL/), RDF (https://www.w3.org/RDF/) and SKOS (http://www.w3.org/2004/02/skos/) relations.
Once completed, the linked vocabularies form a reference network of linguistic, terminological and ontological conceptual elements that forms the core vocabulary for TDM information exchange.
Requirements for KR operationalisation within OpenMinTeD have been formulated. Eight of the KRs listed above have been checked with respect to their compliance with the requirements identified so far (cf. Section 5.2).

2.2.4 Tasks planned for next period
• Selection of a standard for annotation interoperability at the level of end-to-end system output
• The further selection of particular candidate standards for the provision of a core set of data category elements for data integration and exchange.
• The continuing creation of links between the elements of these vocabularies
• Requirement extension/adjustment
• Experimentation with the operationalisation of the schema network
• Incremental and collaborative creation of a draft specification report.

2.2.5 State of operations
The group operates efficiently and produces expected outputs according to the plan. The selection of use case driven resource schemas was performed on the basis of the WG2 member expertise. Alignment with other work packages, specifically WP4, has been established.

2.3 WG3 – IPR and licensing

2.3.1 Mission statement
The goal of WG3 "IPR and licensing" is to study and identify copyright and related rights (e.g. sui generis database right) restrictions and exceptions to the use and reuse of sources (both textual sources and text-mining services) in TDM activities. On this basis the WG will also identify contractual tools and schemes (e.g. licences) that can best serve the needs of TDM services. In particular, it will examine which exceptions are currently available (e.g. the newly implemented TDM exception in the UK), which are upcoming and whether the current/proposed solutions embrace all the needs of the scientific and academic sector (e.g. is the non-commercial limitation necessary?).
The working group also focuses on the issue of legal compatibility and interoperability of licenses, aiming to determine whether multiple licenses that apply to different components can be deemed compatible and legally interoperable, particularly when there is the need to assess if the result (combination of components under different licensing terms) can be redistributed or not.
Additionally, open licensing models for both the scientific related textual sources and the text-mining services will be explored and evaluated, by means of specific tools such as graphical representations of licenses compatibility (to be identified as compatibility matrix) and workflows that will guide the end users to choose the best applicable license and determine what licensing restrictions or rights statements limitations, if any, apply to specific uses.
2.3.2 Mode of operation
WG3, while focusing on legal interoperability issues, brings together experts from a variety of fields. These include: legal studies, publishers, technical experts (computer scientists, metadata experts, etc.), policy makers, academics, and representatives of different communities, groups and initiatives internationally active in the field.
WG3 regularly organises conference calls with external experts (once a month), internal experts (once a month) and dedicated conference calls with WG1 plus selected experts to discuss the specific issue of licence/right statements and metadata representation (again once a month), for a total of three conference calls a month. Agenda items, minutes and summaries of all conference calls are maintained on the dedicated website.
WG3 maintains an updated list of working documents which include an inventory of licences and terms of use submitted by all the consortium members, which form the basis for another document dedicated to the compatibility of the licences and terms of use. A detailed bibliography of scholarly publications and policy documents is also maintained. Additionally, a glossary collecting the most recurring legal concepts with a brief explanation is likewise available.
2.3.3 Progress in current period
The group has two main goals: favouring licence compatibility and clarifying the legal landscape in the field of TDM. Both are pursued with a view to the needs of TDM researchers, which implies the need to develop documents and tools that can be readily used by laymen.
The following items summarise the accomplishments achieved so far:
• Licence-related interoperability requirements (https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-spec.html#WG3)
• Licence Compatibility matrix (see Section 8.3) - a schematic representation that represents (a) the type of data (contents, software and terms of services) and (b) the type of licences and/or rights statements, to determine whether or not there is compatibility between resources under respective licences. This matrix aims at facilitating the choice for users for the best licence to use and share/distribute resources and particular TDM workflow results which may have been generated from multiple sources under heterogeneous licences. When multiple licences could be applied, it also displays in brief what could be the legal implications of choosing one or the other licence.
• Legal metadata and rights statement (this document is still work in progress and not included with the present deliverable) - a document drafted in collaboration with WG1 that illustrates the roadmap for addressing the inference of legal metadata elements and rights statements for the purpose of TDM activities. This roadmap comprises the following actions: (a) identifying applicable rights statements and building an OpenMinTeD inventory; (b) categorising these findings and comparing them with similar inventories (e.g. Europeana, OpenAIRE and CORE, but also CLARIN and META-SHARE); (c) identifying a common vocabulary; and (d) considering their machine-readability.
• Licences <-> Rights Statements (this document is still work in progress and not included with the present deliverable) - a synthetic representation of the conditions of licences and rights statements to help users understand what a given licence or set of licences allows them to do – including what they are required to do to properly perform their activities under those licensing terms – and what limitations or restrictions may apply to the use they wish to make of the resource.
• Publication (jointly with WG1): Legal Interoperability Issues in the Framework of the OpenMinTeD Project: A Methodological Overview (for details on the publication and a link, see Section 2.5)
• Publication: Why We Need a Text and Data Mining Exception (But it is Not Enough) (conference extended abstract) (for details on the publication and a link, see Section 2.5)
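The idea behind the licence compatibility matrix listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the licence labels are placeholders, the verdicts are invented and carry no legal meaning, and the matrix is simplified to a symmetric pairwise lookup, which the actual matrix in Section 8.3 need not be.

```python
from itertools import combinations

# Hypothetical verdicts for placeholder licences; these are NOT legal statements.
# Each key is an unordered pair of licence labels.
COMPATIBILITY = {
    frozenset({"LICENCE-A", "LICENCE-B"}): True,
    frozenset({"LICENCE-A", "LICENCE-C"}): True,
    frozenset({"LICENCE-B", "LICENCE-C"}): False,
}


def compatible(first, second):
    """Look up one cell of the matrix; unknown pairs count as incompatible."""
    if first == second:
        return True  # assumption: a licence is compatible with itself
    return COMPATIBILITY.get(frozenset({first, second}), False)


def conflicts(source_licences):
    """Return every incompatible pair among the licences of a workflow's sources."""
    unique = sorted(set(source_licences))
    return [pair for pair in combinations(unique, 2) if not compatible(*pair)]
```

A TDM workflow result built from sources under LICENCE-A, LICENCE-B and LICENCE-C would be flagged here because of the single conflicting pair; treating unknown pairs as incompatible is a deliberately conservative choice of this sketch.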
2.3.4 Tasks planned for next period
The main question to be addressed together with the other WGs regards the “granularity” of the representation of legal information, in other words, whether the legal rules and the connected metadata should be represented at the licence level or deconstructed further to the level of rights statements. There is an ongoing discussion with internal and external experts (including the representatives of international projects active in this field) about the desirability and feasibility of a “rights statement” implementation. On the basis of the outcome of this analysis, WG3 will implement the connected licence or rights statement compatibility table both at the “horizontal” level, as already available in draft version in the listed documents, and at the “multi-layer” level explained in the cited papers. The extended abstract about the TDM exception will be developed into a full paper. The glossary will be expanded accordingly. Many of the developed resources (case scenarios, bibliography, glossary, etc.) will form the basis for additional training material as requested by other WPs (e.g. FAQs).
2.3.5 State of operations
The work of WG3 is on schedule. The discussion about “granularity” is proving to be much more complex than originally thought, but this has not caused major delays. In the event that the discussion does not reach an acceptable solution within a reasonable time, a risk reduction plan has already been considered: the originally intended licence compatibility tools will be developed in parallel to the discussion regarding rights statements. Given the modularity of the compatibility table and the multi-layer approach, an eventual implementation of a rights statement compatibility table within the existing licence compatibility table can be easily achieved in legal, technical and scientific terms.
2.4 WG4 – Annotation and workflows
2.4.1 Mission statement
This working group studies interoperability aspects of text annotation and workflows. These include supported input/output formats, annotation encoding models, workflow architectures, service access modes, type system alignment and others. The interface between workflow management systems and components is a key interoperability issue, as it covers how component functionality is packaged, what metadata are included and how they are interpreted by a system, but it also relates to what type of information is processed and how it is represented, serialised as input/output files or transmitted.
2.4.2 Mode of operation
The group's activities are based on the expertise of its members, who represent institutions developing some of the leading text mining frameworks: University of Manchester (ARGO, U-Compare), University of Sheffield (GATE, AnnoMarket), University of Darmstadt (DKPro Core), French National Institute for Agricultural Research (Alvis) and Athena Research and Innovation Center (ILSP). The group meets in regular conference calls every two weeks and, if necessary, consults external experts representing other major TM centres. A cycle of technical presentations on workflow systems has also been initiated, starting with a description and discussion of distributed execution in Argo.
2.4.3 Progress in current period
So far the group has produced the following resources:
• A set of 33 requirements for components¹ to assure workflow interoperability. They have been created based on the experiences of group members and an analysis of interoperability scenarios.
• An alignment of 6 type systems² used in existing platforms (Alvis, Argo, DKPro Core, GATE, ILSP, LAPPS Grid). The alignment maps equivalent types and features, which shows concepts and approaches that are consistent or overlapping. It also helps to identify differences and to see whether they come from a different focus (e.g. concentrating on biomedical concepts missing from other systems) or from different conventions of data representation.
• An initial directory of 556 components³ currently available in the libraries of the considered workflow systems, including their short descriptions, automatically assigned categories, parameters and machine-readable descriptors in META-SHARE format. The directory is created through an automatic process that aggregates metadata from multiple sources. The aggregation process is work in progress and is continually being improved.
• Initial work on a prototype solution that allows building workflows from components coming from different platforms (initially DKPro Core (UIMA) and GATE, also looking into Alvis, Argo, ILSP, LAPPS Grid) in the form of the domain-specific programming language “OpenMinTeD Script”.⁴ This prototype serves as a sandbox to investigate interoperability issues in terms of component lifecycle, deployment, and data transformation. In particular, it allows us to generate discussions and insights on these topics independently of the OpenMinTeD Workflow service, which will be delivered later in the project. In fact, we expect that lessons learned from OpenMinTeD Script will have an impact on the design of the OpenMinTeD Workflow service; parts of OpenMinTeD Script, e.g. the data transformation functionality, may even evolve to be integrated into the workflow service.
¹ https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-spec.html#WG4
² https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/typealignment.html
³ https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/components.html
⁴ https://github.com/openminted/openminted-script
• Publication: Interoperability of corpus processing work-flow engines: the case of AlvisNLP/ML in OpenMinTeD (for details on the publication and a link, see Section 2.5)
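The type system alignment listed above can be pictured as a table with one row per shared concept and one column per platform. The following is a minimal sketch under stated assumptions: the platform names and type names are hypothetical placeholders, not entries from the actual alignment, and feature-level mappings are left out.

```python
# Hypothetical alignment rows: concept -> {platform: type name}.
# A missing platform key means that platform has no equivalent type.
ALIGNMENT = {
    "token": {"PlatformA": "a.Token", "PlatformB": "Token"},
    "sentence": {"PlatformA": "a.Sentence", "PlatformB": "Sentence"},
    "gene": {"PlatformA": "a.bio.Gene"},  # e.g. a domain-specific concept
}


def translate(type_name, source, target):
    """Return the equivalent type on the target platform, or None if there is none."""
    for row in ALIGNMENT.values():
        if row.get(source) == type_name:
            return row.get(target)
    return None


def gaps(platform, other):
    """List the concepts that `platform` models but `other` does not."""
    return sorted(
        concept
        for concept, row in ALIGNMENT.items()
        if platform in row and other not in row
    )
```

Rows filled for both platforms correspond to the consistent or overlapping concepts mentioned above, while the output of `gaps` corresponds to differences that stem from a different focus.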
2.4.4 Tasks planned for next period
The following tasks are necessary to improve the requirements set:
• Finalising all requirements: some of the proposed requirements have sparked off a discussion that has not yet been concluded by unanimous agreement. These requirements have received ‘draft’ status and now need to be further discussed and finalised.
• Creating concrete requirements: so far nearly all of the created requirements are ‘abstract’, i.e. they describe some desired functionality (e.g. components should be described by machine-readable metadata), but without technical details (e.g. a format of the metadata). For each abstract requirement, at least one concrete counterpart should be created in the next period. The process of creating the concrete requirements will also inform the interoperability guideline deliverables (D5.5 and D5.6).
2.4.5 State of operations
The group operates efficiently and produces the expected outputs according to plan.
2.5 Publications
This section lists peer-reviewed publications relevant to this deliverable from project partners within the reporting period. All the publications are available online¹ as open access under the CC-BY-NC licence².
• P. Labropoulou, S. Piperidis and T. Margoni, 2016. Legal Interoperability Issues in the Framework of the OpenMinTeD Project: a Methodological Overview. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 60-63, Portorož, Slovenia, DOI 10.5281/zenodo.182497
• T. Margoni and G. Dore, 2016. Why We Need a Text and Data Mining Exception (but it is not enough) (Extended Abstract). In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 57-59, Portorož, Slovenia
• W. Peters, 2016. Tackling Resource Interoperability: Principles, Strategies and Models. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 34-37, Portorož, Slovenia
¹ http://interop2016.github.io//Program and http://www.lrec-conf.org/proceedings/lrec2016/index.html
² http://lrec2016.lrec-conf.org/en/submission/authors-kit/
• M. Ba and R. Bossy, 2016. Interoperability of corpus processing work-flow engines: the case of AlvisNLP/ML in OpenMinTeD. In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 15-18, Portorož, Slovenia
• P. Knoth and N. Pontika, 2016. Aggregating Research Papers from Publishers' Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not? In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 1-4, Portorož, Slovenia, DOI 10.5281/zenodo.194788
• R. Eckart de Castilho, 2016. Interoperability = f(community, division of labour). In Proceedings of the Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability (INTEROP 2016) at LREC 2016, p. 24-28, Portorož, Slovenia, DOI 10.5281/zenodo.161848
2.6 External Experts
The following persons act as external experts on one or more of the interoperability working groups.
Name | Affiliation | WG1 WG2 WG3 WG4
Andreas Kempf | Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Germany | X
Christian Chiarcos | Goethe-Universität Frankfurt am Main, Germany | X
Christopher Cieri | LDC, USA | X
Daan Broeder | MPI for Psycholinguistics, Netherlands | X
Diane Peters | Creative Commons HQ | X
Dominique Estival | Western Sydney University, Australia | X X
Enrique Alonso | Consejo de Estado | X
Eric Nyberg | Carnegie Mellon University, USA | X
Federico Morando | Nexa Center for Internet & Society, Italy | X
Geoffrey Bilder | Crossref | X X
Giulia Ajmone Marsan | The Organisation for Economic Co-operation and Development (OECD) | X
Gwen Franck | Creative Commons, EIFL | X
Ineke Schuurman | CCL, University of Leuven | X
Jin-Dong Kim | Database Center for Life Science, Research Organisation of Information and Systems | X
Jochen Schirrwagen | Universität Bielefeld, Germany | X
John McCrae | National University of Ireland, Galway, Ireland | X
Keith Suderman | Vassar College, USA (LAPPS Grid) | X X X
Kristofer Erickson | CREATe | X
Lars Bjørnshauge | SPARC Europe | X
Liam Earney | JISC, UK | X
Lukasz Bolikowski | University of Warsaw, Poland | X X
Maarten van Gompel | Radboud University Nijmegen, NL | X
Maarten Zeinstra | Kennisland, NL | X
Marc Verhagen | Brandeis University, USA (LAPPS Grid) | X
Mark Perry | University of New England, Australia | X
Maurizio Borghi | Bournemouth University, UK | X
Menzo Windhouwer | MPI for Psycholinguistics, Netherlands | X
Nancy Ide | Vassar College, USA (LAPPS Grid) | X X
Paul Keller | Kennisland, NL | X
Paul Uhlir | National Academy of Sciences | X
Pawel Kamocki | Institut für Deutsche Sprache, Germany | X
Peter Suber | Berkman Klein Centre, Harvard University | X
Piek Vossen | VU University Amsterdam, Netherlands | X
Prodromos Tsiavos | The Media Institute | X
Rafal Rak | UberResearch, UK | X
Steve Cassidy | Macquarie University Sydney, Australia | X X
Thilo Götz | IBM, Germany | X
3. Scenarios
In preparation for generating interoperability requirements (Section 4), the WGs prepared a set of 17 scenarios. These scenarios highlight particular aspects of interoperability from the perspective of the respective WG. They were identified and described by the participants of the respective working groups through introspection and subsequently refined in a collaborative process involving external experts, cross-WG communication, as well as communication with WP4. In particular, the first OpenMinTeD Interoperability Workshop, held on Nov 12, 2015 in The Hague, NL, revolved around the interoperability scenarios and focussed on deriving a first seed set of interoperability requirements from them, which was later elaborated.
WG1
• Scenario 1: Discover resources of various types at various locations
• Scenario 2: SME running research analytics for funders within the European Research Area
• Scenario 3: A content provider using text mining tools to enrich their content
• Scenario 4: Provide comprehensive statistical metadata for resources
• Scenario 5: Domain-specific researcher using a text mining tool or service to promote their research or use applied research results within their setting
WG2
• Scenario 1: Combining heterogeneous resources for information extraction
• Scenario 2: Including custom knowledge
• Scenario 3: The relation between documents and knowledge bases through keywords
WG3
• Scenario 1: Legal status of aggregations: focus on content
• Scenario 2: Focus on TDM tools and TDM services
• Scenario 3: The type and nature of TDM results (or: How far do copyright and SGDR reach?)
WG4
• Scenario 1: Transferability of components between ecosystems
• Scenario 2: Comparison of competing components or parameters
• Scenario 3: Non-expert provider of TDM resource
• Scenario 4: Reproducibility of TDM-related research
• Scenario 5: Integration of a TDM workflow in a service/embedding in an application
• Scenario 6: Development of TDM resources
A full description of the scenarios is omitted from this document. The descriptions are provided as an attachment as well as in our public GitHub repository.¹
¹ https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-scenarios.html
4. Requirements
This section outlines the structure of requirements and presents the requirements generated so far. It provides an overview of the requirement structure (Section 4.1) and a high-level overview of the actual requirements (Section 4.2).
4.1 Requirement Structure
ID - Every requirement has an ID. We start counting from 1, and every new requirement increments the ID by 1. The ID is encoded in the requirement file name, e.g. 1.adoc.
Concreteness - The OpenMinTeD infrastructure aims to be open, sustainable, and able to cope with change in the community and in technology. As such, it needs to be able to support multiple popular technologies and standards. As popularity changes over time and as new standards and technologies evolve, OpenMinTeD will have to evolve as well. However, supporting too many technologies and standards in parallel is also unsustainable. Thus, the set supported by OpenMinTeD at any time will be limited to a few. Third parties that would like to develop and maintain external modules for OpenMinTeD to support additional technologies and standards are welcome, and they can refer to the interoperability requirements to estimate the feasibility of creating such an external module. The distinction between abstract and concrete interoperability requirements that we make here allows us to answer two questions:
• How difficult is it for a new technology or standard to be incorporated into OpenMinTeD?
• How difficult is it to integrate new components based on already supported technologies and standards into OpenMinTeD?
Abstract requirements are agnostic to concrete technologies and standards and help in assessing whether a given technology or standard can comply; they help to answer the first question. Concrete requirements refer to specific implementation details and help to answer the second question.
Requirement concreteness values
• Abstract - the requirement specifies a need, but does not go into details of how this need must be fulfilled. The requirement may provide examples of techniques or implementations that fulfil the requirement, but does not mandate their use.
• Concrete - the requirement specifies a need and prescribes the use of specific techniques, standards, implementations, etc.
Strength - Requirement strength values
• Mandatory - compliance with a mandatory requirement is obligatory. Non-compliance with any mandatory requirement entails non-compliance with the specification as a whole.
• Recommended - compliance with a recommended requirement is not obligatory but strongly desired.
• Optional - compliance with an optional requirement is not obligatory and not strongly desired, but considered beneficial.
Status - The requirement status indicates how far a requirement has proceeded in its lifecycle. Whether changes may be made to a requirement, and which ones, depends on this status.
Requirement status values
• Draft - the requirement is a suggestion and can be changed substantially in any respect.
• Final - the requirement is ready for release. Changes to a final requirement are only allowed if they do not affect the compliance status of any product, component, format, etc. that has already been evaluated against the requirement specification. If a change would trigger a change in any compliance status, instead of changing an existing requirement, a new requirement must be created under a new ID and compliance must be evaluated against this new requirement specification in the next iteration. The previous requirement must be moved to deprecated status.
• Deprecated - the requirement is no longer to be used for compliance assessment. The requirement specification must not be changed. Exceptions are amendments adding pointers to potential new versions of the requirement and providing a rationale for the deprecation.
Category - The category of a requirement is used to anchor it in the document structure of the interoperability specification. A requirement may be associated with multiple categories.
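The requirement structure described in this section can be summarised as a small data model. The sketch below is illustrative only: the class and function names are hypothetical, and the choice to give a superseding requirement the status "draft" is an assumption, since the text above only states that a new ID must be created and the previous requirement deprecated.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Concreteness(Enum):
    ABSTRACT = "abstract"
    CONCRETE = "concrete"


class Strength(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


class Status(Enum):
    DRAFT = "draft"
    FINAL = "final"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class Requirement:
    id: int
    title: str
    concreteness: Concreteness
    strength: Strength
    status: Status
    categories: tuple = ()  # a requirement may belong to several categories

    @property
    def filename(self):
        # The ID is encoded in the requirement file name, e.g. "1.adoc".
        return f"{self.id}.adoc"


def supersede(old, next_id, **changes):
    """Apply a compliance-affecting change to a final requirement.

    Returns (deprecated_old, new_requirement). The old specification is left
    untouched apart from its status; the new one gets a new ID.
    """
    if old.status is not Status.FINAL:
        raise ValueError("only final requirements need to be superseded")
    fields = {"id": next_id, "status": Status.DRAFT, **changes}  # DRAFT is an assumption
    return replace(old, status=Status.DEPRECATED), replace(old, **fields)
```

Draft requirements can simply be edited in place, so `supersede` rejects them; only a change that would alter an existing compliance verdict of a final requirement takes this route.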
4.2 Requirements Overview
This section provides a high-level overview of the interoperability requirements that have been generated during the reporting period. A total of 72 requirements have been generated by the WGs, many of which are applicable across the WGs (WG1: 21, WG2: 17, WG3: 23, WG4: 33). These can be broken down by status:
• 22 requirements in status “draft” (Table 1)
• 40 requirements in status “final” (Table 2)
• 10 requirements in status “deprecated” (Table 3)
Here, we provide only a tabular overview of the requirements generated so far. Each of these requirements has a more detailed description, which can be found in the interoperability specification document¹ that is also attached to the present deliverable.
The generation of requirements happens per WG. It is possible that very similar requirements are generated in multiple WGs. When this happened, we kept one of the requirements and deprecated the others, merging compliance assessment into the remaining requirement if necessary. This is an ongoing process that continues as more requirements are added and as existing requirements become better understood. Several requirements that were generated by the WGs were later considered to be functional requirements for one of the OpenMinTeD services (e.g. the registry service or the workflow services) rather than interoperability requirements. These have also been marked as deprecated and scheduled for inclusion in the functional specification document D4.3.
Most of the requirements are recommendations (41), a core set of requirements is mandatory (25), and a few are optional (6). We provide in this document only the requirement overview with short titles. The full requirement specification is provided as an attachment to this document and is also publicly hosted on our GitHub repository. Browsing the requirements hosted on GitHub is the preferred method as it is a highly cross-referenced hypertext.
¹ https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-spec.html
Table 1 - Requirements in status "draft"
ID | Requirement | Concreteness | Strength | WGs
5 | Components should detail all their environmental requirements for execution | abstract | mandatory | WG4
6 | Components should have a unique identifier and a version number | abstract | mandatory | WG4
10 | Components should specify the types of the annotations that they input and output | abstract | mandatory | WG4, WG2
11 | Components should declare whether they can be scaled within a workflow | abstract | mandatory | WG4
13 | Citation information for component | abstract | recommended | WG1, WG4
14 | Components must maintain Licence information | abstract | mandatory | WG4
18 | Workflows should be described using a uniform language | abstract | recommended | WG4
51 | Licence should be attached | abstract | recommended | WG3
53 | Licensor must be entitled to grant licence | abstract | recommended | WG3
54 | Licensees should remain with a copy of the licence | abstract | recommended | WG3
55 | Standard licences should be used | abstract | recommended | WG3
56 | Licence should be machine readable | abstract | recommended | WG3
57 | Licence should be understandable by non-lawyers | abstract | recommended | WG3
58 | TDM must be explicitly allowed | abstract | recommended | WG3
59 | Right for (temporary) reproduction must be granted | abstract | recommended | WG3
60 | Boundary for derivative work must be clearly defined | abstract | recommended | WG3
61 | No restrictions on TDM results which are not derived works | abstract | recommended | WG3
62 | World-wide and irrevocable licence grant | abstract | recommended | WG3
63 | Licence must qualify for Open Access rights | abstract | recommended | WG3
64 | Licence must qualify for Open Access uses | abstract | recommended | WG3
65 | Licence must qualify for Open Access must not restrict use in any way | abstract | recommended | WG3
66 | Licence must qualify for Open Access may include attribution requirements | abstract | recommended | WG3
Table 2 - Requirements in status "final"
ID | Requirement | Concreteness | Strength | WGs
1 | Components should be described by machine-readable metadata | abstract | mandatory | WG4
2 | Component metadata should be embedded into the component source code | abstract | recommended | WG4
3 | Component metadata is separable from the component | abstract | mandatory | WG4
4 | URL to actual content must be discoverable | abstract | mandatory | WG1, WG2, WG3
7 | Components should have a fully qualified name that follows the Java class naming conventions | concrete | mandatory | WG4
8 | Components should associate themselves with categories defined by the OpenMinTeD project | abstract | mandatory | WG4
9 | Components should declare their annotation schema dependencies | abstract | mandatory | WG4
12 | Components should provide documentation describing their functionality | abstract | recommended | WG4
15 | Human readable information should be provided by each resource | abstract | recommended | WG1, WG4
16 | Models/resources should be useable across different component collections/platforms | abstract | recommended | WG4
17 | Components should be stateless | concrete | recommended | WG4
21 | Configuration and parametrisable options of the components should be identified and documented | abstract | recommended | WG4
24 | Using/treating workflows as components | abstract | mandatory | WG4
26 | Ability to determine source of an annotation/assigned category | abstract | recommended | WG4
27 | Components should handle failures gracefully | abstract | recommended | WG4
28 | Processing components should be downloadable | abstract | recommended | WG4
30 | Metrics for the confidence level of the TDM operation should be included in the metadata | abstract | optional | WG1, WG4
31 | Metrics for the performance of the TDM operation should be included in the metadata | abstract | optional | WG1, WG4
32 | Version must be included in the metadata description for all resources | abstract | mandatory | WG1, WG2, WG3, WG4
33 | Licensing information must be included in the metadata | abstract | mandatory | WG1, WG3
34 | Licensing information should be expressed in a machine-readable form | abstract | recommended | WG1, WG3
35 | All resources must include a unique persistent identifier | abstract | mandatory | WG1, WG2, WG3, WG4
36 | Classification metadata should be included, where applicable, in the metadata record of the resource | abstract | recommended | WG1, WG2
37 | Information on the structural annotation (layout) of resources should be included in the metadata of the resource | abstract | recommended | WG1
38 | Access mode of resources must be included in the metadata | abstract | mandatory | WG1, WG2, WG4
39 | Content resources must include metadata on their format (e.g. XML, DOCX etc.) | abstract | mandatory | WG1
40 | Component metadata must include standardised categories/tags that make them easy to discover | abstract | mandatory | WG1, WG4
41 | Content resources must include metadata on their language(s) | abstract | mandatory | WG1, WG2
43 | S/W (tools, web services, workflows) must indicate whether they are language-independent or the language(s) of the resources they take as input and output | abstract | mandatory | WG1, WG4
44 | Statistical metadata that allow monitoring of resource versions may accompany resources | abstract | optional | WG1, WG2
45 | S/W (tools, web services, workflows) must indicate format of their output | abstract | mandatory | WG1, WG4
47 | Information on funding of resources may be included in the metadata | abstract | optional | WG1, WG2, WG3, WG4
48 | All resource metadata records must include a reference to the metadata schema used for their description | abstract | mandatory | WG1, WG2, WG3, WG4
50 | Documentation references should be versioned | abstract | recommended | WG1, WG2, WG3, WG4
67 | Knowledge Resource Element Id | abstract | recommended | WG2
68 | Data Category Linking Vocabulary | abstract | recommended | WG2
69 | Interoperability between elements from different knowledge resource schemas should be expressed through RDF statements. | abstract | recommended | WG2
70 | All KR content elements need to be added as text annotations within a TDM workflow. | abstract | mandatory | WG2
71 | The KR should be ingestible through a URI | abstract | recommended | WG2
72 | The KR format should be in a standard format such as XML, JSON or RDF. | abstract | recommended | WG2
Table 3 – Requirements in status “deprecated”
ID | Requirement | Concreteness | Strength | WGs
19 | Components that use external knowledge resources should delegate access to a resource adapter instead of handling it themselves | abstract | optional | WG2, WG4
20 | Workflow engines should not require to see data | concrete | recommended | WG2, WG4
22 | The Workflow Engine Should Permit Saving Experimental Conditions in a Workflow | abstract | recommended | WG1, WG4
23 | The Workflow Engine should permit Licence Aggregation in Workflows | abstract | recommended | WG3, WG4
25 | Incorporation of multiple resources in parallel | abstract | recommended | WG4
29 | The actual content of all content resources must be discoverable | abstract | mandatory | WG1, WG2, WG3
42 | The metadata can include the information on which projects/workflows involve the resource | abstract | optional | WG1, WG2, WG3, WG4
46 | Output resources of web services/workflows must be accompanied by provenance metadata | abstract | mandatory | WG1, WG4
49 | Metadata of tools should contain information about the models available for them | abstract | recommended | WG1, WG4
52 | Licence information must be in metadata | abstract | recommended | WG1, WG3
5. Compliance
In the previous section, we discussed the requirements for interoperability that the WGs in OpenMinTeD have identified so far. But unless relevant products are compliant with them, the requirements are ineffective. In this section, we analyse the compliance with the requirements so far. This provides us with a basis for determining how to effectively improve compliance and thus interoperability between the relevant products as well as with the OpenMinTeD infrastructure.
5.1 Compliance levels
As part of the compliance assessment process, the following compliance levels are assigned:
• Full - fully compliant.
• Partial - partially compliant, e.g. some parts of a product are compliant but not all. This is typically the case if a product is in a state of transition from a non-compliant to a compliant state.
• No - not compliant.
• N/A - not applicable. This is expected to occur mainly for concrete requirements if a certain requirement is not applicable to a certain implementation, e.g. a requirement on remote API access for a tool which does not offer a remote API. Abstract requirements should be formulated in such a way that they are always applicable.
When a requirement is changed, compliance assessments may have to be updated as well. Thus, compliance assessments should only be made on requirements that have been marked as “final”, i.e. whose description must no longer be changed. However, in preparation of the present deliverable, we have also performed compliance assessments for those requirements which are still in “draft” status. Those assessments will have to be updated when the requirements are promoted to the “final” status.
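The compliance levels can be combined with the rule from Section 4.1 that non-compliance with any mandatory requirement entails non-compliance with the specification as a whole. The following is a minimal sketch with hypothetical names; how "Partial" and unassessed requirements count towards overall compliance is not stated in this document, so the sketch makes the former configurable and ignores the latter, both as assumptions.

```python
from enum import Enum


class Level(Enum):
    FULL = "Full"
    PARTIAL = "Partial"
    NO = "No"
    NA = "N/A"


def blocking_requirements(assessments, mandatory_ids, partial_blocks=False):
    """Return the mandatory requirement IDs that prevent overall compliance.

    assessments maps requirement ID -> Level for one product. "N/A" never
    blocks; "Partial" blocks only if partial_blocks is set (an assumption).
    """
    blocking = {Level.NO, Level.PARTIAL} if partial_blocks else {Level.NO}
    return sorted(
        req_id
        for req_id, level in assessments.items()
        if req_id in mandatory_ids and level in blocking
    )


def complies_with_specification(assessments, mandatory_ids, partial_blocks=False):
    """A product complies as a whole if no mandatory requirement blocks it."""
    return not blocking_requirements(assessments, mandatory_ids, partial_blocks)
```

Recommended and optional requirements are deliberately ignored by `complies_with_specification`, since only mandatory requirements affect compliance with the specification as a whole.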
5.2 Compliance assessments
In this section, we list the products taken into account for the compliance assessment. For every interoperability requirement, there are relevant classes of products:
• Resources that have been developed by the consortium partners and where the creation of metadata is the responsibility of the respective partners (Frontiers, Alvis, Argo/U-Compare, DKPro Core, ILSP)
• Resources that are already used in TDM processes and/or are being examined for use in OpenMinTeD and for whose metadata descriptions the consortium partners are, therefore, not directly responsible (TheSOZ, AGROVOC, JATS, OLiA, LAPPS Grid, licences)
• Resources that are being collected from the original data providers, who also supply the metadata descriptions (CORE, OpenAIRE).
An overview of the assessed products can be found in Table 4. The detailed assessment can be found in the Detailed Interoperability Specification v1.¹
¹ https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-spec.html#_compliance
Table 4 - Assessed products and consulted sources
Product | Assessed requirements (excl. deprecated) | Source
ARGO | 35 | http://argo.nactem.ac.uk/
AGROVOC | 16 | http://aims.fao.org/aos/agrovoc/void.ttl
Alvis | 36 | http://www.quaero.org/module_technologique/alvis-nlp-alvis-natural-language-processing/
CLARIN CR | 5 | https://www.clarin.eu/ccr
CORE | 11 | https://core.ac.uk
DKPro Core | 36 | https://dkpro.github.io/dkpro-core/documentation/
Frontiers | 11 | http://home.frontiersin.org/about/author-guidelines
GATE | 35 | https://gate.ac.uk/sale/tao/split.html
ILSP | 13 | https://inventory.clarin.gr/
JATS | 15 | http://jats.nlm.nih.gov/about.html
LAPPS Grid | 15 | http://vocab.lappsgrid.org/
Licences | 6 | variety of standard licences, such as CC and FOSS
OLiA | 15 | http://acoli.cs.uni-frankfurt.de/resources/olia/
Ontolex | 5 | https://www.w3.org/community/ontolex/
OpenAIRE | 11 | https://guidelines.openaire.eu/en/latest/
TheSOZ | 16 | http://lod.gesis.org/thesoz/de.html
Apache UIMA | 1 | https://uima.apache.org
schema.org | 5 | http://schema.org
6. Actions
Based on the compliance assessment, each WG has identified actions that need to be performed in order to improve the compliance of relevant products with the OpenMinTeD interoperability requirements. These actions shall guide the work of the WGs in the next reporting period(s), will provide input to T5.4 “Alignment of service and content provider systems” and T5.5 “Data interoperability toolkit for repositories, publishers’ systems and OpenMinTeD subsystems”, and shall also be taken into account for the implementation of OpenMinTeD services (WP6).
Most of the requirements generated so far (69 out of 72) are “abstract”, i.e. endorsed ways of complying with these requirements through the use of specific standards have not yet been specified. Nevertheless, various relevant products are already compliant with these abstract requirements, although potentially in very different ways.
A major focus across all WGs for the next reporting period will be to add suitable “concrete” requirements explicating the specific standards and mechanisms endorsed and supported by OpenMinTeD. Where no suitable standards and mechanisms exist, the WGs will - in collaboration with WP6 (Implementation) - propose to make use of mechanisms pioneered and implemented by OpenMinTeD and include their specifications in future versions of this deliverable.
A second measure for ensuring the applicability, practicability, and completeness of the interoperability requirements going forward is the continued development of interoperability prototypes. These prototypes are also meant to be carried over into the actual implementation of OpenMinTeD.
6.1 WG 1
With one exception, when the metadata descriptions fall under the responsibility of the consortium members, the results of the assessments were rather satisfactory when the requirement applied to the specific resource type.
Strategic actions – actions at a higher level that need to be undertaken to resolve these issues include:
• Promoting and supporting the creation and enrichment of formal metadata descriptions
• Standardising, where possible, the metadata elements and values and recommending best practices for filling them in.
Immediate actions – the immediate actions that can and should be taken in the OpenMinTeD framework to ensure interoperability at least for the project’s purposes:
• Create formal metadata descriptions for all the resources; for those that are not developed by the consortium partners, this will be allocated to relevant qualified members
• Convert the existing metadata descriptions to the reference metadata schema and enrich them with the lacking metadata elements; this will also help in spotting other interoperability issues.
• Add/improve formal metadata descriptions for JATS, LAPPS and Alvis; for Alvis, these can be provided in the next phase, given that the developer is part of the consortium; for JATS and LAPPS, these will need to be provided by other partners.
• Lack of appropriate metadata elements in the used metadata schemas to encode the required information (e.g. REQ-41 and REQ-33 for access point and licensing respectively).
• Lack of standardised vocabulary to encode the information, even though the elements are considered important and may already be present in other forms (e.g. as free text documentation); for instance, for REQ-30, REQ-31 (quality metrics) and REQ-32 (version), it is important to arrive at a consensus on the encoding practices before adding it to the metadata description in a harmonised way.
• Absence of the information in the metadata descriptions, despite the existence of the appropriate elements; such cases are, for instance, REQ-39 (format for content resources) and REQ-41 (language for content resources); this category includes both technical and administrative information, and the reasons behind this non-compliance can be that the information is usually optional and regarded by the providers as redundant.
Prototypes – the component overview (footnote 2) represents a prototype for the aggregation and transformation of existing component metadata descriptions from different sources (GATE, UIMA, Alvis, Maven, etc.) into a common scheme. Its development has provided insights that have been integrated into the development of the first version of the OpenMinTeD Metadata Schema. Parts of its functionality, in particular functionality related to the harvesting of metadata, are also now being transferred into the OpenMinTeD registry. WG1 will accompany the evolution of this prototype as it is being integrated into the registry and as its harvesting functionalities are expanded, and will update the interoperability specification as new requirements come up.
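The aggregation step described above can be pictured with a minimal sketch. It assumes that each source exposes a flat record and that a per-source mapping table renames its fields to a small common field set. The field names, the mapping tables and the function names are illustrative assumptions, not the OpenMinTeD Metadata Schema or the actual prototype code.

```python
# Illustrative sketch only: the field names and mappings below are assumptions,
# not the OpenMinTeD Metadata Schema or the real harvesting prototype.
FIELD_MAPS = {
    "uima": {"name": "name", "description": "description", "version": "version"},
    "maven": {"artifactId": "name", "description": "description", "version": "version"},
    "gate": {"NAME": "name", "COMMENT": "description"},
}

COMMON_FIELDS = ("name", "description", "version")


def to_common(source, record):
    """Map one source-specific record onto the common field set."""
    mapping = FIELD_MAPS[source]
    common = {field: None for field in COMMON_FIELDS}
    for src_key, value in record.items():
        target = mapping.get(src_key)
        if target is not None:  # fields without a mapping are dropped
            common[target] = value
    common["source"] = source  # keep track of where the record came from
    return common


def missing_fields(common):
    """List the common fields the source record could not fill."""
    return [field for field in COMMON_FIELDS if common[field] is None]
```

Reporting the unfilled fields per record is one simple way to see which descriptions would need enrichment after conversion.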
Table 5 – WG1 summary of actions to improve compliance
Action | Products | Related requirements
Create formal metadata descriptions | Alvis, JATS, LAPPS | All WG1 requirements
Enrich metadata descriptions with specific elements (added in the reference metadata schema), at least regarding required elements | Alvis, Argo/U-Compare, DKPro Core, ILSP | 4, 33, 38, 40, 44, 47, 48
Discuss in order to standardise vocabulary and add to metadata descriptions | all | 30, 31, 32, 35, 36, 44
Promote the enrichment of metadata descriptions, especially for required metadata elements, in the case of resources owned by others | OpenAIRE, CORE, original providers of depending resources | 4, 33, 37, 39, 41, 47, 48
Evolve metadata harvesting and aggregation prototype | all | All WG1 requirements
1 https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-spec.html#REQ-4
2 https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/components.html
6.2 WG 2
In general, the requirements were deemed essential and relevant. All KRs were deemed compliant with REQ-67, REQ-71 and REQ-72. For this reason, we are considering making these requirements mandatory in the next iteration of the specification. Requirements REQ-68 and REQ-69 check the extent to which KRs already make use, resource-internally, of:
• links to elements from external vocabularies
• elements from OWL/RDF/SKOS linking vocabularies
The set of (de facto) standard reference vocabularies to be used for establishing data category interoperability is the following:
• Schemas associated with OpenMinTeD
  o GATE
  o LAPPS Exchange Vocabulary Type Hierarchy
  o DKPro Core
• Strategic, i.e. widely used and interconnected, (de facto) standard vocabularies for linguistic/terminological/ontological metadata
  o Ontolex
  o OLIA Reference Model
  o CLARIN Concept Registry
  o Schema.org
  o Penn Treebank
  o Universal Dependencies
• Representative set of use-case-driven schemas
  o TheSOZ (social sciences)
  o JATS (structure of scholarly articles)
  o Agrovoc (agriculture)
  o BioLexicon (life sciences)
This set will be extended where necessary in order to ensure full coverage for the interoperability of TDM KR elements required in OpenMinTeD.
REQ-70 was considered as potentially a general WG4 requirement, and was therefore not included in the compliance check.
Using the schema alignments created in this period, a prototype will be set up during the next reporting period to investigate the application of the alignments in practical workflows. To this end, we will involve in particular those OpenMinTeD partners and external experts that work extensively with knowledge resources. In this way, we expect to generate further abstract and concrete interoperability requirements for OpenMinTeD. This prototype can be implemented in conjunction with another prototype originating from WG4: OpenMinTeD Script.
Table 6 – WG2 summary of actions to improve compliance
Action | Product | Related requirements
Implementation of prototype integrating schema alignments with actual workflows | all | All WG2 requirements
Possible extension of WG2 requirements with requirements produced by WP4. | all |
Check overlap with/migration to requirements from WG1/3/4 | all | 70
Further extension of reference vocabularies set through inclusion of additional standards. | all | 69
Definition of initial linking structure in the form of a spreadsheet: class name / feature / feature value / SKOS linking relation / class name / feature / feature value | all | 69
Further definition of linking strategies: links between simple RDF classes with SKOS relations; explore the viability of the definition in RDF of complex data categories (e.g. classes with specific feature values) as named graphs; link these named graphs to simple classes or other named graphs with SKOS relations. | all | 69
Further linking of reference vocabulary elements. | all | 69
Selection of relevant/necessary GATE plugins and their annotation types; create GATE schema. | all | 67
Creation of links between GATE data categories and reference vocabularies. | all | 69
Creation of RDF serialized network of (de facto) standard KR elements and links. This network will be used for interoperability purposes as the mediating KR for text annotation type harmonisation. | all | All WG2 requirements
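The seven-column linking spreadsheet named in Table 6 (class name / feature / feature value / SKOS linking relation / class name / feature / feature value) can be turned into RDF statements in a few lines. The sketch below is one possible reading of it and rests on assumptions: the base URIs are placeholders, and a feature-restricted category is encoded with a made-up query-string convention rather than the named graphs that the table proposes to explore. Only the SKOS namespace and its mapping properties are taken from the SKOS standard.

```python
# Sketch under stated assumptions: base URIs and the "?feature=value" URI
# convention are hypothetical; only the SKOS namespace and mapping property
# names come from the W3C SKOS vocabulary.
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOS_RELATIONS = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def category_uri(base, class_name, feature=None, value=None):
    """Build a URI for a simple class, or for a class restricted by a feature value."""
    uri = base + class_name
    if feature and value:
        uri += "?" + feature + "=" + value  # hypothetical convention for complex categories
    return uri


def row_to_triple(row, src_base, tgt_base):
    """Turn one spreadsheet row into a single N-Triples statement."""
    src_class, src_feat, src_val, relation, tgt_class, tgt_feat, tgt_val = row
    if relation not in SKOS_RELATIONS:
        raise ValueError("unknown SKOS mapping relation: " + relation)
    subject = category_uri(src_base, src_class, src_feat, src_val)
    obj = category_uri(tgt_base, tgt_class, tgt_feat, tgt_val)
    return "<%s> <%s%s> <%s> ." % (subject, SKOS, relation, obj)
```

Rejecting unknown relation names early keeps typing errors in the spreadsheet from silently producing invalid links.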
6.3 WG 3
Due to the nature of the WG, the possible immediate actions are limited. The primary relevant products for this WG are licences and terms of use that are created by third parties. Hence, actions with external influence are generally of a strategic nature. Other actions are coordinated with other WGs, mainly WG1, regarding the inclusion of licence metadata with resources being submitted to and accessed through OpenMinTeD.
Strategic actions – All resources ingested by OpenMinTeD or produced as the result of a TDM process must carry a licence. The licence has to be expressed in both legal and metadata terms. An additional layer of information regarding the main rights and obligations should also be added (e.g. commons deed).
Licences should comply with the product requirements (licences) identified by WG3. In particular, all licences (inbound; outbound) should be chosen among standard licences with clear compatibility standards. Ad hoc licences are deprecated. A major issue here is that terms of use for services are typically not standardised. This is an aspect that will be addressed. An assessment of the legal status of a resource (other than licence or absence of licences) is dependent on applicable legislation. Ad hoc analysis in this case seems unavoidable.
Immediate actions – We continue to discuss the “rights statement” implementation with internal and external experts and feed the output of this discussion into the implementation of the connected licence or rights statement compatibility table.
Prototypes – For the next reporting period, we plan the implementation of a licence selector prototype turning the compatibility matrix into a user-oriented application. Additionally, we shall investigate the way in which certain permissions or obligations are (not) granted or imposed in particular licence texts. To this end, we will conduct an experiment in which experts with legal training will analyse licence texts and annotate phrases or sentences in the licence texts with their legal implications.
Table 7 – WG3 summary of actions to improve compliance
Action | Product | Related requirements
Apply licence to your resources | all | 33, 51, 53, 54
Choose only resources with a licence | all | 33
Apply licence properly (legal; metadata) | all | 33, 56, 62
For resources with no clear licence statement, look at applicable law (TDM exception?) | all | 58, 59, 60, 61
6.4 WG 4
Immediate actions – Based on the compliance assessment, we identify three areas that require immediate action in the next reporting period:
1. Core requirements – necessary for workflow execution, e.g. regarding component input/output definition, metadata, dependencies specification. These are fairly well supported across the products and only a few improvements are necessary to achieve full compliance:
• Making component metadata available both from its source (REQ-2) and separately (REQ-3).
• Creating a unique identifier for each component (REQ-6).
• Enabling the use of workflows as components (REQ-24).
• Enforcing the specification of input/output annotation types for components (REQ-10).
2. Additional requirements – where all (or nearly all) of the products are non-compliant, thus significant changes are necessary to satisfy them:
• Non-technical information in component metadata, including citable publications (REQ-13), component category (REQ-8), associated licences (REQ-14) and licence aggregation for a whole workflow (REQ-23).
• Handling external resources used in a workflow: determining the source of a specific annotation element (REQ-26) or ensuring re-usability of resources across different platforms (REQ-16).
• Letting users decide on how a workflow is deployed, which may be necessary for legal reasons, by making sure a workflow engine doesn’t see the processed data (REQ-20) or by downloading components for local use (REQ-28).
• Writing documentation for components that lack it (REQ-12).
3. Vague requirements – where additional work is necessary to clarify the formulation, so that it will become evident what actions are necessary:
• Components declaring their annotation schema (REQ-9) and environmental requirements (REQ-5).
• Statelessness of components (REQ-17).
• Uniform workflow description language (REQ-18).
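Several of the core requirements listed above (a unique identifier and version per component, declared input/output annotation types, and workflows usable as components) lend themselves to a small illustration. The sketch below assumes a linear workflow and plain string type names. The class, its fields and both helper functions are hypothetical and do not reflect the metadata model of any of the assessed products.

```python
from dataclasses import dataclass


# Hypothetical model: the field names and helpers are illustrative assumptions,
# not the component metadata of UIMA, GATE, Alvis, Argo or OpenMinTeD.
@dataclass(frozen=True)
class Component:
    identifier: str                   # unique identifier (cf. REQ-6)
    version: str
    inputs: frozenset = frozenset()   # required annotation types (cf. REQ-10)
    outputs: frozenset = frozenset()  # produced annotation types (cf. REQ-10)


def unmet_inputs(workflow):
    """Return (component id, missing types) for inputs no upstream component produces."""
    available = set()
    problems = []
    for comp in workflow:
        missing = comp.inputs - available
        if missing:
            problems.append((comp.identifier, sorted(missing)))
        available |= comp.outputs
    return problems


def as_component(identifier, version, workflow):
    """Wrap a linear workflow as a single component (cf. REQ-24)."""
    produced, required = set(), set()
    for comp in workflow:
        required |= comp.inputs - produced  # inputs the workflow itself does not create
        produced |= comp.outputs
    return Component(identifier, version, frozenset(required), frozenset(produced))
```

Declaring types explicitly is what makes the check possible at all: a component without input/output declarations would simply pass unnoticed.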
Prototypes – During the present reporting period, WG4 has created OpenMinTeD Script as a prototype to investigate interoperability issues in cross-platform workflows (i.e. workflows involving components from UIMA and from GATE). This was a necessary step in order to deepen the discussion around interoperability in workflows towards the generation of concrete interoperability requirements. It can also serve as a temporary research substitute for the OpenMinTeD workflow service, which is to be delivered later in the project. In fact, we expect that parts of OpenMinTeD Script can be transferred into the design and implementation of the OpenMinTeD workflow service. As such, we shall continue evolving this prototype during the next reporting period, in particular incorporating more platforms (e.g. web services from ILSP, from UNIMAN, or from LAPPS Grid). This also entails intensified investigation into the data transformation processes necessary to bridge the technical and semantic gaps between the different platforms.
Table 8 – WG4 summary of actions to improve compliance
Action | Product | Related requirements
Continue evolving the OpenMinTeD Script prototype with a focus on the integration of additional platforms and on data transformation | all | All WG4 requirements
Improve existing component metadata (UIMA XML) model to make it available both from its source and separately. | Alvis, Argo | 2, 3
Agree on a format of id and version for components and apply it. | all | 6
Add functions to export all the configuration parameters of a workflow as a single file (considered difficult because of architectural limitations). | Alvis, Argo, ILSP | 22
Add changes to execution environment and user interface that would enable running workflows as components | Argo | 24
Revise implementations and metadata for existing components to make sure they specify input/output types | Alvis, Argo | 10
Expand the component metadata schemata to include all desired additional fields. That seems to be fairly easily achievable in all products, but it requires a lot of effort to fill in that information (esp. licences) for all existing components. | all | 8, 13, 14
Unify handling of resources and models. | all | 16, 26
Prepare the execution model in a way that guarantees that a user may choose where the processing happens, which is important because of legal restrictions. Achieving this is considered possible but difficult, as it requires major changes in the systems. | Alvis, Argo, GATE, ILSP | 28
Write documentation for undocumented components. | Alvis, Argo, ILSP | 12
Offer classes for component authors to extend, so that they will handle failures properly | Argo | 27
Define exactly what kind of type system and environmental information is necessary, how it’s going to be used, and how to encode it. | Alvis, Argo, DKPro Core | 5, 9
Define (or choose) a workflow representation language to be used before we can assess ability to comply with this | all | 18
7. List of attachments
• Detailed Interoperability Specification v1
  o https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-spec.html
• Detailed Interoperability Scenarios v1
  o https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/openminted-interoperability-scenarios.html
• Detailed type system alignment v1
  o Cannot presently be included as PDF for technical reasons
  o https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/typealignment.html
• Detailed overview of components from partners involved in WG4 v1
  o Cannot presently be included as PDF for technical reasons
  o https://openminted.github.io/openminted-site/releases/interop-spec/1.0.0/components.html
• OpenMinTeD Metadata Scheme
  o https://openminted.github.io/openminted-site/releases/omtd-share/1.0.0/html/index.html
8. Appendix
8.1 OpenMinTeD Component Classification (Draft)
This section provides an overview of the draft categorisation system for components. This is an excerpt from a work-in-progress document. The actual document also includes information on how to map these categories to other categorisation systems, e.g. the META-SHARE vocabulary.
Category (Level 1 to 4) | Description (not formal definition!)
AccessComponent |
Reader | A component that reads content of various types (pdf, txt, xml etc.)
Writer | A component that writes processing results in various formats
SupportComponent | A component that provides support to developers
Visualiser | A component or interface that renders the contents of a resource in a graphic way for visualisation purposes
Debugger | A component that helps in the debugging process
Validator | A component used to confirm that a system/resource meets the specifications and fulfils its intended purpose
Viewer | A component that provides access to the contents of a resource but intended only for access by humans
CorpusViewer | A component that provides access to the contents of a corpus but intended only for access by humans
LexiconViewer | A component that provides access to the contents of a lexical/conceptual resource but intended only for access by humans
Editor | A component that allows humans to edit the contents of a resource
MLTrainer | A component that is used in training models for machine learning
MLPredictor | A component that is used in predicting based on machine learning models
FeatureExtractor | A component that is used for extracting features
DataSplitter | A component that performs data splitting for cross-validation purposes
DataMerger | A component that supports data merging from various sources
Converter | A component that performs conversion between formats of a resource
Evaluator | A component that is used in the evaluation of the performance of a component
FlowController | A component that supports controlling flows
Script-BasedAnalyzer | A component that performs analysis tasks based on a script
Matcher | A component that allows matching (mapping) of elements
Gazetteer-basedMatcher | A component that allows matching of elements based on a gazetteer
CrowdSourcingComponent | A component that supports crowdsourcing operations
DataCollector | A component that collects (retrieves) data from various sources
Crawler | A component that crawls the web and collects data from various websites
Processor | A component that is used in processing operations
Annotator | A component that annotates any language data (text, video, audio etc.), i.e. adds any descriptive or analytic notations (structural, linguistic, etc.) to raw language data
Segmenter | A component that segments a text into structural units (chapters, paragraphs, sentences, words, tokens etc.)
Stemmer | A component that extracts stems from words in a text, usually by removing the commoner morphological and inflectional endings from words
Lemmatizer | A component that annotates the tokens of a text with lemma information
MorphologicalTagger | A component that annotates tokens of a text with morphological information (part-of-speech and morphological features)
Chunker | A component that groups tokens of a text into chunks
Parser | A component that parses sentences and ?
CoreferenceAnnotator | A component that annotates tokens of a text with coreference information, i.e. annotating expressions that refer to the same entity in the text
NamedEntityRecognizer | A component that seeks to locate and classify elements in a text into pre-defined categories such as the names of persons, organisations, locations, expressions of times, etc.
SemanticsAnnotator | A component that annotates the tokens of a text with semantic features
SRLAnnotator | A component that annotates the tokens of a text with Semantic Role labels
ReadabilityAnnotator | A component that annotates the tokens of a text with readability scores
Aligner | A component that detects and annotates equivalence relations between items (corpora, texts, paragraphs, sentences, phrases, words) in two languages
Generator | A component that generates (semi-)automatically natural language texts (based on non-linguistic data, keywords, logical forms, knowledge bases...)
Summarizer | A component that produces a natural language synopsis of a longer text
Simplifier | A component that outputs a simpler rendition of a given item (sentence, text etc.)
preOrPostProcessingComponent | A component that is used at pre- or post-processing stages in order to normalize input/output
SpellingChecker | A component that corrects spelling mistakes in a text
GrammarChecker | A component that corrects grammatical mistakes in a text
Normalizer | A component that removes unwanted material from text, usually as a pre-processing step
Filters |
Analyzer | A component that is used for analysing an input text in order to perform extraction of features/information (e.g. word list), or characterisation of the whole text
TopicExtractor | A component that guesses the topic of a text
DocumentClassifier | A component that tries to classify a document into one or more categories
LanguageIdentifier | A component that identifies the language of a given text based on its contents
SentimentAnalyzer | A component that tries to identify sentences that express the author’s negative or positive feelings on something; (Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials (wikipedia))
KeywordsExtractor | A component that tries to extract keywords from a given text
TermExtractor | A component that tries to extract terms from a corpus
ContradictionDetector | A component that tries to automatically recognise elements that reveal contradiction in a text
EventExtractor | A component that tries to extract information related to incidents referred to in a text
PersuasiveExpressionMiner | A component that tries to identify persuasive expressions in a given text
InformationExtractor | A component that automatically extracts structured information from unstructured and/or semi-structured machine-readable documents
LexiconExtractorFromCorpora | A component that extracts structured lexical resources from corpora
LexiconExtractorFromLexica | A component that extracts specific lexical information contained in other lexica
WordSenseDisambiguator | A component that tries to identify which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings (Source: wikipedia)
QualitativeAnalyzer |
platform | A technology that eases the development of new tools and services in the NLP field
infrastructure |
architecture | A technology that supports the flexible development of NLP applications, together with all the requested resources
NLPdevelopmentenvironment | A technology that supports the development of data resources, like lexicons, grammars, corpora, etc. Can be included in an Architecture or in a Platform
other |
8.2 WG1 - Inventory of metadata schemas and related efforts
Title | Full title | Type | Description | Publications | Lexica, ontologies, etc. | Corpora | S/w tools | Web services | Workflows | Comment
UIMA Component descriptors | | TDM components | Describe language processing components and their parameters | no | no | no | yes | ? | yes (aggregate components) | already used by partners
bibo (http://bibliontology.com/) | | publications | ontology for bibliographic citations | yes | yes | no | no | no | no | popular but similar to other bibliographic resources
DC/DCMI (http://dublincore.org/documents/dcmi-terms/) | Dublin Core Metadata Initiative | general | metadata schema for digital resources | yes | yes | yes | yes | no | no | the most widespread at least for exchange purposes
ALVEO (http://alveo.edu.au/) | A Virtual Lab for Human Communication Science | language resources | infrastructure for finding, accessing and processing datasets for NLP | no | no | yes | no | no | yes | metadata mostly for linguistics; interesting mainly for galaxy
OLAC (http://www.language-archives.org/OLAC/metadata.html) | Open Language Archives | language resources | repository & metadata schema for language resources | no | yes | yes | no | no | no | metadata very general, based on DC; mainly for OAI-PMH
TEI (http://www.tei-c.org/Guidelines/P5/) | Text Encoding Initiative | language resources | metadata schema for encoding external & internal information of text resources | yes | yes | yes | no | no | no | metadata mainly for humanities; more interesting for external structure
CERIF (http://eurocris.org/cerif/main-features-cerif) | Common European Research Information Framework | projects, people, publications | schema for research entities, covering projects, funding, researchers, research organisations etc. | yes | yes | yes | no | no | no | used already by OpenAIRE; mainly for research entities; might be interesting for satellite entities
CrossRef (http://www.crossref.org/ & http://doi.crossref.org/schemas/unixref1.0.xsd) | | publications | provider of DOI's for citation, linking & access; mainly for publications | yes | no | no | no | no | no | widespread for publications
JATS (http://jats4r.org) | Journal Article Tag Suite | publications | XML tags for journal articles; schema for contents of publications | yes | no | no | no | no | no | mainly for structure & article types
OpenAIRE (https://guidelines.openaire.eu/en/latest/) | OpenAIRE | publications | aggregator; schema for publications & research data; guidelines for literature providers, data archives & CRIS managers | yes | no | no | no | no | no | for open access publications; used by partner
BetaSHARE Metadata Schema (https://osf.io/wur56/wiki/Schema/) | | general | metadata schema for research objects | yes | no | yes | no | no | no | general for exchange
Maven Project Object Model (https://maven.apache.org/pom.html) | | software and resource artifacts | Describes software libraries and other software artifacts (can also be resource packages) in the Java world. | no | no | no | yes | no | no | used by partners
MARC21 (http://www.loc.gov/marc/bibliographic/) | MARC21 format for bibliographic data | publications | MARC21 provides a complete but complex description of bibliographic metadata using code numbers to describe data; for different types of printed materials and digital media | yes | no | no | no | no | no | MARC21 presents some inconveniences, such as its high complexity and its inability to be easily read by humans.
FaBiO (http://www.sparontologies.net/ontologies/fabio/source.html) | FRBR (functional requirements for bibliographic records)-aligned Bibliographic Ontology | publications | an ontology for recording and publishing on the Semantic Web descriptions of entities that are published or potentially publishable, and that contain or are referred to by bibliographic references, or entities used to define such bibliographic references. | yes | no | no | no | no | no | FaBiO allows for the semantic description of a variety of bibliographic objects, such as research articles, journal articles, and journal volumes, to clearly separate each part of the publishing process, the people involved in the publication process, and the various versions of documents (electronic or physical); may check for next version
EDAM ontology (http://edamontology.org/page) | EMBRACE Data and Methods | bioinformatics | ontology of well established, familiar concepts that are prevalent within bioinformatics, including types of data and data identifiers, data formats, operations and topics | yes | no | no | no | yes | no | seems to cover datasets, web services & publications; domain-specific; check for use cases later
CMDI (http://media.dwds.de/clarin/userguide/text/metadata_CMDI.xhtml) | Component Metadata Initiative | language resources | metadata schema modeller for language resources & registry for metadata components & profiles | no | yes | yes | yes | yes | no | wide range of profiles; check specific profiles later on
MetaShare (http://www.meta-share.org/portal/knowledgebase/home) | MetaShare | language resources | repository & metadata schema for language resources | no | yes | yes | yes | yes | no | widespread for language resources
CORE (http://core.ac.uk) | COnnecting REpositories | publications | aggregates scholarly publications (metadata and full-text content) that are available as Open Access; it looks like the metadata comes from OLAC/DC (given the OAI/PMH protocol) | yes | no | no | no | yes | no | for publications; used by OU
swso (http://www.w3.org/Submission/SWSF-SWSO/) | Semantic Web Services Ontology | web services | ontology for web services | no | no | no | no | yes | yes | from Semantic Web; to check more thoroughly together with workflows for next version
DCAT (http://www.w3.org/TR/vocab-dcat/) | Data Catalogue | datasets | schema for catalogues and datasets | no | no | yes | no | no | no | general for publishing catalogues
CC-REL (https://wiki.creativecommons.org/wiki/CC_REL) | Creative Commons-REL | licensing | ontology for legal metadata | N/A | N/A | N/A | N/A | N/A | N/A | complementary to legal metadata, if WG3 decides to go for machine readable licences
ODRL (http://www.w3.org/ns/odrl/2/) | Open Digital Rights Language | licensing | ontology for representing legal rights | N/A | N/A | N/A | N/A | N/A | N/A | complementary to legal metadata, if WG3 decides to go for machine readable licences
LRE Map (http://www.resourcebook.eu/searchll.php) | LRE Map | language resources | user-provided descriptions of language resources | no | yes | yes | yes | no | no | general, free values; user filled in
CCR (http://www.clarin.eu/ccr/) | CLARIN Concept Registry | metadata (external & linguistic) | registry for metadata elements and values | no | yes | yes | yes | yes | yes | follow-up of ISOcat; good for checking, but not all elements and values are validated; difficult to select those that are needed for external metadata
Prov-O (http://www.w3.org/TR/prov-o/) | | provenance | ontology for provenance information | N/A | N/A | N/A | N/A | N/A | N/A | for provenance information regardless of resource type; check for next version
DataCite (https://www.datacite.org/) | | publications | citation of datasets; DOI's; collaboration with CrossRef | yes | yes | yes | yes | no | no | focusing on citation rather than description
DOI (http://www.doi.org/) | Digital Object Identifier | publications | provider of PID's | N/A | N/A | N/A | N/A | N/A | N/A |
RIOXX (http://rioxx.net/guidelines/) | | publications | metadata application profile & guidelines for research publications, incl. research grants etc.; mapping to OpenAIRE | yes | no | no | no | no | no |
swrc (http://ontoware.org/swrc/) | | publications | ontology for modelling entities of research communities such as persons, organisations, publications (bibliographic metadata) and their relationships | yes | no | no | no | no | no | for satellite entities mainly
DOAJ (https://doaj.org/features) | | publications | DOAJ (Directory of Open Access Journals) article format | yes | no | no | no | no | no | DC-based
EDM (http://pro.europeana.eu/page/edm-documentation) | EUROPEANA Data Model | cultural heritage objects | | N/A | N/A | N/A | N/A | N/A | N/A | not so interesting for our scope; keep in mind only for the licensing model
NLM (http://dtd.nlm.nih.gov/) | NLM (National Library of Medicine) Journal Archiving and Interchange Tag Suite | publications | | yes | no | no | no | no | no | obsolete; continues as JATS
NISO (http://www.niso.org/standards/) | National Information Standards Organisation | standards | standards for content publishers, libraries & s/w publishers | yes | no | no | no | no | no | general link to standards
Handle PID | PID | PID's | provider of PID's | N/A | N/A | N/A | N/A | N/A | N/A |
LAPPS Grid (http://www.lappsgrid.org/) | Language Application Grid | web services | open, interoperable web service platform for natural language processing (NLP) research and development | no | no | yes | yes | yes | yes | for components and type systems; checked by WG4
CREOLE descriptors (https://gate.ac.uk/sale/tao/splitch4.html) | | TDM components | | no | no | no | yes | ? | yes | already used by partners
8.3 WG3 – Compatibility Matrix: Summary
8.3.1 Preliminary considerations
The terms of copyright licences are often unclear and not standardized, with the consequence that an effective and interoperable use of resources through TDM is extremely limited. Among the objectives of OpenMinTeD, licence standardisation and interoperability are of crucial importance. The aim of WG3 is to provide guidance for users who wish to undertake TDM activities by developing a tool that helps overcome the ambiguity of the current setting. The main idea of WG3 is to draft a Compatibility Matrix that considers
a) the type of content,
b) the type of licence, and
c) the resulting compatibility, or lack of it, among different licences.
This matrix should help users to share and distribute their resources under the appropriate licence and, at the same time, to comprehend the legal implications of choosing one or the other licence.
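One way to picture such a matrix is as a lookup keyed by content type, inbound licence and outbound licence. The sketch below assumes exactly that shape. The licence names and every entry are placeholders invented for illustration; they make no statement about the compatibility of any real licence and are not the WG3 matrix.

```python
# Placeholder data: "Licence-A" and "Licence-B" are invented names, and the
# entries are illustrative only; they do not describe any real licence.
COMPATIBILITY = {
    ("text", "Licence-A", "Licence-A"): True,
    ("text", "Licence-A", "Licence-B"): True,
    ("text", "Licence-B", "Licence-A"): False,
}


def compatible(content_type, inbound, outbound):
    """Return True or False if the matrix has an entry, None if the pair is unassessed."""
    return COMPATIBILITY.get((content_type, inbound, outbound))


def outbound_options(content_type, inbound_licences, candidates):
    """Keep the candidate outbound licences that every inbound licence is known to allow."""
    return [
        candidate
        for candidate in candidates
        if all(compatible(content_type, inbound, candidate) is True
               for inbound in inbound_licences)
    ]
```

Returning None for unassessed pairs, instead of defaulting to False, keeps "known incompatible" apart from "not yet analysed", which matters where an ad hoc legal analysis is still pending.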
8.3.2 License Compatibility Tools
In order to develop the OpenMinTeD Compatibility Matrix, WG3 has collected existing related work. The first step was to collect a list of examples of how to provide or graphically represent information regarding licence conditions and, for some, compatibility. For each example, the main features and limits have been indicated. Exemplary tools can be grouped into three main categories:
1. licence calculators or selectors;
2. licence descriptors;
3. comparative tables and graphics.
The first category includes tools that use slightly different sets of criteria (licensing conditions) for the classification of licences and end up with a number of licences that satisfy these criteria. The second offers a representation of licences with visualisation of basic licensing conditions. The third provides a figurative illustration of licences and licence rules.
8.3.2.1 License Calculators/Selectors
Among this first group of tools, the following examples have been considered: LICENTIA; ELRA Licence Wizard; LINDAT Open Licence Selector; CLARIN Licence category calculator; RDF Representation of licences; OSS Watch Licence differentiator.

LICENTIA [1] is a suite of services that support users in finding a suitable licence for their data. It is a licence calculator/selector based on ODRL representations of commonly used licences and can be used in three modes:
• users select conditions of use (obligations, permissions and prohibitions) and compatible licences are shown;
• users select a licence and see whether it is compatible with certain conditions of use;
• users select a licence, view it with a visualisation tool and export an RDF representation.
The tool appears easy to use if users know their preferences in terms of permissions, obligations and prohibitions. However, the partition into permissions, obligations and prohibitions may be tricky if not supported by a clear legal definition. Besides, it does not allow multiple choices and it does not provide a broader illustration of licences.

ELRA Licence Wizard [2] is a web configurator that enables users to choose among a number of legal features and consequently obtain a suitable licence to distribute content according to their selection. It covers 24 licences (ELRA, Creative Commons and META-SHARE), which are classified according to nine criteria (e.g. use type, whether electronic signatures are required, etc.). The tool guides users in making their choices with the help of explanatory text. It also allows the user to state multiple preferences and provides a broader picture of available licences based on the user's selection. However, the language used to guide users' choices is not always clear (the explanations under the question mark, which are meant to explain the criteria, are themselves sometimes unclear) and does not always have a corresponding legal meaning (see, for instance, the distinction between implicit and explicit). Moreover, it does not provide a graphical illustration that could help users to better visualise the licences' rules.

Similarly, LINDAT Open Licence Selector [3] asks users a number of questions (again based on licensing conditions) and concludes with a set of licences that match their answers and that they can use for their resource. The tool has an appealing interface, allows free search and provides a summary of potentially applicable licences. Nevertheless, it does not allow multiple choices, nor does it really guide users (especially non-professional users) in making their choice.

Another similar tool is the CLARIN Licence category calculator [4], which suggests a number of labels for conditions of use (aka “Laundry Tags”). It helps users to classify the licence that they would like to use for a certain resource according to the CLARIN licence categories. If the user has not chosen a licence, it also provides a link to a "ready-made" legal text conformant with the licensing conditions the user has selected. If the conditions do not require user identification, it provides a link to the LINDAT Open Licence Selector. The tool guides users by providing a number of conditions of use identified by labels. However, it seems to be confined to CLARIN, as it implies certain specifications that are restricted to the CLARIN categories. In addition, the explanation of each condition is too terse and more examples could be added.

Finally, the OSS Watch Licence differentiator [5] aims at helping users to understand their preferences in relation to free and open source software licences. It guides users by specifying in detail the content of their choices. It also makes explicit that users should fully read and understand their chosen licence (by stating that “it is no substitute for reading the licences themselves [and] the classifications of licence type that enable this tool to work are by necessity somewhat reductive, and therefore output of this tool cannot and must not be thought of as legal advice”), which is often not the case for users who do not have legal training.

[1] http://licentia.inria.fr/
[2] http://wizard.elra.info/index.php
[3] http://ufal.github.io/public-license-selector/
[4] http://www.helsinki.fi/finclarin/calculator/ClarinLicenseCategory.html
[5] http://oss-watch.ac.uk/apps/licdiff/
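The first LICENTIA mode described in this section (users select conditions of use and compatible licences are shown) can be illustrated with a minimal Python sketch. The licence profiles, the condition names (`share_adaptations`, `commercial_use`, `attribution`, `share_alike`) and the function `select_licences` are hypothetical and hand-written for illustration only; they are not LICENTIA's ODRL data or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceProfile:
    """A licence reduced to three sets of condition labels (illustrative only)."""
    name: str
    permissions: frozenset
    obligations: frozenset
    prohibitions: frozenset


# Hand-written, simplified profiles; assumption: four Creative Commons licences
# described with made-up condition labels, not taken from any real dataset.
PROFILES = [
    LicenceProfile(
        "CC BY 4.0",
        frozenset({"reproduce", "distribute", "share_adaptations", "commercial_use"}),
        frozenset({"attribution"}),
        frozenset(),
    ),
    LicenceProfile(
        "CC BY-SA 4.0",
        frozenset({"reproduce", "distribute", "share_adaptations", "commercial_use"}),
        frozenset({"attribution", "share_alike"}),
        frozenset(),
    ),
    LicenceProfile(
        "CC BY-NC 4.0",
        frozenset({"reproduce", "distribute", "share_adaptations"}),
        frozenset({"attribution"}),
        frozenset({"commercial_use"}),
    ),
    LicenceProfile(
        "CC BY-ND 4.0",
        frozenset({"reproduce", "distribute", "commercial_use"}),
        frozenset({"attribution"}),
        frozenset({"share_adaptations"}),
    ),
]


def select_licences(required_permissions, accepted_obligations, profiles=PROFILES):
    """Return the names of licences that grant every required permission,
    prohibit none of them, and impose only obligations the user accepts."""
    required = frozenset(required_permissions)
    accepted = frozenset(accepted_obligations)
    return [
        p.name
        for p in profiles
        if required <= p.permissions
        and not (required & p.prohibitions)
        and p.obligations <= accepted
    ]


if __name__ == "__main__":
    print(select_licences({"share_adaptations", "commercial_use"}, {"attribution"}))
```

A user who needs to share adaptations commercially and accepts only attribution is left with a single candidate in this toy data set; accepting `share_alike` as well widens the result. The difficulty noted in the text remains: the split into permissions, obligations and prohibitions is only as good as the legal definitions behind the labels.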
8.3.2.2 License Descriptors
Among the license descriptors, the dataset provided by RDF Representation of licences [1] contains 126 licenses expressed as RDF; the licenses can also be accessed directly. The tool is a representation of licences with a visualisation of basic licensing conditions: a set of commonly used licences with their RDF representation (ODRL & CC-REL). While the dataset contains many licenses, it does not provide guidance to users.

Likewise, the RDF License dataset [2] also includes licences represented in RDF. Similar to the previous tool, it makes use of the MS-rights vocabulary [3]. It contains the same list of licenses as the previously mentioned dataset, but it adds a keyword-based summary. At the same time, the conditions of use could be expanded.
8.3.2.3 Comparative Tables and Graphics
The first example considered is the META-SHARE Table [4], which appears in the D6.1.1 META-SHARE Report related to the use of language resources (LRs) and language technologies (LTs) within the framework of META-SHARE. The tool aims to cover as many elements as possible and condense them into a concise graphical representation. It is limited to ELRA, LDC (& NIST) and CC licenses, and some of the information it provides is unclear (e.g. with reference to "Remark" or "Implicit/Explicit"); it is therefore not always straightforward to follow and risks confusing users to some extent.

A similar tool is the ORACLE Table [5], which is intended to compare the major attributes of the most popular Free and Open Source Software licenses. The chart compares and graphically represents a number of licenses, with the aim of visually comparing their main features. However, it appears too terse, and its own compiler recognises these limits, acknowledging the difficulty of fully understanding the differences among licenses.

Another comparative graphic tool is the GNU Table [6], which deliberately aims at also covering the New Compatible Licenses. In addition to the GNU licenses list, the graphic illustrates some licensing rules in relation to new compatible licenses.
[1] https://datahub.io/dataset/rdflicense
[2] http://rdflicense.appspot.com/
[3] http://purl.org/NET/ms-rights
[4] http://www.meta-net.eu/public_documents/t4me/META-NET-D6.1.1-Final.pdf
[5] https://blogs.oracle.com/davidleetodd/entry/free_and_open_source_license
[6] http://www.gnu.org/licenses/quick-guide-gplv3.html
The chart aims at clarifying the compatibility of a number of free software licenses with the GPL and now also GPLv3. Although it offers a quite clear and schematic picture of the relations among licenses, the scope of the chart is inevitably too narrow.

To conclude, WG3 has considered a list of other graphical representations, such as TLDRLegal [1] and the GitHub Tool [2], but also, although of a different nature and scope, tools like the European Public Domain Calculator [3] and Public Domain Sherpa [4], the US copyright term calculator.

Regarding these specific tools, during the previous conference calls with WG3 internal and external experts it emerged that they cannot be considered precisely applicable to the scenarios considered by OpenMinTeD, nor directly applicable to the WG3 Compatibility Matrix. They still offer a basis for comparison, however, and can therefore be considered as examples to look at when developing a tool more tailored to OpenMinTeD, especially in terms of the methodology behind them and their graphical interface.

[1] https://tldrlegal.com/ and https://tldrlegal.com/compare
[2] http://choosealicense.com/
[3] http://archive.outofcopyright.eu/calculator.html
[4] http://www.publicdomainsherpa.com/calculator.html

8.3.3 The OpenMinTeD Compatibility Matrix (CM)
As well argued by Labropoulou, Piperidis and Margoni in the framework of the LREC 2016 Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability:

“In the field of TDM it is important to properly address the licence compatibility issue by employing a “multi-layer licence approach”. The starting point is of course to focus on just one “layer”, e.g. content licences or software licences or terms of use, and try to resolve compatibility issues “within” the same type of licences. This means to verify the compatibility of the same kind of licences in order to determine whether two or more content licences can be combined, or two or more software licences can be combined. A multi-layer approach applies the same compatibility principle across the 3 categories identified (content licences, tools or software licences, and service agreements). In this way, it will be possible to develop an interoperability model or matrix that is not limited to content, tools or services individually considered, but that, by taking a holistic approach, is able to offer a more complete analysis of the licence compatibility issues faced by TDM researchers. In other words, this formulation, instead of taking a theoretical legal approach, puts at its centre the needs and the skills of TDM researchers, who usually are not legally trained”. [1]
A first draft of the CMs for (1) contents, (2) software, and (3) terms of use could initially look like the following. It is important to note at this point that the third column (“Are they compatible?”) refers to the possibility of combining the subject matter of columns 1 and 2 in such a way that, under copyright law, they form a so-called “derivative work” [2]. When the combination of two works does not lead to the creation of a derivative work, there should be no restriction on the possibility of combining them. Nevertheless, there are cases where the difference is not clear cut and the specific terminology employed by the licences can become decisive. There are instances, however, where two licences interpret their respective terms in different ways. When this is the case, this will be noted in the 4th column.

Note: Table 9, Table 10, and Table 11 include only some of the licenses to be considered. These tables are now being substituted with two-axis graphical representations (Table 12, Table 13, and Table 14). The present tables are still in a draft version. Updated versions are to be included with D5.3.

Table 9 - Compatibility Matrix (draft version 1.0): Contents
Licence for resource A | Licence for resource B | Are they compatible? | Under which conditions?
CC BY 4.0 | CC BY 4.0 | Yes | No restrictions
CC BY 4.0 | CC-BY-SA 4.0 | Yes | Results under SA
CC BY 4.0 | CC-BY-NC 4.0 | Yes |
CC BY 4.0 | CC-BY-ND 4.0 | Yes |
CC BY 4.0 | CC-BY-NC-SA 4.0 | Yes | But SA
CC BY 4.0 | CC-BY-NC-ND 4.0 | No |
CC-BY-ND 4.0 | CC-BY-ND 4.0 | No |
CC-BY-SA 4.0 | CC-BY-SA 4.0 | Yes | e.g. BY-SA 1.0 only with BY-SA 1.0 [3]
CC-BY-SA 4.0 | CC-BY-NC 4.0 | Yes | Both restrictions apply
CC-BY-SA 4.0 | CC-BY-NC-SA 4.0 | No |
MS Commons BY | MS Commons BY | Yes |
[1] P. Labropoulou, S. Piperidis, T. Margoni. Legal Interoperability Issues in the Framework of the OpenMinTeD Project: A Methodological Overview (Abstract), in R. Eckart de Castilho, S. Ananiadou, T. Margoni, W. Peters, S. Piperidis (eds.). Proceedings of LREC 2016, Tenth International Conference on Language Resources and Evaluation, May 23-28, 2016, Portorož, Slovenia, LREC 2016 Workshop on Cross-Platform Text Mining and Natural Language Processing Interoperability, p. 62, available at: http://www.lrec-conf.org/proceedings/lrec2016/index.html.
[2] Derivative work is an expression that may vary from legal system to legal system. In fact, the term is not present in every Copyright Act. We use this definition: “A work that is based upon or otherwise derived from another work or a number of works, in particular by means of adapting, editing, modifying, translating the pre-existing work/s regardless of the medium used.”
[3] https://wiki.creativecommons.org/wiki/ShareAlike_compatibility
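A four-column table such as the draft Table 9 can be held in a small lookup structure. The following Python sketch is only an illustration under stated assumptions: the names `CONTENT_MATRIX` and `lookup` are hypothetical, only a few rows of the draft table are transcribed, and treating each pair as unordered (A with B equals B with A) is an assumption of this sketch rather than something the draft states.

```python
# A few rows transcribed from the draft Table 9. Each key is an unordered
# pair of licence names (assumption: compatibility is symmetric); each value
# is (compatible?, conditions), with an empty string where the table has none.
CONTENT_MATRIX = {
    frozenset({"CC BY 4.0"}): (True, "No restrictions"),
    frozenset({"CC BY 4.0", "CC-BY-SA 4.0"}): (True, "Results under SA"),
    frozenset({"CC BY 4.0", "CC-BY-NC-ND 4.0"}): (False, ""),
    frozenset({"CC-BY-SA 4.0", "CC-BY-NC 4.0"}): (True, "Both restrictions apply"),
    frozenset({"CC-BY-SA 4.0", "CC-BY-NC-SA 4.0"}): (False, ""),
}


def lookup(licence_a, licence_b, matrix=CONTENT_MATRIX):
    """Return (compatible, conditions) for a pair, or None if the pair is not
    listed. A licence paired with itself collapses to a one-element key."""
    return matrix.get(frozenset({licence_a, licence_b}))


if __name__ == "__main__":
    print(lookup("CC-BY-SA 4.0", "CC BY 4.0"))
    print(lookup("CC BY 4.0", "GPLv3"))
```

Returning `None` for unlisted pairs keeps "not yet assessed" distinct from "not compatible", which matters while the tables are still drafts that cover only some licences.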
Table 10 - Compatibility Matrix (draft version 1.0): Software
Licence for software A | Licence for software B | Are they compatible? | Under which conditions?
GPLv3 | GPLv3 | Yes |
GPLv3 | GPLv2 | No | Unless GPLv2 “or any later version”
GPLv3 | Apache v2 | Yes | The licences are compatible as long as derivative works are distributed under GPLv3. The Apache foundation [1] points out that Apache v2 software may be included in GPLv3 projects, but NOT vice versa
GPLv3 | EPL | No | (both copyleft)
GPLv3 | LGPLv3 | Yes |
GPLv2 | GPLv2 | Yes |
GPLv2 | Apache v2 | No |
GPLv2 | EPL | No |
Apache v2 | Apache v2 | Yes |
Apache v2 | EPL | Yes |
Apache v2 | LGPLv2 | No |
Apache v2 | LGPLv3 | No |
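The remark in Table 10 that Apache v2 software may be included in GPLv3 projects, but not vice versa, suggests that software compatibility can depend on direction. A minimal Python sketch of a directional lookup follows; the names `INCLUSION` and `can_include` are hypothetical, and reading the remark as an ordered (component, project) rule is an assumption of this sketch, not a statement from the draft matrix.

```python
from typing import Optional

# Ordered pairs: (licence of the included component, licence of the project).
# Entries follow the draft Table 10; the direction is this sketch's reading
# of the Apache v2 / GPLv3 remark.
INCLUSION = {
    ("Apache v2", "GPLv3"): True,
    ("GPLv3", "Apache v2"): False,
    ("GPLv3", "GPLv3"): True,
    ("Apache v2", "Apache v2"): True,
}


def can_include(component: str, project: str) -> Optional[bool]:
    """Return True/False for a listed (component, project) pair,
    or None when the pair has not been assessed."""
    return INCLUSION.get((component, project))


if __name__ == "__main__":
    print(can_include("Apache v2", "GPLv3"))
    print(can_include("GPLv3", "Apache v2"))
```

Using ordered tuples rather than unordered pairs is the only change needed to represent this kind of one-way compatibility.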
Table 11 - Compatibility Matrix (draft version 1.0): Terms of Service
ToS for Service A | ToS for Service B | Are they compatible? | Under which conditions?
Google Translate [2] | Google search engine [3] | Yes |
Google search engine | Twitter [4] | Yes | Their services interact at different levels, e.g. Twitter contents being indexed in the search engine
Google search engine | Facebook [5] | Yes | However, some conflict may arise at some point (as it occurs with privacy)
[1] This condition is specified by the steward of the license.
[2] https://cloud.google.com/translate/v2/terms
[3] https://www.google.com/policies/terms/
[4] https://dev.twitter.com/overview/terms/agreement-and-policy
[5] https://developers.facebook.com/policy
ToS for Service A | ToS for Service B | Are they compatible? | Under which conditions?
Twitter | Facebook | Yes | As long as the licenses to use their services are not conflicting
Twitter, Facebook | Dropbox [1] | Yes | As long as the licenses to use their services are not conflicting
CLARIN [2] | Twitter, Facebook | No | No third party access to CLARIN resources
Mendeley [3] | SSRN [4] | Yes |
Mendeley | Zotero [5] | Yes |
ContentMine [6] | Mendeley | Yes |

Table 12 - Compatibility Matrix (draft version 2.0): Content
Columns: CC-0 | CC-BY 4.0 | CC-BY-NC 4.0 | CC-BY-SA 4.0 | CC-BY-ND 4.0 | CC-BY-NC-ND 4.0 | CC-BY-NC-SA 4.0
Rows (non-empty cells only, left to right):
CC-0: Yes, no restrictions; Yes, no restrictions; Yes, results under NC; Yes, results under SA
CC-BY 4.0: Yes, no restrictions; Yes, no restrictions; Yes, results under NC; Yes, results under SA
CC-BY-NC 4.0: Yes, results under NC; Yes, results under NC; Yes, results under NC; Yes, results under both restrictions
CC-BY-SA 4.0: Yes, results under SA; Yes, results under SA; Yes, results under both restrictions; Yes, results under SA
CC-BY-ND 4.0: No
CC-BY-NC-ND 4.0: No
CC-BY-NC-SA 4.0: Yes, results under SA; No
[1] https://www.dropbox.com/terms
[2] https://www.clarin.eu/content/licenses-agreements-legal-terms
[3] https://www.mendeley.com/terms/
[4] http://www.ssrn.com/en/index.cfm/terms-of-use/
[5] https://www.zotero.org/support/terms/terms_of_service
[6] http://discuss.contentmine.org/tos
Table 13 - Compatibility Matrix (draft version 2.0): Software
Columns: GPLv3 | GPLv2 | Apache v2 | EPL | LGPLv3 | LGPLv2
Rows (non-empty cells only, left to right):
GPLv3: Yes; No, unless GPLv2 “or any later version”; Yes, but as long as derivative works are distributed under GPLv3; No
GPLv2: No, unless GPLv2 “or any later version”; Yes; No; No
Apache v2: Yes, but as long as derivative works are distributed under GPLv3; Yes; Yes; No; No
EPL: No; No
LGPLv3: Yes; No
LGPLv2: No
Table 14 - Compatibility Matrix (draft version 2.0): Terms of Service
Columns: Google search | Google Translate | Twitter | Facebook | CLARIN | Dropbox | Mendeley | SSRN | Zotero | ContentMine
Rows (non-empty cells only, left to right):
Google search: Yes; Yes; Yes; Yes
Google Translate: Yes
Twitter: Yes; Yes; No; Yes
Facebook: No
CLARIN: No; No
Dropbox: Yes; Yes
Mendeley: Yes; Yes; Yes
SSRN: Yes
Zotero: Yes
ContentMine: Yes