Semantic assets and challenges of ontologies management
Vasily.Bunakov<at>stfc.ac.uk
Science and Technology Facilities Council, United Kingdom
The EMMC IntOP2018 Workshop in Freiburg, 6-7 November 2018
TOC
• STFC and SCD background
• Semantic Assets for Materials Science Task Group
• Lessons from nano-foundries metadata design
• Lessons from elsewhere
• Suggestions on further communication
STFC and SCD background
STFC in a nutshell
~1700 permanent staff
~7500 visitor scientists annually
STFC Scientific Computing Department
Operate and develop IT infrastructure:
• High Performance Computing
• Petabyte data store
• CERN LHC Tier 1 hub
• Data management and data analysis solutions
Do computational science:
• Biology and Life Sciences
• Engineering and Environment
• Computational Chemistry
• Theoretical and Computational Physics
See more at www.stfc.ac.uk/SCD
This is where I come from
Physical Sciences Data Service
• Service to provide data resources to UK Chemistry and Materials Science Community
• Extend a current service: http://cds.rsc.org/
• Provide UK Academic access to commercial chemical databases
• University of Southampton and STFC taking over the service from Jan 2019
• Initially transferring the current service from the Royal Society of Chemistry
• Plan to develop this as a Data Science platform
• Develop it as a resource hub for Physical Sciences
• Extend from Chemistry, to include Materials Science, Chemical Engineering and other related areas
• More Open Science resources
• Provide added value: common metadata, cross search, access to software, training
• Computed (simulated) datasets are identified as a possible territory for the service growth
• The advent of more machine-usable interfaces is foreseen
• Relation with NIST important
Recent EU projects with the STFC SCD contribution
• EUDAT: research data infrastructure
• EOSC: European Open Science Cloud
• VIMMP (well represented in this workshop)
• NFFA: Nanoscience Foundries and Fine Analysis
• FREYA: persistent identifiers in support of Open Science
We also contribute to a number of RDA groups, notably Research data needs of the Photon and Neutron Science community IG and Vocabulary Services IG
Semantic Assets for Materials Science Task Group
• Devised in the RDA Berlin plenary (April 2018), as a result of discussions between STFC and NIST
• Set up within the RDA Vocabularies Interoperability IG
• First online meeting in May 2018, followed by meetings in July and September
• Very open and inclusive group
• ~25 in the mailing list, ~10-12 a typical attendance
• Vasily Bunakov (STFC) and Zachary Trautt (NIST) co-chair
Semantic Assets Task Group scope
• Building an inventory of existing semantic assets for Materials Science: ontologies, vocabularies, controlled terms lists, metadata schemes. This can include not only vocabularies about materials per se but also cover adjacent topics, say instrumentation and chemistry, that are highly relevant for Materials community.
• Monitoring technology for vocabularies building and vocabularies maintenance / updates / curation in Materials domain
• Monitoring use cases and actual practices for semantic assets application in Materials domain. This includes using them in the actual IT services.
• Discussing forms of representation / publishing for semantic assets
• Discussing interoperability between vocabularies: a possibility for cross-walks or sensible links between terms from different vocabularies
Semantic Assets Task Group progress so far
• A good communication channel with representation from Europe and America; liaison with Japan / NIMS requires development
• First experiments with semantic assets registration using NIST platform http://schemas.nist.gov/
• Work on a common vocabulary started
• Potential for the F2F meeting in the RDA Plenary in Philadelphia (April 2019)
• Moving from the RDA Vocabularies Interoperability IG to the RDA/CODATA Materials Data, Infrastructure & Interoperability IG is possible
Lessons from NFFA metadata design
NFFA in a nutshell
• Is a Horizon 2020 project
• Gives access to distributed infrastructure for growth, nano-lithography, nano-characterization, theory and simulation and fine-analysis with synchrotron, FEL and neutron radiation sources
• “Virtual research enterprise” with proposals system and data management obligation
See more at www.nffa.eu
“What artefacts we produce” and “How we discuss them”: Stages of NFFA metadata design
[Diagram: stages of NFFA metadata design]
• Artefacts: Common vocabulary; ER diagram; List of MD elements; Metadata in a serialized form (XML, JSON, RDF, …)
• Discussions: NFFA discussion and amendments; External discussion and amendments; NFFA iterative discussion
• Other metadata, vocabularies and ontologies: CODATA-VAMAS, NOMAD, RDA, …
An example of a semantic asset: A fragment of NFFA Common Vocabulary
• Research User. A person, a group of them, or an institution (organization) who conducts an Experiment on a nanoscience Facility using a nanoscience Instrument in order to collect and analyze Raw Data, or is interested in data collected or analyzed by other Research Users on the same or other Facilities.
• Project. An activity, or a series of activities performed by one or more Research Users on one or more Facilities using one or more Instruments for taking one or more Measurements of one or more Samples during one or more Experiments. Facility, Instrument, Measurement and Sample can refer to a computer simulation environment.
• Facility. An institution (organization), or a division of it, that operates one or more nanoscience Instruments for Research Users. For computer simulation, a Facility can be a software platform that allows ordering and managing computational experiments (so that the software platform serves the purpose of managing software modules that can be considered virtual Instruments).
• Instrument. Identifiable equipment (such as a device or a stand or a line) that allows conducting independent nanoscience research, perhaps without involvement of other Instruments. An Instrument is hosted by a Facility and used by a Research User. An Instrument produces Raw Data in the course of an Experiment. An Instrument can in fact be software for computer simulation (a software module and/or a particular configuration of it).
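The vocabulary fragment above can be read as a small data model. The sketch below is one possible Python rendering and is not the NFFA schema: the class fields, the `is_software` flag, the `facilities()` helper and all example records are assumptions made for illustration. It also shows one of the serialized forms (JSON) named in the design stages.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Facility:
    """An institution (or a division of it) that operates Instruments."""
    name: str


@dataclass
class Instrument:
    """Identifiable equipment, or simulation software, hosted by a Facility."""
    name: str
    hosted_by: Facility
    is_software: bool = False  # hypothetical flag for simulation software


@dataclass
class ResearchUser:
    """A person, a group, or an institution conducting Experiments."""
    name: str


@dataclass
class Project:
    """Activities by Research Users on Facilities using Instruments."""
    title: str
    users: list[ResearchUser] = field(default_factory=list)
    instruments: list[Instrument] = field(default_factory=list)

    def facilities(self) -> list[str]:
        # Facilities are derived from the Instruments the Project uses.
        return sorted({i.hosted_by.name for i in self.instruments})


# Hypothetical records, invented for illustration only.
lab = Facility("Example Nanoscience Lab")
sim = Facility("Example Simulation Platform")
project = Project(
    title="Example project",
    users=[ResearchUser("A. Researcher")],
    instruments=[
        Instrument("Example microscope", hosted_by=lab),
        Instrument("Example solver", hosted_by=sim, is_software=True),
    ],
)

# asdict() converts nested dataclasses recursively, so the result is JSON-ready.
project_json = json.dumps(asdict(project), indent=2)
```

Treating a software platform as a Facility and a software module as an Instrument, as the vocabulary allows, means the same classes cover both physical and computational experiments.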
An example of a semantic asset: ER diagram for NFFA metadata components
“No model is an island”: Mapping and gap analysis exercise
Concepts mapping:
NFFA concept | CODATA-VAMAS concept | NOMAD concept
Experiment | Nano-object production steps | Series of software runs
Measurement | Nano-object testing steps | Software run
Sample | Nano-object or collection of objects | Input data
Data Asset | | Output data

Models coverage / gaps:
Nanotechnology aspect | NFFA model | CODATA-VAMAS model | NOMAD model
Nano-object (sample) | Conceptual | Detailed | Detailed
Computation | Detailed | Unaddressed | Detailed
Experiment lifecycle | Detailed | Conceptual | Conceptual
Data lifecycle | Detailed | Unaddressed | Conceptual
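The mapping and gap analysis above can be kept in machine-usable form. The sketch below is a minimal illustration, not part of the NFFA deliverables: the dictionary layout and the `translate` and `gaps` helpers are hypothetical, and placing the missing "Data Asset" cell under CODATA-VAMAS is an assumption.

```python
# Concept crosswalk taken from the mapping above.
# None marks a cell with no counterpart (its placement is an assumption).
CROSSWALK = {
    "Experiment": {"CODATA-VAMAS": "Nano-object production steps",
                   "NOMAD": "Series of software runs"},
    "Measurement": {"CODATA-VAMAS": "Nano-object testing steps",
                    "NOMAD": "Software run"},
    "Sample": {"CODATA-VAMAS": "Nano-object or collection of objects",
               "NOMAD": "Input data"},
    "Data Asset": {"CODATA-VAMAS": None,
                   "NOMAD": "Output data"},
}

# Coverage levels per nanotechnology aspect and model.
COVERAGE = {
    "Nano-object (sample)": {"NFFA": "Conceptual", "CODATA-VAMAS": "Detailed",
                             "NOMAD": "Detailed"},
    "Computation": {"NFFA": "Detailed", "CODATA-VAMAS": "Unaddressed",
                    "NOMAD": "Detailed"},
    "Experiment lifecycle": {"NFFA": "Detailed", "CODATA-VAMAS": "Conceptual",
                             "NOMAD": "Conceptual"},
    "Data lifecycle": {"NFFA": "Detailed", "CODATA-VAMAS": "Unaddressed",
                       "NOMAD": "Conceptual"},
}


def translate(nffa_concept, target_model):
    """Return the counterpart of an NFFA concept in another model, or None."""
    return CROSSWALK.get(nffa_concept, {}).get(target_model)


def gaps(model):
    """List the aspects that a model leaves unaddressed."""
    return [aspect for aspect, levels in COVERAGE.items()
            if levels[model] == "Unaddressed"]
```

For example, `translate("Sample", "NOMAD")` gives the NOMAD counterpart of an NFFA Sample, and `gaps("CODATA-VAMAS")` lists the aspects that model does not cover.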
“Why do we do it at all”: A place of metadata in a (virtual) Enterprise Architecture
• Use Cases / Business Analysis
• Metadata design
• IT Architecture development
Use Cases, IT Architecture and Metadata can be considered parts of a (virtual) Enterprise Architecture
See more about Enterprise Architecture at https://en.wikipedia.org/wiki/Enterprise_architecture
Lessons from semantic modelling beyond Materials Science
Ontology for finance
200+ organizations
7000+ professionals
Business conceptual model of how all financial instruments, business entities and processes work in the financial industry
www.edmcouncil.org
https://spec.edmcouncil.org/fibo/
FIBO is a well-governed project started circa 2010 and supported by a well-fed world-wide organization
Ontology for finance (continued): FIBO structure vs FIBO teams
• FIBO Leadership Team (FLT)
• FIBO Process Team (FPT)
• FIBO Proof-of-Concept Teams
• FIBO Foundations (FND)
• FIBO Business Entities (BE)
• FIBO Financial Business & Commerce (FBC)
• FIBO Indices and Indicators (IND)
• FIBO Securities & Equities (SEC)
• FIBO Derivatives (DER)
12 vendors are reported so far as having implemented FIBO in their IT solutions.
Not all parts of the model are currently covered by FIBO teams.
Ontology Maturity Model that informs FIBO development process
“The Ontology Maturity Model” by Leo Obrst, 2009 (inspired by CMM / CMMI model for business processes maturity)
(a kind of) Ontology favoured by social science data archives
An international standard for describing surveys, questionnaires, statistical data files, and social sciences study-level information
It took 18 years from the first codification of terms to the first (incomplete) semantic representation. The official serialization is still XML Schema.
www.ddialliance.org
Ontology for bibliography (one of a few out there)
• 1960s: MARC Standards developed
• 1971: MARC becomes a national standard in the US
• 1973: MARC becomes an international standard
• 2002: library technologist Roy Tennant argued that "MARC Must Die", as it is used only within the library community, and designed to be a display, rather than a storage or retrieval format
• 2008: a report from the Library of Congress stated that MARC is "based on forty-year old techniques for data management and is out of step with programming styles of today"
• 2012: the Library of Congress announced that it had contracted with Zepheira, a data management company, to develop a linked data alternative to MARC
• 2012: the library released a draft of the new model, named BIBFRAME
• 2016: the Library of Congress released version 2.0 of BIBFRAME
The actual experiment of transforming MARC records to Linked Data by four national libraries )*
)* As presented in MTSR 2018 conference by Prof. Christos Papatheodorou, Ionian University, Corfu, Greece
Detailed description of experiment: Tallerås, K. (2017). Quality of linked bibliographic data: The models, vocabularies, and links of data sets published by four national libraries. Journal of Library Metadata, 17(2), 126–155. https://doi.org/10.1080/19386389.2017.1355166
Linked Data by 4 national libraries continued (something about semantics and interoperability)
• 3 of 1,141 unique property and class terms are used by all 4 libraries (owl:sameAs, rdf:type, and dct:language)
• 13 terms by (sets of) 3 libraries
• 34 terms by (sets of) 2 libraries
Why these three?
Set | Triples | Entities | Data-level constants
BNB | 104,139,477 | 10,126,344 | 52,671,707
BNE | 71,199,698 | 5,763,188 | 56,681,387
BNF | 304,587,809 | 30,671,400 | 192,224,487
DNB | 329,261,459 | 32,673,901 | 250,613,437
Average | 202,297,111 | 19,808,708 | 138,047,754
Picture credits: “Three stones of wisdom” by http://livertising.net/blog/2013/three-stones-of-wisdom-livertising-exam-concepts/
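Term-overlap counts of the kind quoted on this slide ("used by all 4", "by 3", "by 2") can be obtained with plain set operations. The sketch below is an illustration only, not the method of the cited study: the function names are hypothetical, and the toy term sets are invented and are NOT the libraries' real vocabularies. Only the three shared terms are taken from the slide.

```python
from collections import Counter


def usage_histogram(terms_by_library):
    """Map 'number of libraries using a term' to 'number of such terms'."""
    libraries_per_term = Counter()
    for terms in terms_by_library.values():
        libraries_per_term.update(set(terms))  # count each library once per term
    return Counter(libraries_per_term.values())


def shared_by_all(terms_by_library):
    """Return the terms that every library uses."""
    return set.intersection(*(set(t) for t in terms_by_library.values()))


# Toy data for four hypothetical libraries (invented for illustration).
TOY = {
    "lib1": {"owl:sameAs", "rdf:type", "dct:language", "dct:title"},
    "lib2": {"owl:sameAs", "rdf:type", "dct:language"},
    "lib3": {"owl:sameAs", "rdf:type", "dct:language", "foaf:name"},
    "lib4": {"owl:sameAs", "rdf:type", "dct:language", "dct:title"},
}

histogram = usage_histogram(TOY)
common = shared_by_all(TOY)
```

On real data, `histogram[4]`, `histogram[3]` and `histogram[2]` would correspond to the three bullet points above.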
Ontologies for biology )*
Rationale for ontologies repository:
• Ontologies can be complex
• Ontologies can be big
• Ontologies can change
Ontology repository use cases:
• Search for terms
• Querying the hierarchy
• Querying across relations
)* Simon Jupp (European Bioinformatics Institute, Cambridge, UK). Building a repository of biomedical ontologies with Neo4j. https://www.slideshare.net/thesimonjupp/building-a-repository-of-biomedical-ontologies-with-neo4j
https://www.ebi.ac.uk/ols/index (as of 1 November 2018)
216 ontologies
5,526,032 terms
19,119 properties
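The three repository use cases (search for terms, querying the hierarchy, querying across relations) can be illustrated on a toy in-memory ontology. This is a plain-Python sketch under stated assumptions: it does not use Neo4j or the OLS API, and the term identifiers, labels, the `measures` relation and all function names are hypothetical.

```python
# Toy ontology: term id -> label and direct parents (all invented).
TERMS = {
    "EX:1": {"label": "material", "parents": []},
    "EX:2": {"label": "nanomaterial", "parents": ["EX:1"]},
    "EX:3": {"label": "nanotube", "parents": ["EX:2"]},
    "EX:4": {"label": "instrument", "parents": []},
}

# Non-hierarchical relations as (subject, relation, object) triples.
RELATIONS = [("EX:4", "measures", "EX:2")]


def search(text):
    """Use case 1: find terms whose label contains the text."""
    return sorted(t for t, d in TERMS.items()
                  if text.lower() in d["label"].lower())


def ancestors(term_id):
    """Use case 2: walk up the hierarchy from a term."""
    seen = []
    stack = list(TERMS[term_id]["parents"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(TERMS[current]["parents"])
    return seen


def related(term_id, relation):
    """Use case 3: subjects linked by a relation to the term or its ancestors."""
    targets = {term_id, *ancestors(term_id)}
    return sorted(s for s, r, o in RELATIONS
                  if r == relation and o in targets)
```

A graph database serves the same queries at the scale quoted above, where a dictionary scan like this one would no longer be practical.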
Semantic modelling and technology with no RDF involved
Flexible MDM (Master Data Management) with graph database: https://neo4j.com/case-studies/schleich/
Picture credits: https://www.ebay.co.uk/usr/bargain-vapes
We may have learned something about semantic interoperability…
• Ontologies / semantic assets development takes substantial effort. Having a proper process may help
• Having different practices of application for the same semantic asset is normal
• Having multiple semantic assets for the same domain is normal
• Semantics can be expressed and exploited using various modelling techniques and IT solutions
…but there are other flavours of interoperability beyond semantics )*
Challenge | Popular response
Syntactic interoperability | Common terminology, common XML schemas
Technical interoperability | Configurable and well-governed software, well-specified APIs
Semantic interoperability | Clear identification of all concepts, connections between them, and inference rules
)* For "layered" interpretation of these interoperability aspects, see Andreas Tolk et al. Composable M&S Web Services for Net-Centric Applications. The Journal of Defense Modeling and Simulation. Vol. 3(1), pp. 27-44 (2006). https://doi.org/10.1177/875647930600300104 (kindly indicated by Zachary Trautt, NIST)
… also interoperability is not an end in itself
• There is often a trade-off between interoperability and extensibility
• Use cases and success stories are important
• Tools and technology to support semantic modelling and models reuse are important, not only for IT infrastructure, but as a communication aid and as a means of discourse
(not mutually exclusive) Solutions for Interoperability and Reproducibility of data-intensive R&D
• Sensible governance and quality documentation for IT implementations
• Metadata exchange format or self-documented data exchange formats
• APIs specifications (can be self-documented, too)
• OO design frameworks with well-defined objects for a specific domain
• DSLs (domain-specific programming languages)
• Schema languages / specifications, including for RDF
• Ontologies
• Workflows (for a smaller number of well-defined objects compared to the OO design approach, perhaps just one common object) and engines for the workflows execution )*
FA??->FAIR
)* See Sean Bechhofer et al. “Why linked data is not enough for scientists”. https://doi.org/10.1016/j.future.2011.08.004
They refer to www.myexperiment.org as a platform for the new kind of research discourse empowered by workflows
(Relatively) new kid on the block: SHACL
https://www.w3.org/TR/shacl/
[Diagram: RDF and related standards]
• RDF (statements): What is being said?
• RDFS: What words do we have?
• OWL: What makes logical sense to say?
• SPARQL: What did you say about XYZ?
• SHACL: Is that word used correctly? What do you need to know from me? You can't say that here! I'd never say that!
The diagram replicates the one in Richard Cyganiak’s 2016 presentation “SHACL: Shaping the Big Ball of Data Mud” https://www.slideshare.net/cygri/shacl-shaping-the-big-ball-of-data-mud
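The idea behind SHACL, checking data against declared "shapes", can be conveyed with a toy validator. The sketch below is plain Python and is not SHACL itself or any SHACL library: the shape dictionary only mimics the cardinality constraints that SHACL expresses with sh:minCount and sh:maxCount, and the `ex:` names, the data and the `validate` function are hypothetical. The "Instrument is hosted by Facility" rule is borrowed from the NFFA vocabulary earlier in this talk.

```python
# A tiny "shape": cardinality rules for properties of one target class.
SHAPE = {
    "target_class": "ex:Instrument",
    "properties": {
        "ex:hostedBy": {"min_count": 1, "max_count": 1},
        "ex:name": {"min_count": 1},
    },
}

# Toy data as (subject, predicate, object) triples; ex:solver1 has no host.
DATA = [
    ("ex:microscope1", "rdf:type", "ex:Instrument"),
    ("ex:microscope1", "ex:name", "Example microscope"),
    ("ex:microscope1", "ex:hostedBy", "ex:lab1"),
    ("ex:solver1", "rdf:type", "ex:Instrument"),
    ("ex:solver1", "ex:name", "Example solver"),
]


def validate(triples, shape):
    """Return a list of violation messages (empty if the data conforms)."""
    focus_nodes = sorted(s for s, p, o in triples
                         if p == "rdf:type" and o == shape["target_class"])
    violations = []
    for node in focus_nodes:
        for prop, rule in shape["properties"].items():
            count = sum(1 for s, p, _ in triples if s == node and p == prop)
            if count < rule.get("min_count", 0):
                violations.append(
                    f"{node}: {prop} occurs {count} time(s), "
                    f"expected at least {rule['min_count']}")
            if "max_count" in rule and count > rule["max_count"]:
                violations.append(
                    f"{node}: {prop} occurs {count} time(s), "
                    f"expected at most {rule['max_count']}")
    return violations


violations = validate(DATA, SHAPE)
```

Here the validator plays the "You can't say that here!" role: the data is legal RDF-style triples, yet it fails the local rule that every Instrument has exactly one host.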
Communication with a wider community of semantic modellers and technologists that can be beneficial for Materials Science
• Fintech / FIBO community can advise on quality governance for the ontology development. Look online, approach them directly, or I can see what I can do
• Bio-informaticians may be able to advise on management of multiple semantic assets, and on their actual use for indexing. Look online, ask EMBL-EBI (UK), directly or using me as a proxy
• EUON (European Ontology Network): only one workshop so far, supported by EUDAT project. If interested, ask Yann le Franc (co-chair of the RDA Vocabularies Interoperability IG), directly or using me as a proxy
• There are pockets of European expertise in semantic modelling & visualization tools. If interested, ask Kārlis Čerāns (University of Latvia), directly or using me as a proxy
Picture: FOAF (friend of a friend) ontology logo
Opportunities and goals for further discussions
• Semantic Assets for Materials Science task group in RDA (next call 28th November 14:00 CET)
• EMMC International Workshop in Vienna (February 2019)
• RDA groups and RDA plenary in Philadelphia (April 2019)
• DAMDID conference and a potential workshop on informatics for materials science in Kazan or Moscow (October 2019)
• Possible synergies between EMMC and Physical Sciences Data Service (with service vision developed through 2019)
• Future EU projects