Data FAIRport Skunkworks: Common Repository Access Via Meta-Metadata Descriptors by Mark Wilkinson

  • Published on
    12-Aug-2015

Transcript

  1. This presentation is licensed CC-BY (Mark Wilkinson, markw@illuminae.com): https://goo.gl/ts3hLW
  2. Data FAIRport Skunkworks: Common repository access via meta-meta-descriptors. EU Lead: Mark Wilkinson, Isaac Peral Distinguished Researcher, CBGP-UPM, Madrid. USA Lead: Michel Dumontier, Associate Professor, Biomedical Informatics, Stanford, USA. FAIRport Project Lead: Barend Mons, Professor, Leiden University Medical Centre, Netherlands.
  3. What is a FAIRport? Findable: (meta)data should be uniquely and persistently identifiable. Accessible: identifiers should provide a mechanism for (meta)data access, including authentication, access protocol, license, etc. Interoperable: (meta)data should be machine-accessible, using a machine-parseable syntax and, where possible, shared common vocabularies. Reusable: there should be sufficient machine-readable metadata that it is possible to integrate like-with-like, and that component data objects can be precisely and comprehensively cited post-integration.
  4. The Problem
  5. End-user view of The Problem (tissue-rejection experimental context): Today, I'm looking for microarray data of human liver cells on a time course following liver transplant. What repositories could contain such data? GEO? EUDat? FigShare? Dryad? Atlas? Which fields in those repositories would I need to search, and using what vocabularies, to find the relevant microarray studies?
  6. Dissecting the problem: There are a lot of repositories! General purpose: DataVerse, Dryad, EUDat, Figshare, etc. Special purpose: PDB, UniProt, NCBI, GEO, Atlas, EnsEMBL.
  7. Dissecting the problem: The lack of harmonized metadata structures, or even of rich descriptions of the contents of these repositories, prevents us from (for example): knowing where to look for certain types of data; knowing whether two repositories contain records about the same thing; cross-referencing or joining across repositories to integrate disparate data about the same thing; and knowing which repository I could/should deposit my data to (and how).
  8. Skunkworks Challenge: If we wanted to enable this kind of FAIR discovery and integration over myriad repositories, what infrastructure (existing or new) would we need?
  9. Skunkworks Challenge (continued): Discussions with Tim Clark revealed that the core objectives of Skunkworks were very similar to those of Force 11 Data Citation Implementation Working Group Team 4 (Common repository interfaces)... so we joined forces :-)
  10. The Solution?
  11. Shared metadata descriptors? They already exist (e.g. DCAT)! But they are not (yet) widely implemented, and they are not sufficiently rich: they only describe core metadata, whereas we need to query, e.g., experimental context and domain-specific metadata.
  12. So... extend DCAT?
  13. So... extend DCAT? ...extend it where? There are too many specialist domains & data; there is resistance to harmonization and resistance to implementation (time, money, expertise, "just don't care"). Attempting to impose standards is a mug's game!
  14. A common provider-implemented API?
  15. A common provider-implemented API, à la TDWG/TAPIR and caBIO? The same problems apply: too many specialist domains & data; resistance to harmonization; resistance to implementation (time, money, expertise, "just don't care"). Attempting to impose standards is a mug's game!
  16. Where else could the solution be? What exactly *is* our problem?
  17. What exactly *is* our problem? [Diagram: a Data Record (e.g. XML, RDF)]
  18. [Diagram build: a Data Schema (e.g. XMLS, RDFS) defines the Data Record]
  19. [Diagram build: a Metadata Record (e.g. DCAT-compliant RDF) describes the Data Record]
  20. [Diagram build: the DCAT RDFS Schema defines the Metadata Record; both the DCAT-compliant Metadata Record and the DCAT RDFS Schema apply only IF the repository uses DCAT]
  21. If everyone used DCAT, we could at least query the core metadata of all repositories... but they don't... and the core isn't rich enough anyway...
  22. What exactly *is* our problem? REALITY: [Diagram: three repositories side by side, each with a data record, a data schema, a metadata record and a metadata schema: (1) XML Data Record, XMLS Data Schema, DCAT RDF Metadata Record, DCAT RDFS Schema; (2) RDF Data Record, RDFS Data Schema, UniProt RDF Metadata Record, UniProt RDFS Metadata Schema; (3) ACEDB Data Record, ACEDB Data Schema, DragonDB Form Metadata Record, DragonDB Form Metadata Schema]
  23. [Same diagram] Repositories don't all use the DCAT schema.
  24. [Same diagram] Those that use the DCAT schema use only parts of it.
  25. [Same diagram] Those that don't use DCAT use a myriad of alternatives (some very loosely defined)...
  26. [Same diagram] ...and don't necessarily use all elements of those alternatives either.
  27. [Same diagram] So we need to find a way to do RICH queries over all of these?
  28. [Same diagram] We need a way to describe the descriptors...
  29. Desiderata of meta-meta-descriptors. They must: describe legacy data (i.e. not just DCAT or other modern data); describe a multitude of data formats (XML, RDF, key/value, etc.); be capable of describing any kind of value constraint, e.g. plain text, numerical, an arbitrary controlled vocabulary (CV), rdf:range, or an equivalent OWL construct; be modular, identifiable, shareable, and reusable (to stem the proliferation of new formats); be hierarchical, to allow composite re-use of shared descriptors; use standard technologies, and re-use existing vocabularies where possible; be extremely lightweight and trivial to create; and NOT require the participation of the repository host (no buy-in required).
  30. The Solution? (or at least, our best attempt to date!)
  31. Skunkworks Task #1 - [F]indable: Invent harmonized cross-repository meta-descriptors. Exemplar use cases: a piece of software that can generate a sensible data submission form for any repository (I presented a working example of this at the Force 2015 meeting a few months ago, so I won't repeat it today); and a piece of software that can generate a sensible query form/interface for any repository (demonstrated today!).
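One way to picture the query-form use case of slide 31 is a generator that turns each property listed in a profile into a form widget. The Python sketch below is illustrative only: it assumes a simplified, in-memory profile in which each property carries a URI, a label, minCount/maxCount and an optional list of allowed values (the real FAIR Profile Schema, slides 40-44, is an OWL vocabulary). The `ProfileProperty` class, the `build_query_form` function and every example.org URI are hypothetical.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional


@dataclass
class ProfileProperty:
    """One property from a simplified, hypothetical FAIR Profile."""
    uri: str                          # predicate URI in the repository's metadata schema
    label: str                        # human-readable label shown on the form
    min_count: int = 0                # minCount from the profile
    max_count: Optional[int] = None   # maxCount; None means unbounded
    allowed_values: List[str] = field(default_factory=list)  # empty list = free text


def build_query_form(action_url: str, properties: List[ProfileProperty]) -> str:
    """Render an HTML query form with one widget per profile property."""
    rows = []
    for prop in properties:
        name = escape(prop.uri, quote=True)
        if prop.allowed_values:
            # Enumerated values (e.g. from a SKOS concept scheme) become a drop-down;
            # allow multiple selections unless the property is single-valued.
            multiple = "" if prop.max_count == 1 else " multiple"
            options = "".join(
                f'<option value="{escape(v, quote=True)}">{escape(v)}</option>'
                for v in prop.allowed_values
            )
            widget = f'<select name="{name}"{multiple}><option value=""></option>{options}</select>'
        else:
            widget = f'<input type="text" name="{name}">'
        # In a *query* form nothing is mandatory; minCount >= 1 only tells the
        # user that every record carries this field, so filtering on it is safe.
        hint = " (present in every record)" if prop.min_count >= 1 else ""
        rows.append(f"  <label>{escape(prop.label)}{hint} {widget}</label>")
    return "\n".join(
        [f'<form method="get" action="{escape(action_url, quote=True)}">', *rows,
         '  <button type="submit">Search</button>', "</form>"]
    )


form = build_query_form(
    "https://repository.example.org/search",  # hypothetical search endpoint
    [
        ProfileProperty("http://example.org/schema#organism", "Organism",
                        min_count=1, max_count=1,
                        allowed_values=["Homo sapiens", "Mus musculus"]),
        ProfileProperty("http://example.org/schema#tissue", "Tissue"),
    ],
)
print(form)
```

Enumerated values become drop-downs and everything else becomes free text; because a query form never forces the user to fill anything in, minCount is surfaced only as a hint.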
  32. FAIR Profiles: FAIR Profiles provide a common way to describe a repository's metadata (and data, for that matter!)
  33. What FAIR Profiles do: [Same three-repository diagram as slide 22]
  34. What FAIR Profiles do: [Diagram build: a FAIR Profile is added for each metadata schema: FAIR Profile (DCAT Schema), FAIR Profile (UniProt Metadata Schema), FAIR Profile (DragonDB Metadata Schema)]
  35. Though they are potentially describing very different things (from Web FORM fields to OWL ontologies!), all FAIR Profiles are written using the same vocabulary and structure, defined by...
  36. [Same diagram as slide 34]
  37. The FAIR Profile Schema
  38. [Diagram: the Repo. Data Schema (e.g. XMLS, RDFS) defines the Repo. Data Record (e.g. XML, RDF); the Repository Metadata Record describes the Repo. Data Record; the Repository Metadata Schema defines the Repository Metadata Record; the FAIR Profile Schema defines the Repository's FAIR Profile; and a "~Describes" arrow runs from the Repository's FAIR Profile to the Repository Metadata Schema]
  39. [Same boxes as slide 38, with only the arrows labelled "Defines", "Defines" and "~Describes" remaining]
  40. FAIR Profile Schema: a very small OWL vocabulary for writing meta-meta-descriptors (http://datafairport.org/schema/FAIR-schema.owl). [Diagram: a FAIR Profile is associated ("describes dataset") with Dataset metadata (W3C HCLS Dataset Description: license, rights, citation metadata, etc.) and links via hasClass to FAIR Classes; a FAIR Class "describes class" → owl:Class (URI or de novo definition) and links via hasProperty to FAIR Properties; a FAIR Property "describes property" → rdf:Property (owl:ObjectProperty or owl:DatatypeProperty) and carries minCount (xsd:integer), maxCount (xsd:integer) and allowedValues (xsd:anyURI). Labels use skos:preferredLabel (rdf:langString).]
  41. [Same diagram as slide 40, highlighting the Dataset (W3C HCLS Dataset Description) element: license, rights, citation metadata, etc.]
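The nesting on slide 40 (FAIR Profile → hasClass → FAIR Class → hasProperty → FAIR Property, with minCount/maxCount) maps naturally onto nested records. As a minimal sketch, assuming a repository metadata record can be flattened into a dictionary from property URI to a list of values, the Python below mirrors that nesting and uses the cardinality constraints to report violations. The class and function names are hypothetical; the actual vocabulary is the OWL file at http://datafairport.org/schema/FAIR-schema.owl.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FairProperty:
    describes_property: str               # URI of the rdf:Property being described
    min_count: int = 0                    # minCount (xsd:integer)
    max_count: Optional[int] = None       # maxCount (xsd:integer); None = unbounded
    allowed_values: Optional[str] = None  # allowedValues (xsd:anyURI); not checked here


@dataclass
class FairClass:
    describes_class: str                  # URI of the owl:Class being described
    properties: List[FairProperty] = field(default_factory=list)  # hasProperty


@dataclass
class FairProfile:
    uri: str
    classes: List[FairClass] = field(default_factory=list)  # hasClass


def cardinality_errors(fair_class: FairClass,
                       record: Dict[str, List[str]]) -> List[str]:
    """Return human-readable minCount/maxCount violations for one record."""
    errors = []
    for prop in fair_class.properties:
        n = len(record.get(prop.describes_property, []))
        if n < prop.min_count:
            errors.append(f"{prop.describes_property}: {n} value(s), minCount {prop.min_count}")
        if prop.max_count is not None and n > prop.max_count:
            errors.append(f"{prop.describes_property}: {n} value(s), maxCount {prop.max_count}")
    return errors


DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

dataset_class = FairClass(
    describes_class=DCAT + "Dataset",
    properties=[
        FairProperty(DCT + "title", min_count=1, max_count=1),  # exactly one title
        FairProperty(DCAT + "keyword"),                          # any number of keywords
    ],
)
profile = FairProfile("http://example.org/profiles/dcat-core", [dataset_class])

record = {DCT + "title": ["Liver transplant time course", "Duplicate title"]}
print(cardinality_errors(profile.classes[0], record))
# ['http://purl.org/dc/terms/title: 2 value(s), maxCount 1']
```

The allowedValues field is carried along but deliberately left unchecked here, since resolving it requires knowing what kind of resource its URI points at.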
  42. allowedValues (xsd:anyURI)
  43. allowedValues (xsd:anyURI): the URI must resolve to an XSD, a SKOS Concept Scheme, or another FAIR Profile. It describes the constraints on the possible values for a predicate in the target repository's metadata schema.
  44. NOTE: we cannot use rdfs:range because we are meta-modelling a schema! At the meta-model level the predicate is a CLASS, so the use of rdfs:range is not appropriate.
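Because allowedValues may point at three different kinds of resource (slide 43), a client consuming a profile has to dispatch on what the URI resolves to. To keep the sketch runnable offline, resolution is faked with an in-memory table; the `RESOLVED` table, the example vocabulary and profile URIs, and the restriction to two XSD datatypes with approximate lexical checks are all assumptions. Real code would dereference the URI and inspect the RDF or XSD it returns.

```python
import re
from typing import Callable, Dict, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical stand-in for dereferencing allowedValues URIs offline:
# each URI maps to (kind of resource it resolves to, payload).
RESOLVED: Dict[str, Tuple[str, object]] = {
    XSD + "integer": ("xsd", None),
    XSD + "date": ("xsd", None),
    "http://example.org/vocab/tissues": ("skos", {
        "http://example.org/vocab/tissues#liver",
        "http://example.org/vocab/tissues#kidney",
    }),
    "http://example.org/profiles/contact": ("fair_profile", None),
}

# Approximate lexical-form checks for the two XSD datatypes in this sketch.
XSD_CHECKS: Dict[str, Callable[[str], bool]] = {
    XSD + "integer": lambda v: re.fullmatch(r"[+-]?\d+", v) is not None,
    XSD + "date": lambda v: re.fullmatch(
        r"-?\d{4}-\d{2}-\d{2}(Z|[+-]\d{2}:\d{2})?", v) is not None,
}


def value_allowed(allowed_values_uri: str, value: str) -> bool:
    """Check one metadata value against a FAIR Property's allowedValues URI."""
    kind, payload = RESOLVED[allowed_values_uri]
    if kind == "xsd":
        # A datatype: the literal's lexical form must be valid for it.
        return XSD_CHECKS[allowed_values_uri](value)
    if kind == "skos":
        # A SKOS Concept Scheme: the value must be one of its concept URIs.
        return value in payload
    if kind == "fair_profile":
        # Another FAIR Profile: the value should be a nested resource. A full
        # implementation would fetch it and validate it against that profile
        # recursively; this sketch only checks that it looks like an HTTP(S) URI.
        return value.startswith(("http://", "https://"))
    raise ValueError(f"unsupported allowedValues target: {kind}")


print(value_allowed(XSD + "integer", "42"))          # True
print(value_allowed(XSD + "date", "15/08/2015"))     # False
print(value_allowed("http://example.org/vocab/tissues",
                    "http://example.org/vocab/tissues#liver"))  # True
```

The constraint lives on the FAIR Property description rather than on the predicate itself, which is the meta-modelling point slide 44 makes about rdfs:range.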
  45. A FAIR Profile (an RDF document that follows the FAIR Profile Schema). [Diagram: the Metadata Record / Metadata Schema / FAIR Profile / FAIR Profile Schema stack, with "This" pointing to the FAIR Profile]
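Since a FAIR Profile is just an RDF document, a small one can be emitted with string formatting alone. The sketch below writes a one-class profile in Turtle. The namespace is the schema URL from slide 40 with a `#` appended, and the term spellings (`fair:FAIRProfile`, `fair:FAIRClass`, `fair:FAIRProperty`, `fair:hasClass`, `fair:hasProperty`, `fair:describesClass`, `fair:describesProperty`, `fair:minCount`, `fair:maxCount`, `fair:allowedValues`) are hypothetical renderings of the slide's labels; check them against the published OWL file before relying on them.

```python
from typing import List, NamedTuple, Optional

FAIR_NS = "http://datafairport.org/schema/FAIR-schema.owl#"  # '#' suffix assumed


class FairProp(NamedTuple):
    node: str                             # IRI of this FAIR Property description
    predicate: str                        # rdf:Property it describes
    min_count: int = 0
    max_count: Optional[int] = None
    allowed_values: Optional[str] = None  # XSD / SKOS scheme / FAIR Profile IRI


def profile_to_turtle(profile_iri: str, class_iri: str, described_class: str,
                      props: List[FairProp]) -> str:
    """Serialize a one-class FAIR Profile as Turtle (term names are hypothetical)."""
    out = [f"@prefix fair: <{FAIR_NS}> .",
           "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
           "",
           f"<{profile_iri}> a fair:FAIRProfile ;",
           f"    fair:hasClass <{class_iri}> .",
           ""]
    class_stmts = [f"fair:describesClass <{described_class}>"]
    class_stmts += [f"fair:hasProperty <{p.node}>" for p in props]
    out.append(f"<{class_iri}> a fair:FAIRClass ;\n    " + " ;\n    ".join(class_stmts) + " .")
    for p in props:
        # Bare integers are xsd:integer literals in Turtle.
        stmts = [f"fair:describesProperty <{p.predicate}>", f"fair:minCount {p.min_count}"]
        if p.max_count is not None:
            stmts.append(f"fair:maxCount {p.max_count}")
        if p.allowed_values is not None:
            stmts.append(f'fair:allowedValues "{p.allowed_values}"^^xsd:anyURI')
        out += ["", f"<{p.node}> a fair:FAIRProperty ;\n    " + " ;\n    ".join(stmts) + " ."]
    return "\n".join(out) + "\n"


ttl = profile_to_turtle(
    "http://example.org/profiles/dcat-core",
    "http://example.org/profiles/dcat-core#dataset",
    "http://www.w3.org/ns/dcat#Dataset",
    [FairProp("http://example.org/profiles/dcat-core#title",
              "http://purl.org/dc/terms/title", 1, 1),
     FairProp("http://example.org/profiles/dcat-core#theme",
              "http://www.w3.org/ns/dcat#theme",
              allowed_values="http://example.org/vocab/themes")],
)
print(ttl)
```

allowedValues is written as an xsd:anyURI literal, matching the range shown on slide 42, rather than as an IRI object.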
  46. What a FAIR Profile is: A meta-description of the (meta)data in a repository.
  47. What a FAIR Profile is NOT: THE meta-description of the (meta)data in a repository.
  48. What a FAIR Profile is: A meta-description of the (meta)data in a repository as viewed from a particular perspective (also known as a "lens"* over the data). * Scientific Lenses to Support Multiple Views over Linked Chemistry Data; DOI:10.1007/978-3-319-11964-9_7
  49. ...this is where the FAIRport approach becomes distinctly powerful!
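The lens idea of slide 48 can be shown without any RDF machinery: two profiles over the same repository expose two different views of the same record, and neither is THE description. In this sketch a profile is reduced to a mapping from repository field names to the property URIs it exposes; that reduction, the field names, the record contents and the example.org URIs are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, List[str]]
Lens = Dict[str, str]  # repository field name -> property URI exposed by the profile


def view_through(lens: Lens, record: Record) -> Record:
    """Project one repository record through a profile acting as a lens.

    Only fields the lens mentions survive, re-keyed to the lens's property URIs;
    the underlying record is never modified.
    """
    return {uri: list(record[name]) for name, uri in lens.items() if name in record}


record: Record = {
    "title": ["Liver transplant time course"],
    "organism": ["Homo sapiens"],
    "platform": ["microarray"],
    "submitter": ["J. Doe"],
}

# Two hypothetical profiles over the same repository.
citation_lens: Lens = {"title": "http://purl.org/dc/terms/title",
                       "submitter": "http://purl.org/dc/terms/creator"}
experiment_lens: Lens = {"organism": "http://example.org/exp#organism",
                         "platform": "http://example.org/exp#platform",
                         "timepoint": "http://example.org/exp#timepoint"}

print(view_through(citation_lens, record))
print(view_through(experiment_lens, record))
```

Fields absent from the record (here `timepoint`) simply do not appear in that view, so one repository can serve several communities without changing its own schema.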
  50. ...but first, let's look at the other FAIRport components.
  51. Skunkworks Task #2 - [A]ccessible: Are there already access-layer definitions?
  52. Yes: the W3C Linked Data Platform (LDP), a set of behaviors for providing a unified (albeit simplistic!) access layer for records contained in any Web resource.
  53. The LDP server sits at a URL, waiting.
  54. GET: the client calls HTTP GET on the URL (that's all!)
  55. The LDP server communicates with the repository (how? entirely up to you!)
  56. The repository returns data about the available records (how? entirely up to you!)
  57. The LDP server returns an RDF representation of the list of record URLs: URL1, URL2, URL3, URL4, URL5, URL6, ...
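Steps 53-57 correspond to reading a W3C Linked Data Platform (LDP) Basic Container: a GET on the container URL returns RDF whose `ldp:contains` triples list the member resources. The sketch below builds that Turtle body; the "how? entirely up to you!" conversation with the repository is modelled as a plain callable, and the base URL and record identifiers are hypothetical.

```python
from typing import Callable, List


def container_turtle(container_url: str,
                     fetch_record_ids: Callable[[], List[str]]) -> str:
    """Build the Turtle body an LDP Basic Container returns on GET.

    fetch_record_ids stands in for "LDP communicates with the repository":
    how it obtains the identifiers is entirely up to the implementer.
    Member URLs point back at this LDP server, not at the repository.
    """
    base = container_url.rstrip("/") + "/"
    members = [f"<{base}{rid}>" for rid in fetch_record_ids()]
    turtle = ("@prefix ldp: <http://www.w3.org/ns/ldp#> .\n\n"
              f"<{container_url}> a ldp:BasicContainer")
    if members:
        turtle += " ;\n    ldp:contains " + ",\n        ".join(members)
    return turtle + " .\n"


print(container_turtle("https://ldp.example.org/records/",
                       lambda: ["rec1", "rec2", "rec3"]))
```

A real LDP server would also send `Content-Type: text/turtle` and a `Link: <http://www.w3.org/ns/ldp#BasicContainer>; rel="type"` header, as the LDP specification requires for containers.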
  58. GET URL6: the URLs (should) point back to the LDP server.
  59. The LDP server communicates with the repository about that record.
  60. The LDP server returns DCAT Distributions for all the formats of that record that the repository provides: URL6a, URL6b
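For step 60, the member resource answers with DCAT metadata listing one `dcat:Distribution` per format, each with a `dcat:downloadURL` that points straight at the repository. A minimal sketch, assuming the LDP layer already knows (by whatever means) which formats exist for the record; the URLs are examples, the `#distN` fragment naming is an arbitrary choice, and media types are written as IANA media-type IRIs.

```python
from typing import Dict

IANA = "http://www.iana.org/assignments/media-types/"


def distributions_turtle(record_url: str, formats: Dict[str, str]) -> str:
    """Describe every format the repository offers for one record.

    formats maps a media type (e.g. "application/xml") to the URL at which the
    repository itself serves the record in that format; the client will GET
    that URL directly, bypassing the LDP layer.
    """
    lines = ["@prefix dcat: <http://www.w3.org/ns/dcat#> .", ""]
    nodes = [f"<{record_url}#dist{i}>" for i in range(1, len(formats) + 1)]
    if nodes:
        lines += [f"<{record_url}> a dcat:Dataset ;",
                  "    dcat:distribution " + ", ".join(nodes) + " ."]
    else:
        lines.append(f"<{record_url}> a dcat:Dataset .")
    for node, (media_type, url) in zip(nodes, formats.items()):
        lines += ["",
                  f"{node} a dcat:Distribution ;",
                  f"    dcat:mediaType <{IANA}{media_type}> ;",
                  f"    dcat:downloadURL <{url}> ."]
    return "\n".join(lines) + "\n"


ttl = distributions_turtle(
    "https://ldp.example.org/records/rec6",
    {"application/xml": "https://repo.example.org/rec6.xml",
     "text/csv": "https://repo.example.org/rec6.csv"},
)
print(ttl)
```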
  61. You call the repository directly, using the URL of your choice: GET URL6a
  62. The repository returns the data you requested (Content-type: application/xml; "Yummy Data Here!"). Note: most repositories already do this! So we're half-way there :-)
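On the client side, steps 60-62 reduce to choosing a distribution whose media type you can handle and then issuing an ordinary GET to the repository. The sketch below assumes each distribution in the record's Turtle is written with `dcat:mediaType` (as an IANA media-type IRI) immediately followed by `dcat:downloadURL`; the regular expression only keeps the example dependency-free, and a real client should use a proper RDF parser. The sample URLs are hypothetical.

```python
import re
import urllib.request
from typing import List, Optional

# Matches one distribution's media type and download URL, assuming the
# mediaType statement precedes downloadURL as in SAMPLE below.
DIST_RE = re.compile(
    r"dcat:mediaType\s+<http://www\.iana\.org/assignments/media-types/([^>]+)>\s*;"
    r"\s*dcat:downloadURL\s+<([^>]+)>"
)


def choose_download_url(record_turtle: str, preferred: List[str]) -> Optional[str]:
    """Return the download URL of the first preferred media type on offer."""
    available = dict(DIST_RE.findall(record_turtle))
    for media_type in preferred:
        if media_type in available:
            return available[media_type]
    return None


def fetch(url: str, media_type: str) -> bytes:
    """Final step: a plain HTTP GET straight to the repository."""
    request = urllib.request.Request(url, headers={"Accept": media_type})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


SAMPLE = """@prefix dcat: <http://www.w3.org/ns/dcat#> .
<https://ldp.example.org/records/rec6#dist1> a dcat:Distribution ;
    dcat:mediaType <http://www.iana.org/assignments/media-types/application/xml> ;
    dcat:downloadURL <https://repo.example.org/rec6.xml> .
<https://ldp.example.org/records/rec6#dist2> a dcat:Distribution ;
    dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv> ;
    dcat:downloadURL <https://repo.example.org/rec6.csv> .
"""

url = choose_download_url(SAMPLE, ["text/turtle", "text/csv", "application/xml"])
print(url)  # https://repo.example.org/rec6.csv
# data = fetch(url, "text/csv")  # needs a real repository behind the URL
```

Only the listing and the distribution metadata pass through the LDP layer; the data itself comes from the repository's existing URLs, which is why most repositories are already "half-way there".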
  63. The first time I wrote one of these from scratch, it was about 170 lines of code and took less than 4 hours (including reading the W3C documentation!)
  64. When one of these is associated with a FAIR Profile, we call it a FAIR Accessor.
  65. Skunkworks Task #3 - [I]nteroperable: This is the holy grail!!
  66. Skunkworks Task #3 - [I]nteroperable: This is the holy grail...
