
2010 CLARA Nijmegen - Data Seal of Approval tutorial



A tutorial for the participants of the CLARA summer school about the Data Seal of Approval: philosophy and practice for quality in research data.



Quality and improving interoperability between language resources: trust, process, simplicity

[email protected]

coordinator infrastructure at http://www.dans.knaw.nl

http://www.datasealofapproval.org/


Quality and interoperability

evolution

hard-to-fake traits

indicating fitness

promote interoperability


Overview

• Introduction and Theory
  • qualities
  • trust, simplicity
  • guidelines

• Process and Demo
  • assessment and review

• Discussion and Application
  • CLARIN centers
  • language resources


Introduction and theory


Scientific Quality

http://www.ploscompbiol.org/article/metrics/info:doi/10.1371/journal.pcbi.1000112


Scientific quality

• transparent
  • from producer
  • through repository
  • to consumer

• properties to guard
  • authenticity
  • integrity
  • provenance


Usage quality

• data formats
  • usability

• metadata
  • findability
  • intelligibility


Quality control

• by the stakeholders
  • data producers
  • data custodians
  • data consumers

• custodians = repositories
  • substantial role for repositories

• guidelines for producers
• agreements for consumers


Quality issues

• metadata standards
  • CMDI and www.isocat.org

• preferred formats
  • TEI, XML

• referencing systems
  • persistent identifiers

• long-term preservation
  • after the live environment has died off

• interoperability
  • OAI-PMH
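OAI-PMH, listed above as the interoperability mechanism, is a plain HTTP protocol: a harvester sends a verb such as `ListRecords` together with a `metadataPrefix`, and pages through large result sets with a `resumptionToken`. The sketch below only builds request URLs and parses one response page. The base URL and the sample response are hypothetical, and `oai_dc` (unqualified Dublin Core, which every OAI-PMH repository must support) is assumed as the metadata format.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the token for the next page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in
           root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

# Hypothetical, heavily trimmed response page (real records also carry metadata).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would fetch `list_records_url(base)`, pass the body to `parse_page`, and repeat with the returned token until it is `None`.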


Quality issues

• search engines
  • CLARIN search and develop

• access rights
  • comply with privacy law and copyright law
  • respect the people from whom data is obtained

• accountability
  • for all repository operations


Quality and Trust

• imperfection lurks everywhere
• trust works where certainty blocks
• trust is a process
  • to greater quality
  • to better relationships
  • to more certainty


Quality and Simplicity

http://lawsofsimplicity.com/

reduce, organize, time, learn, differences, context, emotion, trust, failure

focus: subtract what is obvious, add what is meaningful


Guidelines: producers

1. The data producer deposits the research data in a data repository with sufficient information for others to assess the scientific and scholarly quality of the research data and compliance with disciplinary and ethical norms.

2. The data producer provides the research data in formats recommended by the data repository

3. The data producer provides the research data together with the metadata requested by the data repository

http://www.datasealofapproval.org/


Guidelines: consumers

14. The data consumer complies with access regulations set by the data repository

15. The data consumer conforms to and agrees with any codes of conduct that are generally accepted in higher education and research for the exchange and proper use of knowledge and information

16. The data consumer respects the applicable licenses of the data repository regarding the use of the research data

http://www.datasealofapproval.org/


Guidelines: repositories

4. The data repository has an explicit mission in the area of digital archiving and promulgates it

5. The data repository uses due diligence to ensure compliance with legal regulations and contracts including, when applicable, regulations governing the protection of human subjects.

6. The data repository applies documented processes and procedures for managing data storage

7. The data repository has a plan for long-term preservation of its digital assets

http://www.datasealofapproval.org/


Guidelines: repositories

8. Archiving takes place according to explicit workflows across the data life cycle

9. The data repository assumes responsibility from the data producers for access and availability of the digital objects

10. The data repository enables the users to utilize the research data and refer to them

11. The data repository ensures the integrity of the digital objects and the metadata

12. The data repository ensures the authenticity of the digital objects and the metadata

13. The technical infrastructure explicitly supports the tasks and functions described in internationally accepted archival standards like OAIS

http://www.datasealofapproval.org/
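Guideline 11 does not prescribe a technique, but one common way for a repository to check the integrity of stored digital objects is a fixity check: record a cryptographic checksum when an object is ingested and recompute it periodically. The sketch below assumes SHA-256 and a simple in-memory manifest mapping file paths to checksums; both are illustrative choices, not DSA requirements, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Union

PathLike = Union[str, Path]

def sha256_of(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large objects need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(paths: Iterable[PathLike]) -> Dict[str, str]:
    """Record a checksum for each object at ingest time."""
    return {str(p): sha256_of(p) for p in paths}

def verify_manifest(manifest: Dict[str, str]) -> List[str]:
    """Return the paths that are missing or whose checksum no longer matches."""
    failures = []
    for path, expected in manifest.items():
        p = Path(path)
        if not p.is_file() or sha256_of(p) != expected:
            failures.append(path)
    return failures
```

An empty list from `verify_manifest` means every object still matches the checksum recorded at ingest.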


Guidelines: outsourcing

repositories may outsource digital preservation to specialist repositories:

• implement all guidelines except 4, 6, 7, 8 and 13
• store a copy of the data in another trusted digital repository (TDR) that
  • has acquired the DSA logo
  • by implementing each of the sixteen guidelines (including 4, 6, 7, 8 and 13)

http://www.datasealofapproval.org/
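The outsourcing rule is a small piece of set logic. In the sketch below each repository is represented by the set of guideline numbers it implements; that representation and the function name are hypothetical, and "has acquired the DSA logo" is modelled simply as implementing all sixteen guidelines.

```python
from typing import FrozenSet, Iterable

ALL_GUIDELINES: FrozenSet[int] = frozenset(range(1, 17))
# Guidelines that may be covered by the partner repository holding the copy.
OUTSOURCEABLE: FrozenSet[int] = frozenset({4, 6, 7, 8, 13})

def outsourcing_allowed(own: Iterable[int], partner: Iterable[int]) -> bool:
    """True if `own` covers every guideline that cannot be outsourced
    and `partner` (the TDR storing the copy) implements all sixteen."""
    own_ok = (ALL_GUIDELINES - OUTSOURCEABLE) <= set(own)
    partner_ok = ALL_GUIDELINES <= set(partner)
    return own_ok and partner_ok
```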


Seal of Approval

• a repository shows it on its webpage
  • if the conditions are fulfilled
  • as testified by a self-assessment with reviews, on a yearly basis

• the exact level of compliance is transparently published under the seal


Assessment and review

score | actions taken        | comments                | issues
------+----------------------+-------------------------+---------------------
*     | nothing done         | give a reason           |
**    | theoretical concept  | point to initiation doc | describe main issues
***   | implementation phase | point to definition doc | describe main issues
****  | fully implemented    | point to definition doc |
N/A   | not applicable       | give a reason           |

minimum requirements: the threshold will go up as time proceeds
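The scoring table can be read as a simple check over a self-assessment. The sketch below is one interpretation: the `minimum_stars` threshold is a hypothetical parameter (the slide only says the threshold will rise over time), N/A answers are exempt from the threshold but need a reason, and a one-star answer also needs a reason, as the table asks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class Assessment:
    guideline: int          # guideline number, 1-16
    stars: Optional[int]    # 1-4 stars, or None for N/A
    comment: str = ""       # reason, or pointer to an initiation/definition doc

def failing_guidelines(assessments: Iterable[Assessment], minimum_stars: int) -> List[int]:
    """Guideline numbers below the threshold or missing a required reason."""
    failing = []
    for a in assessments:
        if a.stars is not None and not 1 <= a.stars <= 4:
            raise ValueError(f"guideline {a.guideline}: stars must be 1-4 or None")
        # The table asks for a reason both for '*' (nothing done) and for N/A.
        needs_reason = a.stars is None or a.stars == 1
        if needs_reason and not a.comment.strip():
            failing.append(a.guideline)
        elif a.stars is not None and a.stars < minimum_stars:
            failing.append(a.guideline)
    return sorted(failing)
```

Raising `minimum_stars` from one review round to the next models the rising threshold without changing the assessments themselves.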


Organisation

• repositories represented by a board
• tools to facilitate the procedure

• modification record

• the DSA website links to compliant repositories


Process and Demo: Heleen van de Schraaf


Application and discussion


CLARIN centres

• A = provide infrastructure
  • managing the federation

• B = provide services
  • data and web services

• C = provide metadata
  • harvestable metadata

• R = respected = recognised
  • offer LRT resources in whatever form

• E = external
  • offer non-LRT resources or services

• identity federations, national libraries


Group assignment

• P(roducers)
  • invent p-guidelines for B/C centers

• R(epositories)
  • invent r-guidelines for A/B centers

• C(onsumers)
  • invent c-guidelines for B/C/R centers

Suggestions for
  • assessment
  • review
  • modification record


Wrap-up: P-Group

metadata about background information about researchers
• who, why, publications, DAI
• in IMDI it is difficult to update information, such as affiliation updates
• use unique identifiers for participants in building a corpus, store records of people, and link from the metadata of resources to the records of people

using formats
• depending on formats: formats may be standardised, but not usable to researchers
• "I do not want to wrap my data in dead formats": the repositories should support innovation in this respect, when it is driven by researchers

these are all points that can be addressed in the assessment procedure; no new guidelines are needed


Wrap-up: C-group

goal: finding info in a repository

we need:
• an overview of access rights
• a proper web connection to the repository
• a user-friendly interface
• a low threshold for feedback on new features

we should be part of the chain in the design of the access tools

GUIDELINES

1. WE WANT ALL CENTERS IN THE CHAIN THAT PROVIDE US WITH THE INFORMATION WE NEED TO OFFER US TRANSPARENCY AND VERIFIABILITY ON HOW THEIR DATA IS OBTAINED, PROCESSED AND CONTROLLED/MANAGED

2. WE WANT TOOLS WITH CLEAR COPYRIGHT PERMISSIONS THAT HAVE A STABLE AND SECURE CONNECTION AND A SOLID USER INTERFACE THAT IS USER FRIENDLY AND ALLOWS FOR USER FEEDBACK ON ITS FEATURES

Explanation of the second guideline: the access tools must really aid us in navigating to the resources that we have access to. We must be able to see beforehand whether a resource is closed or open to us.


Wrap-up: R-group

we provide infrastructure and management for data
• we want to standardize our stuff
• we need knowledge: the right metadata of the stuff that is coming to us
• we want the materials in the right format, allowing for some flexibility
• retro-archiving: we offer tools for converting legacy data, so that producers may submit raw materials
• management of data concerning legal access

• protect the providers, so that the providers can trust the consumers: licensing forms
• share knowledge about the services we provide with potential users: people working in the field and other repositories
• we want a forum as an instrument for developing trust between producers and consumers:
  • the community becomes more transparent
  • providers can get feedback from the users
  • providers get insight into the use of their data

missing in the guidelines:
• promotion of the materials
• training the people
• interactivity with producers and consumers


Wrap-up: General

add weights to guidelines, in order to declare some guidelines more important than others.