Horizon 2020 Coordination and Support Action
GARRI-3-2014 Scientific Information in the Digital Age: Text and Data Mining (TDM)
Project number: 665940
To help the text and data mining community
Let's discuss (best) practices
Guidance for publishing descriptions of non-public clinical datasets
Iain Hrynaszkiewicz, Varsha Khodiyar, Andrew L. Hufton, Mathias Astell and Susanna-Assunta Sansone
The problem:
• Sharing of experimental clinical research data usually happens between individuals or research groups rather than via public repositories
• It is difficult to connect journal articles with their underlying clinical datasets even when they are “available on request”
Scientific Data workflow for clinical datasets
Our suggested solutions:
• New scholarly journal and article types to enable increasing accessibility to non-public research data and provide case studies
• Journals to develop stronger links with specialist data repositories
• Use and promote voluntary data sharing services to increase accessibility to clinical datasets for secondary uses while protecting patient privacy and the legitimacy of secondary analyses
• Increase collaboration between journals, data repositories, researchers, funders, and voluntary data sharing services
• Use the journal Scientific Data as an example of changes to article format and peer-review process that can be made to journal articles to more robustly link them to data that are only available on request
• Assess and promote features of data repositories to better accommodate non-public clinical datasets, including Data Use Agreements (DUAs)
6 services in 1 afternoon: JOINING UP RESEARCH SUPPORT ACROSS UCL
Our “6-in-1” course: ‘Introduction to Research Support & Integrity’
• Coordinated by the RDM & Research Integrity teams, with the support of the Doctoral School
• 3 hours long; 6 convenors
• Up to 100 PhD students (all disciplines & PhD years)
Drivers & enablers
• Researchers’ need for help: during the project rather than at the end
• Inter-services collaboration
• 6 services that support the research lifecycle at various stages
Benefits
• For students - for speakers - for coordinators
www.ucl.ac.uk/research-data-management
Amber Leahey & Grant Hurley, Scholars Portal, Ontario Council of University Libraries (Canada)
Data Management Outreach Efforts @ University of Florida (UF) - USA
• 1st Data Management Planning (DMP) Workshop: 9/22/16
• 2nd & 3rd DMP Workshops: 10/24/16
• Associate Deans of Research Luncheon: 12/14/16
• University-wide data survey (preview): 1/3/17
• Associate Dean and STRIDE Director Mtg.: 3/9/17
Plato L. Smith II, Data Management [email protected]
These activities were made possible through collaborations between the Data Management and Curation Working Group, UF Research Computing, UF Informatics Institute, and UF Division of Sponsored Programs.
What happens when 15 different people curate data? Do they do the same curation activities?
Inspect Files (15)
Inspect Metadata (15)
Quality Assurance (14)
Activity? (n = ?)
www.dpoc.ac.uk #DP0C
Parallel Auditing of the University of Oxford and Cambridge’s Institutional Repositories
How, why?
Research Software in RDM?
Scientific Data Management within the Brazilian Information Science Community
Problem
Scientific data management is a necessary practice to add value to the researcher’s data. This reality is becoming evident to the Brazilian Information Science research community; in this regard we ask what their practices are concerning the management of scientific data.
Hypothesis
We assume as a hypothesis that only a minority of the researchers in the Brazilian Information Science community effectively perform scientific data management.
Objectives
• Characterize the scientific data formats used by the researchers in the Information Science field;
• Identify the average (weekly) time spent by the researchers in the management of scientific research data;
• Investigate the types of scientific data collected by the researchers;
• Identify the data sharing practices employed by the researchers;
• Identify the actions taken by the researchers related to the storage and preservation of research data;
• List the scientific data repositories used by the researchers.
Methodological Characteristics
• Exploratory research
• Survey research
• Quantitative analysis
Current Research Status
The survey is being deployed to the Brazilian Information Science community through a free and open source software survey tool.
Guilherme Ataíde Dias, Universidade Federal da Paraíba - MPGOA
Adriana Alves Rodrigues, Universidade Federal da Paraíba - PPGCI
Renata Lemos dos Anjos, Universidade Federal da Paraíba - DCI
Reusability, Digital Reunification and Analysis of US Overseas Pension Records
Students – Mary KENDIG | Jen PROCTOR | Paridhi MATHUR | Scott HARKLESS | Anne DEMPSEY | Rosemary HALL
Staff – Dr. Kenneth HEGER, Richard MARCIANO, Michael KURTZ
Genealogy
Human Migration
Health Informatics
Economics
Follow our blog at http://dcicblog.umd.edu/overseas-pension/
History
• US Civil War
• US Spanish American War
• Trans Atlantic Family Connections
Records
• Letters
• Reports
• Health Files
• Statistical Tables
CURE: A consortium of academic institutions that support data quality review, a framework that includes research data curation and code review.
FOUNDING MEMBERS:
http://cure.web.unc.edu
TILDA - A solution for publishing, e-archiving and long term preservation of research and environmental data at SLU (Swedish University of Agricultural Sciences)
1. Fruit of co-operation between different units
2. Joint business process for archiving and publishing
3. Long term preservation aspects in the beginning of the process
4. Quality assured data and metadata
5. Integration of CKAN & Archivematica
6. SLU, first university in Sweden launching solution for research data
Methods and metrics for the assessment of research data management maturity, adoption of software tools, and data sharing outcomes in neuroimaging
Ana Van Gulick, Carnegie Mellon University, Pittsburgh, PA, USA
John Borghi, California Digital Library, Oakland, CA, USA
What tools are neuroimaging researchers *actually* using? How are they managing and sharing data?
Are they using open science tools?
Pyles et al. 2013, PLOS One
DATA MANAGEMENT PLAN AS A UNIVERSITY REQUIREMENT
Funding agency approves grant application
Administrator creates project record in in-house system
PI receives alert to file DMP
PI creates and submits DMP in in-house system
Research fund is released to PI
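The five steps above can be read as a strictly ordered workflow in which the fund is released only once the DMP has been filed. A minimal Python sketch of that ordering follows; the `Stage` and `Project` names are hypothetical and do not describe the actual in-house system.

```python
from enum import Enum


class Stage(Enum):
    """Workflow stages, numbered in the order listed on the poster."""
    GRANT_APPROVED = 1
    PROJECT_RECORD_CREATED = 2
    PI_ALERTED = 3
    DMP_SUBMITTED = 4
    FUND_RELEASED = 5


class Project:
    """Hypothetical project record that can only move one stage forward at a time."""

    def __init__(self, title):
        self.title = title
        self.stage = Stage.GRANT_APPROVED

    def advance(self):
        """Move to the next stage; refuse to go past the final one."""
        if self.stage is Stage.FUND_RELEASED:
            raise ValueError("workflow already complete")
        self.stage = Stage(self.stage.value + 1)
        return self.stage

    def can_release_fund(self):
        """The fund may be released only after the DMP has been submitted."""
        return self.stage.value >= Stage.DMP_SUBMITTED.value


project = Project("Example study")
project.advance()  # administrator creates the project record
project.advance()  # PI receives the alert to file a DMP
print(project.can_release_fund())  # DMP not yet submitted
project.advance()  # PI creates and submits the DMP
print(project.can_release_fund())
```

Modelling the stages as an ordered enumeration keeps the rule "no DMP, no fund" in one place rather than scattering it across checks.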
Ms GOH Su Nee, Ms Lavanya ASOKAN
Establishing data management services for multi-disciplinary, long-term collaborative research centres
Constanze Curdt and Dirk Hoffmeister
CRC / Transregio 32: Patterns in Soil-Vegetation-Atmosphere Systems
www.tr32db.de
12th International Digital Curation Conference |Royal College of Surgeons of Edinburgh, Scotland | 20 - 23 February 2017
funded by:
CRC 1211: Earth – Evolution at the Dry Limit
www.crc1211db.uni-koeln.de
Agile Data Curation
Values and Principles
Case Studies
Design Patterns
Karl Benedict, W. Christopher Lenhardt, Joshua Young
Community Engagement for Developing the Principles and Practices of Agile Data Curation
Heuristic model for climate information validation made available via Linked Open Data
João José Barbosa Ferreira | Guilherme Ataíde Dias | Universidade Federal de Minas Gerais - Brazil
Problem / Question
How committed are the organizations participating in the Linked Open Data project to maintaining and updating the available data?
Hypothesis
• The increasing volume of data being attached to the Linked Open Data project draws attention to the validity of the available information, since obsolete information can significantly compromise data quality.
Project Overview
• This study aims to investigate the metadata related to Linked Open Data, using heuristics to analyze how frequently data are updated, to assess their degree of reliability, and to indicate possible points of attention to be corrected.
• An example is the exchange of information among organizations that monitor and assess the Earth's climate status in real time and provide decision-makers at all levels of the public and private sectors with data and information on trends such as climate variability, for civil defense disaster prediction or for planning actions on agricultural issues.
• A software prototype is planned, based on a model that analyzes the research data updates published by climate agencies which have joined the Linked Open Data project, to identify possible non-conformities in the updating of these data; its output is statistical information for composing quantitative indicators of the reliability of the data produced.
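One way to picture the kind of update-frequency heuristic described above is the following Python sketch. It is not the authors' model: the expected update interval, the tolerance factor, and the sample dates are all assumptions made for illustration.

```python
from datetime import date
from statistics import mean


def update_gaps(update_dates):
    """Return the gaps, in days, between consecutive update dates."""
    ordered = sorted(update_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def assess_updates(update_dates, expected_days, tolerance=1.5):
    """Summarize how well a dataset keeps to its expected update interval.

    A gap counts as late (a possible non-conformity) when it exceeds
    expected_days * tolerance. Both parameters are assumed, not taken
    from the poster.
    """
    gaps = update_gaps(update_dates)
    limit = expected_days * tolerance
    late = [gap for gap in gaps if gap > limit]
    return {
        "updates": len(update_dates),
        "mean_gap_days": mean(gaps) if gaps else None,
        "max_gap_days": max(gaps) if gaps else None,
        "late_gaps": len(late),
        "conformity": 1 - len(late) / len(gaps) if gaps else None,
    }


# Hypothetical modification dates for a dataset expected to update monthly.
history = [date(2016, 1, 1), date(2016, 2, 1), date(2016, 3, 1), date(2016, 6, 1)]
report = assess_updates(history, expected_days=30)
print(report)
```

In practice the dates would come from the dataset's published metadata, and the resulting conformity ratio is one candidate for the quantitative reliability indicators the project describes.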
Analysis cycle
Operational Steps
Step 1: Choose a database that is a member of the Linked Open Data project
Step 2: Establish the criteria for identifying data updates
Step 3: Model validation
Step 4: Results presentation
Data / Observations
• Data use effectiveness for the prevention of climatic incidents.
• Identification of possible data consumers.
• Amount of periodic database access.
• Effective use of controlled vocabularies and ontologies in the metadata design.
Methodological Approach
• From the standpoint of scientific method, this research intends to use the statistical method to achieve its goals.
Initial conclusions
• A model for analyzing the quality of the data available in a Linked Open Data project is an important resource in the audit process for certification of the data.
• The use of this model allows information producers to increase data quality and gives users greater assurance in their decision-making process.
ORDA (figshare)
archiveUS (Ex Libris Rosetta)
How to make your data valuable via data sharing?
Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=30978545
Incentives
Data reuse
Research reproducibility
Data organization and management
Sharing platform infrastructure
Licenses
POSTER: Data Sharing in a Complex Computational Study: Easier Said than Done!
Cost and value
It’s undeniable that research data is valuable…
Outputs from the ‘Research at Risk’ Business case and costing project
So why is it so hard to make the business case…?
Keep the wheels turning - Advocating Data Stewardship at TU Delft
Alastair C. Dunning @alastairdunning
Jasmin K. Böhmer @JasminBoehmer
Chung-Yi (Sophie) Hou1 ([email protected]), Michael Twidale2, Steven Worley1, Matthew S. Mayernik1
1 – National Center for Atmospheric Research, University Corporation for Atmospheric Research
2 – School of Information Sciences, University of Illinois at Urbana-Champaign
Case Studies of Selected Usability Evaluation Techniques and Their Applications to Improve Data Repositories
Poster #6
Without Usability Evaluations With Usability Evaluations
Demonstration of a Humanities Data Library
Tuesday at 14:00 in GB Ong (JHU)
Charles Booth’s London
https://booth.lse.ac.uk/
Hybrid Provenance Overview
[Diagram labels: ASCII data or column binary data; data files (SPSS, SAS, Stata); setup files (SPSS, SAS, Stata); codebook; DDI 2.5 codebook; SFC_FW; SFC_CB]
You want:
You only have: + -
Don’t worry, CISER has:
Demo: CISER Setup Files Creator
Florio Arguillas, William Block
Setup Files: SPSS, SAS, Stata
Demo Room: Tausend Time: 14:00-14:25, Tuesday, 21 February 2017