Data, Data Everywhere


September 8, 2011
The Coalition for Academic Scientific Computation

José-Marie Griffiths, PhD
Vice President for Academic Affairs
Bryant University, Smithfield, Rhode Island

Concerns of Research Administrators

  • Strong advocates of research and its dissemination to as wide a set of audiences as possible.
  • Most concerns today relate to current economic trends and uncertainties.
  • Long been concerned about overhead costs (which are increasing) and the cap on administrative costs.

Concerns of Research Administrators - 2

  • Concerns about policies translating into unfunded mandates (like recently proposed financial reporting requirements to track all federal funding).
  • Increasingly concerned about roles, responsibilities, and liabilities.
  • Size matters!

Taking AIM at Data Lifecycle Management: Access, Integrity, Mediation

Data Policy Task Force

Established at the February 3-4, 2010 NSB meeting

Charge: further defining the issues and outlining possible options to make the use of data more effective in meeting NSF's mission.

Data Policy Task Force Strategies

  • Monitor the impact of NSF's updated implementation of the Data Management Plan requirement to inform a review of NSF policy.
  • Considering issues of data policy, Open Data movements, and related issues, the Task Force will then develop a "Statement of Principles."
  • Provide guidance to subsequent Board efforts to develop specific actionable policy recommendations focused, initially, on NSF, but that could potentially promulgate through other Federal agencies in a national and international context.

NSB Task Force on Data Policy: Statement of Principles

  • Openness and transparency are critical to continued scientific and engineering progress and to building public trust in the nation's scientific enterprise. This applies to all materials necessary for verification, replication and interpretation of results and claims associated with scientific and engineering research.
  • Open Data sharing is closely linked to Open Access publishing, and they should be considered in concert.
  • The nation's science and engineering enterprise consists of a broad array of stakeholders, all of which should participate in the development and adoption of policies and guidelines.

NSB Task Force on Data Policy: Statement of Principles - 2

  • It is recognized that standards and norms vary considerably across scientific and engineering fields, and such variation needs to be accommodated in the development and implementation of policies.
  • Policies and guidelines are needed for open data sharing, which in turn requires active data management. All data and data management policies must include clear identification of roles, responsibilities and resourcing.

NSB Task Force on Data Policy: Statement of Principles - 3

  • The rights and responsibilities of investigators are recognized. Investigators should have the opportunity to analyze their data and publish their results within a reasonable time.

NSB Expert Panel Discussion on Data Policies
March 28-29, 2011, Arlington, VA

Participants included:

  • Over 30 experts/research administrators
  • 7 NSB members
  • 4 NSF Directors/Staff

Access, Integrity, Mediation

  • Access: what goes in must be able to come out!
  • Integrity: what goes in must be the same thing that comes out!
  • Mediation: what goes in is going to need help coming out!

Key Areas Emerging from the Expert Panel Discussion on Data Policies, March 2011

ACCESS
  • Standards and interoperability enable data-intensive science.
  • Data sharing is an identified priority.

INTEGRITY
  • Recognize and support computational and data-intensive science as a discipline.

MEDIATION
  • Storage, preservation, and curation of data are critical to data sharing and management (data stewardship).
  • Cyberinfrastructure is necessary to support data-intensive science.

Access

What goes in must be able to come out!

Access, Integrity, Mediation

Key Areas - National Science Board Expert Panel Discussion on Data Policies, March 2011

ACCESS
  • Standards and interoperability enable data-intensive science.
  • Data sharing is an identified priority.

Standards and interoperability enable data-intensive science.

  • Citation and attribution norms
      - Need new norms and practices
      - Data producers, software & tool developers, data curators get credit for their work
  • Interoperability standards
      - To enable sharing & interoperability across disciplines and internationally
  • Development of persistent identifiers
      - To enable tracking of provenance
      - Ensure data integrity (see next section)
      - Facilitate citation & attribution

Interoperability - sooner rather than later

Data sharing is an identified priority.

  • Must balance privacy concerns and data access for sharing and re-use.
  • Acknowledge disciplinary cultures while establishing a culture of sharing across all research communities.
  • Must promote & reward exemplary data management projects & plans.
  • Data availability must be timely: issues of embargoes and restricted use durations.

Integrity

What goes in must be the same thing that comes out!
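One common way to check this principle in practice is a fixity check: record a cryptographic digest when a file is deposited and compare it when the file is retrieved. The sketch below is illustrative only. The slides do not prescribe a method, and the function names `sha256_of` and `verify_fixity` are hypothetical, chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """True if the file as retrieved matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest
```

A repository could store the digest alongside the deposit metadata and re-run `verify_fixity` on a schedule or at each download. A mismatch signals that what came out is not what went in.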


Recognize and support computational and data-intensive science as a discipline.

  • Recognize & reward computational & data scientists & curators: funding, tenure, etc.
  • Support training in computational science
  • Reward international collaborations to develop cyberinfrastructure, data stewardship, interoperability, international sharing
  • New funding/economic models to support processing, storing, archiving, maintaining data sets.
  • Need to define who is responsible for what: funding agencies/publishers versus research communities

Office of Research Integrity, U.S. Department of Health and Human Services: Key Components of Data Lifecycle Management

Guidelines for Responsible Data Management in Scientific Research, ori.hhs.gov/education/products/clinicaltools/data.pdf

Planning for Preservation over the Data Life Cycle

  • Anticipate archiving costs and challenges
  • Create a data management plan
  • Follow best practices for data and documentation
  • Manage master datasets and work files
  • Determine file formats to deposit
  • Comply with dissemination standards and formats
  • Set up support for data users

[Figure: data life cycle phases. Courtesy of Cole Whiteman, ICPSR]

  1. Proposal Planning and Writing
  2. Project Start-up and Data Management
  3. Data Collection and File Creation
  4. Data Analysis
  5. Preparing Data for Sharing
  6. Depositing Data
  7. After-Deposit Archival Activities

Integrity Concerns for Research Institutions

  • What to share - raw, processed, analyzed datasets, instruments, calibration and environmental records, analytical tools, etc.
  • Processes for and costs of long-term curation of data

Mediation

What goes in is going to need help coming out!


Storage, preservation, and curation of data are critical to data sharing and management (data stewardship)

  • Funding agencies must commit to ongoing financial support for repositories (no orphans)
  • Standardized curatorial mechanisms
  • Strategic partnerships between stakeholder communities and data repositories, supported by funders
  • Define roles of different types of digital repositories
  • Possibly independent auditing of data repositories to ensure data quality, access, interoperability

Cyberinfrastructure is necessary to support data-intensive science

  • Geographic distribution of research teams, computing resources and datasets requires robust cyberinfrastructure
  • Must include shared applications for analysis, visualization and simulation
  • Standardization for interoperability & accessibility
  • Need capital investment in cyberinfrastructure
  • Need to define appropriate ratio of infrastructure to research funding

Mediation is Needed at Data Collection, Analysis and Use

Gio Wiederhold, Stanford: When there is high intensity of interaction with any of these elements, it makes sense to have multiple mediators (e.g. replicate repositories)

[Diagram: Collected Research Data Set A, Collected Data Set B, Repositories 1-4, four Analysis nodes, and four Use nodes]

Informal and Formal Mediation

  • Mediation at Use level is informal and pragmatic
  • Mediation at Repository and Analysis level needs to be formal with domain/expert control*

[Diagram repeated, annotated with "Informal, pragmatic mediation" and "Formal mediation with domain/expert control"]

*Gio Wiederhold, Stanford, 1995

Stakeholders: Multiple Players, Inter-relationships

For data to be discoverable, must have a shared overlay of interdisciplinary and technological connections

This ... or ... This?

José-Marie Griffiths, Ph.D.
Vice President for Academic Affairs
Bryant University
1150 Douglas Pike
Smithfield, RI 02917
(401) 232-6060

jmgriff@bryant.edu
josemarie@gmail.com
