Source: wulibraries.typepad.com/files/bermanslideswuoctober2014.pdf

Page 1

Got Data? Building a Sustainable Environment for Data-Driven Innovation

Dr. Francine Berman

Chair, Research Data Alliance / US

Edward P. Hamilton Distinguished Professor of Computer Science, RPI

Page 2

Data Drives 21st Century Innovation

Safety and Security

Transformative strategies for disease treatment and well-being

Research Insights

Efficient Physical Infrastructure

Broad spectrum of goods and services

Page 3

Technical infrastructure necessary for data-driven results

Data discoverability tools

Data access via portals, science gateways, etc.

Database and data collection systems

Data services to support use and re-use

Data analysis algorithms, data-driven models and simulations

Data visualization tools

Semantic frameworks

Data management systems

Data storage

Page 4

Social, organizational, and human infrastructure equally important

Systems Interoperability

Policy

Sustainable Economics

Common Standards

Traffic Image: Mike Gonzalez

Community Practice

Workforce and training

Page 5

What can we do to accelerate the development of effective data infrastructure needed for 21st century innovation?

Data-driven Innovation

Data Infrastructure

Technical Infrastructure: software and systems, tools and algorithms, hardware and facilities

Social Infrastructure: policy, practice, standards, rights, community culture

Page 6

Today’s presentation: Emerging trends in the development of effective data infrastructure for data-driven innovation

• “Local” / Institutional Data Infrastructure: What’s needed to create a 21st century university in the digital age?

• Global Data Infrastructure: How do we accelerate open access data sharing and exchange?

• National Data Infrastructure: How do we support stewardship and preservation of valuable research data?

Page 7

Increasing national focus on data as a driver for innovation in science and society

Fran Berman, Data and Society

Page 8

Increasing federal R&D agency requirements for managed and accessible research data

Page 9

February 2013 OSTP Memo: New policies for public access to research data and publications

Agency Plan Requirements:

• Strategy for capitalizing on what exists and fostering public-private partnerships with scientific journals

• Strategy for increasing / enhancing discoverability, access, dissemination, stewardship, preservation

• Approach for measuring and enforcing compliance

• No new money: “Identification of resources within the existing agency budget to implement the plan”

Page 11

Infrastructure realities: management, access, and use of data now and in the future presupposes data stewardship and preservation today

• Stewardship and Preservation critical: “Homeless” data ceases to exist

• Economically sustainable data infrastructure necessary to support

– Federally mandated data management plans

– Public access to research data

– Use and re-use

– Reproducibility

• The “bigger”, longer-term, more complex, or more valuable the data is, the greater the importance of sustainable data infrastructure

Page 12

Data economics: Responsible data stewardship requires a viable business model for sustaining its underlying infrastructure

• Data infrastructure costs increase with usage, stewardship and access requirements, perceived value

• Greater costs at the extremes (including “big” data) …

[Figure labels: “Locally Manageable” Data; More mgt., stewardship required; Long-lived data; Big data; Coupled data services; Access control, broad access; More curation required]

Page 13

[Chart: x-axis Date, June-97 through June-09; y-axis Archival Storage (TB), logarithmic scale from 10.0 to 100000.0; series: Model A (8-yr, 15.2-mo 2X), TB Stored, Planned Capacity]

It’s not just about the cost of storage

• Most valuable data replicated

• As research collections increase, storage capacity must stay ahead of demand

Information courtesy of Richard Moore, SDSC

Resources and Resource Refresh

Data Infrastructure costs include

• Maintenance and upkeep

• Software tools and packages

• Utilities (power, cooling)

• Space

• Networking

• Security and failover systems

• People (expertise, help, infrastructure management, development)

• Training, documentation

• Monitoring, auditing

• Reporting costs

• Costs of compliance with regulation, policy, etc. …

SDSC Data Storage Growth ‘97-’09
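
The legend entry “Model A (8-yr, 15.2-mo 2X)” in the SDSC storage chart reads most naturally as an exponential growth model in which archival storage doubles every 15.2 months over an 8-year window. That reading is an assumption, as are the function names and the starting volume below. The sketch only illustrates how a fixed doubling period compounds; it is not SDSC's actual planning model.

```python
import math

DOUBLING_MONTHS = 15.2  # assumed meaning of "15.2-mo 2X" in the chart legend


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth after `months` under a fixed doubling period."""
    return 2.0 ** (months / doubling_months)


def projected_tb(start_tb: float, months: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Projected archival storage in TB, starting from `start_tb`."""
    return start_tb * growth_factor(months, doubling_months)


def months_to_reach(start_tb: float, target_tb: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until storage grows from `start_tb` to `target_tb`."""
    return doubling_months * math.log2(target_tb / start_tb)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, not read from the chart
    for year in range(0, 9, 2):
        print(f"year {year}: {projected_tb(start_tb, 12 * year):,.0f} TB")
```

Under this assumed model, storage grows by roughly a factor of 80 over 8 years (2^(96/15.2) ≈ 79.7), which is consistent with the slide's point that capacity has to stay ahead of demand.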

Page 14

Economics of public access: Who pays the data bill?

Article: Science Magazine, August 9, 2013. Free public access link at http://www.cs.rpi.edu/~bermaf/

Page 15

Op-ed recommendations: Cultivate / coordinate preservation and stewardship options in every sector

Charleston Ballet blog: http://allianceblog.org/tag/charleston-ballet/ ; iTunes gift card

• Evolve research culture to take advantage of what works in the private sector

• Create sustainable university library and repository stewardship solutions

• Clarify public sector stewardship commitments: articulate what data will / won’t be supported

• Facilitate private sector stewardship of public access research data as a public good

Private Sector

Public Sector

Individuals

Academia

Page 16

Things researchers can do to support their data

Small steps for Research Groups and Projects:

1. If you don’t have one, create a data management plan for your current project for a reasonable fixed term of time. Budget realistically for the costs of data stewardship and preservation.

2. Make your data available to the community (as appropriate) by curating it and ingesting it into a publicly accessible repository

3. Cite and publish your data when you write about your results.

4. Work with your professional societies and conferences to include “data sessions”*

* Suggestion from Sibel Adali, RPI

Page 17

Local / Institutional infrastructure for data-driven innovation

• “Local” / Institutional Data Infrastructure: What’s needed to create a 21st century university in the digital age?

• Global Data Infrastructure: How do we accelerate open access data sharing and exchange?

• National Data Infrastructure: How do we support stewardship and preservation of valuable research data?

Page 19

What’s Needed to Create a 21st Century Campus in the Digital Age?

• Digitally-enabled campus provides information technologies needed to enable 21st century innovation and discovery

• Key questions for institutions:

– What should we support on campus and what should we outsource?

– How can we plan ahead for growth?

– How do we build in flexibility for emerging trends (e.g., MOOCs and distance education, cloud resources, etc.)?

Critical for success

• Sustainable business model

• Sufficient user support

• Plans for

– Recovery from Failure / Disaster

– Strategic Evolution

Page 20

Work to be done

Finding: “The existing, aggregate, national infrastructure is not adequate to meet current or future needs of the US open science and engineering research community.”

Finding: “Data volumes produced by most new research instrumentation, including that installed at the campus lab level, cannot be supported by most current campus, regional, and national networking facilities. There is critical need to restructure and upgrade local campus networks to meet these demands.”

NSF CISE Advisory Committee Task Force on Campus Bridging (March 2011)

Page 21

The digitally enabled campus

• Institutions have the opportunity to create 21st century campus cyberinfrastructure that sustainably supports data-driven innovation:

– Stewardship and preservation of valuable data from campus researchers

– Support for data management plans and public access policy compliance

– Coupling of data stewardship with data-driven computing, data services

– Integration of facilities for data generation, data stewardship, data use

– Potential for revenue from broader use outside of campus

Campus Research Community

Campus Data Center

Storage, Preservation

University Library

Page 22

Conversations between campus researchers (data users, creators) and data stewards – the devil is in the details

Researcher: Can you host the digital data from my project?

Steward:

• How much data?

• For how long?

• What is the form of the data?

• Who will be in charge of cleaning, annotating, organizing, refreshing, modifying etc. the data?

• Who should have access to the data?

• Can the data be regenerated?

• Who will pay for infrastructure support? Etc. …
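
The steward's questions above can be treated as a checklist that a hosting request must answer before a repository commits to it. The sketch below is one hypothetical way to record such a request. The class, field names, and example values are illustrative assumptions, not part of any real repository's intake process.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class HostingRequest:
    """Answers to the steward's questions; None means not yet answered."""
    volume_tb: Optional[float] = None        # How much data?
    retention_years: Optional[int] = None    # For how long?
    data_format: Optional[str] = None        # What is the form of the data?
    curator: Optional[str] = None            # Who cleans, annotates, refreshes?
    access_policy: Optional[str] = None      # Who should have access?
    regenerable: Optional[bool] = None       # Can the data be regenerated?
    funding_source: Optional[str] = None     # Who pays for infrastructure support?

    def open_questions(self) -> List[str]:
        """Names of the fields that still need an answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    # Hypothetical request with only three of the seven questions answered.
    request = HostingRequest(volume_tb=12.5, retention_years=10, regenerable=False)
    print(request.open_questions())
```

Making unanswered questions explicit is a design choice: a request with open questions is visibly incomplete, which mirrors the slide's point that the details have to be settled up front.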

Key Issues for Research Data Stewardship Organizations

• Expectations and agreements

– What is the “contract” between users and repositories, between federated partner repositories? Who pays?

• Incentives and penalties

– What incentives promote good stewardship? What happens when the system fails?

• What regulations and policies govern the stewardship relationship?

• Security, privacy, confidentiality, trust

– What are appropriate expectations?

Page 23

Making it happen -- A stakeholder “dream team”: CIO + VPR + University Librarian

University Librarian: Library community has critical experience with stewardship

• Development of reliable management, preservation and use environments

• Proper curation and annotation

• Understanding of policy, regulation, intellectual property issues

• Culture of collaboration and partnership

• Sustainability

VPR: Research community characterized by a culture of innovation

• Periodic new starts

• Experimentation

• Collaboration and competition

• Increasing need for compliance with policy and regulation with respect to the conduct and outcomes of federally funded research

CIO: Experience with reliable, at-scale infrastructure

• Integrated and reliable support for campus networking, data, computing environment

• Deep experience with campus-scale IT infrastructure in the context of institutional power, cooling, space constraints

Campus Research Community

Campus Data Center

Storage, Preservation

University Library

Page 24

The “dream team” can help drive institutional culture change, leadership, competitiveness

Institutional infrastructure needed to support 21st century universities

• Research landscape is changing

– Traditional academic modes and forms of research recognition evolving: new approaches to collaboration / competition, publication, citation

• Educational landscape is changing

– University curricula evolving; increasing integration of on-line / on-site options

• Workforce is changing

– More data literacy required from everyone

– More data science embedded in everything

– More complex policy and regulation to deal with

Page 25

Global data infrastructure

• “Local” / Institutional Data Infrastructure: What’s needed to create a 21st century university in the digital age?

• Global Data Infrastructure: How do we accelerate open access data sharing and exchange?

• National Data Infrastructure: How do we support stewardship and preservation of valuable research data?

Page 26

21st century innovation: Data sharing drives new discovery

Page 27

Data sharing is a global issue

Science, Humanities, Arts Communities

Cyberinfrastructure professionals, data analysts, data center staff, …

Data Scientists

Libraries, Archives, Repositories, Museums

Page 28

Global community-driven organization launched in March 2013 to accelerate data-driven innovation

RDA focus is on building the social, organizational and technical infrastructure to

• reduce barriers to data sharing and exchange

• accelerate the development of coordinated global data infrastructure

The Research Data Alliance (RDA)

Page 29

CREATE ADOPT USE

RDA Members come together as

• Working Groups – 12-18 month efforts to build, adopt, and use specific pieces of infrastructure

• Interest Groups – longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified.

Working Group efforts focus on the development and use of data sharing infrastructure

• Code, policy, infrastructure, standards, or best practices that are adopted and used by communities to enable data sharing

• “Harvestable” efforts for which 12-18 months of work can eliminate a roadblock

• Efforts that have substantive applicability to groups within the data community, but may not apply to everyone

• Efforts for which working scientists and researchers can start today

Page 30

Technical and Social Data Infrastructure

Broad spectrum of infrastructure needed to support data sharing

Common Metadata Standards

Digital Object Identifiers

Data Discoverability Tools

Data Access and Distribution Policy

Data Sharing Practice

Sustainable Economic Model

Data Services

Data Management Systems

Data Analytics

Data Storage

Data Citation Standards

Preservation Policy

Images: NASA, SDSC / UCSD

Page 31

Precipitous growth

RDA Launch / First Plenary: March 2013

RDA Second Plenary: September 2013

RDA Third Plenary: March 2014

First RDA organizational telecon: August 2012

Global Data Planning Meeting: October 2012

First Working Groups and Interest Groups

240 participants

First “neutral space” community meeting (Data Citation Summit)

First Organizational Partner Meet-up

First BOFs

380 participants from 22 countries

RDA Fourth Plenary: September 2014

First Organizational Assembly

6 co-located events

14 BOF, 12 Working Groups, 22 Interest Groups

497 participants

RDA Plenary 4 Amsterdam

First Working Group exchange meeting

RDA Plenary 2 Washington, DC

RDA Plenary 1 / Launch Gothenburg, Sweden

RDA Plenary 3 Dublin, Ireland

Page 32

The RDA Community Today: Over 2300 members from 96 countries (as of 9/14)

Acad. 63%

Govt. 18%

Private 13%

Other 6%

Map courtesy traveltip.org

North Amer. 36%

Eur. 50%

Austral-pacific 5%

Africa 3%

South America 1%

Asia 5%

Page 33

RDA Interest (IG) and Working Groups (WG) by Focus 1 (as of 9/14)

Domain Science - focused

• Toxicogenomics Interoperability IG

• Structural Biology IG

• Biodiversity Data Integration IG

• Agricultural Data Interoperability IG

• Wheat Data Interoperability WG

• Digital Practices in History and Ethnography IG

• Geospatial IG

• Marine Data Harmonization IG

• Metabolomics IG

• RDA/CODATA Materials Data Infrastructure and Interoperability IG

• Research Data Needs of the Photon and Neutron Science Community IG

• Defining Urban Data Exchange for Science IG*

• The BioSharing Registry: Connecting data policies, standards and databases in the life sciences WG*

• Urban Quality of Life Indicators WG*

Community Needs - focused

• Community Capability Model IG

• Engagement IG

• RDA / CODATA Summer Schools in Data Science and Cloud Computing in the Developing World WG*

• Development of Cloud Computing Capacity and Education in Developing World Research IG

• Data for Development IG

• Education and Training on handling of research data IG

* under review

Page 34

RDA Interest (IG) and Working Groups (WG) by Focus 2 (as of 9/14)

Data Stewardship and Services - focused

• Research Data Provenance IG

• Preservation e-infrastructure IG

• RDA / WDS Publishing Data Services WG

• RDA / WDS Publishing Data Workflows WG

• Long-tail of Research Data IG

• RDA/WDS Publishing Data IG

• RDA/WDS Repository Audit and Certification WG

• Domain Repositories Interest Group

• Brokering Interest Group

• ELIXIR Bridging Force IG*

• Libraries for Research Data IG*

• RDA / WDS Certification of Digital Repositories IG

• RDA / WDS Publishing Data Cost Recovery for Data Centres IG

Reference and Sharing - focused

• Data Citation WG

• Standardization of Data Categories and Codes WG

• RDA/CODATA Legal Interoperability IG

• Reproducibility IG*

• Data Description Registry Interoperability Working Group

• RDA / WDS Publishing Data Bibliometrics WG

Base Infrastructure - focused

• Data Foundation and Terminology WG

• Metadata Standards Directory WG

• Practical Policy WG

• PID Information Types WG

• Data Type Registries WG

• Data in Context IG

• Big Data Analytics IG

• Data Brokering WG*

• Federated Identity Management IG

• Metadata IG

• PID Interest Group

• Service Management IG

• Data Fabric IG

* under review

Page 35

Organizational Members:

• Alliance for Permanent Access

• American University Library

• Australian National Data Service

• Barcelona Supercomputing Center - Centro Nacional de Supercomputación

• Columbia University Library

• CNRI

• CSC

• Digital Curation Center

• EIROForum IT Working Group

• eResearch Services and Scholarly Application Development Division of Information Services, Griffith University

• European Data Infrastructure (EUDAT)

• National Institute of Advanced Industrial Science and Technology (AIST), Japan

• International Association of STM Publishers

• Internet2

• Microsoft Research

• NZ eScience Infrastructure

• Purdue University Libraries

• Research Data Canada

• Scholarly Publishing and Academic Resources Coalition (SPARC)

• Washington University in St. Louis Libraries

• Science and Technology Facilities Council

Affiliates:

• CODATA

• ICSU World Data System

• ORCID

• DataCite

• Global Alliance for Genomics and Health

• CASRAI

Organizations Committed to Joining RDA

Page 36

• Create a pipeline of data sharing infrastructure efforts

– that are adopted and used by communities during their development

– that increase their impact through greater adoption over time

• Build and expand the research data community for effective impact

– globally, regionally, and within constituent groups

• Evolve as a useful, relevant, and agile organization

– that helps the community capitalize on opportunity and respond to challenges within the data community

RDA Medium-term (3-5 years) Strategic Goals

Page 37

RDA/US Goals:

Contribute to RDA “international” efforts and leadership

Bring US efforts to broader RDA community

Build the RDA community within the US

Leverage and implement RDA deliverables in the US to amplify impact

Collaborate closely with other RDA “regions” on key programs and initiatives

RDA/US: Collaborate Globally, Contribute Locally

NSF-supported RDA/US initiatives:

• Outreach (RDA RDA/US)

• RDA Deliverables Amplification

• Student / Early Career Engagement

RDA/US Steering Committee

• Fran Berman, RPI

• Kathy Fontaine, RPI

• Larry Lannom, CNRI

• Beth Plale, IU

RDA US membership (yellow states)

Page 38

• Next Plenary: March 9-11, 2015, hosted by RDA/US in San Diego

– Working Meeting of the RDA with co-located community meetings

– SDSC hosting RDA Adoption Day on March 8

More information at rd-alliance.org

Page 39

What can we do to accelerate the development of effective data infrastructure needed for 21st century innovation?

Thank You!