ICSTI/ITOC, 15 October 2013. Larry Lannom, Research Data Alliance and Corporation for National Research Initiatives


Page 1: ICSTI/ITOC 15 October 2013 Larry  Lannom

ICSTI/ITOC

15 October 2013

Larry Lannom

Research Data Alliance

Corporation for National Research Initiatives

Page 2: ICSTI/ITOC 15 October 2013 Larry  Lannom

DAITF: Enabling Technologies

21 March 2012

Larry Lannom, Corporation for National Research Initiatives

http://www.cnri.reston.va.us/

http://www.handle.net/

Page 3: ICSTI/ITOC 15 October 2013 Larry  Lannom

[Figure: Enabling Technologies. Datasets (bit sequences, each with an ID) and the scientists, data curators, end users, and applications that work with them.]

Page 4: ICSTI/ITOC 15 October 2013 Larry  Lannom

[Figure: Enabling Technologies. Scientists, data curators, end users, and applications; datasets, each with an ID, accessed via repositories.]

Page 5: ICSTI/ITOC 15 October 2013 Larry  Lannom

[Figure: Enabling Technologies. Scientists, data curators, end users, and applications; Discovery; datasets, each with an ID, accessed via repositories.]

Page 6: ICSTI/ITOC 15 October 2013 Larry  Lannom

Discovery & Evaluation

• Search
  – Metadata registries
    • Subject
    • Parties
    • Dates
    • Etc.
  – Crawlers – more ad hoc
• Citation
  – Formats
• Permissions
  – Can I see it?
  – Can I use it?
• Trust

Page 7: ICSTI/ITOC 15 October 2013 Larry  Lannom

[Figure: Enabling Technologies. Scientists, data curators, end users, and applications; Discovery and Access; datasets, each with an ID, accessed via repositories.]

Page 8: ICSTI/ITOC 15 October 2013 Larry  Lannom

Access

• ID / reference resolution
  – Go from ‘subject search’ to ‘known item’ search
• Access Protocols
  – How to get it
  – Protocol registries
  – Bootstrapping into new protocols
• Authentication & Authorization
  – Proof of identity (tradeoff: usability vs security)
  – Permissions: with the object or in some external system?

Page 9: ICSTI/ITOC 15 October 2013 Larry  Lannom

[Figure: Enabling Technologies. Scientists, data curators, end users, and applications; Discovery, Access, and Interpretation; datasets, each with an ID, accessed via repositories.]

Page 10: ICSTI/ITOC 15 October 2013 Larry  Lannom

Interpretation

• Registries
  – Schemas
  – Vocabularies
  – Formats
  – Available services
  – Useful client-side tools
• Trust
  – Who did this?
  – Who owns this?
• Provenance
  – Data Source
  – Processing steps
  – Computing environment
    • What is needed to trust the numbers?
    • Domain specific?

Page 11: ICSTI/ITOC 15 October 2013 Larry  Lannom

[Figure: Enabling Technologies. Scientists, data curators, end users, and applications; Discovery, Access, Interpretation, and Reuse; datasets, each with an ID, accessed via repositories.]

Page 12: ICSTI/ITOC 15 October 2013 Larry  Lannom

Reuse

• Everything from Interpretation slide + Permissions
  – Example from BOF: I need to understand a data set for peer review, but that doesn’t give me permission to use the data
• Validation
• Education & Training
  – Integrate ‘live’ data into education and training
• Repurpose data

Page 13: ICSTI/ITOC 15 October 2013 Larry  Lannom

DAITF Roles?

• Bring good people together on a regular basis to discuss these issues
• Get agreement on vocabulary for discussing data access and interoperability?
• Working groups on specific topics
  – Prototyping specific interoperability issues / domains
• Create high-level framework, à la OAIS? Multiple frameworks?
• Guides to Registries and Best Practices

Page 14: ICSTI/ITOC 15 October 2013 Larry  Lannom

Research Data Alliance Plenary 2 Update

Dr. Francine Berman

Chair, RDA/US

Hamilton Distinguished Chair in Computer Science

Rensselaer Polytechnic Institute

Page 15: ICSTI/ITOC 15 October 2013 Larry  Lannom

RDA Plenary 2 -- September 16-18, Washington D.C. -- 3 days of Peace, Love and Data

• 368 participants from 22 countries and all sectors
• All-hands stakeholder talks and RDA working meeting
• Data Citation Summit convened by DataCite, FORCE11, CODATA/ICSTI, ESIP, DCC, etc. to create a common agenda
• ~5000 tweets over 3 days

Page 16: ICSTI/ITOC 15 October 2013 Larry  Lannom

RDA Community Current Status: ~1300 participants from 50+ countries

1. Albania, 2. Australia, 3. Austria, 4. Bangladesh, 5. Belgium, 6. Bolivia, 7. Botswana, 8. Brazil, 9. Bulgaria, 10. Canada,
11. China, 12. Congo (Democratic Rep.), 13. Costa Rica, 14. Czech Republic, 15. Denmark, 16. Estonia, 17. Finland, 18. France, 19. Germany, 20. Greece,
21. Iceland, 22. India, 23. Iran, 24. Ireland, 25. Ireland (Rep.), 26. Italy, 27. Japan, 28. Kyrgyzstan, 29. Kuwait, 30. Mexico,
31. Netherlands, 32. New Zealand, 33. Norway, 34. Palestine, 35. Poland, 36. Portugal, 37. Russian Federation, 38. Rwanda, 39. Serbia, 40. Singapore,
41. Slovenia, 42. South Africa, 43. South Korea, 44. Spain, 45. Sweden, 46. Switzerland, 47. Taiwan, 48. Turkey, 49. United Arab Emirates, 50. United Kingdom,
51. United States, 52. Vatican City, 53. Venezuela

RDA by Sector

• Academics: 66%
• Private Sector: 10%
• Public Sector: 17%
• Unknown: 7%

Fran Berman

Page 17: ICSTI/ITOC 15 October 2013 Larry  Lannom

RDA Community Building Momentum

• Growth in number and scope of Interest Groups and Working Groups
  – New: BOFs for groups as precursor to Interest Groups
• Groups beginning to “self-monitor” to promote concrete deliverables to be used and adopted
• Increasing interest in more interaction and “connective tissue” between groups
• Pressing To-Dos before Plenary 3:
  – Develop an RDA policy for IP that comes up in Interest and Working Groups
  – Determine the form of RDA deliverables and what’s needed in terms of an “RDA archive”

Page 18: ICSTI/ITOC 15 October 2013 Larry  Lannom

Groups that Met at the RDA Plenary

(On the original slide, bold marked groups new since the last Plenary.)

Birds-of-a-Feather
• Linked Data
• Chemical Safety Data
• Education and Skills Development in Data Intensive Science
• Libraries and Research Data
• Cloud Computing and Data Analysis Training for the Developing World

Working Groups
• Data Type Registries
• Metadata Standards
• Practical Policy
• Persistent Identifier Types
• Data Foundations and Terminology
• Data Categories and Codes

Interest Groups
• Agricultural Data
• Big Data Analytics
• Data Brokering
• Certification of Trusted Repositories (joint with ICSU-WDS)
• Long tail of Research Data
• Marine Data Harmonization
• Community Capability Model
• Data Publishing (joint with WDS)
• Toxicogenomics Interoperability
• Research Data Provenance
• Data Citation Metadata
• Economic Models and Infrastructure for Federated Materials Data Management
• Engagement
• Preservation e-Infrastructure
• Legal Interoperability (joint with CODATA)
• Global Registry of Trusted Data Repositories and Services
• Digital Practices in History and Ethnography

Data Citation Harmonization Summit
• DataCite, FORCE11, CODATA/ICSTI, ESIP, DCC, etc.

Page 19: ICSTI/ITOC 15 October 2013 Larry  Lannom

RDA Organizational Partners: New RDA constituencies / stakeholders

• Organizational Assembly = Organizational Members (subscription) + Organizational Affiliates (MOUs).
• Organizational Advisory Board will represent Organizational Assembly.

Current Status:
• Organizational Membership under discussion with Microsoft, IBM, ANDS, Australian Antarctic Data Center, Intersect, Terrestrial Ecosystems Research Network, CSC – IT Center for Science Ltd., Oracle, STFC, CNRI, STM, EUDAT, Barcelona Supercomputing Center, Columbia University Libraries / Information Services, and many more after the Plenary
• Organizational Affiliation under discussion with CODATA, WDS and others

Next 6 months (before Plenary 3):
• Firm up model for Affiliates (how many, how substantive should the interaction be?)
• Complete creation of legal entity to host subscriptions for Organizational Members
• Elect Organizational Advisory Board at Plenary 3

Page 20: ICSTI/ITOC 15 October 2013 Larry  Lannom

RDA Constituent Groups Coming Together

• RDA Council (overarching leadership)
• Technical Advisory Board (technical oversight)
• Secretary-General and Secretariat (administration and operations)
• Organizational Advisory Boards and Organizational Assembly (organizational partnerships and guidance)
• Working Groups and Interest Groups (impact-focused infrastructure)
• RDA Colloquium (National Research Agencies and Funders)
• RDA Membership

New Position: RDA recruiting for full-time Secretary-General

Page 21: ICSTI/ITOC 15 October 2013 Larry  Lannom

Next Plenaries (2X a year)

• Plenary 3 will be in Dublin, March 26-28, 2014, hosted by Australia and Ireland
• Plenary 4 will be in the Netherlands – late September 2014
• Plenary 5 or 6 likely back in the U.S. (west coast?)

Page 23: ICSTI/ITOC 15 October 2013 Larry  Lannom

Data Type Registries (DTR)

Co-Chairs

Larry Lannom: CNRI

Daan Broeder: MPI

September 2013

RDA Plenary 2

Washington, DC

Page 24: ICSTI/ITOC 15 October 2013 Larry  Lannom

Goal: Interoperable Set of Data Type Registries

• Data Types
  – Characterize data structures at multiple levels of granularity
  – Formats are just part of the story
  – Optimize interactions between data producers & consumers by having types defined and associated with the data they describe
  – Types should be standardized, discoverable, and unique
• Type Registries
  – Each type registered with unique identifier
  – Common data model and expression
  – Associate with services, tools, format registries, etc.
  – Common API for machine consumption
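As an illustration only, the type-registry properties listed on this slide (a unique identifier per type, a common data model, links to services and format registries, and a machine-usable API) could be sketched as below. The class names, field names, and the identifier `example/type-0001` are hypothetical and are not taken from the DTR working group's data model or specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """Hypothetical common data model for one registered type."""
    identifier: str        # unique ID for the type (made-up value in the usage note)
    name: str
    description: str
    formats: tuple = ()    # pointers into format registries (assumed field)
    services: tuple = ()   # pointers to services or tools (assumed field)


class TypeRegistry:
    """In-memory stand-in for a type registry with a small lookup API."""

    def __init__(self):
        self._types = {}

    def register(self, data_type):
        # Each type is registered exactly once under its unique identifier.
        if data_type.identifier in self._types:
            raise ValueError(f"identifier already registered: {data_type.identifier}")
        self._types[data_type.identifier] = data_type

    def lookup(self, identifier):
        # Machine-facing lookup; returns None for an unknown identifier.
        return self._types.get(identifier)

    def search(self, text):
        # Simple discovery: case-insensitive match on name or description.
        text = text.lower()
        return [
            t for t in self._types.values()
            if text in t.name.lower() or text in t.description.lower()
        ]
```

A client would call `register` once per type and later `lookup("example/type-0001")` to retrieve the definition. A real registry would sit behind a network API and persistent storage rather than a dictionary.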

Page 25: ICSTI/ITOC 15 October 2013 Larry  Lannom

Schedule

• 3/2013 – 9/2013
  – Gathering use cases
  – Investigating other work in the area
  – First drafts of data model and functional specs for a type registry
• 10/2013 – 12/2013
  – Refine data model and functional specs
  – Deploy initial prototype
• 1/2014 – 5/2014
  – Finalize data model and functional specs
  – Deploy functional type registry for PID types
  – Release turnkey registry conforming to functional specs

Page 26: ICSTI/ITOC 15 October 2013 Larry  Lannom

DTR Use Cases

• Broad Functional Classification
  – Repos hold widely varying levels of data & metadata
  – High-level functional classification of the identified object needed to make sense of what is available, e.g., data object, metadata, repo description, contact info, etc.
• Simple License Information via PID Resolution
  – Data set access conditions cannot be predicted based on ID
  – For DataCite DOIs, a handle/type/value triple could be used to provide access information, probably through a level of indirection, resulting in a pop-up or intervening page or open linked data
• Object Types as a Short-cut for Dependent Services to Match Processing Requirements to Data Objects
  – Using data acquisition as an example
    • Determine object type you are trying to build
    • Consult registry to index into an ontology to dynamically define required and optional properties
    • Does the input data have what is needed?
• Registration of PID Types (in ID/Type/Value triples) for Data Processing and Interpretation
  – Distinguish pointers to objects from pointers to metadata from pointers to services
  – Enable complex client interactions as opposed to simple one-to-one re-direction
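The last use case, registered PID types in ID/Type/Value triples, can be pictured with a small sketch. It assumes a PID record is a list of (index, type, value) entries and that each registered type says what kind of target its value points at. All type names and URLs below are invented for illustration and are not registered types.

```python
from collections import namedtuple

# One entry in a PID record. The field layout is an assumption for this sketch.
Triple = namedtuple("Triple", ["index", "type", "value"])

# Hypothetical registered PID types, each mapped to the kind of target it names.
PID_TYPE_KINDS = {
    "example.type/object-url": "object",
    "example.type/metadata-url": "metadata",
    "example.type/service-url": "service",
    "example.type/license-url": "license",
}


def group_by_kind(record, type_kinds=PID_TYPE_KINDS):
    """Group the values of a PID record by the kind of thing they point at."""
    grouped = {}
    for triple in record:
        # Types missing from the registry are kept, but flagged as unknown.
        kind = type_kinds.get(triple.type, "unknown")
        grouped.setdefault(kind, []).append(triple.value)
    return grouped


# Made-up record: one identifier resolving to several typed values.
EXAMPLE_RECORD = [
    Triple(1, "example.type/object-url", "https://example.org/data/42.nc"),
    Triple(2, "example.type/metadata-url", "https://example.org/meta/42.json"),
    Triple(3, "example.type/license-url", "https://example.org/licenses/open"),
    Triple(4, "example.type/unregistered", "opaque"),
]
```

With typed values, a client can choose the metadata or license pointer before fetching the object itself, which is the "complex client interaction" the slide contrasts with simple one-to-one redirection.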

Page 27: ICSTI/ITOC 15 October 2013 Larry  Lannom

One Use of Type Registries

[Figure: Users; typed data items, each with ID, Type, and Payload; a federated set of type registries; and example outcomes labeled Visualization, Rights (Terms:…, I Agree), Services, Data Processing, Data Set, Dissemination. Numbers 1 to 4 on the figure mark the steps below.]

1. Client (process or people) encounters unknown type.
2. Resolved to Type Registry.
3. Response includes type definitions, relationships, properties, and possibly service pointers. Response can be used locally for processing, or, optionally,
4. Typed data or reference to typed data can be sent to service provider.
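The four numbered steps on this slide can be walked through in a minimal sketch. It assumes the registry response is a dictionary with `properties` and `services` keys; that shape, the type identifier, and the service URL are hypothetical, and the registry is an in-memory dictionary rather than a federated service.

```python
# Hypothetical registry content: one type definition with properties and a service pointer.
REGISTRY = {
    "example.type/temperature-grid": {
        "properties": {"required": ["units", "grid"], "optional": ["source"]},
        "services": ["https://example.org/visualize"],
    },
}


def resolve_type(type_id, registry=REGISTRY):
    """Step 2: resolve the type identifier to its registry entry."""
    try:
        return registry[type_id]
    except KeyError:
        raise LookupError(f"type not registered: {type_id}") from None


def missing_properties(definition, payload):
    """Step 3 (local use): list required properties absent from the payload."""
    return [p for p in definition["properties"]["required"] if p not in payload]


def handle(typed_data, known_types=()):
    """Walk steps 1 to 4 for one typed data item and return a plan."""
    type_id = typed_data["type"]
    if type_id in known_types:
        # The client already understands this type, so nothing needs resolving.
        return {"action": "process-locally", "missing": []}
    definition = resolve_type(type_id)  # steps 1 and 2
    missing = missing_properties(definition, typed_data["payload"])  # step 3
    if missing:
        return {"action": "reject", "missing": missing}
    if definition["services"]:
        # Step 4 (optional): hand the typed data to a listed service provider.
        return {
            "action": "send-to-service",
            "service": definition["services"][0],
            "missing": [],
        }
    return {"action": "process-locally", "missing": []}
```

Calling `handle` on an item whose payload has both `units` and `grid` yields a plan to send it to the listed service, while an item missing `grid` is rejected with the missing property named. The required-property check also mirrors the "does the input data have what is needed?" question from the use-case slide.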

Page 28: ICSTI/ITOC 15 October 2013 Larry  Lannom

A Few Words About CNRI

• Not-for-profit organization formed in 1986 to foster research and development for the National Information Infrastructure (now internationally focused)
• Major focus on management of information on networks: Digital Object Architecture
  – Handle System
  – DO Repository
  – DO Registry

Page 29: ICSTI/ITOC 15 October 2013 Larry  Lannom

Handle System Adoption by Domain

• Research Project: early 90s
  – Initial US-funded digital library project (DARPA)
• Library/Publishing: late 90s through 00s and continuing to grow
  – DSpace – turnkey digital library platform (MIT + HP)
  – Digital Object Identifier (DOI) for journal articles
  – International from the start, including Asia
• Breaking out of the publisher/library ghetto: starting late 00s
  – Scientific data
    • Australian National Data Service (ANDS)
    • Max Planck (handles)
    • DataCite (DOIs)
    • EPIC (European Persistent Identifier Consortium)
    • EUDAT
  – Entertainment Industry
    • EIDR (DOIs)
• Threshold of use and dependence brings governance and sustainability issues
  – Who is CNRI? How long will they be around?
  – Who is in charge?
  – Not just a standards issue due to the global service (cf. DNS)

Page 30: ICSTI/ITOC 15 October 2013 Larry  Lannom

Infrastructural Governance and Sustainability

• Spread Responsibility and Control from One Group to Many
  – Involve stakeholders
  – Develop financial sustainability plan
• Develop an organizational model
  – Try to balance long-term and short-term incentives
  – Try to keep the organization from being captured by minority and/or moneyed interests
  – Build in flexibility
• Independence from individual governments or industry players
• DONA Foundation
  – Non-profit being established in Switzerland
  – Peer group of stakeholders will run and financially support the global infrastructure
  – Board of Directors will provide high-level guidance
  – CNRI will transfer relevant rights and technology to the Foundation and continue as one of N stakeholders
  – Each stakeholder has identical responsibilities to the Foundation but is otherwise independent
    • Governments could participate and provide their support out of general revenues
    • Industry could create appropriate business models
  – Formation in process, near term completion
  – Longer range objective is Digital Object Architecture approach to information system interoperability