Information Types and Registries. Giridhar Manepalli, Corporation for National Research Initiatives. Strategies for Discovering Online Data, BRDI Symposium – Feb 26, 2013.

Page 1: Information Types and Registries

Information Types and Registries

Giridhar Manepalli, Corporation for National Research Initiatives

Strategies for Discovering Online Data, BRDI Symposium – Feb 26, 2013

Page 2: Information Types and Registries

Research Data Interoperability

[Figure: Scientists, data curators, end users, and applications rely on enabling technologies for discovery, access, interpretation, and reuse of datasets. The datasets (bit sequences, each labeled with an ID) are accessed via repositories.]

Page 3: Information Types and Registries

Research Data Interoperability (cont.)

• Interoperability of research data allows discovery, access, interpretation, and reuse of datasets by researchers
• Examples
  • Discovery: A scientist in the US “discovers” datasets from research in Germany, in a related or even unrelated domain
  • Reuse: A scientist in the US “re-uses” or processes datasets from the discovered research in Germany
• For interpretation of accessible datasets, Types and Type Registries play a significant role

Page 4: Information Types and Registries

Information Types – Our Definition

• What they are not:
  • Programmatic data types (string, integer, double, etc.)
  • MIME types as normally used (text/xml, application/rdf)
• Types are identifiers that, with the help of associated metadata, characterize data structures used for managing information
• Data structures could be at multiple levels of granularity
  • From individual observations, to sets of observations within a time series, to multiple time-series sets that explain a phenomenon
• Usually
  • Spread across multiple files (each with a specific MIME type)
  • Distributed on the network (managed by various repositories)
• We call such data structures used for managing information digital objects
• Types (aka type identifiers) are unique across their user base
• Types are associated with machine-readable metadata to support interpretation of information
• CNRI’s focus is to support infrastructure for enabling inter-discipline types

[Figure: Typed Digital Object. A digital object, made up of files on the network, carries a Type ID that is associated with machine-readable metadata.]
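
The definition above can be made concrete with a minimal Python sketch. It is only an illustration, not CNRI's data model: the class names, field names, identifiers, and example locations are all assumptions. The point it shows is that the information type sits on the digital object as a whole, while each file keeps its own MIME type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class FileRef:
    """One file belonging to a digital object (hypothetical structure)."""
    location: str   # where some repository serves the file (made-up value)
    mime_type: str  # the file's own MIME type, distinct from the information type


@dataclass
class DigitalObject:
    object_id: str
    type_id: str                                   # identifier of the information type
    files: List[FileRef] = field(default_factory=list)


@dataclass
class TypeRecord:
    type_id: str
    metadata: Dict[str, object]                    # machine-readable description of the type


# A time-series object spread over two files held by different repositories.
series = DigitalObject(
    object_id="obj/001",
    type_id="type/temperature-time-series",
    files=[
        FileRef("repo-a.example/obj001/readings.csv", "text/csv"),
        FileRef("repo-b.example/obj001/station.xml", "text/xml"),
    ],
)
record = TypeRecord(
    type_id="type/temperature-time-series",
    metadata={"description": "hourly temperature observations"},
)

# One information type, several MIME types underneath it.
mime_types = sorted({f.mime_type for f in series.files})
```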

Page 5: Information Types and Registries

Value Proposition of Info. Types

• Typing allows
  • Grouping of digital objects generated at different times and in different domains for reasoning and establishing correlations between different types of objects
    • Grouping is an aspect fundamental to humans for reasoning about things
  • Creation of services that can automate information processing based on information types
• Advanced information processing can be performed for finding unforeseen correlations, trends, etc.
  • This type of advanced processing has different names: data-intensive science, fourth paradigm, big-data analytics, etc.

[Figure: Digital objects grouped into typed digital object collections (Type A, Type B, Type C).]
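
As a rough illustration of grouping by type, the sketch below collects objects from different domains into one collection per type identifier. The object identifiers, type identifiers, and domain names are invented for the example.

```python
from collections import defaultdict

# (object identifier, type identifier, originating domain); all values are made up.
objects = [
    ("obj/1", "type/A", "biology"),
    ("obj/2", "type/B", "oceanography"),
    ("obj/3", "type/A", "climatology"),
    ("obj/4", "type/C", "biology"),
]


def group_by_type(items):
    """Collect object identifiers into one collection per type identifier."""
    grouped = defaultdict(list)
    for object_id, type_id, _domain in items:
        grouped[type_id].append(object_id)
    return dict(grouped)


collections = group_by_type(objects)
```

Here `collections["type/A"]` holds objects from two different domains, which is the kind of cross-domain grouping a type-aware service could then reason over.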

Page 6: Information Types and Registries

Value Proposition of Info. Types (cont.)

[Figure: Interaction among a user, Digital Objects, a Type Registry, and a Suite of Services (Visualization, Rights, Data Set Dissemination, Data Processing). Numbered arrows 1 to 5 correspond to the steps below.]

1. User requests the Type from a Digital Object of interest.
2. The Type ID is returned to the user.
3. User requests the Type info from the Type Registry.
4. Type Info, containing Services Info, is returned to the user.
5. User requests a Service for processing.
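
The five steps can be sketched in Python with in-memory dictionaries standing in for the digital object store and the Type Registry. Everything here is a hypothetical stand-in: the identifiers, the record layout, and the toy `mean_service` are assumptions, and a real deployment would make network requests where this sketch does dictionary lookups.

```python
# Hypothetical in-memory stand-ins for the parties in the interaction.
DIGITAL_OBJECTS = {
    "obj/42": {"type_id": "type/sea-surface-temp", "payload": [14.1, 14.3, 13.9]},
}


def mean_service(values):
    """A toy 'Data Processing' service: average the observations."""
    return sum(values) / len(values)


TYPE_REGISTRY = {
    "type/sea-surface-temp": {
        "description": "sea surface temperature readings",
        "services": {"data-processing": mean_service},
    },
}


def process(object_id, service_name):
    # Steps 1-2: ask the digital object for its type and receive the type ID.
    type_id = DIGITAL_OBJECTS[object_id]["type_id"]
    # Steps 3-4: ask the registry for the type info, which includes services info.
    type_info = TYPE_REGISTRY[type_id]
    # Step 5: request one of the listed services to process the object.
    service = type_info["services"][service_name]
    return service(DIGITAL_OBJECTS[object_id]["payload"])


result = process("obj/42", "data-processing")
```

The caller never hard-codes which service handles which object; it discovers the service through the type.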

Page 7: Information Types and Registries

Info. Typing Challenges

• Challenge: When are two digital objects assigned the same type?
  • When the bit-level encoding matches?
  • Or when the higher-level structures and intent match?
  • If two observations are made by two similar instruments at the same time on the same entity, would the data generated by those two observations be considered to be of the same type?
  • Even if the data generated by each observation, similar in concept, has a different format (e.g., JPEG vs. PNG)?
• Our approach:
  • Intent wins over optics (formats, encodings, etc.)
  • The metadata associated with the type could list possible formats, encodings, etc.
• Alternative approach:
  • Establish a base type and then sub-type for accommodating variations
  • Our experience was that it was too cumbersome to deal with multiple formats and encodings at the type definition level
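
A small sketch of "intent wins over optics", under assumed names: one type record lists the acceptable formats in its metadata, so a JPEG and a PNG of the same kind of observation share a single type. The type identifier, the metadata keys, and the example objects are all hypothetical.

```python
TYPE_METADATA = {
    "type/leaf-photograph": {
        "intent": "photograph of a single leaf specimen",
        # Format variations live in the metadata, not in separate sub-types.
        "formats": ["image/jpeg", "image/png"],
    },
}


def same_type(obj_a, obj_b):
    """Objects share a type when their type IDs match, whatever their formats."""
    return obj_a["type_id"] == obj_b["type_id"]


def format_allowed(obj):
    """Check that the object's format is one the type's metadata lists."""
    return obj["format"] in TYPE_METADATA[obj["type_id"]]["formats"]


photo_jpeg = {"id": "obj/a", "type_id": "type/leaf-photograph", "format": "image/jpeg"}
photo_png = {"id": "obj/b", "type_id": "type/leaf-photograph", "format": "image/png"}

shared = same_type(photo_jpeg, photo_png)
both_valid = format_allowed(photo_jpeg) and format_allowed(photo_png)
```

Under the alternative approach, each format would need its own sub-type of a base type, which is the bookkeeping the slide describes as too cumbersome.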

Page 8: Information Types and Registries

Info. Typing Challenges (cont.)

• Challenge: Can the same digital object be assigned multiple types?
  • If so, how do we deal with duplicate types?
  • If not, how do we manage multiple types assigned by several domains?
• Our approach:
  • An object is assigned an inter-discipline type only once.
  • Any domain-specific types are listed in its metadata

[Figure: A typed digital object collection that a biologist labels Type α and a computer scientist labels Type β is assigned a single inter-discipline type, Type I, whose machine-readable metadata lists Type α and Type β.]
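
One possible reading of this approach, sketched in Python: an object may receive exactly one inter-discipline type, and the domain-specific types are recorded in that type's metadata (following the figure). The class, its method names, and the identifiers are assumptions made for illustration.

```python
class InterDisciplineTyping:
    """Tracks one inter-discipline type per object (hypothetical helper)."""

    def __init__(self):
        self._assigned = {}       # object_id -> inter-discipline type ID
        self._type_metadata = {}  # type ID -> {"domain_types": {domain: type}}

    def assign(self, object_id, type_id):
        """Assign the inter-discipline type; a second assignment is rejected."""
        if object_id in self._assigned:
            raise ValueError(f"{object_id} already has an inter-discipline type")
        self._assigned[object_id] = type_id
        self._type_metadata.setdefault(type_id, {"domain_types": {}})

    def add_domain_type(self, type_id, domain, domain_type):
        """List a domain-specific type in the inter-discipline type's metadata."""
        self._type_metadata[type_id]["domain_types"][domain] = domain_type

    def domain_types(self, object_id):
        """Return the domain-specific types reachable from the object's type."""
        type_id = self._assigned[object_id]
        return dict(self._type_metadata[type_id]["domain_types"])


typing_state = InterDisciplineTyping()
typing_state.assign("collection/7", "Type I")
typing_state.add_domain_type("Type I", "biology", "Type α")
typing_state.add_domain_type("Type I", "computer science", "Type β")
```

Calling `typing_state.assign("collection/7", ...)` a second time raises `ValueError`, which is how this sketch enforces the "only once" rule.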

Page 9: Information Types and Registries

Info. Typing Challenges (cont.)

• Challenge: How can existing information be typed under this new scheme?
  • A lot of information exists already
• One approach:
  • Start with domain-specific types, if any, generate domain-neutral types, and list the domain-specific types in their metadata records

Page 10: Information Types and Registries

Info. Types – Machine-readable Metadata

• Machine-readable metadata for Info. Types is still an area of research for us
• Type interdependence
  • It is clear that sub-typing is needed for building on previously defined types
  • Our experience shows that sub-typing based on variations in formats and encodings is a cumbersome process
  • Instead, an exhaustive list of possible formats and encodings may be specified in the metadata
• Domain-specific Types
  • Cross-domain Types could list or point at domain-specific types, of which there could be several for a given object, and which might define detailed semantics for interpretation
• Metadata for automated interpretation
  • For the few types of information we prototyped, defining metadata that helps services process datasets is open-ended and sometimes impractical
  • A parsing language or pseudo-code that transforms datasets into domain-specific ontologies or semantics may be captured instead

Page 11: Information Types and Registries

Info. Type Registries

• Info. Type Registries are metadata registries that
  • Support recording of information types and associated metadata records
  • Perform federation across other registries
  • De-duplicate (or match types) to control registration requests of existing types
  • Include manual moderation and/or crowd sourcing function for spotting redundant registrations (optional)
• Cross-domain Type Registries may optionally link to domain-specific Type Registries
• Type Registries may manage or reference services that process information of certain types
• CNRI has vast experience building metadata registries
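
The registry functions listed above (recording, de-duplication, federation) might look like the following toy Python class. It is a sketch under stated assumptions, not CNRI's registry: the class and method names are invented, matching on a normalized description is a deliberately crude stand-in for real type matching, and federation assumes the peer graph has no cycles.

```python
class TypeRegistry:
    """Toy in-memory type registry (hypothetical design)."""

    def __init__(self, name, peers=()):
        self.name = name
        self.peers = list(peers)  # other registries to consult (assumed acyclic)
        self._records = {}        # type ID -> metadata record

    @staticmethod
    def _fingerprint(metadata):
        """Crude matching key: the description, lowercased and whitespace-normalized."""
        return " ".join(metadata["description"].lower().split())

    def register(self, type_id, metadata):
        """Record a new type, or return the ID of an existing matching type."""
        key = self._fingerprint(metadata)
        for existing_id, existing in self._records.items():
            if self._fingerprint(existing) == key:
                return existing_id  # de-duplicate: reuse the earlier registration
        self._records[type_id] = metadata
        return type_id

    def resolve(self, type_id):
        """Look up a type locally first, then ask federated peers."""
        if type_id in self._records:
            return self._records[type_id]
        for peer in self.peers:
            found = peer.resolve(type_id)
            if found is not None:
                return found
        return None


domain_registry = TypeRegistry("biology")
domain_registry.register("bio/leaf-image", {"description": "Leaf specimen image"})

cross_domain = TypeRegistry("cross-domain", peers=[domain_registry])
first = cross_domain.register("type/ts", {"description": "Temperature time series"})
second = cross_domain.register("type/ts-dup", {"description": "temperature  time series"})
federated = cross_domain.resolve("bio/leaf-image")
```

The second registration returns the ID of the first because the descriptions match after normalization. A production registry would need far richer matching, possibly backed by the manual moderation the slide mentions.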

Page 12: Information Types and Registries

Next Steps

• Received Sloan Foundation funding to research Type Registries within scientific and financial communities
• CNRI employees lead and participate in a Type Registry working group within the Research Data Alliance
• Technical goal is to define the scope of ‘Information Type’ by working in the aforementioned projects, and to build and release an open-source Type Registry in the next 18 months.