Status report from the TWG/CCIT to the CEWG

2009-08-12

Dave Vieglais and Ryan Scherle

TWG Overview

• Activity

• Two Meetings

• Weekly (or so) telecon

• Time contributions by Duane and Mark

• Significant outcomes thus far

• Project infrastructure (plone sites, svn)

• Use cases, interactions, requirements

• Discussions, especially Identifiers and Identity

• Student projects

Architecture

• Process (Meta-architecture > Conceptual A. > Logical A.)

• Use Cases

• Functional requirements

• Interfaces and interactions

• Prototyping

• Core pieces

• Fluff

• Iterative process (somewhat)

• Identify and resolve issues at all stages

• Limited resources - so important to get design right

Use Cases

• Identified major categories and obvious use cases early

• Subsequently expanded to 34 or so

• Diagrams developed to illustrate interactions for each use case

• Capture desired system functional requirements

• APIs identified, getting more stable

[Figure: API overview. Categories: Network, Preservation, Federated Identity, Object Management, Discovery and Use. Functions shown: Heartbeat, GUID, Replication, Identity Provider, Authentication, Create, Read, Authorization, Data use policy, Update, Delete, Logging, Notification, Health, Capacity, Validation, Migration, Workflow, Ontology, Provenance]

Use Case Issues

• UC2 "Get list of GUIDs from metadata search"

• Can queries be done at MN with equivalent results?

• Where is result filtering based on access privileges performed?

• Authentication issue: if a search spans many nodes, where is identity resolved?

• UC3 "Registration of a new member node"

• Should new nodes be registered with specified trust levels?

Use Case Issues (2)

• UC4,5 "Create/Update/Delete metadata record in Member Node."

• What is the policy on archival copies of data and metadata? (Can data packages be deleted? Published packages modified?)

• UC12 "User Authentication - Person via client software authenticates against Identity Provider to establish session token."

• Where is identity stored? MN? CN? Combination of all?

Use Case Issues (3)

• UC24 "Transactions - CNs and MNs should support transaction sets where operations all complete successfully or get rolled back (e.g., upload both data and metadata records)."

• Do transactions span multiple MNs, CNs?

• UC27 "CN should support forward migration of metadata documents from one version to another within a standard and to other standards."

• 20+ metadata standards

• How to handle lossy conversions?
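The all-or-nothing behaviour asked for in UC24 can be illustrated with a minimal sketch for a single node. `InMemoryNode` and `upload_package` are hypothetical names invented for this example, not DataONE APIs, and the sketch does not address the open question of transactions that span several MNs or CNs.

```python
class InMemoryNode:
    """Hypothetical stand-in for a Member Node store (not a DataONE API)."""

    def __init__(self):
        self.objects = {}

    def create(self, guid, content):
        if guid in self.objects:
            raise ValueError(f"identifier already in use: {guid}")
        self.objects[guid] = content

    def delete(self, guid):
        self.objects.pop(guid, None)


def upload_package(node, records):
    """Create every (guid, content) pair on the node, or none of them."""
    created = []
    try:
        for guid, content in records:
            node.create(guid, content)
            created.append(guid)
    except Exception:
        # Undo in reverse order so the node returns to its prior state.
        for guid in reversed(created):
            node.delete(guid)
        raise
    return created


node = InMemoryNode()
upload_package(node, [("data:1", b"1,2,3"), ("meta:1", "<eml/>")])

try:
    # "meta:1" already exists, so the newly created "data:2" is rolled back.
    upload_package(node, [("data:2", b"4,5,6"), ("meta:1", "<eml/>")])
except ValueError:
    pass

print(sorted(node.objects))
```

The rollback here is a simple compensating delete; a real implementation would also have to cope with failures during the rollback itself.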

Use Case Issues (4)

• UC28 "Relationships/Versioning - Derived products should be linked to source objects so that notifications can be made to users of derived products when source products change."

• Who asserts these relationships? How are relationships managed?

• UC31 "Manage Access Policies - Client can specify access restrictions for their data and metadata objects. Also supports release time embargoes."

• Group management has an important, perhaps unusual temporal component.
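One way to picture the release-time embargo in UC31 is a policy record carrying an optional release date. This is only a sketch under stated assumptions: `AccessPolicy`, `can_read`, and the rule that an object becomes readable by anyone once the embargo lifts are all invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Set


@dataclass
class AccessPolicy:
    """Hypothetical policy record; field names are illustrative only."""
    owner: str
    readers: Set[str] = field(default_factory=set)
    embargo_until: Optional[datetime] = None


def can_read(policy, user, now):
    # The owner can always read their own object.
    if user == policy.owner:
        return True
    # While the embargo is active, only explicitly listed readers get access.
    if policy.embargo_until is not None and now < policy.embargo_until:
        return user in policy.readers
    # Assumption: after release (or with no embargo) anyone may read.
    return True


policy = AccessPolicy(
    owner="alice",
    readers={"bob"},
    embargo_until=datetime(2010, 1, 1, tzinfo=timezone.utc),
)
during_embargo = datetime(2009, 8, 12, tzinfo=timezone.utc)
after_release = datetime(2010, 6, 1, tzinfo=timezone.utc)

print(can_read(policy, "carol", during_embargo))
print(can_read(policy, "carol", after_release))
```

Passing `now` explicitly keeps the check testable and makes the temporal dependence visible; group membership that changes over time would need the same treatment.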

Coordinating Node Requirements

• CNs play a central role in the infrastructure ∴ it is critical to identify functional and non-functional requirements early

• Non-exhaustive list of 21 requirements (so far)

• e.g.:

• “Coordinating Node services should be designed to be independently scalable.”

• “Data packages are not discoverable through any public interface until all Coordinating Nodes have confirmed that they have a copy of the corresponding metadata document.”

• “Metadata searches should return in a maximum of “xxx” seconds.”
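The second example requirement (a package stays hidden until every Coordinating Node confirms its metadata copy) could be tracked roughly as follows. `PackageRegistry` and the node names are hypothetical, and the sketch ignores how confirmations would actually travel between CNs.

```python
class PackageRegistry:
    """Hypothetical tracker of which Coordinating Nodes hold a metadata copy."""

    def __init__(self, coordinating_nodes):
        self.coordinating_nodes = frozenset(coordinating_nodes)
        if not self.coordinating_nodes:
            raise ValueError("at least one coordinating node is required")
        self.confirmations = {}

    def confirm(self, guid, cn):
        if cn not in self.coordinating_nodes:
            raise ValueError(f"unknown coordinating node: {cn}")
        self.confirmations.setdefault(guid, set()).add(cn)

    def is_discoverable(self, guid):
        # Public only once every CN has confirmed its copy.
        return self.confirmations.get(guid, set()) == self.coordinating_nodes


registry = PackageRegistry(["cn-1", "cn-2", "cn-3"])
registry.confirm("pkg:42", "cn-1")
registry.confirm("pkg:42", "cn-2")
partially_confirmed = registry.is_discoverable("pkg:42")

registry.confirm("pkg:42", "cn-3")
fully_confirmed = registry.is_discoverable("pkg:42")

print(partially_confirmed, fully_confirmed)
```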

General conclusions

The member nodes come with a diverse set of technologies and practices. The coordinating nodes will need to be very permissive while providing quality services.

History/versioning:

• Keep all versions of metadata, so we can see where it came from (and metadata doesn't take much storage)

• The original data package should always be stored. Transformed versions may be needed for some operations of the coordinating nodes.

• It may be too much of a burden to store all versions of a data file.

Identity, Authentication, Authorization

• MN & CN security services necessary to

• preserve and verify integrity of data packages (in D1)

• prevent malicious intent or inappropriate access

• Six identity / security models in industry:

• Centralized (LDAP)

• Distributed directories (LDAP + referrals)

• Distributed management and replication (LDAP + replication)

• Grid Security Infrastructure proxy certificates

• OpenID

• Shibboleth + InCommon

Identities

Types of users:

• non-authenticated user

• registered user (at member node)

• registered user (DataONE central)

• group member

• site manager (for harvests, system operations, etc.)

• change request approval workflow

• owner of intellectual property rights

Privileges:

• access/modify both data and metadata

• Member Node Write

• create/execute system functions

• access logged information

Metadata Standards

• 20 or so relevant standards

DC, DwC, EML, CSDGM, GCMD-DIF, ISO 19137:2007, NeXML, WaterML, Genbank-FFF, ISO 19115, GML, CDF, DDI, GEML, ESML, CSR, ESG, ECHO, ...

• Conversion between standards is a lossy process

• Issues of compatibility in metadata storage across MNs

• Original metadata will be stored unchanged

• Need to define metadata standard that will be used to support search and discovery operations (CN)

Search Terms

Identifiers

• Fundamental component of entire architecture

• Many schemes (handle, LSID, PURL, ...), each with advantages and faults

• Not practical for DataONE to dictate single identifier scheme across all Member Nodes

• Feasible to require that identifiers are unique across all participating MNs

• However, not feasible to assume that all MNs will support all identifier schemes

• Key question: Must an identifier always resolve to the same sequence of bits? Or should it be more abstract?
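The uniqueness requirement above does not depend on any one identifier scheme. A minimal sketch, assuming a CN-side registry that treats identifiers as opaque strings; `IdentifierRegistry`, the node names, and the sample identifiers are all made up for illustration.

```python
class IdentifierRegistry:
    """Hypothetical CN-side registry; identifiers are opaque strings."""

    def __init__(self):
        self._owner = {}

    def register(self, identifier, member_node):
        # The first Member Node to register an identifier owns it.
        owner = self._owner.setdefault(identifier, member_node)
        if owner != member_node:
            raise ValueError(f"{identifier!r} is already registered by {owner}")
        return identifier

    def resolve(self, identifier):
        # Returns the Member Node that registered the identifier.
        return self._owner[identifier]


ids = IdentifierRegistry()
# Different schemes coexist because the registry never parses them.
ids.register("urn:lsid:example.org:data:1", "mn-a")
ids.register("hdl:0000/example-2", "mn-b")

try:
    ids.register("urn:lsid:example.org:data:1", "mn-b")
    collision_rejected = False
except ValueError:
    collision_rejected = True

print(ids.resolve("hdl:0000/example-2"), collision_rejected)
```

This sketch resolves an identifier to a location only; whether it must also resolve to a fixed sequence of bits is left open, as in the key question.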

Prototypes

By November 2009 meeting (hmm...):

• Member Node contributes metadata to Coordinating Node using GUID

• CN initiates replication of data object from MN to MN

• Logging for instrumentation and usage

• Update data object (revision) by Member Node

Other targets, in order of importance:

• Replication of metadata and system information between CNs

• Failover and load balancing between CNs

• Formalize all service API specs using a language-agnostic IDL

• Comparison and evaluation of existing systems/standards/protocols used by prototype implementations

• Authentication and authorization using LDAP (initial impl.)

• Search portal user interface using Coordinating Node metadata content

• Heartbeat/state of health services

• Registry services using, perhaps, a simple list as an initial method

• Stress and load testing

Current activities

• Wrapping up this year’s student internships.

• Addressing the general questions arising out of the use case diagrams (some of these questions will be discussed at the coordination meeting)

• Developing a report on identifier usage.

• Creating APIs to be used in prototypes.

Hurdles

• Resources & Contributors

• Identity, authentication, authorization

• Identifiers

• Rules for data handling and archive (what is data?)

• Metadata extraction

• CN replication

Feedback from CEWG

What is the vision for access management to DataONE, and how much of that will be left up to member nodes?

• Answer: Data providers must "establish trust" to publish/modify content.

• What does "establish trust" entail? Is there a technical component?

• Who are “data providers”? The member nodes or the end users?

Open Questions

• What policies should we have for managing DataONE documents?

• What properties should we enforce regarding identifiers?

• What are the minimum requirements for a member node to join the DataONE community? Or, how accommodating should we be?

• Can we identify some member nodes that will implement all best practices and serve as models for the other member nodes?

• How much data should we expect to handle? It is unclear what the uptake curve will be, but this has major implications for our architectural planning.

• Do we want/need a registry of name spaces for identifiers?

• Is it reasonable to store replicas using the ID scheme of the secondary member node, as long as the coordinating nodes are capable of resolving the original identifier to the correct location?

• What types of access control should be allowed?

• What time constraints are we under?

Open Questions (2)

• Can the CEWG produce some science-oriented use cases that augment our current technical cases?

• Will member nodes be willing to use central DataONE services and/or create adapters that allow their services to communicate with DataONE?

• Are there technologies that are widely used across the member node community? If so, these would be promising targets, as we could create a small number of adapters that could be used for a large number of member nodes.

• What are the high-value member nodes, for which we must provide custom adapters?
