iRODS: integrated Rule-Oriented Data System
Ray Idaszak, Director, Collaborative Environments
RENCI, University of North Carolina at Chapel Hill




iRODS

• Integrated Rule-Oriented Data System
  – What It Is
    • Origins, how it works, what's different about it
  – Why It Is
    • Context, role it serves
  – Where It's Going (Today, Future)
    • Funding, key efforts


iRODS Talk Outline

• Integrated Rule-Oriented Data System
  – What is the Integrated Rule-Oriented Data System?
    • Origins, technology, how it works
  – Why It Is
    • Context, role it serves
  – Where It's Going (Today, Future)
    • Funding, key efforts


What’s Different about iRODS?

• iRODS lets you manage your data with your rules and in your way…
  – via policies
  – against a backdrop of federatable community data worldwide


iRODS Background

• Integrated Rule-Oriented Data System
  – Open-source initiative that represents 12+ years of development and over $10M of NSF grant funding
  – Another $8M+ of funding pending (via NSF DataNet)
• Collaboration between
  – UNC Chapel Hill
    • Data Intensive Cyber Environments group (DICE)
  – RENCI
    • State-funded cyberinfrastructure institute at UNC Chapel Hill
  – San Diego Supercomputer Center


iRODS Data and Policy Virtualization

The iRODS Data Grid installs in a "layer" over storage systems, so you can view, manage, access, add, and share part or all of your data in a unified collection.

[Figure: the data grid layer sits between the user client and three storage sites: RENCI (/cuahsi/modeling), Utah State Univ. (/cuahsi/catalog), and SDSC (/cuahsi/terrain). The user client views and manages the data as a single "virtual collection" containing /cuahsi/catalog, /cuahsi/modeling, and /cuahsi/terrain.]
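One logical namespace laid over several storage sites can be sketched as a longest-prefix lookup from logical collection paths to the site that physically holds them. This is a toy model, not the iRODS implementation or API: the VirtualCollection class, the physical roots such as /storage/renci, and the file names are hypothetical; only the three /cuahsi collections and their sites come from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    site: str            # storage site that physically holds the bytes
    physical_path: str   # path on that site's storage (hypothetical layout)


class VirtualCollection:
    """Toy logical namespace: one tree of logical paths spread over many sites."""

    def __init__(self):
        # logical collection prefix -> (site, physical root on that site)
        self._mounts = {}

    def mount(self, logical_prefix, site, physical_root):
        self._mounts[logical_prefix.rstrip("/")] = (site, physical_root.rstrip("/"))

    def list_collections(self):
        # What the user sees: one set of collections, regardless of site.
        return sorted(self._mounts)

    def resolve(self, logical_path):
        # Longest-prefix match, so a nested collection wins over its parent.
        best = None
        for prefix in self._mounts:
            if logical_path == prefix or logical_path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no site holds {logical_path}")
        site, root = self._mounts[best]
        return Location(site, root + logical_path[len(best):])


# Collections and sites from the figure; the physical roots are made up.
grid = VirtualCollection()
grid.mount("/cuahsi/modeling", "RENCI", "/storage/renci/modeling")
grid.mount("/cuahsi/catalog", "Utah State Univ", "/storage/usu/catalog")
grid.mount("/cuahsi/terrain", "SDSC", "/storage/sdsc/terrain")
```

Here grid.resolve("/cuahsi/terrain/dem.tif") yields Location(site='SDSC', physical_path='/storage/sdsc/terrain/dem.tif'): the user works only with the logical path, and the layer decides which site answers.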


Using a Data Grid - Details

[Figure: three iRODS servers, each running a rule engine, at RENCI, SDSC, and USU, plus the iCAT metadata catalog.]

• User asks for data using logical properties (client-server)
• Data request goes to 1st server
• Server looks up information in the catalog (applies rules)
• Catalog responds that 3rd server has the data
• 1st server asks 3rd server, peer-to-peer, to serve up the data
• 3rd server applies rules and serves the data
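The request flow above can be sketched as a toy in-process model: the first server applies its rules, asks the catalog which server holds the logical path, and hands the request to that peer, which applies its own rules before serving. Catalog, Server, the server names, and the deny_guest rule are hypothetical illustrations, not iRODS classes or rules; real iRODS servers talk over the network and the iCAT is a database.

```python
class Catalog:
    """Toy stand-in for the iCAT: logical path -> name of the server holding it."""

    def __init__(self, locations):
        self.locations = dict(locations)

    def lookup(self, logical_path):
        return self.locations[logical_path]


class Server:
    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog
        self.peers = {}    # server name -> Server, for peer-to-peer hand-offs
        self.store = {}    # logical path -> bytes held on this server
        self.rules = []    # callables(user, path); a rule raises to deny
        self.log = []

    def get(self, user, logical_path):
        # Every server that touches the request applies its own rules first.
        for rule in self.rules:
            rule(user, logical_path)
        owner = self.catalog.lookup(logical_path)
        if owner != self.name:
            # Peer-to-peer: ask the server that holds the data to serve it.
            self.log.append(f"{self.name} forwards {logical_path} to {owner}")
            return self.peers[owner].get(user, logical_path)
        self.log.append(f"{self.name} serves {logical_path}")
        return self.store[logical_path]


def deny_guest(user, logical_path):
    # Hypothetical access rule enforced only by the server holding the data.
    if user == "guest":
        raise PermissionError(f"{user} may not read {logical_path}")


catalog = Catalog({"/cuahsi/terrain/dem.tif": "server3"})
server1, server3 = Server("server1", catalog), Server("server3", catalog)
server1.peers["server3"] = server3
server3.store["/cuahsi/terrain/dem.tif"] = b"elevation grid"
server3.rules.append(deny_guest)
```

A request such as server1.get("jeff", "/cuahsi/terrain/dem.tif") is forwarded to server3 and returns its bytes, while the same request from "guest" is refused by server3's rule even though server1 had no objection.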


Using a Data Grid – NEAR FUTURE (DB Resource)

[Figure: three iRODS servers, each running a rule engine, at USU, RENCI, and SDSC, plus the iCAT metadata catalog and SQL database back ends (MySQL, PostgreSQL, Oracle).]

• User not running an SQL server locally makes a query
• Query goes to 1st server
• Server looks up information in the catalog (applies rules)
• Catalog responds that 3rd server has the SQL database
• 1st server sends 3rd server the SQL query
• 3rd server applies rules and serves the query result


Example Clients & Client Interfaces (i.e. iRODS is client agnostic)

• C library calls: application level
• .NET: Windows client API
• Unix shell commands: scripting languages
• Java I/O class library (JARGON): web services
• SAGA: grid API
• Web browser (Java-Python): web interface
• Windows browser: Windows interface
• WebDAV: iPhone interface
• Fedora digital library middleware: digital library middleware
• DSpace digital library: digital library services
• Parrot: unification interface
• Kepler workflow: grid workflow
• FUSE user-level file system: Unix file system

• iDrop: drag-and-drop GUI; user actions can be mapped to policies


iRODS Policies

• iRODS is described as a "policy-based" data management system
• Policy definition: a proposed or adopted course of action
  – Ergo, iRODS associates a "course of action" with all data
• Pre- and post- "Policy Enforcement Points" (PEPs)
  – Pre: course of action taken before an operation on the data executes (e.g. before data is written into iRODS)
  – Post: course of action taken after the operation completes (e.g. after data has been stored or retrieved)
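Pre- and post-enforcement points amount to hooks wrapped around each data operation: pre rules run first and can veto the operation, post rules run after it completes. The sketch below models that pattern with a hypothetical PolicyEngine class and two made-up policies (accept only .nc files on ingest; record a SHA-256 checksum of what was stored). It is not the iRODS rule language or its actual PEP names.

```python
import hashlib
from collections import defaultdict


class PolicyEngine:
    """Toy policy enforcement: pre/post rule lists wrapped around each operation."""

    def __init__(self):
        self.pre = defaultdict(list)   # operation name -> rules run before it
        self.post = defaultdict(list)  # operation name -> rules run after it

    def on_pre(self, op):
        def register(rule):
            self.pre[op].append(rule)
            return rule
        return register

    def on_post(self, op):
        def register(rule):
            self.post[op].append(rule)
            return rule
        return register

    def run(self, op, ctx, action):
        for rule in self.pre[op]:
            rule(ctx)              # a pre rule may raise to veto the operation
        result = action(ctx)
        for rule in self.post[op]:
            rule(ctx)              # post rules act on the completed operation
        return result


engine = PolicyEngine()
store = {}   # logical path -> bytes, standing in for a storage resource


@engine.on_pre("put")
def only_netcdf(ctx):
    # Hypothetical ingest policy: accept only NetCDF files.
    if not ctx["path"].endswith(".nc"):
        raise PermissionError(f"policy rejects {ctx['path']}")


@engine.on_post("put")
def record_checksum(ctx):
    # Hypothetical integrity policy: checksum what was actually stored.
    ctx["checksum"] = hashlib.sha256(store[ctx["path"]]).hexdigest()


def put(path, data):
    ctx = {"path": path}

    def write(c):
        store[c["path"]] = data

    engine.run("put", ctx, write)
    return ctx
```

Because the pre rule raises before write runs, a rejected file never reaches the store; the post rule only ever sees data that was actually written.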


iRODS Policies

• Retention, disposition, distribution, arrangement
• Authenticity, provenance, description
• Integrity, replication, synchronization
• Deletion, trash cans, versioning
• Archiving, staging, caching
• Authentication, authorization, redaction
• Access, approval, IRB, audit trails, report generation
• Assessment criteria, validation
• Derived data product generation, format parsing
• Federation


iRODS Rule Engine, Workflows

• iRODS has its own built-in, imperative, interpreted programming language (the iRODS rule language), which is executed by the Rule Engine
• The iRODS Rule Engine executes microservices
• An iRODS "program" is called a workflow
  – A microservice is one "step" of an iRODS workflow
  – iRODS workflows are executed on the iRODS server
  – Arbitrary external web services can be one "step" of an iRODS workflow
    • Encapsulated as a microservice


iRODS Microservices

• Microservices are written in C and can do, well, anything that can be done in C (which is in part what makes iRODS so extensible), but typically provide:
  – Standard operations, e.g. file or format conversion
  – Queries on the metadata catalog
  – Interaction with web services
  – Triggering external HPC workflows
  – Remote and delayed execution control
• Microservices communicate through
  – Arguments, session variables, user space variables, etc.
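A workflow as an ordered list of microservice steps that pass results through shared session variables, with an external web service wrapped as one more step, can be sketched as follows. Real microservices are C functions invoked by the iRODS Rule Engine on the server; these Python functions, the msi_ names, and the use of len as a stand-in "web service" are hypothetical and only mimic the control flow.

```python
from typing import Callable, Dict, List, Optional, Tuple

Session = Dict[str, object]                      # variables shared by all steps
Microservice = Callable[[Session, dict], None]   # one workflow step


def msi_read(session: Session, args: dict) -> None:
    # Hypothetical step: load an object from an in-memory store by logical path.
    session["data"] = args["store"][args["path"]]


def msi_to_upper(session: Session, args: dict) -> None:
    # Stand-in for a format conversion: transform the data held in the session.
    session["data"] = session["data"].upper()


def as_microservice(call: Callable[[object], object]) -> Microservice:
    """Wrap an external call (e.g. a web-service client) as one workflow step."""
    def step(session: Session, args: dict) -> None:
        session[args["out"]] = call(session["data"])
    return step


def run_workflow(steps: List[Tuple[Microservice, dict]],
                 session: Optional[Session] = None) -> Session:
    # Steps run in order; each reads arguments and reads/writes the session.
    session = {} if session is None else session
    for msi, args in steps:
        msi(session, args)
    return session


store = {"/cuahsi/catalog/readme.txt": "site list"}
result = run_workflow([
    (msi_read, {"store": store, "path": "/cuahsi/catalog/readme.txt"}),
    (msi_to_upper, {}),
    (as_microservice(len), {"out": "length"}),  # len stands in for a service call
])
# result["data"] == "SITE LIST" and result["length"] == 9
```

Passing data through the session rather than return values mirrors the slide's point that steps communicate via arguments and session variables, so any step can be swapped without changing the others' signatures.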


Differentiating Workflows

• iRODS data grid workflows
  – Low complexity: a small number of operations compared to the number of bytes in the file
  – Server-side workflows
  – Data sub-setting, filtering, metadata extraction
• Grid workflows
  – High complexity: a large number of operations compared to the number of bytes in the file
  – Client-side workflows
  – Computer simulations, pixel re-projection
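The distinction rests on one ratio: operations performed per byte of input. When the ratio is low, moving the bytes costs more than computing on them, so the work belongs next to the data on the server; when it is high, computation dominates and the data can be shipped to a client or compute cluster. The helper below makes that concrete. The threshold of 10 operations per byte is an arbitrary illustrative value, not an iRODS setting, and the operation counts in the usage lines are rough assumptions.

```python
def ops_per_byte(operations: float, size_bytes: int) -> float:
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return operations / size_bytes


def suggest_placement(operations: float, size_bytes: int,
                      threshold: float = 10.0) -> str:
    """Classify a workflow by its operations-to-bytes ratio.

    threshold is a hypothetical cut-off chosen for illustration only.
    """
    if ops_per_byte(operations, size_bytes) < threshold:
        return "server-side (iRODS data grid workflow)"
    return "client-side (grid workflow)"


# Filtering a 1 GB file touches each byte about once: ~1 op/byte.
filtering = suggest_placement(operations=1e9, size_bytes=10**9)
# A simulation doing ~1e13 operations from a 1 MB input: ~1e7 ops/byte.
simulation = suggest_placement(operations=1e13, size_bytes=10**6)
```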


A few more iRODS notes…

• Authentication
  – GSI (PKI), Kerberos, Shibboleth, challenge-response
• Authorization
  – Roles, user groups, resource groups, policy constraints, ACLs
• Transport
  – TCP/IP (parallel I/O streams), Reliable Blast UDP
• Metadata catalog
  – PostgreSQL, MySQL, Oracle
• Distributed rule engine
  – Scheduler, messaging system, execution engine, rule base
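"Parallel I/O streams" means a large transfer is split into byte ranges that travel concurrently and are reassembled at the destination. The sketch below shows only that split-and-reassemble idea, with threads copying slices of an in-memory buffer in place of network connections; split_ranges and parallel_copy are hypothetical helpers, not the iRODS transfer protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Divide [0, size) into `streams` contiguous, near-equal byte ranges."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(size, streams)
    ranges, start = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length))
        start += length
    return ranges


def parallel_copy(source: bytes, streams: int = 4) -> bytes:
    dest = bytearray(len(source))

    def copy_range(rng: Tuple[int, int]) -> int:
        start, end = rng
        dest[start:end] = source[start:end]  # each "stream" fills its own slice
        return end - start

    with ThreadPoolExecutor(max_workers=streams) as pool:
        moved = sum(pool.map(copy_range, split_ranges(len(source), streams)))
    if moved != len(source):
        raise RuntimeError("transfer incomplete")
    return bytes(dest)
```

For example, split_ranges(10, 3) gives [(0, 4), (4, 7), (7, 10)]; because the ranges never overlap, each stream can write its slice without coordinating with the others.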


iRODS Talk Outline

• Integrated Rule-Oriented Data System
  – What is the Integrated Rule-Oriented Data System?
    • Origins, technology, how it works
  – Why is there an Integrated Rule-Oriented Data System?
    • Context, role it serves
  – Where It's Going (Today, Future)
    • Funding, key efforts


Entire Data Life Cycle: The iRODS Vision

[Figure: data life cycle stages, each with its state and governing policy]

• Project collection (private): local policy
• Data grid (shared): distribution policy
• Digital library (published): description policy
• Data processing pipeline (analyzed): service policy
• Reference collection (preserved): representation policy
• Federation (sustained): re-purposing policy

Each data life cycle stage increases the value and usability of the original collection. For example:

• Jeff gets data from a sensor
• Jeff shares the data with colleagues
• Together with colleagues, Jeff analyzes the data and produces results
• The results are peer-reviewed and published
• Jeff et al. hit the jackpot: the collection is accepted as a reference collection for decades
• The hydrology data grid grows in value to ecology and biology, and is federated


iRODS Talk Outline

• Integrated Rule-Oriented Data System
  – What is the Integrated Rule-Oriented Data System?
    • Origins, technology, how it works
  – Why is there an Integrated Rule-Oriented Data System?
    • Context, role it serves
  – Where is iRODS going today and in the future?
    • Funding, key efforts


iRODS: Future

• Pending 2011 NSF DataNet
  – DataNet Federation Consortium (DFC)
    • Includes CUAHSI!! (and several others)
• RENCI: Creating an “Enterprise” version of iRODS
  – http://iren-web.renci.org/irods-meeting/[email protected]


Summary

• iRODS fills an important niche
  – Differentiation: it's a policy-driven distributed data management system formally supporting the entire data life cycle
    • E.g. an iRODS data grid is a vehicle for fulfilling NSF's Data Management Plan requirement at the community scale
  – Classification: middleware
• iRODS is not intended to be all-encompassing, but rather to work with other DataNets, workflow engines, systems like CUAHSI HIS, etc. in covering the landscape of a national cyberinfrastructure
  – i.e. it falls primarily in the "Data Services/Storage" portion of NSF's Data Enabled Science description
• With iRODS, the community is still responsible for:
  – Schema, data formats, defining policies, defining web interfaces, building analysis and knowledge tools, etc.


iRODS Credits

Principal Investigators: Richard Marciano, Reagan Moore (PI), Arcot Rajasekar

Additional Contributors: William Sims Bainbridge, Leesa Brieger, Luis Carriço, Sheau-Yen Chen, Michael Conway, Jason Coposky, Vijay Dantuluri, Antoine de Torcy, Wei Ding, Kevin Gamiel, Lucas Gilbert, Nuno Guimarães, Chien-Yi Hou, Bernard J. (Jim) Jansen, Oleg Kapeljushnik, Mounia Lalmas, Christopher A. Lee, Xia Lin, Gary Marchionini, Cathy Marshall, Jason Reilly, Meredith Ringel Morris, Stefan Rüger, Wayne Schroeder, Michael Stealey, Lisa Stilwell, Jaime Teevan, Paul Tooby, Michael Wan, Bing Zhu


iRODS Credits

Research supported by:

• NSF ITR 0427196, Constraint-Based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives (2004–2007)
• NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality—Developing Scalable Data Management Infrastructure in a Data Grid-Enabled Digital
• NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality—Research Prototype Persistent Archive Extension (2006–2007)
• NSF SDCI 0721400, SDCI Data Improvement: Data Grids for Community Driven Applications (2007–2010)
• NSF/NARA OCI-0848296, NARA Transcontinental Persistent Archive Prototype (2008–2012)


iRODS Credits

For More Information

http://www.irods.org
http://diceresearch.org/
http://dice.unc.edu/
http://www.renci.org/news/releases/renci-teams-with-dice


Thank You.

http://www.renci.org