iRODS: Integrated Rule-Oriented Data System
Ray Idaszak
Director, Collaborative Environments
RENCI
University of North Carolina at Chapel Hill
iRODS
• Integrated Rule-Oriented Data System
  – What It Is
    • Origins, How it works, What’s different about it
  – Why It Is
    • Context, Role it serves
  – Where It’s Going (Today, Future)
    • Funding, Key efforts
iRODS Talk Outline
• Integrated Rule-Oriented Data System
  – What is the Integrated Rule-Oriented Data System?
    • Origins, Technology, How it works
  – Why It Is
    • Context, Role it serves
  – Where It’s Going (Today, Future)
    • Funding, Key efforts
What’s Different about iRODS?
• iRODS lets you manage your data with your rules and in your way, via Policies, against a backdrop of federatable community data worldwide
iRODS Background
• Integrated Rule-Oriented Data System
  – Open-source initiative that represents 12+ years of development and over $10M of NSF grant funding
  – Another $8M+ of funding pending (via NSF DataNet)
• Collaboration between
  – UNC Chapel Hill
    • Data Intensive Cyber Environments group (DICE)
  – RENCI
    • State-funded Cyberinfrastructure Institute at UNC Chapel Hill
  – San Diego Supercomputer Center
iRODS Data and Policy Virtualization
The iRODS Data Grid installs in a “layer” over storage systems, so you can view, manage, access, add, and share part or all of your data in a unified Collection.
Figure: the user client views and manages data through the Data Grid and sees a single “Virtual Collection”:
• /cuahsi/catalog (stored at Utah State Univ)
• /cuahsi/modeling (stored at RENCI)
• /cuahsi/terrain (stored at SDSC)
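The virtual collection idea can be sketched as a lookup from logical paths to storage sites. This is only an illustration: the site names and collection paths are taken from the slide, while the `CATALOG` dictionary, the `resolve` and `list_virtual_collection` helpers, and the file name `dem.tif` are hypothetical and are not iRODS APIs.

```python
# Toy model of a virtual collection: logical collections map to storage sites.
# CATALOG, resolve() and list_virtual_collection() are hypothetical names.
CATALOG = {
    "/cuahsi/modeling": "RENCI",
    "/cuahsi/catalog": "Utah State Univ",
    "/cuahsi/terrain": "SDSC",
}


def resolve(logical_path):
    """Return the site holding the collection that contains logical_path."""
    for collection, site in CATALOG.items():
        if logical_path == collection or logical_path.startswith(collection + "/"):
            return site
    raise KeyError(f"no storage site registered for {logical_path}")


def list_virtual_collection():
    """The user-facing view: one sorted list of logical paths, no site names."""
    return sorted(CATALOG)


if __name__ == "__main__":
    print(list_virtual_collection())
    print(resolve("/cuahsi/terrain/dem.tif"))  # hypothetical file name
```

The user only ever handles the logical paths; where the bytes live is a detail of the lookup table.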
Using a Data Grid - Details
• User asks for data using logical properties (client-server)
• Data request goes to 1st Server
• 1st Server looks up information in Catalog (applies rules)
• Catalog responds that 3rd Server has the data
• 1st Server asks 3rd Server, peer-to-peer, to serve up the data
• 3rd Server applies rules and serves the data
Figure: three iRODS Servers, each with a Rule Engine, at RENCI, SDSC, and USU, with an iCAT Metadata Catalog
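The request flow on this slide can be sketched as a small simulation. It is a toy model under stated assumptions: the `Catalog` and `Server` classes, the `apply_rules` placeholder, and the server names `server1` and `server3` are hypothetical, and no real iRODS protocol or API is used.

```python
class Catalog:
    """Stand-in for the metadata catalog: logical name -> server name."""

    def __init__(self):
        self.locations = {}

    def register(self, logical_name, server_name):
        self.locations[logical_name] = server_name

    def locate(self, logical_name):
        return self.locations[logical_name]


class Server:
    """Hypothetical data grid server with a placeholder rule engine."""

    def __init__(self, name, catalog, peers):
        self.name = name
        self.catalog = catalog
        self.peers = peers      # shared dict: server name -> Server
        self.storage = {}       # data physically held by this server
        self.log = []           # records which rule checks ran here
        peers[name] = self

    def apply_rules(self, action, logical_name):
        # Placeholder for the rule engine: it only records that it ran.
        self.log.append((action, logical_name))

    def serve(self, logical_name):
        self.apply_rules("serve", logical_name)
        return self.storage[logical_name]

    def request(self, logical_name):
        # The client contacts this server using a logical name only.
        self.apply_rules("lookup", logical_name)
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            return self.serve(logical_name)
        # Peer-to-peer hop: ask the server that actually holds the data.
        return self.peers[owner].serve(logical_name)
```

A request to `server1` for an object registered under `server3` returns the bytes stored on `server3`, and each server's `log` shows which rule checks ran where.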
Using a Data Grid – NEAR FUTURE (DB Resource)
• User not running a SQL server locally makes a query
• Query goes to 1st Server
• 1st Server looks up information in Catalog (applies rules)
• Catalog responds that 3rd Server has the SQL database
• 1st Server sends the SQL query to 3rd Server
• 3rd Server applies rules and serves the query result
Figure: three iRODS Servers, each with a Rule Engine, at USU, RENCI, and SDSC, with an iCAT Metadata Catalog; database resources shown are MySQL, PostgreSQL, and Oracle
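The database-resource flow can be sketched in the same spirit. This is a minimal sketch under stated assumptions: Python's built-in `sqlite3` stands in for MySQL, PostgreSQL, or Oracle, and the `DatabaseServer` and `FrontServer` classes, the `hydro_db` resource name, and the `gauges` table are all hypothetical.

```python
import sqlite3


class DatabaseServer:
    """Hypothetical server that fronts a SQL database resource."""

    def __init__(self, name):
        self.name = name
        # An in-memory SQLite database stands in for the real SQL database.
        self.db = sqlite3.connect(":memory:")
        self.audit = []

    def run_query(self, sql, params=()):
        # Placeholder for "applies rules": record the query before running it.
        self.audit.append(sql)
        return self.db.execute(sql, params).fetchall()


class FrontServer:
    """Hypothetical first-contact server; it holds no database itself."""

    def __init__(self, catalog, peers):
        self.catalog = catalog  # database resource name -> server name
        self.peers = peers      # server name -> DatabaseServer

    def query(self, resource, sql, params=()):
        owner = self.catalog[resource]           # catalog lookup
        return self.peers[owner].run_query(sql, params)


def build_demo():
    """Build a two-server demo with a small, made-up table."""
    server3 = DatabaseServer("server3")
    server3.db.execute("CREATE TABLE gauges (site TEXT, flow REAL)")
    server3.db.executemany(
        "INSERT INTO gauges VALUES (?, ?)", [("A", 1.5), ("B", 4.0)]
    )
    front = FrontServer({"hydro_db": "server3"}, {"server3": server3})
    return front, server3
```

The caller needs no local SQL server: it sends the query text to the front server, which forwards it to whichever server the catalog names.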
Example Clients & Client Interfaces (i.e. iRODS is client agnostic)
• C library calls - Application level
• .NET - Windows client API
• Unix shell commands - Scripting languages
• Java I/O class library (JARGON) - Web services
• SAGA - Grid API
• Web browser (Java-python) - Web interface
• Windows browser - Windows interface
• WebDAV - iPhone interface
• Fedora digital library middleware - Digital library middleware
• DSpace digital library - Digital library services
• Parrot - Unification interface
• Kepler workflow - Grid workflow
• FUSE user-level file system - Unix file system
iDrop
– Drag and drop GUI
– User actions can be mapped to policies
iRODS Policies
• iRODS is described as a “Policy-based” data management system
• Policy definition: A proposed or adopted course of action
  – ergo iRODS associates a “course of action” with all data
• Pre- and Post- “Policy Enforcement Points” (PEP)
  – Pre: Course of action for data coming into iRODS
  – Post: Course of action for data going out of iRODS
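As a rough illustration of pre and post enforcement points as the slide describes them, the sketch below runs one list of policies on data coming in and another on data going out. The `PolicyStore` class and the `reject_empty` and `redact_secret` policies are hypothetical; real iRODS policies are expressed as rules on the server, not as Python callables.

```python
class PolicyStore:
    """Toy store with a pre hook on ingest and a post hook on retrieval."""

    def __init__(self):
        self.objects = {}
        self.pre_policies = []   # run on data coming in
        self.post_policies = []  # run on data going out

    def put(self, name, data):
        for policy in self.pre_policies:
            data = policy(name, data)
        self.objects[name] = data

    def get(self, name):
        data = self.objects[name]
        for policy in self.post_policies:
            data = policy(name, data)
        return data


def reject_empty(name, data):
    """Hypothetical pre policy: refuse empty uploads."""
    if not data:
        raise ValueError(f"{name}: empty uploads are not allowed")
    return data


def redact_secret(name, data):
    """Hypothetical post policy: mask a marker string on the way out."""
    return data.replace("SECRET", "[redacted]")
```

With `reject_empty` registered as a pre policy and `redact_secret` as a post policy, the stored copy stays intact while every retrieval is redacted.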
iRODS Policies
• Retention, disposition, distribution, arrangement
• Authenticity, provenance, description
• Integrity, replication, synchronization
• Deletion, trash cans, versioning
• Archiving, staging, caching
• Authentication, authorization, redaction
• Access, approval, IRB, audit trails, report generation
• Assessment criteria, validation
• Derived data product generation, format parsing
• Federation
iRODS Rule Engine, Workflows
• iRODS has its own built-in imperative interpreted programming language called the Rule Engine
• The iRODS Rule Engine executes Microservices
• An iRODS “program” is called a Workflow
  – A Microservice is one “step” of an iRODS Workflow
  – iRODS Workflows are executed on the iRODS Server
  – Arbitrary external web services can be one “step” of an iRODS Workflow
    • Encapsulated as a microservice
iRODS Microservices
• Microservices are written in C and can provide anything that can be done in C, which is part of what makes iRODS so extensible. Typically:
  – Standard operations; e.g. file or format conversion
  – Queries on metadata catalog
  – Interaction with web services
  – Triggering external HPC workflows
  – Remote and delayed execution control
• Microservices communicate through
  – Arguments, session variables, user space variables, etc.
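The relationship between a workflow and its microservices can be sketched as a list of steps that share session variables. Real microservices are written in C; the Python below is only an illustration, and the `msi_*` function names, the `run_workflow` helper, and the session dictionary layout are hypothetical.

```python
def msi_read(session, path):
    """Hypothetical step: load a file's text into the session."""
    session["text"] = session["files"][path]


def msi_uppercase(session):
    """Hypothetical step: a simple format conversion."""
    session["text"] = session["text"].upper()


def msi_count_words(session, out_key):
    """Hypothetical step: derive a value and store it under out_key."""
    session[out_key] = len(session["text"].split())


def run_workflow(steps, session):
    """Run each (function, arguments) step in order against one shared session."""
    for func, args in steps:
        func(session, *args)
    return session


if __name__ == "__main__":
    session = {"files": {"/zone/notes.txt": "rain gauge data"}}
    steps = [
        (msi_read, ("/zone/notes.txt",)),
        (msi_uppercase, ()),
        (msi_count_words, ("word_count",)),
    ]
    print(run_workflow(steps, session)["word_count"])
```

Each step receives explicit arguments and reads or writes the shared session, which mirrors the two communication channels named on the slide.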
Differentiating Workflows
• iRODS data grid workflows
  – Low-complexity, a small number of operations compared to the number of bytes in the file
  – Server-side workflows
  – Data sub-setting, filtering, metadata extraction
• Grid workflows
  – High-complexity, a large number of operations compared to the number of bytes in the file
  – Client-side workflows
  – Computer simulations, pixel re-projection
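The distinction above comes down to a ratio of operations to bytes. The sketch below makes that arithmetic explicit; the function names and the cut-off of 1.0 operations per byte are assumptions chosen for illustration, not values from the talk.

```python
def operations_per_byte(operations, file_size_bytes):
    """Ratio of compute work to data size for one workflow run."""
    if file_size_bytes <= 0:
        raise ValueError("file_size_bytes must be positive")
    return operations / file_size_bytes


def suggest_placement(operations, file_size_bytes, threshold=1.0):
    """Suggest where to run a workflow; the threshold is an assumed cut-off."""
    ratio = operations_per_byte(operations, file_size_bytes)
    return "server-side" if ratio <= threshold else "client-side"
```

For example, extracting metadata with about 1,000 operations from a 1,000,000-byte file gives a ratio of 0.001, so under this assumed threshold the work stays next to the data on the server.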
A few more iRODS notes…
• Authentication
  – GSI (PKI), Kerberos, Shibboleth, Challenge-response
• Authorization
  – Roles, user groups, resource groups, policy constraints, ACLs
• Transport
  – TCP/IP (parallel I/O streams), Reliable Blast UDP
• Metadata catalog
  – PostgreSQL, MySQL, Oracle
• Distributed rule engine
  – Scheduler, messaging system, execution engine, rule base
iRODS Talk Outline
• Integrated Rule-Oriented Data System
  – What is the Integrated Rule-Oriented Data System?
    • Origins, Technology, How it works
  – Why is there an Integrated Rule-Oriented Data System?
    • Context, Role it serves
  – Where It’s Going (Today, Future)
    • Funding, Key efforts
Entire Data Life Cycle: The iRODS Vision
• Project Collection: Private, Local Policy
• Data Grid: Shared, Distribution Policy
• Digital Library: Published, Description Policy
• Data Processing Pipeline: Analyzed, Service Policy
• Reference Collection: Preserved, Representation Policy
• Federation: Sustained, Re-purposing Policy
Each data life cycle stage increases the value and usability of the original collection
Jeff gets data from a sensor
Jeff shares data with colleagues
Together with colleagues, Jeff analyzes the data and produces results
Results are peer-reviewed and published
Jeff et al. hit the jackpot: the collection is now accepted as a reference collection for decades
The Hydrology Data Grid grows in value to ecology and biology and is federated
iRODS Talk Outline
• Integrated Rule-Oriented Data System
  – What is the Integrated Rule-Oriented Data System?
    • Origins, Technology, How it works
  – Why is there an Integrated Rule-Oriented Data System?
    • Context, Role it serves
  – Where is iRODS going today and in the future?
    • Funding, Key efforts
iRODS: Future
• Pending 2011 NSF DataNet
  – DataNet Federation Consortium (DFC)
    • Includes CUAHSI!! (and several others)
• RENCI: Creating an “Enterprise” version of iRODS
  – http://iren-web.renci.org/irods-meeting/
  – [email protected]
Summary
• iRODS fills an important niche
  – Differentiation: It’s a policy-driven distributed data management system formally supporting the entire Data Life Cycle
    • E.g. an iRODS Data Grid is a vehicle for fulfilling NSF’s Data Management Plan requirement at the community scale
  – Classification: Middleware
• iRODS is not intended to be all-encompassing, but rather to work with other DataNets, Workflow Engines, systems like CUAHSI HIS, etc. in canvassing a National Cyberinfrastructure
  – i.e. it falls primarily in the “Data Services/Storage” portion of NSF’s Data Enabled Science description
• With iRODS, the community is still responsible for:
  – Schema, data formats, defining policies, defining web interfaces, building analysis and knowledge tools, etc.
iRODS Credits
Principal Investigators
Richard Marciano, Reagan Moore (PI), Arcot Rajasekar
Additional Contributors
William Sims Bainbridge, Leesa Brieger, Luis Carriço, Sheau-Yen Chen, Michael Conway, Jason Coposky, Vijay Dantuluri, Antoine de Torcy, Wei Ding, Kevin Gamiel, Lucas Gilbert, Nuno Guimarães, Chien-Yi Hou, Bernard J. (Jim) Jansen, Oleg Kapeljushnik, Mounia Lalmas, Christopher A. Lee, Xia Lin, Gary Marchionini, Cathy Marshall, Jason Reilly, Meredith Ringel Morris, Stefan Rüger, Wayne Schroeder, Michael Stealey, Lisa Stilwell, Jaime Teevan, Paul Tooby, Michael Wan, Bing Zhu
iRODS Credits
Research Supported By
NSF ITR 0427196, Constraint-Based Knowledge Systems for Grids, Digital Libraries, and Persistent Archives (2004–2007)
NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From
Vision to Reality—Developing Scalable Data Management Infrastructure in a Data Grid-Enabled Digital
NARA supplement to NSF SCI 0438741, Cyberinfrastructure; From Vision to Reality—Research Prototype Persistent Archive Extension (2006–2007)
NSF SDCI 0721400, SDCI Data Improvement: Data Grids for Community Driven Applications (2007–2010)
NSF/NARA OCI-0848296, NARA Transcontinental Persistent Archive Prototype (2008–2012)
iRODS Credits
For More Information
http://www.irods.org
http://diceresearch.org/
http://dice.unc.edu/
http://www.renci.org/news/releases/renci-teams-with-dice