OSG Public Storage Project Summary Ted Hesselroth October 5, 2010 Fermilab


Page 1

OSG Public Storage Project Summary

Ted Hesselroth

October 5, 2010

Fermilab

Page 2

Provenance

• 2006: Acquire capability to allocate storage to VOs
• 2007: SRM 2.2 – space reservation
  – Intensely tested, debugged, and documented by OSG Storage
  – Space reservation cleanup tool
• 2008: Partial adoption
  – Used by Atlas, not used by CMS
  – Difficult to set up in dCache
  – Not supported in Bestman Gateway
• 2009: Increased use of opportunistic storage
• 2010: Blueprint meeting – renewed request for storage appliance
  – Requirements doc signed off
  – Design doc

Page 3

Feedback from VOs

• Difficult to use
  – A large number of steps must be done by a user in order to run jobs using public storage.
  – Access to storage may not be available as advertised in the BDII.
  – There are difficulties in moving and tracking large numbers of files when they are treated as independent entities.
• Suggests requirements beyond those taken from the Blueprint meeting

Page 4

Outcome of Requirements Process

• OSG Production Coordinator
  – Grant allocations to VOs. Resize, extend, rescind.
  – Clean up expired allocations.
• VO Administrator
  – Request allocations. Resize, extend, rescind.
  – Make suballocations for users (and datasets).
  – Run access checker tool.
  – Clean up expired allocations.
• VO Member
  – Read, write, delete, (copy, and list) files.
    • Includes registration update (and allocation enforcement).
  – Define datasets.
  – Replicate, delete datasets.
• Site Administrator
  – Help clean up allocations.
  – Set limits on number of files, concurrent connections.

Page 5

Constraints on the Design

• No alteration of Storage Elements
  – Access continues to be through current clients.
• No centralized OSG service
  – Software to be operated by the VO.
• Accommodate usage outside the service
  – Use of traditional means will not have an adverse effect.

Page 6

Design Summary

• Database will store info on
  – Allocations
  – Replicas
  – Logical Namespace
  – Monitoring
  – Registered Users
• Database will have a web-service front end
  – Invocation through wrapper scripts on the command line
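The five categories of information above might map onto tables along these lines. This is a hypothetical sketch, not the project's actual schema (that is in the design document); the project targets Postgres, and SQLite is used here only so the sketch is self-contained. All table and column names are illustrative.

```python
import sqlite3

# In-memory stand-in for the service database (the real deployment
# would use Postgres; names below are assumptions, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE allocations (
    id INTEGER PRIMARY KEY,
    vo TEXT, site TEXT,
    size_gb INTEGER, used_gb INTEGER DEFAULT 0,
    expires TEXT
);
CREATE TABLE replicas   (lfn TEXT, site TEXT, surl TEXT);
CREATE TABLE namespace  (lfn TEXT PRIMARY KEY, dataset TEXT);
CREATE TABLE monitoring (site TEXT, probe TEXT, ok INTEGER, checked TEXT);
CREATE TABLE users      (dn TEXT PRIMARY KEY, vo TEXT);
""")

# A wrapper-script call such as "list my allocation at SiteA" would
# ultimately reduce to a query like this one.
conn.execute(
    "INSERT INTO allocations (vo, site, size_gb, expires) VALUES (?, ?, ?, ?)",
    ("engage", "SiteA", 500, "2011-04-01"),
)
row = conn.execute(
    "SELECT vo, size_gb - used_gb FROM allocations WHERE site = 'SiteA'"
).fetchone()
print(row)  # ('engage', 500)
```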

Page 7

Site registration/monitoring

• VO Administrator installs software (database, front end, scripts).
• OSG Production Coordinator uses a tool to register storage areas.
  – Discovery tool shows total and used space, and the VOs authorized.
  – Public storage areas are registered in the Allocation database.
• VO Administrator runs the monitoring tool
  – Discovers storage areas for which the VO is authorized
  – Checks access with probes
  – Results saved in the Monitoring database
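The monitoring step above could be sketched as a loop over the VO's authorized storage areas, recording one result row per probe in the shape it might be saved to the Monitoring database. The actual access checks would come from the Pigeon tool; here `probe` is just a placeholder callable, and the field names are assumptions.

```python
from datetime import datetime, timezone

def run_probes(storage_areas, probe):
    """Check access to each authorized storage area and collect results
    in roughly the shape they might be saved to the Monitoring database.
    `probe` stands in for a real access check (e.g. from Pigeon): any
    callable returning True/False for a storage-area URL.
    """
    results = []
    for area in storage_areas:
        try:
            ok = probe(area)
        except Exception:
            ok = False  # an unreachable endpoint counts as a failed check
        results.append({
            "site": area,
            "ok": ok,
            "checked": datetime.now(timezone.utc).isoformat(),
        })
    return results

# Usage with a stub probe that only "reaches" one endpoint.
up = {"srm://site-a.example.org"}
report = run_probes(
    ["srm://site-a.example.org", "srm://site-b.example.org"],
    probe=lambda url: url in up,
)
print([r["ok"] for r in report])  # [True, False]
```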

Page 8

Making Allocations

• VOA makes a request to the PC asking for space
  – May optionally specify a storage resource
• PC checks allocations and selects a storage resource.
  – Tool queries the Allocation and Monitoring databases, and shows resources
    • that can meet the allocation parameters
    • that the VO can access
  – New allocation object is made in the database
  – All other VOs' Allocation databases are updated.
• PC informs VOA of the new allocation
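The two selection criteria above (enough space, VO can access) can be sketched as a single filter over the query results. The dictionary fields and the shape of the monitoring data are assumptions for illustration, not the project's actual data model.

```python
def select_resource(resources, monitoring, requested_gb):
    """Pick a storage resource for a new allocation: it must have enough
    unallocated space, and the VO's latest access probe against it must
    have succeeded. Returns the first match, or None."""
    for r in resources:
        free = r["total_gb"] - r["allocated_gb"]
        if free >= requested_gb and monitoring.get(r["site"], False):
            return r["site"]
    return None

resources = [
    {"site": "SiteA", "total_gb": 100, "allocated_gb": 90},   # too full
    {"site": "SiteB", "total_gb": 500, "allocated_gb": 100},  # probe failed
    {"site": "SiteC", "total_gb": 500, "allocated_gb": 0},
]
monitoring = {"SiteA": True, "SiteB": False, "SiteC": True}
print(select_resource(resources, monitoring, 50))  # SiteC
```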

Page 9

Using Allocations – Write

User invokes a script which does all of these:

• Local file is specified as the source
• Logical full path is specified as the destination
• An allocation is selected
  – that has sufficient space
  – that has fewer than the maximum number of files
  – that is not expired
  – that is currently accessible to the VO
• The destination URL is composed
• VDT client tools are used to write the file
• The allocation is updated
• The file's logical path is registered in the Namespace catalog
• The file replica is registered in the Replica catalog
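The write flow above can be sketched end to end: apply the four selection constraints, compose the destination URL, then update the allocation and both catalogs. The actual transfer (via the VDT client tools) is elided; field names and the URL-composition rule are assumptions for illustration.

```python
from datetime import date

def select_allocation(allocations, file_gb, today, accessible_sites):
    """Apply the four selection constraints from the write flow."""
    for a in allocations:
        if (a["size_gb"] - a["used_gb"] >= file_gb      # sufficient space
                and a["n_files"] < a["max_files"]       # under file limit
                and a["expires"] > today                # not expired
                and a["site"] in accessible_sites):     # currently accessible
            return a
    return None

def write_file(logical_path, file_gb, allocations, today, accessible_sites,
               namespace, replicas):
    alloc = select_allocation(allocations, file_gb, today, accessible_sites)
    if alloc is None:
        raise RuntimeError("no usable allocation")
    # Compose a destination URL under the allocation's base path
    # (the real script would hand this to the VDT client tools).
    dest = f"{alloc['base_url']}{logical_path}"
    # Update allocation accounting and register the file in both catalogs.
    alloc["used_gb"] += file_gb
    alloc["n_files"] += 1
    namespace[logical_path] = {}
    replicas.setdefault(logical_path, []).append(dest)
    return dest

allocations = [
    {"site": "SiteA", "size_gb": 100, "used_gb": 99, "n_files": 5,
     "max_files": 1000, "expires": date(2011, 4, 1),
     "base_url": "srm://site-a.example.org/public/engage"},
    {"site": "SiteB", "size_gb": 200, "used_gb": 50, "n_files": 10,
     "max_files": 1000, "expires": date(2011, 4, 1),
     "base_url": "srm://site-b.example.org/public/engage"},
]
namespace, replicas = {}, {}
dest = write_file("/run7/0001.dat", 2, allocations, date(2010, 10, 5),
                  {"SiteA", "SiteB"}, namespace, replicas)
print(dest)  # srm://site-b.example.org/public/engage/run7/0001.dat
```

SiteA is skipped because only 1 GB remains there; SiteB satisfies all four constraints.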

Page 10

Defining and using Datasets

• Allow operations on sets of files
• Especially useful for copying and deleting
• Could have a hook into classads to trigger processing after upload
• In the Namespace catalog
  – A file or directory is tagged with a dataset handle
  – A file belongs to a dataset if it or one of its ancestors has a tag
• In the Replica catalog
  – There are dataset replica objects
  – Each has a list of the member files with replicas on that storage resource
• In transfer operations
  – List is composed from the dataset replica
  – Resource selection is done on the basis of total size
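The ancestor-tag membership rule above amounts to walking up the logical path until a tagged node is found. A minimal sketch, assuming the Namespace catalog can be read as a mapping from logical paths to dataset handles:

```python
from pathlib import PurePosixPath

def dataset_of(logical_path, tags):
    """Return the dataset handle for a file: the tag on the path itself,
    or on its nearest tagged ancestor; None if no ancestor is tagged.
    `tags` maps logical paths to dataset handles."""
    p = PurePosixPath(logical_path)
    for candidate in (p, *p.parents):
        handle = tags.get(str(candidate))
        if handle is not None:
            return handle
    return None

# Tagging the directory makes every file under it a dataset member.
tags = {"/engage/run7": "run7-data"}
print(dataset_of("/engage/run7/events/0001.dat", tags))  # run7-data
print(dataset_of("/engage/run8/events/0001.dat", tags))  # None
```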

Page 11

Defining Suballocations

• Similar procedure to defining allocations
  – But done by the VOA, at the request of a user
  – Selection is made from the VO's allocations
• Can have a suballocation for a dataset
  – Similar to space reservation
• Space accounting tracks both the suballocation and its parent
  – Suballocation counted against parent when made
  – Writes/deletes update the suballocation's remaining space
• Users can clean up their own expired suballocations
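The accounting rules above can be illustrated with a toy model: a suballocation is charged against its parent in full when it is created, and subsequent writes and deletes move only the suballocation's own remaining space. This is a sketch of the bookkeeping, not the project's code.

```python
class Allocation:
    """Toy accounting model for allocations and suballocations."""

    def __init__(self, size_gb, parent=None):
        self.size_gb = size_gb
        self.used_gb = 0
        self.parent = parent
        if parent is not None:
            # The suballocation is counted against the parent when made.
            parent.used_gb += size_gb

    @property
    def remaining_gb(self):
        return self.size_gb - self.used_gb

    def write(self, gb):
        # Writes update only the suballocation's remaining space;
        # the parent was already charged at creation time.
        if gb > self.remaining_gb:
            raise ValueError("suballocation exhausted")
        self.used_gb += gb

    def delete(self, gb):
        self.used_gb -= gb

parent = Allocation(100)
sub = Allocation(30, parent=parent)   # parent now shows 70 GB remaining
sub.write(10)                         # sub now shows 20 GB remaining
print(parent.remaining_gb, sub.remaining_gb)  # 70 20
```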

Page 12

Implementation

• Database
  – SQL for table creation, update, query
  – Postgres
• Web Service
  – Simple HTTP-based
  – Message-level security
  – Access using curl in wrapper scripts
  – Possibly use Bestman for read, write, delete, copy, ls

Page 13

Code Assets

• From the OSG Discovery Tool
  – Discovery of storage areas
  – Wrapper scripts for Java clients
  – Maven-build capability in the VDT
• From the Pigeon Tool
  – Monitoring capability
• From Bestman
  – SRM reference implementation
  – Close support
• From MCAS
  – RESTful web service
  – Development methodology
  – Fitnesse test suite
• From OSG Storage
  – Postgres install script from the VDT installation package for dCache
  – Java methods for WSS fast message-level security

Page 14

Development Strategy

• Agile Methodology
  – Stakeholders involved from the beginning
  – Demonstrate new features every two weeks
  – Stakeholders test and give feedback
  – Frequent developer meetings for progress and short-term planning
• Continuous Integration
  – Test with every commit
  – Packaging is part of the build
  – Nightly build available to stakeholders
  – Continuous documentation

Page 15

Timeline

• October
  – Detailed implementation planning.
  – Infrastructure setup: twiki, issue tracker, code repository, build system, continuous integration and test system.
  – Kickoff meeting.
• November
  – SQL for database creation. Start SQL for updates and queries. Shell wrappers for SQL.
  – Pigeon integration. Installation scripts for testers.
• December
  – Finish SQL for updates and queries. Database performance testing/tuning.
  – Set up web-service code: stubs, queuing mechanism, Java SQL configuration.
• January
  – Web-service installation script. Java SQL wrappers. Java methods for functions. SRM methods for transfer commands.
• February
  – GSI authentication for the web service. Performance testing of the web service.
  – End-to-end load testing. Reports capability.
• March
  – VDT packaging.
  – ITB testing.
  – Post-facto registration and cleanup. RPMs for Operations Toolkit.

Page 16

Unknowns

• Post-facto accounting
  – We have reason to believe it can be done through Gratia but have not tested this.
• Performance tuning
  – May be required, depending on the results of tests.
• Requirements creep
  – Experience with the software may cause stakeholders to emphasize/deemphasize various elements, or request additional features.

Page 17

Software not used

• iRODS
• SRM Space Reservation
• Existing transfer services
• Alien
• REDDNET

Page 18

Extra slides

Page 19

Database Schema

Page 20

To provide storage space for non-owner VOs, sites generally allocate an untended common storage area authorized for several OSG VOs. While some VOs have availed themselves of those resources, the experience of the Engage and OSG Storage groups is that there are a number of barriers to its effective use. A large number of steps must be done by a user in order to run jobs using public storage. Access to storage may not be available as advertised in the BDII. There are difficulties in moving and tracking large numbers of files when they are treated as independent entities. A single VO may use up all the public storage on a site, preventing other VOs from having access. Without reportable information on the state and use of public storage, it is difficult to present its value to sites and other VOs.

Page 21

The OSG Production Coordinator, OSG Storage, and the Engage and LIGO VOs have recognized a need for software to manage space allocations and data transfer for public storage on the Open Science Grid. OSG Storage produced and circulated a requirements document describing its capabilities, which was approved by the stakeholders after a comment period.

Page 22

OSG Storage surveyed existing software and designed a service to meet the requirements. The service is to be deployed by VOs which make use of public storage, and allows VOs to track their use of allocations on public storage areas which are assigned to them by the Production Coordinator. The service also maintains a catalog of the VO's files, to allow cleanup of expired allocations and to support storage operations on sets of files. Finally, a monitoring component is included so that allocation and resource selection can be done for storage areas that are accessible to the VO.

Page 23

We estimate it will take one FTE six months to write the service. This includes code/integration for the user interfaces and database operations, a VO test installation package, a VDT-compatible build and installation method, and documentation. We should have the participation of the OSG Production Coordinator and VO representatives throughout the development process, to exercise features as they become available and provide feedback on user experience, software performance, and documentation quality. This requires about 5% of each participant's time, and a deployment resource for one instance of the service per VO. We would not need participation of site administrators until near the end of the development process; an estimated one-half day would be asked of volunteers at that point. This assumes that the OSG-owned Storage Elements on Gridworks will be available as test endpoints. One non-virtual node should be provided for the developers' test instance of the new service.

Page 24

On the specifics of what needs to be done, the design uses a database back end and anticipates wide area access through a GSI-authenticated web service front end. We would need to write SQL scripts to create, update, and query the database tables, and Java wrappers for access through the web service. For the front end we have the option of the Bestman reference implementation or a lightweight http endpoint with message-level security. Scripts for installation, startup, and command-line interface need to be written. The Pigeon access checker would be used for monitoring; it would need an add-on to write to the database. There is a good start on the documentation, as a twiki page (https://twiki.grid.iu.edu/bin/view/Storage/OSGPublicStorage) has absorbed much of the requirements and design information.

Page 25

We will build upon software assets acquired in the past. From the current work: vetted requirements and a thorough database schema and operations design. From the OSG Discovery Tool: discovery of storage areas, wrapper scripts for Java clients, and a Maven-build capability in the VDT. From the Pigeon Tool: monitoring capability. From Bestman: the SRM reference implementation and close support. From MCAS: a RESTful web service, a development methodology, and the Fitnesse test suite. From OSG Storage: a Postgres install script from the VDT installation package for dCache. Also from the current work are Java methods for WSS fast message-level security.

Page 26

Unknowns are as follows. We have a requirement to do post-facto accounting. We have reason to believe it can be done through gratia but have not tested this. Performance tuning may be required, depending on the results of tests. While the requirements have been approved on paper, experience with the software may cause stakeholders to emphasize/deemphasize various elements, or request additional features.

Page 27

Timeline:

• October – Detailed implementation planning. Infrastructure setup: twiki, issue tracker, code repository, build system, continuous integration and test system. Kickoff meeting.
• November – SQL for database creation. Start SQL for updates and queries. Shell wrappers for SQL. Pigeon integration. Installation scripts for testers.
• December – Finish SQL for updates and queries. Database performance testing/tuning. Set up web-service code: stubs, queuing mechanism, Java SQL configuration.
• January – Web-service installation script. Java SQL wrappers. Java methods for functions. SRM methods for transfer commands.
• February – GSI authentication for the web service. Performance testing of the web service. End-to-end load testing. Reports capability.
• March – VDT packaging. ITB testing. Post-facto registration and cleanup. RPMs for Operations Toolkit.