
GriPhyN Technical Report June 30 2004 http://www.griphyn.org

GriPhyN Annual Report for 2003 – 2004

The GriPhyN Collaboration

NSF Grant 0086044


Paul Avery, University of Florida, Co-Director
Ian Foster, University of Chicago, Co-Director


Table of Contents

1 Overview
2 Computer Science Research Activities
   2.1 Virtual Data Language and Processing System
   2.2 Grid Workflow Language
   2.3 Grid Workflow Planning
   2.4 Research on Data placement planning
   2.5 Policy-based resource sharing and allocation mechanisms
   2.6 Workflow Execution and Data Movement
   2.7 Peer-to-Peer Usage Patterns in Data Intensive Experiments
   2.8 Resource Discovery
   2.9 Grid Fault Tolerance and Availability
   2.10 Prediction
   2.11 GridDB: A Data-Centric Overlay for Scientific Grids
3 VDT – The Virtual Data Toolkit
   3.1 VDT Development, Deployment, and Support Highlights
   3.2 The VDT Build and Test Process
   3.3 VDT Plans
4 GriPhyN Activities in ATLAS
   4.1 Application Packaging and Configuration with Pacman
   4.2 Grappa: a Grid Portal for ATLAS Applications
   4.3 Virtual Data Methods for Monte Carlo Production
5 GriPhyN Activities in CMS
   5.1 Data Challenge 04 and Production of Monte Carlo Simulated CMS Data
   5.2 Grid Enabled Analysis of CMS Data
   5.3 Future Plans for CMS Activities in GriPhyN
6 GriPhyN Activities in LIGO
   6.1 Building a Production Quality Analysis Pipeline for LIGO's Pulsar Search Using Metadata-driven Pegasus
   6.2 Lightweight Data Replicator
7 GriPhyN Activities in SDSS
8 Education and Outreach
   8.1 New UT Brownsville Beowulf System
   8.2 VDT software installation and Grid Demos
   8.3 Portal Development
   8.4 Grid applications in gravitational wave data analysis
   8.5 Teacher Assessment Workshop
   8.6 Grid Summer Workshop
   8.7 QuarkNet – GriPhyN Collaboration: Applying Virtual Data to Science Education
   8.8 A New Initiative: Understanding the Universe Education and Outreach (UUEO)
   8.9 Milestones for year 2004
9 Publications and Documents Resulting from GriPhyN Research
10 References


1 Overview*

GriPhyN is a collaboration of computer science researchers and experimental physicists who aim to provide the IT advances required to enable Petabyte-scale data intensive science in the 21st century. The goal of GriPhyN is to increase the scientific productivity of large-scale data intensive scientific experiments through a two-pronged Grid-based approach:

Apply a methodical, organized, and disciplined approach to scientific data management using the concept of virtual data to enable more precisely automated data production and reproduction.

Bring the power of the grid to bear on the scale and productivity of science data processing and management by developing automated request scheduling mechanisms that make large grids as easy to use as a workstation.

Virtual data is to the Grid what object orientation is to design and programming. In the same way that object orientation binds methods to data, the virtual data paradigm binds data products closely to the transformation or derivation tools that produce them. We expect that the virtual data paradigm will bring to the processing tasks of data intensive science the same rigor that the scientific method brings to the core science processes: a highly structured, finely controlled, precisely tracked mechanism that is cost effective and practical to use.

Similarly for the Grid paradigm: we see grids as the network OS for large-scale IT projects. Just as the four GriPhyN experiments today use an off-the-shelf operating system (mainly Linux), our goal is that similar projects will in the future use the GriPhyN VDT to implement their Grid. The road to this level of popularization and de facto standardization of GriPhyN results needs to take place outside of, and continue beyond, the time frame of GriPhyN; hence the partnerships that we will create between GriPhyN and other worldwide Grid projects are of vital importance.

One of GriPhyN’s most important challenges is to strike the right balance in our plans between research – inventing cool “stuff” – and the daunting task of deploying that “stuff” into some of the most complex scientific and engineering endeavors being undertaken today. While one of our goals is to create tools so compelling that our customers will beat down our doors to integrate the technology themselves (as they do with UNIX, C, Perl, Java, Python, etc.), success will also require that we work closely with our experiments to ensure that our results do make the difference we seek. Our metric for success is to move GriPhyN research results into the four experiments to make a difference for them, and to demonstrate the ability of science to deal with high data volumes efficiently.

Achieving these goals requires diligent and painstaking analysis of highly complex processes (scientific, technical, and social); creative, innovative, but carefully focused research; the production of well-packaged, reliable software components written in clean, modular, supportable code and delivered on an announced schedule with dependable commitment; the forging of detailed integration plans with the experiments; and the support, evaluation, and continued improvement and re-honing of our deployed software.

This report is being written at the start of GriPhyN’s 33rd month of funding. Our understanding of the problems of virtual data tracking and request execution has matured, and our research software for testing these concepts has now been applied to all participating experiments. The Virtual Data Toolkit (VDT) has increased in capability, robustness, and ease of installation, and continues to prove itself as an effective technology dissemination vehicle, having been adopted by many projects worldwide including the international LHC Computing Grid project. We demonstrated several important milestones in GriPhyN research and its application to the experiments at Supercomputing 2003 in November 2003, where, in collaboration with our Trillium partners, we used real science scenarios from HEP, LIGO, and SDSS to demonstrate the application of virtual data management and grid request planning and execution facilities on the large Grid3 distributed environment.

* This overview section contains material from previous GriPhyN reports.


We demonstrated large virtual-data-enabled pipelines running galaxy cluster finding algorithms on Sloan Digital Sky Survey data, and we have applied virtual data techniques and tracking databases to ATLAS Monte Carlo collision event simulations. We have moved the CMS event simulation environment close to production readiness by greatly increasing the event production rates and reliability. Notably, another research project is now using GriPhyN technology for BLAST genome comparison challenge problems.

We have continued to push our CS research forward, particularly in the areas of extending the scope of the virtual data paradigm and simulating and testing mechanisms and algorithms for policy-cognizant resource allocation, and we have continued to deploy and stress test scalability enhancements to various Globus Toolkit components (e.g., its replica location service and GRAM job submission system), Condor-G, and DAGMan.

Our project continues to be managed by project coordinators Mike Wilde of Argonne and Rick Cavanaugh of the University of Florida. Alain Roy of the University of Wisconsin, Madison, manages Virtual Data Toolkit development, and other staff manage distinct project areas. Technical support for the project and for the development and enhancement of web communication resources and conferencing facilities has been supplied by staff at the University of Florida.

During this time, we have held an all-hands collaboration-wide meeting in conjunction with the iVDGL project, with broad participation from researchers and graduate students, application scientists and computer scientists. We have also conducted smaller, more focused workshop-style meetings on specific CS research topics and CS-experiment interactions.

2 Computer Science Research Activities

Computer science research during the past 12 months of the GriPhyN project has focused on the following primary activities:

Development, packaging, and support of twelve new releases of the Chimera Virtual Data System and the Pegasus planner

Request planning and execution: algorithms and architectural issues

Understanding the architectural interactions of data flow, work flow management, and scheduling

Data scheduling research into advanced data placement algorithms and mechanisms

Request execution: continued enhancement, hardening, and tuning of grid execution mechanisms

Understanding fault tolerance for the grid: metrics and strategies

Resource sharing policy algorithm design, simulation, and prototyping

Performance prediction of data grid workloads in specific GriPhyN application domains

Development of languages for specific aspects of Grid computing: database workflow and grid data flow.

In this section we review our progress in each of these areas.

2.1 Virtual Data Language and Processing System

The GriPhyN virtual data language, VDL, continues to evolve and to see expanding usage across all four GriPhyN experiments, as well as in other projects that collaborated with GriPhyN via iVDGL and the Grid2003 effort (including BTeV and the NCSA “PACI Data Quest” project in computational biology and astronomy).

GriPhyN research in the area of virtual data during this period continued to focus on the packaging, support, tuning, and functional enhancement of the virtual data language processing stack.


During this period, the virtual data language and the Chimera Virtual Data System were enhanced and supported for application to all GriPhyN experiments. The team also provided support for applying Chimera and the VDT to problems in computational biology and astronomy, as part of the NCSA Alliance Expedition “PACI Data Quest”.

Twelve editions of the Virtual Data System component of the VDT were created and released, and another release is posted in release-candidate state at the time of this writing. The features provided in these releases included the following highlights:

Attributes were provided to specify run-time handling for virtual data arguments – whether or not data transfer was required, and whether or not cataloging of each output data product was required. These attributes, driven by application requirements, enabled the system to address a wider set of application needs. For example, some applications (the first of which was SDSS galaxy cluster finding) required that the data transfer and cataloging stage after a transformation executes should depend on whether or not the application actually produced a file. This feature allows for such semantics, where an application may optionally decide to create an output file depending on conditions present in its inputs.

Compound transformations were enhanced to simplify their interface, eliminating the need to place internal temporary files into the argument list of the outer compound.

The uniform execution environment capture mechanism, named “kickstart”, was significantly enhanced in a new version that can be more flexibly, scalably, and transparently integrated into application workflows. This utility launches applications in a uniform manner across different schedulers and resource managers, and captures important details about the runtime environment and behavior of the application. This information is then returned with each job and placed into the virtual data catalog to complete the “provenance trail” of each Grid job execution and of the resulting data files. Support for Cygwin (for Windows execution) and for LCG1 was added, along with the ability to create necessary directories on the fly. A new version under development provides stat information on arbitrary transformation input and output files; multiple pre-, post-, init-, and cleanup jobs; optional md5sums on files; second-level staging (for LCG2); and improved XML provenance records, and is driven by a configuration file to overcome command-line length limitations.

Transfer scripts were enhanced and generalized to permit a wider range of transfer utilities and techniques to be used at workflow runtime, including support for efficient batching of the files needed by each job, a local file symlinking option to avoid redundant file copies, a new POSIX-threaded implementation to permit multiple simultaneous transfers, and retries of failed transfers. A new data transfer utility under development supports an input file of transfer directives, an added logical filename for logging purposes, and multiple choices of input sources.

Features added to the abstract workflow generator (“gendax”) include local variable declarations for compound statements, an optimized abstract DAG format, and improved logical filename attributes for data transfer control.

A greatly improved Java API for direct embedding of VDL processing into portals and other user interfaces, an “ant”-driven build process, and additional Java documentation files for packages.

The chimera-support email list was used to support the virtual data system tools, with the support load being jointly shared between the University of Chicago and ISI.

Major enhancements were made in the virtual data catalog and the run-time environment for virtual data jobs, to record the detailed execution environment and run-time behavior of invoked transformations. An execution wrapper named “gridlaunch” was created to give each executed transformation a uniform run-time environment, and to record the details of the invocation’s return code and exit status, the amount of system resources consumed by the invocation, and any log and error output produced by the invocation.
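To make the role of such a launcher concrete, the sketch below shows, in simplified form, the kind of record a kickstart/gridlaunch-style wrapper assembles: it runs the application, captures its exit status, resource consumption, and output locations, and emits a machine-readable record that a catalog could store. This is an illustrative stand-in, not the actual kickstart implementation (which, as noted above, produces XML provenance records and integrates with the remote scheduler); the field names and JSON output format are assumptions made for this example.

    import json, resource, socket, subprocess, sys, time

    def launch(argv, stdout_path="app.out", stderr_path="app.err"):
        """Run one transformation and return a simplified provenance record."""
        start = time.time()
        with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
            exit_code = subprocess.Popen(argv, stdout=out, stderr=err).wait()
        usage = resource.getrusage(resource.RUSAGE_CHILDREN)   # Unix only
        return {
            "transformation": argv[0],
            "arguments": argv[1:],
            "host": socket.gethostname(),
            "start_time": start,
            "duration_sec": time.time() - start,
            "exit_code": exit_code,
            "user_cpu_sec": usage.ru_utime,
            "system_cpu_sec": usage.ru_stime,
            "max_rss_kb": usage.ru_maxrss,
            "stdout": stdout_path,
            "stderr": stderr_path,
        }

    if __name__ == "__main__":
        # e.g.  python launcher.py /bin/ls -l
        print(json.dumps(launch(sys.argv[1:]), indent=2))

A record of this shape, returned with each job, is what completes the provenance trail in the virtual data catalog.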

Work on datasets and a supporting dataset typing system has continued during this period, but this has proven to be an exceedingly difficult problem and has not yet reached the stage where results can be included in the virtual data system. This area will continue to be a focus of our virtual data research in the next project year.

Research in virtual data at U of Chicago has been conducted by Yong Zhao, Ian Foster, Jens Voeckler, and Mike Wilde. This research has been focused on virtual data discovery and metadata management, and has yielded several developments:

a web-based visual interface for the discovery and management of virtual data definitions

support for user-provided metadata in the virtual data model and catalog

support for a unified treatment of services and application programs as virtual data transformations.

The current virtual data system offers a set of command-line interfaces, but users across all these domains need an interactive environment in which they can easily discover and share virtual data products, compose workflows and monitor their executions across multiple grid sites, and integrate virtual data capabilities into their specific user communities and science applications. Moreover, it is nontrivial to set up and configure the virtual data system and to tie it together with grid compute and storage resources.

These considerations led Yong Zhao to develop, apply, and evaluate Chiron, a “Grid portal” designed to allow convenient interaction with the virtual data system and to provide an integrated, interactive environment for virtual data discovery and integration [Chiron]. The portal has been used extensively by the QuarkNet project as the platform for their physics education applications.

The Chiron virtual data portal is an integrated environment that provides a single access point to virtual data and grid resources. It allows users to

manage user accounts and track user activities;

describe and publish applications and datasets;

discover, validate, share and reuse applications and datasets published by other users;

configure and discover Grid resources;

compose workflows and dataset derivations;

execute workflows either locally or in a Grid environment and monitor job progress; and

record provenance information.

Chiron provides both user level and service (middleware) level functionalities that portal users can use to build customized virtual data interfaces for specific user communities and science applications.

The architecture of the Chiron interface is depicted below, along with workflows captured from several applications using the GriPhyN Virtual Data Language:


Figure 1: Chiron system architecture. The portal’s Tomcat/Java Server Pages front end sits above the Chimera API and libraries, GraphViz, the Shell Planner, Pegasus, Condor-G/DAGMan, MDS, RLS, and the virtual data catalog, and connects to Grid sites that expose GridFTP storage elements and Condor, PBS, or other compute elements.

Figure 2: Workflows for CompBio, SDSS, QuarkNet, and TerraServer

Yong also successfully integrated web service invocation into Chimera, so that VDL can provide a common abstract description of both applications and web services. Users can compose workflows based on the abstract description, invoke applications or web services in the planning stage, and track the provenance of both.


A VDL2 prototype, which adds a type system to allow intelligent workflow composition and dataset management, is also under development.

The lifecycle of virtual data consists of description, discovery, validation, and reuse, and the key to virtual data sharing is discovery. Metadata facilitates flexible and powerful discovery of virtual data. Yong enhanced the virtual data system with metadata management capabilities, which allow users to associate metadata attributes with virtual data objects and to search for them in a SQL-style query language.

2.2 Grid Workflow Language

Gridflow (or Grid Workflow), the work of Reagan Moore and Arun Jagatheesan at SDSC, is a form of workflow that is executed on a distributed infrastructure shared between multiple autonomous participants, according to a set of local domain policies and global grid policies. Since the resources are shared, the execution of these tasks has to obey constraints created by the participants. Scientific applications executed on the grid often involve multiple data processing pipelines that can take advantage of the gridflow technology.

A language to describe scientific pipelines needs an abstract description of the execution infrastructure, to facilitate late binding of resources in a distributed grid environment. The language has to provide the ability to describe or refer to virtual data, derived data sets, collections, virtual organizations, ECA (event-condition-action) rules, gridflow patterns, and fundamental data grid operations. Even though traditional workflow languages are available, they do not provide a way to describe the fundamental data grid operations as part of the language itself. In addition, the language needs to support data types like logical collections or logical data sets for location- and infrastructure-independent descriptions of gridflow pipelines. Additional requirements we have observed in production data grid projects are the description of simple ECA rules that govern the execution of gridflows and a way to query another executing gridflow as part of the gridflow language itself.

The SDSC Matrix Project, supported by GriPhyN, carries out research and development to create a gridflow language capable of describing collaborative gridflows for the emerging grid infrastructures. The GriPhyN Data Grid Language (DGL) developed by GriPhyN IT research is currently used in multiple data grid projects at SDSC. DGL is an XML Schema based language that defines the format of gridflow requests and gridflow responses. DGL borrows well-known computer science concepts from compiler design (recursive grammar, scoped variables, and execution stack management), data modeling (schema definitions for all logical data types), and grid computing (fundamental data grid operations), and adds features like XQuery to query the gridflow execution. DGL currently supports the following gridflow patterns: sequential flow, parallel flow, while loop, for loop, milestone execution, switch-by-context, concurrent parameter sweep, n-of-m execution, etc. It also allows the description of pre-process and post-process rules to be executed, very similar to triggers in databases.

As part of the drive to make DGL an accepted standard for gridflow description, query, and execution, we are investigating the integration of DGL with other languages, such as BPEL [BPEL] to describe web-service-based process pipelines and VDL [VDL] to describe virtual data within DGL. Researchers from SDSC and ISI collaborated on the development of research prototypes to compile DGL at runtime into a format called DAX (DAG in XML). This format serves as input to the GriPhyN Pegasus Planner [PEGS] and can be used to start DGL computational processes in the grid using Condor-G [CONG]. The acceptance of GriPhyN DGL in multiple data grid projects could lead it to become a viable abstract assembly language for grid computing [SCEC].

2.3 Grid Workflow Planning

During the third year of GriPhyN, research at ISI focused on exploring the elements of the request planning space. This work made many advances in this period; it was led by Ewa Deelman and involved the efforts of Gaurang Mehta and Karan Vahi.


Deferred planning has been incorporated into the Pegasus framework. This mode of Pegasus defers the mapping of jobs until the time that they are ready to be run. It consists of two steps:

Partitioning the workflow

Creating a Condor MegaDAG to run the partitions

2.3.1 Partitioning the workflow

Partitioning involves subdividing the workflow into smaller workflows (partitions) by a partitioner. Currently, the partitioner can partition the workflow in the following two ways:

Level-based partitioning: In this approach, a modified breadth-first search of the abstract workflow is performed, and the jobs at each level form one partition (a sketch of this scheme appears after Figure 3).

Single-node partitioning: Each job in the abstract workflow is treated as a partition. This is being used to integrate Euryale functionality into Pegasus; such partitioning amounts to delaying the mapping decision until just before a job is about to run.

Figure 3 shows a breadth-first partitioning:

Figure 3: A simple workflow partitioning scheme. The resulting abstract workflow determines the lookahead for Pegasus.
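A minimal sketch of the level-based scheme, under the assumption that the abstract workflow is supplied as a mapping from each job to its children, is shown below. It labels each job with its depth and groups jobs of equal depth into partitions; single-node partitioning is the degenerate case of one job per partition. This is an illustration of the idea, not the Pegasus partitioner code.

    from collections import defaultdict, deque

    def partition_by_level(children):
        """Group the jobs of an abstract workflow into level-based partitions.

        children: dict mapping each job id to the list of its child job ids.
        Returns a list of partitions, each a list of jobs at the same depth.
        """
        indegree = defaultdict(int)
        for job, kids in children.items():
            indegree.setdefault(job, 0)
            for kid in kids:
                indegree[kid] += 1

        # Breadth-first sweep: a job's level is one more than its deepest parent.
        level = {j: 0 for j, d in indegree.items() if d == 0}
        queue = deque(level)
        remaining = dict(indegree)
        while queue:
            job = queue.popleft()
            for kid in children.get(job, []):
                level[kid] = max(level.get(kid, 0), level[job] + 1)
                remaining[kid] -= 1
                if remaining[kid] == 0:
                    queue.append(kid)

        partitions = defaultdict(list)
        for job, lvl in level.items():
            partitions[lvl].append(job)
        return [partitions[l] for l in sorted(partitions)]

    dag = {"gen": ["sim1", "sim2"], "sim1": ["merge"], "sim2": ["merge"], "merge": []}
    print(partition_by_level(dag))   # [['gen'], ['sim1', 'sim2'], ['merge']]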

2.3.2 Creation of “MegaDAGs”

In the default deferred planning mode, each partition is run as a DAGMan job. The DAGMan job maintains the flow of execution for that partition. Hence, one has to create a mega Condor DAG that runs Pegasus to create the concrete Condor submit files for a partition and then runs that partition as a DAGMan job. The structure of the MegaDAG is determined by the partition graph, which is generated during the partitioning of the original workflow. The partition graph identifies the relationships between the partitions.


Figure 4: An example of a MegaDAG.
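The MegaDAG itself can be viewed as an ordinary DAGMan input file in which every node represents one partition: a PRE script runs Pegasus to produce the concrete submit files for that partition, and the node's job then runs the partition as a nested DAGMan workflow. The generator below is a simplified sketch; the plan-partition.sh helper and the .dax/.sub/.dag file names are hypothetical, and only the JOB, SCRIPT PRE, and PARENT ... CHILD keywords are standard DAGMan syntax.

    def write_megadag(partition_edges, path="megadag.dag"):
        """Emit a DAGMan file with one node per workflow partition.

        partition_edges: dict mapping a partition id to the list of partition
        ids that depend on it (the partition graph from the partitioner).
        """
        with open(path, "w") as dag:
            for part in partition_edges:
                # <part>.sub is assumed to submit condor_dagman on <part>.dag,
                # so the whole partition runs as a nested DAGMan job.
                dag.write(f"JOB {part} {part}.sub\n")
                # Hypothetical helper: runs Pegasus on the partition's abstract
                # workflow and writes <part>.dag plus its Condor submit files.
                dag.write(f"SCRIPT PRE {part} plan-partition.sh {part}.dax\n")
            for parent, children in partition_edges.items():
                for child in children:
                    dag.write(f"PARENT {parent} CHILD {child}\n")

    write_megadag({"p0": ["p1", "p2"], "p1": ["p3"], "p2": ["p3"], "p3": []})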

The above approach is expected to work well for medium to large sized partitions. However, for partitions with just one node, the overhead of running a DAGMan job can be significant compared to the execution time of the job itself. We are currently investigating other ways to incorporate this case into the MegaDAG, in which a single-node partition would be run directly as a Condor-G job rather than as a DAGMan job running a Condor-G job.

2.3.3 Retry Mechanism for deferred planning

A retry mechanism is also being incorporated into the deferred planning case that allows us to retry at the partition level if any failures occur during the execution of that partition. The presence of an error is determined from both kickstart records and Condor log files.

Currently, this has been prototyped as follows. A prescript checks the Condor log files for any failure. If any error is found in the Condor log files, then the pool of resources on which that failure occurred is removed from subsequent consideration. To achieve this, a per-partition cache file listing the sites is maintained. The log files are checked to see which job in the partition failed, and the site at which the failed job ran is taken out of the list of runnable execution pools that is given to Pegasus on retry. Pegasus thus generates the plan again for that partition, ignoring the problem site.
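The prescript's role can be pictured as a small filter over the per-partition site cache. The sketch below assumes a one-site-per-line cache file and an invented log line format; real Condor and kickstart logs are richer, but the pruning logic is the same in spirit.

    import re

    def prune_failed_sites(log_text, cache_path):
        """Drop sites where a job failed from the partition's site cache file."""
        # Toy pattern: assume failures appear in the log as
        #   "job <name> at site <site> exited with status <n>"
        failed_sites = set()
        for m in re.finditer(r"job \S+ at site (\S+) exited with status (\d+)", log_text):
            site, status = m.group(1), int(m.group(2))
            if status != 0:
                failed_sites.add(site)

        with open(cache_path) as f:
            sites = [line.strip() for line in f if line.strip()]
        survivors = [s for s in sites if s not in failed_sites]

        with open(cache_path, "w") as f:
            f.write("\n".join(survivors) + "\n")
        return survivors   # the execution pools Pegasus may use on retry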

2.3.4 Site Selection

Site selection techniques and APIs were improved. A round-robin site selector was incorporated into Pegasus in addition to the existing random site selector.

A non-Java callout site selector has also been incorporated, allowing Pegasus to interface with any external site selector. The callout writes out a temporary file containing the relevant job description and grid parameters (execution pools, GridFTP servers), which is parsed by the external site selector. The external site selector writes its solution to its stdout, which is piped back to Pegasus.
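An external site selector under this callout contract is simply a program that reads the temporary file written by Pegasus, picks one execution pool, and prints it to stdout. The key=value layout and the "pools" field below are invented for illustration; only the overall contract (job description and candidate pools in, one site name out) follows the description above.

    import random, sys

    def select_site(callout_file):
        """Read the callout file and return the chosen execution pool."""
        params = {}
        with open(callout_file) as f:
            for line in f:
                if "=" in line:
                    key, value = line.split("=", 1)
                    params[key.strip()] = value.strip()
        # Assumed field: "pools" is a comma-separated list of candidate sites.
        pools = [p for p in params.get("pools", "").split(",") if p]
        if not pools:
            raise SystemExit("no candidate execution pools")
        return random.choice(pools)   # a round-robin variant would cycle instead

    if __name__ == "__main__":
        # Pegasus pipes our stdout back as the selected site.
        print(select_site(sys.argv[1]))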

2.3.5 Simulator

A workflow simulator was developed on top of the NS network simulator to test various workflow scheduling heuristics. This simulator has been developed with the idea of testing various scheduling techniques before their incorporation into Pegasus. We have currently evaluated the min-min, max-min, and sufferage heuristics within the simulator. In addition, an AI-based search technique (GRASP) is being evaluated.

The simulator allows the user to specify the sites, the hosts making up the sites, the compute power of the hosts, and the bandwidths between the sites. For detailed simulations, one can either use NS routing during the simulation to transfer data between sites or use average bandwidth data obtained from previous runs to estimate the runtime bandwidths.
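Of the heuristics evaluated, min-min is the easiest to state: repeatedly take the ready task whose best (minimum) estimated completion time over all sites is smallest, assign it there, and update that site's availability; max-min instead picks the task whose best completion time is largest. The sketch below uses a flat estimated-execution-time table and ignores data transfer, which the real simulator models via NS routing or measured bandwidths.

    def min_min(eet, site_ready=None):
        """Schedule independent ready tasks with the min-min heuristic.

        eet: dict task -> dict site -> estimated execution time.
        Returns a dict task -> (site, estimated completion time).
        """
        sites = {s for row in eet.values() for s in row}
        ready_at = dict.fromkeys(sites, 0.0) if site_ready is None else dict(site_ready)
        unscheduled = set(eet)
        schedule = {}
        while unscheduled:
            # For each task, its best site under the current site availability.
            best = {
                t: min((ready_at[s] + eet[t][s], s) for s in eet[t])
                for t in unscheduled
            }
            # Min-min: pick the task whose best completion time is smallest.
            task = min(unscheduled, key=lambda t: best[t][0])
            finish, site = best[task]
            schedule[task] = (site, finish)
            ready_at[site] = finish
            unscheduled.remove(task)
        return schedule

    eet = {"t1": {"A": 4, "B": 9}, "t2": {"A": 6, "B": 3}, "t3": {"A": 8, "B": 7}}
    print(min_min(eet))

Sufferage differs only in how tasks are ranked: it gives priority to the task with the largest gap between its best and second-best completion times.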


2.3.6 Transformation Catalog

A new Transformation Catalog (TC) has been implemented using the MySQL database; schemas to support Postgres and other databases will be provided by UChicago. The other implementation is a file-based TC. A common tc-client has been implemented that can work with any TC implementation (stock or user provided) as long as it adheres to the TC APIs. The operations supported are a basic add/update mechanism for information about transformations, the profiles associated with transformations, and their type, architecture, and OS; operations to delete and query the TC are also provided.
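The tc-client contract described here, i.e. that any backend is acceptable as long as it offers the same add/update, delete, and query operations over the same fields, can be captured in a small interface. The rendering below is illustrative Python, not the actual VDS API; the entry fields simply follow the description above (logical transformation name, site, physical path, type, architecture, OS, and profiles).

    from dataclasses import dataclass, field

    @dataclass
    class TCEntry:
        logical_name: str          # e.g. "cms::simulate:1.0" (hypothetical)
        site: str                  # execution pool where the binary lives
        physical_path: str
        type: str = "INSTALLED"
        arch: str = "x86"
        os: str = "linux"
        profiles: dict = field(default_factory=dict)

    class FileTC:
        """In-memory/file-style Transformation Catalog; a MySQL-backed catalog
        would expose the same three operations."""

        def __init__(self):
            self._entries = []

        def add(self, entry: TCEntry):          # add/update
            self.delete(entry.logical_name, entry.site)
            self._entries.append(entry)

        def delete(self, logical_name, site=None):
            self._entries = [e for e in self._entries
                             if not (e.logical_name == logical_name
                                     and (site is None or e.site == site))]

        def query(self, logical_name, site=None):
            return [e for e in self._entries
                    if e.logical_name == logical_name
                    and (site is None or e.site == site)]

    tc = FileTC()
    tc.add(TCEntry("cms::simulate:1.0", "ufl", "/opt/cms/bin/simulate"))
    print(tc.query("cms::simulate:1.0"))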

2.3.7 Performance tests and tuning

ISI has been tuning the Condor pool settings using the grid monitor and changing the Condor submission rates to determine what overhead, if any, can be removed.

Formal Pegasus performance tests have not yet been done, but improvements have been made slowly and steadily, such as adding bulk RLS LFN and attribute queries to substantially decrease RLS query time for huge workflows. RLS jobs have also been moved from the vanilla universe to the scheduler universe to allow them to run much faster.

A full Java profiler still needs to be run on a huge workflow with Pegasus to identify any other hot spots in the code, but this requires a machine with a large amount of memory to see all the states.

Discussions have also been held with the University of Chicago about setting up nightly CVS validation tests with real examples from ATLAS, Montage, SDSS, and LIGO.

2.4 Research on Data placement planning

The research effort on this topic at Chicago considered scheduling and planning algorithms for large jobs that need large input data files, as is typical in high energy physics experiments. Our studies (both simulation based and on a Grid testbed) indicate that decoupling replication and scheduling strategies works to use Grid resources efficiently and to decrease users’ response times. In our design for an architecture to support this model of data placement planning, three elements manage the distribution of jobs and data files. Each user is associated with an External Scheduler (ES) and submits jobs to that ES. The Grid can have any number of ESs, and different combinations of users and ESs give rise to different configurations. The ES then decides which remote site to send a job to, depending on the scheduling algorithm being tested.

Jobs submitted to a site are managed by the site’s Local Scheduler (LS). The LS decides which job to run on which local processor, depending on the local scheduling policy or algorithm. Each site also has a Data Scheduler (DS), which manages the files present at that site. The DS decides when to replicate data files to remote sites (depending on a replication policy). It also decides when to delete files from local storage, and which ones.
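The decoupling argued for here can be made concrete with a toy External Scheduler that places jobs using queue length and data location, alongside an independent Data Scheduler that replicates popular files. The cost weights and popularity threshold below are invented for illustration; the actual study compares a range of such policies.

    from collections import Counter

    class DataScheduler:
        def __init__(self, site, files):
            self.site, self.files, self.popularity = site, set(files), Counter()

        def note_access(self, lfn):
            self.popularity[lfn] += 1

        def replicate_popular(self, peers, threshold=3):
            # Push files requested often at this site to every peer site.
            for lfn, count in self.popularity.items():
                if count >= threshold:
                    for peer in peers:
                        peer.files.add(lfn)        # stands in for a real transfer

    class ExternalScheduler:
        def __init__(self, data_schedulers):
            self.ds = data_schedulers              # site name -> DataScheduler
            self.queue_len = Counter()

        def submit(self, job_inputs):
            def cost(site):
                missing = [f for f in job_inputs if f not in self.ds[site].files]
                return self.queue_len[site] + 10 * len(missing)   # toy weights
            site = min(self.ds, key=cost)
            self.queue_len[site] += 1
            for f in job_inputs:
                self.ds[site].note_access(f)
            return site

    sites = {"A": DataScheduler("A", {"f1"}), "B": DataScheduler("B", {"f1", "f2"})}
    es = ExternalScheduler(sites)
    print([es.submit(["f1", "f2"]) for _ in range(4)])   # all land on B, which holds both inputs
    sites["B"].replicate_popular([sites["A"]])
    print(sites["A"].files)                               # popular files now replicated to A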

We have implemented the architecture described above on a large Grid testbed (Grid3), both by implementing new modules and by using the Globus infrastructure for basic services like replica location and authentication. A simulator, ChicSim, was also constructed to evaluate different Grid scheduling and replication strategies. ChicSim is a modular and extensible discrete event Data Grid simulation system built using Parsec (http://pcl.cs.ucla.edu/projects/parsec/), a discrete event simulation language. A wide variety of scheduling and replication algorithms can be implemented and their joint impact evaluated using ChicSim. Discrete event simulation is typically composed of entities and messages (interactions between the entities). The main entities in our simulation correspond to our view of a grid as three distinct components: the underlying network infrastructure, the actual entities that are part of the grid, and the applications that run at those entities.

We have designed and implemented a Grid testbed that runs many of the simulated experiments on real-life scenarios from the GriPhyN science experiments. This work is based on the execution of a Chimera workload.


We have implemented, tested, and used new modules for file transfer jobs, a data movement scheduler, a site selector, and a file popularity updater used for both cache management and dynamic replication of files.

We have run a large number of jobs on the Grid3 testbed of 27 sites, using a range of application workloads, to compare our various data and job scheduling strategies. These results corroborate our earlier simulation results, confirming that it is important for job-placement decisions to include data-location information, among other parameters. We also find that dynamic replication of popular datasets helps ease load across sites.

2.5 Policy-based resource sharing and allocation mechanisms

Catalin Dumitrescu at U of Chicago conducted research on policy-based resource allocation for Grid-enabled environments. This effort focused on challenging policy issues that arise within virtual organizations (VOs) that integrate participants and resources spanning multiple physical institutions. Its goal is to create techniques for site and VO resource allocation policies to be expressed, discovered, interpreted, reconciled, and enforced by, for, and to Grid jobs in order to maximize both local site and distributed VO utilization and value.

Resource sharing within Grid collaborations usually implies specific sharing mechanisms at participating sites. Challenging policy issues can arise within virtual organizations (VOs) that integrate participants and resources spanning multiple physical institutions. Resource owners may wish to grant to one or more VOs the right to use certain resources subject to local policy and service level agreements, and each VO may then wish to use those resources subject to VO policy. Thus, we must address the question of what usage policies (UPs) should be considered for resource sharing in VOs.

Additional work [UPRS, CORC], done during a summer internship at IBM, is closely related to and continues the GriPhyN research by examining the relationship between SLAs and usage policies. This work focused on agreement-based resource allocation for Grid-enabled environments, specifically the use of agreement-based resource reservation to provide guarantees of access to certain resources within VOs. This work explored how such agreements are negotiated, deployed, and monitored in order to signal agreement or policy violations.

2.6 Workflow Execution and Data Movement

In the last year, the University of Wisconsin-Madison team has made significant progress on request execution and data movement.

We have continued developing Stork, our tool for data placement. Stork has become significantly more stable. In addition to the graduate student who initially developed Stork, two academic staff employees have contributed to hardening Stork and adding features. This allows progress to be made on the research aspects of Stork while simultaneously ensuring that users have access to stable software.

Stork has added support for more protocols. It has also improved security in two ways: first, by allowing multiple users to share a Stork server without sharing certificates, and second, by using MyProxy to control access to proxy certificates.

Recently, collaborators have been showing increased interest in Stork, both within and outside GriPhyN. Within GriPhyN, the Pegasus team has been working with Stork to see if it can be integrated into the workflows generated by Pegasus. Outside of GriPhyN, collaborators from LCG at CERN are investigating Stork to see if it can be used in their workflows. We hope to build on this developing momentum by continuing to support these users and by finding new users for Stork.


We have continued development of DAGMan, for executing workflows. We have recently had conversations with the Virtual Data System team about enhancements needed to DAGMan, and we have implemented an improved retry capability to allow DAGMan to interact more easily with planners.

2.7 Peer-to-Peer Usage Patterns in Data Intensive Experiments

The GriPhyN project addresses the data sharing and analysis requirements of large-scale scientific communities through a distributed, transparent infrastructure. This infrastructure necessarily includes components for file location and management as well as for computation and data transfer scheduling. However, there is little information available on the specific usage patterns that emerge in these data-intensive, scientific projects. As a consequence, current designs either employ a cover-all approach (provisioning for all possible usage patterns and not optimizing for any), or start from unproved assumptions on user behavior by extrapolating results from other systems (such as file systems, the web, etc.). If inappropriate, these assumptions may lead to inadequate design decisions or to performance evaluations under unrealistic conditions. Research in this area was conducted at the University of Chicago by Foster, Iamnitchi, Ranganathan, and Ripeanu. [IR02]

In [RR+03] we analyzed the performance of incentive schemes in P2P file-sharing systems by defining and applying an analytic model based on Schelling’s Multi-Person Prisoner’s Dilemma (MPD). We used both this framework and simulations to study the effectiveness of different incentive schemes for encouraging sharing in distributed file sharing systems.

Internet traffic is experiencing a shift from web traffic to file swapping traffic. A significant part of Internet traffic is generated by peer-to-peer applications, mostly by the popular Kazaa application. Yet, to date, few studies analyze Kazaa traffic, thus leaving the bulk of Internet traffic in the dark. In [LRW03] we present a large-scale investigation of Kazaa traffic based on logs collected at a large ISP, which capture roughly a quarter of all traffic between Israel and the US.

The Gnutella network was studied as a potential model for file sharing in large scientific collaborations. The open architecture, achieved scale, and self-organizing structure of this network make it an interesting P2P architecture to analyze. The main contributions of this macroscopic evaluation are: (1) although Gnutella is not a pure power-law network, its current configuration has the benefits and drawbacks of a power-law structure; (2) we estimate the aggregated volume of generated traffic; and (3) the Gnutella virtual network topology does not match the underlying Internet topology well, leading to ineffective use of the physical networking infrastructure. This research is reported in [RF02].

Further research in P2P resource discovery was reported in “A peer-to-peer approach in resource discovery in grid environments,” presented at HPDC-11 in July 2002. Our poster “On Fully Decentralized Resource Discovery in Grid Environments” explored that issue further at the same conference.

In [FI03] we contend that any large-scale distributed system must address both failure and the establishment and maintenance of infrastructure.

We propose a novel structure, the data-sharing graph, for characterizing sharing patterns in large-scale data distribution systems. We analyze this structure in two such systems and uncover small-world patterns for data-sharing relationships. We conjecture that similar patterns arise in other large-scale systems and that these patterns can be exploited for mechanism design.
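The structure itself is easy to state: users are nodes, and two users are connected if they requested at least some number of common files within a time window; small-world behavior is then diagnosed by comparing clustering and path lengths against a random graph of the same size. The sketch below builds the graph from a toy request log and computes a clustering coefficient; the threshold and the log format are illustrative assumptions.

    from itertools import combinations

    def data_sharing_graph(requests, min_common=1):
        """requests: dict user -> set of files requested in one time window."""
        edges = {u: set() for u in requests}
        for u, v in combinations(requests, 2):
            if len(requests[u] & requests[v]) >= min_common:
                edges[u].add(v)
                edges[v].add(u)
        return edges

    def clustering_coefficient(edges):
        """Average fraction of a node's neighbors that are themselves connected."""
        coeffs = []
        for node, nbrs in edges.items():
            if len(nbrs) < 2:
                continue
            links = sum(1 for a, b in combinations(nbrs, 2) if b in edges[a])
            coeffs.append(2.0 * links / (len(nbrs) * (len(nbrs) - 1)))
        return sum(coeffs) / len(coeffs) if coeffs else 0.0

    log = {"u1": {"f1", "f2"}, "u2": {"f2", "f3"}, "u3": {"f1", "f3"}, "u4": {"f9"}}
    g = data_sharing_graph(log)
    print(g, clustering_coefficient(g))

A clustering coefficient much higher than that of a random graph with similar average path length is the small-world signature referred to above.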

Lastly, continuing important work on replica location services (RLS) from the previous period, we propose a general framework for creating replica location services, which we call the GIGa-scale Global Location Engine (Giggle). This framework defines a parameterized set of basic mechanisms that can be instantiated in various ways to create a wide range of different replica location services. By adjusting the system parameters we can trade off reliability, storage and communication overhead, and update and access costs.


Matei Ripeanu assisted in the construction and operation of the Giggle worldwide replica location challenge demonstrated at SC'02.
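The parameterization idea behind Giggle can be illustrated with a stripped-down two-tier index: each site keeps a local replica catalog mapping logical file names to physical locations, and index nodes keep soft-state summaries of which catalogs know about which logical names; a redundancy parameter controls how many index nodes each catalog reports to, trading reliability and storage against update traffic. This is a toy rendering of the framework, not the Globus RLS implementation, and all names in it are invented.

    class LocalReplicaCatalog:
        def __init__(self, site):
            self.site, self.mappings = site, {}     # lfn -> set of pfns

        def register(self, lfn, pfn):
            self.mappings.setdefault(lfn, set()).add(pfn)

    class ReplicaLocationIndex:
        def __init__(self):
            self.lfn_to_sites = {}                  # soft state, periodically refreshed

        def update(self, catalog):
            for lfn in catalog.mappings:
                self.lfn_to_sites.setdefault(lfn, set()).add(catalog.site)

    def publish(catalogs, indexes, redundancy=2):
        """Each catalog sends its summary to `redundancy` index nodes."""
        for i, cat in enumerate(catalogs):
            for k in range(redundancy):
                indexes[(i + k) % len(indexes)].update(cat)

    lrc1, lrc2 = LocalReplicaCatalog("ufl"), LocalReplicaCatalog("anl")
    lrc1.register("higgs.dat", "gsiftp://se.ufl.edu/data/higgs.dat")
    lrc2.register("higgs.dat", "gsiftp://se.anl.gov/store/higgs.dat")
    rli = [ReplicaLocationIndex(), ReplicaLocationIndex()]
    publish([lrc1, lrc2], rli, redundancy=2)
    print(rli[0].lfn_to_sites)      # {'higgs.dat': {'ufl', 'anl'}}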

2.8 Resource Discovery

Adriana Iamnitchi completed her dissertation "Resource Discovery in Large-scale Resource Sharing Environments" and received her Ph.D. from the U of Chicago in December 2003.

[SWFS] studies the commonalities between web caches, content distribution networks, peer-to-peer file sharing networks, distributed file systems, and data grids. These all involve a community of users who generate requests for shared data. In each case, overall system performance can be improved significantly if we can first identify and then exploit interesting structure within a community's access patterns. To this end, we propose a novel perspective on file sharing based on the study of the relationships that form among users based on the files in which they are interested.

This work proposes a new structure that captures common user interests in data, the data-sharing graph, and justifies its utility with studies on three data-distribution systems: a high-energy physics collaboration, the Web, and the Kazaa peer-to-peer network. We find small-world patterns in the data-sharing graphs of all three communities. We analyze these graphs and propose some probable causes for these emergent small-world patterns. The significance of the small-world patterns is twofold: it provides rigorous support to intuition and, perhaps most importantly, it suggests ways to design mechanisms that exploit these naturally emerging patterns.

In [SWUP] we built on previous work showing that the graph obtained by connecting users with similar data interests has small-world properties. We conjectured then that observing these properties is important not only for system description and modeling, but also for designing mechanisms. In this paper we present a set of mechanisms that exploit the small-world patterns for delivering increased performance. We propose an overlay construction algorithm that mirrors the interest similarity of data consumers and a technique to identify clusters of users with similar interests. To appreciate the benefits of these components in a real context, we take file location as a case study: we show how these components can be used to build a file-location mechanism and evaluate its performance.

Iamnitchi's research focused on designing decentralized mechanisms for the management of large-scale distributed systems and on understanding the associated tradeoffs between performance and scalability. Grid and peer-to-peer (P2P) resource sharing environments provide the real-world challenges for this research. These two environments appear to have the same final objective, the pooling and coordinated use of large sets of distributed resources, but are based in different communities and, at least in their current instantiations, focus on different requirements: while Grids provide complex functionalities to relatively small, stable, homogeneous scientific collaborations, P2P environments provide rudimentary services to large-scale, untrusted, and dynamic communities.

This research explored the convergence area of these two environments: mechanisms that cope with the scale and volatility of peer-to-peer systems while providing complex services typical of Grids. Several relevant problems were explored in this context: resource and file location, replication for data availability, characterization of system and usage behavior, performance predictions for a numerical simulation application, and fault-tolerant algorithms. In order to deal with large scale and unpredictability, the solutions often exploit problem and usage characteristics to improve performance.

2.9 Grid Fault Tolerance and Availability

Research on Grid fault tolerance at UCSD has been concentrating on the infrastructure needed to support fault tolerance in a Grid services architecture. At first glance, it doesn't seem to be a terribly challenging problem to address: there are commercial products, for example, that support making network services highly available. Looking more closely, there are four aspects of Grid services that need to be considered when high availability is desired:


Engineering is an issue. Despite decades of experience demonstrating repeatedly that fault-tolerance, like security, is an aspect of a service that is best considered at the onset of design, it is almost always added after deployment. Luckily, many of the design constraints for scalable services (such as statelessness, which means that any state a service maintains is held in a database) make retrofitting high availability easier.

Grid services are often stateful. This means that there can be information in a server that is not stored in a database. Grid services are stateful in part to avoid the cost of using a database, and also because much of the state is "soft": it can be recovered in the face of a failure. This complicates retrofitting high availability.

Grid services are often nondeterministic. Services are often nondeterministic for low-level reasons (e.g., when interrupts occur), which makes replication hard. But many grid services are frankly nondeterministic. For example, resource schedulers often make nondeterministic assignments in order to spread load more effectively.

Grid services are built on high-level protocols. One of the major trends in distributed systems has been the constant raising of the level of abstraction used for gluing RPC-like systems together. We keep going higher to make interconnection easier. The current level, SOAP and HTTP, works well because of its use for interconnecting web services. Building at this high a level has an impact on performance (average latency becomes larger and peak bandwidth may be reduced) and on timeliness (worst-case latency is finite but may be effectively unbounded).

Our status is as follows:

Primary-backup is a logical choice for a stateful, nondeterministic service. It is also amenable to being implemented as a "wrapper" based on interception, and so should be relatively easy to add to an existing service. It has significant drawbacks: it can tolerate only benign failures and requires a bound on the worst-case latency. Building it on top of a high-level protocol is worrisome both in terms of correctness and of performance. So, we have designed and built two primary-backup services for Grid services: one using the high-level protocols supplied by Globus, including a notification service, and one using low-level TCP/IP. The former is attractive for retrofitting purposes. We also built a protocol that used an intermediate approach: state replication was done at a high level but without using the notification service. Not surprisingly, we found the performance of the high-level approaches worse than that of the low-level approach. However, the most severe performance problems were due to the notification service. This gives hope that tuning might make the high-level approach feasible in terms of performance. We have a paper on this work in the upcoming Clusters conference.

Our treatment of state is naive. Some of the latency in primary-backup has to do with ensuring data is stable before replying to the client (thus avoiding the "output commit" problem). Recall that Grid services are often based on soft state. Hence, data doesn't have to be as stable in a primary-backup service before replying: it only has to be in the volatile memory of the backups rather than written to disk. This implies that some performance gain can be obtained by associating with each datum the recovery semantics it has. If one were to do this for a database replication protocol, then one might be able to use a "stateless" approach for Grid services without losing the performance benefits of having the services be based on soft state. We are currently investigating this approach.
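The per-datum recovery semantics suggested here can be sketched as a primary that replies immediately for soft state (which a recovering backup could regenerate) but waits for a backup acknowledgement before replying for durable state. The in-process classes below stand in for real services and wrappers; there is no networking, failure detection, or Globus notification here, and the interface is invented for the example.

    class Backup:
        def __init__(self):
            self.store = {}

        def apply(self, key, value):
            self.store[key] = value
            return "ack"

    class Primary:
        def __init__(self, backup):
            self.backup, self.store, self.pending = backup, {}, []

        def write(self, key, value, semantics="durable"):
            self.store[key] = value
            if semantics == "durable":
                # Output commit: do not answer the client until the backup has it.
                assert self.backup.apply(key, value) == "ack"
            else:
                # Soft state: reply now and propagate lazily; a failover backup
                # could instead regenerate this value from scratch.
                self.pending.append((key, value))
            return "ok"

        def flush(self):
            while self.pending:
                self.backup.apply(*self.pending.pop())

    p = Primary(Backup())
    p.write("job-42/state", "RUNNING", semantics="soft")     # low-latency path
    p.write("job-42/output-location", "gsiftp://se/f.out")   # durable path
    p.flush()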

We need to address the issue of asynchronous execution. Our current plan is to use an optimistic approach based on Paxos. Paxos is an agreement protocol that is safe but not live: it terminates only if the system behaves in a timely manner for some period of time. Paxos, like all state-machine-based replication approaches, requires all replicas to be deterministic. We believe that we can modify Paxos to work for replicated nondeterministic state machines. We will start investigating this approach this summer.


2.10 Prediction

This year the TAMU group has worked extensively with the ISI and University of Chicago groups to provide predictions for identifying appropriate resources with the GriPhyN resource planning tools. To get good predictions, one must have mechanisms for collecting and archiving adequate performance data. It is important to collect this information at three different levels:

1. Applications: need to have performance data about the computations and I/O

2. Middleware: need to have performance data about the different middleware functions that interface with the application as well as the ones that interact with the storage resources.

3. Resources: need to have performance data about the compute, network and I/O resources.

The focus thus far has been on predicting application performance for use with the Pegasus Planner. The current approach that we have developed entails interfacing the Pegasus Planner with the Prophesy Predictor. Prophesy is an infrastructure for performance analysis and modeling of parallel and distributed applications. Prophesy includes three components: automatic instrumentation of applications, databases for archival of information, and automatic development of performance models using different techniques. It is the last component, the model builder, that is used to formulate predictions based upon performance data. This current plan is represented in the figure below.

Figure 5: The relationship between Prophesy and Chimera/Pegasus. Chimera and the Pegasus planner interact with the Prophesy predictor, which draws on other performance tools and databases, the virtual data catalog, and grid-launch statistics from the execution environment.

Pegasus receives an abstract workflow description from Chimera, contacts the Prophesy Predictor to get the desired prediction information (this process may be iterative), and then produces a concrete workflow.


The concrete workflow is then submitted to Condor-G/DAGMan [FT01] for execution. Prophesy also provides interfaces to other performance tools and databases, so that the Prophesy Predictor can make more accurate predictions for the Pegasus planner.

Currently, we are working on the Pegasus-Prophesy interface using the GeoLIGO application. This benchmark is used to develop an initial language that allows different queries to predict the compute resources as well as the I/O resources needed for an application. Upon working through the details with GeoLIGO, we will work with the other experiments.

The initial performance data is focused on the following events:

Execution time (real time) and number of processors needed for efficient executions

File access time (combination of access latency and time to get files in place for execution)

Input files

File size and location

Disk requirements

Output & temporary files

The results of this work are described in detail in [PRED].
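To make the model builder's role concrete, the sketch below fits a simple linear model (runtime as a function of input size) to archived measurements by least squares and uses it to answer a planner query. Prophesy's real models are considerably richer, covering multiple parameters, coupling terms, and several modeling techniques; the data points and the single-parameter model here are invented for illustration.

    def fit_linear(samples):
        """Least-squares fit of t = a + b * x to (x, t) pairs."""
        n = len(samples)
        sx = sum(x for x, _ in samples)
        st = sum(t for _, t in samples)
        sxx = sum(x * x for x, _ in samples)
        sxt = sum(x * t for x, t in samples)
        b = (n * sxt - sx * st) / (n * sxx - sx * sx)
        a = (st - b * sx) / n
        return a, b

    # Archived (input size in GB, measured runtime in seconds) pairs -- invented.
    history = [(1, 130), (2, 245), (4, 500), (8, 1010)]
    a, b = fit_linear(history)

    def predict_runtime(input_gb):
        return a + b * input_gb

    print(round(predict_runtime(6)))   # planner query: expected runtime for a 6 GB input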

2.11 GridDB: A Data-Centric Overlay for Scientific Grids

In the 2003-2004 academic year, the GriPhyN group at UC Berkeley generalized their interactive query processing framework from the previous year into GridDB, a full-fledged system for providing data-centric services in scientific grids. The GridDB system integrates ideas from workflow and database systems to address new challenges posed by computational scientists using grid platforms. The GridDB design was informed through a long-running collaboration among various groups inside and outside of GriPhyN: the database group at UC Berkeley; the Chimera team at Argonne National Laboratory; and two groups of scientists, HEP physicists at LBNL and the University of Chicago and astrophysicists at Fermilab. The work on GridDB has resulted in publications in VLDB'04 and SemPGrid'04. It has also served as the basis of David Liu's master's thesis at UC Berkeley. Finally, several presentations and demos of the system have been given, which have opened the door for further collaborations, including interest in adoption from LSST, an astrophysics project in its embryonic stages.

GridDB is a software overlay built on top of currently deployed "process-centric" middleware such as Condor and Globus (Figure 5). Process-centric middleware provides low-level services, such as the execution of processes and management of files. In contrast, GridDB provides higher level services, accessed through a declarative SQL interface. As a result, users can focus on the manipulation of data generated by process workflows, rather than on the workflows themselves.


Figure 5: The GridDB data-centric software overlay

GridDB's model mirrors that of traditional databases: a user first describes workloads during a data definition phase, enabling the system to provide specialized services. Specifically, a modeler describes grid workloads in the Functional Data Model with Relational Covers (FDM/RC), a model we have defined to represent grid workflows and data. In contrast to existing workflow models (e.g., those used by Chimera and DAGMan) and data models (e.g., the relational model), the FDM/RC allows the description of both workflows and the data transformed by workflows. By modeling both concepts, GridDB provides a uniform set of services typically found either in workflow systems or in database systems, including declarative interfaces, type checking, work-sharing and memoization, and data provenance. Furthermore, GridDB is able to provide novel services such as data-centric computational steering and smart scheduling, which arise from the synergy between workflow and data models.
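
As an illustration only, the sketch below conveys the flavour of such a data-centric request, with sqlite3 standing in for the real system and an invented table layout; it is not GridDB's actual schema or SQL dialect.

    # Illustrative sketch only: sqlite3 stands in for GridDB, and the table
    # and column names are invented for this example.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE simulated_events "
        "(event_id INTEGER, dataset TEXT, pt REAL, produced_by TEXT)"
    )
    conn.executemany(
        "INSERT INTO simulated_events VALUES (?, ?, ?, ?)",
        [(1, "higgs_sample", 42.5, "wf_0017"),
         (2, "higgs_sample", 17.3, "wf_0017"),
         (3, "drell_yan",    55.1, "wf_0021")],
    )

    # The user asks for data products, not for jobs: which events in a given
    # dataset pass a cut, and which workflow run produced them (provenance).
    rows = conn.execute(
        "SELECT event_id, produced_by FROM simulated_events "
        "WHERE dataset = ? AND pt > ?",
        ("higgs_sample", 40.0),
    ).fetchall()
    print(rows)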

We have presented GridDB at several venues and workshops, including Lawrence Livermore National Laboratory (LLNL), Lawrence Berkeley National Laboratory (LBNL), Microsoft's Bay Area Research Center (BARC), the 2nd International Workshop on Semantics in Peer-to-Peer and Grid Computing (SemPGrid'04), and the GriPhyN 2004 all-hands meeting. As a direct consequence of these talks, we have established a collaboration with the Large Synoptic Survey Telescope (LSST) project, a next-generation astronomical survey requiring real-time and ad hoc analysis of telescope image data.

At this point, we believe that the main research ideas of GridDB are mature, and we are working on rounding out the functionality of the system and transferring the technology into the hands of grid scientists.

3 VDT – The Virtual Data Toolkit

3.1 VDT Development, Deployment, and Support Highlights

The VDT has made significant strides in the last year. In particular:

The VDT has upgraded many of the software packages, particularly Condor (which is now version 6.6.5) and Globus (which is now version 2.4.3). In addition, the VDT has roughly doubled the number of components that are included.

The VDT has solidified and strengthened its collaboration with European projects, particularly LCG and EGEE. LCG has contributed many bug reports and bug fixes to the VDT and has proven to be a valuable partner. Collaboration with EGEE is just beginning, but promises to be similarly productive.


The VDT was adopted by the PPDG collaboration in the Fall of 2003.

The VDT was installed on 27 sites for Grid2003 in the Fall of 2003. The VDT team interacted with the Grid2003 team to improve the VDT in reaction to user suggestions, and the team was closely involved in the deployment. Such close involvement is critical to the VDT's success: it is not only useful to the users but also educational for the VDT team.

Late in 2003, the VDT developed and deployed a nightly test infrastructure to test both the installation process and the functionality of the VDT. The nightly tests have proven invaluable by catching many errors before releases of the VDT.

The VDT has expanded the ways it can be installed. In addition to the standard Pacman installation, RPMs are now available for three varieties of Linux. Also, instead of only having a monolithic VDT Pacman installation, users can now install just the portions of the VDT they wish, using the same Pacman mechanism.

During the development of VDT 1.1.13, the VDT provided 21 bug fixes to the Globus Alliance, 15 of which were accepted into the standard Globus distribution. Not all patches were accepted because some components were being replaced with newer components, so the bug fixes didn't always make sense. In addition, the VDT provided four new Globus features as patches, and two of them were accepted.

Collaboration with NMI has increased, and NMI is now central to the build process for the VDT.

The VDT team hired a new member to work with our European collaborations.

3.2 The VDT Build and Test Process

Most of the VDT components are built using machinery (software and computers) developed and deployed by the NSF Middleware Initiative (NMI). The figure below illustrates how the software is built:

Figure 6: VDT Building Process and Relation to NMI


3.2.1 NMI build

NMI builds several components for VDT 1.1.13: Globus, Condor, MyProxy, KX509, GSI OpenSSH, PyGlobus, and UberFTP. NMI checks the software out of the appropriate CVS repositories and then patches the software. Currently, only Globus is patched. Each of these software packages uses GPT, so GPT source bundles are then created. These are built and tested on the NMI build pool, then given to the VDT.

The NMI build pool is a cluster of nearly forty computers, and it uses Condor to distribute the builds. There is a wide variety of computer architectures and operating systems in the build pool. When a VDT build is started, we currently tell Condor to run the same build on three platforms: RedHat 7, RedHat 9, and RedHat 7 with the gcc 3 compiler. (This last platform is a requirement from LCG.) After the build, NMI automatically deploys the software and does basic verification testing to ensure that the software works. After NMI completes the build, the VDT team imports the software into the VDT cache.

There is very close collaboration between the VDT and NMI groups. In fact, one person, Parag Mhashilkar, spends 50% of his time on the VDT and 50% of his time on NMI, which facilitates a close collaboration.

3.2.2 VDT build

The VDT team builds a few software packages such as the fault tolerant shell and expat. Currently we are working with the NMI team to use the NMI machinery to build this software for us. As we expand the number of architectures that we support, this will save us a lot of time and will eliminate errors.

3.2.3 Contributor builds

Some software in the VDT is built by the contributors that wrote the software. The Virtual Data System (VDS), Netlogger, and Monalisa are three such systems. We trust these contributors to give us good builds of their software.

3.2.4 Final packaging

When all of the builds have been assembled, the VDT team packages them together. All software goes into a Pacman cache, along with scripts to configure the software during the installation. A subset of the software is turned into RPMs and put onto the VDT web site for download. (We expect this subset to grow larger in the near future, and it will hopefully encompass the entire set of VDT software eventually.)

3.2.5 Testing

At this point, the VDT team begins rigorous testing. We have a nightly test that checks that the installation went smoothly (nothing failed, all files look good, etc.) and also tests the functionality of the software (we can submit jobs, transfer files, etc.). After a couple of days of good tests, the VDT Testing Group is brought in to do more testing. This is a group that installs the VDT on a wide variety of sites and does both local testing and testing between sites. After about a week of testing, the VDT is released to everyone.
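
For illustration only, the sketch below shows the general shape of such a nightly test driver in Python; the installation path, required entries, and probe command are placeholders rather than the VDT's actual test suite.

    # Illustrative sketch only: placeholder paths and probe command, not the
    # VDT's actual nightly test suite.
    import os
    import subprocess
    import sys

    VDT_LOCATION = os.environ.get("VDT_LOCATION", "/opt/vdt")   # placeholder path
    REQUIRED = ["setup.sh", "globus", "condor"]                 # placeholder entries
    PROBE = ["/bin/true"]   # stand-in for a job-submission or file-transfer probe

    def installation_ok() -> bool:
        """Check that the installed tree contains the expected entries."""
        return all(os.path.exists(os.path.join(VDT_LOCATION, name)) for name in REQUIRED)

    def functionality_ok() -> bool:
        """Run a simple functional probe and report whether it succeeded."""
        return subprocess.run(PROBE, capture_output=True).returncode == 0

    if __name__ == "__main__":
        ok = installation_ok() and functionality_ok()
        print("nightly test:", "PASS" if ok else "FAIL")
        sys.exit(0 if ok else 1)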

3.3 VDT Plans

In the next year, we look forward to meeting several goals, particularly:

Expanding the number of platforms supported by the VDT, to better meet the needs of our users.

Improving our testing structure to better test the VDT, particularly to test compatibility between versions.

Supporting the emerging web services infrastructure by including both Condor 6.7.x and Globus 4.0.


4 GriPhyN Activities in ATLAS

GriPhyN research for ATLAS is done within the matrix of grid computing projects, the U.S. ATLAS Software and Computing project, and the international ATLAS Collaboration. The GriPhyN-ATLAS group members come from the University of Chicago, Argonne National Laboratory, the University of Texas (Northwestern), and Boston University. Work is done closely with PPDG, iVDGL, and the European peer Grid projects, e.g., DataTAG, EDG, and LCG. Over the past year the main areas of concentration were application performance monitoring (covered earlier in the Prophesy section), application packaging/configuration for grid execution, grid portal development for physicist user interfaces, and prototyping of virtual data methods using SUSY Monte Carlo simulations and ATLAS reconstruction data challenges (DC1) as application drivers.

4.1 Application Packaging and Configuration with Pacman

Packaging and configuring the ATLAS software so that it is suitable for execution in a grid environment is a major task, but necessary before any research using virtual data techniques can proceed. Work continued here in collaboration with iVDGL (Pacman development, and VDT integration for ATLAS). The GriPhyN-related work required designing transformations appropriate for the particular dataset creation task. For the Athena-based reconstruction data challenge, this meant, for example, scripts that set up the Athena environment (including specification of input datasets and outputs) and input parameters (as specified in an accompanying "JobOptions" text file).

4.2 Grappa: a Grid Portal for ATLAS Applications

Work continued on the development of Grappa, a grid portal tool for ATLAS applications. Grappa is a Grid portal effort designed to provide physicists convenient access to Grid tools and services. The ATLAS analysis and control framework, Athena, was used as the target application. Grappa provides basic Grid functionality such as resource configuration, credential testing, job submission, job monitoring, results monitoring, and preliminary integration with the ATLAS replica catalog system, MAGDA.

Grappa uses Jython to combine the ease of scripting with the power of Java-based toolkits. This provides a powerful framework for accessing diverse Grid resources with uniform interfaces. The initial prototype system was based on the XCAT Science Portal developed at the Indiana University Extreme Computing Lab and was demonstrated by running Monte Carlo production on the U.S. ATLAS testbed. The portal also communicated with a European resource broker on WorldGrid as part of the joint iVDGL-DataTAG interoperability project for the IST2002 and SC2002 demonstrations. The current prototype replaces the XCAT Science Portal with an Xbooks JetSpeed portlet for managing user scripts. The work on Grappa was presented at CHEP (Conference on Computing in High Energy Physics) and will be published in the forthcoming proceedings.

4.3 Virtual Data Methods for Monte Carlo Production

Research on virtual data methods for ATLAS was driven by three goals:

1. To prototype simple “analysis-like” transformations to gain experience with the VDS system.

2. Use of VDS to perform SUSY dataset creation; these were Monte Carlo simulations which contributed to the DC1 dataset.

3. Use of VDS to perform reconstruction of DC1 output.

Several talks on these results were given at ATLAS meetings in the U.S. and at CERN.

We employ and/or interface with grid components, services and systems from a large number of development projects, with the initial aim of maximizing ATLAS's use of grid resources in a production environment. An environment specific to the recent reconstruction data challenge, using the Chimera virtual data system, was constructed. Figure 7 below shows the major functional components in the current "GCE environment". The two main components that require user intervention are the "submit host" (for the client/physicist) and the "computing host" (for the site administrator).

This environment was packaged and documented, based on VDT 1.1.9, Chimera VDS 1.0.3, and ATLAS release version 6.0.3. A set of command line tools was developed to create transformations and load them into a virtual data catalog (VDC); to generate abstract DAGs corresponding to a particular reconstruction task, expressed as "reconstruct this list of partitions (files) from this particular dataset"; and to generate concrete DAGs and submit the jobs to remote execution hosts using Condor-G.

Before running the GCE reconstruction framework, the user defined a dataset and a list of partitions to reconstruct. Input data consisted of Monte Carlo simulation output (Zebra files) and were specified by logical file names (LFNs) following a particular naming convention. To determine the site/location of the corresponding physical file names (PFNs) we used the MAGDA file catalog, through both its web interface and client tools. Users also interacted with a production database and a "Cookbook" of JobOptions files.
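
As an illustration only, the sketch below strings these steps together as a small Python driver. The tool names (install-tr, genmydax, genmydag) are those of the GCE command line tools mentioned above, but the arguments, dataset name, and partition identifiers shown here are placeholders rather than the tools' actual interfaces.

    # Illustrative sketch only: the arguments, dataset name, and partitions are
    # placeholders, not the real command-line interfaces of the GCE tools.
    import subprocess

    DATASET = "dc1.002000.recon"      # placeholder dataset name
    PARTITIONS = ["0001", "0002"]     # placeholder partition (file) identifiers

    def run(cmd):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Define the transformations and load them into the virtual data catalog.
    run(["install-tr", DATASET])

    # 2. Generate the abstract DAG (DAX) for the requested partitions.
    run(["genmydax", DATASET] + PARTITIONS)

    # 3. Generate the concrete DAG and hand it to Condor-G/DAGMan for execution.
    run(["genmydag", DATASET])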

Figure 7: ATLAS Grid Component Environment used for the reconstruction data challenge. The diagram shows the Chimera catalog host (transformations, derivations, and transformation catalog), the Chimera submit host (VDL/VDC loading via install-tr, abstract planning via genmydax, concrete planning via genmydag), the logging and bookkeeping database options (Cookbook, Silk), the MAGDA, RLS, and MDS services, the pool.config and tc.data configuration files, and the computing hosts with local queues and output data pools.

5 GriPhyN Activities in CMS

The CMS group within the GriPhyN Collaboration includes the California Institute of Technology, the Fermi National Accelerator Laboratory, the University of California-San Diego, the University of Florida, and the University of Wisconsin-Madison. All activities continue to be conducted in close collaboration with the iVDGL, the Particle Physics Data Grid (PPDG), and the US-CMS Software and Computing Project (USCMSSC).


5.1 Data Challenge 04 and Production of Monte Carlo Simulated CMS Data

During year four, aided by the earlier prototyping, testing and hardening efforts, large scale and continuous production of simulated CMS data was achieved. Relevant components of the grid-enabled US-CMS production system currently include a centralised production metadata database, known as the RefDB [REFDB2], a metadata manager and production workflow generator, known as MCRunJob [MCRUN], and a grid job planner, known as MOP [MOP]. A typical CMS production request requires that many thousands of jobs be created, submitted, and monitored for successful completion. In addition, CMS requires that all sites producing CMS simulated data read their input parameters from the RefDB metadata database and write back to the database a list of values describing how and when the data was produced [REFDB1].

5.1.1 Pre-parallel-post-Challenge Production using Grid3

Production of CMS simulated data was distributed across the resources of Grid3 [GRID3] during year four, helping the production effort to reach new milestones in scale and overall throughput. Grid3 is a consortium of GriPhyN, iVDGL, PPDG, and several experiments, for which GriPhyN provides the grid middleware infrastructure. CMS participation in Grid3 resulted in an additional 50% of simulated events over what was produced on CMS-owned computing resources. Over 15 million events were simulated with a GEANT4 application over a 6 month period (see Figure 8), and peak simultaneous CPU usage reached 1200 CPUs managed by a single FTE. The achieved overall throughput corresponds to approximately one CPU century and 15 TB of data produced. The overall number of files produced was on the order of 45,000.

Figure 8: USCMS simulation throughput. The lower area (red) represents the number of events produced on canonical USCMS resources, whilst the upper area (blue) represents the number of events produced on non-US-CMS resources from Grid3.

5.2 Grid Enabled Analysis of CMS Data

Activities in grid-enabling the analysis of CMS data continued during year four and, as in previous years, the CMS software environment also continues to evolve at a fast rate. Relevant components for the local analysis of CMS data include the CMS software frameworks COBRA [COBRA], CARF, ORCA, and IGUANA [IGUANA], as well as the LHC data persistency implementation, known as POOL [POOL]. CMS participation in GriPhyN is collaborating closely with PPDG [PPDG] to design, prototype and build grid-enabled systems of distributed services supporting both interactive and scheduled analysis of distributed CMS data.

5.2.1 CMS Analysis Application Activities and Status

Over the past year, the CMS software environment has matured to the level of supporting local analysis of CMS data. As a result, a significant effort is being devoted to the development of several prototype data analysis projects, including a Higgs search analysis (Caltech), an exotic-physics search analysis and a standard model Drell-Yan dimuon analysis (both at Florida). These analysis projects serve as analysis benchmarks to define the user-level interaction with a grid-enabled analysis environment. In addition, these activities serve to provide a detailed understanding of the actual application execution environment, metadata management needs for data discovery, and job preparation/submission requirements for local and distributed CMS data analysis.

As part of the recently completed DC04, several complete CMS datasets (also called DSTs or, by the misnomer, "Data Summary Tapes"), representing several million fully simulated and reconstructed events and amounting to several gigabytes, were transferred from the CERN Tier-0 site to the Fermilab Tier-1 site and subsequently distributed to the Caltech and Florida Tier-2 sites. Now, in the post-DC04 phase, these datasets are available for analysis by physicists. Prototype applications of the above benchmark analyses are being developed using the CMS ORCA libraries and have, so far, analysed over 1 million events from DST datasets at local Tier-2 resources.

From this work, several important lessons and requirements for current CMS data analysis that are relevant to GriPhyN integration activities have emerged; they are summarized below:

The CMS POOL metadata model (critically required for managing data access and data discovery) is not yet mature and is rapidly evolving. This presents a challenge to the integration of CMS with GriPhyN technology, as understanding the metadata model is crucial for identifying input/output datasets and for building analysis applications to process the data.

The model for CMS low- to mid-level data types is beginning to solidify and currently consists of three types of datasets: Hits, Digis, and DSTs. Derived object collections in a derived dataset contain internal "links" back to the original object collections used in the original dataset. This implies that a "complete" dataset (or sub-dataset consisting of runs or events) is only completely specified when one specifies the corresponding Hits, Digis, and DSTs all together. Access to all of the above CMS POOL datasets requires applications to be built from ORCA/COBRA libraries. The requirement on the GriPhyN Virtual Data Language (and System) is that a strong notion of "datasets" will need to be supported (a minimal sketch of such a dataset descriptor appears after this list).

The model for CMS high-level data types such as Analysis Object Data (AODs) and TAGs is not yet defined and may be generic enough so as not to require CMS-specific ORCA/COBRA libraries for access. The extent to which AODs and TAG data contain copies of previously derived data or simply provide internal links back to the external original data is not yet decided. Currently, ROOT Trees are being used in place of a well defined AOD type. This implies that some degree of uncertainty in GriPhyN integration tasks will persist until such high level datasets become mature.

The CMS software is composed of a set of libraries and tools, and effectively provides a Software Development Kit (SDK) and environment for data analysis. This enables users to build and execute personalised CMS analysis applications, in which a significant part of the user analysis activity at the DST level (and possibly the AOD level) is devoted to developing the analysis application based upon iterations over a small local dataset. Once refined, the application needs to be run over much larger DST datasets in a production-like manner across the Grid to produce personal ROOT Trees (or personal AODs when they are available in CMS). The requirement on GriPhyN technology is that the analysis application cannot be considered a simple shell script or binary executable, but is rather a user-composed, complex execution environment.

Users need the ability to easily package, publish and/or distribute their own complete local execution environment (including any and all shared object libraries, environment setup tools, wrapper scripts, etc.) of their personal CMS application to grid resources for distributed execution over distributed datasets. Such a package might be considered as an input dataset to a "Transformation" in the GriPhyN Virtual Data Language.

The CMS software environment actively supports "reconstruction on demand", which means that if the requested object collection does not persistently exist in the input dataset, it will be instantiated by recursively invoking "reconstruction" algorithms on any necessary internal or external (but linked) object collections. This provides the user with significant power and flexibility in defining precisely the reconstructed object collection of interest. It also implies that users will need to take care in specifying which object collections they wish to access: reconstructing a new object collection on demand can take several orders of magnitude longer than simply reading a persistent object collection from disk; indeed, different datasets may require very different resource usage constraints for the same application. Such differences in performance will strongly influence GriPhyN planning decisions of whether to move the dataset with the job (to exploit compute resources) or to simply schedule the job at the original dataset location (to exploit storage resources).

Once the ROOT Trees (or AODs when available) are produced from DSTs (using the user's specific application), the user needs access to them. If small enough, they can be transported, stored, and analysed locally. If too large, they could be accessed interactively over the wide area network, or they could be further refined and reduced so that they are small enough for local storage constraints. Users will want to have both options.
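
As an illustration only, the sketch below gives a minimal Python rendering of the "strong notion of datasets" argued for in the list above, in which a dataset is fully specified only by its Hits, Digis, and DSTs together with links back to the collections they derive from. The field names are invented for this sketch and are not CMS or VDL definitions.

    # Illustrative sketch only: invented field names, not CMS or VDL definitions.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class ObjectCollection:
        logical_name: str                                     # e.g. a POOL file set identifier
        derived_from: Optional["ObjectCollection"] = None     # internal "link" back to the source

    @dataclass
    class CMSDataset:
        name: str
        hits: List[ObjectCollection] = field(default_factory=list)
        digis: List[ObjectCollection] = field(default_factory=list)
        dsts: List[ObjectCollection] = field(default_factory=list)

        def is_complete(self) -> bool:
            # A dataset (or sub-dataset of runs/events) is only completely
            # specified when Hits, Digis, and DSTs are all present.
            return bool(self.hits and self.digis and self.dsts)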

5.2.2 Grid and Web Services Architecture for CMS Data Analysis

In parallel with, and in close step with, the above CMS analysis application work, important and substantial activity continues to be devoted to designing a distributed, dynamic system of services to support CMS data analysis.

5.2.2.1 Initial Prototype at Supercomputing 2003

Considerable work over the past year was devoted to the deployment of an initial end-to-end system of distributed high-level services supporting grid-enabled data analysis, and resulted in a demonstration at Supercomputing 2003. In particular, the prototype integrated a typical physics user interface client [ROOT], a uniform web-services interface to grid services [CLARENS], a virtual data service [CHIMERA], a request scheduling service [SPHINX], a monitoring service [MONALISA], a workflow execution service [VDT, CONDOR], a remote data file service [CLARENS], a grid resource service [VDT, GLOBUS], and a replica location service [RLS]. For testing and evaluation purposes, the prototype was deployed across a modest-sized U.S. regional CMS Grid testbed (consisting of sites in California, Florida, Illinois, and Iowa) and exhibited the early stages of interactive remote data access [GAE], demonstrating interactive workflow generation and collaborative data analysis using virtual data and data provenance [VDCMS], as well as showing non-trivial examples of policy-based scheduling of requests in a resource-constrained grid environment [QOS]. Using similar prototypes, this work will be used to characterize the system performance as a whole, including the determination of request-response latencies in a distributed service model and the classification of high-level failure modes in a complex system. The prototype was based upon a ROOT-based data analysis and did not at that time make use of the CMS software environment. Later services (see below) are beginning to integrate the rest of the CMS software environment.


Figure 9: Diagram showing operation of the CMS Grid-enabled Analysis Prototype at SuperComputing 2003

5.2.2.2 The Clarens Web Services Architecture

Clarens [CLARENS] provides a "portal" for a host of Grid computing services. This portal will take the form not only of a standard web-based application, but also of programmatic interfaces to analysis services that physicists use to access and analyze data as part of a dynamic Virtual Organization. Currently both a Java and a Python version of Clarens are available.

Clarens acts as the "backbone" for web services that are currently being developed within the context of the Grid Analysis Environment [GAE]. After the Supercomputing 2003 conference several new core services were developed within the Clarens Web Service Environment and deployed on the GAE testbed:

Rendezvous: dynamically discovers services that are deployed on multiple Clarens servers within a distributed environment. A JavaScript front end enables users to search for specific services and displays the service interface using the Web Services Description Language (WSDL).

System management: enables fine-granularity access control management and provides information about services active on a specific Clarens server.

File Access: enables fine-granularity access control management and lookup capability for files hosted by Clarens servers.

VO Management.

Part of the Clarens development is providing web service interfaces to (existing) applications that are used in (or are important for) CMS. Equally important is providing transparent access to these services, not only for applications (by using WSDL interfaces and dynamic discovery), but also for users through graphical front ends (see Figure 10).
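
For illustration only, the sketch below shows how a thin Python client might invoke a Clarens-hosted service programmatically. The endpoint URL and the method name are placeholders, not an actual Clarens deployment or interface; real deployments advertise their interfaces through WSDL and the Rendezvous discovery service described above.

    # Illustrative sketch only: the endpoint URL and method name are placeholders.
    import xmlrpc.client

    SERVER_URL = "https://clarens.example.org:8443/clarens/"   # placeholder endpoint

    server = xmlrpc.client.ServerProxy(SERVER_URL)
    try:
        # Ask a (hypothetical) file-access service for a directory listing.
        listing = server.file.ls("/store/user/analysis")
        print(listing)
    except (OSError, xmlrpc.client.Error) as err:
        print("service call failed:", err)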

There are currently 20 known deployments of Clarens: Caltech (5), Florida (4), Fermilab (3), CERN (3), Pakistan (2+2), INFN (1).


Work has started to include Clarens (and in a later stage other GAE services) in the official CMS software distribution and the DPE distribution.

Figure 10: Snapshot of graphical front ends to web services within the Clarens framework.

5.2.2.3 Collaborative Analysis Versioning Environment Service (CAVES)

The Collaborative Analysis Versioning Environment System (CAVES) project concentrates on the interactions between users performing data- and/or computing-intensive analyses on large data sets, as encountered in many contemporary scientific disciplines. In modern science, increasingly larger groups of researchers collaborate on a given topic over extended periods of time. The logging and sharing of knowledge about how analyses are performed or how results are obtained is important throughout the lifetime of a project. This is where virtual data concepts play a major role. The ability to seamlessly log, exchange and reproduce results and the methods, algorithms and computer programs used in obtaining them enhances in a qualitative way the level of collaboration in a group or between groups in larger organizations. It makes it easier for newcomers to become productive almost from day one of their involvement, or for referees to audit a result and gain easy access to all the relevant details. Also, when scientists move on to new endeavours they can leave their expertise in a form easily utilizable by their colleagues. The same is true for archiving the knowledge accumulated in a project for reuse in future undertakings.

The CAVES project takes a pragmatic approach in assessing the needs of a community of scientists by building a series of prototypes with increasing sophistication. In extending the functionality of existing data analysis packages with virtual data capabilities, these prototypes provide an easy and familiar entry point for researchers to explore virtual data concepts in real-life applications and to provide valuable feedback for refining the system design. The architecture is modular, based on Web, Grid and other services which can be plugged in as desired. As a proof of principle we built a first system by extending the very popular data analysis framework ROOT, widely used in high energy physics and other fields, making it virtual data enabled.

5.2.2.4 Requirements and analysis for Distributed Heterogeneous Relational Data Warehouses under the CMS context

Progress in studying Distributed Heterogeneous Relational Data Warehouses (DHRD) for CMS continued in year four. In particular, functional requirements on DST and AOD data types deriving from the recent Data Challenge '04 were studied. One of the conclusions of this work is that a single schema for all DST and AOD data is desirable and conceivably achievable. This would require CMS applications to be aware of only one schema, enabling economical access to multiple heterogeneous databases. Finally, a framework of DHRD services is proposed which would provide a coherent system for CMS data analysis services and replication from Tier-0 to Tier-1 to Tier-2 centres.

5.3 Future Plans for CMS Activities in GriPhyN

The main goal of the GAE is to provide a distributed analysis environment for the CMS experiment in which users can perform (interactive/batch) analysis based on a transparent view of the available resources. Within the GAE project, Clarens web servers will act as the "backbone" for deployment and development of web services.

The first phase of the GAE project has focused on the design of a distributed analysis environment architecture, and on the creation and deployment of robust "basic" services (see section 5.2.2.2) that can be used to develop "higher level" services.

The second phase focuses on the integration of existing CMS software within the GAE. Part of this activity is deploying current GAE services and CMS software on the GAE testbed, to identify which CMS components can be exposed as web services. Several of these components are being made into web services: [POOL], [BOSS], [REFDB], [RUNJOB], [SPHINX]. Multiple instances of these web services are being deployed on multiple hosts in the current GAE testbed to provide a robust analysis environment (if one host fails, instances of a given service are available on other hosts). Within the second phase, work has also started to integrate monitoring information from MonALISA [MONALISA] into the Clarens environment. The monitoring information will be used in developing "high level" services.

A risk of integrating third party components (in our case, CMS-related components) is that not every application currently has a stable interface. As a result, from time to time existing web services need to be updated to reflect the latest versions of the third party software they interface to. The GAE team has identified web service updates as important but also as a risk (as it prevents team members from doing other tasks). It is therefore important to perform "education and outreach" to the developers of third party components the GAE team thinks are necessary within the GAE, and to convince those developers to provide a web service interface to their applications.

The third phase of the GAE project will focus on developing "high level" services. High level services can be described as services that take input from the "low level" passive services and proactively make decisions on where to perform an analysis job, when to replicate data, etc. Part of this phase is the development of an accounting service that allows for "fair" sharing of resources. The accounting service, together with other high level services, prevents scenarios in which one user allocates all resources for a very long time (without the organization's consent).

5.3.1 Application Level Integration

During all phases the GAE team conducts and investigates different types of analysis scenarios based on ROOT [ROOT] and ORCA [ORCA]. As a result, efforts will continue in year five to integrate the CMS analysis software environment into the GAE using a "bottom-up" approach. In particular, the benchmark data analyses which were developed in year four in a local environment will begin to migrate in year five to use GAE services for job submission, remote data access, and scientific collaboration. The work plan below represents the currently envisaged migration from local CMS data analysis to fully distributed CMS data analysis in a collaborative GAE environment:

1. Migrate Benchmark Analysis to GAE

1.1. Demonstrate interactive, remote data analysis capability
1.1.1. Distribute signal and background ROOT/AOD samples to different GAE sites
1.1.2. Access ROOT Tree AODs using the ROOT-Clarens client
1.1.3. Optimize selection of signal and rejection of background
1.1.4. Apply statistical tools for setting discovery potential limits

1.2. Demonstrate integration with the CMS Software Environment
1.2.1. Integrate ORCA and POOL into GAE
1.2.2. Access POOL AODs using ORCA dictionary libraries in the ROOT-Clarens client
1.2.3. Optimize selection of signal and rejection of background
1.2.4. Apply statistical tools for setting discovery potential limits
1.2.5. Perform systematic analysis of results

1.3. Demonstrate collaborative capability
1.3.1. Integrate analysis steps with CAVES and the Chimera Virtual Data Service
1.3.2. Divide the analysis between two people
1.3.3. Perform analysis demonstrating shared use of data
1.3.4. Perform analysis demonstrating shared use of code/libraries
1.3.5. Apply statistical tools for setting discovery potential limits
1.3.6. Perform systematic analysis of results

2. Scale up Analysis on GAE

2.1. Perform analysis on <1% of available CMS Monte Carlo samples
2.1.1. Deploy GAE on the testbed

2.2. Perform analysis on 10% of available CMS Monte Carlo samples
2.2.1. Deploy GAE on US-CMS Tier-1 and Tier-2 sites

2.3. Perform analysis on 50% of available CMS Monte Carlo samples
2.3.1. Deploy GAE on Grid3

2.4. Perform analysis on 100% of available (at the time) CMS Monte Carlo samples
2.4.1. Deploy GAE on Grid3 + LCG-N (where possible)

As seen above, the plan includes early deployment and testing of functionality on a small sample of available data, with rapid exponential increases in the scale of data access. This ramp-up, which will continue beyond year five, is necessary in order to serve the current and anticipated CMS user community needs between now and the start of data taking in 2007 (and beyond).

6 GriPhyN Activities in LIGO

LIGO's effort in FY04 leveraged the previous successes in FY03 in porting the LIGO pulsar analysis pipeline into the experimental Pegasus framework. The focus in FY04 was to evolve the pipeline realized with Pegasus into a production-quality analysis pipeline for use in analyzing the data from the LIGO S2 science run. In addition, LIGO continued to evolve and develop the Lightweight Data Replicator (LDR) for replicating LIGO data to analysis sites in the U.S., Europe, and beyond.

6.1 Building a Production Quality Analysis Pipeline for LIGO's Pulsar Search Using Metadata-driven Pegasus

As in FY03, the main LIGO application considered this year was the continuous wave or "pulsar search", which searches the LIGO data for gravitational waves emitted by the gravitational wave equivalent of the well-known electromagnetic pulsars. Search parameters include the position in the sky, the narrow-band frequency range of the expected signal, and the observing time. In particular, the GriPhyN-LIGO team focused on a "blind" search looking for unknown sources. This type of search requires analysis over a wide frequency band and large patches on the sky, and can be shown to be very CPU intensive and well suited for computation on the grid.

Ideally, LIGO scientists would like to submit millions of individual analysis jobs into a large grid and select all the necessary input data automatically using only analysis metadata such as sky position and frequency band, rather than having to manage the millions of jobs by locating and tracking each individual data input file by hand.

To support this type of analysis pipeline, the pulsar search is composed of several stages (each an application in itself). The first stage consists of extracting the gravitational wave channel from the raw data and includes some processing and cleaning of the raw data. Next, the time series data is cast into the frequency domain by Fourier transforming it into data products called Short Fourier Transforms (SFTs). Each SFT involves approximately one thousand seconds of LIGO data.

Since the gravitational wave pulsars are expected to be found in a small frequency range, the frequency interval of interest is extracted from the wide-band SFTs to create narrow-band SFTs containing only the frequency band of interest. These narrow-band SFTs are used to build the time-frequency image, which is analyzed for the presence of a pulsar signature. If a candidate signal with a good signal-to-noise ratio is found, it is placed in LIGO's candidate event database. Each of the stages in this analysis can be described uniquely by LIGO-specific metadata. These descriptions were made available to the workflow generation system, Pegasus, which translated them into abstract and/or concrete workflows.
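
As an illustration only, the sketch below shows the kind of LIGO-specific metadata that might describe one narrow-band search, in the spirit of the stage descriptions handed to Pegasus. The keys and values are placeholders, not the actual metadata schema.

    # Illustrative sketch only: placeholder keys and values, not the real schema.
    search_metadata = {
        "instrument": "H1",        # detector whose data is analysed
        "gps_start": 730000000,    # start of the observing interval
        "gps_end": 730100000,      # end of the observing interval
        "alpha": 4.65,             # sky position: right ascension (radians)
        "delta": -0.51,            # sky position: declination (radians)
        "f_start": 300.0,          # start of the narrow frequency band (Hz)
        "f_band": 1.0,             # width of the band to search (Hz)
    }

    # A workflow generator would translate such a description into the chain of
    # stages above: wide-band SFTs -> narrow-band SFTs -> time-frequency search.
    print(search_metadata)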

To support a production-quality analysis system, the LIGO Scientific Collaboration (LSC) developed and deployed the LSC DataGrid Server on LSC computing resources at the Albert Einstein Institute (AEI) in Potsdam, Germany, the University of Birmingham in the UK, Cardiff University in the UK, the California Institute of Technology, and the University of Wisconsin-Milwaukee (UWM). The LSC DataGrid Server is built on top of the GriPhyN-iVDGL Virtual Data Toolkit (VDT) and supplies common Grid services such as a Globus jobmanager and GridFTP server, along with LIGO-specific tools. Altogether, the LSC was able to make over 1400 CPUs available at the beginning of the effort.

In addition to the LSC resources, a number of groups around the U.S. granted access to resources for the time period just before and during the SC2003 meeting so that we could try to leverage as many CPUs as possible during SC2003. Other resources included the SuperMike cluster at Louisiana State University, the Condor pool at the University of Wisconsin-Madison, the collective resources available via the Grid3 project, and a smaller Condor pool at the Information Sciences Institute (ISI) at the University of Southern California. In total, for SC2003 the LIGO-GriPhyN effort had access to over 4,000 CPUs.


Figure 11: Resources for SuperComputing 2003 Demo

A Pegasus portal was developed from which users could create, execute and monitor jobs on the grid. A user logged on to the portal using DOE X.509 credentials stored in a MyProxy server. Each user DN has an associated profile in which the user can add/delete resource information (Pool Config), configure the properties for Pegasus, and provide information about transformations on the grid (Transformation Catalog).

The portal also checked whether the user could authenticate to the requested resources and only submitted jobs to those resources where authentication succeeded.


Figure 12: Portal User Profile Page

Two modes of pulsar searches are supported.

The first is a user-controlled search, in which the user inputs search metadata such as the alpha and delta parameters (the position in the sky), the start frequency and the frequency band for the search, the GPS start and end times for the data, the instrument whose data is to be used for the search, and the step parameter which controls the parallelism to be induced in the workflow.


Figure 13: Portal manual LIGO job submission page

This workflow consisted of 3 transformations:

1. Many extractSFT transformations, which took the raw data and extracted a small frequency band out of the huge data set (producing the "extractedSFTs").

2. The ComputeSFT transformation, which took all of the extractedSFTs and searched for gravitational waves at the user-given location and frequency.

3. The notification client, which stored the search results in a MySQL database and in the Metadata Catalog Service (MCS), and also displayed them on a 3D visualization client.

Figure 14: Image of the user request DAG.


Figure 15: Graph showing user request DAG execution over time.

Figure 16: Graph showing user request DAG execution by host over time.

An automated search capability was also added for all three instrument data sets (Hanford1, Hanford2 and Livingston1). This option continuously submitted workflows into the grid. Each workflow searched for gravitational waves on a grid of 5x5 points from an area near the galactic core and for frequencies from 150 to 350 Hz. A user could select a sleep time between each workflow. A Perl script read the parameters generated by a parameter generator and constructed the necessary VDL, which was saved in the Virtual Data Catalog. Chimera tools were then used to produce an abstract workflow, which was then given to Pegasus to construct the concrete DAG and schedule it on the LSC DataGrid.
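
As an illustration only, the sketch below re-renders that driver loop in Python (the production version was a Perl script). The grid of sky points, the schematic VDL text, and the command names are placeholders, not the actual scripts or tool interfaces.

    # Illustrative sketch only: placeholder commands, parameters, and VDL text.
    import subprocess
    import time

    SLEEP_SECONDS = 5        # user-selectable pause between workflows (kept short here)
    FREQ_START, FREQ_END = 150.0, 350.0

    def sky_grid(n=5):
        """Placeholder 5x5 grid of (alpha, delta) points near the galactic core."""
        return [(4.6 + 0.01 * i, -0.5 + 0.01 * j) for i in range(n) for j in range(n)]

    for alpha, delta in sky_grid():
        # 1. Emit a derivation for this sky point (schematic text, not real VDL).
        vdl = f"derive search(alpha={alpha}, delta={delta}, f0={FREQ_START}, f1={FREQ_END});"
        with open("search.vdl", "w") as fh:
            fh.write(vdl)

        # 2. Load it into the virtual data catalog, produce the abstract workflow
        #    with Chimera tools, then let Pegasus build and submit the concrete DAG.
        #    (Command names and arguments below are placeholders.)
        for cmd in (["load-vdl", "search.vdl"],
                    ["gendax", "search"],
                    ["pegasus-plan-and-submit", "search.dax"]):
            subprocess.run(cmd, check=False)

        time.sleep(SLEEP_SECONDS)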


Figure 17: Portal Autogenerated Workflow submission page.

This workflow consisted of only 2 transformations:

1. Many ComputeFstatistic jobs, which took the entire set of raw SFTs and searched for gravitational waves at the points in the grid.

2. Many notification jobs, which took the ComputeFstatistic results, saved them in the MCS and the MySQL database, and displayed them on the 3D visualization.

Figure 18: Autogenerated DAG


Figure 19: Graph showing autogenerated DAG execution over time

Figure 20: Graph showing autogenerated DAG execution by host over time.


Figure 21: 3D visualization client.

The portal also allowed users to see the status of the jobs on the grid, along with statistics such as start and finish times and status information (job finished/active/failed, etc.). Once the workflow finished successfully, two graphs were generated: jobs in the workflow over time, and jobs in the workflow by host over time.


Figure 22: Portal showing the job monitoring page.

Unfortunately, problems in the supporting Grid infrastructure and in the implementation of the analysis pipeline were discovered that prevented the system from meeting the throughput requirements of a production system for LIGO data analysis. In particular, it was discovered that the Globus jobmanager in the Globus Toolkit 2.4 could not scale sufficiently and that the I/O details of the analysis pipeline could not be planned with enough fine-grained control using Pegasus to overcome I/O issues.

The Globus jobmanager in the Globus Toolkit 2.4 failed to scale to support hundreds of jobs submitted at a site, whether running or waiting in the queue. At most a few tens of jobs could be submitted to a site, and even so the strain placed on the hardware on which the multiple jobmanager instances were running made the machine unusable for any other purpose.

Since SC2003 the scaling problem has been studied intensively by a number of groups and, while a more permanent solution is being created, the Condor group has supplied the Condor GridMonitor, which allows jobs that are queued but not yet running to avoid being monitored directly via the Globus jobmanager. In addition, the GridMonitor makes it possible not to have stdout and stderr for each job streamed back during job execution, but instead to wait until the job is finished.

The LIGO-GriPhyN group tested the Condor GridMonitor after SC2003 using the cluster at UWM and found that it was possible to have 296 jobs running and 700 jobs queued without any noticeable loss of performance on the machine on which the Globus jobmanager and Condor GridMonitor were running. This bodes well for work during FY05 to overcome the obstacles that prevented the implementation of a truly production-quality infrastructure for the pulsar analysis.

Work is also underway within the Pegasus group to allow Pegasus to provide finer-grained control of the I/O operations and data movement necessary during pipeline or workflow execution. For the pulsar pipeline to run efficiently, it is imperative that the narrow-band SFT files be located on a filesystem that is local to the machine on which the main analysis application will run; otherwise the entire analysis will be severely I/O limited. During SC2003, Pegasus could only include as part of its workflow planning the transfer of the narrow-band SFT files onto a shared filesystem, and with most (if not all) of the compute resources the group had access to, the shared filesystem could not provide the level of performance necessary to keep the analysis from being I/O bound.

During FY05, the Pegasus group intends to add functionality that will allow for this more fine-grained control over data movement, and will allow data files not just to be placed at a site on a shared filesystem but also to be distributed to other (usually local) filesystems. The GriPhyN-LIGO group is also taking another approach to solving the issue by enhancing the pipeline implementation so that the jobs that Pegasus plans, and which are submitted into each site, are so-called "cluster jobs". These cluster jobs will be realized as a Condor DAGMan DAG that includes jobs to distribute the narrow-band SFTs onto the local filesystems of the cluster.

It is important to note that although a production-quality Grid-enabled pipeline for the pulsar analysis was not achieved during FY04, the limitations that were exposed have proven to be a crucial part of the process of making the infrastructure more robust and allowing the effort to be continued in FY05. In addition, the full deployment of basic Grid services on LSC resources (which now include substantial resources at the LIGO observatory sites, MIT, and PSU) and the realization of the LSC DataGrid were accomplished in support of the FY04 goals. The necessary infrastructure pieces are now deployed within the LSC DataGrid to allow more applications and scientists to begin to leverage more LSC (and other Grid) resources and generate more scientific results.

6.2 Lightweight Data Replicator

During FY04, work continued to enhance and further deploy the Lightweight Data Replicator (LDR) around the LSC DataGrid.

The major enhancement during FY04 was the design, development, testing, and integration of a new metadata scheme for LDR. A new database table schema with better indexing and dramatically better scaling replaces the single monolithic metadata table in previous versions of LDR. The new schema was strongly influenced by the schema used in the Globus Metadata Catalog Service (MCS) currently under development at ISI, and the LIGO-GriPhyN group expects in the future to incorporate MCS into LDR.

Further metadata enhancements include a new client/server pair for propagating metadata, as it is published at one site, to other sites within the LSC DataGrid running LDR. The client and server use SOAP over HTTP to pass only the metadata updates made since the last exchange. These changes in metadata handling dramatically increase the performance of LDR and lower the load on the server hardware.
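
As an illustration only, the sketch below captures the incremental-update idea in Python, with simple in-memory structures standing in for the SOAP client/server pair and the metadata tables; the record format and function names are invented for this sketch.

    # Illustrative sketch only: invented record format and function names.
    from typing import Dict, List, Tuple

    # (timestamp, logical_file_name, attribute, value) records at the publishing site
    LOCAL_METADATA: List[Tuple[int, str, str, str]] = [
        (1001, "example-frame-0001.gwf", "frameType", "RDS"),   # placeholder entries
        (1002, "example-frame-0002.gwf", "frameType", "RDS"),
    ]

    def updates_since(last_seen: int) -> List[Tuple[int, str, str, str]]:
        """Server side: return only metadata published after the client's marker."""
        return [rec for rec in LOCAL_METADATA if rec[0] > last_seen]

    def sync(client_state: Dict[str, int]) -> None:
        """Client side: request only what is new, then advance the local marker."""
        for stamp, lfn, attr, value in updates_since(client_state.get("last_seen", 0)):
            # ...here the (lfn, attr, value) triple would be inserted into the
            # local metadata tables...
            client_state["last_seen"] = max(client_state.get("last_seen", 0), stamp)

    state: Dict[str, int] = {}
    sync(state)
    print(state)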

A second major enhancement was the design, development, implementation, and deployment of LDRdataFindServer, which allows users and their applications to query LDR using metadata to obtain the locations of data files. Two client applications have been developed and deployed for querying the server. The first is LSCdataFind, a simple command-line tool for generating lists of file locations (URLs). The second is framequery, a module that can be used with Matlab to locate and then read into Matlab the appropriate LIGO data.

LDR is now deployed at

Albert Einstein Institute in Potsdam, Germany

Australian National University

Cardiff University

California Institute of Technology (two installations)


LIGO Livingston Observatory (two installations)

LIGO Hanford Observatory (two installations)

Massachusetts Institute of Technology

Pennsylvania State University (two installations)

University of Wisconsin-Milwaukee

During FY04, LIGO used LDR to replicate LIGO reduced data sets to four of the above sites directly from the LIGO observatory sites at Livingston, LA, and Hanford, WA, in real time with an average six-hour latency.

The current LDR network within the LSC contains more than five million logical file names (LFNs) and well over 30 million physical file names (PFNs). LDR uses the Globus Replica Location Service (RLS) for cataloging LFNs and PFNs, and at this time LDR and the LSC DataGrid maintain the largest production installation of an RLS network. The LSC DataGrid and LDR use case has been an important contributor to RLS development and planning for FY05.

7 GriPhyN Activities in SDSS

Johns Hopkins University expanded the SDSS distributed cluster, adding 40 nodes to the existing cluster at Fermilab to support the Grid3 effort. An additional 20 nodes at JHU are dedicated to a distributed database experiment.

The SDSS project has three applications that benefit from grid computing: galaxy cluster finding, estimation of photometric redshifts, and the so-called "Southern Coadd Experiment." The COADD can reach objects 2 magnitudes fainter (z <= 0.7) than a normal SDSS object. This is accomplished by combining 10-20 nights of images of the same patch of the sky to create a single synthetic deep image. These objects with cosmologically interesting redshifts provide further insight for galaxy cluster analysis, as well as data for weak lensing analysis. By using Grid3, SDSS experiments create science data 5 times faster than by using only the local cluster. The nature of the experiment is such that the need for computation comes in "bursts"; that is, there are short periods of very high activity.

This period was marked by the development of new techniques applied to the galaxy cluster finding algorithm. These methods involve reducing the traffic created by moving large volumes of data around when it is more feasible to partition the data beforehand. We do this by employing a distributed database across a local cluster, and by moving some of the functions inside the database. When large flat files need to be transported to local clusters for processing, the overhead of data replication often approaches or even exceeds the resources needed for the computation alone. Under these circumstances, techniques that involve partitioning the data and moving some of the computation to the data have proven to be useful. The basic idea is that in a distributed database, several servers maintain separate tables that contain data belonging to different regions of the sky, with only a slight overlap between nodes. An efficient indexing strategy is implemented using the HTM (Hierarchical Triangular Mesh) and is directly exploited by the database engine. With the workload well balanced among several servers, a modified cluster finding algorithm can run about 20 times faster than previously.
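
For illustration only, the sketch below shows the partition-by-sky-region idea with sqlite3 standing in for one partition server; the table layout and HTM identifier ranges are invented for this sketch and are not the SDSS schema.

    # Illustrative sketch only: sqlite3 stands in for one partition server, and
    # the table layout and HTM ranges are invented for this example.
    import sqlite3

    partition = sqlite3.connect(":memory:")   # one server holds one region of sky
    partition.execute(
        "CREATE TABLE galaxies (obj_id INTEGER, htm_id INTEGER, ra REAL, dec REAL, z REAL)"
    )
    partition.executemany(
        "INSERT INTO galaxies VALUES (?, ?, ?, ?, ?)",
        [(1, 163840, 150.1, 2.2, 0.31),
         (2, 163841, 150.3, 2.4, 0.28),
         (3, 270000, 210.0, 5.0, 0.45)],   # outside this partition's nominal range
    )

    # The cluster-finding function runs next to the data: only objects whose HTM
    # index falls in the partition's range (plus a small overlap) are scanned.
    HTM_LO, HTM_HI = 163840, 163900
    rows = partition.execute(
        "SELECT obj_id, ra, dec, z FROM galaxies WHERE htm_id BETWEEN ? AND ?",
        (HTM_LO, HTM_HI),
    ).fetchall()
    print(rows)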

For the classical cluster finding technique we tried two modes of operation: the coarse-grained mode uses a few very large files, while the fine-grained mode uses a large number of small data files. We found that coarse-grained jobs completed successfully more often than fine-grained ones. A similar observation can be made about data transfers: a few large files reach their destination with greater success than many small files. This suggests that our Grid, as it stands now, does not lend itself to data-intensive jobs. Bulk mode, where many jobs are submitted at once, was limited by manual interventions, and sequential job submission was limited by the number of capable and available gatekeeper hosts. Ideally, bulk mode would only report back when the jobs were finished, but the disk-storage utilization went beyond the capabilities of many sites. Sequential job submission involves waiting for one job to finish before submitting another, but this removes the advantage of exploiting the natural parallelism that the Grid offers. As we have demonstrated, a scheme using distributed databases can yield better throughput and be more robust. We plan to apply this approach to other applications, such as the grid-ready version of the photometric redshift estimation calculations.
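
A minimal illustration of the coarse-grained strategy is to bundle many small inputs into a handful of large jobs before submission, as sketched below; the file names and chunk count are hypothetical, and the point is only the bundling step, not any particular submission tool.

    def bundle(files, jobs):
        """Split `files` into `jobs` roughly equal chunks, one chunk per grid job."""
        chunk = max(1, (len(files) + jobs - 1) // jobs)
        return [files[i:i + chunk] for i in range(0, len(files), chunk)]

    field_files = [f"field_{i:05d}.fits" for i in range(2000)]   # fine-grained inputs
    coarse_jobs = bundle(field_files, jobs=20)                   # 20 coarse-grained jobs
    print(len(coarse_jobs), "jobs,", len(coarse_jobs[0]), "files each")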

8 Education and Outreach
8.1 New UT Brownsville Beowulf System
M. Campanelli and students Santiago Pena, J. Zamora, and S. Morriss have been working on the design and construction of a new Beowulf cluster, ‘Funes’, for research in gravitational wave source simulations. The cluster was sponsored by NASA (80%) and iVDGL (20%). The new 40-node cluster is to be built by the end of summer 2003. It will be composed of 67 Pentium Xeon 3.8 GHz dual-processor nodes, with 8 gigabytes of RAM and a 120-gigabyte hard drive per node, connected by a Myrinet network.

8.2 VDT Software Installation and Grid Demos
During Fall 2003, UTB students J. Zamora and S. Morriss updated the VDT software installation on the UTB Beowulf cluster ‘Lobizón’. Lobizón is a 96-node Linux Beowulf cluster with 800 MHz Pentium III Intel processors, 512 MB of RAM, and 40 GB of storage per node; it was built at UT Brownsville with NSF support. Lobizón is currently used for gravitational wave data analysis and as a test-bed for grid computing software (Condor, Globus, etc.) and other (to-be-developed) GriPhyN virtual data toolkits. UTB participated in a live LIGO/GriPhyN grid testbed demonstration at the SuperComputing 2003 conference in Phoenix, Arizona.

8.3 Portal Development
Jose Zamora has implemented a Condor-based portal prototype, based on the Chimera-based portal developed by Yong Zhao and Mike Wilde (U. Chicago). The development of a visual method of submitting jobs to any application was motivated by growth in the number of users to include members with little knowledge of the subject or little time to deal with installation hassles. A web portal would therefore be valuable to UTB grid users. Its main benefit is mobility, taking advantage of the omnipresence of the World Wide Web: a user can log in from an ordinary browser and check on their work without needing a machine with the software installed or a specific operating system; the main requirement is a web browser. The portal will present all the necessary options to the user, so that no manual is required, and help and 'tips' links will provide users with examples.
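
As a rough illustration of what such a portal wraps for the user, the sketch below shows the two underlying operations: writing a Condor submit description and calling condor_submit, then polling condor_q for status. This is not the UTB or Chimera portal code; the executable, arguments, and file names are placeholders, and a real portal would expose these calls behind authenticated web pages.

    import subprocess
    import tempfile

    SUBMIT_TEMPLATE = """\
    universe   = vanilla
    executable = {exe}
    arguments  = {args}
    output     = job.out
    error      = job.err
    log        = job.log
    queue
    """

    def submit(exe, args):
        # Write a submit description file, then hand it to condor_submit.
        with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
            f.write(SUBMIT_TEMPLATE.format(exe=exe, args=args))
            path = f.name
        return subprocess.run(["condor_submit", path], capture_output=True, text=True)

    def status():
        # A portal status page would render the output of condor_q.
        return subprocess.run(["condor_q"], capture_output=True, text=True).stdout

    print(submit("/bin/echo", "hello grid").stdout)
    print(status())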

8.4 Grid Applications in Gravitational Wave Data Analysis
A UTB graduate student, Santiago Pena, currently under M. Campanelli's supervision, has been working on gravitational wave data analysis of signals from extreme mass-ratio inspirals, which are important potential sources for the future space-based detector LISA (Laser Interferometer Space Antenna), a NASA-ESA mission scheduled for launch around 2011. LISA will detect gravitational waves from a variety of sources. One of the important sources will be inspirals of compact objects (white dwarfs, neutron stars, or stellar-mass black holes) into supermassive black holes residing in the cores of galaxies. The parameter space for such capture events is enormous: each inspiral is characterized by 17 unknown parameters. Consequently, we will need a huge bank of templates (perhaps ~10^50!) to dig gravity wave signals out of the noise of the LISA detector. Since the parameter space is so large, one needs Monte Carlo search methods based on random sampling. Such a search over a large parameter space can be performed most efficiently with large numbers of jobs submitted to a production Grid.
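
The appeal of the Grid here is that random sampling of the template space is embarrassingly parallel: each batch of randomly drawn parameter points can be searched independently. The Python sketch below illustrates the idea with generic unit-interval parameters standing in for the 17 physical ones; the sample and job counts are arbitrary illustrative choices.

    import numpy as np

    N_PARAMS = 17         # parameters characterizing each capture event
    N_SAMPLES = 100_000   # templates drawn per run (illustrative; a real search needs far more)
    N_JOBS = 200          # independent grid jobs, one chunk of samples each

    rng = np.random.default_rng(seed=42)
    samples = rng.uniform(0.0, 1.0, size=(N_SAMPLES, N_PARAMS))

    # Split the sample bank into independent work units: the search over each
    # chunk needs no communication with the others, which is what makes it a
    # natural fit for bulk submission to a production Grid.
    chunks = np.array_split(samples, N_JOBS)
    print(len(chunks), "jobs,", chunks[0].shape[0], "templates each")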


8.5 Teacher Assessment Workshop
A Needs Assessment meeting and Developers' Workshop, hosted by Florida International University, took place in Miami on January 29–30, 2004 [NEEDS]. Highlights of the meeting included:

Educators, Computer Scientists and Scientists discussed design implications related to Educators' interest in classroom investigations using ‘real’ data.

A survey of educators' technology infrastructure and access indicated that 94% of schools have internet access and that more than 40% of these have sufficient capability to use Grid tools. Some educators are interested in contributing to developers' resources.

Representatives of current online resources (e.g., JLab, SDSS, LIGO, ATLAS, CMS, NALTA, PPPL, Fermilab and QuarkNet) gave a summary of the current status of their projects.

Suggested future activities include using the cosmic ray project website as a model for a data-driven, inquiry-based online resource.

8.6 Grid Summer Workshop
The Grid Summer Workshop took place at the University of Texas at Brownsville (UTB) facility on South Padre Island, June 21–25, 2004. The workshop, the first of its kind in the U.S., took the form of a tutorial teaching advanced Grid techniques and was targeted exclusively at an audience of advanced undergraduate and early graduate students.

The response to the workshop announcement was overwhelming. The target class size was ~35, but a large number of applications were received from all over the world. Selecting candidates from a pool of almost equally qualified and deserving students was a difficult task. The final list of candidates included 36 students from 19 universities across the U.S. plus Canada, Mexico, and Russia. The class was a well-balanced mix of students with physics and computer science backgrounds. There were 10 women in the class. Among the U.S. universities were three Minority Serving Institutions, and approximately one fourth of the students belonged to minority groups.

Mike Wilde from Argonne National Laboratory was the Curriculum Director for the course. The curriculum committee for the workshop consisted of scientists from the iVDGL and GriPhyN collaborations, including the Globus, Condor, VDT and other subgroups. Soma Mukherjee from the University of Texas at Brownsville chaired the Local Organizing Committee, which comprised scientific and administrative staff from the Department of Physics and the Center for Gravitational Wave Astronomy (CGWA) of UTB.

We have been receiving spontaneous feedback from the participants and lecturers, all of it very positive and encouraging. Suggestions for further improvements in logistics were made, for example, using a remote, managed cluster rather than installing a room full of machines. We believe that these and other ideas will enable us to run the workshop more smoothly next summer and allow additional workshops to be developed for other audiences, e.g., postdocs and other young scientists.

The workshop was a very encouraging and enriching academic experience for everyone. The students gained a great deal from this event: in addition to getting exposure to the world's leading experts in Grid computing, they received enough training to be able to work on Grid-enabled data analysis and other related projects in the future. The workshop also made a strong contribution toward building a community of future Grid developers and users.

From the education and outreach point of view, the workshop was a tremendous success. In a place like Brownsville, where 90% of the students are of Hispanic origin, such an event generated great interest and exposed the local students to the newly emerging field of distributed computing. It generated awareness among the local population through televised coverage of the event and helped boost the confidence of the local students. It also opened new channels of communication with experts and future career options for many of them.

A website on the First Grid Summer Workshop will soon be made public, where we intend to put up all lectures, exercises, etc., along with the workshop information, reports, photos, and other facts. A preliminary website for the course materials has been set up at http://www.mcs.anl.gov/~wilde/summer-grid.

The grants that supported this event are:

Grid Summer Workshop (NSF), Award Date: June 30 2004, Award ID: PHY-0437628, PI: Soma Mukherjee, Amount: $25,000

Center for Gravitational Wave Astronomy (NASA), PI: W. Anderson, M. Campanelli, M. Diaz, C. Lousto, J. Romano, Amount given: $22,000

GriPhyN (NSF), PI: P. Avery

iVDGL (NSF), PI: P. Avery

The depth of the organization of the Grid Summer Workshop is shown by the following list of contributions:

Curriculum Direction: Mike Wilde (Argonne National Laboratory)

Lecturers:
Paul Avery (University of Florida, Gainesville)
Gabrielle Allen (Louisiana State University, Baton Rouge)
Pradeep Padala (University of Florida)
Charles Bacon (Argonne National Laboratory)
James Frey (University of Wisconsin, Madison)
Mike Wilde (Argonne National Laboratory)

Curriculum development, exercises and material assistance:
Jorge Rodriguez (University of Florida, Gainesville)
Alain Roy (University of Wisconsin, Madison)
Scott Koranda (University of Wisconsin, Milwaukee)
Tom Garritano (Argonne National Laboratory)
Andrew Zahn (University of Chicago)
Robert Quick (Indiana University)
Jens Voeckler (University of Chicago)
Sean Morris (University of Texas at Brownsville)

Computer Staff:
Scott Gose (Argonne National Laboratory)
Charlie Torres (System Administrator, University of Texas at Brownsville)
Jose Zamora (University of Texas at Brownsville)
Ariel Martinez (University of Texas at Brownsville)
Alan Farrell (University of Texas at Brownsville)
Anthony Loken (University of Texas at Brownsville)
Dr. Fitraulla Kahn (University of Texas at Brownsville)

Administrative Staff:
Ms. Martha Casquette (Student Coordinator)
Ms. Danuta Mogilska (Site Coordinator)
Ms. Leslie Gomez
Ms. Rosie Flores
Ms. Olga Aguillon
Ms. Karla Allala

8.7 QuarkNet – GriPhyN Collaboration: Applying Virtual Data to Science Education
During this period, significant effort was devoted to applying GriPhyN virtual data tools to assist in the QuarkNet project's mission of providing hands-on learning experiences in high-school physics education. GriPhyN project members were actively engaged in the construction and deployment of a QuarkNet portal, based on the GriPhyN "Chiron" Virtual Data Portal, to enable students across the country to capture and upload data from QuarkNet cosmic ray detectors, annotate the data with metadata, and perform computations on these datasets that were expressed and developed in the Virtual Data Language (VDL). A few screen samples of this project are shown below:

Figure 23: QuarkNet “Detector Performance Study”


Figure 24: QuarkNet virtual data Workflow visualization for HEP training example.

8.8 A New Initiative: Understanding the Universe Education and Outreach (UUEO)
The continuing promise of Grid-based information technologies in providing scientific and engineering disciplines a powerful infrastructure for developing and deploying new learning strategies and education/outreach tools is becoming widely recognized. These considerations led to a summit meeting on education and outreach held in Arlington on April 8, 2004. The meeting, sponsored by the GriPhyN/iVDGL and QuarkNet outreach efforts, brought together over 40 education and outreach leaders and NSF officials to explore the idea of linking efforts within a common framework that we call "Understanding the Universe." UUEO (UU Education and Outreach) member projects are listed on the UUEO website [UUEO].

This meeting explored how education/outreach could benefit from the synergistic integration of three research frontiers:

Physical Science Research
Computer Science Research and Technology
Research on the Science of Learning and Teaching

Individuals and teams from the physical science, computer science and education communities came together to discuss the benefits of forming a consortium that could take advantage of connections among education and outreach efforts targeting both formal education (K-16) and informal outreach (museums and other forms of public outreach). Attendees noted that such a broad coalition would be well poised to develop and deploy advanced learning environments that take advantage of advanced information technologies such as web portals, Grid infrastructures, databases and high-speed networks, as well as powerful new Grid-based "virtual data" tools.

While many collaborative models are possible, preference is being given to a federated approach, in which the education, computer science and physical science communities maintain their separate identities and agendas while working together within a common framework. This approach holds great promise for promoting and supporting collaborative learning. By collaborating and using advanced information technologies, UUEO members can create an environment where scientists, researchers, educators and students can contribute knowledge, skills and resources and continue to expand the store of human knowledge.

The consortium will offer secondary teachers and students direct access to experimental and simulated data, together with analysis tools, to explore scientific questions. These data will come from sources ranging from large-scale accelerator and non-accelerator physics experiments, ground- and space-based astrophysics experiments, fusion research, and new theoretical and phenomenological models, to arrays of student-built detectors for studying cosmic ray air showers in high schools networked worldwide.

To carry out this program, we plan to build a new e-LAB, and we will carry out a number of specific actions over the next year:

1. Continue developing the QuarkNet – GriPhyN pilot project, including documentation

2. Hold needs assessment workshops to develop ideas in traditional and non-traditional areas of education

3. Develop an online portal, create new prototypes and acquire funds to support the consortium

4. Sustain the consortium through additional meetings and take advantage of new NSF-supported projects such as RSVP, VERITAS and ICECUBE.

8.9 Milestones for year 2004
With the final goal of developing an interactive GriPhyN web tutorial, UTB students will keep updating their documentation for the VDT client installation and the grid certificate procedure. They will also add animated examples illustrating the data flow, and demos showing the concept of a functional grid, to the web site.

Danuta Mogilska of UTB has been working closely with the Enlace project [ENLACE], whose primary goal is to create a community partnership among higher education institutions, local school districts, and local communities to support science education reform.

On September 11, 2003, the Enlace project helped organize the Rio Grande Valley Regional Science Fair "gear up" meeting. The GriPhyN/iVDGL project took this chance to introduce grid computing to this forum of high school and middle school science teachers. We discussed with the teachers the possibilities of future collaboration between them and UTB, and distributed the Data Grids Newsletter 2002. The majority of the 21 participating teachers were not familiar with grid computing, but most of them agreed that computer- and science-oriented students would be extremely interested in exploring this area more closely.

On September 27, 2003, Enlace hosted another event focusing on Science Fair promotion. It was a great opportunity for us to meet teachers, students and parents, and to show them how cyberinfrastructure can help in carrying out scientific experiments.

On February 18, 2004, the Office of New Student Relations at UTB extended an invitation to all academic departments to participate in the UTB/TSC Admissions and Financial Aid Fair. The fair was designed to provide increased access to enrollment services and important information about UTB/TSC to prospective students from the surrounding area. We took this chance to introduce grid computing to science- and computer-oriented prospective UTB students.

On August 13, 2004, an article about grid computing was published in L'espresso, an important Italian magazine, in which Dr. M. Campanelli reported on the goals of GriPhyN/iVDGL. Fabrizio Gagliardi (DataGrid) was also featured in that article.


9 Publications and Documents Resulting from GriPhyN Research
1. Computation Scheduling and Data Replication Algorithms for Data Grids, K. Ranganathan, I. Foster, in Grid Resource Management, Kluwer Publishing, 2003, GriPhyN 2004-68

2. The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition, I. Foster and C. Kesselman (Editors), ISBN 1558609334, Hardback, 700 pages, Morgan-Kaufmann, 2004, GriPhyN/iVDGL 2004-67

3. Configuration Monitoring Tool for Large-Scale Distributed Computing, Yujun Wu, Bockjoo Kim, to be published in Nuclear Instruments and Methods in Physics Research A, 2004, GriPhyN/iVDGL 2004-66

4. Explicit Control in a Batch Aware Distributed File System, John Bent, Douglas Thain, et al., Proceedings of the First USENIX/ACM Conference on Networked Systems Design and Implementation, San Francisco, CA, March 2004, GriPhyN/iVDGL 2004-65

5. Building Data Pipelines for High Performance Bulk Data Transfers in a Heterogeneous Grid Environment, Tevfik Kosar, George Kola and Miron Livny, Technical Report CS-TR-2003-1487, University of Wisconsin-Madison Computer Sciences, August 2003, GriPhyN/iVDGL 2004-64

6. A Framework for Self-optimising, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment, Tevfik Kosar, George Kola and Miron Livny, Proceedings of 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC2003), Ljubljana, Slovenia, October 2003, GriPhyN/iVDGL 2004-63

7. A Client-centric Grid Knowledgebase, George Kola, Tevfik Kosar and Miron Livny, To appear in Cluster 2004, San Diego, CA, September 2004, GriPhyN/iVDGL 2004-62

8. Stork: Making Data Placement a First Class Citizen in the Grid, Tevfik Kosar and Miron Livny, in Proceedings of the 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS2004), Tokyo, Japan, March 2004, GriPhyN/iVDGL 2004-61

9. Run-time Adaptation of Grid Data-placement Jobs, George Kola, Tevfik Kosar and Miron Livny, Proceedings of the Int. Workshop on Adaptive Grid Middleware (AGridM2003), New Orleans, LA, September 2003, GriPhyN/iVDGL 2004-60

10. A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication, George Kola, Tevfik Kosar and Miron Livny, The 14th ACM International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2004), Kinsale, Ireland, June 2004, GriPhyN/iVDGL 2004-59

11. Profiling Grid Data Transfer Protocols and Servers, George Kola, Tevfik Kosar and Miron Livny, To appear in Euro-Par, Pisa, Italy, 2004, GriPhyN/iVDGL 2004-58

12. Data Grid Management Systems, Moore, R.W., Jagatheesan, A., et al., Proceedings of the 21st IEEE/NASA Conference on Mass Storage Systems and Technologies (MSST), College Park, Maryland, USA, April 2004, GriPhyN/iVDGL 2004-57

13. Grid Data Management Systems & Services, Arun Jagatheesan, Reagan Moore, Norman W. Paton, Paul Watson, Proceedings of the VLDB (Very Large Data Bases conference, Berlin): 1150, 2003, GriPhyN/iVDGL 2004-56

14. Grid Data Management Systems & Services, Arun Jagatheesan, Reagan Moore, et al., VLDB 2003 (Very Large Data Bases conference, Berlin): 1150, December 2003, GriPhyN/iVDGL 2004-55


15. End-to-end Quality of Service for High-end Applications, I. Foster, M. Fidler, et al., Computer Communications 27(14):1375-1388, 2004, GriPhyN/iVDGL 2004-54

16. Concepts and Architecture, I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition, Morgan-Kaufmann, 2004, GriPhyN/iVDGL 2004-53

17. Resource and Service Management, K. Czajkowski, I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition, Morgan-Kaufmann, 2004, GriPhyN/iVDGL 2004-52

18. Data Integration in a Bandwidth-Rich World, I. Foster, R. Grossman, Comm. ACM, 46(11):51-57, 2003, GriPhyN/iVDGL 2004-51

19. Simulation Studies of Computation and Data Scheduling Algorithms for Data Grids, K. Ranganathan, I. Foster, Journal of Grid Computing, 1(1), 2003, GriPhyN/iVDGL 2004-50

20. The Grid in a Nutshell, I. Foster, C. Kesselman, Grid Resource Management, Kluwer Publishing, 2003, GriPhyN/iVDGL 2004-49

21. Grid Service Level Agreements, K. Czajkowski, I. Foster, et al., Grid Resource Management, Kluwer Publishing, 2003, GriPhyN/iVDGL 2004-48

22. A Peer-to-Peer Approach to Resource Location in Grid Environments, A. Iamnitchi, I. Foster, Grid Resource Management, Kluwer Publishing, 2003, GriPhyN/iVDGL 2004-47

23. Incentive Mechanisms for Large Collaborative Resource Sharing, Kavitha Ranganathan, Matei Ripeanu, et. al., The International Symposium on Cluster Computing and the Grid (CCGrid-04), 2004, GriPhyN/iVDGL 2004-46

24. Connecting Client Objectives with Resource Capabilities, A. Dan, C. Dumitrescu, M. Ripeanu, Submitted to ICSOC2004, 2004, GriPhyN/iVDGL 2004-45

25. Chiron: A Grid Portal for Virtual Data Discovery and Integration, Yong Zhao, Mike Wilde, Ian Foster, Unpublished, GriPhyN/iVDGL 2004-44

26. To Share or not Share: An Analysis of Incentives to Contribute in Collaborative File Sharing Environments, Kavitha Ranganathan, Matei Ripeanu, Ankur Sarin, Ian Foster, Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, June 2003, GriPhyN/iVDGL 2004-43

27. Dynamic Creation and Management of Runtime Environments in the Grid, Kate Keahey, Matei Ripeanu, Karl Doering, Workshop on Designing and Building Grid Services, GGF-9, Chicago, IL, October 2003, GriPhyN/iVDGL 2004-42

28. Cache Replacement Policies Revisited: The Case of P2P Traffic, Adam Wierzbicki, Nathaniel Leibowitz, et al., 4th GP2P Workshop, Chicago, IL, April 2004, GriPhyN/iVDGL 2004-41

29. Deconstructing the Kazaa Network, Nathaniel Leibowitz, Matei Ripeanu, Adam Wierzbicki, The 3rd IEEE Workshop on Internet Applications (WIAPP'03), San Jose, CA, June 2003, GriPhyN/iVDGL 2004-40

30. Small-World File-Sharing Communities, Adriana Iamnitchi, Matei Ripeanu, Ian Foster, INFOCOM 2004, Hong Kong, March 2004, GriPhyN/iVDGL 2004-39

31. Fault-tolerant Grid Services Using Primary-Backup: Feasibility and Performance, Xianan Zhang, Dmitrii Zagorodnov, et al., to be published at the 2004 IEEE International Conference on Cluster Computing, San Diego, September 2004, GriPhyN/iVDGL 2004-38

32. A Metadata Catalog Service for Data Intensive Applications, Gurmeet Singh, Shishir Bharathi, et al., SuperComputing 2003, Phoenix, Arizona, November 2003, GriPhyN/iVDGL 2004-37


33. Grid-Based Metadata Services, Ewa Deelman, Gurmeet Singh, et al., SSDBM04, Santorini, Greece, June 2004, GriPhyN/iVDGL 2004-36

34. Artemis: Integrating Scientific Data on the Grid, Rattapoom Tuchinda, Snehal Thakkar, et. al., Sixteenth Innovative Applications of Artificial Intelligence, San Jose, CA, July 2004, GriPhyN/iVDGL 2004-35

35. Pegasus and the Pulsar Search: From Metadata to Execution on the Grid, Ewa Deelman, James Blythe, et al., Applications Grid Workshop, PPAM 2003, Czestochowa, Poland, September 2003, GriPhyN/iVDGL 2004-34

36. Pegasus: Mapping Scientific Workflows onto the Grid, Ewa Deelman, James Blythe, et al., Across Grids Conference 2004, Nicosia, Cyprus, January 2004, GriPhyN/iVDGL 2004-33

37. Artificial Intelligence and Grids: Workflow Planning and Beyond, Yolanda Gil, Ewa Deelman, et. al., IEEE Intelligent Systems, January 2004, GriPhyN/iVDGL 2004-32

38. VO-Centric Ganglia Simulator, Catalin L. Dumitrescu, Ian Foster, Unpublished, GriPhyN/iVDGL 2004-31

39. Usage Policy-based CPU Sharing in Virtual Organizations, Catalin Dumitrescu, Ian Foster, Submitted to GridWorkshop'04, GriPhyN/iVDGL 2004-30

40. Explicit Control in a Batch-Aware Distributed File System, John Bent, Douglas Thain, et. al., Published in NSDI '04, March 2004, GriPhyN/iVDGL 2004-29

41. GridDB: A Data-Centric Overlay for Scientific Grids, David T. Liu and Michael J. Franklin, Proceedings of the International Conference on Very Large Data Bases (VLDB), Toronto, Canada, August, 2004, GriPhyN/iVDGL 2004-28

42. Data Grid and Gridflow Management Systems, Arun Jagatheesan, 2004 IEEE International Conference on Web Services, July 2004, GriPhyN/iVDGL 2004-27

43. Gridflow Description, Query, and Execution at SCEC using the SDSC Matrix, Jonathan Weinberg, Arun Jagatheesan, et al., Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC), Honolulu, Hawaii, June 2004, GriPhyN/iVDGL 2004-26

44. Using Performance Prediction to Allocate Grid Resources, Seung-Hye Jang, Xingfu Wu, et al., Unpublished, June 2004, GriPhyN/iVDGL 2004-25

45. Overall Architecture and Control Flow, Ewa Deelman, Ian Foster, et al., Unpublished, GriPhyN/iVDGL 2004-24

46. Types of Editors and Specifications, James Blythe, Richard Cavanaugh, et. al., Unpublished, GriPhyN/iVDGL 2004-23

47. Grid Computing in High Energy Physics, Paul Avery, Proceedings of the International Beauty 2003 Conference, Carnegie Mellon University, Pittsburgh, PA Oct. 14-20 2003, February 2004, GriPhyN/iVDGL 2004-20

48. Data Grids, I. Foster, Databasing the Brain, Wiley, Unpublished, July 2004, GriPhyN 2004-18

49. The CAVES Project - Exploring Virtual Data Concepts for Data Analysis, Dimitri Bourilkov, Unpublished, GriPhyN/iVDGL 2004-3

50. Policy Based Scheduling for Simple Quality of Service in Grid Computing, Jang-uk In, Paul Avery et. al., International Parallel & Distributed Processing Symposium (IPDPS), Santa Fe, New Mexico, April 2004 , GriPhyN/iVDGL 2003-32


51. Policy-based Resource Allocation for Virtual Organizations, Catalin Dumitrescu, Ian Foster, et. al., Unpublished, GriPhyN/iVDGL 2003-29

52. Virtual Data in CMS Productions, A. Arbree, R. Cavanaugh, et al., in Proceedings of Computing in High Energy and Nuclear Physics 2003, 24-28 March 2003, La Jolla, California, May 2003, GriPhyN/iVDGL 2003-26

53. The WorldGrid Transatlantic Testbed, F. Donno et al., Proceedings of Computing in High Energy and Nuclear Physics, 24-28 March 2003, La Jolla, California, March 2003, GriPhyN/iVDGL 2003-25

54. The CMS Integration Grid Testbed, The USCMSSC Group, Caltech, FNAL, UCSanDiego, UFlorida, Wisconsin-Madison, Proceedings of Computing in High Energy and Nuclear Physics, 24-28 March 2003, La Jolla, California, March 2003, GriPhyN/iVDGL 2003-24

55. A GLUE Meta-Packaging Proposal for Grid Middleware and Applications, Olof Barring, Germán Cancio, et al., Unpublished, GriPhyN/iVDGL 2003-23

56. The WorldGrid Design, Robert Gardner, Leigh Grundhoefer, et. al., Unpublished, GriPhyN/iVDGL 2003-22

57. GriPhyN Annual Report - 2002 through 2003, GriPhyN project team, M. Wilde, Editor, Unpublished, July 2003, GriPhyN/iVDGL 2003-21

10 References
[BOSS] Grandi, C., Renzi, A., "Object Based System for Batch Job Submission and Monitoring (BOSS)", CMS note 2003/005

[BPEL] "Business Process Execution Language (BPEL) for Web Services Version 1.0," BEA, IBM and Microsoft, August 2002

[CHIMERA] The Chimera Virtual Data System web page, http://www.griphyn.org/chimera June, 2004

[CLARENS] Steenberg, C., Aslakson, E., Bunn, J., Newman, H., Thomas, M., Van Lingen, F., “Clarens Client Server Applications”, Computing for High Energy Physics, La Jolla, California, 2003

[COBRA] Coherent Object-oriented Base for Reconstruction, Analysis and simulation (COBRA) web page, http://cobra.web.cern.ch/cobra June, 2004

[CONDOR] The Condor Project Homepage, http://www.cs.wisc.edu/condor June, 2004

[CONG] J. Frey, T. Tannenbaum, I. Foster, M. Livny, and S. Tuecke, "Condor-G: A Computation Management Agent for Multi-Institutional Grids", Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC 10), San Francisco, California, August 7-9, 2001.

[CORC] "Connecting Client Objectives with Resource Capabilities: An Essential Component in Grid Service Management Infrastructure", Asit Dan (IBM), Catalin Dumitrescu (UChicago), Matei Ripeanu (UChicago) - ICSOC2k4 submission. GriPhyN Technical Report 2004-45.

[ENLACE] ENLACE home page, http://www.phys.utb.edu/enlace/.

[GAE] Bunn, J., Bourilkov, D., Cavanaugh, R., Legrand, I., Muhammad, A., Newman, H., Singh, S., Steenberg, C., Thomas, M., Van Lingen, F., "A Grid Analysis Environment Service Architecture", http://ultralight.caltech.edu/gaeweb/gae_services.pdf
[QOS] In, J., Avery, P., Cavanaugh, R., Ranka, S., "Policy based scheduling for simple quality of service in grid," Proceedings of the International Parallel & Distributed Processing Symposium (IPDPS), Santa Fe, NM, April 2004

[GLOBUS] The Globus Alliance web page, http://www.globus.org June, 2004


[GRID3] Grid3 web page, http://www.ivdgl.org/grid3 June, 2004

[IGUANA] Interactive Graphics For User ANAlysis (IGUANA) web page, http://iguana.web.cern.ch/iguana June, 2004

[Liu04] Liu, David T., The Design of GridDB, a Data-Centric Overlay for Grid Computing, M.S. Report, Computer Science Division, University of California, Berkeley, April 2004.

[MCRUN] Graham, G., Bertram, I., Evans, D., "MCRunJob: An HEP Workflow Planner for Grid Production Processing," Computing in High Energy Physics, La Jolla, 2003

[MONALISA] Legrand, I., Newman, H., Galvez, P., Voicu, E., Cirstoiu, C., “MonaLISA: A Distributed Monitoring Service Architecture”, Computing for High Energy Physics, La Jolla, California, 2003

[MOP] “MOP,” http://www.uscms.org/s&c/MOP June, 2004

[NEEDS] Needs Assessment Workshop at FIU, http://www-ed.fnal.gov/grid/.

[ORCA] CMS Software Group and David Stickland, “CMS Reconstruction Software: The ORCA Project” CMS note 1999/035

[PEGS] E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, S. Patil, M. Su, K. Vahi and M. Livny , “Pegasus: Mapping scientific workflows onto the grid,” Across Grids Conference 2004, Nicosia, Cyprus

[POOL] Duellman, D., "POOL Project Overview", Computing for High Energy Physics, La Jolla, California, 2003

[PPDG] Particle Physics Data Grid (PPDG) web page, http://www.ppdg.net June, 2004

[PRED] Seung-Hye Jang, Xingfu Wu, Valerie Taylor, Gaurang Mehta, Karan Vahi, Ewa Deelman, "Using Performance Prediction to Allocate Grid Resources", GriPhyN Technical Report No. 2004-25.

[REFDB1] Lefebure, V., Andreeva, J. "RefDB." http://cmsdoc.cern.ch/documents/02/in02_044.pdf , CMS IN 2002/044, November 2002.

[REFDB2] Lefebure, V., Andreeva, J., "RefDB: The Reference Database for CMS Monte Carlo Production," Computing in High Energy Physics, La Jolla, 2003 [hep-ex/0305092]

[RLS] The Replica Location Services web page, http://www.globus.org/rls June, 2004

[ROOT] ROOT: An Object-Oriented Data Analysis Framework web page, http://root.cern.ch June 2004

[SPHINX] SPHINX: Grid Scheduling Middleware web page, http://www.griphyn.org/sphinx June, 2004

[STEEN] Steenberg, C., Aslakson, E., Bunn, J., Newman, H., Thomas, M., Van Lingen, F., “The Clarens Web Service Architecture”, Computing for High Energy Physics, La Jolla, California, 2003

[SCEC] Weinberg, J., Jagatheesan, A., Ding, A., Fareman, M. and Hu, Y., “Gridflow Description, Query, and Execution at SCEC using the SDSC Matrix, ” Proceedings of the 13th IEEE International Symposium on High-Performance Distributed Computing (HPDC), June 4-6, 2004, Honolulu, Hawaii, USA.

[SPG04] Liu, David T. and Franklin, Michael J., GridDB: Data-Centric Services in Scientific Grids, Semantics in Peer-to-Peer and Grid Computing 2004, held in conjunction with WWW'04, New York, NY, May 2004.

[SWFS] Small-World File-Sharing Communities, Adriana Iamnitchi, Matei Ripeanu and Ian Foster, INFOCOM 2004, March 2004, Hong Kong. http://arxiv.org/abs/cs.DC/0307036

[SWUP] On Exploiting Small-World Usage Patterns in File-Sharing Communities, Adriana Iamnitchi and Ian Foster, submitted for publication. http://www.cs.duke.edu/~anda/papers/flask_submission.pdf


[UPRS] "Usage Policy-based Resource Scheduling in VOs", Catalin Dumitrescu, (UChicago), Ian Foster (UChicago/Argonne) - GridWorkshop2k4 submission, GriPhyN Technical Report 2004-30.

[UUEO] UUEO web page, http://www-ed.fnal.gov/uueo/.

[VDCMS] Arbree, A., Avery, P., Bourilkov, D., Cavanaugh, R., Rodriguez, J., Graham, G., Wilde, M., Zhao, Y., "Virtual Data in CMS Analysis" and "Virtual Data in CMS Production", TUAT010 & TUAT011, Computing in High Energy Physics, La Jolla, June 2003

[VDL] Foster, I., Voeckler, J., Wilde, M. and Zhao, Y., “Chimera: A Virtual Data System for Represent-ing, Querying, and Automating Data Derivation”. In Scientific and Statistical Database Management, (2002).

[VLDB04] Liu, David T. and Franklin, Michael J., GridDB: A Data-Centric Overlay for Scientific Grids, Proceedings of the International Conference on Very Large Data Bases (VLDB’04), Toronto, August 2004 (to appear). (Also published as UC Berkeley Computer Science Technical Report TR CSD-04-1311, March 2004)

[VDT] The Virtual Data Toolkit web page, http://www.cs.wisc.edu/vdt June, 2004
