VDT 1 The Virtual Data Toolkit Todd Tannenbaum (Alain Roy)


VDT 1

The Virtual Data Toolkit

Todd Tannenbaum (Alain Roy)

VDT 2

What is the VDT?

• A packaging of software
  – Grid software (Globus, Condor-G…)
  – Virtual data software (Chimera)
  – Utilities
• An easy installation mechanism
• Testing and hardening
• Support

VDT 3

Who makes the VDT?

• Grid Physics Network (GriPhyN)
  – Constructs the VDT
• International Virtual Data Grid Laboratory (iVDGL)
  – Testing and hardening

Very tight collaboration between GriPhyN and iVDGL

VDT 4

Who makes the VDT? (2)

• Core VDT Team:
  – Miron Livny: The boss
  – Alain Roy
  – Carey Kireyev
• VDT Testing
  – Xin Zhao
  – Brian Moe
• Pacman
  – Saul Youssef

VDT 5

Who uses the VDT?

• GriPhyN collaborators
  – USCMS: In use today
  – USAtlas: In use today
  – LIGO: Will use soon
  – SDSS: Will use soon
• European Data Grid
  – Uses subset of software
  – Uses just RPMs
• LCG

VDT 6

What exactly is in VDT?

• VDT 1.1.8:
  – Globus 2.2.4 + advisories + patches
  – Condor & Condor-G 6.5.1
  – Chimera/Pegasus
  – RLS
  – GLUE Schema
  – CA Certificates
  – Fault Tolerant Shell
  – EDG’s Make Gridmap
  – EDG’s CRL Update
  – ClassAds
  – Netlogger

VDT 8

Grid Software Installation

Typical Grid Software Installation Experience…

VDT Installation Experience!

VDT 9

VDT Installation

• 2 Methods
  – Pacman
  – RPM

VDT 10

Pacman Installation

• Goal:
  – Type a single command
  – Everything downloads
  – Everything installs
  – Everything is configured
  – No questions asked
• We’re close:
  – A few questions if you’re root
  – Basic configuration, may need changing

VDT 11

Pacman Installation (2)

• Download Pacman
  – http://physics.bu.edu/~youssef/pacman/
• Install VDT
  – cd <install-directory>
  – pacman -get VDT-Server
  – pacman -get VDT-Client
  – ls
      condor/  globus/  post-install/  setup.sh
      edg/     gpt/     replica/       vdt/
      ftsh/    perl/    setup.csh      vdt-install.log

• Use
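The directory listing above includes setup.sh and setup.csh. A minimal sketch of loading the environment from a Bourne-compatible shell follows, assuming (as is conventional for such scripts) that setup.sh is meant to be sourced rather than executed. The helper name load_vdt_env and the example path are hypothetical, not part of the VDT.

```shell
#!/bin/sh
# Hypothetical helper: source setup.sh from a VDT install directory.
# Assumption: setup.sh only sets environment variables for the current shell.
load_vdt_env() {
    vdt_dir=$1
    if [ ! -r "$vdt_dir/setup.sh" ]; then
        echo "no readable setup.sh in $vdt_dir" >&2
        return 1
    fi
    # Source (not execute) so that exported variables persist in this shell.
    . "$vdt_dir/setup.sh"
}

# Example use (the path is a placeholder):
#   load_vdt_env /opt/vdt && echo "VDT environment loaded"
```

csh and tcsh users would presumably use setup.csh in the same way.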

VDT 12

Pacman post-installation

• Post-install directory:
  – Notes on configuration choices made
  – Instructions for editing configuration
• Configuration scripts:
  – Globus configuration
  – Condor configuration

VDT 13

RPM Installation

• Subset of whole VDT
  – Globus
  – Condor-G
• Nice RPMs:
  – We repackage Globus
  – A dozen Globus RPMs, not hundreds
• No configuration
• No post-installation help

VDT 14

Testing

• VDT team is building test suite
• Interaction with LCG testing group
• Working with NMI* to leverage:
  – NMI test suite
  – NMI test infrastructure
• Stress testing
• Application testing (CMS pipeline)

* NMI = NSF Middleware Initiative
  – http://www.nsf-middleware.org

VDT 15

Support

• Send us questions or problems
  – We will solve them if we can
  – We will interact with the developers, if necessary

VDT 16

Interaction with EDG

• EDG gets Globus and Condor-G RPMs from VDT

• We do what we can to solve problems and get changes to Globus and Condor

• We want to make a great package for you

VDT 17

What exactly is in VDT?

• VDT 1.1.8:
  – Globus 2.2.4 + advisories + patches
  – Condor & Condor-G 6.5.1
  – Chimera/Pegasus
  – RLS
  – GLUE Schema
  – CA Certificates
  – Fault Tolerant Shell
  – EDG’s Make Gridmap
  – EDG’s CRL Update
  – ClassAds
  – Netlogger

VDT 18

Chimera Virtual Data System

• Much scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures

• Chimera catalog can be used by application environments to describe a set of application programs ("transformations"), and then track all the data files produced by executing those applications ("derivations").

• Chimera contains the mechanism to locate the "recipe" to produce a given logical file, in the form of an abstract program execution graph. These abstract graphs are then turned into an executable DAG for the Condor-G DAGMan meta-scheduler by the bundled Pegasus planner.

• Enables on-demand execution of computation schedules constructed from database queries.
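The idea of deriving an execution order from recorded dependencies can be illustrated without Chimera itself. The sketch below is not Chimera's catalog format or the Pegasus planner; it only feeds a hypothetical list of "input output" pairs to the standard tsort utility, where each pair says that the output file is derived from the input file. All file names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical derivation records, one "<input> <output>" pair per line.
derivations() {
    cat <<'EOF'
raw.dat calibrated.dat
calibrated.dat histogram.dat
EOF
}

# tsort prints the files so that every input comes before
# anything derived from it, i.e. a valid order of execution.
plan_order() {
    derivations | tsort
}

plan_order
```

A real planner also has to decide where each step runs and which files already exist; this sketch covers only the ordering step.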

VDT 19

NetLogger

• “Networked Application Logger”
• API w/ calls you add to existing source code to generate time-stamped monitoring events (sent to a file, network server, syslogd, or RAM)
• Visualization Tools
• Storage and Retrieval Tools
  – Store all events into a database
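To illustrate the instrumentation idea, here is a minimal shell sketch that appends time-stamped events to a file. It is not the NetLogger API: the function name log_event, the key=value layout, and the NL_DEMO_LOG variable are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical stand-in for an instrumentation call; not the NetLogger API.
NL_DEMO_LOG=${NL_DEMO_LOG:-./events.log}

log_event() {
    event=$1
    shift
    # UTC timestamp, host name, event name, then any extra key=value fields.
    printf 'DATE=%s HOST=%s EVNT=%s %s\n' \
        "$(date -u +%Y%m%d%H%M%S)" "$(uname -n)" "$event" "$*" >> "$NL_DEMO_LOG"
}

# Bracket an operation with start and end events.
log_event copy.start FILE=data
: # the operation being monitored would run here
log_event copy.end FILE=data STATUS=0
```

Pairs of start and end events like these are what make it possible to reconstruct how long each stage took once the log is collected.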

VDT 20

Fault Tolerant Shell (FTSH)

• The Grid is a hard environment.
• FTSH
  – The ease of scripting with very precise error semantics.
  – Exception-like structure allows scripts to be both succinct and safe.
  – A focus on timed repetition simplifies the most common form of recovery in a distributed system.
  – A carefully-vetted set of language features limits the "surprises" that haunt system programmers.

VDT 21

Simple Bourne script…

#!/bin/sh
cd /work/foo
rm -rf data
cp -r /fresh/data .

What if ‘/work/foo’ is unavailable??
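The danger can be shown safely in a scratch directory. In the sketch below (all paths are temporary and hypothetical), a script of the same shape is run with a work directory that does not exist: cd fails, the script keeps going, and rm -rf data deletes the data directory of whatever directory the script was started in.

```shell
#!/bin/sh
# Demonstration in a throwaway directory; nothing outside $scratch is touched.
scratch=$(mktemp -d)
mkdir "$scratch/data"     # stands in for data we did NOT mean to delete

# Same shape as the simple script, but the work directory is missing.
cat > "$scratch/naive.sh" <<'EOF'
cd ./no-such-dir
rm -rf data
EOF

# Run it from $scratch; the failed cd is silently ignored by the script.
(cd "$scratch" && sh ./naive.sh 2>/dev/null)

if [ -d "$scratch/data" ]; then
    result="data survived"
else
    result="data was deleted from the wrong directory"
fi
echo "$result"
rm -rf "$scratch"
```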

VDT 22

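Getting Grid Ready…

#!/bin/sh
# Try to enter the work directory up to three times.
ok=0
for attempt in 1 2 3
do
    if cd /work/foo
    then
        ok=1
        break
    fi
    echo "cd failed, trying again..."
    sleep 5
done

# Give up if none of the attempts succeeded.
if [ "$ok" -ne 1 ]
then
    echo "couldn't cd, giving up..."
    exit 1
fi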

VDT 23

Or with FTSH

#!/usr/bin/ftsh
try 5 times
    cd /work/foo
    rm -rf bar
    cp -r /fresh/data .
end

VDT 24

Or with FTSH

#!/usr/bin/ftsh
try for 3 days or 100 times
    cd /work/foo
    rm -rf bar
    cp -r /fresh/data .
end

VDT 25

Or with FTSH

#!/usr/bin/ftsh
try for 3 days every 1 hour
    cd /work/foo
    rm -rf bar
    cp -r /fresh/data .
end
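For readers without FTSH, the try forms shown so far (a count limit, a time limit, and a retry interval) can be approximated in plain Bourne shell. This is a rough sketch, not how FTSH is implemented. The function name try_for and its argument order are invented here, it relies on date +%s (widely available, though not strictly POSIX), and unlike FTSH it does not cancel a command that is still running when the time budget runs out.

```shell
#!/bin/sh
# try_for MAX_SECONDS MAX_ATTEMPTS INTERVAL_SECONDS command [args...]
# Re-runs the command until it succeeds, the attempt limit is reached,
# or another wait would exceed the time budget.
# Returns 0 on success, 1 otherwise.
try_for() {
    max_seconds=$1 max_attempts=$2 interval=$3
    shift 3
    start=$(date +%s)
    attempt=1
    while :; do
        "$@" && return 0
        now=$(date +%s)
        [ "$attempt" -ge "$max_attempts" ] && return 1
        [ $((now - start + interval)) -gt "$max_seconds" ] && return 1
        attempt=$((attempt + 1))
        sleep "$interval"
    done
}

# Example use: wrap the three steps in a function so they retry as a unit.
#   refresh() { cd /work/foo && rm -rf bar && cp -r /fresh/data .; }
#   try_for 259200 100 3600 refresh    # 3 days, 100 attempts, hourly
```

Even this short helper needs care around exit codes and clock arithmetic, which is the bookkeeping the try construct hides.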

VDT 28

Or with FTSH

hosts="mirror1.wisc.edu mirror2.wisc.edu mirror3.wisc.edu"

forany h in ${hosts}
    echo "Attempting host ${h}"
    wget http://${h}/some-file
end

echo "Got file from ${h}"
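The same "first one that works" pattern can be approximated in plain Bourne shell. This is a sketch rather than FTSH's implementation; the function name first_working and the variable chosen are hypothetical, and the command receives each candidate as its only argument.

```shell
#!/bin/sh
# first_working COMMAND ITEM...
# Runs COMMAND ITEM for each item in turn, stops at the first success,
# and leaves the winning item in $chosen. Returns 1 if every item fails.
first_working() {
    cmd=$1
    shift
    for item in "$@"; do
        if "$cmd" "$item"; then
            chosen=$item
            return 0
        fi
    done
    return 1
}

# Example use with the hosts above (fetch is a hypothetical wrapper):
#   fetch() { wget "http://$1/some-file"; }
#   first_working fetch mirror1.wisc.edu mirror2.wisc.edu mirror3.wisc.edu \
#       && echo "Got file from $chosen"
```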

VDT 29

FTSH

• All the usual constructs
  – Redirection, loops, conditionals, functions, expressions, nesting, …
• And more
  – Logging
  – Timeouts
  – Process Cancellation
  – Complete parsing at startup
  – File cleanup
• Used on Linux, Solaris, Irix, Cygwin, …
• Simplify your life!

VDT 30

VDT’s Future

• Additional Software
  – MyProxy, Java ClassAds
• Access to new versions
  – Globus 3.0
    • Extra VDT to help early adopters
    • Condor-G will submit to GT2 or GT3
• Helping You
  – What can we do to make life easier for you?

VDT 31

Where do you learn more?

• http://www.griphyn.org/vdt
• Support:
  – [email protected]
  – Alain Roy: [email protected]
  – Miron Livny: [email protected]