www.eu-etics.org

ETICS All Hands meeting
Bologna, October 23-25, 2006

NMI and Condor: Status + Future Plans

Andy PAVLO
Peter COUVARES
Becky GIETZEL

Bologna -- All Hands Meeting 2

Overview

• Introduction
• Cross-site Job Migration
• Improving Documentation
• Virtual Machines
• Generic Connection Broker
• Future Plans
• Q & A

Introduction

• University of Wisconsin team is dedicated to improving Condor technologies and the NMI framework.

• Condor user base continues to grow.
• Expecting upcoming surge of NSF users for NMI.

Cross-site Job Migration

• Pools of ETICS computing resources installed at INFN, CERN, and University of Wisconsin.

• Jobs are automatically routed to remote sites when no local resource can satisfy their requirements.

• Transparent to users.
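The routing rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not the NMI framework's implementation: the `Site` class, the `route_job` function, and the platform labels are all hypothetical, and a real matchmaker evaluates full job requirements rather than a single platform name.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A pool of machines at one site (hypothetical model for illustration)."""
    name: str
    free_platforms: set = field(default_factory=set)

    def can_run(self, platform: str) -> bool:
        # Simplification: only checks that the requested platform is free.
        return platform in self.free_platforms


def route_job(platform: str, local: Site, remotes: list) -> str:
    """Return the name of the site that should run the job."""
    if local.can_run(platform):
        return local.name              # prefer local resources
    for site in remotes:
        if site.can_run(platform):
            return site.name           # fall back to the first capable remote
    raise LookupError(f"no site can currently run a {platform} job")


# Illustrative platform labels, not real NMI platform names.
local = Site("University of Wisconsin", {"linux-x86"})
remotes = [Site("INFN", {"linux-x86_64"}), Site("CERN", {"macos-ppc"})]

local_choice = route_job("linux-x86", local, remotes)
remote_choice = route_job("macos-ppc", local, remotes)
```

The caller only sees the name of the chosen site, which mirrors the "transparent to users" goal: the submission itself does not change when a job runs remotely.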

Cross-site Job Migration

[Diagram: architecture of cross-site job migration between a Local Site and a Remote Site. Labels: NMI Build/Test Submission, Condor Schedd, Condor Schedd-on-the-Side, Grid Resource Routing Table, Condor Job, Condor-C Job, Resource Advertiser, Condor Matchmaker.]

Cross-site Job Migration

[Diagram: the "NMI Universe" of Available Resources, with a Resource Advertiser at each of CERN, INFN, and University of Wisconsin. Beyond ETICS: OMII-UK, OMII-Europe.]

Cross-site Job Migration

• Current status:
– Explicit job routing is available in NMI framework 2.1.7

• Future plans:
– Initial deployment (without prereq information): November 2006
– Improved matchmaking: December 2006

• Still to be determined:
– Authorization/Authentication method(s)
– Scalable distributed data dissemination

Documentation

• Emphasis on creating complete documentation and user tutorials for NMI framework.

• Additional contributions from Michael Bletzinger (NCSA)
• Target deadline: December 2006 to January 2007
• New website: http://nmi.cs.wisc.edu

Virtual Machines

• Jobs are sandboxed inside a virtual machine
– Changes to the system are isolated to the local VM.

• Allows for more robust build and test scenarios
• Current status in Condor:
– Preliminary support for VMware is in Condor 6.9
– Users must create the VM image beforehand.
– Future plan is to create the VM dynamically and insert jobs
– Plan to support Xen and VirtualPC virtual machines

• Condor's current VM support is not directly usable by the NMI framework.

Virtual Machines: Future Plans

• NMI and ETICS could provide a standard image per OS, configured with pre-requisite software.

• Images are stored in a cache and dynamically deployed with builds and tests.

• Users only need to add a single line to their submission file

• NMI framework enhancements:
– Maintain cache of available OS VM images.
– Inject build and test scripts inside of VM image.
– Extract appropriate status, logs, and job artifacts.
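The three planned enhancements (cache, inject, extract) can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the planned NMI design: `VMImage`, `ImageCache`, `run_in_vm`, and the `result/` path prefix are hypothetical names, and "running" the VM is replaced by a stand-in that just records what it was given.

```python
from dataclasses import dataclass, field


@dataclass
class VMImage:
    """Hypothetical VM image: an OS name plus a path -> contents mapping."""
    os_name: str
    files: dict = field(default_factory=dict)


class ImageCache:
    """Keeps one standard image per OS and hands out fresh copies."""

    def __init__(self):
        self._images = {}

    def add(self, image: VMImage) -> None:
        self._images[image.os_name] = image

    def checkout(self, os_name: str) -> VMImage:
        base = self._images[os_name]  # raises KeyError if nothing is cached
        # Copy the file map so a job's changes never reach the cached image.
        return VMImage(base.os_name, dict(base.files))


def run_in_vm(cache: ImageCache, os_name: str, scripts: dict) -> dict:
    vm = cache.checkout(os_name)
    vm.files.update(scripts)                       # inject build/test scripts
    # Stand-in for booting the VM and executing the injected scripts.
    vm.files["result/status"] = "ok"
    vm.files["result/log"] = "ran: " + ", ".join(sorted(scripts))
    # Extract only status, logs, and artifacts; the VM copy is then discarded.
    return {p: c for p, c in vm.files.items() if p.startswith("result/")}


cache = ImageCache()
cache.add(VMImage("linux", {"/etc/prereqs": "gcc, make"}))
results = run_in_vm(cache, "linux", {"build.sh": "make", "test.sh": "make check"})
```

Because each job works on a copy, the cached image stays pristine, which is the isolation property the slides attribute to VM sandboxing.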

Generic Connection Broker

• One way for Condor jobs to traverse firewalls.
• Daemon that acts as a proxy at the edge of firewalls.
• Acts as a broker, then steps out of the way.
• Low “maintenance”:
– Works with NATs and multiple private networks.
– No changes to firewall configuration

[Diagram: Matchmaker, Executor, Submitter, and GCB, connected by arrows numbered 1 to 5 that correspond to the steps below.]

1) Executor registers with GCB
2) Executor advertises to matchmaker
3) After match, submitter contacts executor, via GCB
4) GCB tells executor to open connection
5) Executor opens connection to submitter
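The five steps can be traced with a small in-memory simulation. It is a sketch of the message order only, with no real networking and no relation to the actual GCB code: the class names, the `gcb://` address format, and the trivial "first advertised executor" matching policy are assumptions made for illustration.

```python
class Broker:
    """Stand-in for a GCB daemon at the edge of a firewall."""

    def __init__(self, log):
        self.log = log
        self.executors = {}  # brokered address -> executor

    def register(self, executor):
        address = f"gcb://{executor.name}"  # made-up address format
        self.executors[address] = executor
        self.log.append("1: executor registers with GCB")
        return address

    def request_connection(self, address, submitter):
        self.log.append("3: submitter contacts executor via GCB")
        self.log.append("4: GCB tells executor to open connection")
        self.executors[address].open_connection(submitter)


class Matchmaker:
    def __init__(self, log):
        self.log = log
        self.ads = []

    def advertise(self, address):
        self.ads.append(address)
        self.log.append("2: executor advertises to matchmaker")

    def match(self):
        return self.ads[0]  # trivial policy: first advertised executor


class Executor:
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def open_connection(self, submitter):
        # The connection is initiated from the executor's side of the firewall.
        self.log.append("5: executor opens connection to submitter")
        submitter.peer = self.name


class Submitter:
    peer = None


log = []
broker, matchmaker = Broker(log), Matchmaker(log)
executor, submitter = Executor("exec-1", log), Submitter()

matchmaker.advertise(broker.register(executor))           # steps 1-2
broker.request_connection(matchmaker.match(), submitter)  # steps 3-5
```

Note that the broker never carries job traffic in this model: once it has told the executor to connect, the executor and submitter talk directly, which is the "acts as a broker, then steps out of the way" behavior.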

Generic Connection Broker

• Currently only supported in Condor 6.8 for Linux
• Wisconsin team is working to improve GCB:
– Clean up code base and remove testing logic
– Port to other operating systems
– Improve scalability and network performance

Other Future Plans: NMI

• Parallel scheduling enhancements:
– Task synchronization
– Primitives today, high-level dependency spec/mgmt tomorrow?
– Scalability testing: 10^1, 10^2, 10^3, 10^4 nodes?

• Re-factored database schema:
– Improved DB scalability and performance
– Improved build/test artifact provenance
– Project hierarchy
– Users and groups
– Builds and tests are coupled to projects
– Task-level metrics

• Fuzz testing mechanisms
• Website enhancements (maybe):
– Consolidate "old" and "new" web interface
– May focus more on debugging info than status info
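To make the re-factored database schema items more concrete, here is one way such a schema could look, using Python's built-in `sqlite3` module. Every table and column name is hypothetical; the slides do not describe the actual NMI schema, so this only illustrates a project hierarchy, users and groups, builds and tests coupled to projects, and task-level metrics.

```python
import sqlite3

# Hypothetical schema; none of these names come from the NMI framework.
SCHEMA = """
CREATE TABLE projects (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES projects(id)  -- project hierarchy
);
CREATE TABLE user_groups (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    group_id INTEGER REFERENCES user_groups(id)
);
CREATE TABLE runs (                            -- a build or a test
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    user_id    INTEGER NOT NULL REFERENCES users(id),
    kind       TEXT NOT NULL CHECK (kind IN ('build', 'test'))
);
CREATE TABLE task_metrics (                    -- task-level metrics
    run_id  INTEGER NOT NULL REFERENCES runs(id),
    task    TEXT NOT NULL,
    seconds REAL NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample rows.
conn.execute("INSERT INTO projects VALUES (1, 'condor', NULL)")
conn.execute("INSERT INTO projects VALUES (2, 'condor-docs', 1)")
conn.execute("INSERT INTO user_groups VALUES (1, 'developers')")
conn.execute("INSERT INTO users VALUES (1, 'alice', 1)")
conn.execute("INSERT INTO runs VALUES (1, 1, 1, 'build')")
conn.executemany(
    "INSERT INTO task_metrics VALUES (?, ?, ?)",
    [(1, "configure", 12.0), (1, "compile", 48.0)],
)

# Total task time per project, joining metrics back to their project.
totals = conn.execute(
    """
    SELECT p.name, SUM(m.seconds)
    FROM task_metrics AS m
    JOIN runs AS r ON r.id = m.run_id
    JOIN projects AS p ON p.id = r.project_id
    GROUP BY p.name
    """
).fetchall()
```

Coupling every run to a project is what lets per-task metrics roll up to the project level with a plain join.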

Other Future Plans: Condor

• New Development Series: Condor 6.9
• Improved scalability:
– Modularize schedd tasks
– Non-blocking I/O

• Privilege separation:
– Daemons no longer need to start with setuid permissions
– Integration with glexec/sudo

• Enhanced security:
– Continue with source code audits
– Signed ClassAds

• Parallel scheduling:
– Document & understand current issues in a pool doing both independent & parallel work
– Improve incrementally based on production experiences

• Q & A