www.cs.wisc.edu/~miron Welcome to CW 2007!!!

The Condor Project (Established ‘85)

Distributed Computing research performed by a team of ~40 faculty, full-time staff and students who:

› face software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment,
› are involved in national and international collaborations,
› interact with users in academia and industry,
› maintain and support a distributed production environment (more than 4000 CPUs at UW), and
› educate and train students.

“… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. …”

Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems,” Ph.D. thesis, July 1983.

A “good year” for the principles and concepts we pioneered and the technologies that implement them

In August 2006 the UW Academic Planning Committee approved the Center for High Throughput Computing (CHTC). The L&S College created two staff positions for the center.

Main Threads of Activities

› Distributed Computing Research – develop and evaluate new concepts, frameworks and technologies
› Keep Condor “flight worthy” and support our users
› The Open Science Grid (OSG) – build and operate a national High Throughput Computing infrastructure
› The Grid Laboratory Of Wisconsin (GLOW) – build, maintain and operate a distributed computing and storage infrastructure on the UW campus
› The NSF Middleware Initiative – develop, build and operate a national Build and Test facility powered by Metronome

Later today

Incorporating VM technologies (Condor VMs are now called slots) and improving support for parallel applications

Downloads per month

Software Development for Cyberinfrastructure (NSF 07-503), posted October 11, 2006:

All awards are required to use NMI Build and Test services, or an NSF designated alternative, to support their software development and testing. Details of the NMI Build and Test facility can be found at http://nmi.cs.wisc.edu/.

Later today

Working with Red Hat on integrating Condor into Linux

Miron Livny and Michael Litzkow, "Making Workstations a Friendly Environment for Batch Jobs", Third IEEE Workshop on Workstation Operating Systems, April 1992, Key Biscayne, Florida. http://www.cs.wisc.edu/condor/publications/doc/friendly-wos3.pdf

06/27/97 This month, NCSA's (National Center for Supercomputing Applications) Advanced Computing Group (ACG) will begin testing Condor, a software system developed at the University of Wisconsin that promises to expand computing capabilities through efficient capture of cycles on idle machines. The software, operating within an HTC (High Throughput Computing) rather than a traditional HPC (High Performance Computing) paradigm, organizes machines into clusters, called pools, or collections of clusters called flocks, that can exchange resources. Condor then hunts for idle workstations to run jobs. When the owner resumes computing, Condor migrates the job to another machine.

To learn more about recent Condor developments, HPCwire interviewed Miron Livny, professor of Computer Science, University of Wisconsin at Madison and principal investigator for the Condor Project.

Why HTC? For many experimental scientists, scientific progress and quality of research are strongly linked to computing throughput. In other words, they are less concerned about instantaneous computing power. Instead, what matters to them is the amount of computing they can harness over a month or a year: they measure computing power in units of scenarios per day, wind patterns per week, instruction sets per month, or crystal configurations per year.

High Throughput Computing is a 24-7-365 activity

FLOPY = (60*60*24*7*52)*FLOPS
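Reading FLOPY as floating-point operations per year (our interpretation; the slide does not define the term) and FLOPS as a sustained rate in operations per second, the product is simply the number of seconds in 52 weeks multiplied by that rate. A worked version of the arithmetic, assuming the machine runs at that rate around the clock:

```latex
\begin{aligned}
\text{FLOPY} &= (60 \cdot 60 \cdot 24 \cdot 7 \cdot 52)\ \text{s} \times \text{FLOPS} \\
             &= 31{,}449{,}600\ \text{s} \times \text{FLOPS} \\
             &\approx 3.14 \times 10^{7}\ \text{s} \times \text{FLOPS}
\end{aligned}
```

Two notes on this sketch: 52 weeks is 364 days, so a full 365-day year (31,536,000 s) gives about 0.27% more; and for a hypothetical machine sustaining 10^9 FLOPS, the formula yields roughly 3.1 × 10^16 operations per year, but only if the machine is kept busy every hour of that year.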

The grid promises to fundamentally change the way we think about and use computing. This infrastructure will connect multiple regional and national computational grids, creating a universal source of pervasive and dependable computing power that supports dramatically new classes of applications. The Grid provides a clear vision of what computational grids are, why we need them, who will use them, and how they will be programmed.

The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman, July 1998, 701 pages.

“… We claim that these mechanisms, although originally developed in the context of a cluster of workstations, are also applicable to computational grids. In addition to the required flexibility of services in these grids, a very important concern is that the system be robust enough to run in ‘production mode’ continuously even in the face of component failures. …”

Miron Livny & Rajesh Raman, “High Throughput Resource Management,” in The Grid: Blueprint for a New Computing Infrastructure.

Later today

Working with IBM on supporting HTC on the Blue Gene

Taking HTC to the National Level

The Open Science Grid (OSG)

Taking HTC to the National Level

Miron Livny, OSG PI and Facility Coordinator, University of Wisconsin-Madison

The OSG vision

Transform processing- and data-intensive science through a cross-domain, self-managed national distributed cyber-infrastructure that brings together campus and community infrastructure and serves the needs of Virtual Organizations at all scales.

OSG Principles

Characteristics:

› Provide guaranteed and opportunistic access to shared resources.
› Operate a heterogeneous environment, both in the services available at any site and for any VO, and with multiple implementations behind common interfaces.
› Interface to Campus and Regional Grids.
› Federate with other national/international Grids.
› Support multiple software releases at any one time.

Drivers:

› Delivery to the schedule, capacity and capability of LHC and LIGO: contributions to/from and collaboration with the US ATLAS, US CMS, and LIGO software and computing programs.
› Support for/collaboration with other physics/non-physics communities.
› Partnerships with other Grids – especially EGEE and TeraGrid.
› Evolution by deployment of externally developed new services and technologies.

Tomorrow

Building Campus Grids with Condor

Grid of Grids – from Local to Global

(Diagram: Campus, Community and National grids)

Who are you?

A resource can be accessed by a user via the campus, community or national grid.

A user can access a resource with a campus, community or national grid identity.

Tomorrow

Just-in-time scheduling with Condor “glide-ins” (scheduling overlays)

OSG challenges

Develop the organizational and management structure of a consortium that drives such a Cyber Infrastructure

Develop the organizational and management structure for the project that builds, operates and evolves such a Cyber Infrastructure

Maintain and evolve a software stack capable of offering powerful and dependable capabilities that meet the science objectives of the NSF and DOE scientific communities

Operate and evolve a dependable and well managed distributed facility

The OSG Project

Co-funded by DOE and NSF at an annual rate of ~$6M for 5 years starting FY-07.

15 institutions involved – 4 DOE Labs and 11 universities

Currently, the main stakeholders are from physics – US LHC experiments, LIGO, the STAR experiment, the Tevatron Run II and astrophysics experiments

A mix of DOE-Lab and campus resources

Active “engagement” effort to add new domains and resource providers to the OSG consortium

Security, Workflows, Firewalls, Scalability, Scheduling

Thank you for building such a wonderful community