Legion RYAN BARTLETT, TIMOTHY VIRGILLO


What is Legion?

The Legion project set out to build, test, deploy, and ultimately transfer to industry a robust, scalable Grid computing software infrastructure

Object-based metasystem software created at the University of Virginia

A single, coherent virtual machine that addresses key grid issues such as scalability, programming ease, fault tolerance, security, and site autonomy

The user can sit at a terminal and manipulate objects on several processors, but has the illusion of working on a single powerful computer.

1. Site Autonomy - resources are owned and controlled by an array of organizations

2. Extensible Core – The components that comprise Legion’s Core are designed to be replaceable and extensible

3. Scalable Architecture - Completely distributed system to allow for the scalability needed to handle millions of hosts

4. Easy to Use - hide the complexity of the system to create the illusion of working with a single powerful computer

5. High Performance – Large degree of parallelism, requiring parallelization of tasks, data and their arbitrary combinations

10 Key Design Objectives

6. Single Persistent Name space - one name space for file and data access

7. Security - provide mechanisms to allow users to manage the security of their own objects

8. Heterogeneous Resource Management - cross platform to support different types of hardware and software

9. Multiple Language Support - integrate different types of source languages

10. Fault Tolerance - deal with host, communication-link, and disk failures through dynamic reconfiguration

The Legion Grid is comprised of an array of organizations’ resources

For organizations to be willing to participate in Legion and contribute their resources, each must be assured that it keeps control of its own resources

Site Autonomy

Cannot replace host operating systems
◦ Encourages organizations to contribute resources to the grid by not requiring that resources be dedicated solely to Legion

Cannot make changes to the interconnection network
◦ Legion cannot assume that all resource networks will be within the user's control

Cannot insist that it be run as "root"
◦ For security reasons, Legion does not require root access to its resources to function

Design Constraints

Everything is an object
◦ Defined as an active process that responds to member function invocations from other objects in the system

Objects are independent and abstracted from the logical address space

Objects communicate with each other through non-blocking method calls

Legion handles the message format and high-level protocol for object interaction, but not the programming language or communication protocol

Each object maintains a local binding cache

Objects may be in one of two states: Active or Inert

Legion Object
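The non-blocking method calls described above can be illustrated with a minimal Python sketch. This is not Legion's API: `ActiveObjectSketch`, `invoke`, and the `add` member function are hypothetical names, and a single worker thread stands in for the object's own process. The caller gets a handle back at once and blocks only when it needs the value.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ActiveObjectSketch:
    """Hypothetical stand-in for an active object; not Legion's real API."""

    def __init__(self) -> None:
        # One worker thread plays the role of the object's own process.
        self._worker = ThreadPoolExecutor(max_workers=1)
        self._total = 0

    def _add(self, amount: int) -> int:
        self._total += amount
        return self._total

    def invoke(self, method: str, *args: int) -> Future:
        """Queue a member function invocation and return without waiting."""
        handlers = {"add": self._add}
        return self._worker.submit(handlers[method], *args)

    def shutdown(self) -> None:
        self._worker.shutdown(wait=True)


counter = ActiveObjectSketch()
first = counter.invoke("add", 2)    # returns a Future immediately
second = counter.invoke("add", 3)   # queued behind the first call
# The caller blocks only here, when it actually needs the values.
results = [first.result(), second.result()]
counter.shutdown()
```

Because the sketch uses one worker, invocations are served in the order they were queued; a real system would make its own ordering guarantees.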

Active Objects
◦ Run as a process that is ready to accept member function invocations
◦ Object state is maintained in the address space of the process

Inert Objects
◦ Represented by an object-persistent representation (OPR)
◦ The OPR is a set of associated bytes residing in stable storage in the Legion system
◦ The OPR contains the information that enables the object to move from Inert to Active

Object States: Inert or Active
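A rough Python sketch of the Active/Inert cycle follows. It assumes the OPR can be modeled as JSON-encoded bytes and that stable storage is a plain dictionary keyed by LOID; the class name `CounterObject` and the `"loid-42"` string are made up for illustration and say nothing about Legion's actual OPR format.

```python
import json
from typing import Dict


class CounterObject:
    """Hypothetical object whose only state is a counter."""

    def __init__(self, loid: str, count: int = 0) -> None:
        self.loid = loid
        self.count = count

    def increment(self) -> int:
        self.count += 1
        return self.count

    def deactivate(self, stable_storage: Dict[str, bytes]) -> None:
        """Active -> Inert: write the state out as an OPR (a byte string)."""
        opr = json.dumps({"count": self.count}).encode("utf-8")
        stable_storage[self.loid] = opr

    @classmethod
    def activate(cls, loid: str, stable_storage: Dict[str, bytes]) -> "CounterObject":
        """Inert -> Active: rebuild the in-memory state from the OPR."""
        state = json.loads(stable_storage[loid].decode("utf-8"))
        return cls(loid, state["count"])


storage: Dict[str, bytes] = {}       # stands in for stable storage
obj = CounterObject("loid-42")
obj.increment()
obj.increment()
obj.deactivate(storage)              # only the OPR bytes remain
del obj
revived = CounterObject.activate("loid-42", storage)
```

The revived object carries on from the saved count, which is the property the OPR is meant to provide.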

Replicating an object
◦ Create an Object Address with multiple physical addresses in its list
◦ Assign address semantics
◦ Bind the LOID of the object to this Object Address

Object Replication

Legion Object Address (LOA)
◦ Physical address, or set of addresses in the case of replicated objects

Legion Object Identifiers (LOIDs)

Context Names
◦ Mapped by a directory service called Context Space
◦ Human-readable strings
◦ Each Context Name is mapped to a LOID

Legion’s 3 Level Naming System

Contexts A typical context space has a well known root context which in turn “points” to other contexts, forming a directed graph

Support operations that lookup a single string, return all mappings, add a mapping, and delete a mapping
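The four context operations and the directed-graph structure can be sketched in a few lines of Python. This is a simplification under stated assumptions: here a context entry holds either a LOID string or another `Context` directly, whereas in Legion contexts are themselves objects named by LOIDs. The names `Context`, `resolve`, and `"loid-17"` are hypothetical.

```python
from typing import Dict, Union

Entry = Union[str, "Context"]   # a LOID string or a nested context


class Context:
    """Hypothetical context: maps human-readable names to entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def add(self, name: str, target: Entry) -> None:
        self._entries[name] = target

    def delete(self, name: str) -> None:
        del self._entries[name]

    def lookup(self, name: str) -> Entry:
        return self._entries[name]

    def mappings(self) -> Dict[str, Entry]:
        return dict(self._entries)   # copy, so callers cannot mutate the context


def resolve(root: Context, path: str) -> Entry:
    """Walk a slash-separated path starting from the root context."""
    node: Entry = root
    for part in path.strip("/").split("/"):
        if not isinstance(node, Context):
            raise KeyError(f"{part!r} is below a non-context entry")
        node = node.lookup(part)
    return node


root = Context()
home = Context()
root.add("home", home)
home.add("results.dat", "loid-17")
home.add("up", root)   # a back edge: the graph need not be a tree
```

The `up` entry shows why the slide says "directed graph" rather than "tree": the same object can be reached by more than one path.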

An Object Address Element contains 2 basic parts
◦ 32-bit address type field
◦ 256 bits of address-specific information

Object Addresses

An Object Address
◦ A list of Object Address Elements
◦ Semantic information that describes how to utilize the list

Represents an arbitrary communication endpoint, such as a TCP socket
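To make the sizes concrete, here is a hedged Python sketch of an Object Address Element as a fixed 36-byte record: a 32-bit type field followed by 256 bits (32 bytes) of address-specific information. The byte order, the type code `ADDRESS_TYPE_TCP_IPV4 = 1`, the IPv4-plus-port encoding, and the `"first"`/`"all"` semantic labels are all assumptions made for illustration; the slides do not specify them.

```python
import socket
import struct
from dataclasses import dataclass
from typing import List, Tuple

ELEMENT_FORMAT = "!I32s"       # 32-bit type field + 256 bits of address info
ADDRESS_TYPE_TCP_IPV4 = 1      # hypothetical type code


@dataclass(frozen=True)
class ObjectAddressElement:
    address_type: int
    address_info: bytes        # at most 32 bytes; zero-padded when packed

    def pack(self) -> bytes:
        if len(self.address_info) > 32:
            raise ValueError("address-specific information exceeds 256 bits")
        return struct.pack(ELEMENT_FORMAT, self.address_type, self.address_info)


def tcp_element(host: str, port: int) -> ObjectAddressElement:
    """Encode an IPv4 TCP endpoint (assumed layout: 4-byte IP, 2-byte port)."""
    info = socket.inet_aton(host) + struct.pack("!H", port)
    return ObjectAddressElement(ADDRESS_TYPE_TCP_IPV4, info)


@dataclass(frozen=True)
class ObjectAddress:
    elements: Tuple[ObjectAddressElement, ...]
    semantic: str              # hypothetical labels: "first" or "all"

    def targets(self) -> List[ObjectAddressElement]:
        """Apply the address semantic to choose which elements to contact."""
        if self.semantic == "all":
            return list(self.elements)
        return list(self.elements[:1])


# A replicated object: one Object Address listing two physical endpoints.
replicated = ObjectAddress(
    elements=(tcp_element("10.0.0.1", 8080), tcp_element("10.0.0.2", 8080)),
    semantic="all",
)
packed = replicated.elements[0].pack()
```

The same structure covers replication: the list holds several elements and the semantic field tells a sender how to use them.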

Every Legion object is named by a Legion Object Identifier (LOID)

Legion Object Identifiers (LOIDs)
◦ Location-independent identifiers
◦ Include an RSA public key (public-key cryptosystem)
◦ Each LOID is mapped to a LOA

LegionClass is responsible for handing out unique Class Identifiers to each new class

LOIDs

Bindings from LOIDs to Object Addresses are implemented as triples

Bindings

Bindings are first-class entities that can be passed around the system and cached within objects

A binding consists of:
◦ LOID
◦ Object Address
◦ Time at which the binding becomes invalid

Binding Agents are responsible for returning a binding to an Object Address for the object that the LOID names

The persistent state of each Legion Object contains the Object Address of its Binding Agent
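The binding triple, the local binding cache, and the fall-back to a Binding Agent can be sketched as follows. This is an illustration under assumptions: addresses are plain strings, time is passed in explicitly as a number so the example is deterministic, and `BindingAgent.get_binding` and `BindingCache.resolve` are hypothetical names rather than Legion's interface.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Binding:
    loid: str
    object_address: str
    expires_at: float          # time at which the binding becomes invalid

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


class BindingAgent:
    """Hypothetical agent that answers LOID -> Object Address requests."""

    def __init__(self, table: Dict[str, str], ttl: float) -> None:
        self._table = table
        self._ttl = ttl
        self.requests = 0      # counts how often the agent is contacted

    def get_binding(self, loid: str, now: float) -> Binding:
        self.requests += 1
        return Binding(loid, self._table[loid], now + self._ttl)


class BindingCache:
    """Local cache kept by an object; asks the agent only on a miss."""

    def __init__(self, agent: BindingAgent) -> None:
        self._agent = agent
        self._cache: Dict[str, Binding] = {}

    def resolve(self, loid: str, now: float) -> Binding:
        cached = self._cache.get(loid)
        if cached is not None and cached.is_valid(now):
            return cached
        fresh = self._agent.get_binding(loid, now)
        self._cache[loid] = fresh
        return fresh


agent = BindingAgent({"loid-7": "10.0.0.1:8080"}, ttl=60.0)
cache = BindingCache(agent)
cache.resolve("loid-7", now=0.0)     # miss: goes to the agent
cache.resolve("loid-7", now=30.0)    # hit: served from the local cache
cache.resolve("loid-7", now=90.0)    # expired: goes to the agent again
```

Three lookups cost only two agent requests here, which is the traffic saving the local cache is there for.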

Binding Agents
◦ Objects that map LOIDs to LOAs

Context Objects
◦ Objects that map Context Names to LOIDs

Host Objects
◦ Represent processors, and create and manage the processes that run on them

Implementation Objects
◦ Executable that handles creation or activation of an object
◦ Transferred from a class object to a host object to enable the host to create processes with the correct characteristics

Vault
◦ Represents persistent storage, for the purpose of maintaining the state, in OPRs, of the inert Legion objects supported by the vault

Legion Core Objects

Every Legion object is defined and managed by its Class object

Class objects are given system-level responsibility
◦ Create new instances
◦ Schedule them for execution
◦ Activate/deactivate objects
◦ Provide information about their current location to client objects that wish to communicate with them

Classes and Metaclasses
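The class-object responsibilities listed above can be sketched as a small bookkeeping class. Everything here is hypothetical: the `"<class id>.<instance number>"` LOID string, the method names, and the idea of recording a host name per instance are illustrative assumptions, and scheduling is reduced to the caller naming a host.

```python
from typing import Dict, Optional


class ClassObjectSketch:
    """Hypothetical class object that manages its own instances."""

    def __init__(self, class_id: int) -> None:
        self.class_id = class_id
        self._next_instance = 1
        # LOID -> host name while active, None while inert.
        self._location: Dict[str, Optional[str]] = {}

    def create_instance(self) -> str:
        """Create a new (initially inert) instance and return its LOID."""
        loid = f"{self.class_id}.{self._next_instance}"
        self._next_instance += 1
        self._location[loid] = None
        return loid

    def activate(self, loid: str, host: str) -> None:
        if loid not in self._location:
            raise KeyError(loid)
        self._location[loid] = host

    def deactivate(self, loid: str) -> None:
        if loid not in self._location:
            raise KeyError(loid)
        self._location[loid] = None

    def locate(self, loid: str) -> Optional[str]:
        """Tell a client where the instance currently runs, if anywhere."""
        return self._location[loid]


matrix_class = ClassObjectSketch(class_id=12)
a = matrix_class.create_instance()
b = matrix_class.create_instance()
matrix_class.activate(a, "host-alpha")
```

Since the class object is the one that placed the instance, it is also the natural authority to answer where that instance is.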

Millions and MILLIONS of hosts and TRILLIONS of objects, yo

Legion is designed to be decentralized and fully distributed

Applications at the client

Legion Object shared across the Legion System

Scalable Architecture

Every object publishes an interface
◦ Inheritable
◦ Extendable
◦ Specializable

As technology changes and improves, resources in the Legion Grid can be changed or replaced without hindering the system

Extensible Core

High Performance via Resource Selection
◦ Choosing hosts with the lowest load or greatest processing power
◦ User-level scheduling agents

High Performance via Parallelism
◦ Support libraries such as MPI
◦ Support parallel languages such as MPL
◦ Offer wrappers for parallel components
◦ Export the run-time library interface to library, toolkit, and compiler writers

High Performance

Scheduling policies are chosen by the user

Users can create their own schedulers for specific applications

The Legion Scheduling Model
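A user-chosen scheduling policy can be pictured as a function passed in by the application. The sketch below is only an illustration: `HostInfo`, its `load` and `mflops` fields, and the `place` helper are hypothetical and do not reflect Legion's scheduler interfaces. It shows the two selection criteria named earlier, lowest load and greatest processing power, as interchangeable policies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class HostInfo:
    name: str
    load: float       # hypothetical load average
    mflops: float     # hypothetical processing-power figure


# A scheduler is any function that picks one host from the candidates.
Scheduler = Callable[[Sequence[HostInfo]], HostInfo]


def lowest_load(hosts: Sequence[HostInfo]) -> HostInfo:
    return min(hosts, key=lambda h: h.load)


def greatest_power(hosts: Sequence[HostInfo]) -> HostInfo:
    return max(hosts, key=lambda h: h.mflops)


def place(task: str, hosts: Sequence[HostInfo], scheduler: Scheduler) -> str:
    """Apply the user's chosen policy and report the placement."""
    return f"{task} -> {scheduler(hosts).name}"


hosts = [
    HostInfo("alpha", load=0.9, mflops=500.0),
    HostInfo("beta", load=0.2, mflops=120.0),
    HostInfo("gamma", load=0.5, mflops=300.0),
]
by_load = place("matmul", hosts, lowest_load)
by_power = place("matmul", hosts, greatest_power)
```

An application-specific scheduler is then just another function with the same signature, for example one that weighs load against power.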

Problems:

Installing Legion without causing significant risk to the system it is installed on

How to protect and control resources

Solutions:

Legion does not require any special privileges or "root" access

Legion allows users to choose what types and levels of security they want for their own objects

In addition, every Legion object contains a function called "MayI"

Security
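The slides name the "MayI" function without describing its behavior. As a sketch only, assuming MayI is consulted before each incoming method call and decides whether the caller is allowed, it might look like this in Python. The access-control table, the caller names, and the `invoke` wrapper are hypothetical.

```python
from typing import Dict, Set


class AccessDenied(Exception):
    pass


class GuardedObject:
    """Hypothetical object that checks may_i before serving any call."""

    def __init__(self, acl: Dict[str, Set[str]]) -> None:
        self._acl = acl                    # method name -> permitted callers
        self._data = "experiment results"

    def may_i(self, caller: str, method: str) -> bool:
        # The object's owner decides the policy; here it is a simple table.
        return caller in self._acl.get(method, set())

    def invoke(self, caller: str, method: str) -> str:
        if not self.may_i(caller, method):
            raise AccessDenied(f"{caller} may not call {method}")
        handlers = {"read": self._read}
        return handlers[method]()

    def _read(self) -> str:
        return self._data


obj = GuardedObject({"read": {"alice"}})
allowed = obj.invoke("alice", "read")
try:
    obj.invoke("bob", "read")
    denied = False
except AccessDenied:
    denied = True
```

Because the check lives in the object itself, each owner can swap in a stricter or looser policy without changing callers.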

Public-key cryptography based on RSAREF 2.0.

Three message-layer security modes: private (encrypted communication), protected (fast digested communication with unforgeable secrets to ensure authentic replies to message calls), and no security.

Caching secret-keys for faster encryption of multiple messages between communicating parties.

Auto-encrypted bearer credentials with free-form rights. Propagation of security modes and certificates through calling trees (e.g., if a caller demands encryption, all downstream calls will use it automatically).

Security

Security

Drop-in addition of MayI functionality to existing objects.

Persistent authentication objects that serve as the representation for users in a trust domain.

Secure legion shell to allow users to login to their authentication objects and obtain associated credentials and environment information.

Isolation and protection of objects using local OS accounts.

Easily checked Process Control Daemon for granting limited OS privileges to Legion Host Objects.

Context space configured with access control for multiple users.

Automatic failure detection and recovery

◦ Hosts, jobs, and queues automatically back up their current state to prevent loss of information

◦ Dynamic configuration allows processes to change resources without interrupting operations

◦ If a host is lost or unavailable, the job is automatically migrated to another host

Fault Tolerance
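The combination of state backup and automatic migration can be sketched as a loop that checkpoints after every completed step and moves to the next host when the current one fails. This is a toy model under assumptions, not Legion's recovery mechanism: failures are simulated with a `fail_at` table (host name to the step index at which it stops responding), and the "job" just squares numbers.

```python
from typing import Dict, List, Sequence, Tuple


class HostFailure(Exception):
    pass


def run_step(host: str, index: int, value: int, fail_at: Dict[str, int]) -> int:
    """Run one step on a host; simulate the host being lost from fail_at onward."""
    if host in fail_at and index >= fail_at[host]:
        raise HostFailure(host)
    return value * value


def run_job(
    values: Sequence[int], hosts: Sequence[str], fail_at: Dict[str, int]
) -> Tuple[List[int], List[str]]:
    checkpoint: List[int] = []     # results saved after each completed step
    placements: List[str] = []     # which host ran each step
    candidates = list(hosts)
    current = candidates.pop(0)
    for index, value in enumerate(values):
        while True:
            try:
                checkpoint.append(run_step(current, index, value, fail_at))
                placements.append(current)
                break
            except HostFailure:
                if not candidates:
                    raise          # no host left to migrate to
                # Migrate: completed steps survive in the checkpoint,
                # so only the failed step is retried on the new host.
                current = candidates.pop(0)
    return checkpoint, placements


results, placements = run_job([1, 2, 3, 4], ["alpha", "beta"], {"alpha": 2})
```

Steps finished before the failure are not repeated, which is the point of backing up state before migrating.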

Of the early grid-computing solutions, Legion is unique in that it took an object-orientated approach

It metamorphosed from an academic project to a commercial vendor with Avaki

Avaki pushed the LOID naming conventions as an industry secure naming protocol in 2002, which Compaq, Hewlett-Packard, IBM, Platform Computing, and Sun Microsystems all welcomed

IBM adopted the system for their Life Sciences research

Though the platform in its commercial state is proprietary, it can be assumed that the Legion->Avaki->Sybase->SAP ownership chain has continued the growth and expansion of the system

Contribution

Scalability claims refer to the communication traffic required as part of the implementation model
◦ LOID binding lookups from objects to Binding Agents
◦ Binding Agent traffic required to satisfy object binding requests

Assumes that most accesses will be local
◦ Same organization
◦ Within a department or university campus

Communication between user-level objects inside an application may or may not contain a bottleneck

◦ A user implementation may have a centralized object that acts as shared memory for a large number of workers

Resource starvation results in increasingly poor performance

Drawbacks?

Comparison of Globus against Legion with Matrix Multiplication using the MPI libraries

Performance

Too many requirements and decisions are placed on the shoulders of the resource owners and users. It contradicts the overall goal of Legion being easy to use

It aspires to be multi-organizational, but lacks easy scalability across organizations

Performance

Fault tolerance is not explicitly covered in the available documentation. Avaki continued to develop the code base and likely addressed these issues, but that work is commercial and proprietary.

Performance measurements may be volatile as it cannot be predicted how Legion scales across hosts

◦ Bottlenecks may occur at the application level

Personal Opinions for Improvements

Reasonable, well phrased questions?