Feb 6-7, 2003 1 OptIPuter Software Research and Architecture Andrew A. Chien Computer Science and Engineering University of California, San Diego OptIPuter Workshop February 6-7, 2003


OptIPuter Software Research and Architecture

Andrew A. Chien

Computer Science and Engineering

University of California, San Diego

OptIPuter Workshop

February 6-7, 2003


OptIPuter Software Research

• Key driving technology changes
  – Advent of massive bandwidth: orders-of-magnitude increases in both the local area and the wide area for wired systems
  – Lambda-programmed “end to end” connections, which can be used as private networks and can provide guaranteed bandwidth
  – Endpoint machines that cannot terminate more than a single lambda, due to performance scaling
  – Large-scale network-attached storage, instruments, displays, and other peripherals
  – Grids and flexible wide-area sharing
• Key research areas suggest opportunities for new capabilities in
  – High-performance communication/data movement (bandwidth, time to bandwidth)
  – Tight coupling of data/storage with computing, visualization, and other devices across the wide area
  – Proactive use of communication, data, and compute resources to enhance applications


Network Impact of Lambdas

• Optical “circuit switching” with DWDM
  – Bandwidth: more from the same fiber infrastructure
  – Dedicated: controllable latency, low jitter, predictable bandwidth
  – Private: security, data integrity
  – Avoid routing (cost, variable latency)


Exploiting λs for an Application

• Applications request λ-connections
• Networks/endpoints automatically recognize high-bandwidth flows and allocate/configure transparently
• Ad hoc point-to-point connections


A System View

• Patch Panel computers? Array processors? Systolic processors?
  – Connections form a virtual system abstraction
• How do we think of the Computing Elements and Network connected together as a SYSTEM?
• Based on connections, what are the potential capabilities?
• => Scenarios for composition into a virtual computer


Scenario #1

• Dynamic Virtual Computer (DVC)
  – User (any entity or collection of entities) forms a DVC on demand
  – Dynamic configuration of λ-network and binding of resources
  – Possibilities
    – Centralized control/management of resources in virtual computer
    – Novel security properties for distributed resources
    – Novel performance properties for distributed resources

Dynamically formed Virtual Computer (VC)


Scenario #2

• Pseudo-Static Virtual Computer (PSVC)
  – Administrator(s) cooperate to form PSVC configuration
  – Users (or any entity) can instantiate PSVC on demand
  – Slower configuration of λ-network and binding of resources
  – Possibilities
    – Centralized control/management of resources in virtual computer
    – Novel security properties for distributed resources
    – Novel performance properties for distributed resources

Pseudo-static Configuration


Scenario #3

• Some devices can’t run at λ speeds; should they be left out?
  – Storage, instruments (microscopes?), frame buffers, “legacy devices”
  – Enabling “slower” devices to participate in a virtual computer
• Extend the capabilities of λs through traditional networks to these devices (or sharing connections)
  – “Direct access” to shared devices
  – Preserve unique λ-capabilities
    – Dedicated: controllable latency, low jitter, predictable bandwidth
    – Private: security, data integrity


OptIPuter Software Research

• Near Term Goals and Activities
  – Define Testbeds and Support Use
    – Standard OptIPuter node and on-ramp network infrastructure
    – Define scope of testbed experiments and stability
    – Distributed Configuration Management For OptIPuter Systems (nodes, networks)
  – Control Plane Software For DWDM Management And Dynamic Setup
  – High Speed IP-based Protocols (RBUDP, SABUL, hsTCP, …)
  – Jumpstart application “rethinking” for λ-enabled environments
    – Computer science and application teams intimate with OptIPuter potentials and application needs


Long Term Goals

• System Models
  – Novel system mechanisms and abstractions; exploit/expose unique λ-capabilities
• Component Technologies
  – Communication
  – Security Models
  – Data Abstractions
  – Real-time Objects
  – λ-configuration management
  – Virtual Computer configuration management
• Technical foundation for widespread use
  – Ex. New capabilities, new models, radical new applications
• Enable the driving applications (and many others)
  – Make easy, high leverage use of OptIPuter capabilities
  – Demonstrate models for next-generation Distributed E-science


Component Technologies

• Communication protocols that deliver novel capabilities and make λ-based communication easy to use
  – Bandwidth, latency, parallel stripes
• Security models
  – Leverage λ-capabilities and support virtual computer models
  – Low-overhead integration of resources into virtual computer models and delivery of performance
• Proactive Data Placement, Movement & Management supports new capabilities
  – Expend (“waste”) communication resources to enhance applications
  – Intelligently replicate and migrate data
  – Proactive optimization
• Real-time Virtual Computers for distributed applications
  – Ease programming, performance modeling
  – Enable novel applications
• Virtual Computer Configuration and management
  – Integrates control plane management into resource management


OptIPuter Communication Challenges

• Terminating A Terabit Link In An Application System
  – --> Not A Router
• Parallel Termination With Commodity Components
  – N 10GigE Links -> N Clustered Machines (Low Cost)
  – Community-Based Communication
• What Are:
  – Efficient Protocols to Move Data in Local, Metropolitan, Wide Area?
    – High Bandwidth, Low Startup, “Time to Bandwidth”
    – Dedicated Channels, Shared Endpoints
  – Good Parallel Abstractions For Communication?
    – Coordinate Management And Use Of Endpoints And Channels
    – Convenient For Application, Storage System
  – Secure Communication Models For “Single System View”
    – Enabled By “Lambda” Private Channels
    – Exploit Flexible Dispersion Of Data And Computation
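One way to picture "N 10GigE links -> N clustered machines" is round-robin striping of a byte stream across N endpoints, with sequence numbers for reassembly. The sketch below is only an illustration of that idea, not the OptIPuter protocol; the function names `stripe` and `reassemble` and the chunk size are hypothetical, and real links are replaced by in-memory queues.

```python
from typing import List, Tuple

Chunk = Tuple[int, bytes]  # (sequence number, payload)


def stripe(data: bytes, n_links: int, chunk_size: int) -> List[List[Chunk]]:
    """Round-robin fixed-size chunks of `data` across `n_links` queues.

    Each chunk carries its sequence number so the receiver can restore
    the original order no matter how the links interleave.
    """
    if n_links < 1 or chunk_size < 1:
        raise ValueError("n_links and chunk_size must be positive")
    queues: List[List[Chunk]] = [[] for _ in range(n_links)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        queues[seq % n_links].append((seq, data[offset:offset + chunk_size]))
    return queues


def reassemble(queues: List[List[Chunk]]) -> bytes:
    """Merge the chunks from every link back into the original byte order."""
    chunks = [chunk for queue in queues for chunk in queue]
    chunks.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in chunks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4
    links = stripe(payload, n_links=4, chunk_size=100)
    assert reassemble(links) == payload
    print("chunks per link:", [len(q) for q in links])
```

The sequence numbers are what let the endpoints stay independent: each clustered machine can drain its own link at its own pace, and ordering is restored only at the point where the application consumes the data.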


Communication Challenges (Example)

• Communicate FAST (Quick)
  – How to scale to a Terabit and sustain it
  – Parallel endpoints
  – TCP and alternatives
    – Psockets, SABUL 2.1, RBUDP 0.1, hsTCP, XCP
    – Bandwidth; Latency
  – Lightweight bypass protocols
    – FM, AM, BIP, Hamlyn, ST
• Communicate FAIR
  – How to share resources (contention at the endpoints, if not in the network)
  – Coexistence compatibility; robustness of application performance
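To make "time to bandwidth" concrete, here is a rough estimate of how long an idealized TCP slow start takes to open its congestion window to the bandwidth-delay product of a link. It is a sketch under stated assumptions, not a measurement: the window doubles once per round trip, there is no loss, and the 1460-byte segment size, the one-segment initial window, and the 50 ms round-trip time in the example are all assumed values, not figures from the slides.

```python
import math


def slow_start_time_to_bandwidth(link_bps: float, rtt_s: float,
                                 mss_bytes: int = 1460,
                                 initial_segments: int = 1) -> float:
    """Seconds until the congestion window covers the bandwidth-delay product.

    Idealized model: the window starts at `initial_segments` and doubles
    every round trip, with no loss and a fixed round-trip time.
    """
    bdp_bytes = link_bps * rtt_s / 8
    target_segments = math.ceil(bdp_bytes / mss_bytes)
    window = initial_segments
    rounds = 0
    while window < target_segments:
        window *= 2
        rounds += 1
    return rounds * rtt_s


if __name__ == "__main__":
    # Assumed example: one 10 Gbps lambda with a 50 ms round-trip time.
    seconds = slow_start_time_to_bandwidth(10e9, 0.05)
    print(f"time to bandwidth: {seconds:.2f} s")
```

Under this model the startup cost grows with the logarithm of the bandwidth-delay product, which is why it matters more as links get faster and transfers get shorter.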


OptIPuter Storage Challenges

• DWDM Enables Uniform Performance View Of Storage
  – How To Exploit Capability?
  – Other Challenges Remain: Security, Coherence, Parallelism
  – “Storage Is a Network Device”
• Storage Federation: Grid View (High-Level) vs Single-System (Low-Level)
  – Grid: GridFTP, NAS, w/ Access-control and Security in Protocol (Performance Challenges)
  – Single system: Secure Single System View, SAN direct access (Security Challenges)
  – Tradeoffs: Performance, Security, and Access Control
• Plentiful Bandwidth enables Proactive Data Management
  – “Waste” storage, bandwidth, and computation to empower applications
  – Drive via models, speculation, application hints, replication and data movement


Storage Challenges (Example)

• Earthscope SAR Application
  – High speed data integration/visualization
  – 32 gigabytes, delivered in less than 0.5 seconds
  – Presumed to be sourced from MANY disks distributed throughout the OptIPuter network
  – How many disks? How many streams? What are the critical performance factors?
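A back-of-the-envelope calculation gives a lower bound on the "how many disks, how many streams" question. The sketch below assumes decimal gigabytes, 10 Gbps per stream (the 10GigE figure used elsewhere in the talk), and a purely hypothetical sustained rate of 50 MB/s per disk; it ignores startup cost, protocol overhead, and variability, so real numbers would be higher.

```python
import math


def required_resources(data_bytes: float, deadline_s: float,
                       link_bps: float, disk_bytes_per_s: float) -> dict:
    """Lower bounds on aggregate bandwidth, links, and disks for one transfer."""
    aggregate_bps = data_bytes * 8 / deadline_s
    return {
        "aggregate_gbps": aggregate_bps / 1e9,
        "links": math.ceil(aggregate_bps / link_bps),
        "disks": math.ceil(data_bytes / deadline_s / disk_bytes_per_s),
    }


if __name__ == "__main__":
    # 32 GB in 0.5 s; 10 Gbps links; 50 MB/s per disk is an assumed figure.
    result = required_resources(32e9, 0.5, 10e9, 50e6)
    print(result)
```

Even before overheads, the target implies hundreds of gigabits per second of aggregate bandwidth, which is why the data is presumed to come from many disks in parallel.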


Parallel Transfer Performance

• Assume physical network no longer the bottleneck
• Access time elements
  – 10 Gbps link: identify, authenticate, connect, xfer data, complete (~33 seconds)
  – 128 x 10 Gbps links (and storage): <same steps, parallel transfer> (~0.75 seconds)
  – ...but disk and network variability + scaling become key issues

[Figure: stacked bar chart, “Contribution to Total Access Time” (0% to 100%), comparing 1 Disk and 128 Disks. Legend: Connection, Security, TCP slow start, Data Transfer. Data labels as extracted: 32, 0.25, 0.32, 0.32, 0.1, 0.1, 0.05, 0.07.]
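The chart's data labels survive only as a flat list, so any pairing of values to phases is a reconstruction. The sketch below assumes the labels appear in reverse legend order (Data Transfer, TCP slow start, Security, Connection), each given for 1 disk and then 128 disks. That assumption is at least consistent with the quoted totals of roughly 33 and 0.75 seconds, but it is not confirmed by the source.

```python
# Per-phase access times in seconds. ASSUMPTION: the value-to-phase pairing
# is inferred from the order of the extracted chart labels, not stated in
# the slides.
PHASES = {
    "1 disk": {
        "connection": 0.05, "security": 0.1,
        "tcp_slow_start": 0.32, "data_transfer": 32.0,
    },
    "128 disks": {
        "connection": 0.07, "security": 0.1,
        "tcp_slow_start": 0.32, "data_transfer": 0.25,
    },
}


def breakdown(phases: dict) -> tuple:
    """Return (total seconds, {phase: percent of total})."""
    total = sum(phases.values())
    shares = {name: 100.0 * seconds / total for name, seconds in phases.items()}
    return total, shares


if __name__ == "__main__":
    for config, phases in PHASES.items():
        total, shares = breakdown(phases)
        detail = ", ".join(f"{name} {pct:.1f}%" for name, pct in shares.items())
        print(f"{config}: {total:.2f} s ({detail})")
```

Under this reading, data transfer dominates the single-disk case, while with 128 parallel links the fixed costs (slow start, security, connection setup) make up most of the access time.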


OptIPuter Software Architecture

• Approach:
  – Leverage advances in Grid Software (e.g. Globus 2.2 and 3.0)
  – Add software/protocols/APIs for managing Lambdas
• Explore what else must/can change
  – To capture the potential of Lambda networks
  – To simplify where it is now possible
  – To deliver higher performance
  – To deliver greater capability


OptIPuter Software Architecture v0.1

• Network λ-configuration enables “virtual computer” view
• OptIPuter middleware technologies expose/exploit unique capabilities based on λs

• Virtual computer abstraction enables challenging, novel applications

[Figure: OptIPuter software architecture diagram. Box labels: OptIPuter Applications; Virtual Computer Abstraction; λ-setup, Mgmt; “Classic” Grid Middleware; Fast Protocols; Real-Time Objects; Security Models; Data Access Protocols; Node Operating System; Network Routers/Switches; Compute/Storage Physical Resources.]


OptIPuter Software Architecture

• Not a strict layering atop Globus
• Some features implemented as new services
  – λ-management and configuration
  – Security configuration services
  – Fast Protocols
  – Real-time objects
• Some features implemented as modifications to
  – Communication: Globus_IO/XIO and network management
  – Resource Management: GRAM/GARA/SNAP
  – Data Movement/Management: GASS/GridFTP/Replication
  – Security: GSI, GSS


OptIPuter SW Research Summary

• Near Term Goals and Activities
  – Define Testbeds and Support Use (HW, node SW, management, level of experimentation)
  – Control Plane Software
  – High Speed IP-based Protocols
  – CS and App teams “meeting of minds”
• Long Term Goals and Activities
  – System Models
    – Novel system abstractions; exploit/expose unique λ-capabilities
  – Component Technologies
    – Communication, Security Models, Data Abstractions, Real-time Objects, λ-management, Virtual Computer configuration management
  – Technical foundation for widespread use and great utility
    – Ex. New capabilities, new models, radical new applications
    – Enable the driving applications (and many others)
    – Demonstrate models for next-generation Distributed E-science