VFAN: Extension of the Flow-Aware Networking (FAN) architecture to the Grid environment. César Cárdenas and Maurice Gagnaire, ENST Paris. 21st DNAC Congress, 14-16 November 2007


Page 1: VFAN: Extension of the Flow-Aware Networking (FAN

VFAN: Extension of the Flow-Aware Networking (FAN) architecture to the Grid environment

César Cárdenas and Maurice Gagnaire

ENST Paris

21st DNAC Congress, 14-16 November 2007

Page 2: VFAN: Extension of the Flow-Aware Networking (FAN

Agenda
• NGI and Grid networks
• Grid networks
• Grid QoS architectures
• Grid middleware
• FAN architectures
• Grid traffic
• GoFAN (GoIP) vs. GoDS
• GoFAN(PFQ) vs. GoFAN(PDRR)
• Conclusions
• Current and future work
• References

Page 3: VFAN: Extension of the Flow-Aware Networking (FAN

NGI and Grid Networks
• Cees de Laat's vision.
• Plus new trends:
  – IPTV
  – Online gaming
  – Internet of things
  – Internet of machines
  – Internet of embedded systems
  – Etc.
• The large installed base of Internet services, equipment, and providers slows network development => GoBE.
• Disruptive technology is difficult to introduce.
• Possible solutions:
  – Adding new features to an existing network.
  – Constructing a new independent network.
  – Realizing new features on a virtual network on top of the existing network (overlay technologies such as Grid and P2P, abstraction/virtualization, Web Services, etc.), then constructing a new independent network.
• Therefore, in the years to come: GoIP and P2PoIP.

Page 4: VFAN: Extension of the Flow-Aware Networking (FAN

Grid Networks
• Definition: large-scale distributed systems with mechanisms for sharing heterogeneous resources under locally defined policies: computing, storage, instruments, sensors, RFID, visualization devices, etc.
• The term “Grid” is an analogy to the power grid; it alludes to a commodity (a basic good).
• Main objective: enable the creation of virtual organizations (VOs) on demand: utility computing, utility storage, virtual laboratories, e-Science, etc.
• Main trend (R&D): network resources and their components as first-class entities (often the most shared resources) => a more complex middleware.


Page 5: VFAN: Extension of the Flow-Aware Networking (FAN

Grid QoS Architectures
• GARA: General-purpose Architecture for Reservation and Allocation.
• NRSE: Network Resource Scheduling Entity.
• G-QoSM: Grid QoS Management.
• GNRB: Grid Network-aware Resource Broker.
• GNB: Grid Network Broker.
• Virtual workspaces (Globus Toolkit project).
• Other approaches (mainly from the network community).

Page 6: VFAN: Extension of the Flow-Aware Networking (FAN

Grid QoS Architectures
• GARA:
  – A layer-3 Grid network services prototype (Bandwidth Broker approach).
  – Provides QoS for different types of resources.
  – Allows advance and immediate QoS reservations (local or remote).
  – Enables easy expansion of the services provided to users.
• Advance QoS reservations:
  – Promise that an application will receive a given level of service from a resource.
  – GARA-API: uniform interface through a simple Resource Specification Language (RSL). Links Grid services with layer-3 services.
  – Can be created (handed to the application), modified, bound/claimed, cancelled, and monitored.
• Resource managers (admission control, configure, monitor, report):
  – Network: DiffServ (DS): EF, TB, PQ, traffic shaping.
  – Real-time CPU scheduling: DSRT.
  – Disk access: DPSS and GRIO.
  – Others can be developed.

Page 7: VFAN: Extension of the Flow-Aware Networking (FAN

Grid QoS Architectures

• GARA limitations:
  – Reservations vs. diversity: local (disk space) vs. distributed (network capacity).
  – For multidomain use, GARA must exist in all the traversed domains => scalability problems because of authentication.

Page 8: VFAN: Extension of the Flow-Aware Networking (FAN

Grid QoS Architectures
• NRSE:
  – Provides DiffServ QoS.
  – But stores per-flow/per-application state only at the end sites involved in the communication.
  – Service requests: dynamic or in advance.
  – Automatically negotiates a multi-domain reservation by communicating with its counterpart on the remote network, on behalf of its user => highly scalable.
  – API not clearly defined => not clear how a Grid application developer would make use of it.
• G-QoSM:
  – Framework to support QoS management in computational Grids in the context of OGSA (Open Grid Services Architecture).
  – Generic modular system.
  – Supports various types of resource QoS.
  – Functions:
    • Resource and service discovery based on QoS properties.
    • Provides QoS guarantees at application, middleware and network levels, and the establishment of SLAs to enforce QoS parameters.
    • Management levels: guaranteed, controlled load, and best effort.
  – Adapts to share capacity between user levels.

Page 9: VFAN: Extension of the Flow-Aware Networking (FAN

Grid QoS Architectures
• GNRB:
  – Grid Resource Broker + Network Resource Manager capabilities.
  – Enables the design and implementation of new mapping/scheduling mechanisms that take into account both network and computational resources.
  – Allows requesting information about the network status and, if necessary, can reserve network resources to satisfy the QoS requirements of applications.
  – The architecture is centralized, with one GNRB per administrative domain => scalability and failure problems.
• GNB:
  – Checks the status of the network and creates both network reservations and service allocations in order to provide network QoS to Grid applications.
  – Relies on basic functions provided by the Grid middleware.
  – Considers DiffServ (best effort or guaranteed).
  – Implements resource allocation and admission control.
  – Needs global knowledge of the network topology.

Page 10: VFAN: Extension of the Flow-Aware Networking (FAN

Grid QoS Architectures
• Virtual workspaces (Globus Toolkit project):
  – Provides very fine-grained reservations of CPU time, disk, and network bandwidth through the use of virtual machines (currently supports Xen) and their tools.
  – Main idea: one reserves resources and runs the jobs on top of them.
  – Currently handles bridging VM NICs to various networks, can manage IP pools, and can deliver networking information to VMs via DHCP.
  – Currently handles multiple partitions, some decompression, and also blank-space creation.
• Other approaches:
  – EMERGE: experimental, DiffServ.
  – UKERNA: experimental, DiffServ and MPLS, multidomain, bandwidth broker approach.
  – Equivalent Differentiated Services (EDS): experimental, layer-4 solution with a new layer-3 relative DiffServ model.
  – QoSinus: programmable network approach.
  – A-Grid: active network adaptation.
  – IntServ – DiffServ: policy-based management and active network.
  – Adaptive network reservation: compared against DiffServ (AF & EF).

Page 11: VFAN: Extension of the Flow-Aware Networking (FAN

Grid Middleware
• Globus Toolkit (GT): the de facto standard for Grid computing.
• Others: Condor, Legion, Unicore, and proprietary solutions.
• GT4: first OGSA-compliant implementation (Web services as well as Grid services based on WSDL and WSRF).
• Some GARA (pre-WS-GRAM) functionalities have been implemented in WS-GRAM (mainly general advance reservation support).

Page 12: VFAN: Extension of the Flow-Aware Networking (FAN

FAN Architectures

Internet traffic tends toward Poisson, independent behavior as the load increases.

Page 13: VFAN: Extension of the Flow-Aware Networking (FAN

FAN Architectures

Page 14: VFAN: Extension of the Flow-Aware Networking (FAN

Grid Traffic
• Wide variety of network QoS needs.
  – More challenging than other network applications (e.g., VoIP).
  – Visualization applications:
    • Low-bandwidth but low-latency control flows.
    • High-bandwidth and low-latency data transfers.
  – Moving large amounts of data for analysis and other purposes:
    • High-bandwidth but not necessarily low-latency data transfers.
• Two basic services:
  – A premium service offering a low-delay virtual leased line.
  – A guaranteed-rate service.
• Classified into:
  – Bulk file transfers (e.g., replication of large databases).
  – Time-sensitive transfers (e.g., real-time applications or interactive web traffic).
• Often initiated by machines (processes) instead of humans (Internet of Grid Resources).
• At the time we did this work we were not aware of any Grid traffic studies, so we assumed that Grid traffic was composed only of GridFTP requests (GoIP).

Page 15: VFAN: Extension of the Flow-Aware Networking (FAN

GridFTP
• Extension of FTP to Grid networks. Currently at version 2.
• The transport protocol in Grids is standard TCP (GridFTP uses default TCP).
• File-transfer-only features:
  – Parallel data transfers (increase throughput):
    • Multiple TCP streams between 2 network endpoints.
  – Striped data transfers (increase throughput):
    • Multiple server instances, at either logical or physical nodes, can be set to work on the same data file, acting as a single FTP server.
    • 1 or more TCP streams between m network endpoints on the sending side and n network endpoints on the receiving side (m and n may differ).
  – Partial file transfer: transfers a portion of a file.
  – Reliable and restartable transfers, including fault recovery methods for handling transient network failures, server outages and so on.
  – Third-party file transfer: a third party transfers files between two servers.
• Performance:
  – GridFTP achieves ~90% utilization of a 30 Gbps link in memory-to-memory transfers.
  – In a disk-to-disk transfer, the throughput was 58.3% over the same link.
• Configuration:
  – It has been shown that 4 to 6 parallel TCP streams are enough to reach between 90% and 95% of throughput using TCP SACK.
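A parallel transfer spreads one file over several TCP streams. The sketch below shows one way to divide a file into near-equal byte ranges, one per stream. The function name `split_into_ranges` is hypothetical and is not part of any GridFTP API; the even split is an assumption made only for illustration.

```python
from typing import List, Tuple


def split_into_ranges(file_size: int, streams: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover file_size bytes with `streams` ranges.

    Hypothetical helper: it only illustrates an even partition of a file
    across parallel streams and does not talk to any GridFTP server.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams must be >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    # 4 streams, in line with the 4 to 6 parallel streams quoted on the slide.
    for offset, length in split_into_ranges(10, 4):
        print(offset, length)
```

Each range could then be handed to its own TCP connection; how the ranges are actually negotiated is protocol-specific and is not shown here.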

Page 16: VFAN: Extension of the Flow-Aware Networking (FAN

GoFAN (PFQ) vs. GoDS
• Mimicking FAN with DS => fair comparison.
  – 1 physical queue with 2 virtual queues each.
  – Strict PQ.
  – TSW2CM policer with CIR = FR, updated at the same time period (100 ms).
  – RED parameters: 0.2 and 0.8 per virtual queue.
  – RIO-D => packet rejection probability is estimated from the individual virtual queue sizes; maximal probability = 0.5, default queue weight = 0.002.
  – Packets that do not meet the CIR are demoted to the second virtual queue (lose priority). In FAN, an accepted flow sending at more than FR is demoted to second priority.
• Packet loss policy: per flow.
• Admission policy: total GridFTP session.
• Partition policy: job sizes are equally divisible per individual TCP flow within a GridFTP session.
• Scheduling policy: per individual flow.
• FR = 0.25 and PL = 0.8.
• FR and PL estimation periods = 100 ms.
• Grid traffic ~ PP(λ), λ ∈ [5, 20] GridFTP sessions per minute.
• Grid traffic is composed only of GridFTP sessions.
• Job sizes ~ exp(·), μ ∈ [100, 500] MB.
• Equal cross traffic => 2λ.
• TCP flows per GridFTP session: fixed during the whole simulation (3 or 9).
• TCP Reno is the variant most used by the Grid community for parallel connections.
• Average TCP packet size: 1000 B.
• Maximum TCP window: 5000 packets.
• Grid metrics and congestion control metrics: average delay and goodput.
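To make the RED settings on this slide concrete, here is a minimal sketch of the classic RED drop-probability curve with the quoted values (thresholds 0.2 and 0.8, maximal probability 0.5, queue weight 0.002). It assumes the thresholds are fractions of the virtual queue capacity and uses the textbook non-gentle RED curve; the ns-2 RIO-D implementation used in the study may differ in detail.

```python
def ewma_queue(avg: float, sample: float, weight: float = 0.002) -> float:
    """Update the exponentially weighted average queue occupancy."""
    return (1.0 - weight) * avg + weight * sample


def red_drop_probability(avg_fill: float,
                         min_th: float = 0.2,
                         max_th: float = 0.8,
                         max_p: float = 0.5) -> float:
    """Classic RED curve on the averaged queue fill (a fraction in [0, 1]).

    Assumption: thresholds are fractions of the queue capacity.
    """
    if avg_fill < min_th:
        return 0.0          # below the low threshold: never drop
    if avg_fill >= max_th:
        return 1.0          # above the high threshold: always drop
    # Linear ramp from 0 to max_p between the two thresholds.
    return max_p * (avg_fill - min_th) / (max_th - min_th)


if __name__ == "__main__":
    avg = 0.0
    for _ in range(1000):
        avg = ewma_queue(avg, 0.9)   # queue held at 90% occupancy
    print(round(avg, 3), red_drop_probability(avg))
```

With a weight of 0.002 the average reacts slowly, so short bursts barely move the drop probability; that is the usual motivation for such a small weight.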

Page 17: VFAN: Extension of the Flow-Aware Networking (FAN

Simulations and Results
• Simulations: ns-2.31 under Debian (kernel 2.6.15) on an Intel Xeon at 3.00 GHz with SMP. Each run simulates one hour (the minimum reservation time in Grids); the first 5 minutes were checked to be the transient period before reaching the equilibrium regime. 30 replications per scenario. Inverse method based on time discretization to generate the Poisson process. Proper selection and configuration of random number generators.
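The traffic model (Poisson session arrivals, exponentially distributed job sizes) can be sketched with inverse-transform sampling. This is a continuous-time illustration only: the study used a time-discretized inverse method inside ns-2, which is not reproduced here. The function names, the seed, and the example parameter values (10 sessions per minute, 300 MB mean size) are assumptions picked from within the ranges quoted in the slides.

```python
import math
import random
from typing import List, Tuple


def exponential_sample(rate: float, rng: random.Random) -> float:
    """Inverse-transform sample of Exp(rate): X = -ln(1 - U) / rate."""
    return -math.log(1.0 - rng.random()) / rate


def generate_sessions(rate_per_min: float,
                      mean_size_mb: float,
                      horizon_min: float,
                      seed: int = 1) -> List[Tuple[float, float]]:
    """Return (arrival_time_min, job_size_mb) pairs for one simulated run."""
    rng = random.Random(seed)
    sessions = []
    t = 0.0
    while True:
        # Exponential inter-arrival times yield a Poisson arrival process.
        t += exponential_sample(rate_per_min, rng)
        if t >= horizon_min:
            break
        size = exponential_sample(1.0 / mean_size_mb, rng)
        sessions.append((t, size))
    return sessions


if __name__ == "__main__":
    run = generate_sessions(rate_per_min=10, mean_size_mb=300, horizon_min=60)
    print(len(run), "sessions in one simulated hour")
```

Discarding arrivals from the first 5 minutes would mirror the transient-removal step described on the slide.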

Page 18: VFAN: Extension of the Flow-Aware Networking (FAN

GoFAN(PFQ) vs. GoFAN(PDRR)

Page 19: VFAN: Extension of the Flow-Aware Networking (FAN

GoFAN(PFQ) vs. GoFAN(PDRR)

Page 20: VFAN: Extension of the Flow-Aware Networking (FAN

Conclusions
• In general, FAN outperforms DS.
• FAN(PFQ) outperforms FAN(PDRR), although its computational complexity is higher.
• Compared with a packet-based router, FAN offers enhanced performance in terms of packet processing (tested on the IXP2800).
• Flow-level bandwidth guarantees have been shown to be achievable with admission control, along with an order-of-magnitude improvement in jitter and latency of individual flows in cluster networks.
• FAN approaches have been patented, implemented and commercialized (CASPIAN and ANAGRAN).
• We evaluated the advantages of FAN over DS in a Grid environment.
• With FAN, admission decisions become network-aware and make the network a first-class resource.
• Flow-based admission decisions rely on real-time measurement of network performance, so network resources are allocated according to the current network state.
• PFL can be used for advance reservations.

Page 21: VFAN: Extension of the Flow-Aware Networking (FAN

Internet and Grid Traffic over Multilayer-FAN (IGoMFAN)

MFAN is the "dual-core" version of FAN!!!

Page 22: VFAN: Extension of the Flow-Aware Networking (FAN

Current and Future Research
• Fairness comparison of FAN architectures under GridFTP traffic.
• IGoMFAN (previous slide).
• VFAN and VMFAN with network, CPU and storage resources.
• Grid traffic.
  – Model and/or real traces.
  – Advance reservations.
• Optimal admission control policies for FAN, VFAN and VMFAN.

Page 23: VFAN: Extension of the Flow-Aware Networking (FAN

References
• Cárdenas, Gagnaire, “Performance comparison of Flow-Aware Networking (FAN) architectures under GridFTP traffic”. 23rd ACM SAC 2008 (SIGAPP). Accepted.
• Cárdenas, Gagnaire, López, Aracil, “Performance Evaluation of the Flow-Aware Networking (FAN) architecture under Grid environment”. 20th IEEE/IFIP NOMS 2008. Notification: December 1, 2007.
• Cárdenas, Gagnaire, López, Aracil, “Admission control for Grid services in IP networks”. 1st IEEE/ANTS 2007. Accepted.
• López, Cárdenas, Hernández, Aracil, Gagnaire, “Extension of the Flow-Aware Networking (FAN) architecture to the IP over WDM environment”. 4th QoS-IP 2008. Accepted.
• Cárdenas, “Optimal Admission Control Policy of Flow-Aware Networking (FAN) Architectures”. Conférence de la SMAI sur l’optimisation et la décision 2007. Accepted (poster).
• Cárdenas, “Open Problems on Optimal Admission Control for Flow-Aware Networking (FAN) Architectures”. Submitted to Annals of Operations Research, Special Issue on Stochastic Performance Models for Resource Allocation in Communication Systems, 2007.
• Cárdenas, “Optimal Admission Control Policy of Flow-Aware Networking Architectures (FAN): A Problem Approximation Approach”. Submitted to ITC 20.

Page 24: VFAN: Extension of the Flow-Aware Networking (FAN

Many Thanks!

Questions?

Page 25: VFAN: Extension of the Flow-Aware Networking (FAN

Related Concepts
• Grids vs. P2P: both are concerned with the organization of resource sharing within VOs. Both take an overlay approach. They differ in target communities, resources, applications, scale and failures, services and infrastructure. Grids allow many-to-many sharing of more than file resources and focus on aggregating distributed high-end machines (clusters). P2P focuses on sharing low-end systems (PCs).
• Grids vs. Web Services: WS are used in Grids, but WS are not Grids: they support access to distributed resources but not their coordination. WS focus on communication; Grids enable collaborative resource interplay toward common goals. WS provide a standard infrastructure for data exchange between two different distributed applications; Grids provide infrastructure for aggregating high-end resources to solve large-scale problems.

Page 26: VFAN: Extension of the Flow-Aware Networking (FAN

Related Concepts
• Grids vs. Clusters: clusters use a centralized resource manager and scheduling system, and all nodes work as a single unified resource. In Grids, each node has its own resource manager, and the Grid does not aim at providing a single system view.
• Grids vs. Virtualization: resource virtualization is the abstraction of resources to make them available dynamically for sharing (inside and outside); it offers customers the opportunity to build an IT infrastructure without constraints; focuses on local resources; solves the problem of dedicated resources in a data center and of lack of granularity; offers a way of moving resources from one application to another dynamically; separates logical functions from physical ones; virtualizes a single system. Grids enable virtualization of broad-scale and disparate IT resources.
• The best approach depends on the organizational problem!!!

Page 27: VFAN: Extension of the Flow-Aware Networking (FAN

IP-QoS Architectures

• Overprovisioning of IP Best Effort (rule of thumb): link underutilization, uncontrolled network availability, unacceptable packet delays, congestion control based on user cooperation, etc.
• IntServ: scalability problems, etc.
• DiffServ: effective only during overload conditions. Requires complicated traffic engineering to be efficient, etc.
• Best Effort, IntServ and DiffServ do not take into account the probabilistic relation between demand, performance and capacity because they are based on practical solutions rather than on mathematical models.

[Figure: IP-QoS architectures (Best Effort, DiffServ, IntServ, FAN) placed on a granularity axis (labels: Packets, Packet Classes, Classes, Flow Classes, Flows) against a complexity axis ($, management).]

Page 28: VFAN: Extension of the Flow-Aware Networking (FAN

FAN Architectures
• FAN, F. Kelly (2000)
  – Self-managed Internet.
  – Avoids resource reservation and service classes.
  – Uses Explicit Congestion Notification (ECN) with utility cost.
  – Users can modulate their bill by reacting to marks, or by ignoring marks and paying for the advantages received.
  – No signaling needed.
  – No need to distinguish real-time and data traffic.
  – Minimizes the delay by means of a virtual queue in ECN marking.
  – Very complex pricing scheme.
• FAN-FT-1G, Jim Roberts et al. (2000)
  – Flows identified on the fly.
  – Packets are marked.
  – Real-time traffic is policed and prioritized.
  – Service differentiation (two classes).
  – Flow-aware admission control and fair scheduling.
  – Vulnerable to user misbehavior; measurement implementation problems.
• Caspian Networks (2002)
  – Flows identified on the fly.
  – Assigns QoS parameters and a route per flow (based on SLA and client specifications).
  – Route calculations based on QoS parameters and network resource utilization.
  – Traffic classification.
  – Scheduling at the input (policing) and at the output (per-flow WFQ).
  – Admission control is applied in case of route congestion.
  – Proprietary methods for admission control; WFQ needs to know specific per-flow parameters.

Internet traffic tends toward Poisson, independent behavior as the load increases.

Page 29: VFAN: Extension of the Flow-Aware Networking (FAN

Cross-Protect (2G-FAN)
• Based on a TRANSPARENCY concept:
  – Delay conservation => minimal delay and losses for streaming flows.
  – Throughput conservation => satisfactory throughput for elastic flows.
• Able to offer adequate quality of service to both elastic and streaming flows.
• Increases network efficiency.
• Avoids relying on users correctly implementing TCP to ensure fair bandwidth sharing.
• Requires no change to existing protocols and no new protocols.
• Performs implicit differentiation (no packet marking, no resource reservation).
  – Flows emitting at less than the current fair rate are given priority.
• Takes into account the statistical nature of IP traffic and its impact on network performance.
• Simple and robust.

[Figure: Cross-Protect router view (admission control, switching, scheduling, estimators with exponential filters) and single-interface view (flow arrivals, incoming flows, accepted flows, accepted flows from another interface, accepted flows to another interface, served flows).]

Page 30: VFAN: Extension of the Flow-Aware Networking (FAN

Cross-Protect (2G-FAN)
• Traffic controls are applied at flow level, while traffic flows are implicitly classified into only two broad categories (streaming and elastic flows):
  – Per-flow fair queuing (PFQ, PDRR):
    • Ensures that link bandwidth is shared equitably between contending flows.
    • Guarantees low packet delay and loss for real-time flows (whose rate is less than the fair rate) on very high capacity links (2.5 Gbps and above).
  – Per-flow admission control (simple threshold AC policy):
    • Ensures the scheduler performs correctly even under heavy traffic (utilization greater than 90%) by maintaining the fair rate above a minimum threshold.
  – Per-flow routing:
    • Adaptive routing at flow level would not suffer from the instability observed when routing shifts traffic aggregates according to periodically updated accessibility metrics (Bonald et al. 1999). Routing flows, rather than packets or aggregates of flows, allows the application of techniques already perfected in the telephone network and appears to be the more stable and controllable alternative (Roberts et al. 00).

Page 31: VFAN: Extension of the Flow-Aware Networking (FAN

Cross-Protect 1 (PFQ)
• Complexity O(log(N)).
• PFQ = Start-Time Fair Queuing + Priority Queue.

[Figure: PFQ flowchart (G/M/1 - PFQ model) with AD, RD and FwD blocks. Recoverable elements: estimators FR and PL; lookup of the flow ID in PFL[IDf, IPs, IPd, idle-period_f]; admission test FR > U1, PL < U2 leading to Admit or Reject; insertion of the flow ID into the PFL with idle period 0 when P ≤ Up; test IPd = IPloc before forwarding; queue database QDB[IDf, Fs, ML, time_stamp, flow_time_stamp, backlog, bytes, virtual_time] kept in decreasing order of time_stamp; tests Bytes < MTU and IDf ∈ AFL; update backlog(F) += L; when the queue is congested, reject a packet (i.e. at the head of the longest backlog).]

Page 32: VFAN: Extension of the Flow-Aware Networking (FAN

Cross-Protect 2 (PDRR)
• Similar performance to PFQ (Kortebi et al. 2005).
• Complexity of O(1).
• Similar scheduler in new Cisco routers (possibility of commercial success).

[Figure: PDRR flowchart (M/M/1 - PDRR model) with AD, RD and FwD blocks. Recoverable elements: estimators FR and PL; lookup of the flow ID in PFL[IDf, IPs, IPd]; admission test FR > U1, PL < U2 leading to Admit or Reject; insertion of the flow ID into the PFL if P ≤ Up; test IPd = IPloc before forwarding; active flow list AFL[IDf, DC_i, Q_i, Pointers, ByteCount(i)]; test IDf ∈ AFL; InsertActiveList(i), DC_i = 0, ByteCount_i = size(p); ByteCount_i += size(p); test ByteCount_i ≤ Q_i; Enqueue(PQ, p) to the FIFO priority queue or Enqueue(Queue_i, p) to the DRR queue; when the queue is congested, reject a packet (i.e. at the head of the longest backlog).]
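The enqueue side of PDRR can be sketched from the labels that survive in the flowchart (IDf ∈ AFL, InsertActiveList, ByteCount_i ≤ Q_i, Enqueue(PQ, p), Enqueue(Queue_i, p)). The branch order below is an assumption reconstructed from those labels, and the class name `PdrrEnqueue` is hypothetical. The dequeue side (deficit counters, removal of idle flows from the active list) is deliberately left out.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Tuple


class PdrrEnqueue:
    """Enqueue-side sketch of priority DRR; the dequeue logic is not modeled."""

    def __init__(self, quantum: int) -> None:
        self.quantum = quantum                                    # Q_i, same for all flows here
        self.byte_count: Dict[Hashable, int] = {}                 # ByteCount_i per active flow
        self.flow_queues: Dict[Hashable, Deque[int]] = {}         # per-flow DRR queues
        self.priority_queue: Deque[Tuple[Hashable, int]] = deque()  # shared FIFO (PQ)

    def enqueue(self, flow_id: Hashable, size: int) -> str:
        """Place one packet and report which queue received it."""
        if flow_id not in self.byte_count:
            # New flow: insert it in the active list and serve its first packet with priority.
            self.byte_count[flow_id] = size
            self.flow_queues[flow_id] = deque()
            self.priority_queue.append((flow_id, size))
            return "priority"
        self.byte_count[flow_id] += size
        if self.byte_count[flow_id] <= self.quantum:
            # Flow has sent no more than one quantum so far: still treated as priority.
            self.priority_queue.append((flow_id, size))
            return "priority"
        # Backlogged flow: its packet waits in its own DRR queue.
        self.flow_queues[flow_id].append(size)
        return "drr"


if __name__ == "__main__":
    scheduler = PdrrEnqueue(quantum=1500)
    for packet in [("a", 1000), ("a", 500), ("a", 100), ("b", 1500)]:
        print(packet, scheduler.enqueue(*packet))
```

Flows that send little between rounds keep landing in the priority queue, which is how low-rate streaming flows get low delay without any packet marking.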

Page 33: VFAN: Extension of the Flow-Aware Networking (FAN

• Estimators
  – Service-rate type estimators => throughput estimators or quality-of-service estimators.
  – Both estimators are the same for the PFQ and PDRR queuing systems.
  – Fair Rate:
    • Flow-rate estimator.
    • Attained rate of a virtual flow continuously backlogged during an interval.
    • Related to available bandwidth => FRmax = C.
    • Sampling interval of hundreds of milliseconds (TFR).
  – Priority Load:
    • Packet-rate estimator.
    • Number of bytes emitted by the priority queue.
    • Related to packet delay.
    • PLmax = C.
    • Sampling interval of tens of milliseconds (TPL).
• Simple AC policy:
  – Reject all flows under the same conditions.

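Current MBAC

$$FR = \frac{\max\{S_T\,C,\ (FB(t_2^{FR}) - FB(t_1^{FR})) \cdot 8\}}{t_2^{FR} - t_1^{FR}}$$

$$PL = \frac{(PB(t_2^{PL}) - PB(t_1^{PL})) \cdot 8}{(t_2^{PL} - t_1^{PL})\,C}$$

$$\text{Admit} = \begin{cases} \text{Yes} & FR \geq FR_{\min},\ PL \leq PL_{th} \\ \text{No} & \text{otherwise} \end{cases}$$

[Figure: admission region in the plane of FR and PL, bounded by FRmin and PLth, with both axes extending to C.]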
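The two estimators and the simple threshold policy can be sketched as follows. The fair rate takes the larger of "idle time times capacity" and "bits a continuously backlogged virtual flow could have sent" over the sampling interval; the priority load is the bits emitted by the priority queue divided by what the link could carry in the interval. Reading S_T as the total idle time in the interval and FB as the virtual flow's byte counter is an assumption, as are the function names and the non-strict comparisons in `admit`.

```python
def fair_rate(idle_time_s: float, capacity_bps: float,
              fb_t1: float, fb_t2: float, t1: float, t2: float) -> float:
    """Fair rate in bit/s over [t1, t2]; FB counters are in bytes."""
    virtual_flow_bits = (fb_t2 - fb_t1) * 8
    return max(idle_time_s * capacity_bps, virtual_flow_bits) / (t2 - t1)


def priority_load(pb_t1: float, pb_t2: float,
                  t1: float, t2: float, capacity_bps: float) -> float:
    """Priority load over [t1, t2] as a fraction of capacity; PB counters are in bytes."""
    return (pb_t2 - pb_t1) * 8 / ((t2 - t1) * capacity_bps)


def admit(fr: float, pl: float, fr_min: float, pl_th: float) -> bool:
    """Simple threshold policy: accept new flows only inside the admission region."""
    return fr >= fr_min and pl <= pl_th


if __name__ == "__main__":
    C = 100e6  # assumed 100 Mbit/s link, chosen only for the example
    fr = fair_rate(idle_time_s=0.02, capacity_bps=C,
                   fb_t1=0, fb_t2=50_000, t1=0.0, t2=0.1)
    pl = priority_load(pb_t1=0, pb_t2=500_000, t1=0.0, t2=0.1, capacity_bps=C)
    # Thresholds expressed as fractions of C, as in FR = 0.25 and PL = 0.8.
    print(fr / C, pl, admit(fr, pl, fr_min=0.25 * C, pl_th=0.8))
```

Because every new flow is tested against the same two thresholds, the policy needs no per-flow traffic descriptor, which matches the "reject all flows under the same conditions" statement.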