Bivio 2000 System Performance

WHITE PAPER

Uncompromising performance. Unmatched flexibility.

2006 PRODUCT EXCELLENCE AWARDS

System Performance Overview

The most innovative and fastest growing network applications in the areas of security, VoIP (voice

over IP), multimedia, wireless and IPv6 are increasingly being implemented as software applications

running on “network appliance” hardware, signaling a major change in the way network services are

implemented. Many companies in networking are moving away from the hardware-centric product

development philosophy that dominated in the past. Software has become a key service enabler, and

as such the key core expertise for companies seeking dominance in the future network infrastructure.

However, these software applications require the underlying hardware platform to deliver on a

variety of challenging performance requirements:

• Scalable Processing Power: Emerging networking applications require very high computing

capacity to execute sophisticated packet handling, pattern matching and payload inspection

functions that are key enablers for advanced network services. To avoid processing bottlenecks, it is

imperative for next generation network appliances to feature scalable multi-processor architectures

that can be tailored to suit evolving applications’ needs.

• Flexibility: The onslaught of new services, ever changing

standards, protocols and security threats have turned

change into the only constant in today’s networking

landscape. Flexibility through total programmability

is indispensable to deliver on the requirements of the

emerging infrastructure, and the only way to cost-effectively

deliver on new services and the long-standing promise of

service convergence.

• Multi-Gigabit Throughput: Network traffic continues

to grow at exponential rates, thus the more traditional

networking requirement of delivering on multi-gigabit

speeds and feeds as well as deterministic QoS (quality of

service) in real-time remains a top priority in network system design.

Performance discussions in networking have traditionally focused on throughput, latency and packet

loss criteria, but it is becoming clear that this is too narrow a definition of performance when evaluating

the suitability of network appliance architectures for a particular application.

While network end users may very well continue to measure network performance by the usual

throughput-latency-loss criteria, network application developers need to define the capability set of

the network appliance that will run their software with more holistic system design criteria in mind,

and also consider computing processing power and overall architectural flexibility. Otherwise, several

functional and performance bottlenecks will be revealed in service deployment that will limit market

acceptance of the combined software/hardware system solution.

Bivio 2000 System Architecture

The Bivio 2000 Multi-Gigabit Network Appliance sets the performance standard among network

appliances optimized for the requirements of emerging network applications, which require a

combination of deep packet inspection and multi-gigabit throughput. The fundamental system

philosophy of the Bivio 2000 is to strive to operate as an ideal “intelligent wire”, offering a seamless

cut-through path to network traffic.

The system architecture of the Bivio 2000 is shown in Figure 1. As traffic flows through the Bivio 2000,

it traverses the following building blocks:

• Network Interface Module (NIM)

• Network Processing Card (NPC)

• Switched Stack System Interconnect (SSI)

• Application Processing Card (APC)

Figure 1: Bivio 2000 Basic Architecture.

The following paragraphs provide a functional overview of these building blocks.

Network Interface Module (NIM)

The Network Interface Modules (NIMs) provide external network connectivity and convert the external

interfaces to the internal serial packet interface (SPI) supported towards the NPC. The basic

building block of a Bivio 2000 system is a 2U chassis that accommodates two NIMs. Each NIM has a

dedicated 2.1 Gbps SPI interface to forward ingress traffic into the NPC, and receives egress traffic from

the NPC over the same SPI interface, which is in turn forwarded to the outgoing user interfaces. Figure 1

shows the NIMs in “unfolded” form, representing ingress and egress path separately, to clearly illustrate

the system’s traffic flow. As Figure 2 indicates, currently available NIMs can oversubscribe per-slot

ingress bandwidth by a factor of 2:1 in order to ensure that system resources can be efficiently utilized

for applications which tolerate or even demand internal resource oversubscription. Consequently,

the two ingress NIMs can forward a sustained aggregate load of 4.2 Gbps to the Network Processor

subsystem. The transient maximum offered load to the system can reach up to 8 Gbps given the

current generation of 4-port GE network interface modules. Naturally, NIMs do offer local buffering and

traffic management capabilities to support transient bursts.
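The oversubscription arithmetic above can be verified in a few lines. This is a sketch using only the figures quoted in the text; the 2:1 factor is the rounded per-slot ratio of 4 Gbps of GE ports to the 2.1 Gbps SPI link:

```python
# Figures taken from the text: two NIMs per 2U chassis, 4 x 1GE ports per NIM,
# and a dedicated 2.1 Gbps SPI link from each NIM slot into the NPC.
NIMS_PER_CHASSIS = 2
PORTS_PER_NIM = 4
PORT_GBPS = 1.0
SPI_GBPS = 2.1

max_offered = NIMS_PER_CHASSIS * PORTS_PER_NIM * PORT_GBPS  # transient maximum
sustained = NIMS_PER_CHASSIS * SPI_GBPS                     # sustained aggregate
per_slot_ratio = (PORTS_PER_NIM * PORT_GBPS) / SPI_GBPS     # per-slot oversubscription

print(max_offered)               # 8.0 Gbps transient maximum offered load
print(sustained)                 # 4.2 Gbps sustained aggregate into the NPC
print(round(per_slot_ratio, 2))  # 1.9, i.e. roughly the quoted 2:1
```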

Network Processing Card (NPC)

For any incoming packet, the NPC determines both the internal system destination and the packet’s

forwarding priority in real time. As Figure 1 indicates, the NPC can be thought of as an intelligent

switch that provides truly non-blocking connectivity between the following interfaces:

• 2 x 2.1 Gbps ingress interfaces, one from each NIM.

• A full-duplex 10 Gbps interface towards the application processing layer or towards other NPCs

through the Switched Stack System Interconnect (SSI).

• 2 x 2.1 Gbps egress interfaces, one towards each outgoing NIM.

Figure 2: Network Interface Module (NIM) internals (four 1GE MACs with buffering and interface conversion onto the 2.1 Gbps SPI link, plus an optional failover module).

Figure 3: NPC Overview.

The NPC architecture allows the Bivio 2000 to

ensure rigorous deterministic bandwidth, delay,

and jitter guarantees for all packet forwarding

plane functions.

The NPC implements an entirely programmable

packet forwarding plane, and delivers on all real-

time data path tasks such as buffer management,

traffic shaping, data modification, and policing.

It performs these tasks at full wire speed while

executing complex pattern or signature

recognition within the header or payload of

the packet. Typical packet forwarding decisions

executed by the NPC include:

• Forward a packet to one of 8k egress queues

that will schedule the packet delivery to the

outgoing NIM interface based on configurable

QoS criteria.

• Forward a packet to one of 8k queues towards

the application processing layer using the SSI;

and for this purpose the NPC also implements

the load sharing algorithm towards the fully parallelized PowerPC subsystems in one or more APCs.

• Broadcast a packet to several egress queues towards an outgoing NIM, or towards the application

processing layer, or any configurable combination thereof.

• Block/drop a packet.
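The four decision types above can be sketched as a first-match classifier. The model below is illustrative Python only: the 8k queue count comes from the text, but the match rules, DSCP threshold, and flow encoding are assumptions, not Bivio's actual forwarding microcode:

```python
import zlib
from dataclasses import dataclass

NUM_QUEUES = 8192  # "8k" egress and application-layer queues, per the text

@dataclass
class Packet:
    flow: tuple          # e.g. (src_ip, dst_ip, proto) as ints
    dscp: int = 0
    payload: bytes = b""

def flow_queue(pkt: Packet) -> int:
    # Stable flow hash selects one of the 8k queues towards the APCs
    return zlib.crc32(repr(pkt.flow).encode()) % NUM_QUEUES

def npc_decide(pkt: Packet, blocklist: set):
    """Illustrative first-match decision mirroring the four bullet points:
    drop, broadcast, QoS-based egress queue, or load-shared queue to the APCs."""
    if pkt.flow in blocklist:
        return ("drop",)
    if pkt.flow[1] == 0xFFFFFFFF:                 # broadcast destination
        return ("broadcast", ["egress", "apc"])
    if pkt.dscp >= 46:                            # expedited-forwarding class
        return ("egress", pkt.dscp % NUM_QUEUES)  # priority egress queue
    return ("to_apc", flow_queue(pkt))            # default: application layer
```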

All the tables and instructions that control forwarding, classification, traffic management and packet

modification within the NPC can be dynamically re-programmed within the system at any point in

time. This powerful system capability can be exploited to accelerate the packet forwarding plane

for any particular application, since it offers the capability to dynamically maximize the “cut-through

match” probability, and consequently allows the system to dramatically increase throughput and

minimize delay and jitter. This architectural capability is what allows the Bivio 2000 to continuously

adapt in an attempt to emulate the ideal system behavior of “intelligent multi-gigabit wire”. The

resulting architectural flexibility also allows the Bivio 2000 to continuously integrate new network

protocols and services.

Switched Stack System Interconnect (SSI)

The SSI is an internal full-duplex 10 Gbps system interface. A 4x4 SSI switch implements the Bivio

2000 “backplane” function, providing connectivity between the following subsystems, as illustrated in

Figure 1:

• NPC to APC intra-system connectivity, each connected to one 5 Gbps SSI port.

• Inter-system connectivity between individual Bivio 2000 2U chassis in order to build large virtual

system architectures that Bivio literature refers to as “virtual racks” or “stacks”.

The SSI is designed to provide reliable and non-blocking system connectivity between up to four

NPCs and any number of load-sharing APCs, resulting in a highly scalable system architecture

that can in theory consist of up to 256 individual Bivio 2000 2U systems.

Application Processing Card (APC)

The APC acts as the primary host for the Linux networking application(s). One APC hosts four fully

parallelized PowerPC-based CPU subsystems, each running its own individual Linux execution

environment. Each CPU subsystem features its own independent DDR400 memory subsystem and a

PCI/PCI-X expansion slot that can host hardware acceleration daughter boards.

(Figure 3 depicts the NPC data path: SAR and classification, traffic management with scheduling, QoS and CoS, packet modification, and load sharing, multicasting and broadcasting between the 4.2 Gbps data interfaces and the 10 Gbps SSI.)

By implementing a fully parallelized

architecture, the Bivio 2000 overcomes

bottlenecks that typically plague

network application performance.

Since many network applications are

limited by memory access, and not

necessarily by CPU-power, traditional

symmetrical multiprocessing

architectures that perform so well for

CPU-intensive commercial computing

applications fail to provide similar

performance benefits in networking,

since the CPUs share access to the

same bandwidth-limited system

memory resource.

With the Bivio 2000, application developers can tailor the system architecture to achieve any

performance objectives. Sufficient CPU power or cumulative memory bandwidth can be achieved by

assigning enough parallel CPU subsystems to the task; furthermore, the load-sharing algorithms can

be tailored to optimally suit the particular application requirements.

The contribution of the APC towards system performance does not necessarily lie in the fact that it

hosts four CPU subsystems that each deliver over 3,000 MIPS of raw processing power, but rather

in the APC subsystem’s ability to provide seamless scalability. This characteristic allows it to overcome

any potential performance bottlenecks at the application layer – be it CPU power or memory access

– through elegant resource parallelization and optimized load-sharing.
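The load-sharing idea can be sketched minimally as hash-based flow pinning. This policy is an assumption for illustration; the document does not specify Bivio's actual load-sharing algorithm:

```python
import zlib

def assign_cpu(flow_id: bytes, num_cpus: int) -> int:
    """Pin each flow to one CPU subsystem so that per-flow state lives in that
    CPU's private DDR memory, avoiding shared-memory contention."""
    return zlib.crc32(flow_id) % num_cpus

# One APC = four parallel PowerPC subsystems; adding APCs simply raises
# num_cpus, scaling aggregate memory bandwidth along with CPU power.
flows = [f"10.0.0.{i}->10.0.1.{i % 16}".encode() for i in range(1000)]
counts = [0] * 4
for f in flows:
    counts[assign_cpu(f, 4)] += 1
```

Because the assignment is deterministic per flow, each CPU keeps exclusive access to its own flow state while the aggregate load spreads across all subsystems.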

For accuracy, it should be noted that the NPC card itself also hosts two identical Application

Processing CPU subsystems; however, for the purposes of this white paper a deliberately abstracted

functional system architecture is used for clarity. Suffice it to say that this simply results in additional

CPU horsepower and memory bandwidth residing in a 2U Bivio 2000 system populated with an NPC

and APC card.

Bivio Software Architecture

The software architecture and its contribution to system performance are discussed in the Bivio

2000 Network Appliance white paper. Nevertheless, any discussion of Bivio 2000 system performance

has to highlight the fact that, wherever network services are implemented as software applications (as

is increasingly the case in networking), it is myopic to focus exclusively on the hardware architecture

and its performance while taking optimal software behavior as a given. While software engineering

has always been a critical tool for network performance, as any IP routing table convergence

discussion shows, with critical network services increasingly being implemented in

software it is becoming of paramount importance. In the Bivio 2000, the following software

topics are of particular relevance:

• Linux network drivers: Bivio has developed network drivers designed to accelerate a variety of

network appliance system primitives. The API (application programming interface) allows application

developers to take full advantage of the Bivio system performance benefits. However, it must

be pointed out that standard Linux applications experience acceleration even without taking

advantage of specific Bivio API features.

• Linux application: The software application implementing the network service itself has to

be designed with performance in mind. While Bivio system performance is consistently high

irrespective of the path packets take through the system, the single most effective way to

dramatically accelerate any application is to take advantage of the network forwarding cut-through

path in the NPC as often as possible. That way, the Bivio 2000 will approximate the “intelligent wire”

performance ideal.
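One common way to exploit such a cut-through path is a flow-offload pattern: fully inspect the first packets of each flow in software, then program the forwarding plane to cut through the remainder. The sketch below illustrates the pattern only; the `install_cut_through` hook is a hypothetical stand-in, since the actual Bivio driver call is not documented here:

```python
INSPECT_PACKETS = 10  # packets examined in software before offloading a flow

class FlowOffloader:
    """Illustrative flow-offload pattern for a deep-inspection application."""

    def __init__(self, install_cut_through):
        # install_cut_through is a placeholder for the (hypothetical) call
        # that programs an NPC cut-through rule for a given flow.
        self.install_cut_through = install_cut_through
        self.packets_seen = {}
        self.offloaded = set()

    def handle(self, flow_id):
        if flow_id in self.offloaded:
            return "cut-through"  # handled in hardware; software never sees these
        n = self.packets_seen.get(flow_id, 0) + 1
        self.packets_seen[flow_id] = n
        if n == INSPECT_PACKETS:  # flow vetted: offload it
            self.offloaded.add(flow_id)
            self.install_cut_through(flow_id)
        return "inspect"
```

The more flows an application can vet and offload this way, the more of its traffic runs at forwarding-plane rather than application-plane speed.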

Figure 4: APC Subsystem Overview (four PowerPC application processors, each with its own Linux execution environment, CPU controller, DDR400 memory and PCI-X extension, connected to the 10 Gbps SSI).


Bivio 2000 System Performance Benchmarks

For any network system architecture, the basic test configuration is outlined in Figure 5. The offered

load into the ingress interfaces is gradually

increased, and the maximum forwarding rate

is established at the point where the system begins to discard

traffic. When testing a network appliance, one

must choose challenging real world traffic

patterns and system configurations that are

truly representative of the environment the

system ultimately targets. But while many

specialized network system architectures can

afford to target very specific performance

criteria, a platform as universal as the Bivio

2000 network appliance must provide the

architectural flexibility to deliver deterministic

system performance for a wide variety

of potential applications and network

environments.

Previous sections established that the Bivio 2000 application processing plane gracefully scales

to eliminate processing power and memory bandwidth bottlenecks. Most would readily

agree that up to 256 APCs providing over 3 million MIPS and virtually unlimited memory

bandwidth represent a very powerful computing grid capable of taking on any network application

at multiples of any existing interface speed. Therefore – as Figure 5 shows – when discussing Bivio

2000 performance, in most cases it is permissible to idealize the Application Processing subsystem

as a system path that always provides a predictable 4.2 Gbps pipe, albeit one with variable latency

depending on the application’s processing complexity. That means the entire maximum forwarding

rate of 4.2 Gbps that the NIMs can offer can be sent to the Application Processing layer without

any internal system bandwidth bottlenecks, and subsequently be received back by the Network

Processing layer and forwarded to the egress NIMs. The following sections provide throughput and

latency data for traffic traversing the Bivio 2000 through the APC subsystem path unless specifically

mentioned otherwise.

Bivio 2000 Maximum Throughput

As the preceding architectural discussion showed,

the Bivio 2000 should and indeed does provide

a sustained maximum system throughput rate

of 4.2 Gbps per 2U stand-alone system for real

world traffic patterns. The Bivio 2000 supports

2.1 Gbps per NIM slot with currently available

NIMs. Figure 6 illustrates the linear, deterministic

throughput behavior of the Bivio 2000 system.

Since the maximum offered load (MOL) that the 8

GE interfaces can provide can reach up to 8 Gbps

of aggregate throughput, application “goodput”

starts to decrease linearly once the

offered load exceeds 4.2 Gbps. At 8 Gbps MOL, system

goodput decreases to 52.5%, and the remaining

47.5% of traffic is dropped. In summary, real

world traffic patterns – which include correlated

traffic streams, and not just statistical Bernoulli traffic patterns – are supported at a fully deterministic

aggregate throughput rate of 4.2 Gbps.
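The goodput curve described above follows directly from the 4.2 Gbps sustained ceiling. The sketch below simply reproduces the quoted numbers:

```python
SUSTAINED_GBPS = 4.2  # maximum sustained aggregate throughput per 2U system

def goodput_fraction(offered_gbps: float) -> float:
    """Fraction of offered load delivered: below the ceiling nothing is lost;
    above it, goodput falls off as ceiling / offered."""
    if offered_gbps <= SUSTAINED_GBPS:
        return 1.0
    return SUSTAINED_GBPS / offered_gbps

print(round(goodput_fraction(8.0) * 100, 1))        # 52.5 (% goodput at 8 Gbps MOL)
print(round((1 - goodput_fraction(8.0)) * 100, 1))  # 47.5 (% of traffic dropped)
```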

The robust CoS and QoS mechanisms implemented by the Bivio 2000 give system architects the

tools to assign scheduling priorities on a per class or per flow basis. That way, the available bandwidth

Figure 5: Benchmarking setup and simplified performance model (an 8 x 1GE tester drives the two ingress NIMs at 2.1 Gbps per slot; the network processing layer exchanges 4.2 Gbps with the application processing layer and forwards traffic back out through the egress NIMs).


pool can be allocated in a controlled way, and protecting certain traffic classes against loss within the

Bivio 2000 is a straightforward exercise in CoS parameter definition.

The Bivio 2000 tolerates short traffic bursts that can significantly exceed 4.2 Gbps, but it would be

misleading to quote figures higher than that as the sustainable throughput rate for the Bivio 2000,

despite lax standards in that regard being somewhat commonplace within the networking industry.

NPC scaling introduces potential blocking behavior when more than four NPCs are active in the

stack. With two NPCs active in a stack, the aggregate throughput of the system doubles to 8.4 Gbps,

since the SSI easily maintains its truly non-blocking switching characteristic. Stacks containing three

or four active NPCs can display marginal blocking behavior with typical real world traffic distributions,

even though under perfectly even traffic distribution among four systems the system is still non-

blocking in theory. Only stacks hosting more than four active NPCs are blocking. In summary, NPC

scaling enables system configurations of up to 16 Gbps of combined real world aggregate throughput.

Bivio 2000 Latency

Early in the discussion of the Bivio 2000

architecture, the system philosophy of striving to

behave as an “intelligent wire” was mentioned.

System latency obviously is the system

performance parameter that will benefit the most

from such a design philosophy, and indeed the

Bivio 2000 displays very deterministic and thus

class-leading latency behavior.

It is important to note that, since not all

applications will benefit from low system latency,

this is one parameter – along with loss – that is

typically relevant on a per CoS basis. Figure 7 is

best understood as the typical range of system

latency observed as real world traffic traverses the

Bivio architecture, since without CoS configuration

traffic will tend to experience an average latency between the priority and best effort traffic results.

On a single 2U system, even at loads approximating 4 Gbps of typical real world traffic while running

a compute-intensive application (such as intrusion prevention), the highest average system

latency will not exceed approximately 200 microseconds. In fact, most tests show the Bivio 2000 to

consistently operate with an average system latency of 60 microseconds even under high loads. It

should be re-emphasized that this represents the end-to-end system latency, from ingress to egress GE

port on the NIM, and through the NPC to the SSI and APC.

For the cut-through path, much lower latencies – down to single digit microsecond values – can be

guaranteed for applications that demand low and strictly deterministic latency behavior.

Summary

The Bivio 2000 product family delivers 4.2 Gbps of aggregate throughput, an average system latency

of about 60 microseconds, and virtually unlimited application processing power in a fully programmable

architecture. Furthermore, the Bivio architecture delivers on utterly deterministic, yet fully configurable,

system behavior in order to support the strict CoS and QoS criteria that also characterize many

emerging network applications.

The Bivio 2000 is the first and only deep packet handling appliance to satisfy the multi-layered set of

performance requirements of emerging network applications. These demand maximum throughput

and minimum latency combined with scalable application processing performance and complete

architectural programmability; and the Bivio 2000 has been designed from the ground up with precisely

these objectives in mind.

Figure 6: Throughput and Loss (goodput and loss percentage versus offered load in Gbps).

Figure 7: Average system latency in microseconds versus offered load in Gbps, for priority and best effort traffic.


Bivio Networks, Inc.

4457 Willow Road, Suite 200

Pleasanton, California 94588

Phone: 925-924-8600

Fax: 925-924-8650

www.bivio.net

About Bivio Networks

Bivio Networks has developed an award-winning, deep-packet

inspection and processing platform that combines unparalleled

scaling of network performance, processing power, and application

agility. Bivio’s network appliance platform features a groundbreaking

architecture specifically optimized for wire-speed execution of

emerging network services that increasingly demand deep packet

processing combined with high network throughput. Based on

open industry standards, Bivio Networks fuses unmatched flexibility

with uncompromising performance to enable its customers to

overcome existing bottlenecks and deliver the foundation of the

next generation network infrastructure.

Copyright ©2001-2006 Bivio Networks, Inc. Uncompromising performance. Unmatched flexibility.