Bivio 2000
System Performance
WHITE PAPER
Uncompromising performance. Unmatched flexibility.
2006 PRODUCT EXCELLENCE AWARDS
System Performance Overview

The most innovative and fastest-growing network applications in the areas of security, VoIP (voice
over IP), multimedia, wireless and IPv6 are increasingly being implemented as software applications
running on “network appliance” hardware, signaling a major change in the way network services are
implemented. Many companies in networking are moving away from the hardware-centric product
development philosophy that dominated in the past. Software has become a key service enabler, and
as such the key core expertise for companies seeking dominance in the future network infrastructure.
However, these software applications require the underlying hardware platform to deliver on a
variety of challenging performance requirements:
• Scalable Processing Power: Emerging networking applications require very high computing
capacity to execute sophisticated packet handling, pattern matching and payload inspection
functions that are key enablers for advanced network services. To avoid processing bottlenecks, it is
imperative for next generation network appliances to feature scalable multi-processor architectures
that can be tailored to suit evolving applications’ needs.
• Flexibility: The onslaught of new services, ever-changing standards, protocols and security threats has turned change into the only constant in today’s networking landscape. Flexibility through total programmability is indispensable to deliver on the requirements of the emerging infrastructure, and the only way to cost-effectively deliver on new services and the long-standing promise of service convergence.
• Multi-Gigabit Throughput: Network traffic continues
to grow at exponential rates, thus the more traditional
networking requirement of delivering on multi-gigabit
speeds and feeds as well as deterministic QoS (quality of
service) in real-time remains a top priority in network system design.
Performance discussions in networking have traditionally focused on throughput, latency and packet
loss criteria, but it is becoming clear that this is too narrow a definition of performance when evaluating
the suitability of network appliance architectures for a particular application.
While network end users may very well continue to measure network performance by the usual
throughput-latency-loss criteria, network application developers need to define the capability set of
the network appliance that will run their software with more holistic system design criteria in mind,
and also consider computing processing power and overall architectural flexibility. Otherwise,
functional and performance bottlenecks will be revealed during service deployment that will limit
market acceptance of the combined software-hardware system solution.
Bivio 2000 System Architecture

The Bivio 2000 Multi-Gigabit Network Appliance sets the performance standard among network
appliances optimized for the requirements of emerging network applications, which require a
combination of deep packet inspection and multi-gigabit throughput. The fundamental system
philosophy of the Bivio 2000 is to strive to operate as an ideal “intelligent wire”, offering a seamless
cut-through path to network traffic.
The system architecture of the Bivio 2000 is shown in Figure 1. As traffic flows through the Bivio 2000,
it traverses the following building blocks:
• Network Interface Module (NIM)
• Network Processing Card (NPC)
• Switched Stack System Interconnect (SSI)
• Application Processing Card (APC)
Figure 1: Bivio 2000 Basic Architecture.
[Diagram labels: Gbps, MIPS, Applications, Router, Servers, Software.]
The following paragraphs provide a functional overview of these building blocks.
Network Interface Module (NIM)

The Network Interface Modules (NIMs) provide external network connectivity and convert the external
interfaces to the internal serial packet interface (SPI) supported towards the NPC. The basic
building block of a Bivio 2000 system is a 2U chassis that accommodates two NIMs. Each NIM has a
dedicated 2.1 Gbps SPI interface to forward ingress traffic into the NPC, and receives egress traffic from
the NPC over the same SPI interface, which is in turn forwarded to the outgoing user interfaces. Figure 1
shows the NIMs in “unfolded” form, representing ingress and egress path separately, to clearly illustrate
the system’s traffic flow. As Figure 2 indicates, currently available NIMs can oversubscribe per-slot
ingress bandwidth by a factor of 2:1 in order to ensure that system resources can be efficiently utilized
for applications which tolerate or even demand internal resource oversubscription. Consequently,
the two ingress NIMs can forward a sustained aggregate load of 4.2 Gbps to the Network Processor
subsystem. The transient maximum offered load to the system can reach up to 8 Gbps given the
current generation of 4-port GE network interface modules. Naturally, NIMs do offer local buffering and
traffic management capabilities to support transient bursts.
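The oversubscription arithmetic above can be sketched as follows; the constant names are illustrative (not Bivio identifiers), and only the rates and port counts come from the text.

```python
# NIM oversubscription arithmetic from the text above.
# Constant names are illustrative, not Bivio identifiers.

SPI_RATE_GBPS = 2.1        # dedicated SPI link per NIM slot
NIMS_PER_CHASSIS = 2       # a 2U chassis accommodates two NIMs
PORTS_PER_NIM = 4          # current-generation 4-port GE NIM
PORT_RATE_GBPS = 1.0       # gigabit Ethernet

# sustained aggregate load the two NIMs can forward to the NPC subsystem
sustained = NIMS_PER_CHASSIS * SPI_RATE_GBPS                       # 4.2 Gbps

# transient maximum offered load across all eight GE ports
transient_max = NIMS_PER_CHASSIS * PORTS_PER_NIM * PORT_RATE_GBPS  # 8.0 Gbps

# per-slot oversubscription: 4 Gbps of GE ingress onto a 2.1 Gbps SPI
oversubscription = (PORTS_PER_NIM * PORT_RATE_GBPS) / SPI_RATE_GBPS

print(sustained, transient_max, round(oversubscription, 2))
```

The exact per-slot ratio is about 1.9, which the paper rounds to its quoted 2:1 figure.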
Network Processing Card (NPC)

For any incoming packet, the NPC determines both the internal system destination and the packet’s
forwarding priority in real time. As Figure 1 indicates, the NPC can be thought of as an intelligent
switch that provides truly non-blocking connectivity between the following interfaces:
• 2 x 2.1 Gbps ingress interfaces, one from each NIM.
• A full-duplex 10 Gbps interface towards the application processing layer or towards other NPCs
through the Switched Stack System Interconnect (SSI).
• 2 x 2.1 Gbps egress interfaces, one towards each outgoing NIM.
[Figure 2 diagram: two ingress NIMs (4 x 1GE each) and two egress NIMs connected to the Network Processing Card at 2.1G per slot, with the Application Processing Card and chassis extensions attached over 10G SSI links.]
Figure 2. Network Interface Module (NIM)
[Figure 3 diagram: NIM internals with four 1 GE MACs, buffering and interface conversion onto the 2.1G SPI, and an optional failover module.]
Figure 3. NPC Overview

The NPC architecture allows the Bivio 2000 to
ensure rigorous deterministic bandwidth, delay,
and jitter guarantees for all packet forwarding
plane functions.
The NPC implements an entirely programmable
packet forwarding plane, and delivers on all real-
time data path tasks such as buffer management,
traffic shaping, data modification, and policing.
It performs these tasks at full wire speed by
performing complex pattern or signature
recognition within the header or payload of
the packet. Typical packet forwarding decisions
executed by the
NPC include:
• Forward a packet to one of 8k egress queues
that will schedule the packet delivery to the
outgoing NIM interface based on configurable
QoS criteria.
• Forward a packet to one of 8k queues towards the application processing layer using the SSI; for this purpose the NPC also implements the load-sharing algorithm towards the fully parallelized PowerPC subsystems in one or more APCs.
• Broadcast a packet to several egress queues towards an outgoing NIM, or towards the application
processing layer, or any configurable combination thereof.
• Block/drop a packet.
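The four decision types above can be illustrated with a minimal match-action sketch. This is our own illustrative classifier in Python, not Bivio's API; the field names and queue-selection hash are hypothetical.

```python
# Minimal match-action sketch of the NPC's forwarding decisions.
# Field and function names are illustrative, not Bivio's API.

NUM_QUEUES = 8192  # "one of 8k queues" per direction

def classify(pkt):
    """Map a packet (here a plain dict) to one of the four decision types."""
    if pkt.get("blocked"):
        return ("drop", None)
    if pkt.get("needs_inspection"):
        # hand off to the application layer over the SSI; the queue is chosen
        # by a load-sharing hash over the flow identifier
        return ("to_apc", hash(pkt["flow"]) % NUM_QUEUES)
    if pkt.get("multicast"):
        # broadcast to several egress queues and/or the application layer
        return ("broadcast", pkt["groups"])
    # default: schedule onto an egress queue selected by QoS class
    return ("egress", pkt.get("qos_class", 0))

print(classify({"flow": "10.0.0.1:80->10.0.0.2:3456", "needs_inspection": True})[0])
```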
All the tables and instructions that control forwarding, classification, traffic management and packet
modification within the NPC can be dynamically re-programmed within the system at any point in
time. This powerful system capability can be exploited to accelerate the packet forwarding plane
for any particular application, since it makes it possible to dynamically maximize the “cut-through
match” probability, and consequently allows the system to maximize throughput and minimize
delay and jitter. This architectural capability is what allows the Bivio 2000 to continuously
adapt in an attempt to emulate the ideal system behavior of an “intelligent multi-gigabit wire”. The
resulting architectural flexibility also allows the Bivio 2000 to continuously integrate new network
protocols and services.
Switched Stack System Interconnect (SSI)

The SSI is an internal full-duplex 10 Gbps system interface. A 4x4 SSI switch implements the Bivio
2000 “backplane” function, providing connectivity between the following subsystems, as illustrated in
Figure 1:
• NPC to APC intra-system connectivity, each connected to one 5 Gbps SSI port.
• Inter-system connectivity between individual Bivio 2000 2U chassis in order to build large virtual
system architectures that Bivio literature refers to as “virtual racks” or “stacks”.
The SSI is designed to provide reliable and non-blocking system connectivity between up to four
NPCs and any number of load-sharing APCs, resulting in a highly scalable system architecture that
can in theory consist of up to 256 individual Bivio 2000 2U systems.
Application Processing Card (APC)

The APC acts as the primary host for the Linux networking application(s). One APC hosts four fully
parallelized PowerPC-based CPU subsystems, each running its own individual Linux execution
environment. Each CPU subsystem features its own independent DDR400 memory subsystem and a
PCI/PCI-X expansion slot that can host hardware acceleration daughter boards.
[Diagram: NPC packet path, from SAR and classification through traffic management (scheduling, QoS, CoS), modification, and load sharing, multicasting and broadcasting; data in and out at 4.2G, with 10G links to the SSI.]
By implementing a fully parallelized
architecture, the Bivio 2000 overcomes
bottlenecks that typically plague
network application performance.
Since many network applications are limited by memory access rather than by CPU power, traditional symmetrical multiprocessing architectures that perform so well for CPU-intensive commercial computing applications fail to provide similar performance benefits in networking: the CPUs share access to the same bandwidth-limited system memory resource.
With the Bivio 2000, application developers can tailor the system architecture to their
performance objectives. Sufficient CPU power or cumulative memory bandwidth can be achieved by
assigning enough parallel CPU subsystems to the task; furthermore, the load-sharing algorithms can
be tailored to optimally suit the particular application requirements.
The contribution of the APC towards system performance does not necessarily lie in the fact that it
hosts four CPU subsystems that each deliver on over 3,000 MIPS of raw processing power, but rather
in the APC subsystem’s ability to provide seamless scalability. This characteristic allows it to overcome
any potential performance bottlenecks at the application layer – be it CPU power or memory access
– through elegant resource parallelization and optimized load-sharing.
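A flow-pinned load-sharing scheme of the kind described can be sketched as follows. The CRC32 hash and all names here are our assumptions for illustration, not Bivio's published algorithm.

```python
# Sketch of flow-pinned load sharing across the four parallel CPU subsystems.
# Hashing on the flow tuple keeps every packet of a flow on the same CPU, so
# each CPU works out of its own DDR400 memory rather than a shared pool.
# The CRC32 hash and all names are our assumptions, not Bivio's algorithm.

import zlib

NUM_CPU_SUBSYSTEMS = 4  # per APC

def cpu_for_flow(src, dst, sport, dport, proto):
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    return zlib.crc32(key) % NUM_CPU_SUBSYSTEMS

# every packet of the same flow lands on the same CPU subsystem
first = cpu_for_flow("10.0.0.1", "10.0.0.2", 1234, 80, 6)
again = cpu_for_flow("10.0.0.1", "10.0.0.2", 1234, 80, 6)
assert first == again and 0 <= first < NUM_CPU_SUBSYSTEMS
```

Pinning flows to CPUs is one way the load-sharing algorithm could avoid the shared-memory contention described above, since per-flow state never has to cross CPU subsystems.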
For accuracy, it should be pointed out that the NPC card itself also hosts two identical Application
Processing CPU subsystems; for the purposes of this white paper, however, a deliberately abstracted
functional system architecture is used for clarity. Suffice it to say that this simply results in additional
CPU horsepower and memory bandwidth residing in a 2U Bivio 2000 system populated with an NPC
and APC card.
Bivio Software Architecture

The software architecture and its contribution to system performance are discussed in the Bivio
2000 Network Appliance white paper. But any discussion of Bivio 2000 system performance has
to highlight the fact that, wherever network services are implemented as a software application (as
is increasingly the case in networking), it is myopic to focus exclusively on the hardware architecture
and its performance while taking optimal software behavior as a given. While software engineering
has always been a critical tool for network performance (as any IP routing table convergence
discussion shows), it is becoming of paramount importance as critical network services are
increasingly implemented in software. In the Bivio 2000, the following software topics are of
particular relevance:
• Linux network drivers: Bivio has developed network drivers designed to accelerate a variety of
network appliance system primitives. The API (application program interface) allows application
developers to take full advantage of the Bivio system performance benefits. However, it must
be pointed out that standard Linux applications experience acceleration even without taking
advantage of specific Bivio API features.
• Linux application: The software application implementing the network service itself has to
be designed with performance in mind. While Bivio system performance is consistently high
irrespective of the path packets take through the system, the single most effective way to
dramatically accelerate any application is to take advantage of the network forwarding cut-through
path in the NPC as often as possible. That way, the Bivio 2000 will approximate the “intelligent wire”
performance ideal.
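As a sketch of what taking advantage of the cut-through path means in practice, consider a flow cache: once the application has made its decision for a flow, it installs a rule so subsequent packets never leave the forwarding plane. All names below are hypothetical; this is the general technique, not Bivio's API.

```python
# Hypothetical sketch of the cut-through idea: after the application classifies
# a flow, it installs a forwarding rule so later packets of that flow stay on
# the wire-speed NPC path instead of being handed up to the APC.

fast_path = {}  # flow -> action; stands in for the NPC's re-programmable tables

def handle_packet(flow, slow_path_classify):
    if flow in fast_path:                # cut-through match: stays on the NPC
        return fast_path[flow]
    action = slow_path_classify(flow)    # first packet climbs to the application
    fast_path[flow] = action             # install the rule for the rest of the flow
    return action

calls = []
def app_classify(flow):
    calls.append(flow)
    return "forward"

handle_packet("flow-1", app_classify)
handle_packet("flow-1", app_classify)
assert calls == ["flow-1"]  # only the first packet took the slow path
```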
[Figure 4 diagram: the 10G SSI feeding four application processors (AP1-AP4) at 2G each; each CPU subsystem runs the network service application control within its own Linux execution environment, with a dedicated controller, DDR400 memory, and a PCI-X extension slot.]
Figure 4. APC Subsystem Overview
Bivio 2000 System Performance Benchmarks

For any network system architecture, the basic test configuration is outlined in Figure 5. The offered
load into the ingress interfaces is gradually
increased, and the maximum forwarding rate
is established as soon as the system discards
traffic. When testing a network appliance, one
must choose challenging real world traffic
patterns and system configurations that are
truly representative of the environment the
system ultimately targets. But while many
specialized network system architectures can
afford to target very specific performance
criteria, a platform as universal as the Bivio
2000 network appliance must provide the
architectural flexibility to deliver deterministic
system performance for a wide variety
of potential applications and network
environments.
Previous sections established that the Bivio 2000 application processing plane gracefully scales
to eliminate processing power and memory bandwidth bottlenecks. Anyone would probably
agree that over 256 APCs providing over 3 million MIPS and virtually unlimited memory
bandwidth represent a very powerful computing grid capable of taking on any network application
at multiples of any existing interface speed. Therefore – as Figure 5 shows – when discussing Bivio
2000 performance, in most cases it is permissible to idealize the Application Processing subsystem
as a system path that always provides a predictable 4.2 Gbps pipe, albeit one with variable latency
depending on the application’s processing complexity. That means the entire maximum forwarding
rate of 4.2 Gbps that the NIMs can offer can be sent to the Application Processing layer without
any internal system bandwidth bottlenecks, and subsequently be received back by the Network
Processing layer and forwarded to the egress NIMs. The following sections provide throughput and
latency data for traffic traversing the Bivio 2000 through the APC subsystem path unless specifically
mentioned otherwise.
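The "over 3 million MIPS" figure follows directly from the stack arithmetic; a quick check, using only figures quoted in this paper:

```python
# Back-of-the-envelope check of the computing-grid claim, using only
# figures quoted in this paper.

MAX_SYSTEMS = 256      # largest theoretical stack of 2U systems
CPUS_PER_APC = 4       # parallel PowerPC subsystems per APC
MIPS_PER_CPU = 3000    # "over 3,000 MIPS" of raw processing power each

total_mips = MAX_SYSTEMS * CPUS_PER_APC * MIPS_PER_CPU
print(total_mips)      # 3,072,000: the "over 3 million MIPS" quoted above
```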
Bivio 2000 Maximum Throughput

As the preceding architectural discussion showed,
the Bivio 2000 should and indeed does provide
a sustained maximum system throughput rate
of 4.2 Gbps per 2U stand-alone system for real
world traffic patterns. The Bivio 2000 supports
2.1 Gbps per NIM slot with currently available
NIMs. Figure 6 illustrates the linear, deterministic
throughput behavior of the Bivio 2000 system.
Since the maximum offered load (MOL) from the 8
GE interfaces can reach 8 Gbps in aggregate,
application “goodput” starts to decrease linearly
once 4.2 Gbps of offered load is exceeded. At 8
Gbps MOL, system goodput decreases to 52.5%,
and the remaining 47.5% of traffic is dropped. In summary, real
world traffic patterns – which include correlated
traffic streams, and not just statistical Bernoulli traffic patterns – are supported at a fully deterministic
aggregate throughput rate of 4.2 Gbps.
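The goodput figures above follow from a simple model: goodput tracks offered load up to the 4.2 Gbps capacity, after which the capped forwarding rate is spread over the offered load. A minimal sketch:

```python
# Simple model reproducing the goodput/loss figures quoted above: the system
# forwards at most 4.2 Gbps, so past that point goodput decreases linearly
# as a fraction of offered load.

CAPACITY_GBPS = 4.2

def goodput_fraction(offered_gbps):
    """Fraction of the offered load that is forwarded rather than dropped."""
    if offered_gbps <= CAPACITY_GBPS:
        return 1.0
    return CAPACITY_GBPS / offered_gbps

print(round(goodput_fraction(8.0) * 100, 1))        # 52.5 (% goodput at 8 Gbps MOL)
print(round((1 - goodput_fraction(8.0)) * 100, 1))  # 47.5 (% dropped)
```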
The robust CoS and QoS mechanisms implemented by the Bivio 2000 give system architects the
tools to assign scheduling priorities on a per-class or per-flow basis. That way, the available bandwidth
pool can be allocated in a controlled way, and protecting certain traffic classes against loss within the
Bivio 2000 is a straightforward exercise in CoS parameter definition.

Figure 5. Benchmarking
[Diagram: an 8 x 1GE tester (Tx/Rx ports 1-8) attached to the simplified performance model: NIM 1 and NIM 2 at 2.1G each into the Network Processing layer, with a 4.2G path to and from Application Processing.]
[Plot: goodput and loss percentages versus offered load in Gbps.]
The Bivio 2000 tolerates short traffic bursts that can significantly exceed 4.2 Gbps, but it would be
misleading to quote figures higher than that as the sustainable throughput rate for the Bivio 2000,
despite lax standards in that regard being somewhat commonplace within the networking industry.
NPC scaling introduces potential blocking behavior when more than four NPCs are active in the
stack. With two NPCs active in a stack, the aggregate throughput of the system doubles to 8.4 Gbps,
since the SSI easily maintains its truly non-blocking switching characteristic. Stacks containing three
or four active NPCs can display marginal blocking behavior with typical real world traffic distributions,
even though under perfectly even traffic distribution among four systems the system is still non-
blocking in theory. Only stacks hosting more than four active NPCs are blocking. In summary, NPC
scaling enables system configurations of up to 16 Gbps of combined real world aggregate throughput.
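The scaling rule can be summarized numerically. The function below is our sketch of the ideal non-blocking figure; note the paper quotes 16 Gbps for four NPCs rather than the ideal 16.8 Gbps, because three- and four-NPC stacks can show marginal blocking.

```python
# Ideal (fully non-blocking) aggregate throughput as NPCs are added to a stack.
# Our sketch of the scaling rule in the text; the paper quotes ~16 Gbps for
# four NPCs because three- and four-NPC stacks can show marginal blocking.

PER_NPC_GBPS = 4.2
MAX_NONBLOCKING_NPCS = 4   # beyond four active NPCs the stack is blocking

def ideal_aggregate_gbps(active_npcs):
    return PER_NPC_GBPS * min(active_npcs, MAX_NONBLOCKING_NPCS)

print(ideal_aggregate_gbps(2))   # 8.4, matching the two-NPC figure above
```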
Bivio 2000 Latency

Early in the discussion of the Bivio 2000
architecture, the system philosophy of striving to
behave as an “intelligent wire” was mentioned.
System latency obviously is the system
performance parameter that will benefit the most
from such a design philosophy, and indeed the
Bivio 2000 displays very deterministic and thus
class-leading latency behavior.
It is important to note that, since not all
applications will benefit from low system latency,
this is one parameter – along with loss – that is
typically relevant on a per CoS basis. Figure 7 is
best understood as the typical range of system
latency observed as real world traffic traverses the
Bivio architecture, since without CoS configuration
traffic will tend to experience an average latency between the priority and best effort traffic results.
On a single 2U system, even at loads approximating 4 Gbps of typical real world traffic while running
a computing-intensive application (such as intrusion prevention), the highest average system
latency will not exceed approximately 200 microseconds. In fact, most tests show the Bivio 2000 to
consistently operate with an average system latency of 60 microseconds even under high loads. It
should be re-emphasized that this represents the end-to-end system latency, from ingress to egress GE
port on the NIM, and through the NPC to the SSI and APC.
For the cut-through path, much lower latencies – down to single digit microsecond values – can be
guaranteed for applications that demand low and strictly deterministic latency behavior.
Summary

The Bivio 2000 product family delivers 4.2 Gbps of aggregate throughput, an average system latency
of about 60 microseconds, and unlimited application processing power in a fully programmable
architecture. Furthermore, the Bivio architecture delivers on utterly deterministic, yet fully configurable,
system behavior in order to support the strict CoS and QoS criteria that also characterize many
emerging network applications.
The Bivio 2000 is the first and only deep packet handling appliance to satisfy the multi-layered set of
performance requirements of emerging network applications. These demand maximum throughput
and minimum latency combined with scalable application processing performance and complete
architectural programmability; and the Bivio 2000 has been designed from the ground up with precisely
these objectives in mind.
[Plot (Figure 7): average latency in µs for Best Effort and Priority Traffic versus offered load in Gbps.]
Figure 6. Throughput and Loss
Bivio Networks, Inc.
4457 Willow Road, Suite 200
Pleasanton, California 94588
Phone: 925-924-8600
Fax: 925-924-8650
www.bivio.net
About Bivio Networks
Bivio Networks has developed an award-winning, deep-packet
inspection and processing platform that combines unparalleled
scaling of network performance, processing power, and application
agility. Bivio’s network appliance platform features a groundbreaking
architecture specifically optimized for wire-speed execution of
emerging network services that increasingly demand deep packet
processing combined with high network throughput. Based on
open industry standards, Bivio Networks fuses unmatched flexibility
with uncompromising performance to enable its customers to
overcome existing bottlenecks and deliver the foundation of the
next generation network infrastructure.
Copyright ©2001-2006 Bivio Networks, Inc.