3
HyperGen: High-Performance Flexible Packet Generator Using Programmable Switching ASIC Zhaowei Xi, Yu Zhou, Dai Zhang, Jinqiu Wang, Sun Chen, Yangyang Wang, Xinrui Li, HaoMing Wang, Jianping Wu * Tsinghua University CCS CONCEPTS Networks Programmable networks; KEYWORDS Programmable Switching ASIC, P4, Packet Generator ACM Reference format: Zhaowei Xi, Yu Zhou, Dai Zhang, Jinqiu Wang, Sun Chen, Yangyang Wang, Xinrui Li, HaoMing Wang, Jianping Wu. 2019. HyperGen: High-Performance Flexible Packet Generator Using Programmable Switching ASIC . In Pro- ceedings of SIGCOMM ’19: ACM SIGCOMM 2019 Conference, Beijing, China, August 19–23, 2019 (SIGCOMM ’19), 3 pages. https://doi.org/10.1145/3342280. 3342301 1 INTRODUCTION Packet generators are vital in network researches and network operations. For network researchers, packet generators serve as essential tools to test the performance of researching prototypes. Network operators can also use packet generators to produce test trac for latency measurement and failure troubleshooting [6]. Meanwhile, the development of today’s network raises demands on packet generators from two perspectives. One is the ability of high performance trac generation to meet the increasing network bandwidth (from 10GbE to 100GbE), the other is the ability of exible packet customization with given properties to meet the constant appearance of new protocols and functions. Existing packet generators can be categorized into hardware approaches and software approaches. Commodity packet genera- tors based on proprietary hardware can achieve high rate packet generation up to above 1Tbps per device. Nevertheless, hardware packet generators [1] have limited exibility. For instance, it is hard for users to customize proprietary hardware to test new protocols or functions. Besides, hardware packet generators are typically ex- pensive, and a two 10GbE port packet generation module can cost as much as $25,000 [2]. Software approaches [5], on the contrary, enable users to customize packet generation logic, yielding high exibility. However, software approaches have to make an undesir- able trade-o between performance and cost. A 8-core commodity server can only achieve less than 100Gbps [5]. If operators have * This work is supported by National Key R&D Program of China (2017YFB0801701) and the National Science Foundation of China (No.61872426). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the full citation on the rst page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specic permission and/or a fee. Request permissions from [email protected]. SIGCOMM ’19, Beijing, China ACM ISBN 978-1-4503-6886-5/19/08. . . $15.00 https://doi.org/10.1145/3342280.3342301 Table 1: Comparison of current approaches. Metrics (1Tbps) Device Required Flexibility Cost Hardware 1 limited $100,000 Software 10 high $30,000 HyperGen 1 high $3,000 higher requirements on generation rates, more servers are needed, which makes the cost rise rapidly. Motivated by the limitations of existing approaches, we present HyperGen, a high-performance, exible, and low-cost packet gener- ator using programmable switching ASIC in this paper. HyperGen can produce above 1Tbps trac, and supports rate control with high accuracy and high precision. Users can exibly recongure HyperGen to execute various tasks like throughput testing, latency measurement, denial-of-service attack emulation and so on. Hyper- Gen can be implemented in a single programmable switch whose cost per Tbps is much lower than the proprietary hardware and mid-range server. Table 1 summarizes the comparison between HyperGen and existing approaches. The design of HyperGen is non-trivial, and the main challenge is that there is no straight way to generate high rate, task-specic trac in programmable switching ASICs that are limited in pro- grammablity and resources. We solve the challenge from two per- spectives. Firstly, we present template-based packet generation and take advantage of switch CPU to improve the exibility of the switching ASIC. More concretely, we use switch CPU to generate the control logic and template packets with predened properties (e.g., packet size and payload), which are hard for the switching ASIC to customize. Secondly, we present a new pipeline design to improve the capacity of packet generation in the switching ASIC. The pipeline performs a series of operations sequentially includ- ing acceleration, replication and edition on template packets to generate testing packets for various tasks. 2 DESIGN Figures 1 shows the architecture overview of HyperGen. The work- ow can be described into two steps: (1) Compiling tasks to generate testing packets. HyperGen compiles testing tasks into three elements: switch congurations (e.g., port settings and table entries), template packets, and a P4[4] program. It is hard for the switching ASIC to customize some prop- erties (e.g., packet size and payload). Therefore, instead of directly generating packets in the switching ASIC, we present template- based packet generation that generate template packets with pre- dened properties in switch CPU, and forward template packets to the switching ASIC for further processing. The P4 program denes the packet generation logic in the switching ASIC, which can be

HyperGen: High-Performance Flexible Packet …netarchlab.tsinghua.edu.cn/~chensun/papers/hypergen-sig...The P4 program de˙nes the packet generation logic in the switching ASIC, which

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HyperGen: High-Performance Flexible Packet …netarchlab.tsinghua.edu.cn/~chensun/papers/hypergen-sig...The P4 program de˙nes the packet generation logic in the switching ASIC, which

HyperGen: High-Performance Flexible Packet Generator UsingProgrammable Switching ASIC

Zhaowei Xi, Yu Zhou, Dai Zhang, Jinqiu Wang, Sun Chen, Yangyang Wang, Xinrui Li,HaoMing Wang, Jianping Wu∗

Tsinghua University

CCS CONCEPTS• Networks → Programmable networks;

KEYWORDSProgrammable Switching ASIC, P4, Packet GeneratorACM Reference format:Zhaowei Xi, Yu Zhou, Dai Zhang, Jinqiu Wang, Sun Chen, Yangyang Wang,Xinrui Li, HaoMing Wang, Jianping Wu. 2019. HyperGen: High-PerformanceFlexible Packet Generator Using Programmable Switching ASIC . In Pro-ceedings of SIGCOMM ’19: ACM SIGCOMM 2019 Conference, Beijing, China,August 19–23, 2019 (SIGCOMM ’19), 3 pages. https://doi.org/10.1145/3342280.3342301

1 INTRODUCTIONPacket generators are vital in network researches and networkoperations. For network researchers, packet generators serve asessential tools to test the performance of researching prototypes.Network operators can also use packet generators to produce testtra�c for latency measurement and failure troubleshooting [6].Meanwhile, the development of today’s network raises demandson packet generators from two perspectives. One is the ability ofhigh performance tra�c generation to meet the increasing networkbandwidth (from 10GbE to 100GbE), the other is the ability of�exible packet customization with given properties to meet theconstant appearance of new protocols and functions.

Existing packet generators can be categorized into hardwareapproaches and software approaches. Commodity packet genera-tors based on proprietary hardware can achieve high rate packetgeneration up to above 1Tbps per device. Nevertheless, hardwarepacket generators [1] have limited �exibility. For instance, it is hardfor users to customize proprietary hardware to test new protocolsor functions. Besides, hardware packet generators are typically ex-pensive, and a two 10GbE port packet generation module can costas much as $25,000 [2]. Software approaches [5], on the contrary,enable users to customize packet generation logic, yielding high�exibility. However, software approaches have to make an undesir-able trade-o� between performance and cost. A 8-core commodityserver can only achieve less than 100Gbps [5]. If operators have∗This work is supported by National Key R&D Program of China (2017YFB0801701)and the National Science Foundation of China (No.61872426).

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor pro�t or commercial advantage and that copies bear this notice and the full citationon the �rst page. Copyrights for components of this work owned by others than ACMmust be honored. Abstracting with credit is permitted. To copy otherwise, or republish,to post on servers or to redistribute to lists, requires prior speci�c permission and/or afee. Request permissions from [email protected] ’19, Beijing, ChinaACM ISBN 978-1-4503-6886-5/19/08. . . $15.00https://doi.org/10.1145/3342280.3342301

Table 1: Comparison of current approaches.

Metrics (1Tbps) Device Required Flexibility Cost

Hardware 1 limited $100,000Software ≈ 10 high $30,000

HyperGen 1 high $3,000

higher requirements on generation rates, more servers are needed,which makes the cost rise rapidly.

Motivated by the limitations of existing approaches, we presentHyperGen, a high-performance, �exible, and low-cost packet gener-ator using programmable switching ASIC in this paper. HyperGencan produce above 1Tbps tra�c, and supports rate control withhigh accuracy and high precision. Users can �exibly recon�gureHyperGen to execute various tasks like throughput testing, latencymeasurement, denial-of-service attack emulation and so on. Hyper-Gen can be implemented in a single programmable switch whosecost per Tbps is much lower than the proprietary hardware andmid-range server. Table 1 summarizes the comparison betweenHyperGen and existing approaches.

The design of HyperGen is non-trivial, and the main challengeis that there is no straight way to generate high rate, task-speci�ctra�c in programmable switching ASICs that are limited in pro-grammablity and resources. We solve the challenge from two per-spectives. Firstly, we present template-based packet generationand take advantage of switch CPU to improve the �exibility of theswitching ASIC. More concretely, we use switch CPU to generatethe control logic and template packets with prede�ned properties(e.g., packet size and payload), which are hard for the switchingASIC to customize. Secondly, we present a new pipeline design toimprove the capacity of packet generation in the switching ASIC.The pipeline performs a series of operations sequentially includ-ing acceleration, replication and edition on template packets togenerate testing packets for various tasks.

2 DESIGNFigures 1 shows the architecture overview of HyperGen. The work-�ow can be described into two steps:(1) Compiling tasks to generate testing packets. HyperGencompiles testing tasks into three elements: switch con�gurations(e.g., port settings and table entries), template packets, and a P4[4]program. It is hard for the switching ASIC to customize some prop-erties (e.g., packet size and payload). Therefore, instead of directlygenerating packets in the switching ASIC, we present template-based packet generation that generate template packets with pre-de�ned properties in switch CPU, and forward template packets tothe switching ASIC for further processing. The P4 program de�nesthe packet generation logic in the switching ASIC, which can be

Page 2: HyperGen: High-Performance Flexible Packet …netarchlab.tsinghua.edu.cn/~chensun/papers/hypergen-sig...The P4 program de˙nes the packet generation logic in the switching ASIC, which

SIGCOMM ’19, August 19–23, 2019, Beijing, China Z. Xi et al.

DUT

up to100Gbpsper link

Packet Generation Pipeline

Compile Collect

P4 programs, configurations,template packets

Testing statistics

Switch CPU

Switching ASIC AnalyzeTask-oriented data

Testing Task 1

Testing Task 2

Testing Task N

(1) (2)

Figure 1: Architecture overview of HyperGen.

Design: Overview

AcceleratorSwitch

CPU

PCIe

… Editor …

Template packets in <10Mbps Template packets in 100Gbps

Generated packets identical to template packets in given rate

Generated packets with configured properties in given rate

Replicator

Figure 2: Design of Packet Generation Pipeline.

described as the packet generation pipeline. §2.1 presents the designof the pipeline.(2) Testing statistics analysis and collection. HyperGen cananalyze and record task-oriented data while generating packets andreceiving packets from devices under test (DUT). Recorded datacan be per-packet data like a particular �eld value, or statisticalproperties like packet count, throughput, delay and so on. SwitchCPU could attain aforementioned data via pulling from data planecounters or receiving digests pushed by the switching ASIC. Then,switch CPU will feed back testing statistics to testing tasks.

2.1 Packet Generation PipelinePacket generation pipeline consists of three components that exe-cute operations sequentially on template packets. Figure 2 showsthe pipeline design. We will present the accelerator and replicatorin detail in the following parts, since the editor’s main functionis to modify generated packets from replicator to produce spe-ci�c tra�c(e.g., choosing forwarding port, using speci�c �elds forstate storage), and all operations needed are well supported byP4-programmable hardware.Accelerator. Using the accelerator, we can accelerate templatepackets from switch CPU to 100GbE full line rate in negligible time.The accelerator forwards template packets into a loop via recircula-tion, which is a general primitive supported by P4-programmablehardware. Our experiments testify that To�no [3] can recirculatepackets at a speed of no less than 100Gbps, and round trip time of 64-byte template packets is about 600ns, which means the maxmiumnumber of template packets that one loop-back port can acceleratesequentially is nearly one hundred and increasing loop-back portslinearly improves the maxmium number. Through recirculation, theaccelerator act as a stable high rate packet source for the replicator.Replicator. Taking the looping template packets in the acceleratoras input, the replicator performs conditional packet replication atgiven rates. The replicator implements rate control via adjusting

6 4 9 6 1 2 8 1 6 0 1 9 2 2 2 4 2 5 60

5 0

1 0 0

1 5 0

Throu

ghpu

t (Mpp

s)

P a c k e t S i z e ( b y t e s )

H G - 1 0 0 G b p s H G - 4 0 G b p s M G - 4 0 G b p s - 1 C o r e M G - 4 0 G b p s - 2 C o r e s

(a) Single-port rate

6 4 9 6 1 2 8 1 6 0 1 9 2 2 2 4 2 5 601 0 02 0 03 0 04 0 05 0 06 0 0

Throu

ghpu

t (Mpp

s)

P a c k e t S i z e ( b y t e s )

H G - 2 P o r t s M G - 2 P o r t s H G - 3 P o r t s M G - 4 o r t s H G - 4 P o r t s M G - 8 P o r t s

(b) Multi-port rate

Figure 3: Packet generation rate in Mpps.

1 0 1 0 0 1 0 0 0 1 0 0 0 00 . 0

0 . 1

0 . 2

0 . 3

Frequ

ency

I n t e r - d e p a r t u r e T i m e ( n s )

H G : 0 . 1 M p p s 1 M p p s 1 0 M p p s M G : 0 . 1 M p p s 1 M p p s 1 0 M p p s

(a) Inter-departure time distribution

6 4 1 2 8 2 5 6 5 1 2 1 0 2 4 1 2 8 00

5 0 0

1 0 0 0

1 5 0 0

Inter-

depart

ure Ti

me (n

s)

P a c k e t S i z e ( b y t e s )

H G M e a n M G M e a n H G R M S E M G R M S E

01 0 02 0 03 0 04 0 0

RMSE

(ns)

(b) Precision

Figure 4: Accuracy and precision comparison.

inter-departure time. More concretely, the replicator maintains ananosecond-level periodic timer and performs replication as soonas the timer expires. Through changing the timer threshold, thereplicator manages to adjust inter-departure time of replication.The replicator implements replication on template packets throughmulticast, which is a general primitive supported by switching ASICand can replicate packets to multiple ports simultaneously.

3 EVALUATIONWe implement a prototype of HyperGen (HG) on Wedge 100BF32Xequipped with To�no and 32 100GbE ports. We use 1 port for recir-culation and 4 ports for multicasting in our testbed for the considera-tion of reserving ports for other use in practical circumstance. Moreports for recirculation increases the maxmium supported number oftemplate packets and more ports for multicasting guarantee higherperformance. We use a server with 64GB RAM and 8 2.10GHz CPUcores to run MoonGen (MG) [5] as the countermeasure. We evaluateHyperGen with the following two objectives.Packet generation rate. Figure 3 shows the experiment results.We compare HyperGen and MoonGen’s rate under di�erent packetsizes using single-port (Figure 3(a)) and muti-port con�gurations(Figure 3(b)). As the results show, in our testbed HyperGen canachieve 4-port 100GbE full line rate, which is higher than MoonGen.Accuracy and precision. We con�gure HyperGen and MoonGento generate 64-byte packets in di�erent rates. HyperGen keepsaccurate at all rates, while MoonGen becomes inaccurate whenrate grows high (Figure 4(a)). Besides, we test the average inter-departure time of di�erent packet sizes at a rate of 1Mpps. Hyper-Gen has better accuracy when packet size is relatively small, andhas much lower Root Mean Squared Error(RMSE) (Figure 4(b)).

4 FUTUREWORKOur future work will focus on further analyzing generated packetsand building a general network testing system. Besides, developinga convenient way to specify network testing intents of operators isanother important work we keep studying.

Page 3: HyperGen: High-Performance Flexible Packet …netarchlab.tsinghua.edu.cn/~chensun/papers/hypergen-sig...The P4 program de˙nes the packet generation logic in the switching ASIC, which

HyperGen SIGCOMM ’19, August 19–23, 2019, Beijing, China

REFERENCES[1] A Keysight Business. 2019. IXIA test modules. Website. (2019). https://www.

ixiacom.com/products.[2] Gianni Antichi, Muhammad Shahbaz, Yilong Geng, Noa Zilberman, Adam Cov-

ington, Marc Bruyere, Nick McKeown, Nick Feamster, Bob Felderman, MichaelaBlott, and others. 2014. OSNT: Open source network tester. IEEE NetworkMagazine 28, 5 (2014), 6–12.

[3] Barefoot Networks. 2019. Barefoot To�no Switch. Website. (2019). https://barefootnetworks.com/technology/.

[4] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, JenniferRexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, andothers. 2014. P4: Programming protocol-independent packet processors. ACMSIGCOMM Computer Communication Review 44, 3 (2014), 87–95.

[5] Paul Emmerich, Sebastian Gallenmüller, Daniel Raumer, Florian Wohlfart, andGeorg Carle. 2015. Moongen: A scriptable high-speed packet generator. InProceedings of the 2015 Internet Measurement Conference. ACM, 275–287.

[6] Chuanxiong Guo, Lihua Yuan, Dong Xiang, Yingnong Dang, Ray Huang, DaveMaltz, Zhaoyi Liu, Vin Wang, Bin Pang, Hua Chen, and others. 2015. Pingmesh:A large-scale system for data center network latency measurement and analysis.ACM SIGCOMM Computer Communication Review 45, 4 (2015), 139–152.