High Performance User-Level Sockets over Gigabit Ethernet

Pavan Balaji, Ohio State University
Piyush Shivam, Ohio State University
D.K. Panda, Ohio State University
Pete Wyckoff, Ohio Supercomputer Center
Presentation Overview
Background and Motivation
Design Challenges
Performance Enhancement Techniques
Performance Results
Conclusions
Background and Motivation
• Sockets: a frequently used API
  – Traditional kernel-based implementation
  – Unable to exploit high-performance networks
• Earlier solutions
  – Interrupt coalescing
  – Checksum offload
  – Insufficient, and it gets worse with 10 Gigabit networks
• Can we do better?
  – User-level support
Kernel Based Implementation of Sockets
[Diagram: Application or Library → Sockets → TCP → IP → NIC,
 spanning User Space, Kernel, and Hardware]

Pros:
• High compatibility
Cons:
• Kernel context switches
• Multiple copies
• CPU resources
Alternative Implementations of Sockets (GigaNet cLAN)
[Diagram: Application or Library → Sockets → TCP → IP → IP-to-VI layer → “VI aware” NIC,
 spanning User Space, Kernel, and Hardware]

Pros:
• High compatibility
Cons:
• Kernel context switches
• Multiple copies
• CPU resources
Sockets over User-Level Protocols
Sockets is a generalized protocol Sockets over VIA
Developed by Intel Corporation [shah98] and ET Research Institute [sovia01]
GigaNet cLAN platform
Most networks in the world are Ethernet Gigabit Ethernet
Backward compatible Gigabit Network over the existing installation base MVIA: Version of VIA on Gigabit Ethernet
Kernel Based A need for a High Performance Sockets layer over
Gigabit Ethernet
User-Level Protocol over Gigabit Ethernet

• Ethernet Message Passing (EMP) protocol
  – Zero-copy, OS-bypass, NIC-driven user-level protocol over Gigabit Ethernet
  – Developed over the dual-processor Alteon NICs
  – Complete offload of message-passing functionality to the NIC

• Piyush Shivam, Pete Wyckoff, D.K. Panda, “EMP: Zero-Copy OS-bypass NIC-driven Gigabit Ethernet Message Passing”, Supercomputing, November ’01
• Piyush Shivam, Pete Wyckoff, D.K. Panda, “Can User-Level Protocols take advantage of Multi-CPU NICs?”, IPDPS, April ’02
EMP: Latency

[Chart: latency (µs) vs. message size (4 bytes–4 KB) for TCP and EMP]

A base latency of 28 µs, compared to ~120 µs for TCP, for 4-byte messages
EMP: Bandwidth

[Chart: bandwidth (Mbps) vs. message size (4 bytes–8 KB) for EMP and TCP]

EMP saturates the Gigabit Ethernet network with a peak bandwidth of 964 Mbps
Proposed Solution
[Diagram: Application or Library → Sockets over EMP → EMP Library in User Space,
 with an OS Agent in the Kernel and the Gigabit Ethernet NIC in Hardware]

• Avoids kernel context switches
• Avoids multiple copies
• Frees CPU resources
• High performance
Design Challenges
Functionality Mismatches
Connection Management
Message Passing
Resource Management
UNIX Sockets
Functionality Mismatches and Connection Management

• Functionality mismatches
  – No API for buffer advertising in TCP
• Connection management
  – Data message exchange
  – Descriptors required for connection management
Message Passing

• Data streaming
  – Parts of the same message can potentially be read into different buffers
• Unexpected message arrivals
  – A separate communication thread keeps track of used descriptors and re-posts them
  – Polling threads have a high synchronization cost
  – Sleeping threads involve OS scheduling granularity
• Two approaches: the Rendezvous approach, and Eager with flow control
Rendezvous Approach

[Diagram: sender and receiver, each with a send queue (SQ) and receive queue (RQ).
 send() posts a Request; once receive() has posted a buffer, the receiver returns
 an ACK, and only then does the Data travel.]
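The exchange above can be sketched as a toy model (hypothetical class names; EMP's actual descriptor interface is not modeled): the sender ships only a Request at first, and the data moves after the receiver has posted a buffer and returned an ACK.

```python
class Receiver:
    """Rendezvous receiver: ACKs a Request only once a buffer is posted."""
    def __init__(self):
        self.posted = 0        # buffers advertised via receive()
        self.waiting = []      # senders whose Request has no buffer yet
        self.inbox = []

    def receive(self):
        self.posted += 1
        self._match()

    def on_request(self, sender):
        self.waiting.append(sender)
        self._match()

    def _match(self):
        # Pair each posted buffer with a waiting Request; the ACK releases data
        while self.posted and self.waiting:
            self.posted -= 1
            self.waiting.pop(0).on_ack(self)

class Sender:
    def send(self, receiver, data):
        self.pending = data
        receiver.on_request(self)   # the Request travels first, not the data

    def on_ack(self, receiver):
        receiver.inbox.append(self.pending)   # data moves only after the ACK
```

Note that an early send() simply parks the data at the sender until the receiver posts a buffer, which is what makes the scheme safe against unexpected arrivals.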
Eager with Flow Control

[Diagram: sender and receiver, each with a send queue (SQ) and receive queue (RQ).
 send() pushes Data immediately into a pre-posted receiver buffer; the receiver
 returns an ACK as receive() consumes the data, and further Data follows.]
Resource Management and UNIX Sockets

• Resource management
  – Clean up unused descriptors (connection management)
  – Free registered memory
• UNIX sockets
  – Function overriding
  – Application changes
  – File descriptor tracking
Performance Enhancement Techniques
Credit Based Flow Control
Disabling Data Streaming
Delayed Acknowledgments
EMP Unexpected Queue
Credit Based Flow Control
[Diagram: sender and receiver queues (SQ/RQ); each send consumes a credit
 (Credits Left: 4 → 3 → 2 → 1 → 0) until the receiver's ACK restores the
 window back to 4]

• Multiple outstanding credits
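The credit window above can be sketched as a small simulation (illustrative names only, not the EMP API): each send consumes one credit, a send with zero credits must back off, and consuming data at the receiver returns a credit.

```python
class Receiver:
    def __init__(self):
        self.queue = []          # data sitting in pre-posted buffers

    def deliver(self, sender, data):
        self.queue.append((sender, data))

    def receive(self):
        sender, data = self.queue.pop(0)
        sender.on_ack(1)         # buffer re-posted: return one credit
        return data

class Sender:
    """Sender side of credit-based flow control (sketch)."""
    def __init__(self, receiver, credits=4):
        self.receiver = receiver
        self.credits = credits   # one credit per buffer pre-posted remotely

    def send(self, data):
        if self.credits == 0:
            return False         # window closed: wait for an ACK
        self.credits -= 1        # Credits Left: 4 -> 3 -> ... -> 0
        self.receiver.deliver(self, data)
        return True

    def on_ack(self, n):
        self.credits += n
```

Allowing multiple outstanding credits is what keeps the pipe full: the sender never has to stall as long as the receiver keeps draining its buffers.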
Non-Data Streaming and Delayed Acknowledgments
• Disabling data streaming
  – Data streaming requires an intermediate copy
  – Non-data streaming places data directly into the user buffer
• Delayed acknowledgments
  – Increase in bandwidth: less network traffic, so the NIC has less work to do
  – Decrease in latency: fewer descriptors posted and less tag matching at the
    NIC (550 ns per descriptor)
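Delayed acknowledgments can be sketched by batching the credit returns from the previous scheme (hypothetical names; batch size is an assumption): one ACK restores a whole window's worth of credits, so far fewer ACK messages cross the wire.

```python
class Sender:
    def __init__(self, credits=8):
        self.credits = credits
        self.acks_seen = 0       # count ACK messages actually received

    def on_ack(self, n):
        self.credits += n
        self.acks_seen += 1

class DelayedAckReceiver:
    """Coalesce ACKs: one ACK returns a batch of credits (sketch)."""
    def __init__(self, batch=4):
        self.batch = batch
        self.pending = 0
        self.inbox = []

    def receive(self, sender, data):
        self.inbox.append(data)
        self.pending += 1
        if self.pending == self.batch:
            sender.on_ack(self.pending)   # one ACK for the whole batch
            self.pending = 0
```

Halving or quartering the ACK count is where the bandwidth gain comes from; the latency gain comes from the NIC posting and matching fewer descriptors.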
EMP Unexpected Queue
• EMP features an unexpected message queue
  – Advantage: it is the last queue to be checked
  – Disadvantage: incoming data must be copied
• Acknowledgments in the unexpected queue
  – No copy, since acknowledgments carry no data
  – Acknowledgments are pushed out of the critical path
Performance Results
• Micro-benchmarks
  – Latency (ping-pong)
  – Bandwidth
• FTP application
• Web server
  – HTTP/1.0 specifications
  – HTTP/1.1 specifications
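A ping-pong latency microbenchmark measures half the average round-trip time of small messages. A minimal sketch of the idea (a real run would use two hosts; socketpair() keeps it self-contained, and the parameters are illustrative):

```python
import socket
import time

def pingpong(iters=1000, size=4):
    """One-way latency in microseconds, measured as half the mean RTT."""
    a, b = socket.socketpair()          # stand-in for two networked hosts
    msg = b"x" * size
    t0 = time.perf_counter()
    for _ in range(iters):
        a.sendall(msg)                  # ping
        b.recv(size)
        b.sendall(msg)                  # pong (echo side)
        a.recv(size)
    t1 = time.perf_counter()
    a.close()
    b.close()
    return (t1 - t0) / iters / 2 * 1e6
```

Bandwidth benchmarks follow the same pattern but stream large back-to-back messages one way and divide bytes moved by elapsed time.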
Experimental Test-bed
• Four Pentium III 700 MHz quad-processor machines
• 1 GB main memory
• Alteon NICs
• Packet Engine switch
• Linux kernel version 2.4.18
Micro-benchmarks: Latency
[Chart: latency (µs) vs. message size for TCP, Data Streaming,
 Non-Data Streaming, and EMP]

• Up to 4 times improvement compared to TCP
• Overhead of 0.5 µs compared to EMP
Micro-benchmarks: Bandwidth
[Chart: bandwidth (Mbps) vs. message size (4 bytes–128 KB) for Data Streaming,
 Non-Data Streaming, TCP, and Enhanced TCP]

An improvement of 53% compared to enhanced TCP
FTP Application
[Chart: transfer time (secs) vs. file size (1–256 MB) for Data Streaming,
 Non-Data Streaming, and TCP]

Up to 2 times improvement compared to TCP
Web Server (HTTP/1.0)
[Chart: transactions per second vs. response size for TCP, Data Streaming,
 and Non-Data Streaming]

Up to 6 times improvement compared to TCP
Web Server (HTTP/1.1)
[Chart: transactions per second vs. response size for TCP, Data Streaming,
 and Non-Data Streaming]

Up to 3 times improvement compared to TCP
Conclusions
• Developed a high-performance user-level sockets implementation over Gigabit Ethernet
• Latency close to base EMP (28 µs)
  – 28.5 µs for non-data streaming; 37 µs for data streaming sockets
  – 4 times improvement in latency compared to TCP
• Peak bandwidth of 840 Mbps
  – 550 Mbps obtained by TCP with increased registered space for the kernel
    (up to 2 MB); the default case is 340 Mbps with 32 KB
  – Improvement of 53%
Conclusions (contd.)
• FTP application shows an improvement of nearly 2 times
• Web server shows tremendous performance improvement
  – HTTP/1.0 shows an improvement of up to 6 times
  – HTTP/1.1 shows an improvement of up to 3 times

Future Work

• Dynamic credit allocation
• NIC as the trusted component: integrated QoS (currently on Myrinet clusters)
• Commercial applications in the data-center environment
• Extend the idea to next-generation interconnects: InfiniBand, 10 Gigabit Ethernet
For more information, please visit the Network-Based Computing Laboratory,
The Ohio State University: http://nowlab.cis.ohio-state.edu (NBC Home Page)

Thank You