Data & Storage Services
CERN IT Department
CH-1211 Genève 23, Switzerland
www.cern.ch/it
DSS
Disk-to-tape performance tuning
CASTOR workshop
28-30 November 2012
Eric Cano
on behalf of CERN IT-DSS group
Data & Storage Services
Network contention issue
• Not all disk servers at CERN have 10 Gb/s interfaces (yet)
• Output on the NIC of a disk server is a contention point
• Tape servers compete on equal terms with other streams
• Tape write speed is now fine with buffered tape marks, yet…
• A tape server's share can drop below 1 MB/s
  – 100s of simultaneous connections on the same disk server
• During data taking, this can lead to tape-server starvation, spreading to all CASTOR instances
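For scale, a back-of-the-envelope sketch (the 200-stream count is an assumed illustration, not a measured figure):

# A 1 Gb/s NIC carries at most ~125 MB/s of payload; shared fairly
# across 200 concurrent streams:
echo "scale=3; 125 / 200" | bc   # => 0.625 MB/s per stream, below 1 MB/s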
First solution: software level
• We turned on scheduling, which allowed capping the number of clients per disk server to a few tens
• We cannot go much lower, as a client can be slow as well, and we want to keep the disk server busy (we would trade bandwidth starvation for transfer-slot starvation)
• We need a bandwidth budgeting system that tolerates a high number of sessions, yet reserves bandwidth for tape servers
Second solution: system level
• Using Linux kernel traffic control
• Classify outbound traffic on disk servers into favoured (tape servers) and background (the rest)
• Still in a test environment
• The tools (see the sketch below):
  – tc (qdisc, class, filter)
  – ethtool (-k, -K)
• One technicality:
  – with TCP segmentation offload, the kernel sees oversized packets and fails to shape the traffic
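As a minimal sketch, these tools can be used to inspect an interface before and after shaping (eth0 is an assumed interface name):

/sbin/ethtool -k eth0                # -k queries offload settings (TSO among them)
/sbin/tc qdisc show dev eth0         # list the active queuing disciplines
/sbin/tc -s class show dev eth0      # per-class statistics (bytes sent, drops)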
Shaping details
• 3 priority queues by default:
  – interactive, best effort, bulk
• Retain this mechanism (using tc qdisc prio)
• Within the best effort queue, classify and prioritize outbound traffic (filter):
  – tape servers, but also
  – ACK packets, which help incoming traffic (all big streams are one-way)
  – ssh, preventing non-interactive ssh (wassh) from timing out
• Token bucket filter (tbf) and hierarchical token bucket (htb) did not give the expected results
• Using class based queuing (cbq) instead
• Keep a 90/10 mix between the high and low priority classes to keep all connections alive
The gory details
#!/bin/bash
# Turn off TCP segmentation offload, so that the kernel sees individual
# packets and can shape the traffic
/sbin/ethtool -K eth0 tso off
# Flush the existing rules (gives an error when there are none)
tc qdisc del dev eth0 root 2> /dev/null
# Duplicate the default kernel behaviour (3 bands, standard priomap)
tc qdisc add dev eth0 parent root handle 10: prio bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
# Create the class based queuing inside the best-effort band
tc qdisc add dev eth0 parent 10:1 handle 101: cbq bandwidth 1gbit avpkt 1500
# High-priority class: weight 90, 900 Mbit/s rate
tc class add dev eth0 parent 101: classid 101:10 cbq weight 90 split 101: defmap 0 bandwidth 1gbit \
  prio 1 rate 900mbit maxburst 20 minburst 10 avpkt 1500
# Low-priority default class: weight 10, 100 Mbit/s rate
tc class add dev eth0 parent 101: classid 101:20 cbq weight 10 split 101: defmap ff bandwidth 1gbit \
  prio 1 rate 100mbit maxburst 20 minburst 10 avpkt 1500
# Prioritize ACK packets: TCP (protocol 6), IP header without options (0x05),
# total length below 64 bytes, TCP flags byte (offset 33) set to ACK only (0x10)
tc filter add dev eth0 parent 101: protocol ip prio 10 u32 match ip protocol 6 0xff \
  match u8 0x05 0x0f at 0 match u16 0x0000 0xffc0 at 2 match u8 0x10 0xff at 33 \
  flowid 101:10
# Prioritize SSH packets
tc filter add dev eth0 parent 101: protocol ip prio 10 u32 match ip sport 22 0xffff flowid 101:10
# Prioritize network ranges of tape servers
tc filter add dev eth0 parent 101: protocol ip prio 10 u32 match ip dst <Network1>/<bits1> flowid 101:10
tc filter add dev eth0 parent 101: protocol ip prio 10 u32 match ip dst <Network2>/<bits2> flowid 101:10
<etc..>
How the pieces fit together:
1. Packets are sorted into the usual FIFOs
2. The best effort FIFO (=1) is replaced by CBQ
3. The two classes share the same priority, but with different bandwidth allocations
4. The default class is 101:20
5. Filtering of privileged traffic overrides the default for 101: traffic
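One way to exercise the setup in the test environment, sketched with placeholder hostnames and assuming iperf is installed on both ends:

# Start a background stream and a privileged (tape-range) stream in parallel
iperf -c <background-host> -t 60 &
iperf -c <tape-server-host> -t 60 &
# Watch the per-class byte counters to confirm packets hit 101:10 vs 101:20
watch -n 1 '/sbin/tc -s class show dev eth0'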
Traffic control results

[Charts: bandwidth (bytes/s) vs. number of background streams (0, 1, 2, 5, 10, 20, 500). Panels: "No traffic control" and "Traffic control, TCP segmentation offload on". Series: client and background bandwidth with 0 to 3 clients.]
Traffic control results

[Charts: same axes and series as above. Panels: "No traffic control" and "Traffic control, no TCP segmentation offload".]
Cost of filtering rules

• Test system (for reference):
  – Intel Xeon E51500 @ 2.00GHz
  – Intel 80003ES2LAN Gigabit (copper, dual port, 1 used)
  – Linux 2.6.18-274.17.1.el5
• ~122 tape servers in production: per-host rules won't fit
• Per-network rules are appropriate at CERN (11 IP ranges)
[Chart: "Filtering rules impact": time per packet (s) vs. number of filtering rules (0 to 500).]
• Time per packet is linear in the number of rules for n > 100
• Average rule processing time: ~200-250 ns
• Packet time: 1500 B / 1 Gb/s ≈ 12 μs
• => 48-60 rules maximum
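The rule budget follows directly from those two numbers:

# Packet service time at 1 Gb/s: 1500 B * 8 = 12000 bits => 12 us = 12000 ns
echo "12000 / 250" | bc   # => 48 rules at 250 ns per rule
echo "12000 / 200" | bc   # => 60 rules at 200 ns per rule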
Conclusions
• Traffic shaping is well understood in the test environment, and prioritizes work appropriately
• Tape traffic will remain on top under any disk traffic conditions
• Other traffic will not be throttled to zero and time out
• Bidirectional traffic should be helped too
• Yet the filtering-rule budget is small:
  – ad-hoc rules are necessary (this will work for CERN)
  – no easy one-size-fits-all tool (showqueue/cron-based, for example)