Monitoring and Tuning TCPIP Networking (2001)

  • 8/6/2019 Monitoring and Tuning TCPIP Networking (2001)

    3/16/01 Page

    Adrian Cockcroft - Sun Microsystems Monitoring and Tuning TCP/IP Networking

    Monitoring and Tuning TCP/IP Networking

    [email protected]

    Contents

    Introduction

    Communications Principles

    TCP - A Simple Approach

    TCP Measurements

    TCP Tunables

    How Web Servers Use TCP

    TCP Behavior Plots

    Introduction

    Keep it simple to start with

    Understand the basic high level measurements

    Look at common usage by web servers

    Focus on monitoring TCP activity

    ......then......

    Look at the complexities of TCP in more detail

    See plots of TCP behaving and misbehaving

    Learn how to look for problems

    Learn how to tune TCP

    Communications Principles

    How do you communicate remotely?

    Making phone calls to a person at a phone number.

    Phone calls are bidirectional connections.

    You keep trying until you connect to the right person and have your conversation.

    You know when the information arrives at its destination.

    TCP stands for Transmission Control Protocol; it is like the protocol you use to make a phone call.

    Connection Based Communications

    A phone call has a complex calling sequence:

    Caller: Dial number.

    Callee: Ringing, pick up.

    Caller: This is Bill, is Jim there?

    Callee: Yes, this is Jim.

    Caller: Hi Jim, here's my message.

    Callee: OK Bill, here's my reply.

    Caller: Thanks, are we done?

    Callee: Yes, good-bye.

    Caller: Good-bye.

    Callee: Hang up.

    Caller: Hang up.

    Sources and Destinations

    For phone calls there is a person at each location.

    Network locations are specified using Internet Protocol (IP) addresses - like 192.161.1.100

    The IP address only specifies the location; there is an additional port number that identifies who to talk to at that location - like 192.161.1.100:80 for port 80 (HTTP).

    Each TCP/IP connection is uniquely identified by two addresses and their port numbers.

    Special port numbers are reserved for well-known services and higher level protocols.

    Other port numbers are assigned when needed
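
    The two-addresses-plus-two-ports idea can be seen directly from a pair of connected sockets. This is a minimal sketch that assumes the loopback address 127.0.0.1 is usable; binding to port 0 asks the operating system to assign a free port, and the variable names are illustrative only.

```python
import socket

# Listening socket: port 0 lets the OS assign any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_addr = server.getsockname()

# The client's local port is also assigned by the OS when it connects.
client = socket.create_connection(server_addr)
conn, peer = server.accept()

# Each end reports (local address:port, remote address:port).
client_view = (client.getsockname(), client.getpeername())
server_view = (conn.getsockname(), conn.getpeername())
print("client view:", client_view)
print("server view:", server_view)

for s in (client, conn, server):
    s.close()
```

    Both ends report the same four values, mirrored, and that set of four is what identifies the connection.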

    Listening to the Hold Muzak

    Lots of people phone into a switchboard

    There may not be enough lines to hold on, so you could get an unobtainable signal

    You reach the operator and ask for someone

    You wait while the operator tries to find the person

    You are on hold, in a listen queue with a very tinny version of Abba's greatest hits playing....

    Hopefully your call will be accepted

    You may be rejected or give up waiting (time out)

    The TCP Listen Queue

    TCP also has a listen queue! All incoming connections that have not yet completed the handshake and been accepted are in the queue.

    The listen queue size is fixed by the application; a systemwide tunable limit sets the maximum, 1024 by default.

    The limit you set on Solaris is tcp_conn_req_max_q0

    You need to tune your web server configuration file as well

    You can calculate the average queue length, and set the maximum to at least 3 times the average (95%ile)

    mean listen queue length = mean connect setup time * mean connect rate

    maximum listen queue length > 3 * mean connect setup time * mean connect rate

    Slow Internet users: 1024 > 3 * 0.5s * 600 conn/s

    Fast local users: 1024 > 3 * 0.001s * 300000 conn/s
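
    The sizing rule above is simple enough to check with a few lines of arithmetic. The sketch below reproduces the two worked cases from this slide; the function names are illustrative, and the factor of 3 is the slide's rule of thumb rather than a derived constant.

```python
def mean_listen_queue_length(setup_time_s, connect_rate_per_s):
    """Average number of connections waiting in the listen queue."""
    return setup_time_s * connect_rate_per_s


def required_listen_queue(setup_time_s, connect_rate_per_s, factor=3):
    """Queue length the limit must exceed: factor times the mean."""
    return factor * mean_listen_queue_length(setup_time_s, connect_rate_per_s)


def limit_is_enough(limit, setup_time_s, connect_rate_per_s):
    """True if the configured limit exceeds the required queue length."""
    return limit > required_listen_queue(setup_time_s, connect_rate_per_s)


# Slow Internet users: 0.5 s setup time at 600 connections per second.
slow = required_listen_queue(0.5, 600)
# Fast local users: 1 ms setup time at 300000 connections per second.
fast = required_listen_queue(0.001, 300000)
print(slow, fast, limit_is_enough(1024, 0.5, 600))
```

    Both cases need a queue of about 900 entries, which is why the 1024 limit is just sufficient at these rates.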

    Remembering and Rejecting Calls

    After you talk to someone you remember them for a while, and can decide whether to talk to them again.

    Caller-id tells you who is ringing your cellphone.

    You keep numbers for people you don't want to talk to in your phone so you can just ignore the call.

    TCP/IP remembers every connection for 1-4 minutes.

    If one end keeps transmitting when the other has closed down, then it will get sent reset packets to tell it that no-one wants to listen.

    With high connection rates, it's hard to remember very many connections, and TCP needs to be tuned.
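
    To see why high connection rates are a problem, it helps to estimate how many closed connections are being remembered at once. This is a rough sketch that assumes every connection is remembered by this machine for the full interval and that the rate is steady; the 600 connections per second figure is borrowed from the listen queue example, and the function name is illustrative.

```python
def remembered_connections(close_rate_per_s, remember_interval_ms):
    """Steady-state count of closed connections still being remembered."""
    return close_rate_per_s * remember_interval_ms / 1000.0


# 4 minutes (240000 ms) versus 1 minute (60000 ms) at 600 connections/s.
at_four_minutes = remembered_connections(600, 240000)
at_one_minute = remembered_connections(600, 60000)
print(at_four_minutes, at_one_minute)
```

    Under these assumptions, shortening the interval from four minutes to one cuts the number of remembered connections by a factor of four.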

    Start Slow - Then Speed Up

    We always exchange a short greeting, then maybe a longer sentence, before talking for a long time.

    The other person interrupts all the time saying "OK", "yes", "uh-huh", which acknowledges receipt.

    If you don't get any acknowledgement or you get a query - "sorry, could you repeat that?" - you repeat yourself in a short phrase and stop to check it's OK.

    TCP/IP doesn't know how fast the link to the client is, so it can't just send data at full speed immediately.

    TCP/IP sends one packet, waits for the ACK, then sends two packets, waits for the ACK, and keeps doubling until it reaches full speed.
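
    The doubling behavior can be modeled in a few lines to show how many round trips a short transfer costs. This is a deliberately simplified sketch: it assumes no packet loss, one ACK per burst, and an optional cap standing in for "full speed"; the function and parameter names are illustrative.

```python
def slow_start_round_trips(total_packets, initial_window=1, max_window=None):
    """Count round trips needed to send total_packets with a doubling window."""
    if initial_window < 1:
        raise ValueError("initial_window must be at least 1")
    sent = 0
    window = initial_window
    trips = 0
    while sent < total_packets:
        sent += window   # send one burst, then wait for the ACK
        trips += 1
        window *= 2      # double the burst size each round trip
        if max_window is not None:
            window = min(window, max_window)
    return trips


# A 10-packet response starting with 1 packet: bursts of 1, 2, 4, 8.
trips_from_one = slow_start_round_trips(10, initial_window=1)
# The same response starting with 2 packets: bursts of 2, 4, 8.
trips_from_two = slow_start_round_trips(10, initial_window=2)
print(trips_from_one, trips_from_two)
```

    In this model, starting with two packets instead of one saves a round trip on a 10-packet response, which matters most for small web objects.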

    I Repeat Myself When Under Stress, I Repeat Myself When Under Stress, I Repeat Myself When Under Stress

    If you don't think the other person is listening, you repeat what you are saying until you get acknowledged.

    If there is noise on the line, or too many people talking at once, you can't hear, so you repeat yourself.

    TCP/IP resends packets if it doesn't get an ACK within its time-out period. The time-out varies adaptively, and is remembered for routes that have been used before.

    Unlike a phone line, the time taken to get packets over a connection varies a lot, depending on packet size, congestion and changes in the route.

    It is normal to retransmit over Internet connections.

    Stop! You're Talking Too Fast!

    When you are listening and can't take in the information fast enough, you ask the other person to slow down or pause until you catch up, because you can only remember a few sentences at a time.

    The receiving end of a TCP/IP connection controls the flow of data into its buffer space by sending a sliding window size to the sender along with its ACKs.

    If the window is too small and latency is high then throughput is affected. Default 8KB needs to be tuned.

    For sustained max speed: window size > mean bandwidth * mean latency

    56Kbit modem is OK with typical latency: 8KB > 6KB/s * 1.3s

    100MBit LAN is marginal: 8KB > 10000KB/s * 0.8ms (with routers 2-3ms)

    Set to at least 32KB for Gigabit ethernet and DSL or Cable Modem Internet
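
    The window rule is a bandwidth times latency product, and the slide's examples can be recomputed directly. The sketch below uses the slide's own figures in KB and seconds; the 2.5 ms value is an assumed midpoint of the "with routers 2-3ms" range, and the function names are illustrative.

```python
def required_window_kb(bandwidth_kb_per_s, latency_s):
    """Data in flight that the window must exceed for sustained max speed."""
    return bandwidth_kb_per_s * latency_s


def window_is_sufficient(window_kb, bandwidth_kb_per_s, latency_s):
    """True if the window exceeds bandwidth * latency."""
    return window_kb > required_window_kb(bandwidth_kb_per_s, latency_s)


# 56Kbit modem: about 6 KB/s with 1.3 s latency needs roughly 7.8 KB.
modem_kb = required_window_kb(6, 1.3)
# 100Mbit LAN through routers: about 10000 KB/s with an assumed 2.5 ms latency.
lan_routed_kb = required_window_kb(10000, 0.0025)
print(modem_kb, lan_routed_kb)
print(window_is_sufficient(8, 10000, 0.0025), window_is_sufficient(32, 10000, 0.0025))
```

    With the assumed routed LAN latency, the default 8KB window falls short while 32KB is enough, consistent with the recommendation above.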

    Denial of Service Attack

    It's as if someone is jamming your cellphone, and it can't get through to make a call.

    TCP can be (and has been) jammed as well.

    Special programs talk directly to the network card.

    They construct badly formed TCP SYN packets.

    The destination is your machine, but the return address is corrupted so the reply does not go back to the sender.

    Your machine uses up a listen queue slot, replies and waits until it times out. When the listen queue is full no-one can talk to you.

    Solaris detects this strange pattern and throws out bad packets.

    Two queues are used, so completed connections are separated.

    Latest variation attempts to saturate the network.

    Hackers take over lots of high bandwidth systems and use them to flood the victim with so much traffic that its network saturates.

    How Do Protocols Use TCP/IP?

    TCP/IP is used underneath HTTP for web servers with very high connection rates.

    TCP/IP is used underneath FTP for file transfers with moderate connection rates.

    Database client/server connections use TCP/IP underneath with low connection rates.

    NFS™ started off running over UDP/IP on reliable networks with 8KB transfers.

    NFS now defaults to TCP/IP between two Solaris™ machines and can work better over congested networks with 32KB transfers. The connection rate is low.

    Measuring and Monitoring TCP

    Capacity and Throughput Metrics to Watch

    Connections

    Current number of established connections

    New outgoing connection rate (active opens)

    Outgoing connection attempt failure rate

    New incoming connection rate (passive opens)

    Incoming connection attempt failure rate (resets)

    Throughput

    Input and output byte rates

    Input and output segment rates

    Output byte retransmit percentage

    Duplicate input byte percentage

    TCP Measurements

    [Figure: diagram of TCP measurement points. Labels: established connections; throughput measures; outgoing active opens and attempt failures; incoming passive opens; connection rejected, reset sent.]

    Obtaining Measurements

    Generic: get the TCP MIB via SNMP

    Unix: netstat -s shows the TCP counters

    Standard TCP metric names:

    tcpCurrEstab: current number of established connections

    tcpActiveOpens: number of outgoing connections since boot

    tcpAttemptFails: number of outgoing failures since boot

    tcpPassiveOpens: number of incoming connections since boot

    tcpOutRsts: number of resets sent to reject connection

    tcpEstabResets: resets sent to terminate established connections

    (tcpOutRsts - tcpEstabResets): incoming connection failures

    tcpOutDataSegs, tcpInDataSegs: data transfer in segments

    tcpRetransSegs: retransmitted segments

    Byte level throughput statistics are vendor specific
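
    Because these counters accumulate since boot, the useful numbers come from the difference between two snapshots. The sketch below assumes the counters have already been collected into dictionaries keyed by the names above (for example by parsing netstat -s output, which is not shown); the sample values are made up for illustration, and the retransmit percentage is computed on segments as a stand-in for the vendor-specific byte-level figure.

```python
COUNTERS = (
    "tcpActiveOpens", "tcpAttemptFails", "tcpPassiveOpens",
    "tcpOutRsts", "tcpEstabResets", "tcpOutDataSegs", "tcpRetransSegs",
)


def tcp_rates(before, after, interval_s):
    """Turn two since-boot counter snapshots into per-second rates."""
    delta = {name: after[name] - before[name] for name in COUNTERS}
    out_segs = delta["tcpOutDataSegs"]
    return {
        "active_opens_per_s": delta["tcpActiveOpens"] / interval_s,
        "attempt_fails_per_s": delta["tcpAttemptFails"] / interval_s,
        "passive_opens_per_s": delta["tcpPassiveOpens"] / interval_s,
        # Incoming connection failures: resets sent minus established resets.
        "incoming_fails_per_s":
            (delta["tcpOutRsts"] - delta["tcpEstabResets"]) / interval_s,
        # Segment-level retransmit percentage (an approximation).
        "retrans_seg_pct":
            100.0 * delta["tcpRetransSegs"] / out_segs if out_segs else 0.0,
    }


# Made-up snapshots taken 10 seconds apart.
before = {"tcpActiveOpens": 1000, "tcpAttemptFails": 10,
          "tcpPassiveOpens": 50000, "tcpOutRsts": 400, "tcpEstabResets": 100,
          "tcpOutDataSegs": 900000, "tcpRetransSegs": 9000}
after = {"tcpActiveOpens": 1020, "tcpAttemptFails": 10,
         "tcpPassiveOpens": 56000, "tcpOutRsts": 450, "tcpEstabResets": 120,
         "tcpOutDataSegs": 1000000, "tcpRetransSegs": 19000}
rates = tcp_rates(before, after, 10)
print(rates)
```

    tcpCurrEstab is left out of the differencing because it is a current value rather than a running total.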

    Check that you understand the basics

    Which protocol is like a phone call?

    What goes with an IP address to identify the service?

    What happens if you try to talk and no-one is listening?

    What are the two kinds of activity to monitor?

    What does a tcpAttemptFail mean?

    How do you know your message got through?

    Internet Server Issues

    TCP Connections are expensive

    TCP is optimized for reliable data on long lived connections

    Making a connection uses a lot more CPU than moving data

    Connection setup handshake involves several round trip delays

    Each open connection consumes about 1 KB plus data buffers

    Pending connections cause listen queue issues

    Each new connection goes through a slow start ramp up

    Other TCP Issues

    TCP windows can limit high latency high speed links

    Lost or delayed data causes time-outs and retransmissions

    HTTP persistent connections carry several ops on one connect

    Look at behavior plots next to see TCP in action

    Turn snoop data into plots with the packet shell tcp.analysis tool

    Get it from http://playground.sun.com/psh

    TCP in Pictures

    Medium sized HTTP request and response with retransmit congestion

    [Plot: sequence number against time in seconds. Annotations: request packet, ACK, packets, window advance, retransmitted packets.]

    TCP in Pictures

    Clean transfer with slow start ramp up

    Transfer with window close flow control

    [Plots for the two captions above. Annotation: ACK.]

    TCP in Pictures

    Problem with increased minimum retransmit time-out

    Persistent connections showing two small transfers and a large image over high speed net

    [Plots for the two captions above. Annotations: packet gets lost; retransmit too slow; three transfers.]

    Tuning TCP Configuration Parameters

    TCP Tunables (use Solaris ndd command)

    tcp_close_wait_interval (actually tcp_time_wait_interval)

    default 240000 ms. Reduced to 60000 for SPECweb96 runs

    tcp_conn_req_max_q0 - incomplete listen connection limit

    default 1024 - no need to tune unless tcpListenDropQ0 seen

    tcp_conn_req_max_q - pending completed connection limit

    default 128 - no need to tune it unless tcpListenDrop seen

    tcp_slow_start_initial - number of initial packets to send

    default 1 - set to 2 to be the same as other vendors

    tcp_xmit_hiwat and tcp_recv_hiwat - window size

    default 8192 - set to 32768 (which is the Windows and MacOS default)

    TCP Connection Hash Table (add to /etc/system)

    Increase from default of 256, must be power of 2, up to 262144

    set tcp:tcp_conn_hash_size=32768

    TCP Rules and the SE Toolkit

    Let SE watch your TCP stack for you

    ignore everything if TCP throughput is less than 2 KB/s

    warn if retransmit rate is over 15%, problem if over 25%

    warn if listen drops are seen, problem if over 0.5/s

    increase the listen queue size until this problem stops

    warn of SYN service denial attack if tcpHalfOpenDrop over 2/s

    try to block the source of the attack at high levels

    warn if connection refused - sending reset packets at 0.5/s

    increase this threshold if it becomes annoying

    possible port scanner attack if resets at over 2/s

    this threshold is also too low, set higher if you get false alarms

    warn if attempted outgoing connection fails over 2/s

    try to find the process on your system that is failing to connect

    warn if incoming duplicate packets over 15%

    problem if duplicates at 25% - remote is retransmitting at you
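
    The thresholds above translate naturally into a small rule function. The sketch below is not the SE Toolkit's own code; it is an illustrative Python rendering of the listed thresholds, and the dictionary keys are hypothetical names for rates computed elsewhere.

```python
def tcp_rule(m):
    """Return a list of (level, message) findings for one set of TCP rates."""
    if m["throughput_kb_per_s"] < 2:
        return []  # too little traffic for the percentages to mean anything
    findings = []

    def check(value, warn_over, problem_over, what):
        if problem_over is not None and value > problem_over:
            findings.append(("problem", what))
        elif value > warn_over:
            findings.append(("warn", what))

    check(m["retransmit_pct"], 15, 25, "high retransmit rate")
    check(m["listen_drops_per_s"], 0, 0.5, "listen drops, raise listen queue size")
    check(m["half_open_drops_per_s"], 2, None, "possible SYN denial of service attack")
    # Resets: connection refused warning first, port scan suspicion above 2/s.
    if m["resets_per_s"] > 2:
        findings.append(("warn", "possible port scanner attack"))
    elif m["resets_per_s"] >= 0.5:
        findings.append(("warn", "connections refused, sending resets"))
    check(m["attempt_fails_per_s"], 2, None, "outgoing connection attempts failing")
    check(m["duplicate_pct"], 15, 25, "incoming duplicates, remote is retransmitting")
    return findings


# Made-up sample: busy server with a moderately high retransmit rate.
sample = {"throughput_kb_per_s": 500, "retransmit_pct": 20,
          "listen_drops_per_s": 0, "half_open_drops_per_s": 0,
          "resets_per_s": 0.1, "attempt_fails_per_s": 0, "duplicate_pct": 3}
findings = tcp_rule(sample)
print(findings)
```

    Keeping the thresholds as arguments to one helper makes it easy to raise the ones the slide describes as too low for a given site.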

    Tools that use the TCP rule

    virtual_adrian.se - text log of problems seen

    zoom.se - GUI front end to rules

    percollator.se - long term web server data collector

    Also use tcp_monitor.se to check and tune TCP

    Conclusion

    Think about TCP/IP behaving like the phone system

    Monitor the simple measurements

    Tune TCP parameters when needed

    Look at TCP packet flow diagrams to understand what is really going on

    Use TCP rules to automatically watch for problems, but customize the rule thresholds to your situation
