Application Performance Pitfalls and TCP's Nagle Algorithm



Greg Minshall [email protected]
Siara Systems

Jeffrey C. Mogul mogul@pa.dec.com
Compaq Computer Corp. Western Research Lab.
250 University Ave., Palo Alto, CA, 94301

Yasushi Saito [email protected]
Computer Science and Engineering, University of Washington

Ben Verghese [email protected]
Compaq Computer Corp. Western Research Lab.

Abstract

Performance improvements to networked applications can have unintended consequences. In a study of the performance of the Network News Transfer Protocol (NNTP), the initial results suggested it would be useful to disable TCP's Nagle algorithm for this application. Doing so significantly improved latencies. However, closer observation revealed that with the Nagle algorithm disabled, the application was transmitting an order of magnitude more packets. We found that proper application buffer management significantly improves performance, but that the Nagle algorithm still slightly increases mean latency. We suggest that modifying the Nagle algorithm would eliminate this cost.

1 Introduction

The performance of a client/server application depends on the appropriate use of the transport protocol over which it runs. In today's Internet, most application protocols run over the Transmission Control Protocol (TCP) [12] or the User Datagram Protocol (UDP) [11]. Usenet, one of the main Internet applications, runs over TCP using an application protocol known as the Network News Transfer Protocol (NNTP) [7]. Several of us made a prior study [15] assessing the performance of a popular NNTP implementation. We measured the response time for NNTP interactions, and noticed a sharp peak in the response-time distribution at exactly 200ms. We discovered that this 200ms delay was a consequence of interactions between the application and two TCP mechanisms: the Nagle algorithm and delayed acknowledgments.

In this paper, we analyze this situation in more detail. We show that, once we resolved several buffering problems, use of the Nagle algorithm still leads to a slight increase in mean latency for NNTP interactions, and we analyze the mechanism behind this. Finally, we describe a simple modification to the Nagle algorithm that should eliminate the problem.

2 TCP and the Nagle Algorithm

TCP controls the transmission of data between the client and server processes. It attempts to balance a number of requirements, including fast responsiveness (low latency), high throughput, protection against network congestion, and observing the receiver's buffer limits. As part of this balancing act, TCP attempts to inject the fewest possible packets into the network, mainly to avoid congesting the network, and to avoid adding load to routers and switches.

In the early 1980s, terminal (telnet) traffic constituted a large fraction of the traffic on the Internet. Often a client would send a sequence of short packets, each containing a single keystroke, to the server. At the time, these short packets placed a significant load on the Internet, especially given the relatively slow links, routers, and servers.

To alleviate this glut of small packets, in 1984 John Nagle proposed what has become known as the Nagle algorithm [9]. The algorithm applies when a TCP sender is deciding whether to transmit a packet of data over a connection. If it has only a "small" amount of data to send, then the Nagle algorithm says to send the packet only if all previously transmitted data has been acknowledged by the TCP receiver at the other end of the connection. In this situation, "small" is defined as less data than the TCP Maximum Segment Size (MSS) for the connection, the largest amount of data that can be sent in one datagram.
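In pseudocode terms, the sender-side test can be sketched as follows. This is a minimal illustration of the rule as just stated, not the actual BSD implementation; the names are ours.

    /* Sketch of the Nagle test: may we transmit now?
     * len: bytes queued for this packet; mss: maximum segment size;
     * unacked: bytes sent but not yet acknowledged. */
    int nagle_may_send(int len, int mss, int unacked)
    {
        if (len >= mss)
            return 1;    /* a full-sized segment is always eligible */
        if (unacked == 0)
            return 1;    /* nothing in flight: a small packet may go */
        return 0;        /* small packet with unacknowledged data: hold it */
    }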

The effect of the Nagle algorithm is that at most one "small" packet will be transmitted on a given connection per round trip time (RTT). Here, "round trip time" means the time it takes to transmit data and subsequently receive the acknowledgment for that data.



The algorithm imposes no limit on the number of large packets transmitted within one round trip time.

The Nagle algorithm has been very effective in reducing the number of small packets in the Internet. Today, most TCP implementations include this algorithm.

3 Delayed Acknowledgments

TCP data packets are acknowledged by the recipient. In order to reduce the number of packets in the network, the receiver can "delay" sending the acknowledgment packet for a finite interval. This is done in the hope that the delayed acknowledgment can be "piggy-backed" on a data packet in the opposite direction. The delayed acknowledgment can also carry window update information, informing the TCP sender that the application at the receiver has "read" the previously transmitted data, and that the TCP sender can now send that many more bytes of data. Delayed acknowledgments are particularly useful in client/server applications, where data frequently flows in both directions.

Many TCP implementations are derived from the 4.x BSD code base (described in great detail by Wright and Stevens [19]). These TCP stacks mark a connection that has just received new data by setting the TF_DELACK flag, indicating that the connection needs a delayed acknowledgment. Subsequent transmission of an acknowledgment on the connection clears this flag. A separate timer-driven background activity polls the list of active TCP connections once every 200ms, sending an acknowledgment for each connection that has TF_DELACK set. This means that, on average, an acknowledgment will be delayed for 100ms, and never more than about 200ms.
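The mechanism can be sketched as below; the structure and helper here are simplified stand-ins for the kernel's own definitions, shown only to illustrate the timer-driven behavior just described.

    #define TF_DELACK 0x02           /* connection owes a delayed ack */

    struct conn {
        int flags;                   /* per-connection TCP flags */
        /* ... other TCP state ... */
    };

    extern void send_ack(struct conn *c);   /* transmit an ACK segment */

    /* Invoked from a 200ms periodic timer (tcp_fasttimo() in 4.x BSD):
     * every connection still marked TF_DELACK is acknowledged now, so
     * an ack waits at most about 200ms, and about 100ms on average. */
    void fasttimo_sketch(struct conn *conns, int nconns)
    {
        for (int i = 0; i < nconns; i++) {
            if (conns[i].flags & TF_DELACK) {
                conns[i].flags &= ~TF_DELACK;
                send_ack(&conns[i]);
            }
        }
    }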

TCP senders make use of returning acknowledgment packets to "clock" the transmission of packets [5]. This conflicts with the desire to reduce the number of packets in the network. For this reason, TCP receivers must transmit at least one acknowledgment for every two full-sized packets received. Thus, a typical transmission sequence (as seen from node B) might look like: A->B (data), A->B (data), B->A (ack), A->B (data), ...

4 The Network News Transfer Protocol

The Network News Transfer Protocol (NNTP) [7] is used to transfer Usenet articles between news servers, and between news servers and news readers (clients). NNTP is a request-response protocol. Similar to protocols such as SMTP [13] and FTP [14], requests and responses in NNTP are expressed in ASCII, rather than in a binary format. When used between a client and a server, the client waits for one response before sending the next request. (Server-to-server transfers are often pipelined.)
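As an illustration of this request-response style, an article retrieval might look roughly like the following; the exact response wording varies by server, while the numeric code and the dot-terminated body follow RFC 977 [7]:

    C: ARTICLE <message-id>
    S: 220 0 <message-id> article retrieved - head and body follow
    S: (header lines, a blank line, then the article body)
    S: .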

5 The Original Experiment

In a prior study [15], several of us (Saito, Mogul, and Verghese) measured the response time for NNTP transactions (from the time the client sends its request to the time the client receives the last byte of the response) from a server running the popular INN [16] software. We found a sharp peak at 200ms; figure 1 shows the cumulative distribution function for article retrieval latencies. Note that approximately 25% of the transactions completed in almost exactly 200ms.

[Figure 1: CDF for retrieval latencies: original code (x-axis: response time in ms)]

We realized that this 200ms peak was an artifact of an interaction between the Nagle algorithm and the delayed-acknowledgment mechanism. When the server application sends its response, part of the data is presented to the TCP stack in an amount smaller than the MSS. Because the previous response packets have not yet been acknowledged, the Nagle algorithm causes the sender to refrain from sending the small packet until an acknowledgment is received. However, the client application has not read enough data to generate a window update (because the server has not sent enough data since the last acknowledgment from the client), so the acknowledgment for the previous response packet(s) is deferred until the timer expires for the delayed acknowledgment mechanism.
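Concretely, the stall unfolds like this (an illustrative timeline for a response whose final piece is smaller than the MSS):

    server: sends full-sized response segment(s) immediately
    server: holds the final partial segment   (Nagle: unacknowledged data in flight)
    client: holds its acknowledgment          (delayed ack: waiting for a second
                                               segment or a window update)
    [up to 200ms passes]
    client: delayed-ack timer fires; ACK transmitted
    server: receives ACK; final partial segment transmitted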

Humans notice, and dislike, a 200ms delay in article retrieval. This delay also increased the mean delay measured by a benchmark we were developing. In order to reduce this delay, we modified the INN server to disable the Nagle algorithm when communicating with clients. (The TCP specification requires that an implementation allow an application to disable this algorithm [1].) We were pleased to note, as shown in figure 2, that the mode at 200ms had disappeared. The mean article retrieval latency dropped, in our benchmark, from 122ms to 71ms.
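With the BSD sockets API, the algorithm is disabled per connection with the TCP_NODELAY socket option; a minimal sketch, assuming fd is an already-connected TCP socket, is:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Disable the Nagle algorithm on one connection. Returns 0 on
     * success, -1 (with errno set) on failure. */
    static int disable_nagle(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }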



[Figure 2: CDF for latencies: Nagle algorithm disabled]

Therefore, we recommended that NNTP servers disable the Nagle algorithm for client connections [15].

6 A Closer Look

One of us (Minshall) reacted to this recommendation by asking whether this 200ms latency was a fundamental problem with the Nagle algorithm, or whether it was in fact an aspect of the NNTP server software design that could be corrected in the server software itself. We therefore collected tcpdump [6] traces to investigate the detailed behavior of the system.

[Figure 3: CDFs for packet sizes: original server]

These traces revealed a new anomaly. The median packet size (TCP payload) with the Nagle algorithm enabled was 4312 bytes (equal to the MSS of the FDDI LAN used in these tests), but was only 79 bytes with the algorithm disabled (see figure 3). In the former case, a trial transferred 26.7 Mbytes in 9,600 packets; in the latter case, a trial transferred 20.1 Mbytes in 147,339 packets.

This discrepancy led us to discover that the INN server (nnrpd), due to an unexpected consequence of the way these experiments were run, was doing line-buffered I/O on its NNTP sockets (rather than fully buffering each response). Therefore, the server was sending each response using many write() system calls. With the Nagle algorithm enabled, the TCP stack buffered all but the first line of data until an acknowledgment was received from the news reader, or until at least a packet's worth of response data accumulated at the server. The Nagle algorithm, as intended, prevented the network from being flooded with tiny response packets. However, with the Nagle algorithm disabled, there was nothing to prevent the server TCP from sending a tiny packet for each line written by nnrpd, and so it sent lots of them.

7 Ensuring Buffering in the Server

We were surprised to learn that nnrpd was using line-buffered I/O. Investigation revealed a subtle cause, specific to the particular experimental setup. When it accepts a new connection, nnrpd arranges for that socket's file descriptor to be associated with the stdout stream, closing the existing file descriptor initially associated with that stream. This allows nnrpd to write its output using the printf() library routine, which implicitly names the stdout stream. It means, however, that the network connection inherits its buffering mode from stdout.

In normal use, nnrpd is automatically invoked at system start-time from a script, and stdout is fully buffered. However, in the original experiments, we had inserted a debugging printf() to stdout, and started nnrpd from the command line of a terminal. The stdio library causes terminal-output streams to be line-buffered, by default. Thus, the nnrpd network connections inherited this line-buffered mode, resulting in multiple writes per news article.

We modified nnrpd to ensure that it always used fully-buffered stdio streams, regardless of the circumstances of its invocation. Even so, we found responses split into too many packets. We then discovered that nnrpd uses a special buffer for some of its output, intermingled with writes via stdout. This mingling required the use of extraneous fflush() calls. We therefore modified nnrpd again, to use only stdio buffering. This eliminated all of the excess packet transmissions.
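With stdio, full buffering can be forced via setvbuf() before the first write on the stream; a minimal sketch of such a fix (the 8192-byte size is illustrative, not nnrpd's actual choice) is:

    #include <stdio.h>

    /* Force full buffering on a stream, even when it is attached to a
     * terminal. Must be called before the first I/O on the stream;
     * passing NULL lets stdio allocate the buffer itself. */
    static int force_full_buffering(FILE *stream)
    {
        return setvbuf(stream, NULL, _IOFBF, 8192);
    }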

We refer to the original (accidentally line-buffered) nnrpd as "poorly-buffered"; the first modified nnrpd as "mostly-buffered" (this is the model that nnrpd would normally use in practice); and our final modified nnrpd as "fully-buffered."



[Figure 4: CDFs for latencies: mostly-buffered I/O]

[Figure 5: CDFs for latencies: fully-buffered I/O]

[Figure 6: CDFs for latencies: all six configurations]

Figure 4 shows the results for mostly-buffered I/O, with and without the Nagle algorithm; figure 5 shows the results for fully-buffered I/O. Figure 6 combines all of the latency results into one figure, and table 1 shows statistics for each of the six configurations.

From table 1, we see that the mostly-buffered server with the Nagle algorithm disabled has the lowest latency. However, the two fully-buffered server configurations (with and without the Nagle algorithm) show better performance, and fewer packets, than the original poorly-buffered server, even when the poorly-buffered server was running with Nagle disabled.

We are still trying to understand why the mostly-buffered server beats the fully-buffered server; this might be a measurement artifact.

8 Analysis of algorithm interactions

As table 1 shows, even after we ensured the use of fully-buffered I/O in nnrpd, our benchmark still yielded better results with the Nagle algorithm disabled. The Nagle algorithm increases mean latency by 19%, and median latency by 4%. What causes this effect? We present a somewhat simplified explanation (see also [4]).

Consider the number of full-sized TCP segments required to convey an NNTP response (such as a news article). If the response exactly fits an integral number of segments, then the Nagle algorithm has no effect, and the TCP stack sends each packet immediately. However, response sizes vary considerably, so most do not exactly fit an integral number of segments.

Now consider an article that requires an odd number of segments; i.e., that requires 2N + 1 segments. TCP will quickly dispose of the first 2N segments, since each pair will result in an immediate acknowledgment; the delayed-acknowledgment mechanism will not be invoked. When the time comes to send the last segment, all previous segments will have been (or will quickly be) acknowledged, so the Nagle algorithm also will not be invoked; there will be no delay.

Finally, consider an article that requires, but does not fill, an even number of segments, i.e., that requires 2N segments but contains less than 2N * MSS bytes of data. TCP will quickly send the 2N - 1 full-sized segments, because the Nagle algorithm is not invoked. However, the receiver will delay its acknowledgment of the final full-sized segment, since it is waiting for a pair of segments. Simultaneously, the sender will delay its transmission of the final, partial segment, since the Nagle algorithm is triggered by the outstanding unacknowledged data and the lack of a full segment to send.
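This case analysis reduces to simple arithmetic on the response size. The sketch below encodes the simplified model just described; it deliberately ignores the cluster-size refinement discussed in Section 8.1.

    /* Predict, under the simplified model above, whether the final
     * partial segment of a response stalls on the Nagle/delayed-ack
     * interaction. size: response size in bytes; mss: segment size. */
    int response_is_delayed(long size, long mss)
    {
        long full = size / mss;        /* full-sized segments */
        long leftover = size % mss;    /* bytes in the final partial segment */

        if (leftover == 0)
            return 0;    /* exact fit: no small segment, no stall */
        if (full % 2 == 0)
            return 0;    /* odd total (2N+1 segments): acks arrive in time */
        return 1;        /* even total (2N segments, unfilled): stall */
    }

For example, with the FDDI MSS of 4312 bytes, a 5000-byte response needs two segments (one full, one partial) and stalls, while a 9000-byte response needs three and does not.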

With a uniform distribution of response sizes, this analysis suggests that just under half of the responses would be delayed by the interaction between the Nagle algorithm and the delayed-acknowledgment mechanism. In reality, the article size distribution is not at all uniform [15]. At least 50% of Usenet articles would fit into a single Ethernet packet, and a substantially larger fraction would fit into a single FDDI packet. Thus, most Usenet articles require an odd number of segments and do not encounter a delay. We found, for our benchmark runs, that 8.7% of the responses do require an even number of FDDI segments, which explains why disabling the Nagle algorithm leads to improved latency statistics.



Server configuration              Mean resp.  Median resp.  Mean pkt.     Median pkt.
                                  time (ms)   time (ms)     size (bytes)  size (bytes)
poorly-buffered, Nagle enabled    122         135           2913          4312
poorly-buffered, Nagle disabled   71          27            143           79
mostly-buffered, Nagle enabled    53          28            3114          4312
mostly-buffered, Nagle disabled   28          20            3040          3864
fully-buffered, Nagle enabled     43          26            3192          4312
fully-buffered, Nagle disabled    36          25            3131          3864

Table 1: Response time and packet size statistics, one run of the benchmark for each configuration

8.1 Results with Ethernet MTU

[Figure 7: CDFs for latencies with Ethernet MTU]

Encouraged by our success in predicting the frequency of packet delays using the FDDI segment size, we decided to verify our model using a different packet size. We ran trials over an FDDI network that had been reconfigured to use the same MSS as would be used on an Ethernet (1460 bytes). By using a reconfigured FDDI network, rather than an actual Ethernet network, we avoided changing several irrelevant variables (primarily, the hardware implementation and the associated kernel driver software) that might have affected the results.

We ran trials using both the mostly-buffered and fully-buffered servers, with and without the Nagle algorithm enabled. Figure 7 shows the CDFs for these four trials. Table 2 shows the mean and median latencies and packet sizes.

Again, we found that the Nagle algorithm does slightly increase latency. However, the increase was much smaller than we expected, since the simple model in Section 8, applied to the distribution of response sizes, suggests that 45% of the responses should encounter delays. This clearly does not happen.

Indeed, our model is too simple. The BSD TCP implementation of the Nagle algorithm never delays a packet transmission if the connection was idle when the tcp_output() function is called, so the first "cluster" of data can be sent without delay. On our systems, the cluster size is 8192 bytes, large enough to cover a significant fraction of the NNTP responses.

8.2 Other contexts where this effect occurs

We are not the first to point out the performance implications of the interaction between the Nagle algorithm and the delayed-acknowledgment mechanism. The effect occurs in contexts besides NNTP.

Early examples include the use of multi-byte "function keys" in interactive terminal sessions [18], and the interactive X window system [17]. In both of these cases, implementors quickly realized the need to disable the Nagle algorithm [3].

More recently, the introduction in HTTP of "persistent connections" (the use of a single TCP connection for several HTTP requests) has led several researchers [4, 10] to similar conclusions.

In general, any application that uses TCP as a transport for a series of reciprocal exchanges can suffer from this interaction. The effect would not appear when an entire request or response always fits in one TCP segment, but may be seen when the request or response is so large as to need to be transmitted in more than one packet.



Server configuration              Mean resp.  Median resp.  Mean pkt.     Median pkt.
                                  time (ms)   time (ms)     size (bytes)  size (bytes)
mostly-buffered, Nagle enabled    52          31            1283          1460
mostly-buffered, Nagle disabled   37          25            1236          1460
fully-buffered, Nagle enabled     38          26            1294          1460
fully-buffered, Nagle disabled    35          25            1247          1460

Table 2: Response time and packet size statistics, one run of the benchmark for each configuration with Ethernet MTU

9 Phase Effects in the Original Experiment

Figure 1 shows something odd. As we explained in section 3, the BSD delayed acknowledgments mechanism should produce delays uniformly distributed in the interval [0...200] ms. A graph of the cumulative distribution of response times should include a nearly linear section starting at the undelayed response time, and extending 200ms beyond this point. Instead, figure 1 shows a sharp jump at 200ms, as if the delayed acknowledgment mechanism were waiting exactly 200ms before sending the delayed acknowledgment.

While human users of NNTP servers often pause for several seconds between articles, the benchmark we used in these experiments sends client requests as rapidly as possible; this allows us to measure worst-case server behavior. When the delayed acknowledgment first delays the completion of an interaction in a series, we would indeed expect the delay to be uniformly distributed. However, the subsequent interaction now starts shortly after a 200ms timer "clock tick," and if it is delayed, then it will complete upon the subsequent timer expiration - almost exactly 200ms after it starts.

Floyd and Jacobson [2] discuss synchronization of nodes in the Internet, in the context of periodic routing messages. As they point out, such synchronization can produce significant burst loads. In the case we analyzed in this paper, the synchronization is based on the clock of the news reader (client) host. In practice, many news clients with independent clocks access one server, which should reduce the overall synchronization effect. Our benchmark, which uses multiple news reader processes on a single client machine, leads to a worst-case synchronization effect. We note, however, that if the bulk of the data had been flowing from many senders to one recipient, this might have synchronized the delays to the recipient's clock, thereby inducing large synchronized bursts of data arriving at the recipient. For example, an HTTP proxy retrieving responses from many Internet servers, using persistent connections, could create synchronized bursts of incoming data through an interaction between Nagle's algorithm and delayed acknowledgments.

[Figure 8: Effect of modified Nagle algorithm]

10 A Proposed Solution

The Nagle algorithm, as we saw when running the "accidentally unbuffered" version of the NNTP software, does as intended prevent the transmission of excessive numbers of small packets. Can we preserve this feature without adversely affecting the performance of properly buffered applications? And can we avoid the need for application programmers to determine whether or not to disable the Nagle algorithm?

One of us (Minshall) has recently proposed a modification to the Nagle algorithm [8]. Instead of delaying the transmission of a short packet when there is any unacknowledged data, this modified version would delay only if the unacknowledged data was itself sent in a short packet. This approach would preserve the algorithm's original intention, preventing a flood of small packets, but would not add delay to the transmission of responses requiring 2N segments.
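In the same pseudocode terms as the sketch in Section 2, the modified test might look like this; the names are again illustrative, not the BSD code (Appendix A shows the actual changes):

    /* Modified Nagle test: a small packet is held only when the most
     * recently sent packet was itself small and is still unacknowledged. */
    int modified_nagle_may_send(int len, int mss, int unacked,
                                int last_packet_was_small)
    {
        if (len >= mss)
            return 1;                  /* full-sized segment: send */
        if (unacked == 0)
            return 1;                  /* connection idle: send */
        return !last_packet_was_small; /* hold only after a small packet */
    }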

10.1 Validation of proposed solution

To validate the proposed modifications to Nagle's algorithm, we implemented a somewhat simplified version of Minshall's approach. This version requires only one bit of additional state per TCP connection, and allows two small packets in a row (without delay) if the first one results from the kernel's use of a fixed cluster size when filling the TCP output buffer. The modifications are shown in Appendix A.



Server configuration              Mean resp.  Median resp.  Mean pkt.     Median pkt.
                                  time (ms)   time (ms)     size (bytes)  size (bytes)
mostly-buffered, original Nagle   50          26            3019          4312
mostly-buffered, modified Nagle   48          28            2975          3864
fully-buffered, original Nagle    36          21            3092          4312
fully-buffered, modified Nagle    34          21            3043          3864

Table 3: Response time and packet size statistics, with original and modified Nagle algorithms

[Figure 9: CDFs for packet sizes]


We ran trials using both the mostly-buffered and fully-buffered servers, with either the standard or modified Nagle algorithms. Figure 8 shows the CDFs for these four trials. Table 3 shows the mean and median latencies and packet sizes.

Figure 8 shows that the fully-buffered server generally outperforms the mostly-buffered server, in that the fully-buffered distributions are shifted towards lower latencies. This figure also shows only a slight improvement from the modified Nagle algorithm, for either buffering model. However, we ran additional trials and found that, on the whole, the modified Nagle algorithm generally provides a modest improvement for both server buffering models. We intend to do a further study using a careful statistical analysis of a large set of trials (and apologize for the use of single trials in this paper).

Figure 9 shows the packet size distributions for various configurations of server buffering model and the Nagle algorithm. It is difficult to see this from the black and white version of the figure, but generally the modified Nagle algorithm causes slightly more packets at sizes under 3880 bytes, while entirely disabling the Nagle algorithm leads to the fewest such packets. However, disabling the Nagle algorithm creates a large number of 3880-byte packets. This threshold corresponds to the difference between MCLBYTES (8192) and the FDDI MSS (4312); we are not entirely sure why the Nagle algorithm specifically suppresses packets of that size.

11 Summary and conclusions

We identified a performance problem that appeared to arise from an interaction between TCP's Nagle algorithm and delayed acknowledgment policies. We found that disabling the Nagle algorithm appeared to solve this problem, but unleashed a torrent of small packets. By fixing the buffering problem that caused this torrent, we showed that the Nagle algorithm still causes a slight increase in mean latency, and we analyzed the causes behind this increase. This suggests that a simple modification to the Nagle algorithm would improve latency without removing the protection against packet floods.

We conclude that:

1. Disabling the Nagle algorithm can greatly increase network congestion, if application buffering is imperfect.

2. Even carefully-written applications can, on occasion, exhibit sub-optimal buffering behavior.

3. Even with optimal buffering, the interaction between the Nagle algorithm and delayed acknowledgments can increase application latency, if only slightly.

4. The best solution would be the use of careful buffering, combined with an improved Nagle algorithm that defends the network against buffering errors without adding much latency.

5. Implementors who are tempted to disable the Nagle algorithm should take care not only to use careful buffering, but also to measure the result to ensure that the buffering works as expected (perhaps by inspecting the resulting packet stream using a tool such as tcpdump [6]).



12 Acknowledgements

We would like to thank John Nagle for his insightful comments. We would also like to thank the WISP reviewers for their suggestions.

A Modifications to the Nagle algorithm

Our modifications to the Nagle algorithm, a somewhat simplified version of those proposed by Minshall [8], are confined to the tcp_output() function from the BSD networking code. We show them as insertions to the code given by Wright and Stevens [19], in Figure 10. Only certain lines of the code are shown; the source line numbers are taken from Wright and Stevens, and our insertions are shown with "***" instead of line numbers.

References

[1] R. Braden. Requirements for Internet hosts - communication layers. RFC 1122, Internet Engineering Task Force, ftp://ftp.isi.edu/in-notes/rfc1122.txt. October 1989.

[2] Sally Floyd and Van Jacobson. The synchronization of periodic routing messages. IEEE/ACM Transactions on Networking, 2(2):122-136, April 1994.

[3] Jim Gettys. Personal communication. 1999.

[4] J. Heidemann. Performance interactions between P-HTTP and TCP implementations. ACM Computer Communication Review, 27(2):65-73, http://www.acm.org/sigcomm/ccr/archive/. April 1997.

[5] Van Jacobson. Congestion avoidance and control. In Proc. SIGCOMM '88, pages 314-32, Stanford, CA, ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. August 1988.

[6] Van Jacobson et al. tcpdump(1), BPF, ftp://ftp.ee.lbl.gov/tcpdump.tar.Z. 1990.

[7] B. Kantor and P. Lapsley. Network news transfer protocol: A proposed standard for the stream-based transmission of news. RFC 977, Internet Engineering Task Force, ftp://ftp.isi.edu/in-notes/rfc977.txt. February 1986.

[8] Greg Minshall. A suggested modification to Nagle's algorithm. Internet-Draft draft-minshall-nagle-00, Internet Engineering Task Force, http://www.ietf.org/internet-drafts/draft-minshall-nagle-00.txt. December 1998. This is a work in progress.

[9] J. Nagle. Congestion control in IP/TCP internetworks. RFC 896, Internet Engineering Task Force, ftp://ftp.isi.edu/in-notes/rfc896.txt. January 1984.

[10] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, and C. Lilley. Network performance effects of HTTP/1.1, CSS1, and PNG. In Proc. SIGCOMM '97, pages 155-166, Cannes, France, September 1997.

[11] J. Postel. User datagram protocol. RFC 768, Internet Engineering Task Force, ftp://ftp.isi.edu/in-notes/rfc768.txt. August 1980.

[12] J. Postel. Transmission control protocol. Request for Comments (Standard) STD 7, RFC 793, Internet Engineering Task Force, ftp://ds.internic.net/rfc/rfc793.txt. September 1981.

[13] J. Postel. Simple mail transfer protocol. RFC 821, Internet Engineering Task Force, ftp://ftp.isi.edu/in-notes/rfc821.txt. August 1982.

[14] J. Postel and J. Reynolds. File transfer protocol. RFC 959, Internet Engineering Task Force, ftp://ftp.isi.edu/in-notes/rfc959.txt. October 1985.

[15] Yasushi Saito, Jeffrey C. Mogul, and Ben Verghese. A Usenet Performance Study, http://www.research.digital.com/wrl/projects/newsbench/usenet.ps. November 1998.

[16] Rich Salz. InterNetNews: Usenet Transport for Internet Sites. In USENIX Conference Proceedings, pages 93-98, San Antonio, TX, Summer 1992. USENIX.

[17] R. W. Scheifler and J. Gettys. The X Window System. ACM Trans. on Graphics, 5(2):79-109, April 1986.

[18] W. R. Stevens. TCP/IP Illustrated, Volume 1. Addison-Wesley, Reading, MA, 1994.

[19] G. R. Wright and W. R. Stevens. TCP/IP Illustrated, Volume 2. Addison-Wesley, Reading, MA, 1995.



Define a new flag value for the tcp_flags field in tcp_var.h:

    ***  #define TF_SMALL_PREV 0x800

On entry to tcp_output() (in particular, before the "again" label), record the number of bytes in the socket buffer:

    47   struct socket *so = inp->inp_socket;
    ***  int start_cc = so->so_snd.sb_cc;

After the check made by the traditional Nagle algorithm, which sends a small packet at the end of the socket buffer in certain situations, allow sending in one more situation: if the previous packet was not "small" (note that these two conditional statements could be merged):

    144      if ((idle || tp->t_flags & TF_NODELAY) &&
    145          len + off >= so->so_snd.sb_cc)
    146          goto send;
    ***      if (((tp->t_flags & TF_SMALL_PREV) == 0) &&
    ***          (len + off >= so->so_snd.sb_cc)) {
    ***          /* end-of-buffer && previous packet was not "small" */
    ***          goto send;
    ***      }

Just before sending a packet, remember that it was "small" if the length is less than the MSS, and if the original buffer size was not a multiple of the cluster size:

    222  send:
    ***      /* Remember end of most recent "small" output packet */
    ***      if ((len < tp->t_maxseg) && ((start_cc % MCLBYTES) != 0))
    ***          tp->t_flags |= TF_SMALL_PREV;
    ***      else
    ***          tp->t_flags &= ~TF_SMALL_PREV;

Figure 10: Modifications to tcp_output()
