6
Generating Realistic TCP Workloads Generating Realistic TCP Workloads Felix Hernandez-Campos Felix Hernandez-Campos Ph. D. Candidate Ph. D. Candidate Dept. of Computer Science Dept. of Computer Science Univ. of North Carolina at Chapel Hill Univ. of North Carolina at Chapel Hill Recipient of the 2001 CMG Fellowship Recipient of the 2001 CMG Fellowship Joint work with Joint work with F. Donelson Smith and Kevin Jeffay F. Donelson Smith and Kevin Jeffay 2 2 Experimental Networking Research Experimental Networking Research and Performance Evaluation and Performance Evaluation Evaluating network protocols and Evaluating network protocols and mechanisms requires careful experimentation mechanisms requires careful experimentation Network simulation (NS, Network simulation (NS, Opnet Opnet , , etc. etc. ) ) Network testbeds Network testbeds A critical element of these experiments is the A critical element of these experiments is the trafc workload trafc workload Are current workloads Are current workloads realistic realistic ? ? Let Let s look at some measurements s look at some measurements Sprint Sprint s tier-1 backbone network s tier-1 backbone network 3 3 OC-48 Link in San Jose, CA February 3 rd , 2004 TCP is the dominant transport protocol © 2004 Sprint ATL 4 4 Internet trafc is driven by a large number of applications © 2004 Sprint ATL

Generating Realistic TCP Workloads - cs.unc.educs.unc.edu/~jeffay/talks/CMG-04-slides.pdfGenerating Realistic TCP Workloads Felix Hernandez-Campos Ph. D. Candidate Dept. of Computer

Embed Size (px)

Citation preview

Page 1: Generating Realistic TCP Workloads - cs.unc.educs.unc.edu/~jeffay/talks/CMG-04-slides.pdfGenerating Realistic TCP Workloads Felix Hernandez-Campos Ph. D. Candidate Dept. of Computer

Generating Realistic TCP WorkloadsGenerating Realistic TCP Workloads

Felix Hernandez-CamposFelix Hernandez-Campos

Ph. D. CandidatePh. D. Candidate

Dept. of Computer ScienceDept. of Computer Science

Univ. of North Carolina at Chapel HillUniv. of North Carolina at Chapel Hill

Recipient of the 2001 CMG FellowshipRecipient of the 2001 CMG Fellowship

Joint work withJoint work with

F. Donelson Smith and Kevin JeffayF. Donelson Smith and Kevin Jeffay

22

Experimental Networking ResearchExperimental Networking Researchand Performance Evaluationand Performance Evaluation

•• Evaluating network protocols andEvaluating network protocols andmechanisms requires careful experimentationmechanisms requires careful experimentation–– Network simulation (NS, Network simulation (NS, OpnetOpnet, , etc.etc.))–– Network testbedsNetwork testbeds

•• A critical element of these experiments is theA critical element of these experiments is thetraffic workloadtraffic workload–– Are current workloads Are current workloads realisticrealistic??

•• LetLet’’s look at some measurementss look at some measurements–– SprintSprint’’s tier-1 backbone networks tier-1 backbone network

33

OC-48 Link in San Jose, CA

February 3rd, 2004

TCP is the dominant transport protocol

© 2004 Sprint ATL

44

Internet traffic is driven by a large number of applications

© 2004 Sprint ATL

Page 2: Generating Realistic TCP Workloads - cs.unc.educs.unc.edu/~jeffay/talks/CMG-04-slides.pdfGenerating Realistic TCP Workloads Felix Hernandez-Campos Ph. D. Candidate Dept. of Computer

55

RouterRouter RouterRouter

EthernetEthernetSwitchSwitch

TrafficTrafficGeneratorsGenerators

TrafficTrafficGeneratorsGenerators

100/1000100/1000MbpsMbps

EthernetSwitch

10001000MbpsMbps

10001000MbpsMbps

100100MbpsMbps

100100MbpsMbps

Internet Traffic GenerationInternet Traffic GenerationTestbed Testbed exampleexample

66

RouterRouter RouterRouter

EthernetEthernetSwitchSwitch

TrafficTrafficGeneratorsGenerators

TrafficTrafficGeneratorsGenerators

100/1000100/1000MbpsMbps

EthernetSwitch

10001000MbpsMbps

10001000MbpsMbps

100100MbpsMbps

100100MbpsMbps

Internet Traffic GenerationInternet Traffic GenerationTestbed Testbed exampleexample

•• Evaluate queuing mechanisms in the routersEvaluate queuing mechanisms in the routers

•• Evaluate transport protocolsEvaluate transport protocols

•• Evaluate intrusion/anomaly detection mechanismsEvaluate intrusion/anomaly detection mechanisms

•• Evaluate traffic monitoring techniquesEvaluate traffic monitoring techniques

•• ……

77

Traffic GenerationTraffic GenerationState-of-the-ArtState-of-the-Art

•• Open-loopOpen-loop–– Large number of sophisticated modelsLarge number of sophisticated models

»» Packet-level modelingPacket-level modeling

–– But TCP is a closed-loop protocolBut TCP is a closed-loop protocol»» Open-loop traffic generation breaks reliability, flowOpen-loop traffic generation breaks reliability, flow

control, and congestion controlcontrol, and congestion control

•• Closed-loopClosed-loop–– The idea is to simulate the behavior ofThe idea is to simulate the behavior of

users/applicationsusers/applications»» Source-level modeling of specific applicationSource-level modeling of specific application

88

Application-Specific ModelingApplication-Specific ModelingHTTP Model ExampleHTTP Model Example

Web

Page

Request sizes,

File sizes

Inactive OFF

times

(user think time)

Active OFF times

Requests

to same

server

TimeBase HTML

Base HTML

Embedded

Embedded

Embedded

Embedded

Embedded

Page 3: Generating Realistic TCP Workloads - cs.unc.educs.unc.edu/~jeffay/talks/CMG-04-slides.pdfGenerating Realistic TCP Workloads Felix Hernandez-Campos Ph. D. Candidate Dept. of Computer

99

SURGE HTTP ModelSURGE HTTP Model[[Barford Barford and and CrovellaCrovella, 1998], 1998]

Web

Page

Request sizes,

File sizes

Inactive OFF

times

(user think time)

Active OFF times

Requests

to same

server

TimeBase HTML

Base HTML

Embedded

Embedded

Embedded

Embedded

Embedded

Embedded Object Sizes:Embedded Object Sizes:

Lognormal(Lognormal(µµ = 8.215, = 8.215, = 1.46) = 1.46)

Base HTML Sizes:Base HTML Sizes:

Lognormal(Lognormal(µµ = 7.63, = 7.63, = 1.001) = 1.001) –– BodyBody

Pareto(Pareto(kk = 10K, = 10K, = 1.0) = 1.0) –– TailTail

OFF Time Durations:OFF Time Durations:

Pareto(Pareto(kk = 10K, = 10K, = 1.0) = 1.0)

No. of EmbeddedNo. of Embedded

ObjectsObjects: Pareto(: Pareto(kk = 2, = 2, = =

1.245)1.245)

1010

Application-Specific ModelsApplication-Specific ModelsShortcomingsShortcomings

•• Creating application models is a challenging, time-Creating application models is a challenging, time-consuming taskconsuming task

–– Traffic mixes are driven by a large number of applicationsTraffic mixes are driven by a large number of applications

•• Set of dominant applications evolves quicklySet of dominant applications evolves quickly–– Applications themselves also evolveApplications themselves also evolve–– We cannot even identify a significant fraction of the trafficWe cannot even identify a significant fraction of the traffic

•• Modeling closed application protocols requiresModeling closed application protocols requiresreverse-engineeringreverse-engineering

•• Privacy considerations complicate data acquisitionPrivacy considerations complicate data acquisition–– TCP header tracesTCP header traces are ok, are ok, application dataapplication data is not is not

1111

Our ApproachOur Approach

•• Abstract source-level modelingAbstract source-level modeling–– Application-neutral technique to describe the source-levelApplication-neutral technique to describe the source-level

behavior of any TCP connectionbehavior of any TCP connection–– Efficient analysis applicable any arbitrary trace of TCPEfficient analysis applicable any arbitrary trace of TCP

headers (no analysis of payloads)headers (no analysis of payloads)–– We can also measure network-level parameters, such asWe can also measure network-level parameters, such as

round-trip times, receiver window sizes, round-trip times, receiver window sizes, etc.etc.

•• Source-level trace replaySource-level trace replay–– Replaying abstract source-level behaviorReplaying abstract source-level behavior

•• Validation in network Validation in network testbedtestbed–– We wrote a distributed, scalable traffic generator (We wrote a distributed, scalable traffic generator (tmixtmix))–– Comparison of original and synthetic tracesComparison of original and synthetic traces

1212

Different Views of Internet TrafficDifferent Views of Internet TrafficAbstract Source-level ModelingAbstract Source-level Modeling

UNC Internet

time

DATA

from UNCDATA

from Internet

DATA

UNC

DATA

from Inet

Request URL HTML Source Req. Image

Aggregate Packet

Arrival Level

Single-Flow Packet

Arrivals

Abstract

Source Level

Application Level

(e.g., web traffic)

2,500 bytes 4,800 bytes 800 b 1,800 b

Our Approach

Per-Flow Filtering

Page 4: Generating Realistic TCP Workloads - cs.unc.educs.unc.edu/~jeffay/talks/CMG-04-slides.pdfGenerating Realistic TCP Workloads Felix Hernandez-Campos Ph. D. Candidate Dept. of Computer

1313

Client-Server ApplicationsClient-Server ApplicationsPersistent HTTP ExamplePersistent HTTP Example

TIME329 b

803 b

BROWSER

SERVER

HTTP

Request 1

HTTP

Response 1

403 b

25,821 b

HTTP

Request 2

HTTP

Response 2

356 b

1,198 b

HTTP

Request 3

HTTP

Response 3

0.12

secs

3.12

secs

Epoch 2 Epoch 3Epoch 1

•• We call a pair of application data units (We call a pair of application data units (ADUsADUs) that) thatcarry a request/response exchange an carry a request/response exchange an epochepoch

•• Quiet timesQuiet times are also part of the workload of TCP are also part of the workload of TCP

1414

Sequential A-b-t ModelSequential A-b-t Model

•• Abstract source-level model for describing theAbstract source-level model for describing theworkload of TCP connectionsworkload of TCP connections

•• Each connection is summarized using a Each connection is summarized using a connectionconnectionvectorvector of the form of the form C Cii = ( = (ee11, , ee22,,……, , eenn) with ) with nn 11epochsepochs

–– Each epoch has the formEach epoch has the form e ejj = ( = (aajj, , tatajj, , bbjj, , tbtbjj))

•• Connection vectors can be extracted from trace ofConnection vectors can be extracted from trace ofTCP segment headersTCP segment headers

–– Sequence number directionality, timing analysis, writeSequence number directionality, timing analysis, writesize and packet size interactionssize and packet size interactions

–– OO((n log nn log n) + ) + OO((nn**WW))

1515

Client-Server ApplicationsClient-Server ApplicationsSMTP and NNTP ExamplesSMTP and NNTP Examples

TIME

93 b

SMTP

SENDER

SMTP

RECEIVER

220 Host

Info

32 b

HELO

191 b

250 Domain

Info

MAIL

77 b

59b

250 Ok

RCPT

75b

38b

250 Ok

DATA

6b 22,568 b

Email

Message

50b

250 Ok

44b

250 Ok

TIME

53 b

NNTP

READER

NNTP

SERVER

200 News

Srv 5.7b1

MODE

READER ARTICLE n

12 b

1056 bytes

220 n <id1> article

retrieved [article]

13 b

42 b

200 News

Service

GROUP

unc.help

19 b

42 b

unc.support

status

GROUP

unc.test

15 b

32 b

unccs.test

status

XOVER

15 b

32 b

224 data

[header n]

5.02 secs

1616

Beyond the Client-Server ModelBeyond the Client-Server ModelIcecast Icecast –– Internet Radio Internet Radio

TIME392bPLAYER

SERVER

Request 100

msecs.

48

ms.

217b 1253b 1686b 432b 863b 2442b 436b

105

msecs.

21

ms.

100

msecs.

34

ms.

1671b

108

msecs.

Audio Frames

•• Server PUSH applications do not follow theServer PUSH applications do not follow thetraditional client-server modeltraditional client-server model

•• The sequential a-b-t model is still applicableThe sequential a-b-t model is still applicable–– Make Make aaii and and tbtbii zerozero

Page 5: Generating Realistic TCP Workloads - cs.unc.educs.unc.edu/~jeffay/talks/CMG-04-slides.pdfGenerating Realistic TCP Workloads Felix Hernandez-Campos Ph. D. Candidate Dept. of Computer

1717

Beyond the Client-Server ModelBeyond the Client-Server ModelNNTP in Stream-ModeNNTP in Stream-Mode

TIME52 b

NNTP

PEER

NNTP

PEER

201 Server

Ready

13 b

MODE

STREAM

43 b

203

StreamOK

CHECK

<id1>

41 b

CHECK

<id2>

41 b

CHECK

<id3>

43 b

TAKETHIS <id2>

[article]

15,678 bytes

49 b

438

don’t

send

<id1>

CHECK

<id4>

41 b

42 b

238

send

<id2>

51 b

438

don’t

send

<id3>

49 b

438

don’t

send

<id4>

TIME68 b

PEER A

PEER B

BitTorrent

Protocol

68 b

BitTorrent

Protocol

657 b

Bitfield

657 b

Bitfield

5b

Unchoke

5b

Interested

5b

Interested

17b

16397 b

Piece

i

Request

Piece i

17b

Request

Piece j

17b

Request

Piece k

17b

Request

Piece l

17b

Request

Piece m

16397 b

Piece

j

16397 b

Piece

k

16397 b

Piece

l

16397 b

Piece

m

1818

Concurrent A-b-t ModelConcurrent A-b-t Model

•• Some connections are said to exhibit Some connections are said to exhibit data exchangedata exchangeconcurrencyconcurrency

•• Two reasons:Two reasons:–– Increasing performanceIncreasing performance–– Enabling natural concurrencyEnabling natural concurrency

•• Concurrent a-b-t model describes each side of theConcurrent a-b-t model describes each side of theconnection separatelyconnection separately

((((aa11, , tata11), (), (aa22, , tata22),),……, (, (aann, , tatann))))((((bb11, , tbtb11), (), (bb22, , tbtb22),),……, (, (bbmm, , tbtbmm))))

•• Concurrency can be detected with high probabilityConcurrency can be detected with high probability–– p.p.seqnoseqno > > q.q.acknoackno and and q.q.seqnoseqno > > p.p.acknoackno–– O(O(nn**WW))

1919

Source-Level Trace ReplaySource-Level Trace ReplayTraffic Generation in Lab Traffic Generation in Lab TestbedTestbed

Anonymized Packet

Header Trace

Source-level Trace:

Set of Connection Vectors

Traffic

Generators

Traffic

Generators

Processing

Workload Partitioning

Synthetic Packet

Header Trace

TESTBED

Source-level

Performance

Metrics

2020

Sample ComparisonSample Comparison

Page 6: Generating Realistic TCP Workloads - cs.unc.educs.unc.edu/~jeffay/talks/CMG-04-slides.pdfGenerating Realistic TCP Workloads Felix Hernandez-Campos Ph. D. Candidate Dept. of Computer

2121 2222

ConclusionConclusion

•• New method for modeling traffic mixesNew method for modeling traffic mixes–– Empirically-derived connection vectorsEmpirically-derived connection vectors–– Studied sequential vs. concurrent dichotomyStudied sequential vs. concurrent dichotomy–– Fully automated, efficient analysisFully automated, efficient analysis

•• New traffic generation approachNew traffic generation approach–– Enables comparison of real and synthetic trafficEnables comparison of real and synthetic traffic–– Implemented a distributed traffic generatorImplemented a distributed traffic generator–– Techniques for scaling traffic loadTechniques for scaling traffic load