1
Characteristic Studies of Distributed Systems
Maryam Rahmaniheris & Daniel Uhlig
2
Distributed System Implementations
Characteristics of systems in the real world: how users and systems behave
P2P: track clients and traffic on Kazaa; explore and model usage
Cloud Computing: experiment with Amazon Web Services; measure performance metrics
3
Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, and John Zahorjan
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload
Peer To Peer Characteristics
Presented by Daniel Uhlig
4
Modeling of P2P Systems
Observe client and user behavior in the real world
Used a large P2P system: Kazaa
Closed P2P protocol on the FastTrack network (still exists?)
Peaked at 2.3 million users in 2003 (more than Napster's peak)
Now a subscription DRM music store
5
Observing P2P
Recorded all Kazaa traffic at the U. of Washington (60,000 faculty, staff, and students)
203-day trace (late spring semester through the end of fall semester)
Protected privacy by anonymizing the data
Analyzed the data and developed a model
Compared to the web: is there a better comparison?
6
Data Log
• Recorded all Kazaa traffic
– Incoming and outgoing data transfers and searches
– HTTP traffic with username in header
– KazaaLite showed up 60 days into trace
• Usernames were hardcoded, so IP addresses were used to differentiate clients
• 20 terabytes of incoming data
• 1.6 million requests
• 25,000 users
• 8.85 TB of unique objects
• Paper used requests from university peers to external peers
• Will this bias results?
7
Kazaa Objects
Fetch at most once
94% of Kazaa files vs. 57% of web objects
99% of Kazaa files are fetched at most twice
Clients download objects just once
What is an object? The authors assume an immutable object that is unique
The same song: still fetch at most once?
– Different filename?
– Different bitrate?
– Different length?
– Encoded by a different user?
8
Users are Patient
• Web users want instant access; P2P users will wait
• P2P users are patient
• Small objects
– 30% wait 1 hour
– 10% wait nearly a day
• Large objects
– 50% wait 1 day
– 20% wait 1 week
• Is this accurate, since the client automatically restarts requests?
9
Users Slow Down
Took 30-day traces of 'new' users
Bytes requested decrease with age
Possible reasons?
Loss of interest
New P2P app
New ID
10
Users Slow Down
Users leave the system forever
Users request less data as they age
Core clients have a constant activity level
But they request less data
11
Client Activity
• How to measure activity:
– Logged in vs. downloading?
• Average session length 2.4 minutes
– Sessions could be split by a short break in transfer
• Activity over lifetime = 5.5%
• Average transfer 17 minutes, but average session 2.4 minutes
• Many transactions fail, leaving clients looking for a new host peer
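One small arithmetic check on these numbers (mine, not the paper's): with sessions averaging 2.4 minutes and transfers averaging 17 minutes, a typical transfer must span several sessions, consistent with the failed-transaction, find-a-new-peer behavior above:

```python
AVG_SESSION_MIN = 2.4    # average Kazaa session length (slide 11)
AVG_TRANSFER_MIN = 17.0  # average transfer duration (slide 11)

# A transfer outlives many short sessions, so the client keeps
# reconnecting and hunting for new host peers mid-download.
sessions_per_transfer = AVG_TRANSFER_MIN / AVG_SESSION_MIN
print(round(sessions_per_transfer, 1))  # → 7.1
```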
12
Workload
Large (>100 MB) vs. small (<10 MB) files
Request volume vs. transfer volume
Audio clips vs. video clips
13
Object Dynamics
Clients fetch objects at most once
Popular objects quickly cycle
New objects are most popular
Most requests are for old objects

                                             Small objects        Large objects
                                             Top 10   Top 100     Top 10   Top 100
Overlap between first and last 30 days       0 of 10  5 of 100    1 of 10  44 of 100
# of popular objects less than 30 days old   6 of 10  73 of 95    2 of 9   47 of 56

How does the idea of distinct objects affect this?
14
Zipf
Distribution in which the most popular objects receive the most fetches: the k-th most popular object is requested in proportion to 1/k^α.
A classical result for web-page popularity.
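As a toy illustration (not from the paper; object count, request volume, and α = 1 are arbitrary), Zipf popularity can be sampled directly and the head of the rank-frequency curve inspected:

```python
import collections
import random

def zipf_weights(n, alpha=1.0):
    # Zipf: the k-th most popular object has weight proportional to 1/k^alpha.
    return [1.0 / (k ** alpha) for k in range(1, n + 1)]

def sample_requests(n_objects=100, n_requests=10_000, alpha=1.0, seed=1):
    random.seed(seed)
    weights = zipf_weights(n_objects, alpha)
    picks = random.choices(range(n_objects), weights=weights, k=n_requests)
    return collections.Counter(picks)

counts = sample_requests()
ranked = [c for _, c in counts.most_common()]
# The head dominates: the top object draws far more requests than rank 10.
print(ranked[0], ranked[9])
```

On a log-log plot of count vs. rank this produces the straight line classically reported for web pages.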
15
Non-Zipf
• Authors propose that Kazaa traffic is NOT modeled by a Zipf distribution.
• P2P differences from the web:
– 'Fetch-at-most-once'
– Immutable objects (cnn.com changes regularly; a multimedia file does not)
• Simulated a model of these behaviors and compared it to the observed traffic
16
Non-Zipf model
• Zipf seen as a model in many places
– Video on demand, video rentals, movie tickets
• Non-Zipf might better explain some of these
• Common characteristics:
– Birth of new objects
– Fetch at most once
– Immutable objects
• Characteristics sometimes seen:
– Expensive to get an object
– Object remains with the client
• Does their non-Zipf model explain everything?
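The fetch-at-most-once effect can be sketched with a toy simulation (parameters invented, not the paper's model): clients draw from the same Zipf law, but a client never re-downloads an object it already has, which flattens the head of the popularity curve:

```python
import collections
import random

def simulate(n_objects=200, n_clients=500, reqs_per_client=50,
             alpha=1.0, fetch_at_most_once=True, seed=2):
    random.seed(seed)
    weights = [1.0 / (k ** alpha) for k in range(1, n_objects + 1)]
    counts = collections.Counter()
    for _ in range(n_clients):
        picks = random.choices(range(n_objects), weights=weights,
                               k=reqs_per_client)
        seen = set()
        for obj in picks:
            if fetch_at_most_once and obj in seen:
                continue  # client already holds the immutable object
            seen.add(obj)
            counts[obj] += 1
    return [c for _, c in counts.most_common()]

zipf = simulate(fetch_at_most_once=False)
famo = simulate(fetch_at_most_once=True)
# Share of all downloads going to the single most popular object:
print(round(zipf[0] / sum(zipf), 3), round(famo[0] / sum(famo), 3))
```

With fetch-at-most-once, no object can be downloaded more than once per client, so the most popular objects are capped at n_clients downloads and the head of the curve flattens: the non-Zipf shape the authors report.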
17
Really Non-Zipf?
• Multiple copies of the same object?
– Does fetch-at-most-once still hold?
• Requests for established files handled by internal users? ('cached')
• Are objects immutable?
– Changing names
– Is a new album/song a new object or an update from the artist?
• Non-Zipf in other multimedia?
– YouTube, video rental, DVD purchases, movie tickets?
18
Locality Awareness
• Conserve university P2P bandwidth
• 86% of objects requested were already at U of W
– Cache data (legal issues)
– Redirector so requests stay internal when possible
• A few key nodes can save significant bandwidth
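A back-of-envelope bound (my arithmetic, not the paper's): if 86% of requested objects were already inside the university, and we assume, optimistically, that bytes track objects, the 20 TB inbound figure from slide 6 bounds what a perfect internal redirector could have saved:

```python
TOTAL_INBOUND_TB = 20.0  # inbound Kazaa data over the trace (slide 6)
ALREADY_LOCAL = 0.86     # fraction of requested objects already at U of W

# Optimistic upper bound: assumes the byte distribution matches the
# object distribution and every local copy is reachable in time.
saved_tb = TOTAL_INBOUND_TB * ALREADY_LOCAL
print(round(saved_tb, 1))  # → 17.2
```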
19
Discussion Points
• What is a unique item?
– Does this affect the distribution of popular objects?
• Are objects immutable?
• Apply ideas to other multimedia:
– YouTube video popularity
– Still fetch at most once?
– Non-Zipf for DVD rental or purchase?
• How to define a unique object?
• Should P2P handle large and small objects differently?
• Caching or other forced locality vs. P2P built-in locality
20
Presented by: Maryam Rahmaniheris
University of Illinois at Urbana-Champaign
CS 525 - Spring 2009
An Evaluation of Amazon's Grid Computing Services: EC2, S3, SQS
Simson L. Garfinkel
21
Cluster Computing
Building your own cluster
Costly
Space
Cooling system
Staff…
Underutilization
Overutilization
Cloud computing
Computing as a utility
Time-multiplexing of resources
No need for planning ahead
Elasticity
Amazon's AWS
You only need a working credit card
22
Amazon AWS
EC2 (Elastic Compute Cloud): Linux virtual machines for 10 cents per CPU-hour
S3 (Simple Storage Service): data storage for 15 cents per gigabyte per month
SQS (Simple Queue Service): messaging service for 10 cents per thousand messages
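To get a feel for the pay-as-you-go model, here is a back-of-envelope cost function using the per-unit prices quoted on this slide (2007-era prices; the workload numbers below are invented):

```python
EC2_PER_CPU_HOUR = 0.10   # dollars per CPU-hour
S3_PER_GB_MONTH = 0.15    # dollars per gigabyte-month
SQS_PER_1000_MSGS = 0.10  # dollars per thousand messages

def job_cost(cpu_hours, gb_months, messages):
    # Total dollars for a hypothetical job across the three services.
    return (cpu_hours * EC2_PER_CPU_HOUR
            + gb_months * S3_PER_GB_MONTH
            + messages / 1000 * SQS_PER_1000_MSGS)

# e.g. 100 instance-hours, 50 GB stored for a month, 20,000 queue messages
print(round(job_cost(100, 50, 20_000), 2))  # → 19.5
```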
23
An Example Application: GrepTheWeb
24
An Example Application
25
AWS Interface
Creating an Amazon AWS account
Signing up for individual services
Using the REST API to start virtual machines on EC2 and to access data on S3
HTTP commands: GET, PUT, DELETE
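For illustration, the legacy S3 request signing (AWS Signature Version 2, the scheme in use when this paper was written) HMAC-SHA1s a string built from the verb, content headers, date, and resource. The sketch below omits the optional x-amz- headers and uses dummy credentials:

```python
import base64
import hashlib
import hmac

def sign_s3_request(secret_key, verb, resource, date,
                    content_md5="", content_type=""):
    # Simplified SigV2 string-to-sign (no x-amz- headers); modern SDKs
    # use Signature Version 4 instead.
    string_to_sign = "\n".join([verb, content_md5, content_type, date,
                                resource])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Dummy credentials and object path, for illustration only.
sig = sign_s3_request("EXAMPLE_SECRET", "GET", "/mybucket/myobject",
                      "Tue, 27 Mar 2007 19:36:42 +0000")
print(f"Authorization: AWS EXAMPLE_KEY_ID:{sig}")
```

The resulting value is sent in the Authorization header of the GET, PUT, or DELETE request.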
26
AWS Security
Accessing account information
Stolen account password
Resetting a lost password through e-mail
Eavesdropping
Stolen e-mail password
Does not provide snapshots or backups
Multiple EC2 machines
Multiple S3 copies
Multiple AWS accounts
Does not guarantee the privacy of data on S3
Encryption
Digital signatures
27
S3 Evaluation-Test Scenarios
Throughput and TPS
A bucket containing 15 objects, three each in sizes:
1 byte, 1 KByte, 1 MByte, 16 MByte, 100 MByte
Different objects are used to minimize the effect of caching
Measuring end-to-end performance: a series of successive probes
The delay between probes follows a Poisson distribution
Surge experiment:
No delay between queries
Between 1 and 6 threads executing at any moment
A total of 137,099 probes in 2007:
Single EC2 instance to S3 (32,269)
Distributed test bed (39,269)
Surge experiment (74,808)
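A Poisson probe schedule means exponentially distributed gaps between successive probes; a minimal sketch (the rate and count below are arbitrary, not the paper's):

```python
import random

def probe_delays(rate_per_min, n, seed=7):
    # Poisson process: inter-probe gaps are exponential with mean 1/rate.
    rng = random.Random(seed)
    return [rng.expovariate(rate_per_min) for _ in range(n)]

delays = probe_delays(rate_per_min=2.0, n=10_000)  # mean gap = 0.5 min
print(round(sum(delays) / len(delays), 3))
```

Randomized (rather than fixed-interval) probing avoids synchronizing with any periodic behavior in the measured system.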
28
Average Daily Read Throughput
A minor change in network topology made by Amazon
Introduced additional delay between EC2 and S3
Small TCP window size
29
CDF of Read and Write Throughput, March 22 - April 8
30
Results
S3 performs better with larger transaction sizes
High per-transaction overhead
40% of 1-byte writes are slower than 10 TPS, as opposed to 5% of 1-byte reads
Writes must be committed to at least 2 different clusters
Amazon reliability guarantee
The median 1-MByte write bandwidth is roughly 5 times faster than read bandwidth
Write transactions are acknowledged once the data is written to cache; when the transaction size rises to the cache size, the difference disappears
31
Query Variance
[Figure: per-probe throughput scatter plots, one showing low correlation between successive probes, one showing high correlation]
32
Results
Lower correlation for 1-byte transactions
Sending a second request rather than waiting
Issuing two simultaneous requests for time-critical uses
Higher correlation for 100-MByte transactions
Simultaneous requests, once they start providing data, are likely to take a similar amount of time
33
Concurrent Performance: improving S3 performance by issuing concurrent requests to the same bucket
Performance of 100 MB GETs from S3 for one thread and combined threads
Surge experiment: two VMs on two EC2 clusters
Executing 1 thread for 10 minutes, then 2 for 10 minutes, and so on
The experiment ran for 11 hours
The amount of bandwidth received by each thread is cut in half
The 6 threads have three times the aggregate bandwidth of 1 thread
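The thread-scaling measurement can be mimicked with a toy harness; here time.sleep stands in for a real 100 MB GET, so the absolute numbers mean nothing — only that overlapping requests raise aggregate throughput, as the surge experiment found:

```python
import concurrent.futures
import time

def fetch(_):
    # Stand-in for one 100 MB GET from S3; real code would stream bytes here.
    time.sleep(0.1)
    return 100  # MB "transferred"

def aggregate_throughput(n_threads):
    # MB/sec summed across all threads for one round of n parallel fetches.
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as ex:
        total_mb = sum(ex.map(fetch, range(n_threads)))
    return total_mb / (time.perf_counter() - start)

one = aggregate_throughput(1)
six = aggregate_throughput(6)
# Six overlapping fetches finish in roughly the time of one, so the
# aggregate rate grows (S3 showed ~3x aggregate bandwidth for 6 threads).
print(six > one)
```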
34
Other Experiments
Availability
From 107,556 non-surge tests consisting of multiple read and write probes:
6 write retries and 3 write errors
4 read retries
100% availability with a proper retry mechanism
Throughput
From the analysis of 19,630 transactions of 1 MByte and greater:
No write probes with a throughput less than 10 KB/s
6 write probes with a throughput less than 100 KB/s
35
Experience with EC2 and SQS
EC2 instances
Fast
Responsive
Reliable
1 unscheduled reboot, no lost data
1 instance freeze, lost data
SQS
Simple API
Insert
One message at a time: 4 messages per second
Remove
In batches of 256: 5 messages per second
One message at a time: 2.5 messages per second
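At the measured rates, queue throughput rather than compute can dominate a job's timeline; a quick calculation using the slide's numbers (the workload size is invented):

```python
INSERT_RATE = 4.0          # msgs/sec, inserting one message at a time
REMOVE_RATE_BATCHED = 5.0  # msgs/sec, removing in batches of 256
REMOVE_RATE_SINGLE = 2.5   # msgs/sec, removing one message at a time

def queue_minutes(n_msgs, batched=True):
    # Minutes to insert all messages, and minutes to remove them again.
    remove_rate = REMOVE_RATE_BATCHED if batched else REMOVE_RATE_SINGLE
    return n_msgs / INSERT_RATE / 60, n_msgs / remove_rate / 60

ins_min, rem_min = queue_minutes(10_000)
print(round(ins_min, 1), round(rem_min, 1))  # → 41.7 33.3
```

At a few messages per second, even a modest 10,000-message workload spends over half an hour just moving through the queue.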
36
Conclusion
EC2 provides ready-to-go VMs at a reasonable cost
S3 delivers high performance only for transactions of 16 MByte or larger
High per-transaction overhead
S3 delivers much higher performance to EC2 than to other locations on the Internet
Limited SQS throughput
4-5 transactions per second per thread
High availability
Security risks
37
Discussion Points
High correlation for large-object transactions
Load balancer? Larger number of replicas? Scheduling policy?
More noticeable variance at smaller scales
Why isn't the performance of SQS sufficient for scheduling tasks faster than seconds or slower than several hours?
Google AppEngine or Amazon EC2: what are the best candidate applications for each?
What are the advantages of Amazon AWS over shared resources in Grids?
What are the disadvantages compared to dedicated clusters?
How will the funders of research projects feel about providing funds to pay Amazon bills instead of building dedicated clusters?