Example: Rumor Performance Evaluation


Andy Wang
CIS 5930-03
Computer Systems Performance Analysis

Motivation
• Optimistic peer replication is popular
  – Intermittent connectivity
  – Availability of replicas for concurrent updates
  – Convergence and correctness for updates
• Examples: Rumor, Coda, Ficus, Lotus Notes, Outlook Calendar, CVS

Background
• Replication provides high availability
• Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates
• Reconciliation process makes replicas consistent (i.e., two replicas for peer-to-peer)

Background Continued
• Conflicts occur when different replicas of the same file are updated subsequent to the previous reconciliation

Optimistic Replication Example

Connected:
  Log on Desktop:  10:00 Update, 10:25 Update
  Log on Portable: 10:00 Update, 10:25 Update

Disconnected:
  Log on Desktop:  10:00 Update, 10:25 Update, 10:40 Update
  Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update

Example Continued

Disconnected:
  Log on Desktop:  10:00 Update, 10:25 Update, 10:40 Update
  Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update

Connected:
  Log on Desktop:  10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
  Log on Portable: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update

• Run reconciliation
• Detect a conflict
• Propagate updates
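The desktop/portable example can be sketched in a few lines of Python. This is only an illustration of the three steps (run reconciliation, detect a conflict, propagate updates), not Rumor's actual algorithm: the `reconcile` function, the log-of-timestamps representation, and the rule "both sides added updates after the shared prefix means conflict" are assumptions made for the sketch.

```python
def reconcile(log_a, log_b):
    """Merge two replica update logs (hypothetical helper, not Rumor code).

    Returns (merged_log, conflict). A conflict is flagged when both
    replicas recorded updates after their last common state.
    """
    # Length of the shared prefix, i.e. the state at the previous reconciliation.
    common = 0
    for a, b in zip(log_a, log_b):
        if a != b:
            break
        common += 1

    new_a = log_a[common:]   # updates made only on replica A
    new_b = log_b[common:]   # updates made only on replica B
    conflict = bool(new_a) and bool(new_b)

    # Propagate: both replicas end up with the union of the new updates.
    # "HH:MM" strings sort chronologically within one day.
    merged = log_a[:common] + sorted(set(new_a) | set(new_b))
    return merged, conflict


desktop = ["10:00", "10:25", "10:40"]
portable = ["10:00", "10:25", "10:51"]
merged, conflict = reconcile(desktop, portable)
print(merged, conflict)
```

With the logs from the slide, both replicas updated the item while disconnected, so the sketch reports a conflict and both logs converge to the same four updates.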

Goal
• Understand the cost characteristics of the reconciliation process for Rumor

Services
• Reconciliation
  – Exchange file system states
  – Detect new and conflicting versions
    • If possible, automatically resolve conflicts
    • Else, prompt user to resolve conflicts
  – Propagate updates

Outcomes
• Two reconciled replicas become consistent for all files and directories
• Some files remain inconsistent and require user to resolve conflicts

Metrics
• Time
  – Elapsed time
    • From the beginning to the completion of a reconciliation request
  – User time (CPU time spent in user mode)
  – System time (CPU time spent in the kernel)
• Failure rate
  – Number of incomplete reconciliations and infinite loops (none observed)
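The three time metrics can be collected for any child process with the Python standard library. The sketch below is an assumption about how such a measurement could look, not the instrumentation used in the study; `time_command` is a hypothetical helper, and the child-process CPU fields of `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return elapsed, user, and system time in seconds."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)          # waits for the child to finish
    elapsed = time.perf_counter() - start
    after = os.times()
    return {
        "elapsed": elapsed,                                      # wall-clock time
        "user": after.children_user - before.children_user,      # CPU in user mode
        "system": after.children_system - before.children_system,  # CPU in kernel
    }


# Stand-in workload; a real run would invoke the reconciliation command instead.
timings = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(timings)
```

Elapsed time also includes waiting on disk and network, which is why it can be much larger than the sum of user and system time.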

Metrics not Measured
• Disk access time
  – Requires complex instrumentation
    • E.g., buffering, logging, etc.
• Network and memory resources
  – Not heavily used
• Correctness
  – Difficult to evaluate

Monitor Implementation
[Diagram: Reconciliation Process. Labels: Spool-to-dump, Recon, Spool-to-dump; Scanner, Rfindstored, Rrecon, Server; Perl library; C++]
• Top-level Perl time command

Parameters
• System parameters
  – CPU (speed of local and remote servers)
  – Disk (bandwidth, fragmentation level)
  – Network (type, bandwidth, reliability)
  – Memory (size, caching effects, speed)
  – Operating system (type, version, VM management, etc.)

Parameters (Continued)
• Workload parameters
  – Number of replicas
  – Number of files and directories
  – Number of conflicts and updates
  – Size of volumes (file size)

Workloads
• Update characteristics extracted from Geoff Kuenning’s traces
[Diagram: classification tree of file accesses. Labels: File access; Read-only access; Read-write access; Nonshared access; Shared access; 2-way sharing; 3+way sharing; Read access and Write access under each leaf category]

Experimental Settings
• Machine model: Dell Latitude XP
• CPU: x486 100 MHz
• RAM: 36 MB
• Ethernet: 10 Mb/s
• Operating system: Linux 2.0.x
• File system: ext3

Experimental Settings
• Should have documented the following as well
  – CPU: L1 and L2 cache sizes
  – RAM: brand and type
  – Disk: brand, model, capacity, RPM, and the size of on-disk cache
  – File system version

Experimental Design
• 2^5 × 5 full factorial design (five two-level factors, five replications)
• Linear regression or multivariate linear regression to model major factors
• Target: 95% confidence interval

2^5 × 5 Full Factorial Design
• Number of replicas: 2 and 6
• Number of files: 10 and 1,000
• File size: 100 and 22,000 bytes
• Number of directories: 10 and 100
• Number of updates: 10 and 450
  – Capped at 10 updates for 10 files
• Number of conflicts: 0 /* typical */
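The factor levels listed above define 32 configurations (five factors at two levels each). A minimal sketch of enumerating them follows; the dictionary keys and the `design_points` helper are names invented for this illustration, while the levels and the update cap come from the slide.

```python
from itertools import product

# Two levels per factor, as listed on the slide.
FACTORS = {
    "replicas": (2, 6),
    "files": (10, 1000),
    "file_size_bytes": (100, 22000),
    "directories": (10, 100),
    "updates": (10, 450),
}


def design_points(factors=FACTORS):
    """Return one dict per configuration of the full factorial design."""
    names = list(factors)
    points = []
    for levels in product(*(factors[name] for name in names)):
        point = dict(zip(names, levels))
        # Updates are capped at 10 when there are only 10 files.
        if point["files"] == 10:
            point["updates"] = min(point["updates"], 10)
        point["conflicts"] = 0  # held constant in this design
        points.append(point)
    return points


points = design_points()
print(len(points))
```

Each configuration would then be run once per replication, and the runs are best executed in randomized order so that drift over time does not line up with any one factor.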

2^5 × 5 Full Factorial Analysis
• Experiment errors < 3%
[Figures: measured time vs. predicted time (seconds) by experiment number; panels: elapsed time, user time, system time]

Variation of Effects
• All major effects significant at the 95% confidence level
[Bar charts: top 5 effects by % variation]
  – Elapsed time: #files, #dirs, file size * #files, file size, #updates
  – System time: #files, #updates, #files * #updates, file size, file size * #files
  – User time: #files, #replicas, #dirs, #replicas * #files, #files * #updates
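The "% variation" attributed to each effect in a two-level factorial design can be computed with the sign-table method. The sketch below is a simplified illustration: it handles a design without replications (so there is no experimental-error term), `allocation_of_variation` is a hypothetical helper, and the four response values are made-up numbers, not measurements from this study.

```python
from itertools import combinations, product
from math import prod


def allocation_of_variation(k, responses):
    """Percent of variation explained by each effect in a 2^k design.

    responses[i] is the response for the i-th row of
    product((-1, 1), repeat=k), i.e. the first factor varies slowest.
    Keys of the result are tuples of factor indices: (0,) is the main
    effect of factor 0, (0, 1) is the interaction of factors 0 and 1.
    """
    rows = list(product((-1, 1), repeat=k))
    n = len(rows)
    variation = {}
    for order in range(1, k + 1):
        for term in combinations(range(k), order):
            # Effect q = (1 / 2^k) * sum(sign * y) over all rows.
            q = sum(prod(row[i] for i in term) * y
                    for row, y in zip(rows, responses)) / n
            variation[term] = n * q * q   # sum of squares due to this effect
    total = sum(variation.values())       # SST when there are no replications
    return {term: 100 * ss / total for term, ss in variation.items()}


# Illustrative 2^2 example with invented responses.
pct = allocation_of_variation(2, [15, 25, 45, 75])
for term, share in pct.items():
    print(term, round(share, 1))
```

With replications, the total variation also contains a sum of squared errors, so the percentages of the effects no longer add up to 100.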

Residuals vs. Predicted Time
• Clusters caused by the dominating effect of the number of files
[Figures: residuals vs. predicted time; panels: elapsed time, user time, system time]

Residuals vs. Experiment Numbers
• Residuals are nearly homoscedastic
[Figures: residuals vs. experiment number; panels: user time, system time, elapsed time]

Quantile-Quantile Plot
• Residuals are approximately normally distributed
[Figures: residual quantiles vs. normal quantiles, with linear fits]
  – Elapsed time: f(x) = 5.61253143490396 x + 4.93495530436048E-16, R² = 0.97570585239607
  – User time: f(x) = 0.124183670176851 x − 3.226948188583E-16, R² = 0.952366702694788
  – System time: f(x) = 0.112484959649303 x − 5.06606047559798E-18, R² = 0.986338838838569
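A quantile-quantile check like the one on this slide pairs the sorted residuals with standard normal quantiles and fits a straight line; an R² close to 1 suggests the residuals are close to normal. The sketch below assumes the (i − 0.5)/n plotting position, which is one common choice and may differ from what produced the slide's fits. `qq_fit` is a hypothetical helper and the residuals are synthetic.

```python
from statistics import NormalDist, mean


def qq_fit(residuals):
    """Fit residual quantiles against normal quantiles.

    Returns (slope, intercept, r_squared) of the least-squares line.
    """
    ys = sorted(residuals)
    n = len(ys)
    std_normal = NormalDist()
    xs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic residuals that lie exactly on a normal curve with std. dev. 2.
synthetic = [2 * NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
slope, intercept, r_squared = qq_fit(synthetic)
print(slope, intercept, r_squared)
```

The slope of the fitted line estimates the standard deviation of the residuals, and an intercept near zero reflects that least-squares residuals have zero mean.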

Multivariate Regression
• Number of replicas: 2
• Number of files: 4 levels, 10-600
• File size: 22,000 bytes
• Number of directories: 4 levels, 10-60
• Number of updates: 0
• Number of conflicts: 0 /* typical */
• Number of repetitions: 5 per data point

Multivariate Regression
• Experiment errors < 7%
• All coefficients are significant
[Figures: measured time vs. predicted time (seconds) by experiment number; panels: user time, elapsed time, system time]
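A multivariate linear model of the form time = b0 + b1 × (#files) + b2 × (#dirs) can be fitted by ordinary least squares. The sketch below solves the normal equations with plain Python so it has no dependencies; `fit_linear_model` is a hypothetical helper, the intermediate factor levels are invented (the slide gives only the ranges), and the responses are generated from known coefficients rather than measured.

```python
def fit_linear_model(rows, ys):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    rows is a list of predictor tuples, ys the matching responses.
    Solves (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    X = [[1.0, *row] for row in rows]     # prepend the intercept column
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(X, ys))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical levels within the stated ranges; synthetic, noise-free responses.
file_levels = (10, 200, 400, 600)
dir_levels = (10, 30, 45, 60)
rows = [(f, d) for f in file_levels for d in dir_levels]
ys = [3.0 + 0.1 * f + 0.05 * d for f, d in rows]

b0, b1, b2 = fit_linear_model(rows, ys)
print(b0, b1, b2)
```

Real measurements would include five repetitions per data point; each repetition simply becomes one more row, and the residuals y − ŷ feed the diagnostic plots that follow.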

Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figures: residuals vs. predicted time; panels: user time, system time, elapsed time]

Residuals vs. Experiment Numbers
• Residuals for elapsed time and user time show visible trends
[Figures: residuals vs. experiment number; panels: elapsed time, user time, system time]

Quantile-Quantile Plot
• Residuals are not normally distributed for elapsed time and user time
[Figures: residual quantiles vs. normal quantiles, with linear fits]
  – Elapsed time: f(x) = 5.6774814834728 x − 3.74753980933428E-14, R² = 0.84068455127645
  – User time: f(x) = 0.481071580575666 x − 1.8682654604378E-15, R² = 0.924255360680913
  – System time: f(x) = 0.132069999118134 x − 2.51384352224851E-15, R² = 0.978920253463901

Log Transform (User Time)
• ANOVA tests failed miserably
[Figures for user time: residuals vs. predicted time; residuals vs. experiment number; residual quantiles vs. normal quantiles with fit f(x) = 0.0222199973685429 x − 1.28549373927752E-15, R² = 0.870897001030419]

Residual Analyses (User Time)
• No indications that transforms can help…
[Figures: standard deviation of residuals vs. mean user time; variance of residuals vs. mean user time; standard deviation of residuals vs. mean user time squared]
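The plots on this slide relate the spread of the residuals to the mean response; a commonly cited rule of thumb is that a standard deviation proportional to the mean points toward a log transform. The sketch below shows how such a summary could be computed. It assumes residuals are taken around each configuration's own mean (a simplification of residuals from a fitted model); `spread_by_group` is a hypothetical helper and the measurements are invented.

```python
from statistics import mean, stdev


def spread_by_group(groups):
    """Summarize repeated measurements per configuration.

    groups maps a configuration label to its list of repeated
    measurements. Returns a list of (label, mean, std_of_residuals).
    """
    summary = []
    for label, values in groups.items():
        m = mean(values)
        residuals = [v - m for v in values]   # deviation from the group mean
        summary.append((label, m, stdev(residuals)))
    return summary


# Invented user-time measurements (seconds) for two configurations.
groups = {
    "small": [1.0, 1.2, 0.8],
    "large": [10.0, 12.0, 8.0],
}
summary = spread_by_group(groups)
for label, m, s in summary:
    print(label, m, s, s / m)
```

Plotting the standard deviation (or variance) against the mean for all configurations, as on the slide, then shows whether any such proportional relationship exists.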

Possible Explanations
• i-node related factors
  – Number of files per directory block
  – Crossing block boundary may cause anomalies
• Caching effects
  – Reboot needed across experiments

Linear Regression
• Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450
  – Test for the boundary-crossing condition as the number of files exceeds one block
  – Note that Rumor has hidden files
• Number of repetitions: 5 per data point
• Flush cache (reboot) before each run

Linear Regression
• R² > 80%
• All coefficients are significant
[Figures: measured time, predicted time, and 95% confidence interval (seconds) vs. number of files; panels: elapsed time, system time, user time]

Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figures: residuals vs. predicted time; panels: elapsed time, system time, user time]

Residuals vs. Experiment Numbers
• Elapsed time shows a rising bi-modal trend
  – Randomization of experiments may help
[Figures: residuals vs. experiment number; panels: elapsed time, system time, user time]

Quantile-Quantile Plot
• Residuals for elapsed time are not normal
  – Perhaps piece-wise normal
[Figures: residual quantiles vs. normal quantiles, with linear fits]
  – Elapsed time: f(x) = 5.82178334927256 x + 2.58046606262658E-15, R² = 0.87800554257113
  – System time: f(x) = 0.0976338391551245 x − 4.46690697919164E-16, R² = 0.969293820421059
  – User time: f(x) = 0.213446556701086 x + 1.49533417053058E-15, R² = 0.970879846787612

Possible Explanations
• i-node related factors: No
• Caching effects: No
• Hidden factors: Maybe
• Bugs: Maybe

Conclusion
• Identified the number of files as the dominating factor for Rumor running time
• Observed the existence of an unknown factor in the Rumor performance model
