Example: Rumor Performance Evaluation Andy Wang CIS 5930 Computer Systems Performance Analysis


Motivation

• Optimistic peer replication is popular
– Intermittent connectivity
– Availability of replicas for concurrent updates
– Convergence and correctness for updates

• Examples: Rumor, Coda, Ficus, Lotus Notes, Outlook Calendar, CVS

2

Background

• Replication provides high availability
• Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates
• A reconciliation process makes replicas consistent (in peer-to-peer systems, two replicas at a time)

3

Background Continued

• Conflicts occur when different replicas of the same file are updated subsequent to the previous reconciliation
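Optimistic replication systems commonly detect this condition with version vectors: one counter per replica, incremented on each local update. The sketch below assumes that representation; the names are hypothetical and this is not Rumor's code. Two versions conflict when neither vector dominates the other, meaning both replicas were updated since they last agreed.

```python
from typing import Dict

VersionVector = Dict[str, int]  # replica id -> number of updates seen from that replica


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if version `a` has seen every update that version `b` has seen."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two replicas' versions of the same file."""
    a_ge_b, b_ge_a = dominates(a, b), dominates(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a newer"   # propagate a's version to b
    if b_ge_a:
        return "b newer"   # propagate b's version to a
    return "conflict"      # both updated since the previous reconciliation


# State agreed at the last reconciliation, then one update on each side while disconnected.
base = {"desktop": 2, "portable": 0}
desktop = {"desktop": 3, "portable": 0}
portable = {"desktop": 2, "portable": 1}
print(compare(desktop, base))      # a newer
print(compare(desktop, portable))  # conflict
```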

4

Optimistic Replication Example

5

[Figure: Desktop and portable update logs]
Connected: the desktop log and the portable log both record the 10:00 and 10:25 updates.
Disconnected: the desktop log adds a 10:40 update; the portable log adds a 10:51 update.

Example Continued

6

[Figure: Update logs before and after reconciliation]
Disconnected: desktop log 10:00, 10:25, 10:40; portable log 10:00, 10:25, 10:51.
Connected again: both logs contain 10:00, 10:25, 10:40, and 10:51.

• Run reconciliation
• Detect a conflict
• Propagate updates
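The example can be replayed in a few lines. This is a simplified sketch, assuming each log is a list of update times for a single file and that both logs share a common prefix up to the last reconciliation; real logs would carry file identities and version information. The function name `reconcile` is hypothetical.

```python
from typing import List, Tuple


def reconcile(a: List[str], b: List[str]) -> Tuple[List[str], bool]:
    """Merge two update logs of one file.

    Reports a conflict when both sides gained updates after their last common entry.
    """
    common = 0
    while common < min(len(a), len(b)) and a[common] == b[common]:
        common += 1
    new_a, new_b = a[common:], b[common:]
    conflict = bool(new_a) and bool(new_b)
    # "HH:MM" strings on the same day sort chronologically.
    merged = a[:common] + sorted(set(new_a) | set(new_b))
    return merged, conflict


desktop = ["10:00", "10:25", "10:40"]
portable = ["10:00", "10:25", "10:51"]
merged, conflict = reconcile(desktop, portable)
print(merged)    # ['10:00', '10:25', '10:40', '10:51']
print(conflict)  # True
```

Propagating `merged` to both replicas yields the "connected again" state shown above.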

Goal

• Understand the cost characteristics of the reconciliation process for Rumor

7

Services

• Reconciliation
– Exchange file system states
– Detect new and conflicting versions
  • If possible, automatically resolve conflicts
  • Else, prompt the user to resolve conflicts
– Propagate updates

8

Outcomes

• Two reconciled replicas become consistent for all files and directories

• Some files remain inconsistent and require the user to resolve conflicts

9

Metrics

• Time
– Elapsed time
  • From the beginning to the completion of a reconciliation request
– User time (CPU time spent in user mode)
– System time (CPU time spent in the kernel)

• Failure rate
– Number of incomplete reconciliations and infinite loops (none observed)

10

Metrics not Measured

• Disk access time
– Requires complex instrumentation
  • E.g., buffering, logging, etc.

• Network and memory resources
– Not heavily used

• Correctness
– Difficult to evaluate

11

Monitor Implementation

12

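[Figure: Rumor reconciliation process. Components: Recon, Scanner, Rfindstored, Rrecon, Server, and two Spool-to-dump stages; implemented as a Perl library and C++ programs.]

• Measured with the time command on the top-level Perl script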
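The time command reports the three metrics defined earlier: elapsed (wall-clock), user, and system time. A sketch of the same measurement from Python on a Unix-like system, using `resource.getrusage(RUSAGE_CHILDREN)`. The helper `time_command` is hypothetical, and the timed command is a placeholder, not Rumor's reconciliation script.

```python
import resource
import subprocess
import sys
import time
from typing import List, Tuple


def time_command(argv: List[str]) -> Tuple[float, float, float]:
    """Run argv to completion; return (elapsed, user, system) seconds, like time(1)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children, so take differences.
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return elapsed, user, system


if __name__ == "__main__":
    # Placeholder workload; a real run would invoke the reconciliation command instead.
    e, u, s = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"elapsed={e:.3f}s user={u:.3f}s system={s:.3f}s")
```

Elapsed time also includes waiting on disk and network, which is why it can exceed user plus system time.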

Parameters

• System parameters
– CPU (speed of local and remote servers)
– Disk (bandwidth, fragmentation level)
– Network (type, bandwidth, reliability)
– Memory (size, caching effects, speed)
– Operating system (type, version, VM management, etc.)

13

Parameters (Continued)

• Workload parameters
– Number of replicas
– Number of files and directories
– Number of conflicts and updates
– Size of volumes (file size)

14

Workloads

• Update characteristics extracted from Geoff Kuenning’s traces

15

[Figure: Classification of file accesses
File access
– Read-only access
– Read-write access
  – Nonshared access: read access, write access
  – Shared access
    – 2-way sharing: read access, write access
    – 3+way sharing: read access, write access]

Experimental Settings

• Machine model: Dell Latitude XP
• CPU: 486, 100 MHz
• RAM: 36 MB
• Ethernet: 10 Mb/s
• Operating system: Linux 2.0.x
• File system: ext3

16

Experimental Settings

• Should have documented the following as well
– CPU: L1 and L2 cache sizes
– RAM: brand and type
– Disk: brand, model, capacity, RPM, and on-disk cache size
– File system version

17

Experimental Design

• 2^5 full factorial design
• Linear regression or multivariate linear regression to model major factors
• Target: 95% confidence level

18

2^5 Full Factorial Design

• Number of replicas: 2 and 6
• Number of files: 10 and 1,000
• File size: 100 and 22,000 bytes
• Number of directories: 10 and 100
• Number of updates: 10 and 450
– Capped at 10 updates for 10 files
• Number of conflicts: 0 /* typical */
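The 32 runs of a 2^5 design are every combination of the two levels of each factor. A sketch that enumerates them from the levels above, with conflicts held at 0. The dictionary keys are illustrative names, and since the slides do not say exactly how the cap was applied, the sketch assumes the update count is clamped to the file count.

```python
from itertools import product
from typing import Dict, List, Tuple

# (low, high) level per factor, taken from the slide; key names are illustrative.
LEVELS: Dict[str, Tuple[int, int]] = {
    "replicas": (2, 6),
    "files": (10, 1000),
    "file_size_bytes": (100, 22000),
    "directories": (10, 100),
    "updates": (10, 450),
}


def full_factorial(levels: Dict[str, Tuple[int, int]]) -> List[Dict[str, int]]:
    """Return one run per combination of levels: 2^k runs for k two-level factors."""
    names = list(levels)
    runs = []
    for combo in product(*(levels[n] for n in names)):
        run = dict(zip(names, combo))
        # Assumption: updates cannot exceed files ("capped at 10 updates for 10 files").
        run["updates"] = min(run["updates"], run["files"])
        run["conflicts"] = 0
        runs.append(run)
    return runs


runs = full_factorial(LEVELS)
print(len(runs))  # 32
```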

19

2^5 Full Factorial Analysis

• Experiment errors < 3%

20

[Figure: Measured and predicted time per experiment. Panels: elapsed time, user time, system time. x-axis: experiment number; y-axis: time (seconds).]

Variation of Effects

• All major effects are significant at the 95% confidence level

21

[Figure: Top 5 effects by percentage of variation explained (x-axis: factor; y-axis: % variation)
Elapsed time: # files, # dirs, fileSize x #files, fileSize, # updates
System time: # files, # updates, #files x #updates, fileSize, fileSize x #files
User time: # files, # replicas, # dirs, #files x #updates, # updates]
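Charts like these come from allocating the total variation among effects. A sketch of the standard sign-table computation for a 2^k r design (k two-level factors, r replications per run): each effect's share is 2^k r q^2 / SST, and the remainder SSE / SST is the error share, which is one plausible reading of the "experiment errors" percentage quoted earlier (the slides do not define it). The data below are made up; `allocate_variation` is a hypothetical helper.

```python
from itertools import combinations
from math import prod
from typing import Dict, List, Sequence


def allocate_variation(y: Sequence[Sequence[float]], factors: List[str]) -> Dict[str, float]:
    """Percentage of total variation explained by each effect of a 2^k r design.

    y[j] holds the r replicated responses of run j, where bit i of j is the
    level of factors[i] (0 = low, coded -1; 1 = high, coded +1).
    """
    k, n_runs = len(factors), 2 ** len(factors)
    if len(y) != n_runs:
        raise ValueError("expected one list of replications per run")
    r = len(y[0])
    means = [sum(rep) / r for rep in y]
    grand = sum(means) / n_runs
    sst = sum((v - grand) ** 2 for rep in y for v in rep)
    sse = sum((v - means[j]) ** 2 for j, rep in enumerate(y) for v in rep)

    pct = {}
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            # Sign-table column: product of the coded levels of the factors involved.
            signs = [prod(1 if (j >> i) & 1 else -1 for i in subset) for j in range(n_runs)]
            q = sum(s * m for s, m in zip(signs, means)) / n_runs  # effect estimate
            pct[" x ".join(factors[i] for i in subset)] = 100.0 * n_runs * r * q * q / sst
    pct["error"] = 100.0 * sse / sst
    return pct


# Made-up 2^2 design with r = 2 replications, just to show the bookkeeping.
y = [[10, 12], [50, 52], [14, 16], [58, 60]]
pct = allocate_variation(y, ["# files", "# dirs"])
for name, share in sorted(pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {share:6.2f}%")
```

Because the sign-table columns are orthogonal, the shares always sum to 100%.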

Residuals vs. Predicted Time

• Clusters are caused by the dominating effect of the number of files

22

[Figure: Residuals vs. predicted time. Panels: elapsed time, user time, system time. x-axis: predicted time (seconds); y-axis: residuals (seconds).]

Residuals vs. Experiment Numbers

• Residuals show homoscedasticity, almost

23

[Figure: Residuals vs. experiment number. Panels: user time, system time, elapsed time. x-axis: experiment number; y-axis: residuals.]

Quantile-Quantile Plot

• Residuals are normally distributed, almost

24

[Figure: Normal Q-Q plots of residuals (x-axis: normal quantiles; y-axis: residual quantiles), with linear fits:
Elapsed time: f(x) = 5.613x, R² = 0.976
User time: f(x) = 0.124x, R² = 0.952
System time: f(x) = 0.112x, R² = 0.986]
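A Q-Q plot pairs the sorted residuals with standard normal quantiles; a straight line with R² near 1 indicates approximately normal residuals, the slope estimates their standard deviation, and the intercept their mean (about zero for least-squares residuals). A sketch using plotting positions (i - 0.5)/n, which is one common choice; the slides do not say which was used. `qq_fit` is a hypothetical helper, and the residuals are synthetic.

```python
import random
from statistics import NormalDist, mean
from typing import List, Tuple


def qq_fit(residuals: List[float]) -> Tuple[List[Tuple[float, float]], float, float, float]:
    """Normal Q-Q points for the residuals, plus slope, intercept, and R^2 of a fitted line."""
    n = len(residuals)
    ys = sorted(residuals)
    # Theoretical standard normal quantiles at plotting positions (i - 0.5) / n.
    xs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return list(zip(xs, ys)), slope, my - slope * mx, sxy * sxy / (sxx * syy)


# Synthetic residuals drawn from a normal distribution (seeded for repeatability).
rng = random.Random(1)
res = [rng.gauss(0.0, 5.0) for _ in range(160)]
points, slope, intercept, r2 = qq_fit(res)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```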

Multivariate Regression

• Number of replicas: 2
• Number of files: 4 levels, 10-600
• File size: 22,000 bytes
• Number of directories: 4 levels, 10-60
• Number of updates: 0
• Number of conflicts: 0 /* typical */
• Number of repetitions: 5 per data point

25

Multivariate Regression

• Experiment errors < 7%

• All coefficients are significant
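The slides do not state the form of the fitted model. A minimal sketch, assuming time is regressed linearly on the number of files and the number of directories and fitted by ordinary least squares through the normal equations. The helpers `ols` and `r_squared`, the intermediate levels, and the noise-free times are all hypothetical.

```python
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    rows = [[1.0, *map(float, row)] for row in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(p):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def r_squared(X: Sequence[Sequence[float]], y: Sequence[float], b: List[float]) -> float:
    """Fraction of the variation in y explained by the fitted model."""
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst


# Hypothetical 4 x 4 grid of (files, dirs) levels and made-up, noise-free times.
X = [(f, d) for f in (10, 200, 400, 600) for d in (10, 25, 40, 60)]
y = [5 + 0.2 * f + 0.1 * d for f, d in X]
b = ols(X, y)
print([round(v, 6) for v in b])  # [5.0, 0.2, 0.1]
```

Solving the normal equations directly is fine for two or three predictors; for larger or ill-conditioned problems a QR-based solver is the safer choice.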

26

[Figure: Time per experiment for the multivariate regression. Panels: user time, elapsed time, system time. x-axis: experiment number; y-axis: time (seconds).]


• Elapsed time shows a bimodal trend

• User time shows an exponential trend

27

[Figure: Residuals vs. predicted time. Panels: user time, system time, elapsed time. y-axis: residuals (seconds).]


• Not so good for elapsed time and user time

28

[Figure: Residuals vs. experiment number. Panels: elapsed time, user time, system time.]


• Residuals are not normally distributed for elapsed time and user time

29

[Figure: Normal Q-Q plots of residuals (x-axis: normal quantiles; y-axis: residual quantiles), with linear fits:
Elapsed time: f(x) = 5.677x, R² = 0.841
User time: f(x) = 0.481x, R² = 0.924
System time: f(x) = 0.132x, R² = 0.979]

Log Transform (User Time)

• ANOVA tests failed miserably

30

[Figure: Residual diagnostics for log-transformed user time. Panels: residuals vs. predicted value; residuals vs. experiment number; normal Q-Q plot of residuals with linear fit f(x) = 0.0222x, R² = 0.871.]

Residual Analyses (User Time)

• No indications that transforms can help…

31

[Figure: Checks for a variance-stabilizing transform of user time. Panels: standard deviation of residuals vs. mean user time; variance of residuals vs. mean user time; standard deviation of residuals vs. mean user time squared.]
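These three plots follow the usual rule of thumb for choosing a variance-stabilizing transform: if the residual standard deviation s grows linearly with the mean, try log y; if s² grows linearly with the mean, try √y; if s grows linearly with the squared mean, try 1/y. A sketch that computes the corresponding correlations from replicated runs; `transform_hints` is a hypothetical helper, and the data are made up so that s is proportional to the mean.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def transform_hints(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Correlations behind the three diagnostic plots.

    groups[j] holds the replicated responses of configuration j; within a group the
    residuals share the responses' standard deviation.
    """
    means = [mean(g) for g in groups]
    sds = [stdev(g) for g in groups]
    return {
        "log": pearson(means, sds),                          # s vs. mean
        "sqrt": pearson(means, [s * s for s in sds]),        # s^2 vs. mean
        "reciprocal": pearson([m * m for m in means], sds),  # s vs. mean^2
    }


# Made-up data whose spread grows in proportion to the mean.
groups = [[0.9 * m, m, 1.1 * m] for m in (5.0, 10.0, 20.0, 40.0)]
hints = transform_hints(groups)
print(hints)
```

No clear linear trend in any of the three plots, as the slide reports for user time, means none of these transforms is indicated.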

Possible Explanations

• i-node related factors
– Number of files per directory block
– Crossing a block boundary may cause anomalies

• Caching effects
– Reboot needed across experiments

32

Linear Regression

• Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450
– Test for the boundary-crossing condition as the number of files exceeds one block
– Note that Rumor has hidden files

• Number of repetitions: 5 per data point
• Flush cache (reboot) before each run
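Where such a boundary falls depends on how many entries fit in one directory block. A rough estimate, assuming the ext2/ext3 on-disk directory format (each entry takes an 8-byte header plus the name, padded to a multiple of 4 bytes), a fresh directory with entries packed without gaps, and equal-length names. The block size, name length, and hidden-file count are hypothetical parameters, since the slides do not give them.

```python
def dirent_size(name_len: int) -> int:
    """Bytes used by one ext2/ext3 directory entry: 8-byte header plus name, padded to 4 bytes."""
    return (8 + name_len + 3) & ~3


def entries_in_first_block(block_size: int, name_len: int, hidden_files: int = 0) -> int:
    """Visible files fitting in the first directory block.

    Assumes "." and ".." come first and hidden files use the same name length.
    """
    usable = block_size - dirent_size(1) - dirent_size(2)  # "." and ".." entries
    return usable // dirent_size(name_len) - hidden_files


print(dirent_size(8))                   # 16
print(entries_in_first_block(4096, 8))  # 254
print(entries_in_first_block(1024, 8))  # 62
```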

33

Linear Regression

• R² > 80%
• All coefficients are significant
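With one predictor (number of files), R² and the confidence intervals behind "all coefficients are significant" can be computed as below; a coefficient is significant at the 95% level when its interval excludes zero. This is a sketch of standard simple linear regression. The t quantile is passed in rather than computed (for 50 observations, t at 0.975 with 48 degrees of freedom is about 2.011), and the times are synthetic; `simple_regression` is a hypothetical helper.

```python
import random
from math import sqrt
from typing import Dict, Sequence


def simple_regression(x: Sequence[float], y: Sequence[float], t: float) -> Dict[str, object]:
    """Fit y = b0 + b1*x by least squares; return coefficients, R^2, and confidence intervals.

    t is the two-sided quantile t[1 - alpha/2; n - 2], supplied by the caller.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    se = sqrt(sse / (n - 2))                # standard deviation of the errors
    sb0 = se * sqrt(1 / n + mx * mx / sxx)  # standard error of b0
    sb1 = se / sqrt(sxx)                    # standard error of b1
    return {
        "b0": b0, "b1": b1, "r2": 1 - sse / sst,
        "ci_b0": (b0 - t * sb0, b0 + t * sb0),
        "ci_b1": (b1 - t * sb1, b1 + t * sb1),
    }


# Synthetic data: the ten file counts from the slide, 5 repetitions each (n = 50).
rng = random.Random(0)
files = [f for f in (100, 150, 200, 250, 252, 253, 300, 350, 400, 450) for _ in range(5)]
times = [10 + 0.15 * f + rng.gauss(0, 3) for f in files]
fit = simple_regression(files, times, t=2.011)  # t[0.975; 48] is about 2.011
lo, hi = fit["ci_b1"]
print(f"b1 = {fit['b1']:.4f}, 95% CI ({lo:.4f}, {hi:.4f}), R^2 = {fit['r2']:.3f}")
```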

34

[Figure: Time vs. number of files. Panels: elapsed time (measured time, predicted time, 95% confidence interval), system time, user time. x-axis: number of files; y-axis: time (seconds).]


• Elapsed time shows a bimodal trend

• User time shows an exponential trend

35

[Figure: Residuals vs. predicted time. Panels: elapsed time, system time, user time. y-axis: residuals (seconds).]


• Elapsed time shows a rising bimodal trend
– Randomizing the order of experiments may help
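Randomizing run order keeps time-dependent effects, such as state that accumulates across runs, from lining up with the factor levels. A minimal sketch that shuffles all replicated runs with a fixed seed so the schedule can be reproduced; the file counts and repetition count are those of the linear-regression experiment, and `randomized_schedule` is a hypothetical helper.

```python
import random
from typing import List, Tuple

FILE_COUNTS = [100, 150, 200, 250, 252, 253, 300, 350, 400, 450]
REPETITIONS = 5


def randomized_schedule(levels: List[int], reps: int, seed: int) -> List[Tuple[int, int]]:
    """Return (file_count, repetition) pairs in a reproducible random order."""
    runs = [(level, rep) for level in levels for rep in range(reps)]
    random.Random(seed).shuffle(runs)
    return runs


schedule = randomized_schedule(FILE_COUNTS, REPETITIONS, seed=42)
for files, rep in schedule[:5]:
    print(f"run with {files} files (repetition {rep})")
```

Recording the seed alongside the results lets the same order be replayed if an anomaly needs to be reproduced.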

36

[Figure: Residuals vs. experiment number. Panels: elapsed time, system time, user time.]


• Residuals for elapsed time are not normal
– Perhaps piecewise normal

37

[Figure: Normal Q-Q plots of residuals (x-axis: normal quantiles; y-axis: residual quantiles), with linear fits:
Elapsed time: f(x) = 5.822x, R² = 0.878
System time: f(x) = 0.0976x, R² = 0.969
User time: f(x) = 0.213x, R² = 0.971]

Possible Explanations

• i-node related factors: No
• Caching effects: No
• Hidden factors: Maybe
• Bugs: Maybe

38

Conclusion

• Identified the number of files as the dominating factor for Rumor running time

• Observed the existence of an unknown factor in the Rumor performance model

39
