Example: Rumor Performance Evaluation
Andy Wang
CIS 5930
Computer Systems Performance Analysis
Motivation
• Optimistic peer replication is popular
  – Intermittent connectivity
  – Availability of replicas for concurrent updates
  – Convergence and correctness for updates
• Examples: Rumor, Coda, Ficus, Lotus Notes, Outlook Calendar, CVS
Background
• Replication provides high availability
• Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates
• The reconciliation process makes replicas consistent (two replicas at a time for peer-to-peer)
Background Continued
• Conflicts occur when different replicas of the same file are updated subsequent to the previous reconciliation
Optimistic Replication Example
Connected:
  – Log on Desktop: 10:00 Update, 10:25 Update
  – Log on Portable: 10:00 Update, 10:25 Update
Disconnected:
  – Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
  – Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update
Example Continued
Disconnected:
  – Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
  – Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update
Connected:
  – Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
  – Log on Portable: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
• Run reconciliation
• Detect a conflict
• Propagate updates
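The three reconciliation steps can be sketched with the two logs from this example. This is a toy illustration, not Rumor's actual algorithm: the function name `reconcile`, the representation of each log as a list of HH:MM strings, and the assumption that both logs describe the same file are all made up here.

```python
def reconcile(log_a, log_b):
    """Toy reconciliation of two update logs (lists of 'HH:MM' strings).

    Hypothetical helper, not Rumor's algorithm. Returns (merged_log, conflict).
    """
    # Length of the common prefix: updates both replicas already share.
    shared = 0
    while shared < min(len(log_a), len(log_b)) and log_a[shared] == log_b[shared]:
        shared += 1
    new_a, new_b = log_a[shared:], log_b[shared:]
    # Both replicas were updated since the last reconciliation: flag a conflict.
    conflict = bool(new_a) and bool(new_b)
    # Propagate: each replica ends up with every update, in time order.
    merged = log_a[:shared] + sorted(set(new_a) | set(new_b))
    return merged, conflict


desktop = ["10:00", "10:25", "10:40"]
portable = ["10:00", "10:25", "10:51"]
merged, conflict = reconcile(desktop, portable)
print(merged, conflict)
```

Sorting the HH:MM strings works only because they are zero-padded and fall within one day; a real log would carry full timestamps or version vectors.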
Goal
• Understand the cost characteristics of the reconciliation process for Rumor
Services
• Reconciliation
  – Exchange file system states
  – Detect new and conflicting versions
    • If possible, automatically resolve conflicts
    • Else, prompt user to resolve conflicts
  – Propagate updates
Outcomes
• Two reconciled replicas become consistent for all files and directories
• Some files remain inconsistent and require user to resolve conflicts
Metrics
• Time
  – Elapsed time
    • From the beginning to the completion of a reconciliation request
  – User time (CPU time spent in user mode)
  – System time (CPU time spent in the kernel)
• Failure rate
  – Number of incomplete reconciliations and infinite loops (none observed)
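A minimal sketch of collecting these three time metrics and a failure flag for one command follows. It is not the monitor used in the study; `time_command` is a hypothetical helper, and the trivial Python child process is a stand-in for a reconciliation request. The child CPU times come from `os.times()`, which reports them only on POSIX systems.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once; return elapsed, user, and system time in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    completed = subprocess.run(argv)
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "elapsed": wall_after - wall_before,
        # CPU time charged to waited-for child processes (zero on Windows).
        "user": cpu_after.children_user - cpu_before.children_user,
        "system": cpu_after.children_system - cpu_before.children_system,
        # A nonzero exit status counts as an incomplete run.
        "failed": completed.returncode != 0,
    }


# Placeholder workload standing in for a reconciliation request.
result = time_command([sys.executable, "-c", "sum(range(100000))"])
print(result)
```

Detecting an infinite loop would additionally need a `timeout` argument to `subprocess.run`, with the timeout value chosen by the experimenter.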
Metrics not Measured
• Disk access time
  – Requires complex instrumentation
    • E.g., buffering, logging, etc.
• Network and memory resources
  – Not heavily used
• Correctness
  – Difficult to evaluate
Monitor Implementation
[Figure: reconciliation process. Components: Spool-to-dump (shown twice), Recon, Scanner, Rfindstored, Rrecon, Server; layers: Perl library and C++.]
• Top-level Perl time command
Parameters
• System parameters
  – CPU (speed of local and remote servers)
  – Disk (bandwidth, fragmentation level)
  – Network (type, bandwidth, reliability)
  – Memory (size, caching effects, speed)
  – Operating system (type, version, VM management, etc.)
Parameters (Continued)
• Workload parameters
  – Number of replicas
  – Number of files and directories
  – Number of conflicts and updates
  – Size of volumes (file size)
Workloads
• Update characteristics extracted from Geoff Kuenning’s traces
[Figure: tree classifying file accesses. Node labels: File access; Read-only access; Read-write access; Nonshared access; Shared access; 2-way sharing; 3+-way sharing; Read access and Write access at the leaves.]
Experimental Settings
• Machine model: Dell Latitude XP
• CPU: x486 100 MHz
• RAM: 36 MB
• Ethernet: 10 Mb
• Operating system: Linux 2.0.x
• File system: ext3
Experimental Settings
• Should have documented the following as well
  – CPU: L1 and L2 cache sizes
  – RAM: brand and type
  – Disk: brand, model, capacity, RPM, and the size of on-disk cache
  – File system version
Experimental Design
• 2^5 full factorial design
• Linear regression or multivariate linear regression to model major factors
• Target: 95% confidence level
2^5 Full Factorial Design
• Number of replicas: 2 and 6
• Number of files: 10 and 1,000
• File size: 100 and 22,000 bytes
• Number of directories: 10 and 100
• Number of updates: 10 and 450
  – Capped at 10 updates for 10 files
• Number of conflicts: 0 /* typical */
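The run list for this design can be enumerated mechanically. The sketch below uses the factor levels from this slide; the dictionary keys and the helper name `full_factorial` are assumptions made for illustration.

```python
from itertools import product

# Two levels per factor, as listed on the slide.
LEVELS = {
    "replicas": (2, 6),
    "files": (10, 1000),
    "file_size_bytes": (100, 22000),
    "directories": (10, 100),
    "updates": (10, 450),
}


def full_factorial(levels):
    """Return one dict per run, covering every combination of levels."""
    names = list(levels)
    runs = []
    for combo in product(*(levels[name] for name in names)):
        run = dict(zip(names, combo))
        # Slide note: updates are capped at 10 when there are only 10 files.
        if run["files"] == 10:
            run["updates"] = min(run["updates"], 10)
        runs.append(run)
    return runs


design = full_factorial(LEVELS)
print(len(design))  # 32 runs, i.e. 2**5
```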
2^5 Full Factorial Analysis
• Experiment errors < 3%
[Figure: measured time vs. predicted time by experiment number (0 to 35), in three panels: elapsed time (y-axis 0 to 150 seconds), user time (0 to 40 seconds), system time (0 to 6 seconds).]
Variation of Effects
• All major effects are significant at the 95% confidence level
[Figure: top 5 effects by % variation (y-axis 0 to 100), one bar chart per metric.]
  – Elapsed time: # files, # dirs, fileSize x # files, fileSize, # updates
  – System time: # files, # updates, # files x # updates, fileSize, fileSize x # files
  – User time: # files, # replicas, # dirs, # files x # updates, # updates
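The percentage of variation attributed to each factor in a 2^k design can be computed with the sign-table method. The sketch below is a generic illustration under stated assumptions: one response per run, runs listed in standard order with the first factor varying fastest, and made-up response values for a two-factor case rather than Rumor measurements. The helper name is hypothetical.

```python
from itertools import combinations, product
from math import prod


def effects_and_variation(factor_names, responses):
    """Return {effect name: (effect, percent of variation)} for a 2^k design."""
    k = len(factor_names)
    # Coded levels (-1 or +1) per run, first factor varying fastest.
    runs = [levels[::-1] for levels in product((-1, 1), repeat=k)]
    n = len(runs)
    if len(responses) != n:
        raise ValueError("expected one response per run")
    grand_mean = sum(responses) / n
    total_variation = sum((y - grand_mean) ** 2 for y in responses)
    result = {}
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            # Sign column of an interaction is the product of its factors' signs.
            effect = sum(
                prod(run[i] for i in idx) * y for run, y in zip(runs, responses)
            ) / n
            name = " x ".join(factor_names[i] for i in idx)
            result[name] = (effect, 100 * n * effect ** 2 / total_variation)
    return result


# Made-up responses for runs (A-,B-), (A+,B-), (A-,B+), (A+,B+).
table = effects_and_variation(["A", "B"], [15, 45, 25, 75])
for name, (effect, percent) in table.items():
    print(f"{name}: effect={effect:.1f}, variation={percent:.1f}%")
```

With replicated runs, as in this study, the total variation would also include an experimental-error term, which this sketch leaves out.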
Residuals vs. Predicted Time
• Clusters are caused by the dominating effect of the number of files
[Figure: residuals (seconds) vs. predicted time (seconds), in three panels: elapsed time (predicted 0 to 140, residual axis -20 to 20), user time (predicted 0 to 40, residual axis -0.6 to 0.6), system time (predicted 0.5 to 5, residual axis -0.5 to 0.5).]
Residuals vs. Experiment Numbers
• Residuals are almost homoscedastic
[Figure: residuals vs. experiment number (0 to 180), in three panels: user time (residual axis -0.6 to 0.6), system time (-0.5 to 0.5), elapsed time (-20 to 20).]
Quantile-Quantile Plot
• Residuals are almost normally distributed
[Figure: residual quantiles vs. normal quantiles, in three panels with linear fits.]
  – Elapsed time: f(x) = 5.6125x + 4.93E-16, R² = 0.9757
  – User time: f(x) = 0.1242x − 3.23E-16, R² = 0.9524
  – System time: f(x) = 0.1125x − 5.07E-18, R² = 0.9863
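The line fits reported for the quantile-quantile plots can be reproduced in outline as follows. This is a sketch, not the procedure used for the slides: the plotting position (i - 0.5) / n is one common convention, `qq_fit` is a hypothetical name, and the residual values are made up.

```python
from statistics import NormalDist, mean


def qq_fit(residuals):
    """Return (slope, intercept, r_squared) of the Q-Q line for the residuals."""
    n = len(residuals)
    ys = sorted(residuals)
    # Standard normal quantile paired with the i-th smallest residual.
    xs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a one-variable fit, R^2 is the squared correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up residuals, in seconds.
sample = [-1.9, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 1.1, 0.7]
print(qq_fit(sample))
```

An R² close to 1 means the residual quantiles fall near a straight line, which is what "almost normally distributed" refers to.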
Multivariate Regression
• Number of replicas: 2
• Number of files: 4 levels, 10-600
• File size: 22,000 bytes
• Number of directories: 4 levels, 10-60
• Number of updates: 0
• Number of conflicts: 0 /* typical */
• Number of repetitions: 5 per data point
Multivariate Regression
• Experiment errors < 7%
• All coefficients are significant
[Figure: measured time vs. predicted time by experiment number (0 to 90), in three panels: user time (0 to 40 seconds), elapsed time (0 to 140 seconds), system time (0 to 3.5 seconds).]
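A model of the form time = b0 + b1 × files + b2 × directories can be fitted by ordinary least squares. The sketch below solves the normal equations in plain Python. The intermediate factor levels and the timings are synthetic assumptions (the slides give only the ranges), and `solve` and `fit` are hypothetical helpers.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(rows, ys):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


# Assumed levels within the stated ranges; timings generated from a known model.
files_levels = (10, 200, 400, 600)
dirs_levels = (10, 30, 45, 60)
rows = [(f, d) for f in files_levels for d in dirs_levels]
ys = [5.0 + 0.1 * f + 0.2 * d for f, d in rows]
b0, b_files, b_dirs = fit(rows, ys)
print(b0, b_files, b_dirs)
```

Because the synthetic timings contain no noise, the fit recovers the generating coefficients up to rounding; real measurements would leave residuals to examine as on the following slides.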
Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figure: residuals (seconds) vs. predicted time (seconds), in three panels: user time (predicted 5 to 35, residual axis -1 to 1), system time (predicted 1 to 2.8, residual axis -0.5 to 0.3), elapsed time (predicted 30 to 120, residual axis -15 to 15).]
Residuals vs. Experiment Numbers
• Not so good for elapsed time and user time
[Figure: residuals vs. experiment number (0 to 90), in three panels: elapsed time (residual axis -15 to 15), user time (-1 to 1), system time (-0.5 to 0.3).]
Quantile-Quantile Plot
• Residuals are not normally distributed for elapsed time and user time
[Figure: residual quantiles vs. normal quantiles, in three panels with linear fits.]
  – Elapsed time: f(x) = 5.6775x − 3.75E-14, R² = 0.8407
  – User time: f(x) = 0.4811x − 1.87E-15, R² = 0.9243
  – System time: f(x) = 0.1321x − 2.51E-15, R² = 0.9789
Log Transform (User Time)
• ANOVA tests failed miserably
[Figure: user time after the log transform, in three panels: residuals vs. predicted time (predicted 0.9 to 1.6, residual axis -0.06 to 0.04); residuals vs. experiment number (0 to 90); residual quantiles vs. normal quantiles with linear fit f(x) = 0.0222x − 1.29E-15, R² = 0.8709.]
Residual Analyses (User Time)
• No indications that transforms can help…
[Figure: spread of user-time residuals, in three panels: standard deviation of residuals (0 to 0.25) vs. mean user time (5 to 40); variance of residuals (0 to 0.06) vs. mean user time (5 to 40); standard deviation of residuals (0 to 0.25) vs. mean user time squared (0 to 1200).]
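The quantities plotted in this residual analysis can be computed by grouping the repetitions of each data point. The usual rule of thumb is that a standard deviation growing linearly with the mean suggests a log transform, and a variance growing linearly with the mean suggests a square-root transform. The sketch below assumes a hypothetical input layout of (data point id, measured time, predicted time) tuples and uses made-up numbers.

```python
from collections import defaultdict
from statistics import mean, stdev


def spread_by_point(observations):
    """Return sorted (mean measured time, stdev of residuals) per data point.

    observations: iterable of (data_point_id, measured, predicted) tuples.
    """
    groups = defaultdict(list)
    for point, measured, predicted in observations:
        groups[point].append((measured, measured - predicted))
    summary = []
    for samples in groups.values():
        times = [t for t, _ in samples]
        residuals = [r for _, r in samples]
        # Sample standard deviation; needs at least two repetitions per point.
        summary.append((mean(times), stdev(residuals)))
    return sorted(summary)


# Made-up measurements: two data points, three repetitions each.
data = [
    ("p1", 9.8, 10.0), ("p1", 10.1, 10.0), ("p1", 10.3, 10.0),
    ("p2", 29.5, 30.0), ("p2", 30.2, 30.0), ("p2", 30.9, 30.0),
]
for mean_time, sd in spread_by_point(data):
    print(f"mean={mean_time:.2f}s  stdev of residuals={sd:.3f}s")
```

Squaring the second element gives the variance, and squaring the first gives the x-axis of the third panel.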
Possible Explanations
• i-node related factors
  – Number of files per directory block
  – Crossing block boundary may cause anomalies
• Caching effects
  – Reboot needed across experiments
Linear Regression
• Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450
  – Test for the boundary-crossing condition as the number of files exceeds one block
  – Note that Rumor has hidden files
• Number of repetitions: 5 per data point
• Flush cache (reboot) before each run
Linear Regression
• R² > 80%
• All coefficients are significant
[Figure: measured time, predicted time, and 95% confidence interval vs. number of files (50 to 500), in three panels: elapsed time (0 to 100 seconds), system time (0 to 3 seconds), user time (0 to 25 seconds).]
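A one-factor fit with its R² and a confidence interval for the slope can be sketched as follows. The file counts match the preceding design slide, but the timings are synthetic, `linear_fit` is a hypothetical helper, and the t-value 2.011 (two-sided 95%, 48 degrees of freedom) is taken from a t-table and passed in by the caller rather than computed.

```python
from math import sqrt
from statistics import mean


def linear_fit(xs, ys, t_value):
    """Fit y = b0 + b1*x; return (b1, b0, r_squared, slope_ci, significant)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - sse / sst
    # Standard error of the slope, with n - 2 degrees of freedom.
    se_slope = sqrt(sse / (n - 2)) / sqrt(sxx)
    ci = (slope - t_value * se_slope, slope + t_value * se_slope)
    # The coefficient is significant if its interval excludes zero.
    significant = not (ci[0] <= 0 <= ci[1])
    return slope, intercept, r_squared, ci, significant


file_counts = [100, 150, 200, 250, 252, 253, 300, 350, 400, 450]
offsets = [-2.0, -1.0, 0.0, 1.0, 2.0]  # synthetic run-to-run scatter, 5 repetitions
xs = [f for f in file_counts for _ in offsets]
ys = [20.0 + 0.15 * f + e for f in file_counts for e in offsets]
slope, intercept, r_squared, ci, significant = linear_fit(xs, ys, t_value=2.011)
print(slope, intercept, r_squared, ci, significant)
```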
Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figure: residuals (seconds) vs. predicted time (seconds), in three panels: elapsed time (predicted 35 to 85, residual axis -15 to 15), system time (predicted 1.2 to 2.4, residual axis -0.2 to 0.3), user time (predicted 8 to 26, residual axis -0.4 to 0.6).]
Residuals vs. Experiment Numbers
• Elapsed time shows a rising bimodal trend
  – Randomization of experiments may help
[Figure: residuals vs. experiment number (0 to 60), in three panels: elapsed time (residual axis -15 to 15), system time (-0.2 to 0.3), user time (-0.4 to 0.6).]
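Randomizing the run order, as suggested on this slide, keeps any drift over time from lining up with the number of files. A minimal sketch follows; the helper name and the seed value are arbitrary assumptions, and the seed is fixed only so that the schedule can be reproduced.

```python
import random


def randomized_schedule(levels, repetitions, seed):
    """Return every (level, repetition) run in a reproducible random order."""
    runs = [level for level in levels for _ in range(repetitions)]
    # A private generator avoids disturbing the global random state.
    random.Random(seed).shuffle(runs)
    return runs


file_counts = [100, 150, 200, 250, 252, 253, 300, 350, 400, 450]
schedule = randomized_schedule(file_counts, repetitions=5, seed=2024)
print(schedule[:10])
```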
Quantile-Quantile Plot
• Residuals for elapsed time are not normally distributed
  – Perhaps piecewise normal
[Figure: residual quantiles vs. normal quantiles, in three panels with linear fits.]
  – Elapsed time: f(x) = 5.8218x + 2.58E-15, R² = 0.8780
  – System time: f(x) = 0.0976x − 4.47E-16, R² = 0.9693
  – User time: f(x) = 0.2134x + 1.50E-15, R² = 0.9709
Possible Explanations
• i-node related factors: No
• Caching effects: No
• Hidden factors: Maybe
• Bugs: Maybe
Conclusion
• Identified the number of files as the dominating factor for Rumor running time
• Observed the existence of an unknown factor in the Rumor performance model