Upload
jonathan-beard
View
94
Download
2
Embed Size (px)
DESCRIPTION
Presentation for 11th European Workshop on Performance Engineering. Topic: Use of a Levy Distribution for Modeling Best Case Execution Time Variation
Citation preview
Use of a Levy Distribution for Modeling Best Case Execution Time Variation
Jonathan Beard, Roger Chamberlain
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Work also supported by:
1
Outline• Motivation!
• Stream Processing!
• Optimization Goals!
• Methodology!
• Distributions!
• Results
2
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Streaming Computing
3
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Streaming Computing
Kernel
3
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Streaming Computing
Kernel 1
Kernel 2
Kernel 3
Kernel 2
Stream
Stream
Stream
Stream
4
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Streaming Languages
StreamIt, Auto-Pipe, Brook, Cg, S-Net, Scala-Pipe, Streams-C and
many others
5
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Optimization
KernelFast
Slow
Super Fast
Medium
6
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Optimization
Kernel 1
Kernel 2
Kernel 3
Kernel 2
multi-core A1 2
3 4
multi-core B1 2
3 4
More allocation choices, NUMA node A or B to
allocate stream.
7
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Optimization
Kernel 1
Kernel 2
Kernel 3
Kernel 2
multi-core A1 2
3 4
multi-core B1 2
3 4
More allocation choices, NUMA node A or B to
allocate stream.
1 2
7
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Optimization
Kernel 1
Kernel 2
Kernel 3
Kernel 2
multi-core A1 2
3 4
multi-core B1 2
3 4
More allocation choices, NUMA node A or B to
allocate stream.
1 2
7
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Optimization
B CQ1 Q2A
A B C
“Stream” is modeled as a Queue
8
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Optimization
B CQ1 Q2A
A B C
“Stream” is modeled as a Queue
8
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Streaming on Multi-core Systems
• Commodity multi-core timer availability and latency • Frequency scaling and core migration • Measuring modifies the application behavior
Problem: Accurate measurement is very difficult. Is there a way to decide on a model without it.
9
We want good models for streaming systems on shared multi-core systems (i.e., a cluster)
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Derived Information
10
Expected Observed
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Derived Information
10
Expected Observed
Is there a pattern of minimal variation within the systems we’re running on?
Avg. Service Time = E[ X ] + Error
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
GoalFind a distribution that characterizes the minimum expected variation of a
hardware and software system
Use this characterization to accept or reject models
11
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Process
12
• Measurement!• Workload definition!• Find a distribution!• Utilize the distribution to aid model selection
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Timer Mechanism
13
Ask for Time
Receive Time
Timer Thread Code
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Timer Mechanism
14
Timer Thread
rdtsc clock_gettime• x86 assembly • varying methods
to serialize • relatively fast • multiple drift
issues
• POSIX standard • relatively accurate • portable • slower than rdtsc
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Two Timing Choices
15
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
NUMA Node Variations
16
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Minimize Variation• Restricting timer to single core
!• Use the x86 rdtsc instruction with processor
recommended serializers for each processor type !
• Keeping processes under test on the same NUMA node as timer !
• Run timer thread with altered priority to minimize core context swaps
17
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Best Case Execution Time Variation
• no-op instruction implemented in most processors !• usually takes exactly 1 cycle !• no real functional units are involved, so least
taxing !• variation observed in execution time should be
external to process
18
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Data Collection• no-op loops calibrated for various nominal
times, tied to a single core and run thousands of times
!• Execution time measured end to end for
each run, environment collected !• Parameters include:
Number of processes executing on core Number of context swaps (voluntary, involuntary) Many others
19
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Levy Distribution
20
Execution Time Error ( obs - mean )
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Levy Distribution
21
Normal Distribution
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Levy Distribution
22
Gumbel Distribution
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Levy Distribution
23
Levy Distribution
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Levy Distribution
23
Levy Distribution
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Levy Distribution• Truncation enables mean calculation, but
requires fitting to each dataset to find where to truncate
!• The truncation parameters are correlated to
both the number of processes per core and the expected execution time
!• Roughly linear relationship gives an
approximate solution to truncation parameters without refitting
24
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Levy Fit
25
!0.000025 !0.00001
!0.0000155!0.000015!0.0000145!0.000014
!0.000025 !0.00001 0
!0.000014!0.0000135!0.000013!0.0000125
!0.00006 !0.00003 0!0.00005!0.000045!0.00004!0.000035!0.00003!0.000025!0.00002
!0.00005 !0.00002 0!0.00003!0.000025!0.00002!0.000015!0.00001
1 - 5 processes 6 - 10 processes
16 - 20 processes11 - 15 processes
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Test Setup
26
BQ1A
Question: Can we use an M/M/1 queueing model to estimate the mean queue occupancy of this system?
!Hypothesis: Lower Kullback-Leibler (KL) divergence
between expected and realized distribution is associated with higher model accuracy.
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Test Setup
27
BQ1A
1. Dedicated thread of execution monitors queue occupancy
2. Calculate the estimated mean queue occupancy using the M/M/1 model
3. Calculate KL Divergence for the arrival process distribution using the truncated Levy distribution noise model
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Convolution with Exponential
28
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Conclusions• The truncated Levy distribution can be used to
approximate BCETV !• The distribution of BCETV can be used as a tool
to accept or reject a stochastic queueing model based on distributional assumptions
!• KL divergence between the expected and
convolved distribution highly correlates with queue model accuracy
29
SBSStream Based
Supercomputing Labhttp://sbs.wustl.edu
Parting NotesSlides available here:
sbs.wust.edu !
Timer C++ template code: http://goo.gl/ItJ3jP
!
Test harness used to collect data: http://goo.gl/U1VG6N
30