1
Sampling-based Program Locality Approximation
Yutao Zhong, Wentao Chang
Department of Computer Science, George Mason University
June 8th, 2008
2
Outline
• Background information
• Motivation
• Our sampling approach
• Experimental results
3
Reuse distance and reuse signature
a b c a a c b
• Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element
• Reuse signature: a histogram of reuse distances demonstrating the distribution of reuse distances over different lengths
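[Figure: the example trace with arrows from the starting point to the ending point of two reuses, each labeled with reuse distance 2]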
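The two definitions above can be checked on the example trace with a minimal sketch. The function names are illustrative, not from the talk, and the quadratic scan is only meant to mirror the definition, not to be efficient.

```python
from collections import Counter

def reuse_distances(trace):
    """One entry per access: the reuse distance, or None for a first access."""
    last_seen = {}          # element -> index of its previous access
    distances = []
    for i, elem in enumerate(trace):
        if elem in last_seen:
            # Distinct elements strictly between the two consecutive uses.
            between = trace[last_seen[elem] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[elem] = i
    return distances

def reuse_signature(trace):
    """Histogram of reuse distances (first accesses are excluded)."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

trace = list("abcaacb")
print(reuse_distances(trace))                  # [None, None, None, 2, 0, 1, 2]
print(sorted(reuse_signature(trace).items()))  # [(0, 1), (1, 1), (2, 2)]
```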
4
Reuse signature application
• Relationship to cache behavior:
  • A capacity miss occurs when reuse distance ≥ cache size
  • Reducing reuse distance improves cache effectiveness
• Current applications:
  • Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]
  • Reorganize data [Zhong+04]
  • Provide caching hint [Beyls & D’Hollander 02]
  • Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
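As a rough illustration of the miss-rate use, the sketch below counts every reuse whose distance is at least the cache size as a capacity miss. It assumes a fully associative LRU cache whose size is measured in data elements, and it counts first accesses as cold misses; the function name and parameters are hypothetical.

```python
def predicted_miss_rate(signature, cold_accesses, cache_size):
    """Estimate the miss rate from a reuse signature.

    signature:      dict mapping reuse distance -> number of reuses
    cold_accesses:  number of first-time accesses (always misses)
    cache_size:     capacity in data elements (assumed fully associative LRU)
    """
    reuses = sum(signature.values())
    capacity_misses = sum(n for d, n in signature.items() if d >= cache_size)
    total = reuses + cold_accesses
    return (capacity_misses + cold_accesses) / total if total else 0.0

# Signature of the trace "a b c a a c b": distances 0, 1, 2, 2 and 3 first accesses.
signature = {0: 1, 1: 1, 2: 2}
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=2))  # 5 of 7 accesses miss
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=3))  # only the 3 cold misses
```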
5
Reuse distance measurement
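[Figure: measurement workflow. Get the accessed memory address; search and update the access time table to find the last access; search the access trace and update its counts to obtain the distance; record the distance in the distance histogram. Data structures: access time table, access trace, distance histogram.]
① Storing the trace and counting memory accesses requires large space and a long counting time
② The cost is enormous for memory-intensive programs
[Figure: example trace a c a b b a with a reuse marked from starting point to ending point, labeled with distance 1]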
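The workflow can be sketched with an access time table plus an ordered record of last-access times. This is a simplified stand-in: a sorted Python list replaces the tree-based trace, and all names are illustrative.

```python
from bisect import bisect_left
from collections import Counter

def reuse_histogram(trace):
    """Distance histogram built from an access time table and an ordered trace."""
    last_time = {}   # access time table: address -> time of last access
    live = []        # sorted last-access times, one per distinct address
    hist = Counter() # distance histogram
    for now, addr in enumerate(trace):
        prev = last_time.get(addr)
        if prev is not None:
            idx = bisect_left(live, prev)
            # Addresses last accessed after `prev` are exactly the distinct
            # addresses touched between the two uses of `addr`.
            hist[len(live) - idx - 1] += 1
            live.pop(idx)
        live.append(now)          # `now` is the largest time, so order is kept
        last_time[addr] = now
    return hist

print(sorted(reuse_histogram(list("acabba")).items()))  # [(0, 1), (1, 2)]
```

The list operations here cost linear time per access; a balanced or splay tree brings each search and update down to logarithmic time, which is the part this sketch leaves out.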
6
Motivation
• Sampling is generally effective at reducing the overhead of program behavior profiling
• We aim to balance efficiency and accuracy
  • Sample only 1% of memory accesses
  • Improve measurement speed by 7.5 times on average
  • Achieve over 99% accuracy
7
Sampling algorithms
• Utilize the common structure of bursty tracing [Hirzel & Chilimbi 01]
• Sampling rate r = |IS| / (|IS| + |IH|), where IS is a sampling interval and IH is a hibernating interval
• Naïve sampling
  • Turn off profiling during hibernating intervals
  • No guarantee of accuracy
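A minimal sketch of the interval structure follows. It assumes fixed interval lengths counted in memory accesses and that each period begins with the hibernating interval; both are illustrative choices, as are the function names.

```python
def sampling_rate(sampling_len, hibernating_len):
    """r = |IS| / (|IS| + |IH|) for fixed interval lengths."""
    return sampling_len / (sampling_len + hibernating_len)

def in_sampling_interval(t, sampling_len, hibernating_len):
    """True if access number t falls in a sampling interval.

    Assumption: every period is a hibernating interval followed by a
    sampling interval.
    """
    return t % (sampling_len + hibernating_len) >= hibernating_len

# One sampled access for every 99 hibernating accesses gives a 1% rate.
print(sampling_rate(sampling_len=1, hibernating_len=99))
```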
8
Naive sampling
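Memory access trace: . . c a b c a c a b c a c a b c d a . . . .
Naïve sampling:
[Figure: the trace divided into hibernating (IH) and sampling (IS) intervals, with points ① to ⑤ marked; the annotations "④ 1", "⑤ 3", and "Inaccurate measurement" appear on the figure]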
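Why naïve sampling can mismeasure is easy to reproduce in a small sketch. It reads "turn off profiling" as the profiler seeing only the accesses in sampling intervals, and it assumes fixed interval lengths with hibernation first in each period; the function name is hypothetical.

```python
def naive_sampled_distances(trace, sampling_len, hibernating_len):
    """Reuse distances measured when hibernating accesses are invisible."""
    period = sampling_len + hibernating_len
    last_seen = {}  # element -> index in `visible` of its last profiled access
    visible = []    # the accesses the profiler actually saw
    measured = {}   # trace position -> measured reuse distance
    for t, elem in enumerate(trace):
        if t % period < hibernating_len:
            continue  # profiling is off during hibernation
        if elem in last_seen:
            measured[t] = len(set(visible[last_seen[elem] + 1:]))
        last_seen[elem] = len(visible)
        visible.append(elem)
    return measured

# 'a' is sampled at positions 2 and 5; 'd' and 'e' fall in hibernation between them.
print(naive_sampled_distances(list("bcadea"), sampling_len=1, hibernating_len=2))  # {5: 0}
```

The actual reuse distance of that last access is 2, so the measurement is too small because the two intervening elements were never seen.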
9
Biased sampling
• Ignore any datum that has been referenced within the current hibernating period
• Measured distance is always larger than or equal to the actual distance
• Probability of being sampled is not uniform
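The filtering rule in the first bullet can be sketched on its own. This covers only the selection step, not how distances are then measured, and it assumes that "the current hibernating period" means the hibernating interval immediately before the sampling interval in the same period. Names and fixed interval lengths are illustrative.

```python
def biased_sampling_candidates(trace, sampling_len, hibernating_len):
    """Positions of sampled accesses that the filter does not ignore."""
    period = sampling_len + hibernating_len
    seen_in_hibernation = set()
    candidates = []
    for t, elem in enumerate(trace):
        phase = t % period
        if phase == 0:
            seen_in_hibernation = set()    # a new hibernating interval begins
        if phase < hibernating_len:
            seen_in_hibernation.add(elem)  # remember data touched while hibernating
        elif elem not in seen_in_hibernation:
            candidates.append(t)           # sampled and not filtered out
    return candidates

# 'a' at position 2 is ignored because it was referenced during hibernation.
print(biased_sampling_candidates(list("abac"), sampling_len=2, hibernating_len=2))  # [3]
```

Because data touched during hibernation are filtered out, which accesses survive depends on recent activity, in line with the bullet on non-uniform sampling probability.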
10
Biased sampling
Memory access trace: . . c a b c a f a b c a c a b f d a . . . .
Biased sampling:
[Figure: the trace divided into hibernating (IH) and sampling (IS) intervals, with points ① to ⑤ marked]
11
History-preserved representative sampling
• Add an additional tag for each address in the access trace
• Mark references within a sampling period as sampled in the tag
• A reuse is sampled only when its starting point is marked as sampled
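One way to read these three bullets is sketched below: the trace is maintained for every access, so distances stay exact, and the tag decides which reuses are recorded. Recording a reuse whenever its starting point is tagged, wherever the ending point falls, is an interpretation of the last bullet; the fixed interval lengths, hibernation-first ordering, and names are also assumptions.

```python
def representative_sampled_distances(trace, sampling_len, hibernating_len):
    """Exact distances for reuses whose starting point was marked as sampled."""
    period = sampling_len + hibernating_len
    last_time = {}    # address -> time of last access (history kept throughout)
    sampled_tag = {}  # address -> was the last access in a sampling period?
    measured = {}     # trace position -> reuse distance
    for t, elem in enumerate(trace):
        if elem in last_time and sampled_tag[elem]:
            between = trace[last_time[elem] + 1 : t]
            measured[t] = len(set(between))
        last_time[elem] = t
        sampled_tag[elem] = (t % period) >= hibernating_len
    return measured

# The reuse of 'a' starts in a sampling period, and 'd' and 'e' are still counted.
print(representative_sampled_distances(list("bcadea"), sampling_len=1, hibernating_len=2))  # {5: 2}
```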
12
History-preserved representative sampling
Memory access trace: . . c a b c a f a b c a c a b f d a . . . .
History-preserved representative sampling:
[Figure: the trace divided into hibernating (IH) and sampling (IS) intervals, with points ① to ⑤ marked]
13
Further improvements
• Simplifying maintenance in hibernating intervals
  • Reference trace implementation: splay tree [Ding & Zhong 03]
  • In a sampling period, full tree maintenance
  • In a hibernating period, instead of adding a new leaf node for each access, we construct a single node for the whole period, with a counter of the number of distinct accesses
• Fast sample tag marking and checking
  • To save space, we fix the lengths of the sampling and hibernating periods, which avoids the additional tag
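The tag-free check in the last bullet can be illustrated as follows: with fixed period lengths, whether an access was sampled follows from its access time alone. The sketch covers only that point, not the splay tree or the per-period counter node, and it assumes hibernation comes first in each period; names are hypothetical.

```python
def was_sampled(access_time, sampling_len, hibernating_len):
    """Derive the sample tag from the access time instead of storing it."""
    return access_time % (sampling_len + hibernating_len) >= hibernating_len

def tag_free_sampled_distances(trace, sampling_len, hibernating_len):
    """Record a reuse only if its starting point fell in a sampling period."""
    last_time = {}  # address -> time of last access; no separate tag is kept
    measured = {}   # trace position -> reuse distance
    for t, elem in enumerate(trace):
        prev = last_time.get(elem)
        if prev is not None and was_sampled(prev, sampling_len, hibernating_len):
            measured[t] = len(set(trace[prev + 1 : t]))
        last_time[elem] = t
    return measured

print(tag_free_sampled_distances(list("bcadea"), sampling_len=1, hibernating_len=2))  # {5: 2}
```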
14
Experiments
• Benchmarks from SPEC 2006, Olden, Chaos:
  • Floating point programs: CactusADM, Milc, Soplex, Apsi, MolDyn
  • Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP
• Instrumentation tool: Valgrind 3.2.3
• Sampling rate: 1%
• We run each individual benchmark with 3 to 6 different inputs
• Repeat three times for each input
15
Experiments cont’d
• Comparison of accuracy and efficiency
  • Ding and Zhong’s approximation method [Ding & Zhong 03]
  • Time distance measurement [Shen+07]
• Implementation of four algorithms:
  • Naive sampling, biased sampling, basic and optimized representative sampling
16
Accuracy
17
Efficiency
Sampling even outperforms the lower bound: time distance measurement
Generally, the speedup is smaller when the input size is small
18
Efficiency
• Speedup of basic representative sampling: around 4-5 times for most cases
• Speedup of optimized representative sampling:
  • around 7-10 times for most cases, up to 33 times
  • geometric mean is 7.5
• Sampling rate effect (TSP):
19
Related work
• Reuse signature collection
  • [Mattson+70] [Bennett & Kruskal 75] [Olken 81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07]
• Selective monitoring
  • Time sampling [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05]
  • Data sampling [Larus 90] [Ding & Zhong 02] [Zhao+07]
• Uses of efficient locality analysis [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]
20
Future work
• Dynamically adjust sampling/hibernating lengths
• Store references in temporary buffer and then process them in batch
• Combine time sampling with data sampling
21
Thank you!
Questions?