Sampling-based Program Locality Approximation

Yutao Zhong, Wentao Chang

Department of Computer Science, George Mason University

June 8th, 2008

Outline

• Background information

• Motivation

• Our sampling approach

• Experimental results

Reuse distance and reuse signature

a b c a a c b

• Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element

• Reuse signature: a histogram of reuse distances demonstrating the distribution of reuse distances over different lengths

[Figure: the starting point and ending point of a reuse are labeled on the trace.]
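The two definitions can be checked against the example trace with a short sketch. It assumes an LRU stack held in a plain Python list (a simple illustration, not the measurement structure used in the talk); the names `reuse_distances` and `reuse_signature` are made up for this example.

```python
from collections import Counter

def reuse_distances(trace):
    """Return one distance per access; None marks a first access (no reuse)."""
    stack = []        # distinct elements, most recently used first
    distances = []
    for x in trace:
        if x in stack:
            depth = stack.index(x)   # distinct elements touched since the last use of x
            distances.append(depth)
            stack.pop(depth)
        else:
            distances.append(None)
        stack.insert(0, x)
    return distances

def reuse_signature(distances):
    """Histogram: reuse distance -> number of reuses with that distance."""
    return Counter(d for d in distances if d is not None)

trace = "a b c a a c b".split()
distances = reuse_distances(trace)
signature = reuse_signature(distances)
print(distances)                    # [None, None, None, 2, 0, 1, 2]
print(sorted(signature.items()))    # [(0, 1), (1, 1), (2, 2)]
```

The list-based stack costs time proportional to the number of distinct elements per access, which is fine for a slide-sized trace but not for real programs.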

Reuse signature application

• Relationship to cache behavior:

  • Capacity miss <= reuse distance ≥ cache size

  • Reduce reuse distance => improve cache effectiveness

• Current applications:

  • Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]

  • Reorganize data [Zhong+04]

  • Provide caching hint [Beyls & D’Hollander 02]

  • Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
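The link between reuse distance and capacity misses can be illustrated as follows. This is a minimal sketch assuming a fully associative LRU cache whose size is counted in data elements; both function names are hypothetical, and the simulation is included only to cross-check the prediction.

```python
def predicted_miss_rate(distances, cache_size):
    """Fraction of accesses predicted to miss.

    distances: one reuse distance per access, None for a first access.
    cache_size: capacity in data elements (assumed >= 1, fully associative, LRU).
    """
    misses = sum(1 for d in distances if d is None or d >= cache_size)
    return misses / len(distances)

def simulated_lru_miss_rate(trace, cache_size):
    """Direct LRU simulation, used only to cross-check the prediction."""
    cache, misses = [], 0
    for x in trace:
        if x in cache:
            cache.remove(x)
        else:
            misses += 1
            if len(cache) == cache_size:
                cache.pop()          # evict the least recently used element
        cache.insert(0, x)
    return misses / len(trace)

trace = "a b c a a c b".split()
distances = [None, None, None, 2, 0, 1, 2]   # reuse distances of the trace a b c a a c b

for size in (1, 2, 3):
    print(size, predicted_miss_rate(distances, size), simulated_lru_miss_rate(trace, size))
```

First accesses are counted as misses here; only the reuses with distance at or above the cache size are capacity misses in the sense of the slide.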

Reuse distance measurement

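[Figure: measurement workflow. Components: access time table, access trace, distance histogram. Steps: get the accessed memory address; search and update the access time table; search the access trace by address, count, and update; record the distance in the histogram.]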

① Storing the trace and counting memory accesses require large space and a long counting time

② The cost is enormous for memory-intensive programs

Data Structure:

a c a b b a

[Figure: the starting point and ending point of a reuse are labeled on the trace.]
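A naive version of this workflow can be sketched as below, using the trace a c a b b a from this slide. It assumes the whole trace is kept in memory and rescanned for every reuse, which is exactly the cost the two numbered points describe; the function name `measure_by_trace_search` is hypothetical.

```python
def measure_by_trace_search(trace):
    """Reuse distances via a last-access-time table plus a scan of the stored trace."""
    last_access = {}    # access time table: address -> time of its latest access
    histogram = {}      # distance histogram: distance -> count
    for now, addr in enumerate(trace):
        if addr in last_access:
            # Search the stored trace between the two uses and count distinct addresses.
            between = trace[last_access[addr] + 1 : now]
            distance = len(set(between))
            histogram[distance] = histogram.get(distance, 0) + 1
        last_access[addr] = now
    return histogram

histogram = measure_by_trace_search("a c a b b a".split())
print(sorted(histogram.items()))    # [(0, 1), (1, 2)]
```

Each reuse rescans the accesses since the previous use, so the total work grows with both the trace length and the length of the reuses.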

Motivation

• Sampling is generally effective at reducing the overhead of program behavior profiling

• We aim to balance efficiency and accuracy

  • Sample only 1% of memory accesses

  • Improve measurement speed by 7.5 times on average

  • Achieve over 99% accuracy

Sampling algorithms

• Utilize the common structure of bursty tracing [Hirzel & Chilimbi 01]

• Sampling rate r = |I_S| / (|I_S| + |I_H|), where I_S is a sampling interval and I_H is a hibernating interval

• Naïve sampling

  • Turn off profiling during hibernating intervals

  • No guarantee of accuracy
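The alternation of sampling and hibernating intervals and the resulting rate can be sketched as follows. The sketch assumes fixed-length intervals that alternate, starting with a sampling interval; the lengths 10 and 990 are hypothetical values chosen only to give a 1% rate.

```python
def in_sampling_interval(t, sample_len, hibernate_len):
    """True if access number t falls in a sampling interval.

    Assumes fixed-length intervals that alternate, starting with sampling.
    """
    return t % (sample_len + hibernate_len) < sample_len

def sampling_rate(sample_len, hibernate_len):
    """r = |I_S| / (|I_S| + |I_H|)."""
    return sample_len / (sample_len + hibernate_len)

sample_len, hibernate_len = 10, 990      # hypothetical lengths giving r = 1%
flags = [in_sampling_interval(t, sample_len, hibernate_len) for t in range(5000)]
print(sampling_rate(sample_len, hibernate_len))   # 0.01
print(sum(flags) / len(flags))                    # 0.01
```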

Naïve sampling

Memory access trace: . . c a b c a c a b c a c a b c d a . . . .

Naïve sampling: [Figure: intervals ① ② ③ ④ marked along the trace; labels: "1", "Inaccurate measurement"]
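Why naïve sampling is inaccurate can be seen by measuring the slide's trace twice, once in full and once with the hibernating accesses dropped. The sketch below assumes hypothetical interval lengths of 2 sampled and 4 hibernating accesses (the slide's actual interval boundaries are not recoverable), and the function names are made up for the example.

```python
def reuse_distances(trace):
    """One distance per access, None for a first access (list-based LRU stack)."""
    stack, out = [], []
    for x in trace:
        if x in stack:
            depth = stack.index(x)
            out.append(depth)
            stack.pop(depth)
        else:
            out.append(None)
        stack.insert(0, x)
    return out

def naive_sampling(trace, sample_len, hibernate_len):
    """Profile only the sampled accesses; hibernating accesses are never seen."""
    period = sample_len + hibernate_len
    times = [t for t in range(len(trace)) if t % period < sample_len]
    measured = reuse_distances([trace[t] for t in times])
    return dict(zip(times, measured))

trace = "c a b c a c a b c a c a b c d a".split()
actual = reuse_distances(trace)
measured = naive_sampling(trace, sample_len=2, hibernate_len=4)
for t, d in measured.items():
    print(t, trace[t], "measured:", d, "actual:", actual[t])
```

With these lengths, the access to `a` at time 6 is measured at distance 0 although its actual distance is 1, and the reuse of `b` at time 7 is not recognized as a reuse at all.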

Biased sampling

• Ignore any datum that has been referenced within the current hibernating period

• Measured distance is always larger than or equal to the actual distance

• Probability of being sampled is not uniform

Biased sampling

. . c a b c a f a b c a c a b f d a . . . .

Biased sampling: [Figure: intervals ① ② ③ ④ marked along the trace above]

History-preserved representative sampling

• Add a tag to each address in the access trace

• Mark references made within a sampling period as sampled in the tag

• A reuse is sampled only when its starting point is marked as sampled
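The three bullets can be sketched as below. This is one reading of the slide, not the authors' implementation: it assumes the LRU stack is maintained for every access, so each recorded distance is exact, and that only the decision to record a reuse depends on the tag. The interval lengths are hypothetical and the trace is the one used on the example slides.

```python
def representative_sampling(trace, sample_len, hibernate_len):
    """Record a reuse only if its starting point lies in a sampling period."""
    period = sample_len + hibernate_len
    stack = []       # distinct elements, most recently used first (full history)
    tag = {}         # element -> was its latest access inside a sampling period?
    histogram = {}   # distance -> number of sampled reuses
    for t, x in enumerate(trace):
        if x in stack:
            depth = stack.index(x)
            if tag[x]:                       # starting point marked as sampled
                histogram[depth] = histogram.get(depth, 0) + 1
            stack.pop(depth)
        stack.insert(0, x)
        tag[x] = t % period < sample_len     # mark the new starting point
    return histogram

trace = "c a b c a f a b c a c a b f d a".split()
print(representative_sampling(trace, sample_len=2, hibernate_len=4))   # {2: 4}
print(representative_sampling(trace, sample_len=2, hibernate_len=0))   # every reuse is sampled
```

Because the ending point may fall in any period, a reuse is selected purely by where it starts.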

History-preserved representative sampling

. . c a b c a f a b c a c a b f d a . . . .

History-preserved representative sampling: [Figure: intervals ① ② ③ ④ marked along the trace above]

Further improvements

• Simplifying maintenance in hibernating intervals

  • Reference trace implementation: splay tree [Ding & Zhong

  • In a sampling period, full tree maintenance

  • In a hibernating period, instead of adding a new leaf node for each access, we construct a single node for the whole period, with a counter of the number of distinct accesses

• Fast sample tag marking and checking

  • To save space, we fix the lengths of the sampling and hibernating periods and avoid the additional tag
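One plausible reading of the last point is that, with fixed period lengths, the sample tag need not be stored because it can be recomputed from the last access time. The sketch below illustrates only that idea; it is an assumption about the optimization rather than the authors' code, and the names and interval lengths are hypothetical.

```python
def starts_in_sampling_period(last_access_time, sample_len, hibernate_len):
    """Recompute the sample tag from a timestamp instead of storing it."""
    return last_access_time % (sample_len + hibernate_len) < sample_len

def count_sampled_reuses(trace, sample_len, hibernate_len):
    """Count reuses whose starting point falls in a sampling period, with no stored tag."""
    last_access = {}    # address -> time of its latest access
    count = 0
    for now, addr in enumerate(trace):
        prev = last_access.get(addr)
        if prev is not None and starts_in_sampling_period(prev, sample_len, hibernate_len):
            count += 1
        last_access[addr] = now
    return count

trace = "c a b c a f a b c a c a b f d a".split()
print(count_sampled_reuses(trace, sample_len=2, hibernate_len=4))   # 4
```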

Experiments

• Benchmarks from SPEC 2006, Olden, Chaos:

  • Floating point programs: CactusADM, Milc, Soplex, Apsi, MolDyn

  • Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP

• Instrumentation tool: Valgrind 3.2.3

• Sampling rate: 1%

• We run each benchmark with 3 to 6 different inputs

• Repeat three times for each input

Experiments cont’d

• Comparison of accuracy and efficiency

  • Ding and Zhong’s approximation method [Ding & Zhong 03]

  • Time distance measurement [Shen+07]

• Implementation of four algorithms:

  • Naïve sampling, biased sampling, basic and optimized representative sampling

Accuracy

Efficiency

Sampling even outperforms the lower bound: time distance measurement

Generally, the speedup is smaller when the input size is small

Efficiency

• Speedup of basic representative sampling : around 4-5 times for most cases

• Speedup of optimized representative sampling:

  • around 7-10 times for most cases, up to 33 times

  • geometric mean is 7.5

• Sampling rate effect (TSP):

Related work

• Reuse signature collection

  • [Mattson+70] [Bennett & Kruskal 75] [Olken 81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07]

• Selective monitoring

  • Time sampling [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05]

  • Data sampling [Larus 90] [Ding & Zhong 02] [Zhao+07]

• Uses of efficient locality analysis [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]

Future work

• Dynamically adjust sampling/hibernating lengths

• Store references in temporary buffer and then process them in batch

• Combine time sampling with data sampling

Thank you!

Questions?
