Sampling-based Program Locality Approximation. Yutao Zhong, Wentao Chang, Department of Computer Science, George Mason University. June 8th, 2008

Page 1: Sampling-based  Program Locality Approximation

Sampling-based Program Locality Approximation

Yutao Zhong, Wentao Chang

Department of Computer Science, George Mason University

June 8th, 2008

Page 2: Sampling-based  Program Locality Approximation

Outline

• Background information

• Motivation

• Our sampling approach

• Experimental results

Page 3: Sampling-based  Program Locality Approximation

Reuse distance and reuse signature

a b c a a c b

• Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element

• Reuse signature: a histogram of reuse distances demonstrating the distribution of reuse distances over different lengths

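[Figure: arrows over the trace a b c a a c b, each drawn from the starting point of a reuse (the earlier access) to its ending point (the later access); two of the reuses are labeled with distance 2.]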
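
The two definitions above can be checked on the slide's example trace with a minimal sketch. The function names `reuse_distances` and `reuse_signature` are illustrative, not from the talk, and the quadratic scan is chosen for clarity rather than speed.

```python
from collections import Counter

def reuse_distances(trace):
    """One entry per access: the reuse distance, or None for a first access."""
    last_seen = {}   # element -> index of its most recent access
    distances = []
    for t, item in enumerate(trace):
        if item in last_seen:
            # Distinct elements strictly between the two uses of `item`.
            between = trace[last_seen[item] + 1 : t]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[item] = t
    return distances

def reuse_signature(trace):
    """Histogram of reuse distances (first accesses are left out)."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

trace = list("abcaacb")
print(reuse_distances(trace))        # [None, None, None, 2, 0, 1, 2]
print(dict(reuse_signature(trace)))  # {2: 2, 0: 1, 1: 1}
```

The two reuses with distance 2 are the second use of a (b and c lie in between) and the second use of b (c and a lie in between).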

Page 4: Sampling-based  Program Locality Approximation

Reuse signature application

• Relationship to cache behavior:
    • Capacity miss ⇐ reuse distance ≥ cache size
    • Reduce reuse distance ⇒ improve cache effectiveness

• Current applications:
    • Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]
    • Reorganize data [Zhong+04]
    • Provide caching hint [Beyls & D’Hollander 02]
    • Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
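
The link between reuse distance and capacity misses can be sketched as follows. This assumes a fully associative LRU cache whose size is counted in data elements; the function name `capacity_miss_rate` and its parameters are illustrative and not taken from the cited work.

```python
def capacity_miss_rate(signature, total_accesses, cache_size):
    """Fraction of all accesses that are capacity misses.

    signature: dict mapping reuse distance -> number of reuses
    total_accesses: number of accesses in the trace (reuses plus first accesses)
    cache_size: capacity in data elements (assumed fully associative, LRU)
    """
    if total_accesses == 0:
        return 0.0
    # A reuse misses when at least cache_size distinct elements intervened.
    misses = sum(count for dist, count in signature.items() if dist >= cache_size)
    return misses / total_accesses

# Signature of the trace a b c a a c b (7 accesses, 4 reuses).
signature = {0: 1, 1: 1, 2: 2}
print(capacity_miss_rate(signature, 7, cache_size=2))  # 2 of 7 accesses miss
print(capacity_miss_rate(signature, 7, cache_size=3))  # 0.0
```

Shrinking the reuse distances below the cache size removes the capacity misses, which is the sense in which reducing reuse distance improves cache effectiveness.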

Page 5: Sampling-based  Program Locality Approximation

Reuse distance measurement

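[Figure: measurement workflow over three data structures: access time table, access trace, and distance histogram. For each access: get the accessed memory address; search and update the access time table to find the last access; search the access trace by address and update its counts to obtain the distance; record the distance in the histogram. Example trace: a c a b b a, with one reuse marked from its starting point to its ending point and labeled with distance 1.]

① Storing the trace and counting memory accesses requires a large amount of space and a long counting time

② The cost is enormous for memory-intensive programs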
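
A rough sketch of this kind of measurement loop is shown below. It keeps a last-access-time table and, as a stand-in for the access trace structure, a Fenwick (binary indexed) tree that marks the most recent access time of every element. The Fenwick tree is a substitution made here for brevity, and all names are illustrative.

```python
from collections import Counter

class Fenwick:
    """Prefix sums over time slots 0..n-1 with O(log n) updates."""
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        i += 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        """Sum of slots 0..i-1."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total

def reuse_histogram(trace):
    last_time = {}               # access time table: address -> last access time
    marks = Fenwick(len(trace))  # slot t is 1 if t is some address's last access
    histogram = Counter()        # distance histogram
    for t, addr in enumerate(trace):
        if addr in last_time:
            p = last_time[addr]
            # Marked slots strictly between p and t are the distinct addresses
            # accessed since the previous use of addr.
            histogram[marks.prefix(t) - marks.prefix(p + 1)] += 1
            marks.add(p, -1)
        marks.add(t, 1)
        last_time[addr] = t
    return histogram

print(dict(reuse_histogram(list("acabba"))))  # {1: 2, 0: 1}
```

Even with logarithmic updates, every access pays for a table lookup and a tree update, and the structures grow with the amount of data touched, which is the overhead the two points above describe.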

Page 6: Sampling-based  Program Locality Approximation

Motivation

• Sampling is generally effective at reducing the overhead of program behavior profiling

• We aim to balance efficiency and accuracy
    • Sample only 1% of memory accesses
    • Improve measurement speed by 7.5 times on average
    • Achieve over 99% accuracy

Page 7: Sampling-based  Program Locality Approximation

Sampling algorithms

• Utilize the common structure of bursty tracing [Hirzel & Chilimbi 01]

• Sampling rate r = |I_S| / (|I_S| + |I_H|), where I_S is a sampling interval and I_H is a hibernating interval

• Naïve sampling
    • Turn off profiling during hibernating intervals
    • No guarantee of accuracy
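
As a small illustration of the sampling rate formula, the sketch below computes r for fixed interval lengths and classifies an access by its position in the trace. The interval lengths are made-up values, and starting each period with hibernation is an assumption made here.

```python
def sampling_rate(sample_len, hibernate_len):
    """r = |I_S| / (|I_S| + |I_H|) for fixed interval lengths."""
    return sample_len / (sample_len + hibernate_len)

def is_sampled(t, sample_len, hibernate_len):
    """True if access number t falls in a sampling interval.

    Assumes each period consists of a hibernating interval followed by a
    sampling interval.
    """
    return t % (sample_len + hibernate_len) >= hibernate_len

# Hypothetical lengths: 1,000 sampled accesses per 100,000-access period.
print(sampling_rate(sample_len=1_000, hibernate_len=99_000))  # 0.01
```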

Page 8: Sampling-based  Program Locality Approximation

Naive sampling

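Memory access trace: . . c a b c a c a b c a c a b c d a . . . .

[Figure: the trace divided into hibernating (I_H) and sampling (I_S) intervals, with circled markers ① to ⑤ and the distance labels 1 and 3 illustrating an inaccurate measurement under naïve sampling.]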
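
One plausible reading of naïve sampling, sketched below, is that accesses in hibernating intervals are dropped entirely and reuse distances are measured on what remains. The trace, the interval lengths, and the function names are made up for illustration, and each period is assumed to start with hibernation.

```python
def reuse_distances(trace):
    """Distances of all reuses, in trace order (first accesses are skipped)."""
    last_seen, out = {}, []
    for t, item in enumerate(trace):
        if item in last_seen:
            out.append(len(set(trace[last_seen[item] + 1 : t])))
        last_seen[item] = t
    return out

def naive_sample(trace, sample_len, hibernate_len):
    """Measure reuse distances using only accesses in sampling intervals."""
    period = sample_len + hibernate_len
    kept = [x for t, x in enumerate(trace) if t % period >= hibernate_len]
    return reuse_distances(kept)

trace = list("zzabcdab")       # intervals: zz | ab | cd | ab
print(naive_sample(trace, 2, 2))  # [1, 1]
print(reuse_distances(trace))     # [0, 3, 3]
```

The reuses of a and b really have distance 3, but the accesses to c and d fall in a hibernating interval, so the naïve measurement reports 1 for both.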

Page 9: Sampling-based  Program Locality Approximation

Biased sampling

• Ignore a datum that has been referenced within the current hibernating period

• Measured distance is always larger than or equal to the actual distance

• Probability of being sampled is not uniform

Page 10: Sampling-based  Program Locality Approximation

Biased sampling

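Memory access trace: . . c a b c a f a b c a c a b f d a . . . .

[Figure: the trace divided into hibernating (I_H) and sampling (I_S) intervals, with circled markers ① to ④ illustrating biased sampling.]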

Page 11: Sampling-based  Program Locality Approximation

History-preserved representative sampling

• Add an additional tag for each address in the access trace

• Mark references within a sampling period as sampled in the tag

• A reuse is sampled only when its starting point is marked as sampled
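
The three steps above might look like the following sketch: the full access history is kept, each last-access record carries a sampled tag, and a reuse is recorded only when its starting point carries that tag. The names, the hibernation-first period layout, and the brute-force distance count are assumptions made for illustration.

```python
from collections import Counter

def in_sampling_interval(t, sample_len, hibernate_len):
    # Assumption: every period is a hibernating interval, then a sampling one.
    return t % (sample_len + hibernate_len) >= hibernate_len

def representative_sample(trace, sample_len, hibernate_len):
    last = {}            # address -> (time of last access, sampled tag)
    histogram = Counter()
    for t, addr in enumerate(trace):
        if addr in last:
            start, start_sampled = last[addr]
            if start_sampled:
                # History is preserved, so the distance counts every distinct
                # address between the two uses, hibernating or not.
                histogram[len(set(trace[start + 1 : t]))] += 1
        last[addr] = (t, in_sampling_interval(t, sample_len, hibernate_len))
    return histogram

print(dict(representative_sample(list("abcabcabc"), 3, 3)))  # {2: 3}
```

In this example every reuse in the full trace has distance 2; the sketch records three of the six reuses, all with the correct distance, even though their ending points fall in a hibernating interval.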

Page 12: Sampling-based  Program Locality Approximation

History-preserved representative sampling

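Memory access trace: . . c a b c a f a b c a c a b f d a . . . .

[Figure: the trace divided into hibernating (I_H) and sampling (I_S) intervals, with circled markers ① to ④ illustrating history-preserved representative sampling.]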

Page 13: Sampling-based  Program Locality Approximation

Further improvements

• Simplifying maintenance in hibernating intervals
    • Reference trace implementation: splay tree [Ding & Zhong 03]
    • In a sampling period: full tree maintenance
    • In a hibernating period: instead of a new leaf node for each access, we construct a single node for each hibernating period with a counter of the number of distinct accesses

• Fast sample tag marking and checking
    • To save space, we fix the lengths of the sampling and hibernating periods and avoid an additional tag
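
One way the tag could be avoided with fixed period lengths is to derive it from the last access time alone, as in the sketch below. This is a guess at the idea rather than the authors' implementation; the names and the hibernation-first layout are assumptions.

```python
def was_sampled(access_time, sample_len, hibernate_len):
    """Recompute the sampled tag from the access time; nothing extra is stored."""
    return access_time % (sample_len + hibernate_len) >= hibernate_len

def sampled_reuses(trace, sample_len, hibernate_len):
    """Yield (start_time, end_time) for reuses whose starting point was sampled."""
    last_time = {}   # address -> time of last access (no separate tag field)
    for t, addr in enumerate(trace):
        start = last_time.get(addr)
        if start is not None and was_sampled(start, sample_len, hibernate_len):
            yield (start, t)
        last_time[addr] = t

print(list(sampled_reuses(list("abcabcabc"), 3, 3)))  # [(3, 6), (4, 7), (5, 8)]
```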

Page 14: Sampling-based  Program Locality Approximation

Experiments

• Benchmarks from SPEC 2006, Olden, Chaos:
    • Floating point programs: CactusADM, Milc, Soplex, Apsi, MolDyn
    • Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP

• Instrumentation tool: Valgrind 3.2.3

• Sampling rate: 1%

• We run each individual benchmark with 3 to 6 different inputs

• Repeat three times for each input

Page 15: Sampling-based  Program Locality Approximation

Experiments cont’d

• Comparison of accuracy and efficiency
    • Ding and Zhong’s approximation method [Ding & Zhong 03]
    • Time distance measurement [Shen+07]

• Implementation of four algorithms:
    • Naïve sampling, biased sampling, basic and optimized representative sampling

Page 16: Sampling-based  Program Locality Approximation

Accuracy

Page 17: Sampling-based  Program Locality Approximation

Efficiency

Sampling even outperforms the lower bound: time distance measurement

Generally, the speedup is smaller when the input size is small

Page 18: Sampling-based  Program Locality Approximation

Efficiency

• Speedup of basic representative sampling: around 4-5 times for most cases

• Speedup of optimized representative sampling:
    • around 7-10 times for most cases, up to 33 times
    • geometric mean is 7.5

• Sampling rate effect (TSP):

Page 19: Sampling-based  Program Locality Approximation

Related work

• Reuse signature collection
    • [Mattson+70] [Bennett & Kruskal 75] [Olken 81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07]

• Selective monitoring
    • Time sampling [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05]
    • Data sampling [Larus 90] [Ding & Zhong 02] [Zhao+07]

• Uses of efficient locality analysis [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]

Page 20: Sampling-based  Program Locality Approximation

Future work

• Dynamically adjust sampling/hibernating lengths

• Store references in temporary buffer and then process them in batch

• Combine time sampling with data sampling

Page 21: Sampling-based  Program Locality Approximation

Thank you!

Questions?