Final Presentation
Hit or Miss Predictor
Hit or Miss?!
Software Engineering Lab, Spring 07/08
Supervised by: Zvika Guz
Presented by: Akram Baransi, Amir Salameh
Cache RAM is high-speed memory (usually SRAM).
The cache stores frequently requested data.
If the CPU needs data, it checks the high-speed cache memory first before looking in the slower main memory.
Cache memory may be three to five times faster than system DRAM.
INTRODUCTION: CACHE MEMORY
Most computers have two separate memory caches: L1 cache, located on the CPU, and L2 cache, located between the CPU and DRAM.
L1 cache is faster than L2, and is the first place the CPU looks for its data. If data is not found in L1 cache, the search continues with L2 cache, and then on to DRAM.
INTRODUCTION: CACHE MEMORY
A shared cache is a cache shared among several processors.
In a multi-core system, the shared cache is usually overloaded with many accesses from the different cores.
Our goal is to reduce the load on the shared cache.
To achieve this goal, we build a predictor that predicts whether an access to the shared cache will be a hit or a miss.
INTRODUCTION: SHARED CACHE
Small size.
Simple and fast.
Implementable in hardware.
Does not need too much power.
Never predicts a miss when the access is actually a hit.
Has a high hit rate, especially on misses.
PREDICTOR REQUIREMENTS
A Bloom filter is a method of representing a set A of n elements (a1, …, an) to support membership queries.
The idea is to allocate a vector v of m bits, initially all set to 0.
Choose k independent hash functions h1, …, hk, each with range 1…m.
For each element a, the bits at positions h1(a), …, hk(a) in v are set to 1.
SIMPLE PREDICTOR: BLOOM FILTER
Given a query for b we check the bits at positions h1(b), h2(b), ..., hk(b).
If any of them is 0, then certainly b is not in the set A.
Otherwise we conjecture that b is in the set, although there is some probability that we are wrong. This is called a "false positive".
The parameters k and m should be chosen so that the probability of a false positive (and hence a false hit) is acceptable.
SIMPLE PREDICTOR: BLOOM FILTER
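The insert and query operations described above can be sketched in a few lines. This is a software illustration only (the project's predictor is a hardware structure), and the k "independent" hash functions are simulated here by salting a single hash:

```python
import hashlib

class BloomFilter:
    """m-bit Bloom filter with k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item):
        # Simulate k independent hash functions by salting one hash.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item):
        # Any 0 bit => certainly not in the set; all 1s => probably in it.
        return all(self.bits[pos] == 1 for pos in self._positions(item))
```

A query can return a false positive but never a false negative, which is exactly the "never predicts a miss on a hit" property required of the predictor.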
Example: a Bloom array of m = 16 bits, one hash function H(x) = x % 16, all bits initially 0.
Insert(123): H(123) = 11 → Bloom[11] = 1
Insert(456): H(456) = 8 → Bloom[8] = 1
Insert(764): H(764) = 12 → Bloom[12] = 1
Insert(227): H(227) = 3 → Bloom[3] = 1
A = {123, 456, 764, 227}; final array: bits 3, 8, 11, and 12 are set.
Is 227 in A? H(227) = 3, Bloom[3] = 1 → "I think yes." Right prediction.
Is 151 in A? H(151) = 7, Bloom[7] = 0 → "Certainly no." Right prediction.
Is 504 in A? H(504) = 8, Bloom[8] = 1 → "I think yes." Oops! A false positive.
BLOOM PREDICTOR: EXAMPLE
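The example above can be reproduced directly; a minimal sketch using the slide's single hash H(x) = x % 16:

```python
# Bloom array from the slides: m = 16 bits, single hash H(x) = x % 16.
bloom = [0] * 16

def insert(x):
    bloom[x % 16] = 1

def query(x):
    # 0 => certainly not in A; 1 => predicted in A (may be a false positive).
    return bloom[x % 16] == 1

for x in (123, 456, 764, 227):  # sets bits 11, 8, 12, 3
    insert(x)

print(query(227))  # True  -> right prediction (227 is in A)
print(query(151))  # False -> right prediction (bit 7 is 0: certainly not in A)
print(query(504))  # True  -> false positive: 504 % 16 == 8, the bit set by 456
```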
We used a separate predictor for each set in the L2 cache.
[Diagram: each set in the L2 cache (Set 0, Set 1, …, Set N) has its own Bloom array (Array 0, Array 1, …, Array N) of bits.]
BLOOM PREDICTOR IN L2 CACHE
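A software sketch of the per-set organization; the set count, array size, and hash below are illustrative assumptions, not the project's actual parameters:

```python
# Hypothetical per-set Bloom predictor: one small Bloom array per L2 set.
NUM_SETS = 4       # illustrative; a real L2 has far more sets
ARRAY_SIZE = 16    # Bloom array entries per set
LINE_SIZE = 64     # bytes per cache line

arrays = [[0] * ARRAY_SIZE for _ in range(NUM_SETS)]

def set_index(addr):
    # The set is selected by the index bits just above the line offset.
    return (addr // LINE_SIZE) % NUM_SETS

def bloom_pos(addr):
    return (addr // LINE_SIZE) % ARRAY_SIZE  # toy hash on the line address

def predict(addr):
    # Each access consults only the Bloom array of its own set.
    return arrays[set_index(addr)][bloom_pos(addr)] == 1

def on_fill(addr):
    # When a line is brought into the cache, mark it in its set's array.
    arrays[set_index(addr)][bloom_pos(addr)] = 1
```

Keeping one array per set keeps each array small and lets the predictor be queried with the same index bits the cache itself already extracts.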
Small size.
Simple and fast.
Implementable in hardware.
Does not need too much power.
Never predicts a miss when the access is actually a hit.
BLOOM FILTER: ADVANTAGES
A is a dynamic set in our case, and updating the array when removing an element e from A is hard: we cannot simply clear Bloom[H(e)]. To do so safely, we must first check that no other element e1 in A has H(e1) = H(e), and that takes a lot of time.
If we do not update the array, the hit rate will degrade.
BLOOM FILTER: DISADVANTAGES
Use counters instead of binary cells, so that when removing an element we simply decrement the appropriate counters.
The problem with this solution: the filter becomes large.
IMPROVEMENTS AND SOLUTIONS
Note that the number of elements in each set is usually small (the cache associativity), which allows us to use limited counters, for example 2-bit counters.
In this way we get a small predictor, but we still have a problem when a counter reaches saturation, although that happens with low probability.
IMPROVEMENTS AND SOLUTIONS
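The counting variant with 2-bit saturating counters can be sketched as follows (single hash for brevity; sizes are illustrative):

```python
# Counting Bloom filter with 2-bit saturating counters (maximum value 3).
MAX_COUNT = 3
counters = [0] * 16

def insert(x):
    i = x % 16
    counters[i] = min(counters[i] + 1, MAX_COUNT)  # saturate, never wrap

def remove(x):
    i = x % 16
    if counters[i] < MAX_COUNT:
        counters[i] -= 1  # safe: an unsaturated counter holds the exact count
    # A saturated counter cannot be decremented safely: its true count may
    # exceed MAX_COUNT. This is the saturation problem addressed next.

def query(x):
    return counters[x % 16] > 0
```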
Adding an overflow flag to each Bloom array allows us to decrement a saturated counter in some cases.
Overflow flag = 1 if and only if we tried to increment a saturated counter in that array.
How does it help? If the overflow flag is 0, every counter in the array is exact, so we can safely decrement even a saturated counter, which we could not do before.
IMPROVEMENTS AND SOLUTIONS
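The overflow-flag rule can be sketched like this (array size is hypothetical, and the hashed index is passed in directly for brevity):

```python
MAX_COUNT = 3  # 2-bit saturating counters

class BloomArray:
    """Counting Bloom array with a single per-array overflow flag."""

    def __init__(self, size=16):
        self.counters = [0] * size
        self.overflow = False  # set iff we tried to increment a saturated counter

    def insert(self, i):
        if self.counters[i] == MAX_COUNT:
            self.overflow = True  # the true count of slot i is now unknown
        else:
            self.counters[i] += 1

    def remove(self, i):
        if self.counters[i] < MAX_COUNT or not self.overflow:
            # Safe: either this counter is exact, or no counter in the whole
            # array ever overflowed, so even a saturated value is exact.
            self.counters[i] -= 1
            return True
        return False  # failed decrement: this array will need a rebuild
```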
How can we solve the problem of arrays that were not updated? We enter arrays that need an update into a queue, and every N cycles we update one of them (much like the way DRAM lines are refreshed).
When do we enter an array into the queue? After K failed attempts to decrement a counter in the array due to overflow.
IMPROVEMENTS AND SOLUTIONS
The hardware queue is not infinite, so what can we do if the queue is full and we need to enter an array into it?
We turn on a flag indicating that the array needs an update but has not yet entered the queue; the next time we access the array, we try again to enter it into the queue.
IMPROVEMENTS AND SOLUTIONS
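The update queue with the "pending" fallback flag can be sketched as follows (queue capacity is an illustrative assumption):

```python
from collections import deque

QUEUE_CAPACITY = 4  # hardware queues are finite; the size here is illustrative
update_queue = deque()

def request_update(array):
    """Try to enqueue an array for rebuilding; fall back to a pending flag."""
    if array in update_queue:
        return
    if len(update_queue) < QUEUE_CAPACITY:
        update_queue.append(array)
        array["pending"] = False
    else:
        array["pending"] = True  # queue full: retry on the array's next access

def on_access(array):
    # Accessing a marked array retries the enqueue.
    if array["pending"]:
        request_update(array)

def update_tick():
    """Every N cycles: rebuild one queued array (much like a DRAM refresh)."""
    if update_queue:
        array = update_queue.popleft()
        # ...rescan the corresponding cache set and rebuild its Bloom array...
        return array
    return None
```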
We collected all the L2 accesses from Simics for 9 benchmarks.
We implemented a simulator of the cache and the predictor in Perl.
On the command line we can choose the configuration we want by changing the following parameters:
RESULTS ANALYSIS
Cache parameters:
Lines number – the number of lines in the cache.
Line size – the size of each line in the cache.
Associativity – the associativity of the cache.
RESULTS ANALYSIS
Predictor parameters:
Bloom array size – the number of entries in the Bloom array.
Bloom max counter – the counter limit for each entry.
Number of hashes – the number of hash functions the algorithm uses.
RESULTS ANALYSIS
Predictor parameters:
Bloom max not updated – the number of failed attempts to decrement the Bloom counter of a specific entry, where the decrement failed because the counter is saturated.
Enable bloom update – enables array updates.
Bloom update period – the number of L2 accesses between two updates.
RESULTS ANALYSIS
In the following graphs we see the hit rate of the predictor versus the cache hit rate.
We configured the predictor and the cache with the following parameters: Bloom array size = 64, Bloom max counter = 3, Associativity = 16, Line size = 64, Update period = 1.
RESULTS ANALYSIS
RESULTS ANALYSIS
[Chart: predictor hit rate on misses vs. cache hit rate, per benchmark and L2 size. Data labels (predictor hit rate on misses):]
Apache: 2M 80.12%, 4M 84.60%, 8M 91.85%, 16M 95.66%
Barnes: 2M 86.00%, 4M 95.79%, 8M 98.96%, 16M 99.66%
Equake: 2M 84.53%, 4M 87.81%, 8M 90.49%, 16M 91.38%
RESULTS ANALYSIS
[Chart: predictor hit rate on misses vs. cache hit rate, per benchmark and L2 size. Data labels (predictor hit rate on misses):]
Fma3d: 2M 85.14%, 4M 88.20%, 8M 88.72%, 16M 88.55%
Lu: 2M 93.19%, 4M 99.32%, 8M 99.86%, 16M 99.97%
Ocean: 2M 83.66%, 4M 90.47%, 8M 88.09%, 16M 95.54%
RESULTS ANALYSIS
[Chart: predictor hit rate on misses vs. cache hit rate, per benchmark and L2 size. Data labels (predictor hit rate on misses):]
Specjbb: 2M 77.73%, 4M 80.13%, 8M 82.26%, 16M 84.43%
Water: 2M 76.70%, 4M 87.37%, 8M 90.63%, 16M 93.34%
Zeus: 2M 79.79%, 4M 84.02%, 8M 88.97%, 16M 92.93%
Project goal achieved:
We saw in the graphs above that we get a high hit rate on misses; for example, the average hit rate on misses with a 16M cache is 93.5%.
What's next?
Apply the predictor idea to other units in the computer, for example the DRAM.
CONCLUSIONS
http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
http://www.simmtester.com/page/memory/show_glossary.asp
REFERENCES