Microarchitecture Independent Characterization Using MICA
Leegia Jacob
OUTLINE
• Introduction
• Experimental Setup
• Problems Encountered Using MICA
• Data Analysis and Results
• Conclusion
Introduction
• Quantifying and analyzing the microarchitectural characteristics of emerging workloads for next-generation microprocessors.
• Workloads are typically analyzed by measuring microarchitecture-dependent characteristics while running benchmarks on real hardware.
• Drawback: programs with completely different inherent behavior may yield similar microarchitecture-dependent behavior.
• Instead, workloads are compared using microarchitecture-independent characteristics.
• Collecting these characteristics requires instrumenting the programs.
Experimental Setup
• Instrumentation tool: MICA, a Pin tool that characterizes only microarchitecture-independent characteristics.
• Pin version: Intel Pin 2.12
• Compiler: GCC C++ compiler (g++) 4.6.3
• A wide variety of benchmarks is available, so choosing the right benchmark matters.
• Benchmarks: MiBench Typeset and Blowfish.
MICA = Microarchitecture-Independent Characterization of Applications
Problems Encountered Using MICA
Compatibility Issues
• MICA v0.40 was released in 2012.
• It is not compatible with the latest Pin version (2.14).
• It is compatible with Pin 2.12, although issues remained.
• The makefile, mica_init.cpp, mica_init.h, mica_all.cpp and other files had to be modified.
• A few files were missing (makefile.gnu.config, mica_memoryreusedist.cpp, mica_fullmemoryreusedist.cpp).
Other Issues
• The Pin path has to be exported every time Pin is run.
• MICA is built with the GCC C++ compiler.
• GCC 4.4.0 or later is required (no patch is available for older versions).
• There is a known problem using Pin on Linux systems that prevent the use of ptrace attach: Pin cannot use its default injection mode.
• The ptrace setting had to be changed as root every time.
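A minimal sketch for checking the ptrace restriction before launching Pin, assuming the restriction comes from the Yama Linux Security Module, whose setting lives in `/proc/sys/kernel/yama/ptrace_scope` (a value of 0 means unrestricted ptrace attach). The helper name `pin_default_injection_ok` is hypothetical. When it returns False, Pin's documentation describes writing 0 to that file as root; a write to `/proc` does not persist across reboots.

```python
from pathlib import Path
from typing import Union

# Yama LSM control file (assumed source of the ptrace attach restriction).
YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def pin_default_injection_ok(path: Union[str, Path] = YAMA_PTRACE_SCOPE) -> bool:
    """Return True if Yama does not block ptrace attach (hypothetical helper)."""
    try:
        value = int(Path(path).read_text().strip())
    except FileNotFoundError:
        # Yama is not present, so it cannot be the thing restricting ptrace.
        return True
    # 0 = classic ptrace permissions; any higher value restricts attach.
    return value == 0
```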
Data Analysis and Results
1. Instruction Level Parallelism (ILP)
• Number of instructions that can execute in parallel (instructions per cycle).
• Measured on an idealized out-of-order processor model.
• The only limits are the instruction window size and data dependences.
• The goal is to exploit as much ILP as possible.
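The idealized ILP measurement can be sketched as follows. This is a simplified model, not MICA's implementation: it assumes unit latency, perfect caches and branch prediction, unlimited functional units, and in-order retirement, and the trace encoding `(destination register, source registers)` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical dynamic instruction: (destination register or None, source registers).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def ideal_ilp(trace: Sequence[Instr], window: int) -> float:
    """IPC on an idealized out-of-order core limited only by window size and data deps."""
    reg_ready = {}   # register -> cycle at which its latest value is available
    retire = []      # retire[i] = cycle at which instruction i leaves the window
    last_done = 0    # latest completion cycle seen so far
    for i, (dst, srcs) in enumerate(trace):
        # Data dependences: wait until every source operand has been produced.
        start = max((reg_ready.get(r, 0) for r in srcs), default=0)
        # Window limit: instruction i enters only after instruction i - window retires.
        if i >= window:
            start = max(start, retire[i - window])
        finish = start + 1  # unit execution latency
        if dst is not None:
            reg_ready[dst] = finish
        last_done = max(last_done, finish)
        retire.append(last_done)  # in-order retirement
    return len(trace) / last_done if last_done else 0.0
```

A chain of dependent instructions yields an IPC of 1 regardless of window size, while independent instructions are limited only by the window; sweeping `window` (for example 32, 64, 128 and 256) shows how much ILP a larger window exposes.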
ILP: Results
• Blowfish:
  – High computational intensity.
  – More loops whose iterations can run in parallel.
  – Higher ILP.
• Typeset:
  – Memory intensive.
  – Higher rate of cache misses.
  – More dependences among instructions.
  – Lower ILP.
[Figure: ILP, Typeset vs. Blowfish, for instruction window sizes of 32, 64, 128 and 256]
2. Instruction Mix
• The mix of instruction types in a benchmark characterizes the class of programs it represents.
• The instruction mix is evaluated by categorizing the executed (dynamic) instructions.
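Categorizing executed instructions amounts to counting category tags over the dynamic instruction stream and normalizing by the instruction count. A minimal sketch, assuming each instruction has already been tagged with one or more hypothetical category names (for example `"mem_read"` and `"arith"` for an add with a memory operand); because tags can overlap, the fractions need not sum to 1.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def instruction_mix(tagged_trace: Iterable[Set[str]]) -> Dict[str, float]:
    """Fraction of dynamic instructions carrying each (hypothetical) category tag."""
    counts: Counter = Counter()
    total = 0
    for tags in tagged_trace:
        total += 1
        counts.update(tags)  # each tag of this instruction counts once
    return {cat: n / total for cat, n in counts.items()} if total else {}
```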
[Figure: Instruction Mix, Typeset vs. Blowfish; categories: memory read, memory write, control flow, arithmetic, floating point, stack, shift, string, SSE, other, NOP]
3. Branch Predictability
• Mispredicted branches degrade performance.
• As computational intensity increases, branch prediction can become extremely difficult.
• Branch predictability is measured with Prediction by Partial Matching (PPM) predictors.
• Evaluated using 4 configurations (global or local branch history, combined with a shared table or separate prediction tables) and 3 branch history lengths (4, 8 and 12 bits).
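PPM predicts each branch from the longest previously seen history context and falls back to shorter contexts when the longer ones have not been seen yet. A minimal sketch of one configuration only (a single global history and one shared table, ignoring branch addresses); the function name and the default "predict taken" rule are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


def ppm_mispredict_rate(outcomes: Sequence[bool], max_hist: int) -> float:
    """Misprediction rate of a global-history, shared-table PPM predictor (sketch)."""
    # counts[(k, context)] = [not-taken count, taken count] after that k-bit history
    counts: DefaultDict[Tuple[int, tuple], List[int]] = defaultdict(lambda: [0, 0])
    hist: tuple = ()
    wrong = 0
    for taken in outcomes:
        pred = True  # assumed default when no context has been seen yet
        # Longest matching context first, falling back to shorter ones.
        for k in range(min(max_hist, len(hist)), -1, -1):
            ctx = (k, hist[len(hist) - k:])
            if ctx in counts:  # membership test does not create entries
                not_taken, taken_n = counts[ctx]
                pred = taken_n >= not_taken
                break
        wrong += pred != taken
        # Update every context order with the actual outcome.
        for k in range(min(max_hist, len(hist)) + 1):
            counts[(k, hist[len(hist) - k:])][int(taken)] += 1
        hist = (hist + (taken,))[-max_hist:] if max_hist > 0 else ()
    return wrong / len(outcomes) if outcomes else 0.0
```

For an alternating taken/not-taken branch, a 4-bit history learns the pattern after a single misprediction, while a predictor with no history mispredicts half the time.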
4. Register Traffic
• Characterized by:
  – Average number of register operands.
  – Average degree of use (how many times a register value is read after it is written).
  – Dependency distances: the number of dynamic instructions between the write and a read of a register.
• Dependency distances are grouped in powers of 2 (1, 2, 4, 8, 16, 32, 64).
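The three register-traffic metrics can be computed in a single pass over a dynamic trace. A minimal sketch, assuming the same hypothetical `(destination, sources)` encoding as above; it counts only source (input) operands and reports, for each power of 2, the fraction of dependency distances at or below that threshold.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical dynamic instruction: (destination register or None, source registers).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def register_traffic(trace: Sequence[Instr], max_pow: int = 6) -> dict:
    last_write: Dict[str, int] = {}      # register -> index of the instruction that wrote it
    reads_of_value: Dict[str, int] = {}  # register -> reads of its current value so far
    degrees: List[int] = []              # degree of use of each completed register value
    distances: List[int] = []            # write-to-read distances in dynamic instructions
    src_operands = 0
    for i, (dst, srcs) in enumerate(trace):
        src_operands += len(srcs)
        for r in srcs:  # reads happen before this instruction's own write
            if r in last_write:  # ignore values produced before the trace started
                distances.append(i - last_write[r])
                reads_of_value[r] += 1
        if dst is not None:
            if dst in reads_of_value:  # the old value is overwritten: record its degree
                degrees.append(reads_of_value[dst])
            last_write[dst] = i
            reads_of_value[dst] = 0
    degrees.extend(reads_of_value.values())  # values still live at the end
    n = len(trace)
    cdf = {
        2 ** k: (sum(d <= 2 ** k for d in distances) / len(distances) if distances else 0.0)
        for k in range(max_pow + 1)
    }
    return {
        "avg_src_operands": src_operands / n if n else 0.0,
        "avg_degree_of_use": sum(degrees) / len(degrees) if degrees else 0.0,
        "dep_distance_cdf": cdf,  # fraction of distances <= 1, 2, 4, ..., 64
    }
```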
[Figure: Register Traffic - Dependency Distance, Typeset vs. Blowfish; x-axis categories 1 to 7 correspond to distances 1, 2, 4, 8, 16, 32 and 64]
5. Working Set
• The working set sizes of the instruction and data streams are evaluated.
• They quantify the number of unique memory blocks and pages touched by the instruction and data streams.
• MICA uses a memory block size of 64 bytes and a page size of 4 KB.
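Counting unique blocks and pages reduces to integer division of each address by the block or page size. A minimal sketch using the 64-byte block and 4 KB page sizes above; the function name is hypothetical, and the instruction and data streams would each be passed in separately.

```python
from typing import Sequence, Tuple


def working_set(addresses: Sequence[int], block_size: int = 64,
                page_size: int = 4096) -> Tuple[int, int]:
    """Number of unique memory blocks and pages touched by one address stream."""
    blocks = {a // block_size for a in addresses}  # 64-byte block numbers
    pages = {a // page_size for a in addresses}    # 4 KB page numbers
    return len(blocks), len(pages)
```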
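6. Data Stream Strides
• A stride is the difference between the data memory addresses of consecutive memory accesses.
• Characterized by:
  – local load (memory read) strides, between consecutive loads executed by the same static instruction
  – global load (memory read) strides, between consecutive loads regardless of which instruction executes them
  – local store (memory write) strides
  – global store (memory write) strides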
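The four stride categories differ only in which previous access an address is compared against. A minimal sketch, assuming a hypothetical trace of `(pc, address, is_store)` tuples; it records absolute strides, which is an assumption rather than something stated in the slides.

```python
from typing import Dict, List, Sequence, Tuple


def data_strides(accesses: Sequence[Tuple[int, int, bool]]) -> Dict[str, List[int]]:
    """Local (same static instruction) and global load/store strides (sketch)."""
    strides: Dict[str, List[int]] = {
        k: [] for k in ("local_load", "global_load", "local_store", "global_store")
    }
    last_by_pc: Dict[Tuple[str, int], int] = {}  # (kind, pc) -> last address
    last_global: Dict[str, int] = {}             # kind -> last address of any instruction
    for pc, addr, is_store in accesses:
        kind = "store" if is_store else "load"
        if (kind, pc) in last_by_pc:
            strides["local_" + kind].append(abs(addr - last_by_pc[(kind, pc)]))
        if kind in last_global:
            strides["global_" + kind].append(abs(addr - last_global[kind]))
        last_by_pc[(kind, pc)] = addr
        last_global[kind] = addr
    return strides
```

Interleaving a second load instruction leaves the local strides of a sequential scan intact but breaks up its global strides.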
7. Memory Reuse Distance
• The number of unique memory locations accessed between two references to the same memory location, i.e., the distance between reuses of data.
• Buckets: [2^n, 2^(n+1)) for n = 0 to 18, plus a bucket for cold references (first accesses).
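Reuse distance can be computed with an LRU stack: on each access, the number of distinct addresses more recent than the previous access to the same address is the reuse distance. A minimal, quadratic-time sketch (real tools use faster data structures); bucket `n` holds distances in [2^n, 2^(n+1)), larger distances are capped into the last bucket, and the separate `"zero"` key for immediate reuse is an assumption not covered by the slides.

```python
from collections import Counter, OrderedDict
from typing import Hashable, Sequence


def reuse_distance_histogram(addresses: Sequence[Hashable], max_n: int = 18) -> Counter:
    """Histogram of memory reuse distances using an LRU stack (sketch)."""
    stack: "OrderedDict[Hashable, None]" = OrderedDict()  # most recently used is last
    hist: Counter = Counter()
    for a in addresses:
        if a in stack:
            keys = list(stack)
            # Distinct addresses touched since the previous access to `a`.
            d = len(keys) - 1 - keys.index(a)
            stack.move_to_end(a)
            if d == 0:
                hist["zero"] += 1
            else:
                hist[min(d.bit_length() - 1, max_n)] += 1  # d in [2^n, 2^(n+1))
        else:
            hist["cold"] += 1  # first reference to this address
            stack[a] = None
    return hist
```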
[Figure: Memory Reuse Distance, Typeset vs. Blowfish; buckets: cold reference, then [2^n, 2^(n+1)) for n = 0 to 18]
Conclusion
• Seven microarchitecture-independent characteristics were analyzed and compared.
• Blowfish shows higher ILP, instruction mix, branch predictability, register traffic, data stream strides, working set and memory reuse distance than Typeset.
• These characteristics are key to comparing emerging workloads with current ones.
• MICA is a free tool that made it possible to analyze these microarchitecture-independent characteristics rather than microarchitecture-dependent ones.
References
• K. Hoste and L. Eeckhout, "http://boegel.kejo.be/ELIS/mica/", March 2012.
• K. Hoste and L. Eeckhout, "Microarchitecture-Independent Workload Characterization", May-June 2007.
• K. Hoste and L. Eeckhout, "Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics", October 2006.
• L. Eeckhout, J. Sampson and B. Calder, "Exploiting Program Microarchitecture Independent Characteristics and Phase Behavior for Reduced Benchmark Suite Simulation", 2005 IEEE International Symposium on Workload Characterization, October 2005.
• K. Hoste, A. Phansalkar, L. Eeckhout, A. Georges, L. K. John and K. De Bosschere, "Performance Prediction Based on Inherent Program Similarity", PACT 2006, September 2006.
• M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge and R. B. Brown, "MiBench: A Free, Commercially Representative Embedded Benchmark Suite", WWC, December 2001.
• J. J. Yi, H. Vandierendonck, L. Eeckhout and D. J. Lilja, "The Exigency of Benchmark and Compiler Drift: Designing Tomorrow's Processors with Yesterday's Tools", ICS, June 2006, pp. 75-86.
• A. Moffat, "Implementing the PPM Data Compression Scheme", September 1990.
THANK YOU