Scholarly Paper MICA


Microarchitecture Independent Characterization Using MICA

Leegia Jacob

OUTLINE
• Introduction
• Experimental Setup
• Problems Encountered Using MICA
• Data Analysis and Results
• Conclusion

Introduction

• Quantifying and analyzing the microarchitecture characteristics of emerging workloads for next-generation microprocessors.

• Workloads are typically characterized by their microarchitecture-dependent behavior, measured by running benchmarks on real hardware.

• Drawback – programs with completely different inherent behavior may yield similar microarchitecture dependent behavior.

• Workloads are therefore compared using microarchitecture-independent characteristics.

• Collecting these characteristics requires instrumentation.

Experimental Setup

• Instrumentation tool: MICA
– MICA is a Pin tool that characterizes only microarchitecture-independent characteristics.
• Pin: Intel’s Pin 2.12
• Compiler: GCC C++ compiler, gcc 4.6.3
• A wide variety of benchmarks is available, so the right benchmark has to be chosen.
• Benchmarks: Typeset and Blowfish, from MiBench.

MICA = Microarchitecture – Independent Characterization of Applications

Problems Encountered Using MICA

Compatibility Issues

• MICA v0.40 was released in 2012.
• It is not compatible with the latest Pin, v2.14.
• It is compatible with Pin 2.12, although issues remained.
• The makefile, mica_init.cpp, mica_init.h, mica_all.cpp and other files had to be altered.
• A few files were missing (makefile.gnu.config, mica_memoryreusedist.cpp, mica_fullmemoryreusedist.cpp).

Other Issues

• The Pin path has to be exported every time Pin is run.
• MICA is built using the GCC C++ compiler.
• gcc version 4.4.0 or above is required (no patch is available for older versions).
• A known problem on Linux systems prevents Pin from using ptrace attach.
• As a result, Pin cannot use its default injection mode.
• The ptrace setting has to be edited as root every time.

Data Analysis and Results

1. Instruction Level Parallelism (ILP)

• Number of instructions that are able to run in parallel.

• Measured under an idealized out of order processor model.

• Limits are the instruction window size and the data dependences.

• Goal is to take advantage of ILP as much as possible.
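The idealized model above can be sketched in a few lines. This is a minimal illustration, not MICA's implementation: the trace format (a list of `(dest_regs, src_regs)` tuples), the unit instruction latency and the in-order retirement rule are all assumptions made for the example.

```python
def ilp(trace, window):
    """Estimate instructions per cycle for an idealized out-of-order model.

    trace  : list of (dest_regs, src_regs) tuples (hypothetical format)
    window : instruction window size
    Assumes every instruction takes one cycle and retires in program order.
    """
    ready = {}    # register -> cycle at which its value becomes available
    retire = []   # retire[i] = cycle at which instruction i has retired
    for i, (dests, srcs) in enumerate(trace):
        # Data dependences: wait for every source register to be produced.
        start = max((ready.get(r, 0) for r in srcs), default=0)
        # Window limit: instruction i enters only after i - window has retired.
        if i >= window:
            start = max(start, retire[i - window])
        finish = start + 1
        for r in dests:
            ready[r] = finish
        retire.append(max(finish, retire[-1] if retire else 0))
    return len(trace) / retire[-1] if retire else 0.0


independent = [((f"r{i}",), ()) for i in range(8)]
chain = [((f"r{i + 1}",), (f"r{i}",)) for i in range(8)]
print(ilp(independent, 4), ilp(independent, 8), ilp(chain, 8))
```

With independent instructions the result is bounded by the window size; with a fully dependent chain it stays at one instruction per cycle regardless of the window.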

ILP - Result
• Blowfish:
– High computational intensity.
– More loops to run in parallel.
– Higher ILP.
• Typeset:
– Memory intensive.
– Higher rate of cache misses.
– Dependencies among instructions.
– Lower ILP.

[Figure: ILP for Typeset and Blowfish. x-axis: 32 bits, 64 bits, 128 bits, 256 bits; y-axis: 0 to 160000.]

2. Instruction Mix
• The various types of instructions in a benchmark that represents a class of programs.
• The instruction mix is evaluated by categorizing the executed instructions.
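Categorizing executed instructions can be sketched as follows. The mnemonic-to-category table is a small hypothetical mapping chosen for illustration, and each instruction is assigned to exactly one category, which may differ from how MICA counts.

```python
from collections import Counter

# Hypothetical mapping from x86 mnemonics to instruction-mix categories.
CATEGORY = {
    "add": "arithmetic", "sub": "arithmetic", "imul": "arithmetic",
    "jmp": "control flow", "jne": "control flow", "call": "control flow",
    "push": "stack", "pop": "stack",
    "shl": "shift", "shr": "shift",
    "movsb": "string", "fadd": "floating point", "addps": "sse",
    "nop": "nop",
}


def instruction_mix(mnemonics):
    """Return the fraction of executed instructions in each category."""
    counts = Counter(CATEGORY.get(m, "other") for m in mnemonics)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


mix = instruction_mix(["add", "sub", "jne", "nop"])
print(mix)
```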

[Figure: Instruction Mix for Typeset and Blowfish. Categories: Memory Read, Memory Write, Control Flow, Arithmetic, Floating Point, Stack, Shift, String, SSE, Other, NOP; y-axis: 0 to 700000.]

3. Branch Predictability
• Mispredicted branches cause performance degradation.
• As computational intensity increases, branch prediction can become extremely difficult.
• Prediction by Partial Matching (PPM) is used.
• Evaluated using 4 different configurations (global/local branch history, shared/separate prediction table(s)) and 3 different branch history lengths (4, 8, 12 bits).
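A PPM-style predictor can be sketched as below. This is a simplified illustration covering only one of the four configurations (global history, one shared table per history length). The function name, the input format (a list of taken/not-taken booleans) and the tie-breaking rule are assumptions, not MICA's implementation.

```python
def ppm_mispredict_rate(outcomes, max_hist):
    """Misprediction rate of a simplified PPM predictor.

    outcomes : list of booleans (True = taken), in execution order
    max_hist : longest branch history length, in bits
    Predicts from the longest previously seen history context and falls
    back to shorter contexts; predicts "taken" when nothing matches.
    """
    # tables[k] maps a history of length k to [not_taken_count, taken_count]
    tables = [dict() for _ in range(max_hist + 1)]
    history = ()
    misses = 0
    for taken in outcomes:
        longest = min(max_hist, len(history))
        prediction = True
        for k in range(longest, -1, -1):
            counts = tables[k].get(history[len(history) - k:])
            if counts is not None:
                prediction = counts[1] >= counts[0]
                break
        if prediction != taken:
            misses += 1
        # Update every context length with the actual outcome.
        for k in range(longest + 1):
            ctx = history[len(history) - k:]
            tables[k].setdefault(ctx, [0, 0])[int(taken)] += 1
        history = (history + (taken,))[-max_hist:] if max_hist else ()
    return misses / len(outcomes) if outcomes else 0.0


alternating = [i % 2 == 0 for i in range(100)]
print(ppm_mispredict_rate(alternating, 4))
```

A strictly alternating branch is learned after a short warm-up, so its misprediction rate stays low once a history of at least one bit is available.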

4. Register Traffic
• Characterized by:
– Average number of register operands.
– Average degree of use.
– Dependency distances: the number of dynamic instructions between a write and a read of a register.
• The dependency distances are reported in powers of 2 (i.e. 1, 2, 4, 8, 16, 32, 64).
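Dependency distances can be computed by remembering the last writer of each register. The sketch below assumes a hypothetical trace of `(dest_regs, src_regs)` tuples and places each distance in the smallest power-of-two bound that covers it. The extra ">64" bucket is an assumption added so that no read is dropped.

```python
from collections import Counter

BOUNDS = [1, 2, 4, 8, 16, 32, 64]


def dependency_distances(trace):
    """Histogram of register dependency distances.

    trace : list of (dest_regs, src_regs) tuples (hypothetical format)
    The distance is the number of dynamic instructions between the write
    of a register and a later read of it.
    """
    last_write = {}   # register -> index of the instruction that last wrote it
    hist = Counter()
    for i, (dests, srcs) in enumerate(trace):
        for r in srcs:
            if r in last_write:
                d = i - last_write[r]
                bucket = next((b for b in BOUNDS if d <= b), ">64")
                hist[bucket] += 1
        for r in dests:
            last_write[r] = i
    return hist


trace = [(("a",), ()), (("b",), ("a",)), ((), ()), (("c",), ("a", "b"))]
hist = dependency_distances(trace)
print(dict(hist))
```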

[Figure: Register Traffic - Dependency Distance for Typeset and Blowfish. x-axis: 1 to 7, in powers of 2 (1 = 1, 2 = 2, 3 = 4, 4 = 8, 5 = 16, 6 = 32, 7 = 64); y-axis: 0 to 900000.]

5. Working Set
• The working set size of the instruction and data streams is evaluated.
• It quantifies the number of unique memory blocks and pages touched by both the instruction and data streams.
• MICA uses memory blocks of 64 bytes and pages of 4 KB.
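Counting unique blocks and pages reduces to integer division of each address by the block or page size. A minimal sketch, assuming the trace is simply a list of byte addresses:

```python
def working_set(addresses, block_bytes=64, page_bytes=4096):
    """Return (unique 64-byte blocks, unique 4 KB pages) touched by a stream."""
    blocks = {a // block_bytes for a in addresses}
    pages = {a // page_bytes for a in addresses}
    return len(blocks), len(pages)


result = working_set([0, 8, 63, 64, 4096, 8191])
print(result)
```

The same function can be applied separately to the instruction address stream and the data address stream.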

6. Data Stream Strides
• Stride: the distance between consecutive memory accesses.
• Measured as the difference in data memory addresses.
• Characterized by:
– local load (memory read) strides
– global load (memory read) strides
– local store (memory write) strides
– global store (memory write) strides
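The difference between local and global strides can be illustrated as follows. The sketch assumes that a local stride is taken between consecutive accesses by the same static instruction and a global stride between consecutive accesses by any instruction. The `(pc, address)` trace format is hypothetical.

```python
def strides(accesses):
    """Return (local_strides, global_strides) for one access type.

    accesses : list of (pc, address) pairs, all loads or all stores
    """
    last_global = None   # address of the previous access by any instruction
    last_local = {}      # pc -> address of that instruction's previous access
    local_strides, global_strides = [], []
    for pc, addr in accesses:
        if last_global is not None:
            global_strides.append(abs(addr - last_global))
        if pc in last_local:
            local_strides.append(abs(addr - last_local[pc]))
        last_global = addr
        last_local[pc] = addr
    return local_strides, global_strides


local, glob = strides([(1, 100), (2, 1000), (1, 108), (2, 1008)])
print(local, glob)
```

Two instructions that each walk an array with a small step show short local strides even when the interleaved global strides are large.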

7. Memory Reuse Distance
• The number of unique memory locations accessed between two references to the same memory location.
• Measures the distance between reuses of data.
• Buckets = (2^n, 2^(n+1)), n = 0 to 18.
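Reuse distance can be computed with an LRU stack, as sketched below. This is a simple quadratic-time illustration, not MICA's implementation. The bucket edges are an assumption: bucket n holds distances d with 2^n <= d < 2^(n+1), distance 0 is placed in bucket 0, and larger distances are clamped to the last bucket.

```python
def reuse_distance_histogram(addresses, num_buckets=19):
    """Return (cold_references, buckets) for a stream of memory locations."""
    stack = []                  # LRU stack, most recently used location last
    cold = 0                    # first-time references
    buckets = [0] * num_buckets
    for a in addresses:
        if a in stack:
            pos = stack.index(a)
            # Unique locations touched since the previous reference to a.
            d = len(stack) - 1 - pos
            stack.pop(pos)
            n = min(max(d.bit_length() - 1, 0), num_buckets - 1)
            buckets[n] += 1
        else:
            cold += 1
        stack.append(a)
    return cold, buckets


cold, buckets = reuse_distance_histogram([1, 2, 3, 1, 1, 2])
print(cold, buckets[:3])
```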

[Figure: Memory Reuse Distance for Typeset and Blowfish. x-axis: Cold Reference, then buckets (2^0, 2^1) through (2^18, 2^19); y-axis: 0 to 120000.]

Conclusion

• Seven microarchitecture-independent characteristics were analyzed and compared.

• Blowfish has higher ILP, instruction mix, branch predictions, register traffic, data stream strides, working set and memory reuse distance than Typeset.

• These characteristics are key to comparing emerging workloads with current ones.

• MICA is a free tool that helped analyze these microarchitecture-independent characteristics rather than microarchitecture-dependent ones.


THANK YOU
