Simulation of Decode Filter Cache using SimpleScalar simulator

Presented by Fei Hong

Motivation & Goals

• Instruction fetches and decodes are the major on-chip power consumers

• Optimize the power consumption by reducing instruction fetches and decodes

• Simulate the decode filter cache (DFC) architecture using SimpleScalar

• Test the performance of the DFC

Prediction Mechanism

Each sector in the DFC has the following fields:

(tag, sector_valid, next_address)

If A is not equal to C, a different control path will be taken:

tag(A) != tag(C)    (1)

A and B are consecutively accessed. If they belong to a small loop:

tag(A) == tag(B)    (2)

Based on (1) and (2), the prediction for the next fetch is:

tag(C) == tag(B)    (3)

[Figure: DFC sectors with columns Tag, Valid bits, Next Address, and Data. Labels: X: A, Y: B, X: C.]
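The sector fields and the tag comparison in (3) can be sketched as follows. This is only an illustration under assumptions: the names `Sector`, `tag_of`, and `predict_same_tag`, and the bit widths, are hypothetical and are not taken from the SimpleScalar sources or from the simulated implementation. The sketch reads (3) as a check between the tag of a new fetch address C and the tag of the recorded next address B.

```python
from dataclasses import dataclass

# Hypothetical geometry, chosen only for illustration:
# 16 bytes of instruction address space per sector, 32 sectors.
OFFSET_BITS = 4
INDEX_BITS = 5


def tag_of(addr: int) -> int:
    """Return the address bits above the sector offset and sector index."""
    return addr >> (OFFSET_BITS + INDEX_BITS)


@dataclass
class Sector:
    """The three per-sector fields listed on the slide."""
    tag: int = 0
    sector_valid: bool = False
    next_address: int = 0  # address B fetched right after this sector's address A


def predict_same_tag(sector: Sector, fetch_addr: int) -> bool:
    """Assumed reading of (3): tag(C) == tag(B) for a valid sector."""
    return sector.sector_valid and tag_of(fetch_addr) == tag_of(sector.next_address)


# Sector holding A = 0x1000, followed by B = 0x1010 inside the same small loop.
x = Sector(tag=tag_of(0x1000), sector_valid=True, next_address=0x1010)
print(predict_same_tag(x, 0x1020))  # C shares the tag of B
print(predict_same_tag(x, 0x2000))  # C has a different tag
```

An invalid sector never yields a positive prediction in this sketch, because `sector_valid` is checked first.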

Working Process

[Figure: working-process diagram. Labels: fetch address, NFPT, last_table_entry, next_fetch_src, predict, update, "Fetch from DFC or I-cache", steps 1 to 3.]

The Platform

• Host computer: ACPI x86-based PC

• Host computer operating system: Microsoft Windows Vista Ultimate

• Virtual Machine: VMware Workstation version 6.03

• Linux operating system: Fedora Core 6

• Simulator: SimpleScalar version 3.0

Work done so far

• Set up the platform

• Read the source code of SimpleScalar

• Applied my DFC structure and working process to SimpleScalar

• Found benchmarks and compiled them on the platform

• Ran simulations using the given memory hierarchy parameters

MiBench

• dijkstra: constructs a large graph in an adjacency matrix representation and then calculates the shortest path between every pair of nodes using repeated applications of Dijkstra’s algorithm.

• stringsearch: searches for given words in phrases using a case-insensitive comparison algorithm.

• rijndael encrypt/decrypt: encrypts and decrypts with Rijndael, the algorithm selected as the National Institute of Standards and Technology's Advanced Encryption Standard (AES).

• CRC32: performs a 32-bit Cyclic Redundancy Check (CRC) on a file. CRCs are often used to detect errors in data transmission.

Memory hierarchy parameters

Parameter     Value
Instr. size   4B
DFC           direct-mapped, 32 sectors, 4 decoded instr. per sector, 8B per decoded instr.
L1 I-cache    16KB, 2-way, 32B line, 1-cycle hit latency
L1 D-cache    8KB, 2-way, 32B line, 1-cycle hit latency
Memory        30-cycle latency
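The DFC geometry implied by these parameters can be worked out with a few lines of arithmetic. The sketch below assumes a conventional direct-mapped address split (low bits select the instruction within a sector, the next bits select the sector, the rest form the tag); the slides do not state how the DFC is indexed, and the function name `split_address` and the example address are hypothetical.

```python
INSTR_BYTES = 4          # instruction size from the table
SECTORS = 32             # DFC sectors
INSTRS_PER_SECTOR = 4    # decoded instructions per sector
DECODED_BYTES = 8        # bytes per decoded instruction

# Storage for decoded instructions only (tags and valid bits excluded).
dfc_data_bytes = SECTORS * INSTRS_PER_SECTOR * DECODED_BYTES

# Instruction address range covered by one sector.
sector_span = INSTRS_PER_SECTOR * INSTR_BYTES

# Both values are powers of two, so bit_length() - 1 gives log2.
offset_bits = sector_span.bit_length() - 1
index_bits = SECTORS.bit_length() - 1


def split_address(addr):
    """Split an instruction address into (tag, sector index, byte offset)."""
    offset = addr & (sector_span - 1)
    index = (addr >> offset_bits) & (SECTORS - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(dfc_data_bytes, sector_span, offset_bits, index_bits)
print(split_address(0x400128))
```

Under these assumptions the DFC holds 1024 bytes of decoded instructions, one sixteenth of the 16KB L1 I-cache, and covers 512 bytes of instruction address space at a time.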

Simulation results

[Figure: bar chart of % reduction in instruction fetches and decodes for dijkstra, stringsearch, rijndael, and CRC32; vertical axis from 0 to 100.]

Simulation results

[Figure: bar chart of prediction hit rate (%) for dijkstra, stringsearch, rijndael, and CRC32; vertical axis from 97 to 100.]

Simulation results

Counter          dijkstra     stringsearch   rijndael     CRC32
sim_num_insn     255620304    4437612        391487315    533385529
il1.accesses     43508918     1605417        236160209    972328
il1.hits         43399500     1568976        228694324    971600
il1.misses       109418       36441          7465885      728
il1.miss_rate    0.0025       0.0227         0.0316       0.0007
dfc.accesses     215740165    3269067        232531480    532674172
dfc.hits         212111386    2832195        155327106    532413201
dfc.misses       3628779      436872         77204374     260971
dfc.miss_rate    0.0168       0.1336         0.3320       0.0005
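The counters are internally consistent: for all four benchmarks, il1.accesses equals sim_num_insn minus dfc.hits, so each instruction is accounted for either by a DFC hit or by an L1 I-cache access. The sketch below checks that identity and computes the share of instructions that never reach the I-cache. Treating that share as the fetch-and-decode reduction is an assumption, since the slides do not define the plotted metric; the names `COUNTERS` and `fetch_reduction` are illustrative.

```python
# Counters copied from the results table (dots in names replaced by underscores).
COUNTERS = {
    "dijkstra":     {"sim_num_insn": 255620304, "il1_accesses": 43508918,  "dfc_hits": 212111386},
    "stringsearch": {"sim_num_insn": 4437612,   "il1_accesses": 1605417,   "dfc_hits": 2832195},
    "rijndael":     {"sim_num_insn": 391487315, "il1_accesses": 236160209, "dfc_hits": 155327106},
    "CRC32":        {"sim_num_insn": 533385529, "il1_accesses": 972328,    "dfc_hits": 532413201},
}


def fetch_reduction(c):
    """Share of instructions that did not access the L1 I-cache (assumed metric)."""
    return 1.0 - c["il1_accesses"] / c["sim_num_insn"]


for name, c in COUNTERS.items():
    # Identity observed in the table: I-cache accesses = instructions - DFC hits.
    assert c["il1_accesses"] == c["sim_num_insn"] - c["dfc_hits"]
    print(f"{name:13s} {100 * fetch_reduction(c):6.2f}%")
```

Under this reading, the reduction is roughly 83% for dijkstra, 64% for stringsearch, 40% for rijndael, and 99.8% for CRC32.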

Conclusion

• The DFC stores decoded instructions and can be very small and energy-efficient.

• A hit in the DFC eliminates both the access to the much larger instruction cache and the entire decoding step.

• The simulation results show that most instruction fetches and decodes can be eliminated by using the DFC, which makes it an efficient way to reduce the power consumption of embedded processors.

Thank you!