Upload
phelan-bonner
View
40
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Exploiting Value Locality in Physical Register Files. Saisanthosh Balakrishnan Guri Sohi University of Wisconsin-Madison. 36 th Annual International Symposium on Microarchitecture. Motivation. More in-flight instructions (ILP, SMT). Need more physical registers - PowerPoint PPT Presentation
Citation preview
Exploiting Value Locality inPhysical Register Files
Saisanthosh Balakrishnan Guri SohiUniversity of Wisconsin-Madison
36th Annual International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files 2
Motivation
Need more physical registers
Increase in area, access time, power
More in-flight instructions (ILP, SMT)
Access locality: Hierarchical and register caches
Communication locality: Banked and clustered design
Late allocation: Virtual-physical registers
Value locality: Physical register reuse
Optimized Design Optimized Storage
Reduce storage requirements:
1. Exploit register value locality
2. Simplify for common values
Exploiting Value Locality in Physical Register Files 3
Outline
The property: Value locality in register file
Optimized storage schemes
Results
Conclusion
Exploiting Value Locality in Physical Register Files 4
Value Locality
Locality of the results produced by all dynamic instructions
1. Identify the most common results
2. Locality in the results produced (register writes)
3. Duplication in register file
Exploiting Value Locality in Physical Register Files 5
Value Locality
bzip2 crafty gap gcc gzip mcf ammp art 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 4831843632 7 2 4 4 5368778784 5368710000 2 5368712376 3 81 4831839248 32 5 2560 4.59187e+18 62914560 5369648560 5 3 2 4831863760 1884672376 4.58549e+18 65536 8 5369767936 52 3 10 3584 3 5368712432 2 8 -1 5368758224 32 6656 5370448344 32 5369777344 3 59 16 2 5632 32 2 6 4 7 -1 49 48 7 3 5368862128 16 5 8 -1 14848 10
10 most common values in some SPEC CPU2000 benchmarks
0 and 1 are most common results on all benchmarks
Exploiting Value Locality in Physical Register Files 6
Value Locality
Consider valid registers only
Many registers could hold the same value
Duplication in register file
0%
10%
20%
30%
40%
50%
60%
80
128
160 80
128
160
SpecInt Average SpecFP Average
0 1 Other
Locality in all results produced (register writes)
Percentage of values written to registersalready present in the physical register file (80, 128, 160
regs.)
Exploiting Value Locality in Physical Register Files 7
Value Locality
0%
5%
10%
15%
20%
25%
30%
80
12
8
16
0
80
12
8
16
0
SpecInt Average SpecFP Average
0 1 Other
Duplication of values in physical register file
# of duplicate registers = # of valid physical registers – # of unique values in the
register file
Value being held in n registers (n-1) duplicate registers
Number of duplicate registers depends on register lifetime
60% to 85% of duplication because of 0 and 1
Exploiting Value Locality in Physical Register Files 8
Observations
1. Many physical registers hold same value
2. 0’s and 1’s are most common results
Reuse physical registersInstructions with same result mapped to one physical registerFirst proposed by [Jourdan et al. MICRO-31]Reduced storage requirements
Optimized storage for common valuesRegisterless StorageExtended Registers and Tags Simplified micro-architectural changes
Exploiting Value Locality in Physical Register Files 9
Outline
Value locality in register file
Exploiting it: Optimized storage schemesPhysical Register ReuseRegisterless StorageExtended Registers and Tags
Results
Conclusion
Exploiting Value Locality in Physical Register Files 10
Physical Register Reuse
1. Conventional renaming of destination register
2. On execution, detect if result present in register file
3. Free assigned physical register
4. Remap instruction’s destination register
5. Handle dependent instructions
Many instructions with same result map to one physical register
Register file with unique values More free registers
Exploiting Value Locality in Physical Register Files 11
Physical Register Reuse
Fetch Decode Rename Queue Schedule Reg. Read Writeback Commit
ValueCache
Value cache – to detect duplicate results
Maps physical register tags to values CAM structure, returns tag for a value Actions to invalidate / add entries
Execute
Exploiting Value Locality in Physical Register Files 12
Physical Register Reuse
Fetch Decode Rename Queue Schedule Writeback Commit
ValueCache
Reference counter for every register
Increment when mapped Decrement when freed Return to free list when 0
Reference counting in Register File
Refe
ren
ce c
ou
nte
rs
Physical Register
File
ExecuteReg. Read
Exploiting Value Locality in Physical Register Files 13
Physical Register Reuse
Fetch Decode Queue Schedule Reg. Read Execute Writeback Commit
ValueCache
Refe
ren
ce c
ou
nte
rs
Physical Register
File
Update rename map. Remap instruction’s destination register
Handling dependent instructions
Rename
Exploiting Value Locality in Physical Register Files 14
Physical Register Reuse
Fetch Decode Rename Schedule Execute Writeback Commit
ValueCache
Refe
ren
ce c
ou
nte
rs
Physical Register
File
AliasTable
Queue
Re-broadcast new destination register tag
Re-broadcast <invalid_src_tag>. Lookup alias table on register read
Handling dependent instructions
Reg. Read
Exploiting Value Locality in Physical Register Files 15
Physical Register Reuse
Reduced register requirements
Avoids register write of duplicate values
Non-trivial micro-architectural changes Value Cache lookup, Alias Table indirection, Reference countsRecovering from exceptions
Remapping of destination register requires re-broadcast
Exploiting Value Locality in Physical Register Files 16
Outline
Value locality in register file
Optimized storage schemesPhysical register reuseRegisterless Storage (0’s & 1’s)Extended Registers and Tags (0’s & 1’s)
Results
Conclusion
Exploiting Value Locality in Physical Register Files 17
Registerless Storage
Exploit common value locality – state bits for 0 and 1
Register file without 0s and 1s More free registers
1. Conventional renaming of destination register
2. On execution, detect if result is 0 or 1
3. Free assigned physical register
4. Remap instruction’s destination register to reserved tags
5. Communicate value directly to dependent instructions
Exploiting Value Locality in Physical Register Files 18
Registerless Storage
Simplified micro-architectural changesNo Value Cache, Alias Table, Reference counts
No registers for 0 and 1: Reduced register requirements
Eliminates register reads and writes of 0 and 1
Optimize storage for common values. But, avoid remapping destination register tag
Remapping of destination register requires re-broadcast
Exploiting Value Locality in Physical Register Files 19
Extended Registers and Tags
Associate physical register with 2-bit extensionV: Valid and D: data = {0, 1}
Most 0’s and 1’s in register extensions
Physical register V D
Rename: Assign physical register with its extension (if available) Execute: If result is 0 or 1
Use extension, if available. Free physical register.Physical register can be assigned to some other instruction
Exploiting Value Locality in Physical Register Files 20
Extended Register and Tags
Extended tagging scheme eliminates remapping
Increase tag (N-bits) namespace by 1-bit (MSB)
Unmodified free list
To assign a tag:1. Get tag from free list2. Get MSB from the corresponding extension’s valid bit
MSB = 0 {register, extension}MSB = 1 {register}
Tag management
Exploiting Value Locality in Physical Register Files 21
Extended Registers and Tags
Physical register V DMSB = 1
Physical register 1 DMSB = 0Result = {0, 1}
Physical register V DMSB = 1
Register read operation
Physical register 1 DMSB = 0
Physical register 0 DMSB = 0
Register write operation
Physical register 0 DMSB = 0Result != {0, 1}
Never used
Exploiting Value Locality in Physical Register Files 22
Extended Registers and Tags
Simplified micro-architectural changes
Extended registers hold 0’s and 1’s
Better design
Some common values use physical registers
Exploiting Value Locality in Physical Register Files 23
Outline
Value locality
Optimized storage schemes
ResultsPerformance – more in-flight instructionsBenefit – smaller register file
Reduced register traffic
Conclusion
Exploiting Value Locality in Physical Register Files 24
Performance
0%
5%
10%
15%
20%
bzi
p2
craf
ty
eon
gap gcc
gzi
p
mcf
par
ser
per
l
two
lf
vort
ex vpr
amm
p
app
lu
apsi ar
t
equ
ake
face
rec
fma3
d
gal
gel
luca
s
mes
a
mg
rid
sixt
rack
swim
wu
pwis
e
int-
avg
fp-a
vg
Register Reuse (0-c Alias Table)
Relative performance improvement with increased instruction window
Sp
eed
up
Exploiting Value Locality in Physical Register Files 25
Performance
-15%
-10%
-5%
0%
5%
10%
15%
20%
bzi
p2
cra
fty
eo
n
gap
gcc
gzi
p
mc
f
pars
er
perl
two
lf
vo
rtex
vp
r
am
mp
ap
plu
ap
si
art
eq
ua
ke
fac
ere
c
fma3
d
galg
el
luc
as
me
sa
mg
rid
six
trac
k
sw
im
wu
pw
ise
int-
avg
fp-a
vg
Register Reuse (1-c Alias Table) Register Reuse (0-c Alias Table)
Relative performance improvement with increased instruction window
Sp
eed
up
Exploiting Value Locality in Physical Register Files 26
Performance
-15%
-10%
-5%
0%
5%
10%
15%
20%
bzi
p2
cra
fty
eo
n
ga
p
gc
c
gzi
p
mc
f
pa
rse
r
pe
rl
two
lf
vo
rte
x
vp
r
am
mp
ap
plu
ap
si
art
eq
ua
ke
fac
ere
c
fma
3d
ga
lge
l
luc
as
me
sa
mg
rid
six
tra
ck
sw
im
wu
pw
ise
int-
av
g
fp-a
vg
Register Reuse (1-c Alias Table) Registerless StorageRegister Reuse (0-c Alias Table)
Relative performance improvement with increased instruction window
Sp
eed
up
Exploiting Value Locality in Physical Register Files 27
Performance
-15%
-10%
-5%
0%
5%
10%
15%
20%
bzi
p2
cra
fty
eo
n
ga
p
gc
c
gzi
p
mc
f
pa
rse
r
pe
rl
two
lf
vo
rte
x
vp
r
am
mp
ap
plu
ap
si
art
eq
ua
ke
fac
ere
c
fma
3d
ga
lge
l
luc
as
me
sa
mg
rid
six
tra
ck
sw
im
wu
pw
ise
int-
av
g
fp-a
vg
Register Reuse (1-c Alias Table) Extended Regs & TagsRegisterless Storage Register Reuse (0-c Alias Table)
Relative performance improvement with increased instruction window
Sp
eed
up
Exploiting Value Locality in Physical Register Files 28
Register Traffic
0%
4%
8%
12%
16%
bzip
2
cra
fty
eo
n
ga
p
gc
c
gzip
mc
f
pa
rse
r
pe
rl
two
lf
vo
rte
x
vp
r
am
mp
ap
plu
ap
si
art
eq
ua
ke
fac
ere
c
fma
3d
ga
lge
l
luc
as
me
sa
mg
rid
six
tra
ck
sw
im
wu
pw
ise
int-
av
g
fp-a
vg
0 1
Register read reduction with Registerless storage
0%
10%
20%
30%
40%
bzip
2
craf
ty
eon
gap
gcc
gzip
mcf
pars
er
perl
twol
f
vort
ex vpr
amm
p
appl
u
apsi art
equa
ke
face
rec
fma3
d
galg
el
luca
s
mes
a
mgr
id
sixt
rack
swim
wup
wis
e
int-
avg
fp-a
vg
Register write reduction with Registerless storage
Exploiting Value Locality in Physical Register Files 29
Conclusions
Value localityObserve duplication in physical register fileSignificant common value locality
Optimization schemesPhysical Register Reuse0’s and 1’s: Registerless Storage and Extended Registers and Tags
BenefitsReduced register requirementsReduced register trafficPower savings, better design
Questions
Exploiting Value Locality in Physical Register Files 31
Comparison
Value cache Identify {0, 1} Identify {0, 1}
Reuse register holding the same value. Free reg.
Free register Write 0 or 1 to extension, if available. Free register
Update rename map. Alias table or Re-broadcast
Update rename map. Re-broadcast {0, 1}
Recover ref. counts, aliastable, value cache
Reg. File with unique values
Reg. File without 0, 1 0, 1 in extensions
Exploit
Detect
Handle dep instns
Handle exception
Outcome
Physical Reg. Reuse Reg.less storage ERT
Exploiting Value Locality in Physical Register Files 32
Simulation Methodology
Alpha ISA. SimpleScalar based.Nops removed at the front-endRegister r31 not included in value locality measurements
Detailed out-of-order simulator4-wide machine, 14-stage pipeline, 1-cycle register file accessBase case: 128 physical registers, 256 entry instruction windowInstruction and data caches
64KB, 2-way set associative, 64-byte line size, 3-cycle access L2 is 2MB unified with 128-byte line size, 6-cycle access
BenchmarksSPEC CPU2000 suite (12 int and 14 fp benchmarks)Reference inputs run to completion
Exploiting Value Locality in Physical Register Files 33
Performance
-15%
-10%
-5%
0%
5%
10%
15%
20%
bzi
p2
craf
ty
eon
gap gcc
gzi
p
mcf
par
ser
per
l
two
lf
vort
ex vpr
amm
p
app
lu
apsi art
equ
ake
face
rec
fma3
d
gal
gel
luca
s
mes
a
mg
rid
sixt
rack
swim
wu
pw
ise
int-
avg
fp-a
vg
Register Reuse (1-c Alias Table)
Relative performance improvement with increased instruction window