Upload
martha-perkins
View
216
Download
1
Tags:
Embed Size (px)
Citation preview
Transparent Pointer CompressionTransparent Pointer Compressionfor Linked Data Structuresfor Linked Data Structures
June 12, 2005June 12, 2005
MSP 2005MSP 2005
http://llvm.cs.uiuc.edu/http://llvm.cs.uiuc.edu/
Chris LattnerChris [email protected]@cs.uiuc.edu
Vikram AdveVikram [email protected]@cs.uiuc.edu
Chris Lattner
Growth of 64-bit computingGrowth of 64-bit computing
64-bit architectures are increasingly common:64-bit architectures are increasingly common: New architectures and chips (G5, IA64, X86-64, …)New architectures and chips (G5, IA64, X86-64, …) High-end systems have existed for many years nowHigh-end systems have existed for many years now
64-bit address space used for many purposes:64-bit address space used for many purposes: Address space randomization (security)Address space randomization (security) Memory mapping large files (databases, etc)Memory mapping large files (databases, etc) Single address space OS’sSingle address space OS’s Many 64-bit systems have < 4GB of phys memoryMany 64-bit systems have < 4GB of phys memory
64-bits is still useful for its 64-bits is still useful for its virtual address spacevirtual address space
Chris Lattner
Cost of a 64-bit virtual address spaceCost of a 64-bit virtual address space
Pointers must be 64 bits (8 bytes) instead of 32 bits:Pointers must be 64 bits (8 bytes) instead of 32 bits: Significant impact for pointer-intensive programs!Significant impact for pointer-intensive programs!
Pointer intensive programs suffer from:Pointer intensive programs suffer from: Reduced effective L1/L2/TLB cache sizesReduced effective L1/L2/TLB cache sizes Reduced effective memory bandwidthReduced effective memory bandwidth Increased alignment requirements, etcIncreased alignment requirements, etc
Pointer intensive programs are increasingly common:Pointer intensive programs are increasingly common: Recursive data structures (our focus)Recursive data structures (our focus) Object oriented programsObject oriented programs
BIGGER POINTERSBIGGER POINTERS
Chris Lattner
Previously Published ApproachesPreviously Published Approaches
Simplest approaches: Use 32-bit addressingSimplest approaches: Use 32-bit addressing Compile for 32-bit pointer size “-m32”Compile for 32-bit pointer size “-m32” Force program image into 32-bits [Adl-Tabatabai’04]Force program image into 32-bits [Adl-Tabatabai’04] Loses advantage of 64-bit address spaces!Loses advantage of 64-bit address spaces!
Other approaches: Exotic hardware supportOther approaches: Exotic hardware support Compress pairs of values, speculating that pointer Compress pairs of values, speculating that pointer
offset is small [Zhang’02]offset is small [Zhang’02] Compress arrays of related pointers [Takagi’03]Compress arrays of related pointers [Takagi’03] Requires significant changes to cache hierarchyRequires significant changes to cache hierarchy
No previous fully-automatic compiler No previous fully-automatic compiler technique to shrink pointers in RDS’stechnique to shrink pointers in RDS’s
Chris Lattner
Our Approach (1/2)Our Approach (1/2)
1.1. Use Automatic Pool Allocation [PLDI’05] to Use Automatic Pool Allocation [PLDI’05] to partition heap into memory pools:partition heap into memory pools: Infers and captures pool homogeneity informationInfers and captures pool homogeneity information
Original heap Original heap layoutlayout
Layout after Layout after pool allocationpool allocation
Chris Lattner
Our Approach (2/2)Our Approach (2/2)
2.2. Replace pointers with 64-bit integer offsets Replace pointers with 64-bit integer offsets from the start of the poolfrom the start of the pool
Change *Ptr into *(PoolBase+Ptr)Change *Ptr into *(PoolBase+Ptr)
3.3. Shrink 64-bit integers to 32-bit integersShrink 64-bit integers to 32-bit integers Allows each pool to be up to 4GB in sizeAllows each pool to be up to 4GB in size
Layout after pointer Layout after pointer compressioncompression
Chris Lattner
Talk OutlineTalk Outline
Introduction & MotivationIntroduction & Motivation Automatic Pool Allocation BackgroundAutomatic Pool Allocation Background Pointer Compression TransformationPointer Compression Transformation Experimental ResultsExperimental Results ConclusionConclusion
Chris Lattner
1.1. Compute points-to graph:Compute points-to graph: Ensure each pointer has one targetEnsure each pointer has one target
““unification-based” approachunification-based” approach
2.2. Infer pool lifetimes:Infer pool lifetimes: Uses escape analysisUses escape analysis
3.3. Rewrite program:Rewrite program: malloc malloc poolalloc, free poolalloc, free poolfree poolfree Insert calls to poolinit/pooldestroyInsert calls to poolinit/pooldestroy Pass pool descriptors to functionsPass pool descriptors to functions
Automatic Pool AllocationAutomatic Pool Allocation
Pool 1Pool 1
Pool 2Pool 2
Pool 3Pool 3
Pool 4Pool 4
For more info: see MSP paper or talk at PLDI tomorrowFor more info: see MSP paper or talk at PLDI tomorrow
L1
List
L2
List
Pool 1Pool 1 Pool 2Pool 2
Chris Lattner
A Simple Pointer-intensive ExampleA Simple Pointer-intensive Example
A list of pointers to doublesA list of pointers to doubles
List *L = 0;List *L = 0;
for (…) {for (…) {
List *N = List *N = mallocmalloc(List);(List);
N->Next = L;N->Next = L;
N->Data = N->Data = mallocmalloc(double);(double);
L = N;L = N;
}}
L
double
List List
List
double
L
double
List
Points-to graphPoints-to graph
summarysummary
List *L = 0;List *L = 0;
for (…) {for (…) {
List *N = List *N = poolallocpoolalloc((PD1PD1, List);, List);
N->Next = L;N->Next = L;
N->Data = N->Data = poolallocpoolalloc((PD2PD2,double);,double);
L = N;L = N;
}}
Pool allocatePool allocate
PD1
PD2
Chris Lattner
Effect of Automatic Pool Allocation 1/2Effect of Automatic Pool Allocation 1/2
Heap is partitioned into separate poolsHeap is partitioned into separate pools Each individual pool is smaller than total heapEach individual pool is smaller than total heap
L
double
List
List *L = 0;List *L = 0;
for (…) {for (…) {
List *N = List *N = mallocmalloc(List);(List);
N->Next = L;N->Next = L;
N->Data = N->Data = mallocmalloc(double);(double);
L = N;L = N;
}}
List *L = 0;List *L = 0;
for (…) {for (…) {
List *N = List *N = poolallocpoolalloc((PD1PD1, List);, List);
N->Next = L;N->Next = L;
N->Data = N->Data = poolallocpoolalloc((PD2PD2,double);,double);
L = N;L = N;
}}
PD1
PD2 …
…
Chris Lattner
Effect of Automatic Pool Allocation 2/2Effect of Automatic Pool Allocation 2/2
Each pool has a descriptor associated with it:Each pool has a descriptor associated with it: Passed into poolalloc/poolfreePassed into poolalloc/poolfree
We know which pool each pointer points into:We know which pool each pointer points into: Given the above, we also have the pool descriptorGiven the above, we also have the pool descriptor e.g. “e.g. “NN”, “”, “LL” ” PD1 and PD1 and N->DataN->Data PD2 PD2
List *L = 0;List *L = 0;
for (…) {for (…) {
List *N = malloc(List);List *N = malloc(List);
N->Next = L;N->Next = L;
N->Data = malloc(double);N->Data = malloc(double);
L = N;L = N;
}}
List *List *LL = 0; = 0;
for (…) {for (…) {
List *List *NN = poolalloc( = poolalloc(PD1PD1, List);, List);
N->NextN->Next = = LL;;
N->DataN->Data = poolalloc( = poolalloc(PD2PD2,double);,double);
LL = = NN;;
}}
Chris Lattner
Talk OutlineTalk Outline
Introduction & MotivationIntroduction & Motivation Automatic Pool Allocation BackgroundAutomatic Pool Allocation Background Pointer Compression TransformationPointer Compression Transformation Experimental ResultsExperimental Results ConclusionConclusion
Chris Lattner
Index Conversion of a PoolIndex Conversion of a Pool
Force pool memory to be contiguous:Force pool memory to be contiguous: Normal PoolAlloc runtime allocates memory in chunksNormal PoolAlloc runtime allocates memory in chunks Two implementation strategies for this (see paper)Two implementation strategies for this (see paper)
Change pointers into the pool to integer Change pointers into the pool to integer offsets/indexes from pool base:offsets/indexes from pool base: Replace “Replace “*P*P” with “” with “*(PoolBase + P)*(PoolBase + P)””
A pool can be index converted if A pool can be index converted if pointers into it only point to heap pointers into it only point to heap memory (no stack or global mem)memory (no stack or global mem)
Chris Lattner
Index Compression of a PointerIndex Compression of a Pointer
Shrink indexes in type-homogenous poolsShrink indexes in type-homogenous pools Shrink from 64-bits to 32-bitsShrink from 64-bits to 32-bits
Replace structure definition & field accesses Replace structure definition & field accesses Requires accurate type-info and type-safe accessesRequires accurate type-info and type-safe accesses
struct List {struct List {
struct List *Next;struct List *Next;
int Data;int Data;
};};
struct List {struct List {
int64 Next;int64 Next;
int Data;int Data;
};};
struct List {struct List {
int32 Next;int32 Next;
int Data;int Data;
};};
List *L = malloc(16);List *L = malloc(16); L = malloc(16);L = malloc(16); L = malloc(8);L = malloc(8);
index conversionindex conversion index compressionindex compression
Chris Lattner
IndexIndex ConversionConversion Example 1 Example 1
L
double
List
Two pointers are Two pointers are compressiblecompressible
Previous ExamplePrevious Example
Index conversion changes Index conversion changes pointer dereferences, but pointer dereferences, but
not memory layoutnot memory layout
64-bit 64-bit integer integer indexesindexes
L
double
ListIndex Index ConvertConvert
PoolsPools
After Index ConversionAfter Index Conversion
Chris Lattner
IndexIndex CompressionCompression Example 1 Example 1Example afterExample after
index conversionindex conversion
64-bit 64-bit integer integer indicesindices
L
double
List
Compress pointers, Compress pointers, change accesses to change accesses to
and size of and size of structurestructure
L
double
List
32-bit 32-bit integer integer indicesindices
‘‘Pointer’ Pointer’ registersregisters remain 64-bitsremain 64-bits
Compress both indexes Compress both indexes from 64 to 32-bit intsfrom 64 to 32-bit ints
Chris Lattner
Impact of Type-Homogeneity/safety Impact of Type-Homogeneity/safety
Compression requires rewriting structures:Compression requires rewriting structures: e.g. malloc(32) e.g. malloc(32) malloc(16) malloc(16) Rewriting depends on type-safe memory accessesRewriting depends on type-safe memory accesses
We can’t know how to rewrite unions and other casesWe can’t know how to rewrite unions and other cases Must verify that memory accesses are ‘type-safe’Must verify that memory accesses are ‘type-safe’
Pool allocation infers type homogeneity:Pool allocation infers type homogeneity: Unions, bad C pointer tricks, etc Unions, bad C pointer tricks, etc non-TH non-TH Some pools may be TH, others notSome pools may be TH, others not
Can’t index compress ptrs in non-TH pools!Can’t index compress ptrs in non-TH pools!
Chris Lattner
Index Conversion Example 2Index Conversion Example 2
L
List
unknowntype
Heap pointer points from TH Heap pointer points from TH pool to a heap-only pool: pool to a heap-only pool: compress this pointer!compress this pointer!
L
List
unknowntype
Index conversion changes Index conversion changes pointer dereferences, but pointer dereferences, but
not memory layoutnot memory layout
64-bit 64-bit integer integer indexesindexes
Compression of TH memory, pointed to by non-TH memory
Chris Lattner
Index Compression Example 2Index Compression Example 2
L
List
unknowntype
Example afterExample after
index conversionindex conversion
64-bit 64-bit integer integer indexesindexes
Next step: compress the Next step: compress the pointer in the heappointer in the heap
unknowntype
Compress pointer in type-Compress pointer in type-safe pool, change offsets safe pool, change offsets
and size of structureand size of structure
64-bit 64-bit integer integer indexesindexes
32-bit 32-bit integer integer indexindex
L
List
Compression of TH memory, pointed to by non-TH memory
Chris Lattner
Pointer Compression Example 3Pointer Compression Example 3
L
Array
unknowntype
L
Array
unknowntype
Index convert non-TH Index convert non-TH pool to shrink TH pool to shrink TH
pointerspointers
L
Array
unknowntype
Index compress Index compress array of pointers!array of pointers!
Compression of TH pointers, pointing to non-TH memory
Chris Lattner
Static Pointer Compression Impl.Static Pointer Compression Impl.
Inspect graph of pools provided by APA:Inspect graph of pools provided by APA: Find compressible pointersFind compressible pointers Determine pools to index convertDetermine pools to index convert
Use rewrite rules to ptr compress program:Use rewrite rules to ptr compress program: e.g. if Pe.g. if P11 and P and P22 are compressed pointers, change: are compressed pointers, change:
PP11 = *P = *P22 P P11’ = *(int*)(PoolBase+P’ = *(int*)(PoolBase+P22’)’)
Perform interprocedural call graph traversal:Perform interprocedural call graph traversal: Top-down traversal from main()Top-down traversal from main()
Chris Lattner
Talk OutlineTalk Outline
Introduction & MotivationIntroduction & Motivation Automatic Pool Allocation BackgroundAutomatic Pool Allocation Background Pointer Compression TransformationPointer Compression Transformation Experimental ResultsExperimental Results ConclusionConclusion
Chris Lattner
Experimental Results: 2 QuestionsExperimental Results: 2 Questions
1.1. Does Pointer Compression improve the Does Pointer Compression improve the performance of pointer intensive programs?performance of pointer intensive programs?
Cache miss reductions, memory bandwidth Cache miss reductions, memory bandwidth improvementsimprovements
Memory footprint reductionMemory footprint reduction
2.2. How does the impact of Pointer Compression How does the impact of Pointer Compression vary across 64-bit architectures?vary across 64-bit architectures?
Do memory system improvements outweigh overhead?Do memory system improvements outweigh overhead?
Built in the LLVM Compiler Infrastructure: Built in the LLVM Compiler Infrastructure: http://llvm.cs.uiuc.edu/http://llvm.cs.uiuc.edu/
Chris Lattner
Static PtrComp Performance ImpactStatic PtrComp Performance Impact
UltraSPARC IIIi w/1MB Cache
PA PC PC/PA
bh 8MB 8MB 1.00
bisort 64MB 32MB 0.500.50
em3d 47MB 47MB 1.00
perimeter 299MB 171MB 0.570.57
power 882KB 816KB 0.93
treeadd 128MB 64MB 0.500.50
tsp 128MB 96MB 0.750.75
ft 9MB 4MB 0.510.51
ks 47KB 47KB 1.00
llubench 4MB 2MB 0.500.50
Peak Memory Usage1.0 = Program compiled with LLVM & PA but no PC
0
0.2
0.4
0.6
0.8
1
1.2
1.4
bh
bis
ort
em
3d
pe
rim
ete
r
po
we
r
tre
ea
dd
tsp ft
ks
llub
en
ch
Ru
nti
me
Ra
tio
(P
A =
1.0
)
NoPA Runtime Ratio
PtrComp Runtime Ratio
Chris Lattner
Evaluating Effect of ArchitectureEvaluating Effect of Architecture
Pick one program that scales easily:Pick one program that scales easily: llubench – Linked list micro-benchmarkllubench – Linked list micro-benchmark llubench has little computation, many dereferencesllubench has little computation, many dereferences
Best possible case for pointer compressionBest possible case for pointer compression
Evaluate how ptrcomp impacts scalability:Evaluate how ptrcomp impacts scalability: Compare to native and pool allocated versionCompare to native and pool allocated version
Evaluate overhead introduced by ptrcomp:Evaluate overhead introduced by ptrcomp: Compare PA32 with PC32 (‘compress’ 32 Compare PA32 with PC32 (‘compress’ 32 32 bits) 32 bits)
How close is ptrcomp to native 32-bit pointers?How close is ptrcomp to native 32-bit pointers? Compare to native-32 and poolalloc-32 for limit studyCompare to native-32 and poolalloc-32 for limit study
Chris Lattner
SPARC V9 PtrComp (1MB Cache)SPARC V9 PtrComp (1MB Cache)
0.E+00
1.E-08
2.E-08
3.E-08
4.E-08
5.E-08
6.E-08
7.E-08
0 500 1000 1500 2000 2500
Heap Size (no units)
Tim
e p
er n
od
e d
eref
eren
ce Normal 64
PoolAlloc 64
PtrComp 64
Normal 32
PoolAlloc 32
PtrComp 32
PtrComp overhead PtrComp overhead is very lowis very low
Significant Significant scalability impactscalability impact
Chris Lattner
AMD64 PtrComp (1MB Cache)AMD64 PtrComp (1MB Cache)
0.E+00
1.E-08
2.E-08
3.E-08
4.E-08
5.E-08
6.E-08
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Heap Size (no units)
Tim
e p
er
no
de
de
refe
ren
ce
Normal 64
PoolAlloc 64
PtrComp 64
Normal 32
PoolAlloc 32
PtrComp 32
PtrComp is PtrComp is fasterfaster than than PA32 on AMD64!PA32 on AMD64!
Similar scalability Similar scalability impactimpact
See paper for See paper for IA64 and IBM-SPIA64 and IBM-SP
Chris Lattner
Pointer Compression ConclusionPointer Compression Conclusion
Pointer compression can substantially reduce Pointer compression can substantially reduce footprint of pointer-intensive programs:footprint of pointer-intensive programs: … … without specialized hardware support!without specialized hardware support!
Significant perf. impact for some programs:Significant perf. impact for some programs: Effectively higher memory bandwidthEffectively higher memory bandwidth Effectively larger cachesEffectively larger caches
Dynamic compression for full generality:Dynamic compression for full generality: Speculate that pools are small, expand if notSpeculate that pools are small, expand if not More investigation needed, see paper!More investigation needed, see paper!
Questions?Questions?
http://llvm.cs.uiuc.edu/http://llvm.cs.uiuc.edu/