Dynamic Code Mapping Techniques for Limited Local Memory Systems

Seungchul Jung, Compiler Microarchitecture Lab, Department of Computer Science and Engineering, Arizona State University. 06/21/22


Page 1: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Dynamic Code Mapping Techniques for Limited Local Memory Systems

Seungchul Jung
Compiler Microarchitecture Lab
Department of Computer Science and Engineering
Arizona State University

04/22/23

Page 2: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Multicore Landscape

[Figure: single-core architecture versus multicore architecture. Labels: Power, Temperature, Heat, Reliability; NVidia, IBM, Intel.]

Page 3: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Multi-core Memory

• Critical issues with cache in DMS
  – Scalability
  – Coherency protocols

[Figure: block diagram with eight SPUs, each with its own LS and DMA unit, connected by the Element Interconnect Bus to the PPU, the Memory Controller, and the Bus Interface Controller.]

Page 4: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Core Memory Management

• Local memory size is the limiting factor
• Need for automatic management
  – Application developers are already busy
• Code, variables, heap, and stack

Original code:

    int global;

    F1() {
        int var1, var2;
        global = var1 + var2;
        F2();
    }

Code with explicit local memory management:

    int global;

    F1() {
        int var1, var2;
        DLM.fetch(global);
        global = var1 + var2;
        DLM.writeback(global);

        ILM.fetch(F2);
        F2();
    }
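The managed version of F1 can be mimicked on an ordinary machine. The following is a minimal sketch, assuming that the fetch and writeback calls simply copy a variable between main memory and a local copy; the names `dlm_fetch`, `dlm_writeback`, `main_memory_global`, and `local_copy_global` are hypothetical, and `memcpy` stands in for whatever DMA transfer the real system would issue. F1 takes its operands as parameters here so that the result is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two memories. */
int main_memory_global = 0;   /* home copy of `global` in main memory */
int local_copy_global = 0;    /* working copy in the core's local memory */

/* Sketch of DLM.fetch: bring the variable into local memory. */
void dlm_fetch(int *local, const int *home) {
    assert(local != NULL && home != NULL);
    memcpy(local, home, sizeof *local);   /* stands in for a DMA get */
}

/* Sketch of DLM.writeback: push the updated value back to main memory. */
void dlm_writeback(int *home, const int *local) {
    assert(local != NULL && home != NULL);
    memcpy(home, local, sizeof *home);    /* stands in for a DMA put */
}

/* F1 from the slide with the management calls made explicit. */
void F1(int var1, int var2) {
    dlm_fetch(&local_copy_global, &main_memory_global);
    local_copy_global = var1 + var2;      /* compute on the local copy only */
    dlm_writeback(&main_memory_global, &local_copy_global);
}
```

Calling `F1(2, 3)` leaves 5 in both copies; without the writeback, main memory would still hold the stale value.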

Page 5: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Code Management Mechanism

(a) Application call graph: nodes F1, F2, F3

(b) Linker script

    SECTIONS {
        OVERLAY { F1.o F3.o }
        OVERLAY { F2.o }
    }

(c) Local memory: code regions holding F1, F2, F3

(d) Main memory: heap, variable, stack, and code (F1, F2, F3)
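The effect of the overlay mapping can be illustrated with a small simulation. This is a sketch under stated assumptions, not the actual runtime: F1 and F3 share region 0 and F2 has region 1 to itself, as in the linker script; the helper names `ensure_loaded` and `replay` are hypothetical; and a function is assumed to be reloaded whenever execution enters it (by call or by return) while another function occupies its region.

```c
#include <assert.h>

enum { NUM_FUNCS = 3, NUM_REGIONS = 2 };
enum { F1 = 0, F2 = 1, F3 = 2 };

/* Function-to-region mapping taken from the linker script:
   F1 and F3 overlay each other in region 0, F2 owns region 1. */
const int region_of[NUM_FUNCS] = { 0, 1, 0 };

/* resident[r] is the function currently held in region r, or -1 if empty. */
int resident[NUM_REGIONS] = { -1, -1 };
int transfers = 0;

/* Makes sure `func` is in local memory before it runs.
   Returns 1 if a transfer from main memory was needed, 0 otherwise. */
int ensure_loaded(int func) {
    assert(func >= 0 && func < NUM_FUNCS);
    int r = region_of[func];
    if (resident[r] == func) {
        return 0;              /* already resident: no transfer */
    }
    resident[r] = func;        /* evicts whatever occupied the region */
    transfers++;
    return 1;
}

/* Replays a trace of executing functions from an empty local memory
   and returns the number of transfers it causes. */
int replay(const int *trace, int n) {
    for (int r = 0; r < NUM_REGIONS; r++) {
        resident[r] = -1;
    }
    transfers = 0;
    for (int i = 0; i < n; i++) {
        ensure_loaded(trace[i]);
    }
    return transfers;
}
```

For a hypothetical trace F1, F2, F1, F3, F1 this counts four transfers: F3 evicts F1 from region 0, so F1 must be loaded again on return, while F2 never disturbs anything.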

Page 6: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Code Management Problem

[Figure: the code section of local memory divided into REGIONs]

• Number of regions and function-to-region mapping
  – Two extreme cases
• Wise code management is needed because the problem is NP-complete
  – Minimum data transfer within the given space
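The space side of the problem can be written down compactly. The notation below is introduced here for illustration and is not from the slides: $s(f)$ is the size of function $f$, $R$ is the set of regions, and $S$ is the given code space. It assumes, as is usual for overlays, that a region must be as large as the biggest function mapped to it.

```latex
% Space constraint of a mapping (assumed notation):
\[
  \mathrm{space}(R) \;=\; \sum_{r \in R} \; \max_{f \in r} s(f) \;\le\; S
\]

% The two extreme cases:
\[
  \text{one region for all functions:}\quad
  \mathrm{space} = \max_{f} s(f)
  \qquad
  \text{one region per function:}\quad
  \mathrm{space} = \sum_{f} s(f)
\]
```

The first extreme needs the least space but forces every function to evict every other one; the second causes no evictions but needs the most space. The mapping problem is to find the point in between that fits in $S$ with the least data transfer.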

Page 7: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Related Work

• Cluster functions into regions
  – Minimize the intra-cluster interference
• ILP formulation [2,3,4,5]
  – Intractable for large applications
• Heuristics are proposed
  – Best-Fit [6], First-Fit [4], SDRM [5]

1. Egger et al. Scratchpad memory management.. EMSOFT '06

2. Steinke et al. Reducing energy consumption.. ISSS '02

3. Egger et al. A dynamic code placement.. CASES '06

4. Verma et al. Overlay techniques.. VLSI’06

5. Pabalkar et al. SDRM: Simultaneous.. HIPC’08

6. Udayakumaran et al. Dynamic allocation for.. ACM Trans.’06

Page 8: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Limitations of Previous Works 1

(a) Example application

    F1() {
        F2();
        F3();
    }

    F2() {
        for (i = 0; i < 10; i++) {
            F4();
        }
        for (i = 0; i < 100; i++) {
            F5();
        }
    }

    F3() {
        for (i = 0; i < 10; i++) {
            F6();
            F7();
        }
    }

(b) Call graph: F1 calls F2 and F3 (edge weight 1 each); F2 calls F4 (weight 10) and F5 (weight 100); F3 calls F6 and F7 (weight 10 each). Every function is 1KB.

(c) GCCFG: F1 calls F2 and then F3; F2 contains loop L1 (10 iterations), which calls F4, and loop L2 (100 iterations), which calls F5; F3 contains loop L3 (10 iterations), which calls F6 and F7. Every function is 1KB. The GCCFG gives a clear execution order.
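A GCCFG of this shape can be held in a small tree structure. The sketch below is an illustration under assumptions rather than the authors' data structure: the type `Node` and the helpers `make_node`, `add_child`, and `propagate` are hypothetical names, children are stored in execution order, and the graph is assumed to be a tree as in the example.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CHILDREN = 4 };

typedef enum { NODE_FUNC, NODE_LOOP } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;
    long weight;                        /* loop iteration count; 1 for functions */
    long exec_count;                    /* filled in by propagate() */
    struct Node *child[MAX_CHILDREN];   /* ordered: index order = execution order */
    int num_children;
} Node;

Node make_node(NodeKind kind, const char *name, long weight) {
    Node n = { kind, name, weight, 0, { NULL }, 0 };
    return n;
}

/* Appends `c` as the next child of `parent` in execution order. */
void add_child(Node *parent, Node *c) {
    assert(parent->num_children < MAX_CHILDREN);
    parent->child[parent->num_children++] = c;
}

/* Walks the tree top-down and records how often each node executes:
   a loop node multiplies the count of everything below it. */
void propagate(Node *n, long count) {
    long here = count * n->weight;
    n->exec_count = here;
    for (int i = 0; i < n->num_children; i++) {
        propagate(n->child[i], here);
    }
}
```

Built for the example application and started with `propagate(&f1, 1)`, this yields 10 executions each for F4, F6, and F7 and 100 for F5, and the child order records that F2 runs before F3, which a plain call graph does not.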

Page 9: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Limitations of Previous Works 2

(a) Example

    F1() { F2(); }
    F2() { F3(); }
    F3() { F4(); }

(a) Call graph: F1 (2KB) calls F2 (1.5KB), which calls F3 (0.4KB), which calls F4 (0.2KB); each edge has weight 1.

(b) Intermediate mapping: Region 0 (2KB) holds F1 and Region 1 (1.5KB) holds F2. Candidate costs for F3: 2KB + 0.4KB in Region 0, 1.5KB + 0.4KB in Region 1. Region 1 then holds F2,F3.

(c) NOT considering other functions: candidate costs for F4 are 2KB + 0.2KB in Region 0 and 1.5KB + 0.2KB + 0.4KB + 0.2KB in Region 1. Resulting mapping: Region 0 (2KB) holds F1,F4; Region 1 (1.5KB) holds F2,F3.

(d) Considering other functions: candidate costs for F4 are 2KB + 0.2KB in Region 0 and 0 + 0.4KB + 0.2KB in Region 1. Resulting mapping: Region 0 (2KB) holds F1; Region 1 (1.5KB) holds F2,F3,F4.

Figure annotation: 21%
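Summing the candidate-cost terms for placing F4 shows why the two approaches choose different regions. This is a worked check of the slide's numbers, assuming the region with the lower total is chosen; sizes are F1 = 2KB, F2 = 1.5KB, F3 = 0.4KB, F4 = 0.2KB.

```latex
% (c) Not considering other functions
\[
  \text{Region 0: } 2 + 0.2 = 2.2\ \text{KB}
  \qquad
  \text{Region 1: } (1.5 + 0.2) + (0.4 + 0.2) = 2.3\ \text{KB}
\]

% (d) Considering other functions
\[
  \text{Region 0: } 2 + 0.2 = 2.2\ \text{KB}
  \qquad
  \text{Region 1: } 0 + (0.4 + 0.2) = 0.6\ \text{KB}
\]
```

In (c) Region 0 is cheaper by 0.1KB, so F4 is placed with F1. In (d) the F2 term drops to 0 and Region 1 becomes cheaper by 1.6KB, so F4 is placed with F2 and F3.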

Page 10: Dynamic Code Mapping Techniques for Limited Local Memory Systems

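Our Approach

    F1() {
        F2();
        for (int i = 0; i < 200; i++) {
            F5();
        }
    }

    F2() {
        F3();
        for (int i = 0; i < 100; i++) {
            F4();
        }
    }

[Figure: GCCFG of the code above. F1 calls F2 and then loop L2 (200 iterations), which calls F5; F2 calls F3 and then loop L1 (100 iterations), which calls F4. The remaining edges have weight 1.]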

Page 11: Dynamic Code Mapping Techniques for Limited Local Memory Systems

FMUM Heuristic

Maximum (7.5KB), Given (5.5KB)

(a) Start: one region per function
    F1 (1KB), F2 (1.5KB), F3 (0.5KB), F4 (2KB), F5 (1.5KB), F6 (1KB)

(b) Next step: F1 and F5 are merged
    F1,F5 (1.5KB), F2 (1.5KB), F3 (0.5KB), F4 (2KB), F6 (1KB)

(c) Final: F2 and F6 are merged
    F1,F5 (1.5KB), F2,F6 (1.5KB), F3 (0.5KB), F4 (2KB)
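The space figures for this FMUM example can be checked with a few lines of code. The sketch below only computes the code space a mapping needs, assuming each region is as large as its biggest function; it does not choose which regions to merge, which in the heuristic depends on interference cost. The function name `code_space` and the array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the code space (in bytes) required by a mapping.
   size[f]      : size of function f
   region_of[f] : index of the region that function f is mapped to
   Each region must be as large as the biggest function it holds. */
size_t code_space(const size_t *size, const int *region_of,
                  int num_funcs, int num_regions) {
    size_t total = 0;
    for (int r = 0; r < num_regions; r++) {
        size_t largest = 0;
        for (int f = 0; f < num_funcs; f++) {
            assert(region_of[f] >= 0 && region_of[f] < num_regions);
            if (region_of[f] == r && size[f] > largest) {
                largest = size[f];
            }
        }
        total += largest;   /* an empty region contributes nothing */
    }
    return total;
}
```

With F1 to F6 sized 1024, 1536, 512, 2048, 1536, and 1024 bytes, one region per function needs 7680 bytes (7.5KB), merging F1 with F5 brings that to 6656 bytes (6.5KB), and also merging F2 with F6 reaches 5632 bytes (5.5KB), which is exactly the given space.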

Page 12: Dynamic Code Mapping Techniques for Limited Local Memory Systems

FMUP Heuristic

• Minimum (2KB), Given size (5KB)

[Figure: steps (a) START, (b) STEP1, (c) STEP2, (d) STEP3, (e) FINAL for functions F1 to F6. Labels: New Region (twice); region sizes 2KB, 1.5KB, 1.5KB.]

Page 13: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Interference Cost Calculation

[Figure: GCCFG with function nodes F1 to F5 and loop nodes L1 (weight 100) and L2 (weight 200); the other edge weights shown are 1, 3, 1, and 1.]

Page 14: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Experiments Setup

[Figure: experimental setup. Labels: FMUM, FMUP, SDRM.]

Page 15: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Typical Performance Result

[Figure: performance plot with two annotated areas, one where FMUP performs better and one where FMUM performs better.]

Page 16: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Number of Times of Improvement

• Pick the better of FMUM and FMUP
• 82% of the time, FMUM + FMUP gives the better result

Page 17: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Average 12% reduction in runtime

• FMUM + FMUP gives better performance by 12%

Page 18: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Utilizing Given Code Space

• Given code space is fully utilized

Page 19: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Efficient In-Loop-Functions Mapping

• In-loop functions are mapped separately

Page 20: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Increase Map-ability

• 100% mappability guaranteed

Page 21: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Impact of Different GCCFG Weight Assignment

[Figure annotations: 1.04, 0.96]

• Can reduce compile-time overhead

Page 22: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Performance w/ Increased SPU Threads

• Scalability with increased number of cores

Page 23: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Conclusion

• Trend of computer architecture
  – Multicore with limited local memory systems
• Memory management is required
  – Code, variables, heap, and stack
  – Better performance with limited resources
• Limitations of previous works
  – Call graph and fixed interference cost
• Two new heuristics (FMUM, FMUP)
  – Overall performance increase of 12%
  – Tolerable compile-time overhead

Page 24: Dynamic Code Mapping Techniques for Limited Local Memory Systems

Contributions and Outcomes

• Contributions
  – Problem formulation using GCCFG
  – Updating interference cost between functions
• Outcomes
  – Software release (www.public.asu.edu/~sjung)
  – Paper submission to ASAP2010
• Plans
  – Journal submission prepared