Faculty of Computer Science

José Nelson Amaral © 2008

MPADS: Memory-Pooling-Assisted Data Splitting

Stephen Curial - Xymbiant Systems Inc.
Peng Zhao - Intel Corporation
J. Nelson Amaral - University of Alberta
Yaoqing Gao, Shimin Cui, Raul Silvera, Roch Archambault - IBM Toronto Software Laboratory

FROM SUN MICROSYSTEMS

© 2006

Department of Computing Science

ISMM 2008

Goal

What:

– Improve spatial locality

Where:

– Link-based data structures

How:

– Pooling similar structures together

– Grouping same fields from multiple objects together


Goal (cont.)

Why:

– Because we can

– Allow easy-to-write, easy-to-read, easy-to-maintain code to improve performance

What compiler:

– IBM XL compiler suite

Limitation:

– Needs more precise pointer analysis to benefit from more opportunities


Most Relevant Earlier Work

Pool Allocation

– Lattner and Adve (CGO 04, PLDI 05)

Reference Affinity

– Zhong, Orlovich, Shen, Ding (PLDI 04)

– Rabbah and Palem (TECS 03)

Array Reshaping

– Zhao, Cui, Gao, Silvera, Amaral (TOPLAS 07)


A refreshing outcome

“MPADS is not the first implementation of the combination of memory pools and splitting of pointer-based data structures.”

“MPADS is still not delivering its full potential on standard benchmarks in the IBM XL compiler.”

Reviewer’s Comment:

“The technique only worked for Olden, and did nothing for SPECcpu2000 (but the authors get bonus points for being honest about that.)”


The Cost of Programming Productivity

Easy-to-read and easy-to-maintain code often results in lower runtime performance.

[Figure: Student, Class, University]

The Cost of Programming Productivity

Abstraction

Inheritance

[Figure: Student, Professor and Support Staff inherit from Person]

The Cost of Programming Productivity

Data Encapsulation

Person:

– Name

– Date of Birth

– Address

– Driver Lic.

– Gender

– Citizenship

Student:

– Univ. ID

– Date of Adm

– Faculty

– Department

– Program

– Classes Enr.

– Grades

A possible data layout

Student:

– Univ. ID: 4 bytes

– Date of Adm: 4 bytes

– Faculty: 1 byte

– Department: 1 byte

– Program: 2 bytes

– Classes Enr.: 4 bytes

– Grades: 4 bytes

– (unlabeled): 4 bytes

Person:

– Name: 32 bytes

– Date of Birth: 4 bytes

– Address: 32 bytes

– Driver Lic.: 3 bytes

– Gender: 1 byte

– Citizenship: 16 bytes
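The layout on this slide can be written down as C declarations. The sketch below is only one possible rendering: the field names and types are assumptions, the Person field order follows the "Data in Memory" diagram, and the third trailing 4-byte Student entry (which the slide does not label) is assumed here to be a link to the matching Person record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rendering of the 24-byte Student layout from the slide. */
typedef struct Student {
    uint32_t univ_id;          /* 4 bytes */
    uint32_t date_of_adm;      /* 4 bytes */
    uint8_t  faculty;          /* 1 byte  */
    uint8_t  department;       /* 1 byte  */
    uint16_t program;          /* 2 bytes */
    uint32_t classes_enrolled; /* 4 bytes */
    uint32_t grades;           /* 4 bytes */
    uint32_t person_link;      /* 4 bytes: ASSUMED meaning of the unlabeled entry */
} Student;

/* Hypothetical rendering of the 88-byte Person layout from the slide. */
typedef struct Person {
    char     name[32];         /* 32 bytes */
    uint32_t date_of_birth;    /* 4 bytes  */
    char     address[32];      /* 32 bytes */
    char     driver_lic[3];    /* 3 bytes  */
    char     gender;           /* 1 byte   */
    char     citizenship[16];  /* 16 bytes */
} Person;

/* The sizes quoted later in the talk: 24 bytes per Student, 88 per Person. */
_Static_assert(sizeof(Student) == 24, "Student should occupy 24 bytes");
_Static_assert(sizeof(Person) == 88, "Person should occupy 88 bytes");
```

With these declarations the Department byte sits at offset 9 of a Student and the Date of Birth at offset 32 of a Person, which is all the example search needs from each record.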


Data in Memory

[Figure: memory addresses drawn in 8-byte rows. Student records are stored back to back starting at address 0, 24 bytes each: Univ. ID and Date of Adm. in the first row; Fa., De., Progr. and Classes Enr. in the second; Grades in the third. A Person record occupies the rows from address 8000 to 8080 and holds Name, Date of Birth, Address, Dr. Lic., Ge. and Citizenship.]

Assume a Cache Organization

POWER5 Cache Organization

– L1 Data Cache: 32 Kbytes, 128-byte cache lines

– L2 Cache: 1.44 Mbytes, 128-byte cache lines

– L3 Cache: 32 Mbytes, 512-byte cache lines


Cache Organization

[Figure: the cache drawn as a grid of cache lines numbered 0 to 255, each line holding bytes 0 to 127.]

Example: A search through the data structures

How many Computing Science students are younger than 23 years old?

[Figure: the cache-line grid (lines 0 to 255, bytes 0 to 127) filled with consecutive Student records (Univ.ID, Adm., F., D., Prg, •, Class., Grades) and, in a later build, with Person records (Name, DofB, G, Citizens., Address, DL.).]

Student structure: For every 24 bytes loaded, the search reads either 1 or 5.

Person structure: For every 88 bytes loaded, the search reads 4.
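The byte counts on these slides translate directly into cache utilization. The helper names below are hypothetical; the sketch only turns "bytes read per bytes loaded" into a percentage and counts how many whole records fit in one 128-byte line.

```c
#include <assert.h>

/* Percentage of the bytes brought into the cache that the search reads. */
static double useful_percent(unsigned bytes_read, unsigned bytes_loaded)
{
    assert(bytes_loaded > 0 && bytes_read <= bytes_loaded);
    return 100.0 * (double)bytes_read / (double)bytes_loaded;
}

/* Number of whole records of the given size that fit in one cache line. */
static unsigned records_per_line(unsigned line_bytes, unsigned record_bytes)
{
    assert(record_bytes > 0);
    return line_bytes / record_bytes;
}
```

For the slide's numbers, `useful_percent(1, 24)` is about 4.2, `useful_percent(5, 24)` about 20.8 and `useful_percent(4, 88)` about 4.5, so well under a quarter of the loaded bytes are used in every case. `records_per_line(128, 24)` gives 5 whole Student records per line.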


Data Reshaping for Arrays of Structures

Student *ListOfStudents;

….

ListOfStudents = (Student*)malloc(….);

[Figure: the array of Student records (Univ. ID, Date of Adm., Fa., De., Progr., •, Classes Enr., Grades) before and after reshaping.]

Maximal Structure Splitting

Before splitting (one object per row):

ID1 Adm1 Fac1 Dep1 Clas1 Grad1 1

ID2 Adm2 Fac2 Dep2 Clas2 Grad2 2

ID3 Adm3 Fac3 Dep3 Clas3 Grad3 3

After splitting (one field per row):

ID1 ID2 ID3

Adm1 Adm2 Adm3

Fac1 Fac2 Fac3

Dep1 Dep2 Dep3

Clas1 Clas2 Clas3

Grad1 Grad2 Grad3

1 2 3
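Maximal splitting amounts to turning an array of records into one array per field. The sketch below does this by hand for a reduced, hypothetical record; the type names, the field set and the fixed capacity are assumptions for illustration, whereas MPADS performs the equivalent change inside the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_STUDENTS = 256 };   /* hypothetical fixed capacity */

/* Original layout: all fields of one object are adjacent. */
typedef struct StudentRecord {
    uint32_t id, adm;
    uint8_t  fac, dep;
    uint32_t clas, grad;
} StudentRecord;

/* Split layout: the same field of all objects is adjacent. */
typedef struct StudentsSplit {
    uint32_t id[MAX_STUDENTS];
    uint32_t adm[MAX_STUDENTS];
    uint8_t  fac[MAX_STUDENTS];
    uint8_t  dep[MAX_STUDENTS];
    uint32_t clas[MAX_STUDENTS];
    uint32_t grad[MAX_STUDENTS];
    size_t   count;
} StudentsSplit;

/* Copy n records into the split layout, one field array at a time. */
static void split_students(const StudentRecord *in, size_t n, StudentsSplit *out)
{
    assert(n <= MAX_STUDENTS);
    for (size_t i = 0; i < n; i++) {
        out->id[i]   = in[i].id;
        out->adm[i]  = in[i].adm;
        out->fac[i]  = in[i].fac;
        out->dep[i]  = in[i].dep;
        out->clas[i] = in[i].clas;
        out->grad[i] = in[i].grad;
    }
    out->count = n;
}

/* A scan over one field now touches only that field's array. */
static size_t count_in_department(const StudentsSplit *s, uint8_t dep)
{
    size_t hits = 0;
    for (size_t i = 0; i < s->count; i++)
        if (s->dep[i] == dep)
            hits++;
    return hits;
}
```

After the split, `count_in_department` walks a dense array of 1-byte values, so a 128-byte cache line delivers 128 useful department codes instead of 5.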


Implementation of Pool Allocation

Intercept malloc calls and replace them with pool allocation: each structure layout gets its own pool.

If a pool is full, another pool can be allocated.

[Figure: pools holding numbered objects, each shown as the fields ID, Adm, Fac, Dep, Clas, Grad and a trailing numbered field.]

Implementing Pool Allocation

The following types of statements need to be transformed:

– Memory allocation statements

– Memory reference statements


Transforming Memory Allocation Statements

Extended the pointer analysis to maintain a set of allocation sites associated with each alias set.

When an alias set is selected for transformation:

– Replace each associated allocation with a call to the pool allocation function.
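As an illustration of what such a pool allocation function might look like, here is a minimal bump-pointer pool with chaining. It is a sketch under assumptions, not the XL compiler's runtime: the names `Pool`, `pool_create`, `pool_alloc` and `pool_destroy` are hypothetical, objects are stored whole (no field splitting), and individual objects are never freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Pool {
    struct Pool   *next;      /* previously filled pool, if any */
    size_t         obj_size;  /* size of one object of this layout */
    size_t         capacity;  /* objects per pool */
    size_t         used;      /* objects handed out so far */
    unsigned char *storage;
} Pool;

static Pool *pool_create(size_t obj_size, size_t capacity, Pool *next)
{
    assert(obj_size > 0 && capacity > 0);
    Pool *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->storage = malloc(obj_size * capacity);
    if (p->storage == NULL) {
        free(p);
        return NULL;
    }
    p->next = next;
    p->obj_size = obj_size;
    p->capacity = capacity;
    p->used = 0;
    return p;
}

/* Stands in for malloc at a transformed allocation site. *head is the
 * current pool for one structure layout; when it is full, a fresh pool
 * is allocated and chained in front of it. */
static void *pool_alloc(Pool **head)
{
    Pool *p = *head;
    if (p->used == p->capacity) {
        Pool *fresh = pool_create(p->obj_size, p->capacity, p);
        if (fresh == NULL)
            return NULL;
        *head = p = fresh;
    }
    return p->storage + p->obj_size * p->used++;
}

static void pool_destroy(Pool *p)
{
    while (p != NULL) {
        Pool *next = p->next;
        free(p->storage);
        free(p);
        p = next;
    }
}
```

Consecutive allocations from the same site land next to each other in memory, which is the property the splitting step then builds on. The sketch assumes `obj_size` is a multiple of the object's alignment.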


Transforming Memory References

Update address calculation for loads and stores:

– Uniform splitting --- all fields are the same size

• Address calculation is simpler

• Restricts application of technique or

• Requires memory padding

– Non-uniform splitting --- fields of different size

• Address calculation is more involved

• Can be applied more generally


Non-Uniform Example

struct example {
    type_3 a; /* 3 bytes */
    type_7 b; /* 7 bytes */
    type_5 c; /* 5 bytes */
};

How can the compiler find the address to access s->c?

pool_base = s & 0xF…F000

index = (s - pool_base) / 3

field_base = (3+7)*num_structs_per_pool

s->c = *(s + field_base - 3*index + 5*index)

s->c = *(s + field_base + (5-3)*index)

[Figure: the pool, with s, pool_base and field_base marked.]
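The address arithmetic on this slide can be checked with plain integer addresses. The sketch below assumes a 4096-byte pool aligned to 4096 bytes (consistent with a mask ending in 000), that `s` is the address of field `a` of some object, and that the pool stores all `a` fields, then all `b` fields, then all `c` fields. The constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    POOL_BYTES = 4096,   /* ASSUMED pool size and alignment */
    SIZE_A = 3,
    SIZE_B = 7,
    SIZE_C = 5,
    STRUCTS_PER_POOL = POOL_BYTES / (SIZE_A + SIZE_B + SIZE_C)   /* 273 */
};

/* Given s, the address of field a of an object, return the address of
 * field c of the same object in the split pool. */
static uintptr_t addr_of_c(uintptr_t s)
{
    uintptr_t pool_base  = s & ~(uintptr_t)(POOL_BYTES - 1);
    uintptr_t offset     = s - pool_base;
    assert(offset % SIZE_A == 0);            /* s must point at an a field */
    uintptr_t index      = offset / SIZE_A;
    uintptr_t field_base = (uintptr_t)(SIZE_A + SIZE_B) * STRUCTS_PER_POOL;
    /* s + field_base - 3*index + 5*index, simplified as on the slide */
    return s + field_base + (uintptr_t)(SIZE_C - SIZE_A) * index;
}
```

For a pool at address `0x10000`, object 0 has its `c` field at `0x10000 + 2730` and object 4 at `0x10000 + 2730 + 20`, which is `pool_base + field_base + 5*index` in both cases.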


Data Transformation Safety

How does the compiler decide whether it is safe to transform a given structure?

– Based on the results of the pointer analysis.


Is it safe to transform a given data structure?

Structure layout: two structures have the same layout if each field has the same offset and the same length.

Build alias set

– If a pointer P may point to the structure

• Then all the objects in the points-to set of the alias set of P must have the same layout.

[Figure: pointers P and Q form an alias set; Data Struct 1 and Data Struct 2 form its points-to set.]
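The layout test stated on this slide is simple to express in code. The sketch below is an illustration only: `Field`, `Layout`, `same_layout` and `safe_to_transform` are hypothetical names, and the points-to set is assumed to be available as a plain array of layouts rather than as compiler data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Field  { size_t offset; size_t length; } Field;
typedef struct Layout { const Field *fields; size_t nfields; } Layout;

/* Two layouts match if every field has the same offset and length. */
static bool same_layout(const Layout *x, const Layout *y)
{
    assert(x != NULL && y != NULL);
    if (x->nfields != y->nfields)
        return false;
    for (size_t i = 0; i < x->nfields; i++) {
        if (x->fields[i].offset != y->fields[i].offset ||
            x->fields[i].length != y->fields[i].length)
            return false;
    }
    return true;
}

/* objs holds the layouts of all objects in the points-to set of one
 * alias set; the transformation is allowed only if they all agree. */
static bool safe_to_transform(const Layout *objs, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (!same_layout(&objs[0], &objs[i]))
            return false;
    return true;
}
```

Comparing every layout against the first one is enough because layout equality is transitive.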


Experimental Results - Micro Benchmarks (Speedup)

[Charts: Power 4 and Power 5]

Experimental Results - Micro Benchmarks (Instruction Count)

[Charts: Power 4 and Power 5]

Experimental Results - Micro Benchmarks (L2 Cache Misses)

[Charts: Power 4 and Power 5]

Experimental Study - Olden & LLU (Speedup)

[Charts: Power 4 and Power 5, each showing bh, em3d, health, power, tsp and llu]

Active Hardware Prefetch Streams

[Chart: Active Prefetching Streams from Memory to L2 (in POWER4). X axis: Benchmark (bh, em3d, health, power, tsp, llu). Y axis: Prefetches to L2 (in Millions), 0 to 45. Series: Baseline, Pool Alloc, MPADS.]

Related Work

Pool Allocation

– Lattner & Adve - PLDI 2005

• Data Structure Analysis

Array Based Structure Splitting

– Zhong et al. - PLDI 2004

• Reference affinity / affinity based splitting

• Memory Trace

Safe Pointer Based Structure Splitting

– Jeon, Shin and Han - CC 2007

• Similar to non-uniform splitting

• Affinity based splitting uses static analysis

– Regular expression framework

– Guarantee Safety with regular expressions

Final Remarks

Our Compiler-Research Guiding Principles

– Programming productivity

• Enables programmers to be efficient

• Enables easy-to-write/easy-to-maintain programs

– Execution Time Performance

• Recover runtime efficiency (time, storage or energy) through

– Code analysis

– Improved code generation

– Knowledge of computer architecture and memory hierarchy


Pointer Analysis Primer

The following statement:

int *a = malloc(…);

Creates:

• a memory object (A),

• a pointer (a),

• and a points-to relation (a,A): a → A

Alias Analysis Primer: Andersen’s vs. Steensgaard’s

(Shapiro/Horwitz, POPL 97)

Program: a = &b;

– Andersen: S = {(a,b)}

– Steensgaard (unification-based): S = {(a,b)}

Program: a = &b; b = &c;

– Andersen: S = {(a,b); (b,c)}

– Steensgaard (unification-based): S = {(a,b); (b,c)}

Program: a = &b; b = &c; a = &d;

– Andersen: S = {(a,b); (b,c); (a,d)}

– What should happen in the Steensgaard analysis?

– Steensgaard (unification-based): b and d are unified into one node (b,d), so S = {(a,b); (b,c); (a,d); (d,c)}

Program: a = &b; b = &c; a = &d; d = &e;

– Andersen: S = {(a,b); (b,c); (a,d); (d,e)}

– And now?

– Steensgaard (unification-based): c and e are unified into one node (c,e), so S = {(a,b); (b,c); (a,d); (d,c); (d,e); (b,e)}
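The unification steps in this primer can be reproduced with a small union-find structure. The sketch below is a simplified illustration that handles only statements of the form `x = &y`, which is all the slide program uses; the names `reset`, `find`, `unify`, `address_of` and `run_slide_program` are hypothetical.

```c
#include <assert.h>

enum { VAR_A, VAR_B, VAR_C, VAR_D, VAR_E, NVARS };
enum { NONE = -1 };

static int parent[NVARS];   /* union-find forest over the variables */
static int target[NVARS];   /* target[r]: a member of the class that
                               root r points to, or NONE */

static void reset(void)
{
    for (int i = 0; i < NVARS; i++) {
        parent[i] = i;
        target[i] = NONE;
    }
}

static int find(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Merge the classes of x and y, then merge what they point to, so each
 * class keeps at most one outgoing points-to edge. */
static void unify(int x, int y)
{
    x = find(x);
    y = find(y);
    if (x == y)
        return;
    int tx = target[x];
    int ty = target[y];
    parent[y] = x;
    if (tx == NONE)
        target[x] = ty;
    else if (ty != NONE)
        unify(tx, ty);
}

/* Process the statement p = &v. */
static void address_of(int p, int v)
{
    int r = find(p);
    if (target[r] == NONE)
        target[r] = v;
    else
        unify(target[r], v);
}

/* The four-statement program from the slides. */
static void run_slide_program(void)
{
    reset();
    address_of(VAR_A, VAR_B);   /* a = &b; */
    address_of(VAR_B, VAR_C);   /* b = &c; */
    address_of(VAR_A, VAR_D);   /* a = &d;  unifies b and d */
    address_of(VAR_D, VAR_E);   /* d = &e;  unifies c and e */
}
```

After `run_slide_program()`, `find(VAR_B) == find(VAR_D)` and `find(VAR_C) == find(VAR_E)`, matching the (b,d) and (c,e) nodes in the final Steensgaard graph. Andersen's analysis keeps b and d separate, which is why it does not report (d,c) or (b,e).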