Code Specialization for Memory Efficient Hash Tries Michael Steindorfer, Jurgen Vinju Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Page 1

Code Specialization for Memory Efficient

Hash Tries

Michael Steindorfer, Jurgen Vinju Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Page 2

[Bar chart, Map: Generic 100%, Specialized 45%]

Page 3

[Bar chart, Set: Generic 100%, Specialized 22%]

Page 4

• memory usage vs runtime

• size of source code or binary

• platform specifics

Page 5

Hash Tries

Page 6

Hash Tries: Fast Immutable Data Structures on the JVM

Page 7

Hash Tries: (Wide) Hash-Prefix Trees with Array Nodes

Page 8

{32, 2, 4098, 34}

Page 9

[Figure 1: two tree diagrams, panels (a) and (b), showing the values 32, 34, 2 and 4098 placed in sparse array nodes with positions 0 to 31]

Figure 1. Inserting a sequence of numbers in a hash array mapped trie. The small index numbers refer to the position in a sparse array.

Insertion calculates each number's hash code*:

$\mathrm{hash}(32) = \ldots\ 000\ 00001\ 00000_2 = 0\ 0\ 0\ 0\ 0\ 1\ 0_{32}$

$\mathrm{hash}(2) = \ldots\ 000\ 00000\ 00010_2 = 0\ 0\ 0\ 0\ 0\ 0\ 2_{32}$

$\mathrm{hash}(4098) = \ldots\ 100\ 00000\ 00010_2 = 0\ 0\ 0\ 0\ 4\ 0\ 2_{32}$

$\mathrm{hash}(34) = \ldots\ 000\ 00001\ 00010_2 = 0\ 0\ 0\ 0\ 0\ 1\ 2_{32}$

We first split each hash code into 5-bit chunks and write each chunk as a decimal value in the range 0 to 31. Insertion then places the values in a 32-ary tree (where each node is encoded as a sparse array), based on the hash code prefixes. The tree structure is expanded until every prefix can be stored unambiguously.

To continue our example: 32 is inserted at the root node; 2 as well (because they do not share a common prefix). 4098 shares the prefix path →0→2 with value 2; consequently it is placed unambiguously on level 3. 34 shares the prefix path →2 with 2 and 4098, but can be differentiated from both on level 2.

Note that a chunk size of 5 bits for 32-bit hash codes results in trees with a maximal depth of $\lceil \log_{32}(2^{32}) \rceil = 7$.
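The chunking described above can be sketched in a few lines of Java. This is only an illustration under the footnote's assumption that an integer's hash code is the integer itself; the class and method names (`HashChunks`, `chunks`, `sharedPrefixLength`) are hypothetical and do not come from the slides. Note that the sketch returns chunks least significant first, whereas the notation above writes the most significant chunk on the left.

```java
import java.util.Arrays;

final class HashChunks {
    static final int BITS_PER_CHUNK = 5;
    static final int CHUNK_MASK = (1 << BITS_PER_CHUNK) - 1; // 0b11111
    // ceil(32 / 5) = 7 chunks cover a 32-bit hash code
    static final int MAX_DEPTH = (Integer.SIZE + BITS_PER_CHUNK - 1) / BITS_PER_CHUNK;

    /** Returns the 5-bit chunks of a hash code, least significant chunk first. */
    static int[] chunks(int hash) {
        int[] result = new int[MAX_DEPTH];
        for (int level = 0; level < MAX_DEPTH; level++) {
            result[level] = (hash >>> (level * BITS_PER_CHUNK)) & CHUNK_MASK;
        }
        return result;
    }

    /** Counts how many chunks two hash codes share, starting at the least significant chunk. */
    static int sharedPrefixLength(int h1, int h2) {
        int[] a = chunks(h1);
        int[] b = chunks(h2);
        int n = 0;
        while (n < MAX_DEPTH && a[n] == b[n]) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Assumes identity hashing for integers, as in the footnote.
        for (int key : new int[] {32, 2, 4098, 34}) {
            System.out.println(key + " -> " + Arrays.toString(chunks(key)));
        }
    }
}
```

In this sketch, 2 and 4098 share two chunks and therefore can only be told apart by their third chunk, while 2 and 34 share one chunk and differ in the second.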

In contrast, Figure 3 illustrates an array-based hash table. By comparing the visualizations of both data structures we can identify the following list of disadvantages of HAMTs over array-based hash tables (Section ?? provides evidence for our claims):

• Lookup has to follow a tree path of between 1 and 7 nodes to answer containment queries. Memory indirections and indexing into sparse arrays are less efficient than performing a single index-based lookup in a contiguous array.

*We assume a hash function for integers that returns its argument, i.e. the identity.

Copyright © 0000 John Wiley & Sons, Ltd. Softw. Pract. Exper. (0000). Prepared using speauth.cls. DOI: 10.1002/spe


Page 10

abstract class TrieSet implements java.util.Set {
    TrieNode root;
    int size;

    class TrieNode {
        int bitmap;
        Object[] contentAndSubTries;
    }
}
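The slides do not show how `bitmap` and `contentAndSubTries` work together. A common HAMT technique, assumed here and not confirmed by the source, is to let each bit of the bitmap mark one occupied sparse position (0 to 31) and to derive the index into the dense array by counting the set bits below that position. The helper names (`bitpos`, `isSet`, `denseIndex`) are hypothetical; `Integer.bitCount` is the standard JDK method.

```java
final class BitmapIndex {
    /** Single-bit mask for a sparse position (a 5-bit chunk value in the range 0..31). */
    static int bitpos(int chunk) {
        return 1 << chunk;
    }

    /** True if the sparse position is occupied according to the bitmap. */
    static boolean isSet(int bitmap, int chunk) {
        return (bitmap & bitpos(chunk)) != 0;
    }

    /** Index into the dense array: the number of occupied positions below the given one. */
    static int denseIndex(int bitmap, int chunk) {
        return Integer.bitCount(bitmap & (bitpos(chunk) - 1));
    }

    public static void main(String[] args) {
        // Assumed example: a node whose sparse positions 0 and 2 are occupied.
        int bitmap = bitpos(0) | bitpos(2);
        System.out.println("position 2 occupied: " + isSet(bitmap, 2));
        System.out.println("dense index of position 2: " + denseIndex(bitmap, 2));
    }
}
```

Under this assumption a node with two occupied positions needs an array of length two rather than 32, at the price of one population count per lookup step.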

[Figure 3: three tree diagrams, panels (a) Scala, (b) Clojure and (c) Clojure, each holding the values 32, 34, 2 and 4098]

Figure 3. Conceptual difference in tree layout between Clojure’s and Scala’s HAMT implementations.

Figure 4. Footprints of HAMT sets and HAMT maps in 32-bit and 64-bit environments. Defaulting to 5-bit prefix chunks.

subnode pointers. [Michael: Don't talk explicitly about Clojure/Scala, rather about mixed/separated HAMT node designs.]

Division between internal nodes and leaf nodes. Scala's HAMT implementations divide the tree structure into internal nodes and leaf nodes. The internal nodes account for the hash-based prefix tree structure. The leaf nodes encapsulate the data tuple (i.e., a key in case of a set, a key/value pair


Page 13

...
class NodeNode extends TrieNode {
    int bitmap;
    TrieNode nodeAtIndex0;
    TrieNode nodeAtIndex1;
}
class ElementNode extends TrieNode {
    int bitmap;
    Object keyAtIndex0;
    TrieNode nodeAtIndex1;
}
class NodeElement extends TrieNode {
    int bitmap;
    TrieNode nodeAtIndex0;
    Object keyAtIndex1;
}
...

class TrieNode {
    int bitmap;
    Object[] contentAndSubTries;
}


Page 15

Exponential Number of Specializations

Page 16

Memory Overhead per Pointer (Set, 32-bit)

Node arity   Overhead per pointer
1-ary        48.0 bytes
2-ary        24.0 bytes
3-ary        18.7 bytes
4-ary        14.0 bytes
5-ary        12.8 bytes
6-ary        10.7 bytes
7-ary        10.3 bytes
8-ary        9.0 bytes
9-ary        8.9 bytes
10-ary       8.0 bytes
11-ary       8.0 bytes
12-ary       7.3 bytes

Page 17

Frequency by Node Arity

Node arity   Frequency
0-ary        0%
1-ary        1%
2-ary        63%
3-ary        14%
4-ary        3%
5-ary        1%
6-ary        1%
7-ary        1%
8-ary        1%
9-ary        1%
10-ary       1%
11-ary       1%
12-ary       1%

Page 18

Arities   % of Nodes
≤4        82%
≤8        86%
≤12       90%

Page 19

Arities   Specializations
≤4        31
≤8        511
≤12       8191
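The counts in this table are consistent with one specialization per ordered assignment of "element" or "sub-node" to each slot, summed over all arities from 0 up to the limit, which gives 2^(n+1) - 1. The following sketch reproduces the numbers under that assumption; the class and method names are hypothetical.

```java
final class SpecializationCount {
    /**
     * Counts node layouts for arities 0..maxArity, assuming every slot is
     * independently either an element or a sub-node and slot order matters.
     */
    static long withPermutations(int maxArity) {
        long total = 0;
        for (int arity = 0; arity <= maxArity; arity++) {
            total += 1L << arity; // 2^arity ordered element/node layouts
        }
        return total;
    }

    public static void main(String[] args) {
        for (int maxArity : new int[] {4, 8, 12}) {
            System.out.println("<=" + maxArity + ": " + withPermutations(maxArity));
        }
    }
}
```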

Page 20

Avoiding Permutations

Page 21

Arities   Specializations (with permutations in parentheses)
≤4        15 (31)
≤8        45 (511)
≤12       91 (8191)
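The reduced counts match the case where only the number of elements versus sub-nodes in a node matters, not their order: a node of arity k then has k + 1 layouts, and summing over arities 0 to n gives (n + 1)(n + 2) / 2. The sketch below checks this reading of the table; it is an assumption about how the numbers arise, and the names are hypothetical.

```java
final class GroupedSpecializationCount {
    /**
     * Counts node layouts for arities 0..maxArity when a layout is determined
     * only by how many slots hold elements (the rest hold sub-nodes).
     */
    static int withoutPermutations(int maxArity) {
        int total = 0;
        for (int arity = 0; arity <= maxArity; arity++) {
            total += arity + 1; // 0, 1, ..., arity element slots
        }
        return total;
    }

    public static void main(String[] args) {
        for (int maxArity : new int[] {4, 8, 12}) {
            System.out.println("<=" + maxArity + ": " + withoutPermutations(maxArity));
        }
    }
}
```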

Page 22

abstract class TrieSet implements java.util.Set {
    TrieNode root;
    int size;

    interface TrieNode { ... }
    ...
    class NodeNode implements TrieNode {
        int bitmap;
        TrieNode nodeAtIndex0;
        TrieNode nodeAtIndex1;
    }
    class ElementNode implements TrieNode {
        int bitmap;
        Object keyAtIndex0;
        TrieNode nodeAtIndex1;
    }
    class NodeElement implements TrieNode {
        int bitmap;
        TrieNode nodeAtIndex0;
        Object keyAtIndex1;
    }
    class ElementElement implements TrieNode {
        int bitmap;
        Object keyAtIndex0;
        Object keyAtIndex1;
    }
    ...
}


Page 23

abstract class TrieSet implements java.util.Set {
    TrieNode root;
    int size;

    interface TrieNode { ... }
    ...
    class NodeNode implements TrieNode {
        byte pos1; TrieNode nodeAtPos1;
        byte pos2; TrieNode nodeAtPos2;
    }
    class ElementNode implements TrieNode {
        byte pos1; Object keyAtPos1;
        byte pos2; TrieNode nodeAtPos2;
    }
    class NodeElement implements TrieNode {
        byte pos1; TrieNode nodeAtPos1;
        byte pos2; Object keyAtPos2;
    }
    class ElementElement implements TrieNode {
        byte pos1; Object keyAtPos1;
        byte pos2; Object keyAtPos2;
    }
    ...
}
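To illustrate how a lookup might use the `pos` fields of such specialized nodes, here is a minimal sketch for two of the arity-2 layouts. It is not the authors' implementation: the interface `PosNode`, the `contains` signature, the `shift` parameter and the constructors are assumptions made for the example, since the slides elide the members of `TrieNode`. Each node compares the current 5-bit chunk of the hash code against its stored positions instead of consulting a bitmap.

```java
// Hypothetical node interface; the slides do not show TrieNode's methods.
interface PosNode {
    boolean contains(Object key, int hash, int shift);
}

final class Chunk {
    /** Extracts the 5-bit chunk of the hash code at the given bit offset. */
    static byte at(int hash, int shift) {
        return (byte) ((hash >>> shift) & 0b11111);
    }
}

/** Arity-2 node holding two keys. */
final class ElementElement implements PosNode {
    final byte pos1; final Object keyAtPos1;
    final byte pos2; final Object keyAtPos2;

    ElementElement(int pos1, Object keyAtPos1, int pos2, Object keyAtPos2) {
        this.pos1 = (byte) pos1; this.keyAtPos1 = keyAtPos1;
        this.pos2 = (byte) pos2; this.keyAtPos2 = keyAtPos2;
    }

    public boolean contains(Object key, int hash, int shift) {
        byte pos = Chunk.at(hash, shift);
        if (pos == pos1) return keyAtPos1.equals(key);
        if (pos == pos2) return keyAtPos2.equals(key);
        return false;
    }
}

/** Arity-2 node holding one key and one sub-node. */
final class ElementNode implements PosNode {
    final byte pos1; final Object keyAtPos1;
    final byte pos2; final PosNode nodeAtPos2;

    ElementNode(int pos1, Object keyAtPos1, int pos2, PosNode nodeAtPos2) {
        this.pos1 = (byte) pos1; this.keyAtPos1 = keyAtPos1;
        this.pos2 = (byte) pos2; this.nodeAtPos2 = nodeAtPos2;
    }

    public boolean contains(Object key, int hash, int shift) {
        byte pos = Chunk.at(hash, shift);
        if (pos == pos1) return keyAtPos1.equals(key);
        // Descend one level: the next node looks at the next 5-bit chunk.
        if (pos == pos2) return nodeAtPos2.contains(key, hash, shift + 5);
        return false;
    }
}

final class PosLookupDemo {
    public static void main(String[] args) {
        // Simplified version of the running example, with identity hash codes:
        // 32 sits at root position 0; 2 and 34 share chunk 2 and live one level down.
        PosNode sub = new ElementElement(0, 2, 1, 34);
        PosNode root = new ElementNode(0, 32, 2, sub);
        System.out.println(root.contains(34, 34, 0));
        System.out.println(root.contains(66, 66, 0));
    }
}
```

The demo omits 4098 so that every node keeps exactly two slots; a fuller version would need further node classes for other arities.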

Page 24

Lookup Performance (lower is better)

[Bar chart, Map. Categories: Generic, Specialized 0-4, Specialized 0-8, Specialized 0-12. Value labels: 100%, 130%, 138%, 138%.]

Page 25

Memory Usage (lower is better)

Variant             Map     Set
Generic             100%    100%
Specialized 0-4     62%     52%
Specialized 0-8     46%     23%
Specialized 0-12    45%     22%


Page 27

Memory Footprint Compared To Competition (lower is better)

[Bar chart. Categories: Specialized 0-8, Clojure, Scala. Series: Map, Set. Value labels: 1x, 2.2x, 3.75x and 1x, 1.6x, 4.9x.]

Page 28

worst hash distribution ->

good memory performance

Page 29

best hash distribution ->

worst memory performance

Page 30

best hash distribution ->

worst memory performance

best