ADAPTING TREE STRUCTURES FOR PROCESSING WITH SIMD INSTRUCTIONS

Authors: Steffen Zeuch, Frank Huber, Johann-Christoph Freytag

Humboldt-Universität zu Berlin

{zeuchste,huber,freytag}@informatik.hu-berlin.de

Motivation

B+-Tree: common index structure

Common node-internal search algorithm:

Binary search in O(log_2 n)

Can we do better?

Yes, with SIMD!

Outline

1. Background

2. Binary Search and SIMD

3. Segmented Tree

4. Segmented Trie

5. Evaluation

6. Conclusion

SIMD

Single Instruction Multiple Data:

Available on CPU and GPU

Instruction types: arithmetic, comparison, conversion, logical

[Figure: three SIMD examples. Add const to vector: (3, 2) + (2, 2) = (5, 4). Add two vectors. Compare two vectors: each result lane is 0 or -1.]
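
The three operations from the figure can be illustrated with a scalar emulation in Python. This is a minimal sketch and not real SIMD code: Python lists stand in for SIMD registers, and the names simd_add, simd_add_const and simd_cmp_gt are hypothetical. The comparison follows the convention visible on the slide, where a result lane holds -1 (all bits set) if the condition is true and 0 otherwise.

```python
def simd_add(a, b):
    """Lane-wise addition of two equally sized 'registers' (lists)."""
    if len(a) != len(b):
        raise ValueError("registers must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


def simd_add_const(a, const):
    """Add a constant to every lane by broadcasting it first."""
    return simd_add(a, [const] * len(a))


def simd_cmp_gt(a, b):
    """Lane-wise a > b; a true lane becomes -1, a false lane becomes 0."""
    if len(a) != len(b):
        raise ValueError("registers must have the same number of lanes")
    return [-1 if x > y else 0 for x, y in zip(a, b)]


print(simd_add_const([3, 2], 2))        # [5, 4]
print(simd_add([3, 2], [65, 67]))       # [68, 69]
print(simd_cmp_gt([17, 8], [9, 9]))     # [-1, 0]
```

On real hardware each of these functions corresponds to a single instruction that processes all lanes at once; the loop inside the list comprehension only exists in the emulation.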

Binary Search

[Figure: binary search for Search Key = 9, iterations 1 to 5. Legend: Separator, Search Key, Excluded, Search Space.]
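
As a point of reference for the SIMD variants that follow, node-internal binary search can be sketched as below. The keys of the figure are not recoverable from the slide, so the example assumes a hypothetical node holding the 26 keys 1 to 26; the function name binary_search and the returned iteration counter are illustrative additions.

```python
def binary_search(keys, search_key):
    """Return (position, iterations).

    position is the index of the first key >= search_key in the sorted
    list keys (len(keys) if no such key exists). Each loop iteration
    inspects exactly one separator and halves the search space.
    """
    lo, hi = 0, len(keys)
    iterations = 0
    while lo < hi:
        iterations += 1
        mid = (lo + hi) // 2          # the single separator of this step
        if keys[mid] < search_key:
            lo = mid + 1              # exclude the left half
        else:
            hi = mid                  # exclude the right half
    return lo, iterations


keys = list(range(1, 27))             # hypothetical node with 26 keys
print(binary_search(keys, 9))         # (8, 5)
```

With these assumed keys the search for 9 needs five iterations, one comparison per iteration.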

Binary Search with Two Separators

[Figure: search with two separators for Search Key = 9, iterations 1 to 3. Legend: Separator, Search Key, Excluded, Search Space.]

Binary Search + SIMD

[Figure: SIMD Register A (8, 17) >= SIMD Register B (9, 9) yields SIMD Register C (0, -1). Legend: Separator, Search Key, Excluded, Search Space.]
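
A single search step with two separators can be sketched as a scalar emulation in Python. The values 8, 17 and 9 are taken from the figure; the helper names broadcast, cmp_ge and next_partition are hypothetical, and reading the number of 0 lanes as the index of the next partition is an assumption about how the comparison result is used.

```python
def broadcast(value, lanes):
    """Fill every lane of a 'register' with the same value."""
    return [value] * lanes


def cmp_ge(a, b):
    """Lane-wise a >= b; a true lane becomes -1, a false lane becomes 0."""
    return [-1 if x >= y else 0 for x, y in zip(a, b)]


def next_partition(separators, search_key):
    """Compare all separators with the search key in one step.

    Returns the index of the partition that may contain the search key,
    which equals the number of separators smaller than the search key.
    """
    register_a = separators                              # e.g. (8, 17)
    register_b = broadcast(search_key, len(separators))  # e.g. (9, 9)
    register_c = cmp_ge(register_a, register_b)          # e.g. (0, -1)
    return register_c.count(0)


print(cmp_ge([8, 17], broadcast(9, 2)))   # [0, -1]
print(next_partition([8, 17], 9))         # 1
```

Partition 1 is the range between the separators 8 and 17, so two comparisons are resolved in one step instead of one.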

Problem: SIMD on CPU

SIMD on the CPU does not support scatter and gather functionality.

[Figure: SIMD load(start position) fills a 4 x 32-bit SIMD register with the values 8, 9, 10, 11.]

Solution: K-ary Search by Schlegel et al.

[Figure: 3-ary search tree (k = 3) and its linearized order for Search Key = 9. Legend: Separator, Search Key, Excluded, Search Space.]

Applied K-ary Search

[Figure: k-ary search on the 3-ary search tree and its linearized order for Search Key = 9, iterations 1 to 3. Legend: Separator, Search Key, Excluded, Search Space.]

Note (Steffen Zeuch): Offset Calculation
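
The combination of linearized order and offset calculation can be sketched in Python as follows. This is a scalar emulation under stated assumptions, not the authors' implementation: the node holds exactly k^h - 1 keys, the layout is breadth-first, and the children of breadth-first node j are the nodes j * k + 1 + c. The names linearize and kary_search and the 26 example keys are hypothetical; the slides do not spell out the offset formula.

```python
def linearize(sorted_keys, k):
    """Reorder sorted keys into the breadth-first layout of a k-ary search tree."""
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(sorted_keys)
    capacity = 1
    while capacity < n + 1:
        capacity *= k
    if capacity != n + 1:
        raise ValueError("this sketch needs exactly k**h - 1 keys")
    out = []
    level = [sorted_keys]                 # partitions of the current tree level
    while level and level[0]:
        next_level = []
        for part in level:
            step = (len(part) + 1) // k
            # the k-1 separators of this node are stored contiguously
            out.extend(part[i * step - 1] for i in range(1, k))
            # the k partitions between the separators form the child nodes
            next_level.extend(part[i * step:(i + 1) * step - 1] for i in range(k))
        level = next_level
    return out


def kary_search(linear, k, search_key):
    """Return (rank, iterations); rank is the number of keys < search_key."""
    n = len(linear)
    node, rank, iterations = 0, 0, 0
    subtree = n + 1
    while node * (k - 1) < n:
        iterations += 1
        start = node * (k - 1)                     # offset of the node's first key
        separators = linear[start:start + k - 1]   # one contiguous "SIMD load"
        mask = [-1 if s >= search_key else 0 for s in separators]
        child = mask.count(0)                      # separators smaller than the key
        subtree //= k
        rank += child * subtree                    # keys skipped to the left
        node = node * k + 1 + child                # breadth-first index of the child
    return rank, iterations


keys = list(range(1, 27))                 # 26 = 3**3 - 1 hypothetical keys
linear = linearize(keys, 3)
print(linear[:8])                         # [9, 18, 3, 6, 12, 15, 21, 24]
print(kary_search(linear, 3, 9))          # (8, 3)
```

Because the separators of one node sit next to each other in the linearized order, each step needs only a contiguous load, which avoids the missing gather functionality.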

Degree of Parallelism

SIMD Bandwidth   Search Method   Data Type   Parallel Comparisons
128-bit          K-ary Search    8-bit       16
                                 16-bit      8
                                 32-bit      4
                                 64-bit      2
                 Binary Search   All         1
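
The values in the table follow from dividing the SIMD bandwidth by the width of the data type. A minimal sketch in Python, where the function name parallel_comparisons is a hypothetical helper:

```python
def parallel_comparisons(register_bits, key_bits):
    """Number of keys of key_bits width that fit into one SIMD register."""
    if key_bits <= 0 or register_bits % key_bits != 0:
        raise ValueError("key width must evenly divide the register width")
    return register_bits // key_bits


for key_bits in (8, 16, 32, 64):
    print(key_bits, parallel_comparisons(128, key_bits))
# 8 16
# 16 8
# 32 4
# 64 2
```

Binary search inspects one separator per step regardless of the data type, which is why its row in the table always reads 1.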

Segmented Tree

Change the inner-node search algorithm from the commonly used binary search to k-ary search.

Problem: Unfilled Nodes

[Figure: 3-ary search tree and linearized order of an unfilled node; figure label: S_max + 1.]

K-ary requirement: multiple of k-1 keys
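
One way to meet the k-ary requirement for an unfilled node is to pad it. The sketch below assumes that the label S_max + 1 on the slide denotes a fill value one larger than the largest key in the node; this reading, the function name pad_node and the example keys are assumptions, not details confirmed by the slide.

```python
def pad_node(sorted_keys, k):
    """Pad a sorted, non-empty key list to a multiple of k-1 keys.

    The fill value is S_max + 1, one more than the largest key, so the
    padded list stays sorted.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if not sorted_keys:
        raise ValueError("an empty node has no S_max")
    fill = sorted_keys[-1] + 1                   # S_max + 1
    missing = -len(sorted_keys) % (k - 1)        # keys needed for a full multiple
    return sorted_keys + [fill] * missing


print(pad_node([1, 3, 5, 7, 9], 3))              # [1, 3, 5, 7, 9, 10]
```

For every search key up to S_max + 1, the number of keys smaller than the search key is the same in the padded and the unpadded node, so the padding does not change the result of such a search.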

Reordering

New keys require reordering:

Sorting → Inserting → Linearizing

Exceptions:

Empty Node

Key is greater than the largest existing key

Segmented Tree

Advantages:

High resource utilization

Fewer iterations required

Binary search: log_2 n vs. k-ary search: log_k n

Disadvantages:

Reordering overhead

Large data types decrease performance
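
The iteration counts named under the advantages, log_2 n for binary search and log_k n for k-ary search, can be compared with a few lines of Python. The sketch assumes that a search tree of height h with k-1 separators per node covers k^h - 1 keys and that k equals the number of parallel comparisons plus one; the function name search_iterations and the node size of 26 keys are hypothetical.

```python
def search_iterations(n, k):
    """Smallest h such that a k-ary search tree of height h holds n keys,
    i.e. the smallest h with k**h >= n + 1 (integer arithmetic only)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    h, capacity = 0, 1
    while capacity < n + 1:
        capacity *= k
        h += 1
    return h


n = 26                                    # hypothetical number of keys in a node
print(search_iterations(n, 2))            # 5  (binary search)
print(search_iterations(n, 3))            # 3  (two separators per step)
print(search_iterations(n, 5))            # 3  (four 32-bit comparisons per step)
print(search_iterations(n, 17))           # 2  (sixteen 8-bit comparisons per step)
```

The numbers also reflect the stated disadvantage: a larger data type lowers k, so the advantage over binary search shrinks.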

Segmented Trie

[Figure: table with the columns Key (Dec), Key (Hex) and Partial Key (Hex); trie with Level 1 and Level 2.]
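
The split of a key into partial keys, one per trie level, can be sketched in Python. The sketch assumes 16-bit keys cut into two 8-bit partial keys to match the two levels in the figure; the concrete key widths, the function names partial_keys, insert and lookup, and the example keys are assumptions. Nested dictionaries stand in for the trie nodes, so the dictionary lookup takes the place of the node-internal k-ary search.

```python
def partial_keys(key, key_bits=16, segment_bits=8):
    """Split key into key_bits // segment_bits partial keys, most significant first."""
    if not 0 <= key < (1 << key_bits):
        raise ValueError("key does not fit into key_bits")
    levels = key_bits // segment_bits
    mask = (1 << segment_bits) - 1
    return [(key >> (segment_bits * (levels - 1 - i))) & mask for i in range(levels)]


def insert(trie, key, value, key_bits=16):
    """Store value under key; keys with a common prefix share upper-level nodes."""
    parts = partial_keys(key, key_bits)
    node = trie
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def lookup(trie, key, key_bits=16):
    """Return the stored value or None; stops at the first missing partial key."""
    node = trie
    for part in partial_keys(key, key_bits):
        if part not in node:
            return None                   # early termination
        node = node[part]
    return node


trie = {}
insert(trie, 1000, "a")                   # 1000 = 0x03E8 -> partial keys 0x03, 0xE8
insert(trie, 1001, "b")                   # 1001 = 0x03E9 -> shares prefix 0x03
print([hex(p) for p in partial_keys(1000)])   # ['0x3', '0xe8']
print(lookup(trie, 1001))                     # b
print(lookup(trie, 5000))                     # None
```

Both example keys share the level-1 partial key 0x03, which is stored only once, and the lookup of 5000 stops at level 1 because its first partial key 0x13 is absent.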

Segmented Trie

Advantages:

High SIMD search performance

Prefix compression

Early termination

Disadvantages:

Fixed level count

Reordering overhead

Note (Steffen Zeuch): 64-bit data type: 2 parallel comparisons; 8-bit data type: 16 parallel comparisons

SegTree vs. SegTrie

                       SegTree                SegTrie
Derived From           B+-Tree                Prefix B-Tree
Number of Iterations   Tree Height            Max. #Level (Early termination)
Number of Levels       Dynamic                Static (Pre-defined)
DOP                    Depends on Data Type   16 (8-bit)

Test Setup

HW/SW Configuration:

CPU: Intel Xeon 5520, 4 x 2.26 GHz

L1: 32KB, L2: 256 KB, L3: 8 MB, MM: 8 GB

Cacheline: 128 Byte, SIMD bandwidth: 128 Bit

Windows 7 64-bit Professional

Test Dataset:

Synthetically generated, ascending, starting at 0

Evaluation: Bitmask

Three Algorithms:

1. Bit Shifting

2. Case-Switch

3. PopCnt

[Figure: SIMD Register A (8, 17) >= SIMD Register B (9, 9) yields SIMD Register C (0, -1).]
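
The slide only names the three algorithms, so the following Python sketch is one plausible reading of each and not the authors' code. It assumes that the comparison result is first condensed into an integer bitmask with one bit per lane, the role a movemask-style instruction plays on the CPU, and that the wanted position is the number of separators smaller than the search key. All function names and the four example separators are hypothetical.

```python
def movemask(lanes):
    """Condense a comparison result (lanes of 0 or -1) into one bit per lane."""
    return sum(1 << i for i, lane in enumerate(lanes) if lane != 0)


def position_by_shifting(mask, lanes):
    """Bit shifting: move through the mask until the first set bit appears."""
    pos = 0
    while pos < lanes and not (mask >> pos) & 1:
        pos += 1
    return pos


def position_by_switch(mask, lanes):
    """Case-switch: map every valid mask directly to its position.

    Sorted separators only produce masks of the form 0...01...1 read from
    the highest lane down, so lanes + 1 cases are enough.
    """
    full = (1 << lanes) - 1
    cases = {full & ~((1 << p) - 1): p for p in range(lanes + 1)}
    return cases[mask]


def position_by_popcount(mask, lanes):
    """PopCnt: lanes minus the number of set bits."""
    return lanes - bin(mask).count("1")


separators = [3, 8, 17, 20]               # hypothetical sorted separators
search_key = 9
result = [-1 if s >= search_key else 0 for s in separators]
mask = movemask(result)
print(result, bin(mask))                  # [0, 0, -1, -1] 0b1100
print(position_by_shifting(mask, 4),
      position_by_switch(mask, 4),
      position_by_popcount(mask, 4))      # 2 2 2
```

All three variants return the same position for every mask that sorted separators can produce; they differ only in how much work is needed to get from the mask to the position.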

Evaluation: SegTree

Evaluation: SegTrie

Our Contributions

B+-Tree and prefix B-Tree using SIMD

Transformation and search algorithm for breadth-first and depth-first data layout

Three algorithms for interpreting a SIMD comparison result

Solution for an arbitrary key count

Thanks

Backup
