The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases


  • The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases

    Presentation by Aaron Kabcenell

    The adaptive radix tree: ARTful indexing for main-memory databases. Viktor Leis, Alfons Kemper, Thomas Neumann. International Conference on Data Engineering (ICDE), 2013

  • https://en.wikipedia.org/wiki/The_Starry_Night#/media/File:Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg

  • What is the problem?

  • Main Memory Indexing

    Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, Alfons Kemper. ACM SIGMOD International Conference on Management of Data, 2016

    ?

  • Why is it important?

  • OLTP Workloads limited by Index Performance

    Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation. Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, Alfons Kemper. ACM SIGMOD International Conference on Management of Data, 2016

  • Why is it hard?

  • Hash Tables

    Fast, O(1) access time

    x Point queries only

    x Overflow forces the table to be reorganized, causing periodic latency spikes

    A study of index structures for main memory database management systems. T. J. Lehman and M. J. Carey. International Conference on Very Large Databases (VLDB), 1986

  • Trees

    Keeps elements ordered

    x Not ideal for modern hardware (cache misses, pipeline stalls)

    Modern B-Tree Techniques, by Goetz Graefe, Foundations and Trends in Databases, 2011

  • Can we get fast, fully featured indexing?

  • Why do existing solutions not work?

  • T-Trees

    A study of index structures for main memory database management systems. T. J. Lehman and M. J. Carey. International Conference on Very Large Databases (VLDB), 1986

    Balance of space overhead and search speed

    x Significant amounts of data stored per node, but only two pointers used

    x Poor cache behavior

  • Cache Sensitive B+-Trees

    J. Rao and K. A. Ross, “Making B+ trees cache conscious in main memory”, SIGMOD, 2000.

    Stores only one child pointer per node

    Higher fanout, so more keys fit in each cache line

    x Many comparisons cause pipeline stalls

  • Fast Architecture Sensitive Trees

    C. Kim et al., “FAST: fast architecture sensitive tree search on modern CPUs and GPUs”, SIGMOD, 2010.

    Binary Tree

    SIMD Blocking

    Three Level Hierarchy

    Reduce comparisons by matching structure to SIMD vector size

    Reduce cache misses by matching structure to cache line size

    Pointer-free: nodes are stored in arrays and located by offset calculations

    x Expensive updates

  • Radix Tree

    • Two factors determine performance:

      • k: key length in bits

      • s: span, the number of key bits stored in each node

    • Tree has k/s levels

    • Each node has 2^s pointers
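    The trade-off between k and s can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `radix_tree_shape` and the choice of 32-bit keys are assumptions, and the level count is rounded up for spans that do not divide the key length evenly.

```python
def radix_tree_shape(key_bits: int, span_bits: int) -> tuple[int, int]:
    """Return (levels, pointers_per_node) for a radix tree.

    key_bits is k and span_bits is s from the slide above.
    """
    levels = -(-key_bits // span_bits)      # k/s, rounded up
    pointers_per_node = 2 ** span_bits      # 2^s child slots per node
    return levels, pointers_per_node


if __name__ == "__main__":
    # Assumed example: 32-bit keys with three different spans.
    for s in (1, 4, 8):
        levels, fanout = radix_tree_shape(32, s)
        print(f"span={s}: {levels} levels, {fanout} pointers per node")
```

    A larger span makes the tree shallower but each node wider, which is the tension the rest of the talk addresses.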

    [Figure: radix tree with root A, children N and R, and leaves D, T, Y (under N) and E, T (under R)]

  • Radix Tree

    Cost of operations depends on the key length, not on the number of keys

    Keys are ordered and stored implicitly

    Tree shape is independent of insertion order, so no rebalancing is needed

    x Mostly studied for character strings

    x Poor space usage due to large number of null paths
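    The properties above can be seen in a minimal, non-adaptive radix tree with a span of 8 bits. This is a sketch under stated assumptions, not the paper's implementation: the names `RadixNode`, `insert`, `contains`, `keys_in_order`, `build`, and `slot_usage` are hypothetical, and the five example keys are read off the slide's figure.

```python
class RadixNode:
    def __init__(self):
        self.children = [None] * 256   # 2^8 slots; most stay empty
        self.is_key = False            # True if a key ends at this node


def insert(root, key: bytes):
    node = root
    for byte in key:                   # one level per key byte
        if node.children[byte] is None:
            node.children[byte] = RadixNode()
        node = node.children[byte]
    node.is_key = True


def contains(root, key: bytes) -> bool:
    node = root
    for byte in key:
        node = node.children[byte]
        if node is None:
            return False
    return node.is_key


def keys_in_order(node, prefix=b""):
    """Yield keys in sorted order; keys are rebuilt from the path."""
    if node.is_key:
        yield prefix
    for byte, child in enumerate(node.children):
        if child is not None:
            yield from keys_in_order(child, prefix + bytes([byte]))


def build(keys):
    root = RadixNode()
    for k in keys:
        insert(root, k)
    return root


def slot_usage(node):
    """Return (used_slots, total_slots) over the whole tree."""
    used = sum(c is not None for c in node.children)
    total = len(node.children)
    for c in node.children:
        if c is not None:
            u, t = slot_usage(c)
            used, total = used + u, total + t
    return used, total


words = [b"AND", b"ANT", b"ANY", b"ARE", b"ART"]
forward = build(words)
backward = build(reversed(words))      # same keys, opposite insertion order

if __name__ == "__main__":
    print(list(keys_in_order(forward)))
    print(list(keys_in_order(backward)) == list(keys_in_order(forward)))
    print(slot_usage(forward))
```

    Both insertion orders produce the same sorted traversal, and `slot_usage` shows how few of the allocated child slots are occupied, which is the space problem that motivates adaptive nodes.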

  • What is the core intuition for the solution?

  • Adaptive Nodes to Reduce Space Consumption

  • Adaptive Node Types
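    The slide's figure is not reproduced in this text. In the ART paper the inner node types are Node4, Node16, Node48, and Node256; the sketch below only illustrates the idea of growing a node through those capacities as children are added. It is a simplification and an assumption throughout: the class name `AdaptiveNode` is hypothetical, and it uses one sorted-array layout for every capacity, whereas the paper uses a different internal layout for the larger node types.

```python
import bisect

CAPACITIES = (4, 16, 48, 256)   # capacities of the four node types


class AdaptiveNode:
    """Inner node that allocates room only for the children it has."""

    def __init__(self):
        self.capacity = CAPACITIES[0]
        self.keys = []        # sorted key bytes present in this node
        self.children = []    # children[i] belongs to keys[i]

    def find_child(self, byte: int):
        i = bisect.bisect_left(self.keys, byte)
        if i < len(self.keys) and self.keys[i] == byte:
            return self.children[i]
        return None

    def add_child(self, byte: int, child):
        i = bisect.bisect_left(self.keys, byte)
        if i < len(self.keys) and self.keys[i] == byte:
            self.children[i] = child          # overwrite existing entry
            return
        if len(self.keys) == self.capacity:
            self._grow()                      # switch to next larger type
        self.keys.insert(i, byte)
        self.children.insert(i, child)

    def _grow(self):
        # Never called at capacity 256: a full node already holds every byte.
        self.capacity = CAPACITIES[CAPACITIES.index(self.capacity) + 1]


if __name__ == "__main__":
    node = AdaptiveNode()
    for b in range(20):
        node.add_child(b, f"child-{b}")
        print(b + 1, "children -> capacity", node.capacity)
```

    A node with few children stays small, so sparsely populated regions of the key space no longer pay for 256 pointers per node.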

  • Path Compression

  • Worst-Case Space Consumption

  • Binary-Comparable Keys

    • Unsigned integers:

      • Binary representation already sorted

    • Signed integers:

      • Flip sign bit and store as unsigned integers

    • Floating-point numbers:

      • Separate into positive, negative, normalized, denormalized, NaN, Inf, or 0

      • Reorder and store as unsigned integers

    • Character strings:

      • Standard libraries available

    • Null:

      • Add one byte to key length to encode Null value

    • Compound keys:

      • Transform attributes individually and concatenate
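    A few of these transformations can be sketched in code. The following is a minimal illustration under stated assumptions, not the paper's implementation: all function names are hypothetical, integers are assumed to be 32 bits wide and stored big-endian so that bytewise comparison matches numeric order, NULL is assumed to sort before every value, and the floating-point encoding uses the common sign-flip trick rather than the class-by-class reordering listed on the slide (it does not treat NaN or negative zero specially).

```python
import struct


def encode_uint32(x: int) -> bytes:
    # Big-endian bytes of an unsigned integer already compare in numeric order.
    return struct.pack(">I", x)


def encode_int32(x: int) -> bytes:
    # Flip the sign bit of the two's-complement value, then store as unsigned.
    return struct.pack(">I", (x & 0xFFFFFFFF) ^ 0x80000000)


def encode_float64(x: float) -> bytes:
    # Swapped-in technique (assumption): reinterpret as a 64-bit integer,
    # flip all bits of negatives and only the sign bit of non-negatives.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF
    else:
        bits ^= 1 << 63
    return struct.pack(">Q", bits)


def encode_nullable_uint32(x) -> bytes:
    # One extra leading byte: 0 marks NULL, 1 marks a present value.
    if x is None:
        return b"\x00" + bytes(4)
    return b"\x01" + encode_uint32(x)


def encode_compound(*parts: bytes) -> bytes:
    # Each attribute is transformed on its own, then concatenated.
    return b"".join(parts)


if __name__ == "__main__":
    values = [7, -1, 0, -2**31, 2**31 - 1]
    print(sorted(values, key=encode_int32))
```

    Because every encoder emits fixed-length output, concatenated compound keys compare attribute by attribute, from left to right.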

  • What is the setup of analysis/experiments? Is it sufficient?

  • Micro Benchmarks

    • Use 32-bit integers as keys

      • Path compression disabled for short keys

    • Two different key distributions

      • Dense: keys ranging from 1 to tree size

      • Sparse: each bit equally likely to be 0 or 1
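    The two key distributions are straightforward to generate. The sketch below is one possible reading of the slide, with assumptions labeled: the function names are hypothetical, the sparse generator draws uniformly random 32-bit values (so each bit is 0 or 1 with equal probability), and duplicates are discarded so the index holds exactly n distinct keys.

```python
import random


def dense_keys(n: int) -> list[int]:
    """Keys 1..n, so consecutive keys share their high-order bytes."""
    return list(range(1, n + 1))


def sparse_keys(n: int, seed: int = 0) -> list[int]:
    """n distinct uniformly random 32-bit keys (seed is an assumption)."""
    rng = random.Random(seed)
    keys = set()
    while len(keys) < n:
        keys.add(rng.getrandbits(32))   # every bit is 0 or 1 with p = 0.5
    return list(keys)


if __name__ == "__main__":
    print(dense_keys(5))
    print([f"{k:08x}" for k in sparse_keys(5)])
```

    Dense keys concentrate in a small part of the key space, while sparse keys spread across all of it, so the two distributions populate a radix tree's nodes very differently.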

  • Search Performance

    [Figure panels: 65K, single thread; 16M, single thread; 256M, single thread; 16M, multi-thread]

  • Caching Effects

  • Updates

  • TPC-C Benchmark

    • OLTP benchmark describing a merchandising company

      • Includes selects, inserts, deletes

      • Write-heavy

    • Integrates ART into HyPer

      • Depends heavily on index performance

  • TPC-C Benchmark Using HyPer

  • Gaps and Next steps?

  • Gaps and Next Steps

    • Competing data structures were benchmarked using the authors' own implementations

    • Sparse vs. dense key performance

    • Ideal number and sizes of node types?

    • Synchronizing concurrent updates