Vectorization vs. Compilation in Query Execution Juliusz Sompolski Peter Boncz Marcin Zukowski June 13th, 2011 DaMoN 2011, Athens, Greece

  • Vectorization vs. Compilation in Query Execution

    Juliusz Sompolski, Peter Boncz, Marcin Zukowski

    June 13th, 2011, DaMoN 2011, Athens, Greece

  • 2

    Interpreted DBMS

  • 4

    Vectorization vs. Compilation

    JIT compilation

    Vectorization

    CIDR 2005

    P. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-Pipelining Query Execution. In Proc. CIDR, Asilomar, CA, USA, 2005.

  • 5

    Vectorization vs. Compilation

    • Sure :-). These are orthogonal techniques, and they can be combined.

    • Our study: Is it worth combining them?
      – If you have vectorization (us!), should you do compilation?
      – If you have compilation, should you process data in vectors?

    • Our answer: Yes it is!

  • 6

    Compilation: Single-loop

    • Compilation as proposed so far is “single-loop” compilation.
      – Processing as in tuple-at-a-time system.

    for each tuple if(oid >= 100 && oid = 100 AND oid
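
The single-loop style can be sketched as one fused loop that evaluates the whole predicate per tuple. This is a minimal illustration only: the function name, the `lo`/`hi` parameters, and the inclusive upper bound are assumptions, since the predicate on the slide is cut off.

```c
#include <stddef.h>

/* Hypothetical single-loop selection: one pass, whole predicate per tuple.
 * The bounds lo/hi and the inclusive comparison are assumptions; the
 * slide's predicate is truncated. Writes qualifying positions to out[]
 * and returns how many were written. */
size_t select_range_single_loop(const int *oid, size_t n,
                                int lo, int hi, size_t *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        /* Both comparisons sit in the same loop body, as in a
         * tuple-at-a-time engine. */
        if (oid[i] >= lo && oid[i] <= hi)
            out[m++] = i;
    }
    return m;
}
```

Calling it with `lo = 100` and some upper bound returns the positions of the qualifying tuples without materializing any intermediate result.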

  • 7

    Vectorization: Multi-loop

    • Vectorization is “multi-loop” by definition.
      – Basic operations performed vector-at-a-time.
      – Interpretation overhead amortized.
      – Materialization of each step’s result.

    while(tuples) Get vector of n tuples; for(i = 0,m=0; i= 100) sel[m++] = i; for(i = 0,k=0; i
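
A rough sketch of the multi-loop style described above: each primitive is a simple loop over a whole vector, and the result of each step is materialized as a selection vector (`sel[]`). The primitive names, signatures, and the upper-bound predicate are assumptions made for illustration; only the `oid >= 100` filter and the `sel[m++] = i` pattern are visible on the slide.

```c
#include <stddef.h>

/* Primitive 1 (hypothetical name): scan a full vector and materialize
 * the positions whose value is >= lo into sel_out[]. */
size_t select_ge(const int *col, size_t n, int lo, size_t *sel_out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (col[i] >= lo)
            sel_out[m++] = i;
    return m;
}

/* Primitive 2 (hypothetical name): refine an existing selection vector,
 * keeping only positions whose value is <= hi. */
size_t select_le_sel(const int *col, const size_t *sel_in, size_t m,
                     int hi, size_t *sel_out)
{
    size_t k = 0;
    for (size_t i = 0; i < m; i++)
        if (col[sel_in[i]] <= hi)
            sel_out[k++] = sel_in[i];
    return k;
}
```

An interpreter would call `select_ge` and then `select_le_sel` once per vector, so the per-call interpretation cost is spread over all tuples in the vector, at the price of writing and re-reading the intermediate `sel[]`.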

  • 8

    Multi-loop compilation

    • Multi-loop compilation is often best!
      – Compiling small fragments takes less compilation time and is more reusable.
      – Sometimes benefits of a tight loop are bigger than materialization cost.

    while(tuples) Get vector of n tuples; for(i = 0,m=0; i= 100) sel[m++] = i; for(i = 0,k=0; i

  • 10

    Case studies

    • Projections

    • Selections

    • Hash lookups

    Multi-loop on modern hardware:

    Easier SIMD

    Avoids branch mispredictions

    Improves memory access pattern

  • 12

    Hash lookup algorithm

    pos = B[hash_keys(probe_keys)]
    if (pos) {
      do { // pos == 0 reserved for miss.
        if (keys_equal(probe_keys, V[pos].keys)) {
          fetch_value_columns(V[pos]);
          break; // match
        }
      } while(pos = next in chain); // collision or miss
    }

    Interpretation:
    • Type of keys.
    • Multi-attribute keys.
    • Type of fetched columns.
    • Number of fetched columns.
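
The pseudocode above can be made concrete for one fixed case. The sketch below assumes a single 32-bit integer key and a single value column, which is exactly the kind of specialization that removes the interpretation points listed on the slide. The `Entry` and `HashTable` layouts, the multiplicative hash, and the `ht_insert` helper are illustrative assumptions; only `B`, `V`, and "position 0 means miss" come from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: V[0] is unused so that position 0 can mean "miss". */
typedef struct {
    int32_t  key;
    int32_t  val;
    uint32_t next;   /* next position in the bucket chain, 0 = end */
} Entry;

typedef struct {
    uint32_t *B;     /* bucket heads; size is mask + 1 (a power of two) */
    uint32_t  mask;
    Entry    *V;     /* value area, indexed from 1 */
} HashTable;

/* Assumed hash function (multiplicative hashing), for illustration only. */
static uint32_t hash_key(int32_t key, uint32_t mask)
{
    return ((uint32_t)key * 2654435761u) & mask;
}

/* Hypothetical helper: link entry `pos` at the head of its bucket chain. */
void ht_insert(HashTable *ht, uint32_t pos, int32_t key, int32_t val)
{
    uint32_t b = hash_key(key, ht->mask);
    ht->V[pos].key  = key;
    ht->V[pos].val  = val;
    ht->V[pos].next = ht->B[b];
    ht->B[b] = pos;
}

/* Probe one key: returns 1 and writes *val_out on a match, 0 on a miss. */
int ht_lookup(const HashTable *ht, int32_t probe_key, int32_t *val_out)
{
    uint32_t pos = ht->B[hash_key(probe_key, ht->mask)];
    while (pos != 0) {                      /* pos == 0 reserved for miss */
        if (ht->V[pos].key == probe_key) {
            *val_out = ht->V[pos].val;      /* match */
            return 1;
        }
        pos = ht->V[pos].next;              /* collision: follow chain */
    }
    return 0;                               /* miss */
}
```

A generic engine would have to decide key types, key count, and fetched columns at run time for every probe; here those decisions are fixed in the code.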

  • 13

    Single-loop hash lookup: avoid interpretation

    for (i=0; i
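
Since the loop on this slide is cut off, here is a hedged sketch of what a single-loop compiled lookup could look like for a two-attribute key and one fetched column. The `Tuple` layout, the hash mixing, and every name are assumptions, not the authors' generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed tuple layout in the value area V; V[0] is unused. */
typedef struct {
    int32_t  k1, k2;
    int32_t  val1;
    uint32_t next;   /* 0 = end of chain */
} Tuple;

/* One loop does everything per probe tuple: hash, bucket fetch, chain
 * walk, key comparison and value fetch. Key types and column counts are
 * fixed at compile time, so nothing is interpreted inside the loop. */
void lookup_single_loop(const uint32_t *B, uint32_t mask, const Tuple *V,
                        const int32_t *k1, const int32_t *k2, size_t n,
                        int32_t *res1, uint8_t *found)
{
    for (size_t i = 0; i < n; i++) {
        /* Assumed hash/rehash of the two key attributes. */
        uint32_t h = (uint32_t)k1[i] * 2654435761u;
        h = (h ^ (uint32_t)k2[i]) * 2654435761u;

        uint32_t pos = B[h & mask];   /* depends on the hash result */
        found[i] = 0;
        while (pos != 0) {            /* data-dependent trip count */
            if (V[pos].k1 == k1[i] && V[pos].k2 == k2[i]) {
                res1[i] = V[pos].val1;
                found[i] = 1;
                break;
            }
            pos = V[pos].next;        /* depends on the previous load */
        }
    }
}
```

Within one iteration each memory access depends on the one before it, and the inner `while` runs a data-dependent number of times, which is the setting the following slides on dependencies and branch predictability refer to.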

  • 14

    Single-loop hash lookup: dependencies

    for (i=0; i

  • 17

    Single-loop hash lookup: branch predictability

    for (i=0; i

  • 33

    Multi-loop hash lookup

    • Hash vector of k1; rehash vector of k2; fetch vector of pos[] from B.

    • Check k1 for pos[]; recheck k2 for pos[].

    • Select match / miss into match[] and miss[].

    • Fetch v1 for match[]; fetch v2 for match[]; fetch v3 for match[].

    • Fetch new pos[] from next in miss[].

    • Loop until pos[] empty.

    // base = &V[0].key1;for(i=0;i
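
The steps on this slide can be sketched as a chain of small vector primitives, each a tight loop whose output is materialized for the next one. For brevity this sketch assumes a single key and a single value column (the slide uses k1, k2 and v1 to v3), vectors of at most `VSIZE` probes, and invented primitive names.

```c
#include <stddef.h>
#include <stdint.h>

#define VSIZE 1024   /* assumed maximum vector length */

typedef struct { int32_t key; int32_t val; uint32_t next; } Entry;

/* Hash a vector of keys (assumed multiplicative hash). */
static void map_hash(const int32_t *keys, size_t n, uint32_t mask,
                     uint32_t *bucket)
{
    for (size_t i = 0; i < n; i++)
        bucket[i] = ((uint32_t)keys[i] * 2654435761u) & mask;
}

/* Fetch pos[] from B; todo[] lists probes with a non-empty bucket. */
static size_t map_fetch_pos(const uint32_t *B, const uint32_t *bucket,
                            size_t n, uint32_t *pos, size_t *todo)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        pos[i] = B[bucket[i]];
        if (pos[i] != 0) todo[m++] = i;
    }
    return m;
}

/* Check keys for pos[], splitting todo[] into match[] and miss[]. */
static void sel_check(const Entry *V, const int32_t *keys,
                      const uint32_t *pos, const size_t *todo, size_t m,
                      size_t *match, size_t *n_match,
                      size_t *miss, size_t *n_miss)
{
    size_t a = 0, b = 0;
    for (size_t j = 0; j < m; j++) {
        size_t i = todo[j];
        if (V[pos[i]].key == keys[i]) match[a++] = i;
        else                          miss[b++] = i;
    }
    *n_match = a;
    *n_miss = b;
}

/* Fetch the value column for match[]. */
static void map_fetch_val(const Entry *V, const uint32_t *pos,
                          const size_t *match, size_t n_match,
                          int32_t *out, uint8_t *found)
{
    for (size_t j = 0; j < n_match; j++) {
        size_t i = match[j];
        out[i] = V[pos[i]].val;
        found[i] = 1;
    }
}

/* Fetch new pos[] from next for miss[]; returns probes still chained. */
static size_t map_next(const Entry *V, uint32_t *pos,
                       const size_t *miss, size_t n_miss, size_t *todo)
{
    size_t m = 0;
    for (size_t j = 0; j < n_miss; j++) {
        size_t i = miss[j];
        pos[i] = V[pos[i]].next;
        if (pos[i] != 0) todo[m++] = i;
    }
    return m;
}

/* Driver: requires n <= VSIZE. */
void lookup_multi_loop(const uint32_t *B, uint32_t mask, const Entry *V,
                       const int32_t *keys, size_t n,
                       int32_t *out, uint8_t *found)
{
    uint32_t bucket[VSIZE], pos[VSIZE];
    size_t todo[VSIZE], match[VSIZE], miss[VSIZE], n_match, n_miss;

    for (size_t i = 0; i < n; i++) found[i] = 0;
    map_hash(keys, n, mask, bucket);
    size_t m = map_fetch_pos(B, bucket, n, pos, todo);
    while (m > 0) {                       /* loop until pos[] empty */
        sel_check(V, keys, pos, todo, m, match, &n_match, miss, &n_miss);
        map_fetch_val(V, pos, match, n_match, out, found);
        m = map_next(V, pos, miss, n_miss, todo);
    }
}
```

Each primitive touches `V[pos[i]]` again, so a matching tuple is visited once per checked key and once per fetched column in this style.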

  • 40

    Single-loop hash lookup

    for (i=0; i

  • 43

    Multi-loop compiled hash lookup

    Hash/rehash and fetch vector of Pos[] from B

    For each element pos in Pos[]:
      Check keys of V[pos].
      if(match):
        fetch V[pos] val1, val2, val3 into result
      else:
        fetch V[pos] next into new Pos[]

    Repeat until Pos[] empty

    Independent memory accesses in different loop iterations

    Reads tuple once.
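
The outline above might translate into two compiled loops, roughly as follows. This is a sketch under assumptions: the `Tuple` layout, the hash mixing, the `VSIZE` limit, and all names are illustrative and not taken from the authors' implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define VSIZE 1024   /* assumed maximum vector length */

/* Assumed tuple layout in the value area V; V[0] is unused. */
typedef struct {
    int32_t  k1, k2;
    int32_t  val1, val2, val3;
    uint32_t next;   /* 0 = end of chain */
} Tuple;

/* Requires n <= VSIZE. */
void lookup_multi_loop_compiled(const uint32_t *B, uint32_t mask,
                                const Tuple *V,
                                const int32_t *k1, const int32_t *k2,
                                size_t n,
                                int32_t *r1, int32_t *r2, int32_t *r3,
                                uint8_t *found)
{
    uint32_t pos[VSIZE];
    size_t todo[VSIZE];
    size_t m = 0;

    /* Loop 1: hash/rehash and fetch the vector of Pos[] from B.
     * Iterations are independent of each other. */
    for (size_t i = 0; i < n; i++) {
        uint32_t h = (uint32_t)k1[i] * 2654435761u;
        h = (h ^ (uint32_t)k2[i]) * 2654435761u;
        pos[i] = B[h & mask];
        found[i] = 0;
        if (pos[i] != 0) todo[m++] = i;
    }

    /* Loop 2, repeated until Pos[] is empty: one visit to V[pos] does
     * the key check and either the value fetch or the next fetch. */
    while (m > 0) {
        size_t m_next = 0;
        for (size_t j = 0; j < m; j++) {
            size_t i = todo[j];
            const Tuple *t = &V[pos[i]];
            if (t->k1 == k1[i] && t->k2 == k2[i]) {
                r1[i] = t->val1;
                r2[i] = t->val2;
                r3[i] = t->val3;
                found[i] = 1;
            } else {
                pos[i] = t->next;
                /* Compact todo[] in place; m_next never exceeds j. */
                if (pos[i] != 0) todo[m_next++] = i;
            }
        }
        m = m_next;
    }
}
```

Compared with a fully vectorized version, each tuple in `V` is read once per chain step, while the loads in different iterations of the inner `for` loop remain independent of one another.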

  • 44

    Hash lookup benchmarks

    • Experiment 1: Probing with varying match-ratio.

    • Multi-loop compiled is most robust.

  • 45

    Hash lookup benchmarks

    • Experiment 2: Reduced size of B[ ] array = more hash collisions

    • Multi-loop compiled is most robust.

  • 46

    Conclusions

    • Multi-loop compilation is often the best solution!
      – Better than vectorization alone.
      – Better than compilation working tuple-at-a-time.

    • More examples and case studies proving this point in the paper.

  • 47

    Thank you!