column stores 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 5


  • column stores 2.0
    prof. Stratos Idreos

    class 5

    http://daslab.seas.harvard.edu/classes/cs165/

  • CS165, Fall 2015 Stratos Idreos 2/31

    what just happened?
    where is my data?

    email, cloud, social media, …

    can we design systems that let us know what is going on?

    worth thinking about 2.0

  • CS165, Fall 2015 Stratos Idreos 3/31

    cool papers 2.0

    The Case for RodentStore: An Adaptive, Declarative Storage System
    Philippe Cudré-Mauroux, Eugene Wu, Samuel Madden
    In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2009

    Abstraction Without Regret in Database Systems Building: a Manifesto
    Christoph Koch
    IEEE Data Eng. Bull. 37(1): 70-79 (2014)

    declarative processing and design

  • CS165, Fall 2015 Stratos Idreos 4/31

    design doc (optional)
    think, design, create a 1-2 page PDF doc and ask for feedback

    by email or ideally during office hours or sections

    do not worry about perfection: fail fast
    wrong ideas are ok if you eventually find out they are wrong :)
    (holds for midterms as well)

  • CS165, Fall 2015 Stratos Idreos 5/31

    am I keeping up ok?

    1) follow concepts in class
    2) keep up with project timeline & readings

    if not, then OH & sections for more help

  • CS165, Fall 2015 Stratos Idreos 6/31

    feedback on starter code, api & tests

  • CS165, Fall 2015 Stratos Idreos 7/31

    level             relative cost    analogy
    registers                          my head          ~0
    on chip cache     2x               this room        1 min
    on board cache    10x              this building    10 min
    memory            100x             New York         1.5 hours
    disk              100Kx            Pluto            2 years

    Jim Gray (IBM, Tandem, DEC, Microsoft)
    ACM Turing award
    ACM SIGMOD Edgar F. Codd Innovations award

  • CS165, Fall 2015 Stratos Idreos 8/31

    the way we store data defines the possible (efficient) access methods

  • CS165, Fall 2015 Stratos Idreos 9/31

    employee(id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int)

    (1, name1, office1, tel1, city1, salary1)
    (2, name2, office2, tel2, city2, salary2)
    (3, name3, office3, tel3, city3, salary3)
    (4, name4, office4, tel4, city4, salary4)
    (5, name5, office5, tel5, city5, salary5)
    (6, name6, office6, tel6, city6, salary6)
    (7, name7, office7, tel7, city7, salary7)
    (8, name8, office8, tel8, city8, salary8)
    (9, name9, office9, tel9, city9, salary9)

    data storage
    blocks < pages < files

    file

  • CS165, Fall 2015 Stratos Idreos 10/31

    free_offset, N, offset1-length1, offset2-length2, …

    free space

    slotted page

    scan null

    update var length
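
The slotted-page header listed on this slide (free_offset, N, then offset-length pairs) can be sketched in C as follows. This is a minimal illustration, not the course's starter code: the 4096-byte page size, the choice to grow records downward from the end of the page, and the names `page_t`, `page_init`, `page_insert`, and `page_get` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One slot directory entry: where a record starts and how long it is. */
typedef struct { uint16_t offset, length; } slot_t;

#define PAGE_SIZE   4096u                          /* assumed page size */
#define HEADER_SIZE (2u * sizeof(uint16_t))        /* free_offset + N */
#define MAX_SLOTS   ((PAGE_SIZE - HEADER_SIZE) / sizeof(slot_t))

/* The same bytes viewed either raw or as header + slot directory. */
typedef union {
    uint8_t bytes[PAGE_SIZE];
    struct {
        uint16_t free_offset;      /* first byte of the record area */
        uint16_t n;                /* N: number of slots in use */
        slot_t   slot[MAX_SLOTS];  /* directory grows up, records grow down */
    } h;
} page_t;

void page_init(page_t *p) {
    p->h.free_offset = PAGE_SIZE;  /* empty page: record area starts at the end */
    p->h.n = 0;
}

/* Append a variable-length record; returns its slot id, or -1 if it does not fit. */
int page_insert(page_t *p, const void *rec, uint16_t len) {
    size_t dir_end = HEADER_SIZE + (p->h.n + 1u) * sizeof(slot_t);
    if (dir_end + len > p->h.free_offset)
        return -1;                 /* directory and records would collide */
    p->h.free_offset = (uint16_t)(p->h.free_offset - len);
    memcpy(p->bytes + p->h.free_offset, rec, len);
    p->h.slot[p->h.n].offset = p->h.free_offset;
    p->h.slot[p->h.n].length = len;
    return p->h.n++;
}

/* Look up a record by slot id; its length is returned through len. */
const uint8_t *page_get(const page_t *p, int id, uint16_t *len) {
    assert(id >= 0 && id < p->h.n);
    *len = p->h.slot[id].length;
    return p->bytes + p->h.slot[id].offset;
}
```

Because every record is reached through its slot, a variable-length update can move the record inside the page and only rewrite the slot entry, which is one reason this layout suits variable-length fields.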

  • CS165, Fall 2015 Stratos Idreos 11/31

    row-store: ABCD
    column-store: A B C D

  • CS165, Fall 2015 Stratos Idreos 12/31

    a1 a2 a3 a4 a5 a6

    b1 b2 b3 b4 b5 b6

    c1 c2 c3 c4 c5 c6

    virtual ids / positional alignment

    positional lookups/joins
    A(i) = A + i * width(A)

    tuple 1, tuple 2, tuple 3, tuple 4, tuple 5, tuple 6

    A B C

    fixed-width + dense

    columns do not need to have the same width
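
The address formula A(i) = A + i * width(A) translates directly into pointer arithmetic. The sketch below assumes each column is one contiguous, fixed-width array; the `column_t` struct and the helpers `column_at`, `get_i32`, and `get_i64` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A fixed-width, dense column: value i lives at base + i * width. */
typedef struct {
    const uint8_t *base;   /* start of the column's contiguous array */
    size_t width;          /* bytes per value */
    size_t count;          /* number of values */
} column_t;

/* A(i) = A + i * width(A): no index, no stored tuple id. */
const void *column_at(const column_t *col, size_t i) {
    assert(i < col->count);
    return col->base + i * col->width;
}

/* Typed readers; memcpy avoids alignment and aliasing problems. */
int32_t get_i32(const column_t *col, size_t i) {
    int32_t v;
    assert(col->width == sizeof v);
    memcpy(&v, column_at(col, i), sizeof v);
    return v;
}

int64_t get_i64(const column_t *col, size_t i) {
    int64_t v;
    assert(col->width == sizeof v);
    memcpy(&v, column_at(col, i), sizeof v);
    return v;
}
```

Reading position i from a 4-byte column and from an 8-byte column yields two fields of the same tuple, which is why the columns can have different widths and still stay positionally aligned.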

  • CS165, Fall 2015 Stratos Idreos /31

    todaycolumn-stores 2.0

    13

  • CS165, Fall 2015 Stratos Idreos 14/31

    select min(C) from R where A min

    sequential access patterns, max 1 if
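
One way to read this plan is as three column-at-a-time operators: a select over A that emits qualifying positions, a fetch that gathers C at those positions, and a min over the fetched values. The sketch below assumes int columns and a hypothetical predicate A > v; the function names `select_gt`, `fetch`, and `min_of` are illustrative and not taken from the course code.

```c
#include <assert.h>
#include <stddef.h>

/* select: one sequential pass over A, one `if` per value.
 * Writes qualifying positions (ascending) to pos, returns how many. */
size_t select_gt(const int *a, size_t n, int v, size_t *pos) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > v)
            pos[j++] = i;
    return j;
}

/* fetch: positional lookups into C. Positions are ascending,
 * so C is also touched in one forward pass. */
void fetch(const int *c, const size_t *pos, size_t m, int *out) {
    for (size_t k = 0; k < m; k++)
        out[k] = c[pos[k]];
}

/* min: one sequential pass over the fetched values. */
int min_of(const int *vals, size_t m) {
    assert(m > 0);                 /* min over an empty input is undefined here */
    int best = vals[0];
    for (size_t k = 1; k < m; k++)
        if (vals[k] < best)
            best = vals[k];
    return best;
}
```

Each operator consumes and produces a whole column, so the intermediate position list and the fetched values are fully materialized between steps.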

  • CS165, Fall 2015 Stratos Idreos 15/31

    working over fixed width & dense columns

    for (i=0; i<n; i++)
        if (A[i]>v)
            res[j++]=i

    no function calls, no indirections, no auxiliary data, min ifs
    easy to prefetch next data values

    for (i=0;i
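
The "min ifs" point can be pushed one step further with predication, a technique the slide does not name: the loop writes the candidate position unconditionally and advances the output cursor by the result of the comparison, so the loop body has no data-dependent branch. A minimal sketch, assuming an int column and a greater-than predicate; `select_gt_predicated` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Branch-free selection over a dense int column.
 * res must have room for n entries: slot j is always written,
 * but only kept when the predicate holds. Returns the match count. */
size_t select_gt_predicated(const int *a, size_t n, int v, size_t *res) {
    assert(n == 0 || (a != NULL && res != NULL));
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        res[j] = i;            /* speculative write; j <= i, so in bounds */
        j += (a[i] > v);       /* comparison yields 0 or 1 */
    }
    return j;
}
```

The trade-off is one extra store per value in exchange for no branch mispredictions, which tends to pay off when selectivity is far from 0% or 100%.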

  • CS165, Fall 2015 Stratos Idreos 16/31

    B

  • CS165, Fall 2015 Stratos Idreos 17/31

    B

  • CS165, Fall 2015 Stratos Idreos 18/31

    disk: A B C D
    memory: A

    option1: early tuple reconstruction/materialization (ABC), row-store engine

    option2: column-store engine

  • CS165, Fall 2015 Stratos Idreos 19/31

    possible data flow patterns
    tuple at a time
    block/vector at a time
    column at a time

    B

  • CS165, Fall 2015 Stratos Idreos 20/31

    select min(C) from R where A

  • CS165, Fall 2015 Stratos Idreos 21/31

    CEO/Co-founder of Vectorwise (now Actian)
    now: “changing the world, one terabyte at a time”
    co-founder of Snowflake

    the beer analogy

    Marcin Zukowski, PhD

  • CS165, Fall 2015 Stratos Idreos 22/31

    registers
    on chip cache
    on board cache
    memory
    disk

    CPU

    cheaper / faster

    query plan: op1, op2, op3

    A B
    A B
    A

    size of vector

  • CS165, Fall 2015 Stratos Idreos 23/31

    tuple at a time - good for minimizing memory footprint
    bulk processing - good for minimizing function call overhead

    vectorized processing - somewhere in between
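
Vectorized processing can be sketched as the same operators run over fixed-size slices of the columns, so intermediates stay small while each operator still loops over many values per call. The example assumes int columns, a hypothetical predicate A > v, and a vector size of 1024 values; `VECTOR_SIZE` and `min_c_where_a_gt` are illustrative names, not part of any real system's API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define VECTOR_SIZE 1024   /* assumed; picked so intermediates fit in cache */

/* Evaluates: select min(C) from R where A > v, one vector at a time.
 * Sets *found to 0 when no row qualifies (the return value is then INT_MAX). */
int min_c_where_a_gt(const int *a, const int *c, size_t n, int v, int *found) {
    assert(found != NULL);
    size_t pos[VECTOR_SIZE];               /* the only intermediate buffer */
    int best = INT_MAX;
    *found = 0;

    for (size_t start = 0; start < n; start += VECTOR_SIZE) {
        size_t len = (n - start < VECTOR_SIZE) ? n - start : VECTOR_SIZE;

        /* op1: select over this vector of A */
        size_t m = 0;
        for (size_t i = 0; i < len; i++)
            if (a[start + i] > v)
                pos[m++] = start + i;

        /* op2 + op3: fetch C at the qualifying positions, fold into min */
        for (size_t k = 0; k < m; k++)
            if (c[pos[k]] < best)
                best = c[pos[k]];

        if (m > 0)
            *found = 1;
    }
    return best;
}
```

Setting `VECTOR_SIZE` to 1 degenerates to tuple-at-a-time, and setting it to n gives column-at-a-time, which is the sense in which vectorized processing sits between the two.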

  • CS165, Fall 2015 Stratos Idreos 24/31

    history/timeline

    ~1960s: tuple at a time

    1980s: ideas about block processing

    2005: Vectorwise

    tuple at a time tuple at a time

    >2010: industry adoption

  • CS165, Fall 2015 Stratos Idreos 25/31

    project: column-at-a-time

    bonus: vectorized processing

  • CS165, Fall 2015 Stratos Idreos 26/31

    update row7=(A=a,B=b,C=c,D=d)

    row-store: ABCD
    vs
    column-store: A B C D

    which is better to update and why?
    how much does it cost to update a single row? (think about pages, data movement)
    how to update in column-stores? (query plan + algorithms)

  • CS165, Fall 2015 Stratos Idreos 27/31

    base data (A B C D)    pending updates (A B C D)

    update
    query

    periodically
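
The base data / pending updates split can be sketched for a single column as follows. This assumes insert-only updates, a deliberately tiny pending buffer so the merge is easy to trigger, and a query that must read both structures; the names `buffered_column_t`, `col_insert`, `col_merge`, and `col_sum` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PENDING_CAP 4   /* assumed, kept tiny for illustration */

typedef struct {
    int   *base;                  /* dense, read-optimized column */
    size_t base_n;
    int    pending[PENDING_CAP];  /* small write-optimized buffer */
    size_t pending_n;
} buffered_column_t;

/* The "periodically" step: fold pending values into the base column.
 * Returns 0 on success, -1 if memory could not be allocated. */
int col_merge(buffered_column_t *col) {
    if (col->pending_n == 0)
        return 0;
    int *grown = realloc(col->base, (col->base_n + col->pending_n) * sizeof *grown);
    if (grown == NULL)
        return -1;
    memcpy(grown + col->base_n, col->pending, col->pending_n * sizeof *grown);
    col->base = grown;
    col->base_n += col->pending_n;
    col->pending_n = 0;
    return 0;
}

/* Updates only touch the pending buffer; a full buffer forces a merge. */
int col_insert(buffered_column_t *col, int v) {
    if (col->pending_n == PENDING_CAP && col_merge(col) != 0)
        return -1;
    col->pending[col->pending_n++] = v;
    return 0;
}

/* Queries must consult both the base data and the pending updates. */
long col_sum(const buffered_column_t *col) {
    assert(col != NULL);
    long total = 0;
    for (size_t i = 0; i < col->base_n; i++)
        total += col->base[i];
    for (size_t i = 0; i < col->pending_n; i++)
        total += col->pending[i];
    return total;
}
```

A zero-initialized `buffered_column_t` is a valid empty column, and the caller frees `base` when done. Deletes and in-place changes would need extra bookkeeping that this sketch leaves out.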

  • CS165, Fall 2015 Stratos Idreos 28/31

    fractured mirrors

    columns copy: A B C D
    rows copy: ABCD

    optimizer

    query

  • CS165, Fall 2015 Stratos Idreos 29/31

    reading

    The Design and Implementation of Modern Column-store Database Systems (Sections: all except 4.6 & 4.8)
    by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

    IEEE Data Engineering Bulletin, 35(1), March 2012
    Special Issue on Column-stores (9 short overview papers)

  • CS165, Fall 2015 Stratos Idreos 30/31

    research papers

    Database Architecture Optimized for the New Bottleneck: Memory Access
    Peter Boncz, Stefan Manegold, Martin Kersten
    In Proc. of the Very Large Databases Conference (VLDB), 1999

    MonetDB/X100: Hyper-Pipelining Query Execution
    Peter A. Boncz, Marcin Zukowski, Niels Nes
    In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2005

    Materialization Strategies in a Column-Oriented DBMS
    Daniel Abadi, Daniel Myers, David DeWitt, Samuel Madden
    In Proc. of the Inter. Conference on Data Engineering (ICDE), 2007

    Self-organizing tuple reconstruction in column-stores
    Stratos Idreos, Martin Kersten, Stefan Manegold
    In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009

  • DATA SYSTEMS
    prof. Stratos Idreos

    class 5

    column-stores 2.0