18
Cloud Computing Lecture Column Store – alternative organization for big relational data

Cloud Computing Lecture Column Store – alternative organization for big relational data

Embed Size (px)

Citation preview

Page 1: Cloud Computing Lecture Column Store – alternative organization for big relational data

Cloud Computing Lecture

Column Store – alternative organization for big relational data

Page 2: Cloud Computing Lecture Column Store – alternative organization for big relational data

C-store C-store is Read-optimized, for OLAP

type apps Traditional DBMS, write-optimized

(optimized for online transactions) Based on records(rows)

Page 3: Cloud Computing Lecture Column Store – alternative organization for big relational data

C-Store What are the cost-sensitive major

factors in query processing? Size of database Index or not Join

Current hardware configuration and what a DBMS can do… Cheap storage – allow distributed redundant

data store Fast CPUs – compression/decompression Limited disk bandwidth – reduce I/O

Page 4: Cloud Computing Lecture Column Store – alternative organization for big relational data

C-store Supporting OLAP (online analytic

processing) operations Optimized read operations Balanced write performance Address the conflict between writes and reads

Fast write – append records Fast read – indexed, compressed

Think if data organized in columns, what are the

unique challenges (different from the row-organization)?

Page 5: Cloud Computing Lecture Column Store – alternative organization for big relational data

C-store’s features Column based store saves space

Compression is possible Index size is smaller

Multiple projections Allow multiple indices Parallel processing on the same attributes Materialized join results

Separation of writeable store and read-optimized store Both write/read are optimized Transactions are not blocked by write locks

Page 6: Cloud Computing Lecture Column Store – alternative organization for big relational data

Data model Same as relational data model

Tables, rows, columns Primary keys and foreign keys Projections

From single table Multiple joined tables

Example

EMP1 (name, age)EMP2 (dept, age, DEPT.floor)EMP3 (name, salary)DEPT1(dname, floor)

EMP(name, age, dept, salary)DEPT(dname, floor)

Normal relational model Possible C-store model

Page 7: Cloud Computing Lecture Column Store – alternative organization for big relational data

Physical projection organization Sort key

each projection has one Rows are ordered by sort key Partitioned by key range

Linking columns in the same projection Storage key – (segment id, key, i.e.,offset in

segment)

Linking projections To reconstruct a table Join index

Page 8: Cloud Computing Lecture Column Store – alternative organization for big relational data

Conceptual organization

column

Segment:by sort keyrange

Sort key column

Seg id offset

Join index

Projection 1

Projection 2

Page 9: Cloud Computing Lecture Column Store – alternative organization for big relational data

Architectural consideration between writes and reads

Read often needs indices to speedup

Write often index unfriendly: needs to update indices frequently

Use “read store” and “write store”

Page 10: Cloud Computing Lecture Column Store – alternative organization for big relational data

Read store: Column encoding Use compression schemes and indices

Self-order (key), few distinct values (value, position, # items) Indexed by clustered B-tree

Foreign-order (non-key), few distinct values (value, bitmap index) B-tree index: position values

Self-order, many distinct values Delta from the previous value B-tree index

Foreign-order, many distinct values Unencoded

Page 11: Cloud Computing Lecture Column Store – alternative organization for big relational data

Write Store Same structure, but explicitly use

(segment, key) to identify records Easier to maintain the mapping Only concerns the inserted records

Tuple mover Copies batch of records to RS

Delete record Mark it on RS Purged by tuple mover

Page 12: Cloud Computing Lecture Column Store – alternative organization for big relational data

Tuple mover Moves records in WS to RS Happens between read-only

transactions Use merge-out process

Page 13: Cloud Computing Lecture Column Store – alternative organization for big relational data

How to solve read/write conflict Situation: one transaction updates the

record X, while another transaction reads X.

Use snapshot isolation

Page 14: Cloud Computing Lecture Column Store – alternative organization for big relational data

Benefits in query processing Selection – has more indices to use Projection – some “projections”

already defined Join – some projections are

materialized joins Aggregations – works on required

columns only

Page 15: Cloud Computing Lecture Column Store – alternative organization for big relational data

Evaluation Use TPC-H – decision support queries Storage

Page 16: Cloud Computing Lecture Column Store – alternative organization for big relational data

Query performance

Page 17: Cloud Computing Lecture Column Store – alternative organization for big relational data

Query performance Row store uses materialized views

Page 18: Cloud Computing Lecture Column Store – alternative organization for big relational data

Summary: the performance gain Column representation – avoids reads

of unused attributes Storing overlapping projections –

multiple orderings of a column, more choices for query optimization

Compression of data – more orderings of a column in the same amount of space

Query operators operate on compressed representation