Upload
maurice-randall
View
213
Download
0
Embed Size (px)
Citation preview
Cloud Computing Lecture
Column Store – alternative organization for big relational data
C-store C-store is Read-optimized, for OLAP
type apps Traditional DBMS, write-optimized
(optimized for online transactions) Based on records(rows)
C-Store What are the cost-sensitive major
factors in query processing? Size of database Index or not Join
Current hardware configuration and what a DBMS can do… Cheap storage – allow distributed redundant
data store Fast CPUs – compression/decompression Limited disk bandwidth – reduce I/O
C-store Supporting OLAP (online analytic
processing) operations Optimized read operations Balanced write performance Address the conflict between writes and reads
Fast write – append records Fast read – indexed, compressed
Think if data organized in columns, what are the
unique challenges (different from the row-organization)?
C-store’s features Column based store saves space
Compression is possible Index size is smaller
Multiple projections Allow multiple indices Parallel processing on the same attributes Materialized join results
Separation of writeable store and read-optimized store Both write/read are optimized Transactions are not blocked by write locks
Data model Same as relational data model
Tables, rows, columns Primary keys and foreign keys Projections
From single table Multiple joined tables
Example
EMP1 (name, age)EMP2 (dept, age, DEPT.floor)EMP3 (name, salary)DEPT1(dname, floor)
EMP(name, age, dept, salary)DEPT(dname, floor)
Normal relational model Possible C-store model
Physical projection organization Sort key
each projection has one Rows are ordered by sort key Partitioned by key range
Linking columns in the same projection Storage key – (segment id, key, i.e.,offset in
segment)
Linking projections To reconstruct a table Join index
Conceptual organization
column
Segment:by sort keyrange
Sort key column
Seg id offset
Join index
Projection 1
Projection 2
Architectural consideration between writes and reads
Read often needs indices to speedup
Write often index unfriendly: needs to update indices frequently
Use “read store” and “write store”
Read store: Column encoding Use compression schemes and indices
Self-order (key), few distinct values (value, position, # items) Indexed by clustered B-tree
Foreign-order (non-key), few distinct values (value, bitmap index) B-tree index: position values
Self-order, many distinct values Delta from the previous value B-tree index
Foreign-order, many distinct values Unencoded
Write Store Same structure, but explicitly use
(segment, key) to identify records Easier to maintain the mapping Only concerns the inserted records
Tuple mover Copies batch of records to RS
Delete record Mark it on RS Purged by tuple mover
Tuple mover Moves records in WS to RS Happens between read-only
transactions Use merge-out process
How to solve read/write conflict Situation: one transaction updates the
record X, while another transaction reads X.
Use snapshot isolation
Benefits in query processing Selection – has more indices to use Projection – some “projections”
already defined Join – some projections are
materialized joins Aggregations – works on required
columns only
Evaluation Use TPC-H – decision support queries Storage
Query performance
Query performance Row store uses materialized views
Summary: the performance gain Column representation – avoids reads
of unused attributes Storing overlapping projections –
multiple orderings of a column, more choices for query optimization
Compression of data – more orderings of a column in the same amount of space
Query operators operate on compressed representation