Upload
marlene-smith
View
308
Download
2
Tags:
Embed Size (px)
Citation preview
Sándor HémanMarcin ZukowskiNiels NesLefteris SidirourgosPeter Boncz
Positional Update Handlingin Column Stores
UPDATE IN PLACE:A Poison Apple?Jim Gray, 1981
“..for performance reasons, most disc-based systems have been seduced into updating the data in place.”
30 years of hardware improvements in sequential/throughput beating random/latency…. in-place less feasible every year.
alternative: differential approach.In column stores, in-place updating is by now clearly infeasible
Problem: Column Store Updates
• I/O proportional to number of attributes– I/O blocks large and compressed– Sometimes even replicated– Read-Optimized Update-Unfriendly
• Table often kept ordered on sort-key (SK) attributes– Uniform update load scattered write access
Solution: Differential Structure
• Maintain updates (INS/DEL/MOD) in a differential structure– Merge with base table during scan
Solution: Differential Structure
• Maintain updates (INS/DEL/MOD) in a differential structure– Merge with base table during scan
• Challenges:– Efficiently maintainable data-structure– Minimize Merge impact for read-only queries
Naïve Approach: Delta Tables
• For each table, maintain two update friendly row-store tables:– INS(C1..Cn)– DEL(SK1..SKm)– MOD = DEL + INS
store prod new qty
London stool N 10
London table N 20
Paris rug N 1
Paris stool N 5
Base table: inventorySort-Key (SK): [store, prod]
store prod new qty
Berlin chair Y 5
Berlin cloth Y 20
Inserts table: INS
store prod
Paris rug
Deletes table: DEL
Naïve Approach: Delta Tables
store prod new qty
London stool N 10
London table N 20
Paris rug N 1
Paris stool N 5
Base table: inventorySort-Key (SK): [store, prod]
store prod new qty
Berlin chair Y 5
Berlin cloth Y 20
Inserts table: INS
store prod
Paris rug
Deletes table: DEL
• Rewrite table scans:
MergeUnion[store,prod](Scan(INS), MergeDiff[store,prod]( Scan(Inventory), Scan(DEL)))
Naïve Approach: Delta Tables
• Rewrite table scans:
MergeUnion[store,prod](Scan(INS), MergeDiff[store,prod]( Scan(Inventory), Scan(DEL)))
for up-to-date image• Expensive!
– I/O to scan SK ‘merge’ columns; also if querydoes not need SK cols
– Each query pays CPU effort to locate the same change positions over and over again
store prod new qty
Berlin chair Y 5
Berlin cloth Y 20
London stool N 10
London table N 20
Paris stool N 5
Actual table: inventorySort-Key (SK): [store, prod]
The Idea: Positional Updates
• Remember the position of an update rather than its SK values– Merge once at write Read-Optimized approach– No need to scan SK columns– Scan can skip less CPU overhead
Notation:• TABLEx state of TABLE at time x• SID(t): StableID
– Position of tuple t in immutable base TABLE0 Stable
• RIDx(t): RowID– Position of visible tuple t at time x VOLATILE!– SID(t) = RID0(t)
SID STORE PROD NEW QTY RID
0 Berlin chair Y 5 0
0 Berlin cloth Y 20 1
0 Berlin table Y 10 2
0 London chair N 30 3
1 London stool N 10 4
2 London table N 20 5
3 Paris rug N 1 6
4 Paris stool N 5 7
SID/RID Example
INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20)INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5)
TABLE1
SID STORE PROD NEW QTY RID
0 London chair N 30 0
1 London stool N 10 1
2 London table N 20 2
3 Paris rug N 1 3
4 Paris stool N 5 4
TABLE0
SIDs and RIDs
• RID(t) = SID(t) + ∆(t)• ∆(t) = #inserts before t – #deletes before t
= RID(t) – SID(t)• SID and RID are monotonically increasing
– organize positional updates on SID in a counting B-Tree that keeps track cumulative deltas (∆)• Positional Delta Tree (PDT)
– SIDs are stable– Only need to maintain cumulative ∆ on path root leaf
PDT Example
STORE PROD NEW QTY
Berlin table Y 10
Berlin cloth Y 20
Berlin chair Y 5
INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20)INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5)
0
2 1
SID
∆
0 0
ins ins
i2 i1
SID
type
value
0
ins
i0
SID
type
value
Insert Value Table
i0
i1
i2
SID STORE PROD NEW QTY RID
0 London chair N 30 0
1 London stool N 10 1
2 London table N 20 2
3 Paris rug N 1 3
4 Paris stool N 5 4
TABLE0
PDT Example
TABLE1
DELETE FROM inventory WHERE store = ‘Berlin’ AND prod = ‘table’DELETE FROM inventory WHERE store = ‘Paris’ AND prod = ‘rug’
SID STORE PROD NEW QTY RID
0 Berlin chair Y 5 0
0 Berlin cloth Y 20 1
0 Berlin table Y 10 2
0 London chair N 30 3
1 London stool N 10 4
2 London table N 20 5
3 Paris rug N 1 6
4 Paris stool N 5 7
STORE PROD NEW QTY
Berlin table Y 10
Berlin cloth Y 20
Berlin chair Y 5
0
2 -1
SID
∆
0 0
ins ins
i2 i1
SID
type
value
3
del
d0
SID
type
value
Insert Value Table
i0
i1
i2
PDT Example
TABLE2
INSERT INTO inventory VALUES (‘Paris’, ‘rack’, Y, 4)
SID STORE PROD NEW QTY RID
0 Berlin chair Y 5 0
0 Berlin cloth Y 20 1
0 London chair N 30 2
1 London stool N 10 3
2 London table N 20 4
4 Paris stool N 5 5
Insert at RID = 5
STORE PROD NEW QTY
Berlin table Y 20
Berlin cloth Y 5
Berlin chair Y 10
0
2 -1
SID
∆
0 0
ins ins
i2 i1
SID
type
value
3
del
d0
SID
type
value
Insert Value Table
i0
i1
i2
STORE PROD NEW QTY
Paris rack Y 4
Berlin cloth Y 20
Berlin chair Y 5
0
2 -1
SID
∆
0 0
ins ins
i2 i1
SID
type
value
3 3
ins del
i0 d0
SID
type
value
Insert Value Table
i0
i1
i2
RID 5 > 0 + 2
PDT Example
0 0
ins ins
i2 i1
SID
type
value
0 1
0 1RID
∆
0
ins
i4
SID
type
value
2
2RID
∆
1
ins
i3
SID
type
value
3
4RID
∆
3 3
ins del
i0 d0
SID
type
value
4 5
7 8RID
∆
0
2 1
SID
∆
RID
∆ 2
2
3
1 0
SID
∆
RID
∆ 4
7
1
3 1
SID
∆
RID
∆ 3
4
INSERT INTO inventory VALUES (‘London’, ‘rack’, Y, 4)INSERT INTO inventory VALUES (‘Berlin’, ‘rack’, Y, 4)
Separator SIDs
Subtree ∆
Separator RIDs
Running ∆
Stacking PDTs• Arbitrary number of layers: “deltas on deltas on ..”
– RID domain of child PDT = SID domain of parent PDT
generalization:• PDT contains all differences in time [lo,hi]
Table
PDT
PDT
PDT
lohi
PDTt1t2
PDTt0t1
consecutive t2=t1
PDT t2t3
PDT t0t1
vs are
Table
PDT
PDT
PDT
PDT
PDT
PDT PDTt2t3
Stacking PDTs• Arbitrary number of layers: “deltas on deltas on ..”
– RID domain of child PDT = SID domain of parent PDT
generalization:• PDT contains all differences in time [lo,hi]
Table
lohi
consecutive t2=t1
aligned t2=t0
“same base”
PDT t2t3
PDT t0t1
vs are
Table
PDT PDT
Stacking PDTs• Arbitrary number of layers: “deltas on deltas on ..”
– RID domain of child PDT = SID domain of parent PDT
generalization:• PDT contains all differences in time [lo,hi]
Table
lohi
consecutive t2=t1
aligned t2=t0
“same base”
overlapping [t2,t3] overlaps [t0,t1]“uncomparable” / “incompatible”
PDT t2t3
PDT t0t1
vs are
Table
PDT PDT
PDT
Stacking for Isolation
• ‘lock’ PDT down for further updates– Immutable read-PDT BIG: main memory resident
• ‘stack’ empty PDT on top– Updateable write-PDT SMALL: L2 cache resident– Note: PDTs are consecutive
• once in a while changes are propagated– Propagate() operation
• Requires consecutive PDTs
Stable Table
Read-PDT
Write-PDT
TABLEx
Propagate()
Read-PDT
Stable Table
Read-PDT
Write-PDT
TABLEx
Write-PDT
Trans PDT
CopyWrite-PDT
TransactionState
Snapshot Isolation
• Transaction creates snapshot copy of write-PDT
• Updates go into trans-PDT
• On commit, Propagate() trans-PDT into write-PDT
Pro
pag
ate
()
Optimistic Concurrency Control
Stable Table
Read-PDT
Write-PDT
Trans PDT
TABLEx
CopyWrite-PDT
TransA Trans
PDT
CopyWrite-PDT
TransB
• Two concurrent transactions
Optimistic Concurrency Control
Stable Table
Read-PDT
Trans PDT
TABLEx
CopyWrite-PDT
TransA Trans
PDT
CopyWrite-PDT
TransB
Propagate()
Write-PDT
• Two concurrent transactions• A commits before B
Optimistic Concurrency Control
Stable Table
Read-PDT
Write-PDT
Trans PDT
TABLEx
TransA Trans
PDT
CopyWrite-PDT
TransB
Pro
paga
te()
• Two concurrent transactions• A commits before B• Can not commit B into
modified write-PDT!– A changed RID enumeration
Optimistic Concurrency Control
Stable Table
Read-PDT
Write-PDT
Trans PDT
TABLEx
TransA
TransB
Serialize()Trans PDT
• Two concurrent transactions• A commits before B• Can not commit B into
modified write-PDT!– A changed RID enumeration
• Serialize(A, B)– Makes aligned PDTs consecutive– MAY FAIL!! trans abort
= succeeds if no conflict= write set intersection
Consecutive!Trans PDT
Optimistic Concurrency Control
Stable Table
Read-PDT
Trans PDT
TABLEx
TransA Trans
PDT
TransB
Write-PDT
Prop
agat
e()
• Two concurrent transactions• A commits before B• Can not commit B into
modified write-PDT!– A changed RID enumeration
• Serialize(A, B)– Makes aligned PDTs consecutive– MAY FAIL!! trans abort
= succeeds if no conflict= write set intersection
Extend to any number of concurrent transactions by serializing against all PDTs of transactions that committed during its lifetime
(a.k.a. backward looking OCC)
Serialize()
Concluding..
• PDTs speed-up differential update merging– Reduced I/O volume– Reduced CPU merge overhead
• Tree structure – logarithmic lookup & maintenance of volatile RIDs– main operations: Merge(), Propagate(), Serialize()
• PDTs are stackable, and capture Write-Set– Great structure for Snapshot Isolation
• Formal definitions, algorithms and benchmarks in paper