Boost Write Performance for DBMS on Solid State Drive
Yu LI
Background (1)
SSD is a complex storage device, consisting of:
• flash chips (i.e., NAND)
• controller hardware
• proprietary software (i.e., firmware)
• a block device interface via a standard interconnect (e.g., USB, IDE, SATA)
In general:
• Sequential read/write and random read are fast.
• Random write is slow.
Background (2)
Some DBMS applications tend to generate random write streams, e.g., Online Transaction Processing (OLTP):
• Small and frequent insert/delete/update operations
• Concurrency
In-Page Logging Approach [Lee, SIGMOD 07]
• Idea: turn random writes into log appends.
• However, the in-page logging area needs hardware support. For SSD, this is not practical.
Background (3)
Question: is there any solution that improves write performance without modifying the firmware of the SSD?
• Systematic performance studies show that not all kinds of “random write” on SSD are slow.
• Write performance depends more on the write pattern on SSD. [uFlip, CIDR 2009]
uFlip results:
• Focused write, e.g., writes inside a <8MB file
• Partitioned Sequential Write, e.g., 1,50,2,51,3,52,…
• Ordered Sequential Write, e.g., 1,3,5,7,9,…
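The three patterns can be illustrated with a small classifier over page-address sequences. This is only a sketch: the function names, the constant-stride reading of "ordered", the round-robin reading of "partitioned", and the 2048-page window (8MB of 4KB pages) are assumptions made for illustration and are not taken from uFlip.

```python
def is_ordered_sequential(addrs, stride=None):
    """True if addresses grow by one constant positive stride, e.g. 1,3,5,7,9.

    If `stride` is None, the stride is inferred from the first two addresses.
    """
    if len(addrs) < 2:
        return False
    step = addrs[1] - addrs[0] if stride is None else stride
    if step <= 0:
        return False
    return all(b - a == step for a, b in zip(addrs, addrs[1:]))


def is_partitioned_sequential(addrs, partitions):
    """True if `addrs` interleaves `partitions` streams round-robin and each
    stream is sequential with stride 1, e.g. 1,50,2,51,3,52 with 2 partitions."""
    if partitions < 2 or len(addrs) < 2 * partitions:
        return False
    streams = [addrs[i::partitions] for i in range(partitions)]
    return all(is_ordered_sequential(s, stride=1) for s in streams)


def is_focused(addrs, window_pages=2048):
    """True if all addresses fall inside one window of `window_pages` pages
    (assumed here: 8MB / 4KB pages = 2048)."""
    return len(addrs) > 0 and max(addrs) - min(addrs) < window_pages
```

For example, `is_partitioned_sequential([1, 50, 2, 51, 3, 52], 2)` splits the sequence into the streams 1,2,3 and 50,51,52 and accepts it because both are sequential.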
Our Idea (1)
Write Stream Decomposition
If we can collect enough write requests:
• Isolate the write requests that follow good write patterns
• Cluster write requests to form instances of focused write
Our Idea (2)
[Figure: StableBuffer and SSD, with three numbered steps: (1) write, (2) decomposition, (3) write.]
Through StableBuffer:
• Two writes (1, 3) in good write patterns (1x~4x)
• One random read (2) (at most 1x)
=> Total: at most 9x
Directly:
=> 17x~30x
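The total can be checked by summing the upper bounds of the three steps. This assumes that the 1x~4x range applies to each of the two writes; that is one reading of the slide, which does not define the unit x.

```latex
\begin{align*}
C_{\text{StableBuffer}} &\le \underbrace{4x}_{\text{write (1)}}
  + \underbrace{1x}_{\text{read (2)}}
  + \underbrace{4x}_{\text{write (3)}} = 9x, \\
C_{\text{direct}} &\in [17x,\ 30x].
\end{align*}
```

Under this reading, even the worst case through the StableBuffer stays below the best case of writing directly.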
System Overview
[Figure: system overview showing DBMS transactions, the DBMS Buffer Manager, the StableBuffer, the StableBuffer Translation Table and the Write Stream Decompositors, spanning main memory and SSD and connected by write and read paths.]
Components of StableBuffer Manager
• StableBuffer: a pre-allocated focused area on SSD, e.g., a pre-allocated file < 8MB.
• StableBuffer Translation Table: a table of entries like “<12345678AB, 32>”, supporting fast lookup, insert and delete.
• Write Stream Decompositors: a group of programs running in concurrent threads that decompose the write stream into instances of good write patterns.
More on StableBuffer Translation Table
• A reverse index for the StableBuffer Translation Table is embedded in the pages: the destination and a timestamp, kept for recovery in case of a system crash.
• During recovery, for the page at offset O whose destination is D, compare its timestamp T to the latest update time T0 of the page at destination D. If T > T0, insert <D,O> into the table. Otherwise, slot O is free.
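The recovery rule can be sketched as a single scan over the StableBuffer slots. Everything named here is hypothetical: `SlotHeader`, `rebuild_translation_table`, and the in-memory inputs stand in for the real on-disk layout. The handling of two slots with the same destination (keep the newer one) is an added assumption that the slides do not cover.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SlotHeader:
    """Reverse-index fields embedded in a StableBuffer page (hypothetical layout)."""
    destination: int  # D: where the page belongs on the SSD
    timestamp: int    # T: when the page was written into the StableBuffer


def rebuild_translation_table(
    slots: List[Optional[SlotHeader]],
    latest_update_time: Dict[int, int],
) -> Tuple[Dict[int, int], List[int]]:
    """Return (table mapping D -> O, list of free slot offsets)."""
    table: Dict[int, int] = {}
    free: List[int] = []
    for offset, header in enumerate(slots):
        if header is None:  # slot was never written
            free.append(offset)
            continue
        # T0: latest update time of the page already stored at destination D.
        t0 = latest_update_time.get(header.destination, float("-inf"))
        if header.timestamp <= t0:  # stale copy: slot O is free
            free.append(offset)
            continue
        prev = table.get(header.destination)
        if prev is not None:
            # Assumption: between two live copies of D, keep the newer one.
            if slots[prev].timestamp >= header.timestamp:
                free.append(offset)
                continue
            free.append(prev)
        table[header.destination] = offset  # T > T0: insert <D, O>
    return table, sorted(free)
```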
Query on StableBuffer
When a request arrives to retrieve the page at D, we need to check whether there is an entry “<D, O>” in the StableBuffer Translation Table.
• If there is, return the page at the O-th slot in the StableBuffer.
• Otherwise, issue a read request to the SSD for the page at D.
So it is better to implement the StableBuffer Translation Table as a hash table on D.
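A minimal sketch of this lookup path, using a Python dict as the hash table on D. The function name and the in-memory stand-ins for the StableBuffer (a list of pages indexed by slot) and the SSD (a dict keyed by destination) are assumptions for illustration only.

```python
def read_page(d, translation_table, stable_buffer, ssd):
    """Return the current content of the page whose destination is `d`.

    translation_table: dict mapping destination D -> slot offset O
    stable_buffer:     sequence of pages, indexed by slot offset O
    ssd:               mapping from destination D -> page (stand-in for a device read)
    """
    o = translation_table.get(d)  # hash lookup on D
    if o is not None:
        return stable_buffer[o]   # newest copy still sits in the StableBuffer
    return ssd[d]                 # no entry: read the page at D from the SSD
```

With `translation_table = {7: 1}`, a request for page 7 is served from slot 1 of the StableBuffer, while a request for any other page goes to the SSD.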
Decompositors (1)
[Figure: the StableBuffer Translation Table is decomposed by four decompositors, each with an index and each producing one write stream: the Sequential Write Decompositor (Sequential Write Stream), the Partitioned Sequential Write Decompositor (Partitioned Sequential Write Stream), the Ordered Sequential Write Decompositor (Ordered Sequential Write Stream) and the Focused Write Decompositor (Focused Write Stream). Some indexes are shared.]
Decompositors (2)
Decompositors run in concurrent threads. Their results could share the same entries of the StableBuffer Translation Table.
Select among the results of the decompositors:
• Select the instance of the write pattern which performs better on SSD.
• Select the bigger instance. E.g., 1,2,56,57,6,7,42,43,3,4,...
We select the results according to $\min_i \{ T_i / L_i \}$.
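One way to read the selection criterion is to pick the candidate instance with the smallest ratio T_i / L_i. The sketch below assumes that T_i is an estimated write time for candidate i and L_i is its length in pages; the slides do not define either symbol. The `Candidate` type, the tie-break toward the bigger instance, and the cost numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    pattern: str          # e.g. "sequential", "focused"
    pages: Sequence[int]  # destination addresses in this instance
    est_time: float       # T_i: assumed estimate of the time to write the instance

    @property
    def length(self) -> int:
        """L_i: assumed to be the number of pages in the instance."""
        return len(self.pages)


def select_candidate(candidates: List[Candidate]) -> Candidate:
    """Pick the candidate minimizing T_i / L_i; on ties prefer the bigger one."""
    viable = [c for c in candidates if c.length > 0]
    if not viable:
        raise ValueError("no non-empty candidate to select")
    return min(viable, key=lambda c: (c.est_time / c.length, -c.length))
```

For the example stream 1,2,56,57,6,7,42,43,3,4, a sequential candidate `[1, 2, 3, 4]` with a made-up cost of 4.0 (1.0 per page) would win over a focused candidate of six pages costing 12.0 (2.0 per page).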
Decompositors (3)
• Sequential Write Decompositor: maintains a search tree index on the destination addresses of mapping entries.
• Partitioned Write Decompositor: shares the search tree index of the Sequential Write Decompositor.
• Ordered Write Decompositor: shares the search tree index of the Sequential Write Decompositor.
• Focused Write Decompositor: maintains a hash index of the entries of the StableBuffer Translation Table. An entry “<D, O>” is hashed into bucket $D / M$.
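The two kinds of index can be sketched as follows. A sorted list stands in for the search tree, and the focused buckets use integer division of the destination D by a bucket width M. Both function names, the `min_len` parameter, and the interpretation of M as a bucket width in pages are assumptions, since the slides do not spell them out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sequential_runs(destinations: Iterable[int], min_len: int = 2) -> List[List[int]]:
    """Scan destinations in sorted order (as an in-order walk of a search tree
    would) and return the runs of consecutive addresses of at least `min_len`."""
    runs: List[List[int]] = []
    current: List[int] = []
    for d in sorted(set(destinations)):
        if current and d == current[-1] + 1:
            current.append(d)          # extends the current run
        else:
            if len(current) >= min_len:
                runs.append(current)   # close the previous run
            current = [d]
    if len(current) >= min_len:
        runs.append(current)
    return runs


def focused_buckets(destinations: Iterable[int], m: int) -> Dict[int, List[int]]:
    """Group destinations by bucket number D // M (M = assumed bucket width)."""
    if m <= 0:
        raise ValueError("bucket width M must be positive")
    buckets: Dict[int, List[int]] = defaultdict(list)
    for d in destinations:
        buckets[d // m].append(d)
    return dict(buckets)
```

On the destinations 1,2,56,57,6,7,42,43,3,4, `sequential_runs` yields the runs 1..4, 6..7, 42..43 and 56..57, and `focused_buckets` with M = 10 groups 1,2,6,7,3,4 into bucket 0.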
Preliminary Result of Evaluation
Prototype of StableBuffer manager:
• Accepts a write trace file
• Runs on a Windows desktop PC with a 16GB MTron MSD-SATA-3525 SSD
• Page size is 4KB
• StableBuffer is 8MB = 2048 pages
Trace:
• Oracle 11g running the TPC-C benchmark
• Simulates an enterprise OLTP retailing system, which keeps inserting/deleting/updating records in an 8GB database
• 488623 write requests
[Figure: bar chart of bandwidth (MB/s), axis from 0 to 3, comparing Direct and StableBuffer; annotated “1.5x”.]
Q & A
Thanks