1
Wenguang Wang, Richard B. Bunt
Department of Computer Science
University of Saskatchewan
November 14, 2000
Simulating DB2 Buffer Pool Management
2
Outline
• Background
• Problem
• Methodology
• Simulation results
• Future work
3
Background
• What is a buffer pool?
[Figure: Applications, Upper layer of DBMS, Buffer Pool (inside the DBMS), Database on Disks; Reads and Writes between the buffer pool and the disks]
4
Problem
• Buffer pool management is important to the performance of any DBMS
• Configuring and tuning the buffer pool is not an easy problem for the database administrator
• The buffer pool management of a DBMS is very complex
• It is hard to study and test the buffer pool management algorithm directly
5
Methodology
• Trace-driven simulation provides an effective approach
• Compared to the real DBMS:
– The simulator is easier to control
– The simulator requires far fewer computing resources (CPU, memory, disk, running time)
– New algorithms are easier to implement and test in the simulator
– Changes that cannot be made, or are hard to make, in the real system can be simulated
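A minimal sketch of the trace-driven idea, assuming a trace is just a sequence of (page_id, is_write) requests and using plain LRU as a stand-in replacement policy (this is not DB2's algorithm). The names `simulate` and `trace` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict

def simulate(trace, pool_size):
    """Replay a page-request trace against an LRU buffer pool.

    trace: iterable of (page_id, is_write) tuples (an assumed format).
    Returns (hits, misses, disk_writes).
    """
    pool = OrderedDict()  # page_id -> dirty flag, least recently used first
    hits = misses = disk_writes = 0
    for page_id, is_write in trace:
        if page_id in pool:
            hits += 1
            pool.move_to_end(page_id)                  # mark as most recently used
            pool[page_id] = pool[page_id] or is_write  # a write makes the page dirty
        else:
            misses += 1
            if len(pool) >= pool_size:
                _, dirty = pool.popitem(last=False)    # evict least recently used
                if dirty:
                    disk_writes += 1                   # dirty victim must be written back
            pool[page_id] = is_write
    return hits, misses, disk_writes

# The same trace can be replayed under different settings at almost no cost.
trace = [(1, False), (2, True), (1, False), (3, False), (2, False)]
for size in (1, 2, 3):
    print(size, simulate(trace, size))
```

Replaying one recorded trace under several pool sizes is the kind of experiment that is cheap in a simulator and expensive on a real DBMS.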
6
Methodology
• Create trace-driven simulation tools
– Collect trace
– Process trace
– Develop simulator
– Verify simulator
• Perform experiments with the simulator
– Understand the effect of buffer pool parameters
– Give suggestions for tuning the buffer pool
– Design and test alternative buffer pool algorithms
7
System Configuration
• DBMS: IBM DB2
– Relational DBMS
– Distributed DBMS that supports multiple nodes. Because buffer pools on different nodes are independent, only single-node DB2 is studied
• Workload: the TPC-C benchmark
– An On-Line Transaction Processing benchmark
– Many clients send simple queries simultaneously to the DBMS on the server side
– A large amount of data is updated by the queries
8
System Configuration (cont.)
• DB2 version 6.1 running on Windows NT Server 4.0
• TPC-C database
– Small application: 50 warehouses (5 GB of data) spanning 9 physical disks
9
Trace Collection
• Trace tools of DB2
• Suspend the TPC-C benchmark periodically to record a sufficiently large trace
[Figure: Applications, Upper layer of DBMS, Buffer Pool (inside the DBMS), Database on Disks; Reads and Writes between the buffer pool and the disks; the trace point is marked]
10
Trace Volume
• 60M buffer pool requests
• 200K TPC-C transactions
• Equivalent to a 30-minute TPC-C run when no traces are recorded
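As a rough back-of-the-envelope reading of these figures, assuming the rounded values on the slide are exact and the untraced run lasts exactly 30 minutes:

```python
requests = 60_000_000      # buffer pool requests in the trace
transactions = 200_000     # TPC-C transactions in the trace
run_seconds = 30 * 60      # equivalent untraced run time

requests_per_transaction = requests / transactions
transactions_per_second = transactions / run_seconds
requests_per_second = requests / run_seconds

print(f"{requests_per_transaction:.0f} buffer pool requests per transaction")
print(f"{transactions_per_second:.0f} transactions per second")
print(f"{requests_per_second:.0f} buffer pool requests per second")
```

Under those assumptions, each transaction issues about 300 buffer pool requests, and the untraced system handles roughly 111 transactions per second.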
11
Buffer Pool Simulator
• Simulates the buffer pool management algorithm and the disk activity
• About 8,000 lines of C++ code
12
Architecture of the Simulator
13
DB2 Buffer Pool Algorithm
• Clock algorithm
• Threshold: triggers the page cleaning activity
[Figure: clock pointer, page cleaners cleaning pages, database on disks]
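For readers unfamiliar with clock replacement, here is a minimal sketch of the textbook clock (second-chance) policy. It is not claimed to match DB2's exact variant; the class `ClockPool` and its fields are hypothetical names used only for illustration.

```python
class ClockPool:
    """Textbook clock (second-chance) replacement over a fixed set of frames."""

    def __init__(self, num_frames):
        self.frames = [None] * num_frames   # page id held by each frame
        self.ref = [False] * num_frames     # reference bit per frame
        self.where = {}                     # page id -> frame index
        self.hand = 0                       # the clock pointer

    def access(self, page_id):
        """Return True on a hit, False on a miss (the page is then loaded)."""
        if page_id in self.where:
            self.ref[self.where[page_id]] = True   # recently used: give a second chance
            return True
        # Sweep the pointer, clearing reference bits, until a frame with a
        # cleared bit is found; that frame holds the victim (or is empty).
        while self.ref[self.hand]:
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page_id
        self.ref[self.hand] = True
        self.where[page_id] = self.hand
        self.hand = (self.hand + 1) % len(self.frames)
        return False

pool = ClockPool(3)
results = [pool.access(p) for p in [1, 2, 3, 4, 2, 5]]
print(results, pool.frames)
```

In this run, page 2 is re-referenced just before page 5 arrives, so the pointer skips it and evicts page 3 instead.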
14
Buffer Pool Algorithm (cont.)
[Figure: buffer pool with a clean region and a dirty region, serving TPC-C; labels: Threshold, Expand, Dirty pages, Reads, Synchronous writes, Asynchronous writes performed by page cleaners, Database on Disks]
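The interplay between the dirty page threshold, asynchronous cleaning, and synchronous writes can be illustrated with a toy model. This is a sketch under loud assumptions, not DB2's mechanism: the pool uses LRU, cleaning starts whenever the dirty fraction exceeds the threshold, and each cleaner writes back exactly one dirty page per request. The function `run` and its parameters are hypothetical.

```python
from collections import OrderedDict

def run(trace, pool_size, threshold, num_cleaners):
    """Return (sync_writes, async_writes) for a trace of (page_id, is_write)."""
    pool = OrderedDict()   # page_id -> dirty flag, least recently used first
    sync_writes = async_writes = 0
    for page_id, is_write in trace:
        if page_id in pool:
            pool.move_to_end(page_id)
            pool[page_id] = pool[page_id] or is_write
        else:
            if len(pool) >= pool_size:
                _, dirty = pool.popitem(last=False)
                if dirty:
                    sync_writes += 1   # the request must wait for this write
            pool[page_id] = is_write
        # Assumption: once the dirty fraction exceeds the threshold, each
        # cleaner writes back one dirty page (oldest first) per request.
        dirty_pages = [p for p, d in pool.items() if d]
        if len(dirty_pages) / pool_size > threshold:
            for p in dirty_pages[:num_cleaners]:
                pool[p] = False
                async_writes += 1
    return sync_writes, async_writes

trace = [(i, True) for i in range(10)]   # ten updates to distinct pages
print(run(trace, pool_size=4, threshold=0.5, num_cleaners=0))
print(run(trace, pool_size=4, threshold=0.5, num_cleaners=1))
```

With no cleaners, every eviction in this toy trace hits a dirty page and forces a synchronous write; with one cleaner keeping up, all write-backs happen asynchronously.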
15
Simulator Verification
• Compare the throughput curve
• Compare the run-time statistics
– Hit ratio
– Dirty page percentage
• Test the effect of parameters
– Dirty page threshold
– Number of page cleaners
16
Simulator Verification: Similar Throughput Curve
17
Page Distribution of Buffer Pool
18
I/O Activities of the Buffer Pool
19
Simulation Results Under Default Configuration
• Page cleaners cannot clean out pages fast enough under the default configuration (2 page cleaners)
• Too many dirty pages (87%) in the buffer pool under the default configuration
• Too many dirty pages lower the buffer pool hit ratio and performance
20
Effect of More Page Cleaners
21
I/O Activities With More Page Cleaners
22
Effect of Number of Page Cleaners
23
Effect of Buffer Pool Parameters
• The threshold does not affect performance when the number of page cleaners is small
• Setting an appropriate number of page cleaners is important to performance
• The appropriate number of page cleaners differs across workloads
24
Future work
• Gain more understanding of the buffer pool algorithm from the simulator and DB2
• Extend the work to a much larger TPC-C database
• Investigate alternative buffer pool management algorithms that are easier to manage and tune
• Test the alternative algorithms first in the simulator and then in the real system
25
Questions?