Advanced Topics in Databases

Advanced Topics in Databases

Fahime Raja

Niloofar Razavi

Melody Siadaty

Summer 2005

Technical Report II : Main memory Databases

Outline:

Introduction MMDB vs. DRDB

Storage Issues Reliability Issues Data Structures

Concurrency Control Commit Processing Data Representation Access Methods

T-tree definition & structure Query Processing Recovery

Introduction:

In a Main Memory DBMS (MMDBMS) there is no central role for I/O management.

Still, an important question in a MMDBMS is how to do transactions and recovery in an efficient way: Some of the proposed algorithms assume that a (small) stable subset

of the main memory exists, a piece of memory whose content will not be lost in a power outage through

a battery backup. These stable memories can be used to place, e.g., a redo log.

Others do not assume stable memories, and still use I/O to write transaction information to stable storage. These algorithms hence do not eliminate I/O, but minimize it, as the critical

path in a MMDBMS transaction only needs to write the log; not data pages from the buffer manager.

Introduction:

Motivations: Requirement of short Access/Response time Telecommunication applications Applications handling high traffic of data, e.g. Router. Real-Time applications

End of popularity of MMDBMS techniques in the early 1990s

MMDBMS were thereafter only considered of specific interest to real-time database applications, like above

Main Memory Databases Vs.Disk Resident Databases M : the primary copy of data live permanently in

main memory D: the primary copy of data is permanently disk

resident M: there can be a backup copy ,resident on disk

D: data can be temporarily cached in main memory for access speed up.

Main Memory Databases Vs.Disk Resident Databases Storage issues:

Access time of MM, orders of magnitude less than for disks (100 nsec vs. 10 msec)

MMDBMS footprints range between 200 KB and 2 MB MM is normally volatile

stable memory still expensive! Disks have high fixed dcost per access independent of the amount

of the retrieved data (Block Oriented) MM does not care of sequential access MM data are more vulnerable to software errors, since they can

be directly accessed by the processor.

Main Memory Databases Vs.Disk Resident Databases Reliability issues:

even if special hardware can enhance MM reliability, periodic backup is necessary

MM content is lost if system crashes If a single memory board fails, the entire machine must

be powered down loosing all data!

Whatever power backup for main memory is, in turn, is less reliable than passive magnetic media.

Main Memory Databases Vs.Disk Resident Databases Data Structures:

MMDB are not DRDB with a very large cache!

DRDB: Cached data are accessed through

indexes designed for disk access. Access is made through a buffer

manager ,which, given the disk address, checks if the relevant block is in the MM-Cache and then copies it to the MM application working area .

MMDB: data are accessed by directly

referring to their memory address. The following pictures illustrate this fact.

API for DRDB

API for MMDB

MMDBMS Concurrency Control:

Lock duration is short : reduced contention The advantage of small locking granules (fields or records) (when data

are memory resident) ,is removed, Very large lock granules (e.g., relations) are most appropriate (up to

the entire database) for memory resident data.

Serial transaction processing almost eliminate the need of concurrency control highly reduces cache flushes.

CC is still necessary when, Mixed length transactions coexist A multiprocessor system shares the DB among different units.

MMDBMS Concurrency Control:

Implementation : Traditional:

a hash table that contains entries for the objects currently locked . No lock information attached to data

MMDB: Stuff some small number of locking information to data. the first bit is set the object is locked, If it is locked and the second bit is set,

there are one or more waiting transaction else it is free.

MMDBMS Commit Processing necessary to have a backup copy and to keep a log of

transaction activity Before a transaction can commit, its activity records must

be written to the log. Logging can impact response time, since each transaction

must wait for at least one stable write before committing. Logging can also affect throughput if the log becomes a

bottleneck In MMDBs, the logging represents the only disk

operation each transaction will require.

MMDBMS Commit Processing Typical log length 400 bytes

40 bytes for Begin/End 360 bytes for Old/New values

stable the log tail (< 100 pages ) in a small amount of stable main memory Reduces response time Doesn’t affect bottlenecks

Pre-committing: releases transaction’s locks as soon as its log record is placed in

the log, without waiting for the information to be propagated to the disk.

Doesn’t affect serialization Because the log is sequential

MMDBMS Commit Processing

Doesn’t reduce response time Enhances concurrency

Response time of others.

Group committing: accumulates enough commit records to fill up a log

page and then flushes it to disk Reduces the total # of disk accesses can be used to relieve a log bottleneck .

MMDB Data Representation:

Relational data are usually represented as flat files Tuples are stored sequentially Attribute values are embedded in the tuples. Access is local

Space consuming due to duplicate values Inefficient due to sequentially of computations Need of indexes :

Relational tuples can be represented as a set of pointers to data values. use of pointers is space efficient when large values appear multiple

times in the database, (the actual value needs to only be stored once.)

Pointers also simplify the handling of variable length fields since variable length data can be represented using pointers into a heap.

MMDB Access Methods:

Hashing: Fast lookup & updating Not space efficient Doesn’t support range queries

Tree indexing: With a single pointer get access both to an attribute

value and the entire tuple Pointers are of fixed short lengths Suited for range queries


The T-Tree : Is a data structure whose ancestors are B-Trees and AVL-trees It is binary like AVL-trees

search is essentially binary A T-node contains many elements like B-tree

storage & update are done efficiently Insertions and deletions usually move data between a single node

like in B-trees Rebalancing is done by node rotation

Like in AVL-trees, but much less frequent.


T-Tree

MMDB Query Processing:

Query processing for DRDB focus on reducing disk access costs

Query processing for MMDB must focus on reducing processing costs Operation costs vary from system to system No general optimization technique

Implementation for relational operators should benefit of main memory data 7 index representation Nested loop-join is preferred to sort-merge join:

Although the sorted relations could be represented easily in a main memory database using pointer lists, there is really no need for this since much of the motivation for sorting is already lost .

MMDBMS Backup and Recovery: Backups of memory resident databases must be

maintained on disk or other stable storage to insure against loss of the volatile data .

1.the procedure used during normal database operation to keep the backup up-to-date (Checkpointing),

2.the procedure used to recover from a failure. (Failure Recovery) loading blocks of the database “on demand” until all of the

data has been loaded. using disk striping or disk array

there must be independent paths from the disks to memory

MMDBMS Backup and Recovery:3 major steps in a recovery process:1. Logging log information can be recorded at two different levels of

abstraction, physical logging: records, for each update, the state of the modified database

and the physical location (such as page id and offset) to which the update is applied.

logical logging: records the state transition of the database and the logical location affected by the update.

The Write Ahead Logging (WAL) is the most popular and accepted type of logging rules

MMDBMS Backup and Recovery:2.Checkpointing MMDB checkpointing approaches fall into three

categories: non-fuzzy checkpoint

the checkpointer first obtains locks on the data items to be checkpointed so that consistency of the database is preserved

fuzzy checkpoint the checkpointer does not secure any locks. To synchronize with

normal transactions, the checkpointer writes a record to the log after a checkpoint is completed. In addition, the database is generally dumped in segments.

log-driven checkpoint the checkpointer applies the log to a previous dump to generate a

new dump rather than dumping from the main-memory database.

MMDBMS Backup and Recovery:3. Reloading is the last and the one of the most important parts of the recovery

process. Reloading must ensure that the database is consistent after a crash. The database is loaded from the last checkpointed image and the undo

and redo operations are applied to restore it to the most recent consistent state.

The existing reloading schemes can be divided into two classes: simple reloading

the system does not resume its normal operation until the entire database is brought back into the main memory

concurrent reloading reload activities and transaction processing are performed in parallel.

Documents

Advanced Topics in Databases